Re: [R] Vim-R-plugin (new version)
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Vim-R-plugin (new version)
> Dear R users,
>
> The author of Tinn-R (Jose Claudio Faria) now is co-author of Vim-R-plugin2, a plugin that makes it possible to send commands from the Vim text editor to R. We added many new key bindings, restructured the menu and created new Tool Bar buttons. The new version is available at:
>
> http://www.vim.org/scripts/script.php?script_id=2628
>
> NOTES:
> (1) Some old key bindings changed, including the shortcuts to start R.
> (2) The plugin doesn't work on Microsoft Windows yet.

<snip>

With apologies. I think I just sent a blank email to the list. In an ironic twist of fate (since I'm writing about a modal text editor), I hit the wrong button.

I've been playing with this for a couple of days and while I'm still getting used to it, this plug-in does offer a compelling alternative to emacs-ess. ESS still has some advantages, but this is a very interesting plugin.
[R] Ubuntu, Revolutions, R
For those who don't follow Ubuntu development carefully, the first Beta for the next Ubuntu was recently released, so I took my home system and upgraded to help out with filing bugs, etc. Just to be clear, I am not looking for help with the upgrade process.

I've had R, and a few miscellaneous CRAN packages, installed on this computer for years. Today, when I loaded an R session I had developed before the upgrade, I saw something new in my R welcome message.

    R version 2.9.2 (2009-08-24)
    Copyright (C) 2009 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    This is REvolution R version 3.0.0:
    the optimized distribution of R from REvolution Computing.
    REvolution R enhancements Copyright (C) REvolution Computing, Inc.

    Checking for REvolution MKL:
     - REvolution R enhancements not installed.
     For improved performance and other extensions: apt-get install revolution-r

The last part, about this being the enhanced version of R, was . . . unexpected. I have heard of this company before and now I've spent some time on their website. Looking at my installation, Ubuntu did not install any of the REvolution Computing components, although R now basically throws a warning every time I start it.

My question(s) for the community is this (pick any question(s) you like to answer):

Should I install the REvolution Computing packages?
Do these packages really make R faster?
Are these packages stable?
What are your experiences with REvolution Computing software?

I am interested in hearing from members of the community, REvolution Computing employees/supporters (although please ID yourself as such) and most anyone else. I can see what they say on their website, but I'm interested in getting other opinions too.

Thanks!
Re: [R] Problems in Recommending R
I agree with those who would like to see the R-Project's site redone. If/when it is redone, I think there should be more emphasis on providing links / access to useful materials for new users.

I find it interesting that this discussion has been very focused on the technologies that should be used, rather than on the content that should be provided. Although I think it is important to assess the relevant technologies that exist and choose a framework that will work well with R, I think there should also be some thought / discussion on the layout and content of any new site.

In that spirit, I would like to make a suggestion / request. Currently, the website has a page dedicated to manuals:

http://cran.r-project.org/manuals.html

This is a good page and the manuals are very helpful. However, there are a lot of good resources that are not (to the best of my knowledge) listed on the r-project's site. A few examples would include:

* Quick-R - http://www.statmethods.net/
* The R Inferno - www.burns-stat.com/pages/Tutor/R_inferno.pdf
* Rseek.org

There are others, but these are the three that I have found to be _most_ useful to me as a relatively new R user. I believe any redesigned site should really try to present more resources to new R users. Before I learned about Rseek (on this list), I wasted epic amounts of time trying to Google for R related information. Although it is possible to use Google to answer R related questions, it's not as easy as searching for how to do something in perl or python.

I think a new r-project site should include a page / wiki focused on informing new users about the myriad of resources that exist. This certainly won't eliminate all of the repetitive questions on the list, but I think it could help. I suggest a wiki format, because an open wiki would enable the R community to update the information and provide links to new resources as they become available, and let the web-team focus on improving and maintaining the site.

Others may disagree with me regarding an open wiki, but I want to keep my comment focused on the idea of helping new users find useful material, and not get sidetracked in a discussion about wikis or other technologies. There are others here far more knowledgeable about web-design than I am; I just know that more could be done to present information to new users.

That's my 10 cents.

--
This is the price and the promise of citizenship.
-- Barack Obama, 44th President of the United States
Re: [R] how to delete specific rows in a data frame where the first column matches any string from a list
Interesting. Thanks.

On Sat, 2009-02-07 at 02:36 +0100, Wacek Kusnierczyk wrote:
> Andrew Choens wrote:
> > I regularly deal with a similar pattern at work. People send me these big long .csv files and I have to run them through some pattern analysis to decide which rows I keep and which rows I kill off. As others have mentioned, Perl is a good candidate for this task.
> >
> > Another option would be a quick SQL query. It should be a snap to pull this into something like Access or OOo Base . . . . or better yet, a real database like Postgres, MySQL, etc. In case you aren't too familiar with SQL, this query could be done by deleting the rows using a self join (syntax varies by product).
> >
> > But, if the pattern is as simple as it sounds and / or this is a one-time job, using SQL is over-kill for the situation. I often use sed in places where Perl is over-kill, but I can't think of any way to match from row to row with sed. If anyone knows how to do this with sed, it would (probably) be easier than trying to learn how to use perl. And, I would like to know how to do this with sed too.
>
> (this is actually off-topic, but since it may be interesting for the general public, i keep the response cc: to r-help)
>
> yes, you can do this with sed. suppose you have two files, one (say, sample.txt) with the data to be filtered, record fields separated by, e.g., a tab character, and another (say, filter.txt) with patterns to be matched. a row from the first is passed to output only if its second field does not match any of the patterns -- this corresponds to (a simplified version of) the original problem. then, the following should do:
>
>     sed "$(sed 's/^/\/^[^\\t]\\+\\t/; s/$/\/d/' filter.txt)" sample.txt > filtered-sample.txt
>
> (unless the patterns contain characters that interfere with the shell or sed's syntax, in which case they'd have to be appropriately escaped.)
>
> vQ

--
This is the price and the promise of citizenship.
-- Barack Obama, 44th President of the United States
Re: [R] how to delete specific rows in a data frame where the first column matches any string from a list
I regularly deal with a similar pattern at work. People send me these big long .csv files and I have to run them through some pattern analysis to decide which rows I keep and which rows I kill off. As others have mentioned, Perl is a good candidate for this task.

Another option would be a quick SQL query. It should be a snap to pull this into something like Access or OOo Base . . . . or better yet, a real database like Postgres, MySQL, etc. In case you aren't too familiar with SQL, this query could be done by deleting the rows using a self join (syntax varies by product).

But, if the pattern is as simple as it sounds and / or this is a one-time job, using SQL is over-kill for the situation. I often use sed in places where Perl is over-kill, but I can't think of any way to match from row to row with sed. If anyone knows how to do this with sed, it would (probably) be easier than trying to learn how to use perl. And, I would like to know how to do this with sed too.

On Fri, 2009-02-06 at 16:04 -0500, Laura Rodriguez Murillo wrote:
> yep, it definitely sounds like a work for perl, but I don't know perl (unfortunately).
> I'm still stuck with this so I'm giving more details in case it helps:
> I have file A with 382 columns and 30 rows. There are rows where only the entry in first column is duplicated in other rows. In these cases, I need to delete the entire row.
> I also have a file B (one column and around 28 rows) with a list of the entries that are repeated. So I was trying to look for the ones that match and get rid of the entire row.
> Thank you!
> Laura
>
> 2009/2/6 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no:
> > Laura Rodriguez Murillo wrote:
> > > Thank you. I think grep would do it, but the list of expressions I need to match is too long so they are stored in a file.
> >
> > what does 'too long' mean?
> >
> > > So the question would be how I can tell R to look into that file to look for the expressions that I want to match.
> >
> > i guess you may still successfully use r for this, but to me it sounds like a perfect job for perl. let me know if you need more help.
> >
> > note, in the below, you'd use 'data[,2]' instead of 'd[,2]' (or 'd' instead of 'data'). sorry for the typo. mark, thanks for pointing this out -- the more obvious the mistake, the less visible ;)
> >
> > vQ
> >
> > > Thank you again for your help
> > > Laura
> > >
> > > 2009/2/6 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no:
> > > > Laura Rodriguez Murillo wrote:
> > > > > Hi,
> > > > > I'm new in the mailing list but I would appreciate if you could help me with this:
> > > > > I have a big matrix from where I need to delete specific rows. The second entry on these rows to delete should match any string within a list (other file with just one column).
> > > > > Thank you so much!
> > > >
> > > > here's one way to do it, illustrated with dummy data:
> > > >
> > > >     # dummy character matrix
> > > >     data = matrix(replicate(20, paste(sample(letters, 20), collapse="")), ncol=2)
> > > >
> > > >     # filter out rows where second column does not match 'a'
> > > >     data[-grep('a', d[,2]),]
> > > >
> > > > this will work also if your data is actually a data frame:
> > > >
> > > >     data = as.data.frame(data)
> > > >     data[-grep('a', d[,2]),]
> > > >
> > > > note, due to a known issue with grep, this won't work correctly if there are *no* rows that do *not* match the pattern:
> > > >
> > > >     data[-grep('1', d[,2]),]
> > > >     # should return all of data, but returns an empty matrix
> > > >
> > > > with the upcoming version of r, grep will have an additional argument which will make this problem easy to fix:
> > > >
> > > >     data[grep('a', d[,2], invert=TRUE),]
> > > >
> > > > vQ

--
This is the price and the promise of citizenship.
-- Barack Obama, 44th President of the United States
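
For readers who want to stay in R, the file-based filter asked about in this thread can be sketched with exact matching via %in%. This is only an illustration under stated assumptions: the toy data frame stands in for "file A", a temporary file stands in for "file B", and the names file_a, filter_file, drop_ids and kept are made up for the example. It matches whole strings in the first column, not regular expressions.

```r
# Toy stand-in for "file A": the first column holds the keys (hypothetical data).
file_a <- data.frame(id = c("g1", "g2", "g3", "g4"),
                     x  = 1:4,
                     stringsAsFactors = FALSE)

# Toy stand-in for "file B": one key per line, written to a temporary file
# so the sketch is self-contained (a real script would read the existing file).
filter_file <- tempfile(fileext = ".txt")
writeLines(c("g2", "g4"), filter_file)
drop_ids <- readLines(filter_file)

# Keep rows whose first column is NOT listed in the filter file.
# A logical index is used instead of -grep(), so "nothing matched"
# keeps every row instead of returning an empty result.
kept <- file_a[!(file_a[[1]] %in% drop_ids), , drop = FALSE]
print(kept)
```

Because the index is logical, the empty-match corner case mentioned above for -grep() does not arise here.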
[R] Time Series
If I want to make a numerical series, I can do so easily with:

    series.numbers <- 1:10

But, I don't seem to be able to do the same with time. I want to create a vector with 480 points that corresponds to the 480 minutes in an 8 hour work day. Thus I want series.time to look something like this:

    9:00 9:01 9:02 9:03 etc.

Last night I managed to build this by concatenating a series of strings, and converting them to datetime format with as.Date() or strptime(), but my method seems overly complex.

    hour <- 0:59
    day <- c(9, 10, 11, 12, 13, 14, 15, 16)
    hours <- c(rep(9, 60), rep(10, 60), rep(11, 60), rep(12, 60),
               rep(1, 60), rep(2, 60), rep(3, 60), rep(4, 60))
    one.day <- paste(hours, ":", hour, sep = "")
    strptime(one.day, "%H:%M")
    # OR
    # as.Date(one.day, "%H:%M")

Is there any way to do something similar to:

    strptime("09:00", "%H:%M") : strptime("11:00", "%H:%M")

When I try this, I get the following error:

    numerical expression has 9 elements: only the first used

Thanks.

--andy

--
Insert something humorous here. :-)
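
One way to get such a series without pasting strings is seq() on a date-time object, which accepts by = "min". The sketch below is not from the thread; the calendar date and the UTC time zone are arbitrary assumptions chosen only so the example is reproducible.

```r
# Arbitrary start date (an assumption); only the clock time matters here.
start <- as.POSIXct("2009-01-05 09:00", tz = "UTC")

# 480 one-minute steps: 09:00, 09:01, ..., 16:59
series.time <- seq(from = start, by = "min", length.out = 480)

# Show the first few values as clock times only
print(format(head(series.time, 4), "%H:%M"))
```

The result is a POSIXct vector, so it can be used directly for plotting or arithmetic, and format() gives the "9:00 9:01 ..." style labels when needed.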
Re: [R] The R Inferno
> now if only i could get tips to sort a 5 column * 1 million rows dataset in less than ..eternity

May I suggest mySQL, postgreSQL, etc.? If what you need to do is a basic sort, a database is going to be faster than R.

--
Insert something humorous here. :-)
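
Before moving the data to a database, it may be worth timing a plain order() call in R itself. The sketch below builds a made-up data frame of the size mentioned (5 columns, 1 million rows; the column names and contents are assumptions) and sorts it on one column. No timing is claimed here, since that depends on the machine.

```r
set.seed(1)
n <- 1e6

# Made-up data with the dimensions mentioned above
d <- data.frame(a = sample(n), b = runif(n), c = runif(n),
                d = runif(n), e = runif(n))

# order() returns the row permutation; indexing with it sorts the whole frame
elapsed <- system.time(sorted <- d[order(d$a), ])
print(elapsed)
```

To sort on several columns, pass them all to order(), for example order(d$a, d$b).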
Re: [R] R in the NY Times
> Unfortunately, that type of FUD issued by the SAS marketing person still works. I see it at my employer (a large healthcare company.) It's a battle to change a culture, but ironically the recession helps. People are now taking notice of the obscene licensing fees for SAS.
>
> Darin

I agree. I work for a consulting firm (human services) and my boss prefers us to use SPSS, rather than R. It's painful. I have version 11 installed on my Windows laptop. Next year, the license expires!

For someone coming from an SPSS background, R is a little mind-blowing, simply because it is so much more powerful. But, perseverance pays off. Once I master Sweave and such, I'll be able to churn out reports much more quickly than I ever could with SPSS.

I do wish the author of the article had included comments from SPSS, in addition to the humorous FUD from the SAS spokesperson. Newer versions of SPSS actually have the option of using R for data analysis, in addition to the SPSS engine. It would have been interesting to compare the corporate responses of the two companies.

--
Insert something humorous here. :-)
[R] Trouble installing plyr
I want to learn how to use the reshape package. The reshape package is not included in the Ubuntu repositories, so I attempted to install reshape with:

    install.packages("reshape")

This is what I got for output:

    Warning in install.packages("reshape") :
      argument 'lib' is missing: using '/home/andy/R/i486-pc-linux-gnu-library/2.7'
    --- Please select a CRAN mirror for use in this session ---
    Loading Tcl/Tk interface ... done
    trying URL 'http://cran.cnr.Berkeley.edu/src/contrib/reshape_0.8.2.tar.gz'
    Content type 'application/x-gzip' length 38137 bytes (37 Kb)
    opened URL
    ==
    downloaded 37 Kb

    * Installing *source* package 'reshape' ...
    ** R
    ** data
    ** moving datasets to lazyload DB
    ** inst
    ** preparing package for lazy loading
    Loading required package: plyr
    Warning in library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
      there is no package called 'plyr'
    Error: package 'plyr' could not be loaded
    Execution halted
    ERROR: lazy loading failed for package 'reshape'
    ** Removing '/home/andy/R/i486-pc-linux-gnu-library/2.7/reshape'

    The downloaded packages are in
      /tmp/RtmpMZhsTp/downloaded_packages
    Warning messages:
    1: In install.packages("reshape") : dependency ‘plyr’ is not available
    2: In install.packages("reshape") : installation of package 'reshape' had non-zero exit status

I tried this using several different mirrors, hoping that the problem was isolated to the PA mirrors. But, no matter which mirror I use, I get the same error. My central problem appears to be an inability to install plyr.

    install.packages("plyr")
    Warning in install.packages("plyr") :
      argument 'lib' is missing: using '/home/andy/R/i486-pc-linux-gnu-library/2.7'
    Warning message:
    In install.packages("plyr") : package ‘plyr’ is not available

Again, the mirror does not matter. I cannot install plyr. I am pretty confident that my syntax is correct, but I do not understand why I cannot install plyr. I don't think the problem is related to my connection, since this computer is able to surf the Internet and send email (such as this one).

When I googled for problems installing plyr, I got a zillion instructions on how to install the player. Ideas are appreciated.

--
Insert something humorous here. :-)
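
One possible cause, which is an assumption and not something confirmed in this message, is that the package requires a newer R than the 2.7 series shown in the library path. install.packages() only offers packages whose declared R version requirement is met, so a too-old R can surface as "package is not available" on every mirror. The sketch below shows the kind of version comparison involved; the helper name meets_requirement and the "2.8.0" requirement are hypothetical.

```r
# Hypothetical helper: does an installed R version satisfy a package's
# minimum R requirement (as declared in its Depends field)?
meets_requirement <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# "2.8.0" is a made-up requirement used only for illustration.
print(meets_requirement("2.7.2", "2.8.0"))

# The same check against the R that is actually running:
print(meets_requirement(as.character(getRversion()), "2.8.0"))
```

If the requirement is the problem, the usual remedies are upgrading R or installing an older archived version of the package that still supports the installed R.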
[R] How to make a banner table.
I have a dataframe with the following variables:

    idnum  area  gender  race  etc.

I would like to make a table that looks like

    area   gender    race
           M   F     B   W   A
    1      4   5     3   5   1
    2      6   7     4   6   3
    etc.

Basically, I want to make a single broad table with a number of sub-set tables. I have tried:

    cbind(table(area, gender), table(area, race))

But, when I do this, I lose the labels gender / race. This makes it a lot harder to understand my factor labels. When I use cbind, I get this:

       M  F  B  W  A
    1  4  5  3  5  1
    2  6  7  4  6  3

Although it is technically correct, I really want to keep my factor labels. I also tried this with xtabs and get the same results. Any ideas?

I saw a relatively recent thread asking a similar question, but the proposed solution did not work for me, so I thought I would ask the question again. If I am missing something terribly obvious, I apologize.

thanks.

--
Insert something humorous here. :-)
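
One simple workaround, offered here as a sketch and not as the solution from the earlier thread, is to prefix each sub-table's column names with its variable name before calling cbind(). The helper name banner and the toy data are made up for illustration.

```r
# Hypothetical helper: cross-tabulate `row` against each named variable
# and bind the pieces side by side, keeping the variable name in each header.
banner <- function(row, ...) {
  cols <- list(...)
  parts <- lapply(names(cols), function(nm) {
    tab <- table(row, cols[[nm]])
    colnames(tab) <- paste(nm, colnames(tab), sep = ":")
    tab
  })
  do.call(cbind, parts)
}

# Toy data (made up)
survey <- data.frame(area   = c(1, 1, 1, 2, 2),
                     gender = c("M", "F", "F", "M", "M"),
                     race   = c("B", "W", "W", "A", "B"))

result <- banner(survey$area, gender = survey$gender, race = survey$race)
print(result)
```

The result is an ordinary matrix with headers such as gender:M and race:B, so the grouping survives even though the two-level header layout does not.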
[R] Chi-Square Test Disagreement
I was asked by my boss to do an analysis on a large data set, and I am trying to convince him to let me use R rather than SPSS. I think Sweave could make my life much much easier. To get me a little closer to this goal, I ran my analysis through R and SPSS and compared the resulting values. In all but one case, they were the same.

Given the matrix

         [,1] [,2]
    [1,]  110  358
    [2,]   71  312
    [3,]   29  139
    [4,]   31   77
    [5,]   13   32

This is the output from R:

    > chisq.test(test29)

            Pearson's Chi-squared test

    data:  test29
    X-squared = 9.593, df = 4, p-value = 0.04787

But, the same data in SPSS generates a p value of .051. It's a small but important difference. I played around and rescaled things, and tried different values for B, but I never could get R to reach .051.

I'd like to know which program is correct - R or SPSS? I know, this is a biased place to ask such a question. I also appreciate all input that will help me use R more effectively. The difference could be the result of my own ignorance.

thanks
--andy

--
Insert something humorous here. :-)
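
A common reason for this kind of mismatch is that the two programs are reporting different statistics. As an assumption (this message does not say which figure was read from the SPSS output), the sketch below computes both the Pearson statistic and the likelihood-ratio (G) statistic for the same matrix, so the two p-values can be compared side by side. Variable names other than test29 are made up.

```r
# The 5 x 2 table from the message, entered column by column
test29 <- matrix(c(110, 71, 29, 31, 13,
                   358, 312, 139, 77, 32), ncol = 2)

# Pearson chi-squared test (no continuity correction applies beyond 2 x 2)
pearson <- chisq.test(test29)

# Likelihood-ratio statistic: G = 2 * sum(O * log(O / E)),
# using the expected counts from the Pearson fit
expected <- pearson$expected
g_stat   <- 2 * sum(test29 * log(test29 / expected))
df       <- unname(pearson$parameter)
g_p      <- pchisq(g_stat, df = df, lower.tail = FALSE)

print(c(pearson_p = pearson$p.value, likelihood_ratio_p = g_p))
```

Both statistics are referred to the same chi-squared distribution with 4 degrees of freedom, but they are not the same number, so small differences in the p-value are expected.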
Re: [R] Chi-Square Test Disagreement
On Thu, 2008-11-27 at 00:46 +0800, Berwin A Turlach wrote:
> Chuck explained already the reason for this small difference. I just take issue about it being an important difference. In my opinion, this difference is not important at all. It would only be important to people who are still sticking to arbitrary cut-off points that are mainly due to historical coincidences and the lack of computing power at those times in history.
>
> If somebody tells you that this difference is important, ask him or her whether he or she will be willing to finance you a room full of calculators (in the sense of Pearson's time) and whether he or she wants you to do all your calculations and analyses with these calculators in future. Alternatively, you could ask the person whether he or she would like the anaesthetist during his or her next operation to use chloroform given his or her nostalgic penchant for out-dated rituals/methods.

Yes he did and when I realized the source of my confusion I was appropriately chastised. I felt like a bit of a fool. Of course, I should try comparing apples to apples. Oranges are another thing entirely.

As to the importance of the difference, I am of two minds. On the one hand I fully agree with you. It is an anachronistic approach. On the other hand we don't all have the pleasure of working in a math department where such subtleties are well understood.

I work for a consulting firm that advises state and local governments (USA). I personally do try to expand my understanding of statistics and math (I do not have a degree in math), but my clients do not. When I'm working with someone from the government, it is sometimes easier to simply tell them that relationship x is significant at a certain level of certainty. Although I doubt they could really explain the details, they have some basic understanding of what I am talking about. Subtleties are sometimes lost on our public servants.

And, since I do work for government, if I ask for a roomful of calculators, I might just get them. And really, what am I going to do with a roomful of calculators?

--andy

--
Insert something humorous here. :-)
Re: [R] weighted ftable
I didn't know that. That is exactly what I need.

On Tue, 2008-11-25 at 13:00 -0800, Thomas Lumley wrote:
> On Mon, 24 Nov 2008, Andrew Choens wrote:
> > I need to do some fairly deep tables, and ftable() offers most of what I need, except for the weighting. With smaller samples, I've just used replicate to let me have a weighted data set, but with this data set, I'm afraid replicate is going to make my data set too big for R to handle comfortably.
> >
> > That being said, is there some way to weight my table (similar to wtd.table) but offer the nuanced control and depth of ftable?
>
> xtabs() will take a weight as the left-hand side of the formula, and its output can then be processed by ftable().
>
> -thomas
>
> Thomas Lumley
> Assoc. Professor, Biostatistics
> [EMAIL PROTECTED]
> University of Washington, Seattle

--
Insert something humorous here. :-)
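
The xtabs()-then-ftable() suggestion can be sketched as follows. The data frame, its column names and the weights are made up for illustration; only the pattern (weight on the left-hand side of the formula, ftable() applied to the result) comes from the advice above.

```r
# Made-up survey data with one weight per respondent
survey <- data.frame(
  area   = c("north", "north", "south", "south", "south"),
  gender = c("M", "F", "M", "F", "F"),
  race   = c("A", "B", "A", "A", "B"),
  wt     = c(1.5, 2, 0.5, 1, 3)
)

# The weight goes on the left-hand side: cells hold summed weights, not counts
weighted <- xtabs(wt ~ area + gender + race, data = survey)

# Flatten the three-way table, with area as the row variable
flat <- ftable(weighted, row.vars = "area")
print(flat)
```

Because the weights are summed inside xtabs(), there is no need to replicate rows, so the data set stays at its original size.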
Re: [R] Chi-Square Test Disagreement
> Next time the launch of an incoming nuclear strike is detected, set them to work as follows (following Karl Pearson's historical precedent):
>
>   Anti-aircraft guns all day long: Computing for the Ministry of Munitions
>   JUNE BARROW GREEN (Open University)
>
>   From January 1917 until March 1918 Pearson and his staff of mathematicians and human computers at the Drapers Biometric Laboratory worked tirelessly on the computing of ballistic charts, high-angle range tables and fuze-scales for AV Hill of the Anti-Aircraft Experimental Section. Things did not always go smoothly -- Pearson did not take kindly to the calculations of his staff being questioned -- and Hill sometimes had to work hard to keep the peace.
>
> If you have enough of them (and Pearson undoubtedly did, so you can quote that in your requisition request), then you might just get the answer in time!
>
> [ The above excerpted from http://tinyurl.com/6byoub ]
>
> Good luck!
> Ted.

That is absolutely classic.

--
Insert something humorous here. :-)
Re: [R] R and SPSS
On Wed, 2008-11-26 at 12:25 -0800, Applejus wrote:
> Hi,
>
> I have a code in R. Could anyone give me the best possible way (or just ways!) to integrate it in SPSS?
>
> Thanks!

You will need an SPSS registration, but go here and get the SPSS R plugin.

http://www.spss.com/devcentral/

It lets you access R from within SPSS. Best of both worlds.

--
Insert something humorous here. :-)
[R] weighted ftable
I need to do some fairly deep tables, and ftable() offers most of what I need, except for the weighting. With smaller samples, I've just used replicate to let me have a weighted data set, but with this data set, I'm afraid replicate is going to make my data set too big for R to handle comfortably.

That being said, is there some way to weight my table (similar to wtd.table) but offer the nuanced control and depth of ftable?

thanks.

--
Insert something humorous here. :-)