Re: [R] nested if/else very slow, more efficient ways?
One way that might do what you want is to change the character column to a factor, and then apply as.numeric:

    resultsfuzzy$x <- as.numeric(factor(resultsfuzzy$x, levels=c("5a","5b","5c","5d","5e")))

This assumes, of course, that you know that the levels are going to be in the set {5a,5b,5c,5d,5e}. However, it may be better to just leave it as a factor, depending upon what you intend to do with it later.

Hope this helps.

Regards,
Mike

On 10/23/06, Kim Milferstedt [EMAIL PROTECTED] wrote:

Hello,

in the data.frame resultsfuzzy I would like to replace the characters in the second column (5a, 5b, ... 5e) with numbers from 1 to 5. The data.frame has 39150 entries. It seems to work on samples that are < nrow(resultsfuzzy), but it takes suspiciously long.

Do you have any suggestions how to make the character replacing more efficient?

Code:

    for (i in 1:nrow(resultsfuzzy))
    {
      if (resultsfuzzy[i,2] == "5a") {resultsfuzzy[i,2] <- 1} else
      if (resultsfuzzy[i,2] == "5b") {resultsfuzzy[i,2] <- 2} else
      if (resultsfuzzy[i,2] == "5c") {resultsfuzzy[i,2] <- 3} else
      if (resultsfuzzy[i,2] == "5d") {resultsfuzzy[i,2] <- 4} else
      resultsfuzzy[i,2] <- 5
    }

Thanks,

Kim

    version
    platform i386-pc-mingw32
    arch     i386
    os       mingw32
    system   i386, mingw32
    status
    major    2
    minor    2.1
    year     2005
    month    12
    day      20
    svn rev  36812
    language R

__
Kim Milferstedt
University of Illinois at Urbana-Champaign
Department of Civil and Environmental Engineering
4125 Newmark Civil Engineering Laboratory
205 North Mathews Avenue MC-250
Urbana, IL 61801 USA
phone: (001) 217 333-9663
fax: (001) 217 333-6968
email: [EMAIL PROTECTED]
http://cee.uiuc.edu/research/morgenroth

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
-- Regards, Mike Nielsen
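For later readers of the archive, the vectorised replacement discussed in this thread might look roughly like the sketch below. The small data frame and the column name x_num are invented for illustration; match() is shown next to the factor route because it gives the same codes and returns NA for any value outside the expected set.

```r
# Illustrative data only; the real resultsfuzzy has 39150 rows.
resultsfuzzy <- data.frame(id = 1:6,
                           x = c("5a", "5c", "5e", "5b", "5d", "5a"),
                           stringsAsFactors = FALSE)
codes <- c("5a", "5b", "5c", "5d", "5e")

# match() returns the position of each value in 'codes' (NA if absent),
# so the whole column is recoded in one call instead of a row-by-row loop.
resultsfuzzy$x_num <- match(resultsfuzzy$x, codes)

# The factor route from the reply gives the same numbers.
via_factor <- as.numeric(factor(resultsfuzzy$x, levels = codes))
```

Unlike the final else branch of the loop, neither version silently maps unexpected values to 5, which may or may not be what is wanted.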
Re: [R] discarding 'levels'
From TFM of read.table:

as.is: the default behavior of 'read.table' is to convert character variables (which are not converted to logical, numeric or complex) to factors. The variable 'as.is' controls the conversion of columns not otherwise specified by 'colClasses'. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors.

You may have some blanks in the third column. Factor levels whose character representation happens to be a numeral don't necessarily compare equal to the integer with the same character representation (if you get my drift...).

You can use as.numeric, but better would be to use colClasses in read.table.

Regards,
Mike

On 10/4/06, hoopz [EMAIL PROTECTED] wrote:

Ok, so I am using read.table to read a .txt file and put it into a matrix. There are some values that are 'NA'. If I use read.table with as.is=FALSE, then some of the entries in the matrix return this:

    data[1,3]
    [1] 0
    Levels: 0 1 NA

and if I do data[1,3]==0 it returns FALSE. It's a zero, it's not false!

If I set as.is=TRUE, I don't get the levels problem, but in those entries where I did get the levels problem, this happens:

    data[1,3]
    [1] "0"

This time, it keeps it as a string. I can use as.numeric to fix it now, but I'm just curious as to why this happens.

Thanks

-- View this message in context: http://www.nabble.com/discarding-%27levels%27-tf2384152.html#a6645474
-- Regards, Mike Nielsen
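The colClasses suggestion in this thread can be illustrated with a small sketch. The inline text and the column names a, b, c are made up, and the text= argument is used only so that the example needs no file on disk.

```r
# Invented three-column input with a missing value in the third column.
txt <- "a b c\n1 2 0\n3 4 NA\n5 6 1\n"

# Third column read as a factor: the values are level labels, not numbers.
d_factor <- read.table(text = txt, header = TRUE,
                       colClasses = c("numeric", "numeric", "factor"))
codes_not_values <- as.numeric(d_factor$c)               # internal codes 1, NA, 2
real_values      <- as.numeric(as.character(d_factor$c)) # 0, NA, 1

# Declaring the column numeric up front avoids the conversion altogether.
d_num <- read.table(text = txt, header = TRUE, colClasses = "numeric")
is_zero <- d_num$c == 0                                   # TRUE, NA, FALSE
```

If the real file has stray blanks around the values, the strip.white argument of read.table may also be worth a look; that is a guess about the poster's data rather than something the thread confirms.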
Re: [R] creation of new variables
You may not have told us quite enough to be able to help you. It may be worth your while investing some time in describing the problem you are trying to solve a little bit more comprehensively.

The posting guide http://www.R-project.org/posting-guide.html can be useful in helping you frame a question that stands a better chance of receiving help.

Regards,
Mike

On 9/26/06, nalluri pratap [EMAIL PROTECTED] wrote:

Hello All,

I have 8 variables named a b c d e f g h. I need to create four variables from these 8 variables in R. The new variables are ab, cd, ef, gh.

Can anyone please help me?

thanks,
Pratap

-- Regards, Mike Nielsen
Re: [R] Passing R connection as argument to a shell command on Windows
No, the cut command won't understand that z is an R connection and not a file in the current working directory: there is no overlap between the R object name space and the Windows object name space.

Unfortunately, you may be forced to unzip to a temporary file, and then read from that.

One thing that you might want to try, if you're using cygwin, is to create a named pipe, and use shell() with wait=FALSE to unzip and pipe into cut and then output to the named pipe. Open an R connection for reading from the named pipe. This leaves open the question of how to deal with failures, and whether you can invoke a command pipeline from R under Windows...

I haven't tried this, so if you manage to make it work, it may be something that's of interest to the list in general.

Regards,
Mike

On 9/25/06, Anupam Tyagi [EMAIL PROTECTED] wrote:

Hello,

is there a way to pass a connection to a file in a zipped archive as argument (instead of a file name of unzipped file) to shell command cut? In general, is it possible to pipe output of an R function to a shell command? How?

I want to do something like:

    z = unz("zipArchive.zip", "fileASCII.ASC")  # open connection
    open(z)
    # cut lines of the ASCII file in zipped archive at specific positions
    # and send results to another file.
    shell("cut -c2-3,5-8 z > test2.dat")

Anupam.

-- Regards, Mike Nielsen
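A different route from the ones suggested in this thread, offered only as a sketch, is to skip the shell altogether and do the column cutting inside R on whatever readLines() returns from the connection. The helper name cut_columns is hypothetical, and a textConnection stands in for the unz() connection so the example runs without a zip archive.

```r
# Hypothetical helper: keep the given character ranges of every line,
# roughly what 'cut -c2-3,5-8' does.
cut_columns <- function(con, ranges) {
  lines <- readLines(con)
  pieces <- lapply(ranges, function(r) substring(lines, r[1], r[2]))
  do.call(paste0, pieces)
}

# Stand-in for z <- unz("zipArchive.zip", "fileASCII.ASC")
demo <- textConnection(c("abcdefghij", "0123456789"))
out <- cut_columns(demo, list(c(2, 3), c(5, 8)))
close(demo)

# With the real archive one would presumably finish with something like
# writeLines(out, "test2.dat")
```

For very large files reading everything at once may be too memory hungry, in which case readLines() could be called in chunks with its n argument.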
Re: [R] Beginner Loop Question with dynamic variable names
Is this what you had in mind?

    j <- data.frame(q1=rnorm(10),q2=rnorm(10))
    j
               q1         q2
    1  -0.9189618 -0.2832102
    2   0.9394316  1.1345975
    3  -0.6388848  0.6850255
    4   0.4938245 -0.5825715
    5  -1.2885257 -0.2654023
    6  -0.5278295  0.2382791
    7   0.6517268  0.8923375
    8   0.4124178  1.1231630
    9  -0.1604982  0.2285672
    10 -0.2369713  0.6130197

    for(i in 1:3){j[,paste(sep="","res",i)] <- with(j,q1+q2)}
    j
               q1         q2        res1        res2        res3
    1  -0.9189618 -0.2832102 -1.20217207 -1.20217207 -1.20217207
    2   0.9394316  1.1345975  2.07402913  2.07402913  2.07402913
    3  -0.6388848  0.6850255  0.04614073  0.04614073  0.04614073
    4   0.4938245 -0.5825715 -0.08874699 -0.08874699 -0.08874699
    5  -1.2885257 -0.2654023 -1.55392802 -1.55392802 -1.55392802
    6  -0.5278295  0.2382791 -0.28955044 -0.28955044 -0.28955044
    7   0.6517268  0.8923375  1.54406433  1.54406433  1.54406433
    8   0.4124178  1.1231630  1.53558084  1.53558084  1.53558084
    9  -0.1604982  0.2285672  0.06806901  0.06806901  0.06806901
    10 -0.2369713  0.6130197  0.37604847  0.37604847  0.37604847

Regards,
Mike

On 9/25/06, Peter Wolkerstorfer - CURE [EMAIL PROTECTED] wrote:

Dear all,

I have another small scripting-beginner problem which you hopefully can help with. I compute new variables with:

    # Question 1
    results$q1 <- with(results, q1_1*1 + q1_2*2 + q1_3*3 + q1_4*4 + q1_5*5)
    # Question 2
    results$q2 <- with(results, q2_1*1 + q2_2*2 + q2_3*3 + q2_4*4 + q2_5*5)
    # Question 3
    results$q3 <- with(results, q3_1*1 + q3_2*2 + q3_3*3 + q3_4*4 + q3_5*5)
    # Question 4
    results$q4 <- with(results, q4_1*1 + q4_2*2 + q4_3*3 + q4_4*4 + q4_5*5)

This is very inefficient, so I would like to do this in a loop like:

    for (i in 1:20) {results$q1 <- with(results, q1_1*1 + q1_2*2 + q1_3*3 + q1_4*4 + q1_5*5)}

My question now: how do I replace the 1s (results$q1, q1_1...) in the variable names with the looping variable?

Here is how I would like it (just for illustration; of course I still miss the function to tell R that it should append the value of i to the variable name):

    # i is the number of questions - just an illustration, I know it does not work this way
    for (i in 1:20) {results$qi <- with(results, qi_1*1 + qi_2*2 + qi_3*3 + qi_4*4 + qi_5*5)}

Help would be greatly appreciated. Thanks in advance.

Peter

___CURE - Center for Usability Research Engineering___
Peter Wolkerstorfer
Usability Engineer
Hauffgasse 3-5, 1110 Wien, Austria
[Tel] +43.1.743 54 51.46
[Fax] +43.1.743 54 51.30
[Mail] [EMAIL PROTECTED]
[Web] http://www.cure.at

-- Regards, Mike Nielsen
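Applied to the naming scheme in the question (q1_1 ... q1_5 and so on), the paste() idea from the reply might be sketched as follows. The answers list and the two-question data frame are invented so that the example is self-contained; the real data has 20 questions.

```r
# Invented answers of four respondents to two questions on a 1..5 scale,
# expanded into indicator columns q1_1 ... q1_5, q2_1 ... q2_5.
answers <- list(q1 = c(3, 1, 5, 2), q2 = c(4, 4, 2, 1))
results <- data.frame(id = 1:4)
for (i in 1:2) {
  for (k in 1:5) {
    results[[paste("q", i, "_", k, sep = "")]] <- as.numeric(answers[[i]] == k)
  }
}

# Build the column names from the loop variable and index with [[ ]],
# since results$qi would look for a column literally called "qi".
for (i in 1:2) {
  cols <- paste("q", i, "_", 1:5, sep = "")
  # weighted sum q<i>_1*1 + ... + q<i>_5*5 as a matrix product
  results[[paste("q", i, sep = "")]] <- as.vector(as.matrix(results[cols]) %*% 1:5)
}
```

The matrix product is just a compact way of writing the weighted sum; spelling out the five terms inside the loop would work equally well.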
Re: [R] help with function
Take the case of i==1.

    Ct[i] <- 1/Bq*Bt[i]*Cerr                     # Assign Ct[1] using Bt[1]
    Rt[i] <- Bt[i]/(a+b*Bt[i])                   # Assign Rt[1] using Bt[1]
    Bt[i+2] <- (1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i+1]  # Assign Bt[3] using Bt[2] and Rt[1] and **Ct[2]**

You're reading Ct[i+1] before you ever assign it, hence NA.

OSISTM

Hope this helps,

Regards,
Mike

On 9/20/06, Guenther, Cameron [EMAIL PROTECTED] wrote:

Hello everyone,

I have a function here that I wrote but it doesn't seem to work quite right. Attached is the code. In the calib function, under the for loop,

    Bt[i+2] <- (1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i+1]

returns NA's for everything after years 1983 and 1984. However, the code works when it reads

    Bt[i+2] <- (1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i]

I don't quite understand why, since it should be calculating all of the necessary inputs prior to calculating Bt[i+2]. Any help would be greatly appreciated.

Thanks

    #Model parameters
    B0 <- 7500
    m <- 0.3
    R0 <- B0*m
    z <- 0.8
    a <- B0/R0*(1-(z-0.2)/(0.8*z))
    b <- (z-0.2)/(0.8*z*R0)
    dat <- data.frame(years=seq(1983,2004),
                      cobs=c(19032324,19032324,17531618,20533029,20298099,20793744,
                             23519369,23131780,19922247,17274513,17034419,12448318,
                             4551585,4226451,7183688,7407924,7538366,7336039,
                             8869193,7902341,6369089,6211886))
    stdr <- runif(100,0,0.5)
    stdc <- runif(100,0,0.5)
    BC <- runif(1000,0,100)

    #model calibration
    calib <- function(x){
      v <- sample(stdr,1)
      cr <- sample(stdc,1)
      N <- rnorm(1)
      Bq <- sample(BC,1)
      Rerr <- exp(N*v-(v^2/2))
      Cerr <- exp(N*cr-(cr^2/2))
      Bt <- vector(); Bt[1]=B0; Bt[2]=B0
      Rt <- vector()
      Ct <- vector()
      for (i in 1:length(x$years)){
        Ct[i] <- 1/Bq*Bt[i]*Cerr
        Rt[i] <- Bt[i]/(a+b*Bt[i])
        Bt[i+2] <- (1-m)*Bt[i+1]+Rt[i]*Rerr-Ct[i+1]
      }
      out <- new.env()
      out$yr <- x$years[1:length(x$years)]
      out$Bt <- Bt[1:length(x$years)]
      out$Rt <- Rt[1:length(x$years)]
      out$Ct <- Ct[1:length(x$years)]
      out$stdr <- v
      out$stdc <- cr
      out$Bq <- Bq
      out$Rerr <- Rerr
      out$Cerr <- Cerr
      return(as.list(out))
    }
    test <- calib(dat)

Cameron Guenther, Ph.D.
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]

-- Regards, Mike Nielsen
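The point about reading Ct[i+1] before it is assigned can be reproduced with a stripped-down recursion. The function name toy and the constants 0.1 and 0.7 are invented and have nothing to do with the fisheries model; only the indexing pattern is the same.

```r
# Toy version of the loop: 'lag' selects whether Ct[i] or Ct[i + 1] is read.
toy <- function(lag) {
  Bt <- c(10, 10)
  Ct <- vector()
  for (i in 1:3) {
    Ct[i] <- 0.1 * Bt[i]
    # Indexing past the end of a vector gives NA, so Ct[i + 1] is NA here
    # and the NA then propagates through every later Bt.
    Bt[i + 2] <- 0.7 * Bt[i + 1] - Ct[i + lag]
  }
  Bt
}

bad  <- toy(lag = 1)   # reads Ct[i + 1], not yet assigned
good <- toy(lag = 0)   # reads Ct[i], assigned on the line above
```

Whether Ct[i] is the right term for the model is a question for the poster; the sketch only shows why the i+1 version cannot produce numbers.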
Re: [R] Statitics Textbook - any recommendation?
Excellent characterization. MASS is a very good book, but I'm not sure I would describe it as a statistics textbook, much less one of the basic variety.

While I certainly wouldn't presume to speak for Prof. Ripley and Dr. Venables, it seems unlikely their intent in writing MASS was to teach statistics, but rather, as the name of the book might suggest, to explain how S+ (and R) can be applied to modern statistical techniques.

My experience with this book is that it assumes considerable background knowledge. By all means, buy MASS, but if you need guidance on the how and why of statistical techniques, you may wish to shop Amazon to find a supplement.

Regards,
Mike

On 9/20/06, Berton Gunter [EMAIL PROTECTED] wrote:

Notwithstanding Prof. Heiberger's admirable enthusiasm, I think the canonical answer is probably MASS (Modern Applied Statistics with S) by Venables and Ripley. It is very comprehensive, but depending on your background, you may find it too telegraphic.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

The business of the statistician is to catalyze the scientific learning process. - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Iuri Gavronski
Sent: Wednesday, September 20, 2006 1:22 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Statitics Textbook - any recommendation?

I would like to buy a basic statistics book (experimental design, sampling, ANOVA, regression, etc.) with examples in R. Or download it in PDF or html format. I went to the CRAN contributed documentation, but there were only R textbooks, that is, textbooks where R is the focus, not the statistics. And I would like to find the opposite.

The other text I am trying to find is on multivariate data analysis (EFA, cluster, mult regression, MANOVA, etc.) with examples in R.

Any recommendation?

Thank you in advance,
Iuri.
-- Regards, Mike Nielsen
Re: [R] (no subject)
On 18 Sep 2006 19:53:59 +0200, Peter Dalgaard [EMAIL PROTECTED] wrote:

Sarosh Jamal [EMAIL PROTECTED] writes:

Hi there,

I was updating the R-cmdr add-on (v.1.1-6 to the latest v.1.2) for R (v.2.2.0) in a SunOS9 environment and came across some warnings during my installation - it seems to download the dependencies but runs into the following during install:

    * Installing *source* package 'acepack' ...
    ** libs
    /opt/sfw/R/R-2.2.0/bin/SHLIB: make: not found
    ERROR: compilation failed for package 'acepack'
    /opt/sfw/R/R-2.2.0/bin/INSTALL: test: argument expected
    ERROR: failed to lock directory '/opt/sfw/R/R-2.2.0/library' for modifying
    Try removing '/opt/sfw/R/R-2.2.0/library/00LOCK'

I don't see why I would have to remove the 00LOCK file since it seems to have been created by the very session of R I use to run install.packages(). I'm attaching the complete log. Any insight or feedback will be much appreciated.

Notice the _first_ issue reported: "make: not found". Without a functioning make command, you're not likely to get anything to work. Presumably, since you have a functioning R, make is there somewhere, but you need to adjust your PATH. The rest could well just be consequences.

And please do use a meaningful subject line.

If you're the type who likes to look through the list archives to try to solve problems, you'll find that good subject lines are most helpful.

Thank you,

Sarosh Jamal
Geo Computing IT Specialist, Department of Geography
University of Toronto at Mississauga
e: [EMAIL PROTECTED]

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
([EMAIL PROTECTED])

-- Regards, Mike Nielsen
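One way to check the PATH point from inside the R session before retrying install.packages() is sketched below. The helper name prepend_to_path is hypothetical, and the directory /usr/ccs/bin is only an assumption about where make might live on a Solaris system; the actual location on the poster's machine is not known from the thread.

```r
# Hypothetical helper: put a directory at the front of PATH for this session.
prepend_to_path <- function(dir) {
  new_path <- paste(dir, Sys.getenv("PATH"), sep = .Platform$path.sep)
  Sys.setenv(PATH = new_path)
  invisible(new_path)
}

# Sys.which() returns "" when the program cannot be found on PATH.
make_found <- nzchar(Sys.which("make"))

# Assumed location of make; replace with wherever it really is.
make_dir <- "/usr/ccs/bin"
if (!make_found) {
  prepend_to_path(make_dir)
}
```

The change only lasts for the current R session; a permanent fix would go into the shell profile or Renviron file of the account that runs R.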
Re: [R] opening files in directory
R won't do variable interpolation inside quotation marks as perl does. You could try amending your code with, for example:

    file.name <- paste(sep="/","data_files",files[[i]])
    x <- read.table(file.name)

Regards,
Mike

On 9/4/06, Ffenics [EMAIL PROTECTED] wrote:

Hi there

I want to be able to take all the files in a given directory, read them in one at a time, calculate a distance matrix for them (the files are data matrices) and then print them out to separate files. This is the code I thought I would be able to use (all files are in directory data_files):

    for(i in 1:length(files))
    + {
    + x <- read.table("data_files/files[[i]]")
    + dist <- dist(x, method="euclidean", diag=TRUE)
    + mat <- as.matrix(dist)
    + write.table(mat, file="files[[i]]")
    + }

But I get this error when I try to open the first file using read.table:

    Error in file(file, "r") : unable to open connection
    In addition: Warning message:
    cannot open file 'data_files/files[[i]]'

If I try the read.table command without the quotation marks, like so:

    x <- read.table(data_matrix_files/files[[i]])

I get the error:

    Error in read.table(data_matrix_files/files[[i]]) : Object "data_matrix_files" not found

But if I go to the directory where the files are kept before starting up R, the read.table command without the quotation marks works. I don't want to start up R in the same directory where the files I will be using reside, though, so how do I rectify this?

Any help much appreciated

-- Regards, Mike Nielsen
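Putting the pieces of this thread together, a complete loop might look like the sketch below. The two small input files and the directories under tempdir() are created only so the example runs anywhere; dist_files is an invented name for an output directory, used so that the inputs are not overwritten.

```r
# Set up two invented data matrices in a scratch "data_files" directory.
in_dir  <- file.path(tempdir(), "data_files")
out_dir <- file.path(tempdir(), "dist_files")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.table(matrix(1:6, nrow = 3), file.path(in_dir, "a.txt"))
write.table(matrix(c(0, 3, 0, 4), nrow = 2), file.path(in_dir, "b.txt"))

# list.files() gives the bare file names; file.path() glues the directory
# and the name together, which is what the quoted string could not do.
files <- list.files(in_dir)
for (f in files) {
  x <- read.table(file.path(in_dir, f))
  mat <- as.matrix(dist(x, method = "euclidean"))
  write.table(mat, file = file.path(out_dir, f))
}
```

list.files(in_dir, full.names = TRUE) would return the complete paths directly, at the cost of needing basename() when building the output names.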
Re: [R] Can R compute the expected value of a random variable?
Yes.

On 8/26/06, Paul Smith [EMAIL PROTECTED] wrote:

Dear All

Can R compute the expected value of a random variable?

Thanks in advance,

Paul

-- Regards, Mike Nielsen
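Since the question did not say which random variable was meant, the following is only a sketch of three common routes, with distributions picked arbitrarily for illustration: a finite sum for a discrete variable, numerical integration for a continuous one, and simulation as a fallback.

```r
# Discrete case: E[X] = sum of x * P(X = x), here Binomial(10, 0.3), mean n*p.
ev_binom <- sum((0:10) * dbinom(0:10, size = 10, prob = 0.3))

# Continuous case: E[X] = integral of x * f(x), here Exponential(rate 2), mean 1/2.
ev_exp <- integrate(function(x) x * dexp(x, rate = 2), lower = 0, upper = Inf)$value

# Monte Carlo approximation of the same expectation.
set.seed(1)
ev_mc <- mean(rexp(1e5, rate = 2))
```

The simulated value only approximates 1/2, with an error that shrinks like one over the square root of the number of draws.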
Re: [R] string-to-number
Marc,

Thanks very much for this. I hadn't really looked at Rprof in the past; now I have a new toy to play with!

I have formulated a hypothesis that the reason parse/eval is quicker lies in the pattern-matching code: strsplit is using regular expressions, whereas perhaps parse is using some more clever (but possibly less general) matching algorithm. It will be interesting to inspect the source code to get to the bottom of it.

Thanks again for your interest and efforts in this, and for pointing out Rprof!

Regards,
Mike Nielsen

On 8/20/06, Marc Schwartz [EMAIL PROTECTED] wrote:

On Sat, 2006-08-19 at 10:25 -0600, Mike Nielsen wrote:

Wow. New respect for parse/eval. Do you think this is a special case of a more general principle? I suppose the cost is memory, but from time to time a speedup like this would be very beneficial. Any hints about how R programmers could recognize such cases would, I am sure, be of value to the list in general. Many thanks for your efforts, Marc!

Mike,

I think that one needs to consider where the time is being spent and then adjust accordingly. Once you understand that, you can develop some insight into what may be a more efficient approach. R provides good profiling tools that facilitate this process.

In this case, almost all of the time in the first two examples using strsplit() is in that function:

    repeated.measures.columns <- paste(1:10, collapse = ",")
    library(utils)
    Rprof(tmp <- tempfile())
    res1 <- as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
    Rprof()
    summaryRprof(tmp)

    $by.self
                      self.time self.pct total.time total.pct
    strsplit              23.68     99.7      23.68      99.7
    as.double.default      0.06      0.3       0.06       0.3
    as.numeric             0.00      0.0      23.74     100.0
    unlist                 0.00      0.0      23.68      99.7

    $by.total
                      total.time total.pct self.time self.pct
    as.numeric             23.74     100.0      0.00      0.0
    strsplit               23.68      99.7     23.68     99.7
    unlist                 23.68      99.7      0.00      0.0
    as.double.default       0.06       0.3      0.06      0.3

    $sampling.time
    [1] 23.74

Contrast that with Prof. Ripley's approach:

    Rprof(tmp <- tempfile())
    res3 <- eval(parse(text=paste("c(", repeated.measures.columns, ")")))
    Rprof()
    summaryRprof(tmp)

    $by.self
          self.time self.pct total.time total.pct
    parse      0.42     87.5       0.42      87.5
    eval       0.06     12.5       0.48     100.0

    $by.total
          total.time total.pct self.time self.pct
    eval        0.48     100.0      0.06     12.5
    parse       0.42      87.5      0.42     87.5

    $sampling.time
    [1] 0.48

To some extent, one could argue that my initial timing examples are contrived, in that they specifically demonstrate a worst case scenario using strsplit(). Real world examples may or may not show such gains. For example, with Charles' initial query, the initial vector was rather short:

    repeated.measures.columns
    [1] "3,6,10"

So if this was a one-time conversion, we would not see such significant gains. However, what if we had a long list of shorter entries:

    repeated.measures.columns <- paste(1:10, collapse = ",")
    repeated.measures.columns
    [1] "1,2,3,4,5,6,7,8,9,10"

    big.list <- replicate(1, list(repeated.measures.columns))
    head(big.list)
    [[1]]
    [1] "1,2,3,4,5,6,7,8,9,10"

    [[2]]
    [1] "1,2,3,4,5,6,7,8,9,10"

    [[3]]
    [1] "1,2,3,4,5,6,7,8,9,10"

    [[4]]
    [1] "1,2,3,4,5,6,7,8,9,10"

    [[5]]
    [1] "1,2,3,4,5,6,7,8,9,10"

    [[6]]
    [1] "1,2,3,4,5,6,7,8,9,10"

    system.time(res1 <- t(sapply(big.list, function(x) as.numeric(unlist(strsplit(x, ","))))))
    [1] 1.972 0.044 2.411 0.000 0.000

    str(res1)
     num [1:1, 1:10] 1 1 1 1 1 1 1 1 1 1 ...

    head(res1)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]    1    2    3    4    5    6    7    8    9    10
    [2,]    1    2    3    4    5    6    7    8    9    10
    [3,]    1    2    3    4    5    6    7    8    9    10
    [4,]    1    2    3    4    5    6    7    8    9    10
    [5,]    1    2    3    4    5    6    7    8    9    10
    [6,]    1    2    3    4    5    6    7    8    9    10

Now use Prof. Ripley's approach:

    system.time(res3 <- t(sapply(big.list, function(x) eval(parse(text=paste("c(", x, ")"))))))
    [1] 1.676 0.012 1.877 0.000 0.000

    str(res3)
     num [1:1, 1:10] 1 1 1 1 1 1 1 1 1 1 ...

    head(res3)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,]    1    2    3    4    5    6    7    8    9    10
    [2,]    1    2    3    4    5    6    7    8    9    10
    [3,]    1    2    3    4    5    6    7    8    9    10
    [4,]    1    2    3    4    5    6    7    8    9    10
    [5,]    1    2    3    4    5    6    7    8    9    10
    [6,]    1    2    3    4    5    6    7    8    9    10

    all(res1 == res3)
    [1] TRUE

We do see a notable reduction in time with strsplit(), while a notable increase in time using eval
Re: [R] string-to-number
Wow. New respect for parse/eval.

Do you think this is a special case of a more general principle? I suppose the cost is memory, but from time to time a speedup like this would be very beneficial. Any hints about how R programmers could recognize such cases would, I am sure, be of value to the list in general.

Many thanks for your efforts, Marc!

Regards,
Mike

On 8/19/06, Marc Schwartz [EMAIL PROTECTED] wrote:

On Sat, 2006-08-19 at 13:30 +0100, Prof Brian Ripley wrote:

On Sat, 19 Aug 2006, Marc Schwartz wrote:

On Sat, 2006-08-19 at 07:58 -0400, Charles Annis, P.E. wrote:

Greetings, Amigos:

I have been trying without success to convert a character string,

    repeated.measures.columns
    [1] "3,6,10"

into c(3,6,10) for subsequent use.

as.numeric(repeated.measures.columns) doesn't work (likely because of the commas):

    [1] NA
    Warning message:
    NAs introduced by coercion

I've tried many things including

    strsplit(repeated.measures.columns, split = ",")

which produces a list with only one element, viz:

    [[1]]
    [1] "3"  "6"  "10"

as.numeric() doesn't like that either.

Clearly: 1) I cannot be the first person to attempt this, and 2) I've made this WAY harder than it is.

Would some kind soul please instruct me (and perhaps subsequent searchers) how to convert the elements of a string into numbers?

Thank you.

One more step:

    as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
    [1]  3  6 10

Use unlist() to take the output of strsplit() and convert it to a vector, before coercing to numeric.

Or, more simply, use [[1]] as in

    as.numeric(strsplit(repeated.measures.columns, ",")[[1]])

Also,

    eval(parse(text=paste("c(", repeated.measures.columns, ")")))

looks competitive, and is quite a bit more general (e.g. allows spaces, works with complex numbers), or you can use scan() from an anonymous file or a textConnection.

I would say more than competitive:

    repeated.measures.columns <- paste(1:10, collapse = ",")
    str(repeated.measures.columns)
     chr "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,4"| __truncated__

    system.time(res1 <- as.numeric(unlist(strsplit(repeated.measures.columns, ","))))
    [1] 24.238  0.192 26.200  0.000  0.000

    system.time(res2 <- as.numeric(strsplit(repeated.measures.columns, ",")[[1]]))
    [1] 24.313  0.196 26.471  0.000  0.000

    system.time(res3 <- eval(parse(text=paste("c(", repeated.measures.columns, ")"))))
    [1] 0.328 0.004 0.395 0.000 0.000

    str(res1)
     num [1:10] 1 2 3 4 5 6 7 8 9 10 ...
    str(res2)
     num [1:10] 1 2 3 4 5 6 7 8 9 10 ...
    str(res3)
     num [1:10] 1 2 3 4 5 6 7 8 9 10 ...

    all(res1 == res2)
    [1] TRUE
    all(res1 == res3)
    [1] TRUE

Best regards,

Marc

-- Regards, Mike Nielsen
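For reference, the three conversions mentioned in this thread can be put side by side as below. This is only a sketch on the short string from the original question, so it says nothing about the timing differences discussed above.

```r
s <- "3,6,10"

# 1. Split on commas, take the single list element, coerce to numeric.
via_strsplit <- as.numeric(strsplit(s, ",")[[1]])

# 2. Wrap the string in c( ) and let the R parser do the work.
via_parse <- eval(parse(text = paste("c(", s, ")")))

# 3. Read the string through a text connection with scan().
con <- textConnection(s)
via_scan <- scan(con, sep = ",", quiet = TRUE)
close(con)
```

One caveat with the second form is that eval(parse()) runs whatever the string contains as R code, so it is probably best kept for strings whose origin is trusted.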
Re: [R] Autocompletion
I mostly use R under Linux and Xemacs with the truly wonderful ESS (Emacs Speaks Statistics). It has numerous features, one of which is a pretty comprehensive auto-completion facility.

The few times I have used R under Windows, any auto-completion feature that may be there did not fall readily to hand (translate: I poked a few keys and didn't notice anything auto-completing), and I didn't spend any time looking for one.

As there are several platforms on which R runs, and a number of interactive interfaces, you may need to be a bit more specific in the framing of your question to get a more informative response.

Regards,
Mike

On 8/16/06, Óttar Ísberg [EMAIL PROTECTED] wrote:

Hi there!

I may be guilty of not doing my homework, but still, I've searched. I'm a relative newcomer to R (my forte is at present MATLAB, but for various reasons I'm trying to get literate in R). My question is: Is there an autocompletion feature buried somewhere in R?

All the best
Óttar

-- Regards, Mike Nielsen
Re: [R] random section of samples based on group membership
Well, how you do it might be a matter of taste with respect to how you want the results. You could try using by with sample:

    by(x,x[,3],function(y){y[sample(nrow(y),1),]})

This will return a list with one list element for each sample group. You can then combine the list back into a matrix.

That's my naive solution; no doubt there will be half a dozen better ways to go about it. Also, some of the clustering functions I have seen will sample for you.

On 7/24/06, Wade Wall [EMAIL PROTECTED] wrote:

Hi all,

I have a matrix of 474 rows (samples) with 565 columns (variables). Each of the 474 samples belongs to one of 120 groups, with the groupings as a column in the above matrix. For example, the group column would be:

    1 1 1 2 2 2 . . . 120 120

I want to randomly select one from each group. Not all the groups have the same number of samples; some have 4, some 3, etc. Is there a function to do this, or would I need to write a looping statement to look at each successive group?

I basically want to combine the randomly selected samples from the 120 groups into a new matrix in order to perform a cluster analysis.

Thanks,
Wade

-- Regards, Mike Nielsen
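The by() suggestion, including the step of combining the pieces again, might be sketched as follows. The nine-row data frame with four groups is invented; the poster's data has 474 rows and 120 groups with the grouping in one of the columns.

```r
set.seed(42)
# Invented data: two variables plus a group column with unequal group sizes.
x <- data.frame(v1 = rnorm(9), v2 = rnorm(9),
                group = rep(1:4, times = c(3, 2, 3, 1)))

# One randomly chosen row per group; sample(nrow(y), 1) picks a row index.
picked <- by(x, x$group, function(y) y[sample(nrow(y), 1), ])

# Stack the one-row pieces back into a single data frame.
one_per_group <- do.call(rbind, picked)
```

as.matrix(one_per_group) would turn the result back into a matrix if the clustering function insists on one.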
Re: [R] Error Calculating Mean
I'd hazard a guess that data.linear$Weight may have a Not Available data point (i.e. missing data).

    f <- c(NA,rnorm(10))
    mode(f)
    [1] "numeric"
    mean(f)
    [1] NA

If you'd like to compute the mean anyway, you can use

    mean(f,na.rm=TRUE)
    [1] 0.3433036

On 7/9/06, justin rapp [EMAIL PROTECTED] wrote:

I have a vector containing players' weights. When I enter

    mode(data.linear$Weight)

"numeric" is returned. When I type

    mean(data.linear$Weight)

NA is returned. Any ideas as to why this may be the case?

I am trying to calculate this ultimately so I can superimpose a normal density line over a histogram containing the weights.

-- Regards, Mike Nielsen
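A small sketch of how the guess in this reply could be checked, and how the quantities for the normal density line could be obtained despite the missing value. The weights are invented, and the plotting calls are left as comments so that the example does not open a graphics device.

```r
# Invented weights with one missing value.
weight <- c(180, NA, 210, 195)

n_missing <- sum(is.na(weight))         # how many NAs are there?
m_raw     <- mean(weight)               # NA, because of the missing value
m         <- mean(weight, na.rm = TRUE) # mean of the observed values
s         <- sd(weight, na.rm = TRUE)   # standard deviation, same treatment

# For the histogram with a normal curve, something along these lines
# would presumably do:
#   hist(weight, freq = FALSE)
#   curve(dnorm(x, mean = m, sd = s), add = TRUE)
```

The freq = FALSE argument matters because the density curve is on the density scale, not the count scale.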
[R] Combining a list of similar dataframes into a single dataframe
I would be very grateful to anyone who could point to the error of my ways in the following.

I have a dataframe called net1, as such:

    > str(net1)
    `data.frame':   114192 obs. of  9 variables:
     $ server         : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ ts             :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" ...
     $ instance       : Factor w/ 22 levels "1","2","Compaq Ethernet_Fast Ethernet Adapter_Module",..: 4 4 4 4 4 4 4 4 4 4 ...
     $ instanceno     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
     $ perftime       : num  3.16e+13 3.16e+13 3.16e+13 3.16e+13 3.16e+13 ...
     $ perffreq       : num  6.99e+08 6.99e+08 6.99e+08 6.99e+08 6.99e+08 ...
     $ perftime100nsec: num  1.28e+17 1.28e+17 1.28e+17 1.28e+17 1.28e+17 ...
     $ countername    : Factor w/ 4 levels "Bytes Received/sec",..: 1 3 2 4 1 3 2 4 1 3 ...
     $ countervalue   : num  6.08e+07 6.64e+07 5.58e+06 1.00e+08 6.09e+07 ...

What I am trying to do is subset this by server, instance, instanceno and countername, and then apply a function to each subsetted dataframe. The function performs a calculation on countervalue, essentially collapsing instanceno and instance down to a single value.

Here is a snippet of my code:

    t1 <- by(net1,
             list(net1$server,
                  factor(as.character(net1$countername))),  # get rid of unused levels of countername for this server
             function(x) {
               g <- by(x,
                       list(factor(as.character(x$instance)),    # get rid of unused levels of instance for this server
                            factor(as.character(x$instanceno))), # same with instanceno
                       function(y) {
                         c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
                       })
               data.frame(server = x$server,
                          ts = x$ts,
                          countername = x$countername,
                          countervalue = apply(sapply(g[!sapply(g, is.null)], I), 1, sum))
             })

So t1 is then a list of dataframes, each with an identical set of columns:

    > str(t1[[1]])
    `data.frame':   149 obs. of  4 variables:
     $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
     $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ countervalue: num  NA 938 816 4213 906 ...

What I'd dearly love to do is convert t1 to one big dataframe, without looping or lapply-ing through t1 and rbinding (too much data for this to finish quickly enough -- this is about 10% of what I'm eventually going to have to manage).

On the other hand, I admit that I may be going about this wrongly from the start; perhaps there's a better approach?

Any pointers would be most gratefully received.

Many thanks!

--
Regards, Mike Nielsen
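For readers following the calculation inside the inner function, the sketch below works it through on made-up numbers. The values are hypothetical and chosen only so the arithmetic is easy to check; the expression itself is the one from the snippet in the post.

```r
# Hypothetical samples for one counter on one network interface
perffreq     <- c(10, 10, 10, 10)       # timer ticks per second
perftime     <- c(0, 100, 300, 600)     # timer reading at each sample
countervalue <- c(0, 500, 1500, 1800)   # cumulative counter at each sample

# diff() gives the change between consecutive samples, so each element is
# (change in counter) / (change in ticks) * (ticks per second), i.e. a
# per-second rate. The first sample has no predecessor, hence the leading NA.
rate <- c(NA, mean(perffreq) * diff(countervalue) / diff(perftime))
rate
```

The leading NA keeps the result the same length as the input, which is what lets it sit in a data.frame column next to the original timestamps.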
Re: [R] Combining a list of similar dataframes into a single dataframe
Well, this worked, and rather more quickly than I had expected. Many thanks to the dogs, who told me the answer in return for walking them and feeding them!

    jj <- eval(parse(text=paste(sep="", "rbind(", paste(sep="", "t1[[", 1:length(t1), "]]", collapse=","), ")")))

    > str(jj)
    `data.frame':   85644 obs. of  4 variables:
     $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
     $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ countervalue: num  NA 938 816 4213 906 ...

On 7/8/06, Mike Nielsen [EMAIL PROTECTED] wrote:
> I would be very grateful to anyone who could point to the error of my
> ways in the following.
>
> [...]
>
> What I'd dearly love to do is convert t1 to one big dataframe, without
> looping or lapply-ing through t1 and rbinding (too much data for this
> to finish quickly enough -- this is about 10% of what I'm eventually
> going to have to manage).
>
> On the other hand, I admit that I may be going about this wrongly from
> the start; perhaps there's a better approach?
>
> Any pointers would be most gratefully received.

--
Regards, Mike Nielsen
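The reply above builds a single rbind(...) call as text and evaluates it. A common alternative that makes the same single call without parse() is do.call(). This is a sketch of that alternative, not the method used in the thread; the list pieces is a hypothetical stand-in for t1, and the Filter() step assumes empty groups show up as NULL entries.

```r
# Hypothetical stand-in for t1: data frames with identical columns, plus a
# NULL entry of the kind by() can leave for an empty combination of groups
pieces <- list(
  data.frame(server = "A", countervalue = c(NA, 1, 2)),
  NULL,
  data.frame(server = "B", countervalue = c(NA, 3))
)

# Drop the NULL entries, then hand every remaining element to one rbind()
# call; do.call() passes the list elements as the arguments of rbind().
combined <- do.call(rbind, Filter(Negate(is.null), pieces))
str(combined)
```

Because do.call() takes the list directly, the list's length never needs to be spelled out, and no code is assembled from strings.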