Re: [R] Merging two data frames, but keeping NAs
On 12/05/13, 16:11, Sarah Goslee wrote:
> Adding the argument all.x=TRUE to merge() will retain the NA values,
> but the only reliable way I've found to preserve order with NA values
> in a merge is to add an index column to x, merge the data, sort on the
> index column, then delete it.

Thanks Sarah - that works nicely, although it is a not-so-nice
workaround - there should be an argument in merge() to keep NA...

Cheers,

Rainer

> Sarah
>
> On Thu, Dec 5, 2013 at 9:56 AM, Rainer M Krug rai...@krugs.de wrote:
>> Hi
>>
>> My brain is giving up on this...
>>
>> I have the following two data.frames:
>>
>> x <- data.frame(ref=c(NA, NA, NA, 10:5, NA, 1:5))
>> y <- data.frame(id = c(2, 3, 4, 6, 7, 9, 8), val = 101:107)
>>
>> Which look as follows:
>>
>> > x
>>    ref
>> 1   NA
>> 2   NA
>> 3   NA
>> 4   10
>> 5    9
>> 6    8
>> 7    7
>> 8    6
>> 9    5
>> 10  NA
>> 11   1
>> 12   2
>> 13   3
>> 14   4
>> 15   5
>> > y
>>   id val
>> 1  2 101
>> 2  3 102
>> 3  4 103
>> 4  6 104
>> 5  7 105
>> 6  9 106
>> 7  8 107
>>
>> Now I want to merge y into x, but so that
>>
>> a) the sort order of x stays the same (sort=FALSE in merge()) and
>> b) the NAs stay
>>
>> The result should look as follows (column id only here for clarity):
>>
>> > result
>>    ref id val
>> 1   NA NA  NA
>> 2   NA NA  NA
>> 3   NA NA  NA
>> 4   10 NA  NA
>> 5    9  9 106
>> 6    8  8 107
>> 7    7  7 105
>> 8    6  6 104
>> 9    5 NA  NA
>> 10  NA NA  NA
>> 11   1 NA  NA
>> 12   2  2 101
>> 13   3  3 102
>> 14   4  4 103
>> 15   5 NA  NA
>>
>> merge(x, y, by.x="ref", by.y="id", sort=FALSE) leaves out the NA,
>> but otherwise it works:
>>
>> > merge(x, y, by.x=1, by.y="id", sort=FALSE)
>>   ref val
>> 1   9 106
>> 2   8 107
>> 3   7 105
>> 4   6 104
>> 5   2 101
>> 6   3 102
>> 7   4 103
>>
>> Is there any way that I can tell merge() to keep the NA, or how can
>> I achieve what I want?
>>
>> Thanks,
>>
>> Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44
email: rai...@krugs.de
Skype: RMkrug

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
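For the archive, Sarah's index-column workaround might look roughly like the sketch below. It reuses the x and y from the question; the helper column name idx is my own choice, not something from the thread.

```r
x <- data.frame(ref = c(NA, NA, NA, 10:5, NA, 1:5))
y <- data.frame(id = c(2, 3, 4, 6, 7, 9, 8), val = 101:107)

x$idx <- seq_len(nrow(x))   # hypothetical helper column remembering the row order
m <- merge(x, y, by.x = "ref", by.y = "id", all.x = TRUE)  # all.x keeps unmatched rows
m <- m[order(m$idx), ]      # restore the original order of x
m$idx <- NULL               # drop the helper column again
rownames(m) <- NULL
m
```

The NA rows survive because all.x=TRUE keeps every row of x, matched or not; the sort on idx then undoes whatever reordering merge() introduced.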
Re: [R] Merging two data frames, but keeping NAs
On 12/05/13, 16:37, arun wrote:
> Hi,
> Try ?join()
>
> library(plyr)

Well - what would we do without Hadley ...

He solved many problems we didn't know we would have soon...

Cheers,

Rainer

> y$ref <- y$id
> join(x,y,by="ref")
>    ref id val
> 1   NA NA  NA
> 2   NA NA  NA
> 3   NA NA  NA
> 4   10 NA  NA
> 5    9  9 106
> 6    8  8 107
> 7    7  7 105
> 8    6  6 104
> 9    5 NA  NA
> 10  NA NA  NA
> 11   1 NA  NA
> 12   2  2 101
> 13   3  3 102
> 14   4  4 103
> 15   5 NA  NA
>
> A.K.
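If an extra package is not wanted, a plain lookup with match() seems to give the same table for this particular data. This is only a sketch and assumes, as in the example, that each id occurs at most once in y.

```r
x <- data.frame(ref = c(NA, NA, NA, 10:5, NA, 1:5))
y <- data.frame(id = c(2, 3, 4, 6, 7, 9, 8), val = 101:107)

pos <- match(x$ref, y$id)   # row of y for each row of x, NA where there is no match
x$id  <- y$id[pos]
x$val <- y$val[pos]
x
```

Because match() returns one position per element of x$ref, the row order and the NA rows of x are untouched.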
[R] How to concatenate the results from parallelized nested foreach loops
Hi all,

I am working with data.table objects within nested foreach loops and I
am having trouble creating the results object the way I would prefer.
Code below with sample data:

library(iterators)
library(data.table)
library(foreach)

#generate dummy data
set.seed(1212)
sample1 <- data.frame(parentid=round((runif(5, min=1, max=5))), childid=round(runif(10, min=1, max=10)))
length(unique(sample1$parentid))

#get unique parents
sample1uniq <- as.data.frame(unique(sample1$parentid))
names(sample1uniq) <- "parentid"

#convert original dataset to data.table
sample1 <- data.table(sample1)
setkey(sample1,parentid)

#convert unique ids to data.table
sample1uniq <- data.table(sample1uniq)
setkey(sample1uniq,parentid)

#a random sample of 5K to users to scan against
sample2uniq_idx <- sample(1:nrow(sample1uniq), size=5000)
sample2uniq <- sample1uniq[sample2uniq_idx]
sample2uniq <- data.table(sample2uniq)
setkey(sample2uniq,parentid)

#construct iterators
sample1uniq_iter <- iter(sample1uniq)
sample2uniq_iter <- iter(sample2uniq)

outerresults <- foreach (x = sample1uniq_iter, .combine=rbind, .packages=c('foreach','doParallel', 'data.table')) %dopar% {
    b <- sample1[J(x)]              #ith parent
    b2 <- as.data.frame(b)[,2]      #ith parent's children
    foreach (y = sample2uniq_iter, .combine=rbind) %dopar% {
        c <- sample1[J(y)]          #jth parent
        c2 <- as.data.frame(c)[,2]  #jth parent's children
        common <- length(intersect(b2, c2))
        if (common > 0) {
            uni <- length(union(b2, c2))
            results <- list(u1=x, u2=y, inter=common, union=uni)
        }
    }
}

Note that all tasks can be done in parallel with no dependency issues.

I was expecting the results to come out like this (made up):

u1 u2 inter union
 1  2    10    20
 1  3     4    10
 1  4     7    15
 1  5     6    10
 2  3    10    20
 2  4     4    10
 3  5     7    10
 4  5     6    10

But they don't. Do I need to implement a different combine function?
Any other ideas/help will be appreciated.

thx
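One thing that may be worth trying is the foreach nesting operator %:% with each iteration returning a one-row data.frame, so that rbind has something rectangular to stack. The sketch below is not the poster's code: the toy list children and the u1 < u2 filter are assumptions of mine, and %do% is used so that it runs without a parallel backend (with a registered backend %dopar% should slot in the same way).

```r
library(foreach)

# Hypothetical toy data: parent id -> its child ids
children <- list(`1` = c(1, 2, 3), `2` = c(2, 3, 4), `3` = c(7, 8), `4` = c(3, 9))
ids <- as.integer(names(children))

res <- foreach(u1 = ids, .combine = rbind) %:%
  foreach(u2 = ids, .combine = rbind) %do% {
    if (u1 < u2) {                               # visit each unordered pair once
      b2 <- children[[as.character(u1)]]
      c2 <- children[[as.character(u2)]]
      common <- length(intersect(b2, c2))
      if (common > 0) {
        # one row per pair; pairs without overlap return NULL, which rbind skips
        data.frame(u1 = u1, u2 = u2, inter = common,
                   union = length(union(b2, c2)))
      }
    }
  }
res
```

Looping over plain vectors also sidesteps a possible issue with iter() objects, which are consumed once and would be exhausted after the first pass of the inner loop.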
Re: [R] How to generate a smoothed surface for a three dimensional dataset?
The following question is inspired by Jun's problem, which resembles
some of my own problems, but goes off on a tangent about applying
plot3D from Karline Soetaert.

On Thu, Dec 5, 2013 at 11:52 PM, Bert Gunter gunter.ber...@gene.com wrote:
> Your comment that:
>
> "I can see the critical point here is to find a right function to
> make the prediction."
>
> is what indicates to me that your critical point is that you have
> insufficient knowledge and need help. Feel free to disagree, of course.

I don't know if it's true for Jun, but it's definitely true for me - I
have insufficient knowledge! I'm out of my depth with surface
estimation, but I have to learn how to do it, one way or the other.

Currently I'm reading the docs for plot3D. I loaded the package into
RStudio and ran some of the examples. The image2D example seems to get
its data from an object called volcano with a small v.

imag2D> nr <- nrow(volcano)
imag2D> nc <- ncol(volcano)
imag2D> image2D(volcano, x = 1:nr, y = 1:nc, lighting = TRUE,
imag2D+         main = "volcano", clab = "height, m")

The objects() command shows a Volcano with a big V. The small-v and
big-V volcanoes are not the same, because the str command shows:

 [69] mtcars     myf        n          nam
 [73] nc         nms        nr         o
...
[117] V          V2         Volcano    volcx
[121] volcy      VV         Vy         w
[125] warm.pal   weight     width      wombat
[129] x          x.at       xx         xyz.fit
[133] y          y1         y2         y3
[137] y.at       year       yy         z
[141] z0         zi         z.predict  zz
[145] zzz

> str(Volcano)
 num [1:29, 1:21] 100 103 105 108 110 116 120 122 123 118 ...
> str(volcano)
 num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...

I don't understand how the volcano object works well enough to power
the image2D command, but doesn't show up in objects(). At first I
thought there was some kind of secret smuggling compartment in memory
space, and nr and nc and volcano were all hidden in that secret place.
But in fact, nr and nc show up in objects().

So ... I am even less educated than the other newbies on the list, and
I'm following along, and I really don't see how R is doing what it's
doing.

Should I be reading the plot3D .pdf textbooks, or should I give up and
go back to some much more basic textbook?

Thanks.
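A note on the volcano puzzle, as far as I understand it: objects() only lists the global workspace, whereas volcano lives in the datasets package, which sits further along the search path. The small sketch below illustrates this with base R only. The last two lines are a guess at how a 29 x 21 Volcano could have been created (every third row and column); that is an assumption, not something confirmed in this thread.

```r
in_workspace <- "volcano" %in% ls(globalenv())  # FALSE unless you made your own copy
on_search    <- exists("volcano")               # TRUE: R looks along the whole search path
where        <- find("volcano")                 # names the place it was found
search()                                        # the search path itself
dim(datasets::volcano)                          # same 87 x 61 matrix as str(volcano) reported

# Assumed origin of the big-V object: a thinned copy stored in the workspace
Volcano <- volcano[seq(1, nrow(volcano), by = 3), seq(1, ncol(volcano), by = 3)]
dim(Volcano)
```

nr and nc show up in objects() because the example code assigned them in the workspace, while volcano itself was only read, never assigned.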
[R] Generating restricted numbers
Hello everyone,

I'm trying to generate a sequence that consists of random numbers, and
the following algorithm works well:

###
a <- 0.08
b <- 0.01
T <- 90
t <- 0:T
alpha <- 1
e <- rnorm(T, mean = 0, sd = 0.1)
d <- c( runif(1,0, a*T), rep(0, T-1) )
for (i in 2:T) {
    d[i] <- alpha * d[i-1] + e[i]
}
plot(d, type="l")
##

But I have to add this restriction: each d on day t must belong to the
time-dependent interval [0, a*(T-t)]. For example, d on day 50 can be
at least 0 and at most a*40.

I hope somebody can help me.

Best regards
[R] model selection with step()
I am using the step() function to select a model using backward elimination, with AIC as the selection criterion. The full regression model contains three predictors, plus all the second order terms and two-way interactions. The full model is fit via lm() using two different model formulae. One formula uses explicitly defined variables for the second-order and interaction terms and the other formula uses the I(x^2) and colon operators. The fit generated by lm() is exactly the same for both models, but when I pass these fitted models to the step() function, I get two different results. Apparently, step() does not recognize the three main predictors unless the second order and interaction terms are explicitly defined as separate variables. I assigned this problem to my first-year graduate students, not realizing that R would give two different answers. Now I have to re-grade their homework, but I would really like to give them a reasonable explanation for the discrepancy. The complete code is given below. Could anyone shed some light on this mystery? 
Thanks in advance,
Karen Keating
Kansas State University

# Exercise 9.13, Kutner, Nachtsheim, Neter & Li
temp <- scan()
49.0 45.0 36.0 45.0
55.0 30.0 28.0 40.0
85.0 11.0 16.0 42.0
32.0 30.0 46.0 40.0
26.0 39.0 76.0 43.0
28.0 42.0 78.0 27.0
95.0 17.0 24.0 36.0
26.0 63.0 80.0 42.0
74.0 25.0 12.0 52.0
37.0 32.0 27.0 35.0
31.0 37.0 37.0 55.0
49.0 29.0 34.0 47.0
38.0 26.0 32.0 28.0
41.0 38.0 45.0 30.0
12.0 38.0 99.0 26.0
44.0 25.0 38.0 47.0
29.0 27.0 51.0 44.0
40.0 37.0 32.0 54.0
31.0 34.0 40.0 36.0

dat <- matrix(temp,ncol=4,nrow=length(temp)/4,byrow=T)
colnames(dat) <- c('Y','X1','X2','X3')
dat <- data.frame(dat)
attach(dat)

# second order terms and interactions
X12 <- X1*X2
X13 <- X1*X3
X23 <- X2*X3
X1sq <- X1^2
X2sq <- X2^2
X3sq <- X3^2

fit1 <- lm(Y~ X1sq + X2sq + X3sq +X1+X2+X3+ X12 + X13 + X23 )
fit2 <- lm(Y~I(X1^2)+I(X2^2)+I(X3^2)+X1+X2+X3+X1:X2+X1:X3+X2:X3)

sum( abs(fit1$res - fit2$res) ) # 0, so fitted models are the same

dim(model.matrix(fit1)) # 19 x 10
dim(model.matrix(fit2)) # 19 x 10
dim(fit1$model) # 19 x 10
dim(fit2$model) # 19 x 7 -- could this cause the discrepancy?

back1 <- step(fit1,direction='backward')
back2 <- step(fit2,direction='backward')

# Note that 'back1' considers the three primary predictors X1, X2 and X3,
# while 'back2' does not.
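One likely explanation, offered tentatively: step() only proposes dropping terms that respect marginality, so a main effect is not a candidate while an interaction containing it (written with the colon operator) is still in the model. A precomputed product such as X12 is just another variable, so no such protection applies. The stats function drop.scope() shows which terms are eligible; the two-predictor formulas below are a cut-down illustration, not the full homework model.

```r
# Products stored as ordinary variables: every term is a candidate for removal
scope_explicit <- drop.scope(Y ~ X1 + X2 + X12)

# Interaction written with ':' : only the interaction may be dropped first
scope_colon <- drop.scope(Y ~ X1 + X2 + X1:X2)

scope_explicit
scope_colon
```

On this reading, back2 would start considering X1, X2 and X3 only once the interactions involving them have been removed.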
Re: [R] Simple Error Bar
Uh, no.  You are forgetting to take the square root of 10, and to
divide by the square root of 12.

The variance of Y is (exactly) (56^2 - 1)/12, so the variance of Y-bar
is this quantity over 10, so the standard deviation of Y-bar is

    sqrt((56^2 - 1)/12)/sqrt(10)

which is approximately (ignoring the -1) 56/sqrt(12) * 1/sqrt(10).

cheers,

Rolf

On 12/06/13 20:26, Jim Lemon wrote:
> On 12/06/2013 04:16 PM, mohan.radhakrish...@polarisft.com wrote:
>> Hi,
>> Basic question with basic code. I am simulating a set of 'y' values
>> for a standard 'x' value measurement. So here the error bars are
>> very long because the number of samples is very small. Is that
>> correct? I am plotting the mean of 'y' on the 'y' axis.
>>
>> Thanks,
>> Mohan
>>
>> x <- data.frame(c(5,10,15,20,25,30,35,40,50,60))
>> colnames(x) <- c("x")
>> y <- sample(5:60,10,replace=T)
>> y1 <- sample(5:60,10,replace=T)
>> y2 <- sample(5:60,10,replace=T)
>> y3 <- sample(5:60,10,replace=T)
>> y4 <- sample(5:60,10,replace=T)
>>
>> z <- data.frame(cbind(x,y,y1,y2,y3,y4))
>> z$mean <- apply(z[,c(2,3,4,5,6)],2,mean)
>> z$sd <- apply(z[,c(2,3,4,5,6)],2,sd)
>> z$se <- z$sd / sqrt(5)
>>
> Hi Mohan,
> As your samples seem to follow a discrete uniform distribution, the
> standard deviation is approximately the number of integers in the
> range (56) divided by the number of observations (10).
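Rolf's arithmetic can be checked numerically. The sketch below computes the population variance of the integers 5:60 directly and compares it with (56^2 - 1)/12; the variable names and the small simulation at the end (including its seed and replication count) are my own additions.

```r
vals <- 5:60
k <- length(vals)                          # 56 equally likely integers

var_formula <- (k^2 - 1) / 12              # variance of a discrete uniform on k values
var_direct  <- mean((vals - mean(vals))^2) # population variance computed by hand

sd_ybar <- sqrt(var_formula) / sqrt(10)    # standard deviation of a mean of 10 draws
sd_ybar                                    # a little over 5

# Rough simulation check (arbitrary seed): sd of many means of 10 draws
set.seed(2013)
sim_means <- replicate(10000, mean(sample(vals, 10, replace = TRUE)))
sd(sim_means)
```

Note that var_direct uses the population formula (divide by k), not var(), which divides by k - 1.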
Re: [R] Simple Error Bar
The latest code I have put together is this. Could you point out what
is missing here?

#Reference values plotted on x-axis. These are constant.
#These values could be time of day. So every day at the same
#time we could collect other measurements
referenceset <- data.frame(c(5,10,15,20,25,30,35,40,50,60))
colnames( referenceset) <- c("reference")

#These are the sets of measurements. So every day at the same
#time we could collect several samples. This is simulated now.
sampleset <- data.frame( matrix(sample(1:2, c(2), replace = TRUE), ncol = 2000) )
sampleset <- cbind( sampleset, referenceset )

#Calculate mean
sampleset$mean <- apply(sampleset[,1:10],2,mean)

#Calculate Standard Deviation
sampleset$sd <- apply(sampleset[,c(1:10)],2,sd)

#Calculate Standard Error
sampleset$se <- sampleset$sd / sqrt(2000)

#print(sampleset)

plot( sampleset$reference, sampleset$mean,
      las=1,
      ylab="Mean of 'y' values",
      xlab="x",
    );
arrows(sampleset$reference,
       sampleset$mean-sampleset$se,
       sampleset$reference,
       sampleset$mean+sampleset$se,
       code = 3,
       angle=90,
       length=0.2)

Thanks.
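One thing that stands out in the posted code is that apply(..., 2, ...) summarises columns, whereas one mean and one standard error per reference value needs a summary per row (MARGIN = 1). A minimal sketch of that layout follows; the matrix shape, the number of replicates n_rep and the seed are assumptions of mine, not the poster's numbers.

```r
set.seed(1)                                   # arbitrary seed
reference <- c(5, 10, 15, 20, 25, 30, 35, 40, 50, 60)
n_rep <- 20                                   # assumed measurements per reference value

# One row per reference value, one column per repeated measurement
samples <- matrix(sample(5:60, length(reference) * n_rep, replace = TRUE),
                  nrow = length(reference))

m  <- apply(samples, 1, mean)                 # row-wise mean
s  <- apply(samples, 1, sd)                   # row-wise standard deviation
se <- s / sqrt(n_rep)                         # standard error of each row mean

if (interactive()) {
  plot(reference, m, las = 1, ylim = range(m - se, m + se),
       xlab = "x", ylab = "Mean of 'y' values")
  arrows(reference, m - se, reference, m + se,
         code = 3, angle = 90, length = 0.05)
}
```

Setting ylim from the bar ends keeps the whiskers inside the plotting region; the divisor in se should be the number of measurements behind each mean.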
Re: [R] How do I print predicted effect sizes in forest plot?
The model you are fitting is a random-effects model and does not
include any potential moderators/covariates. Therefore, the estimated
intercept of that model is *the* estimated/predicted (average) effect
and it applies to each study. That is why the predict function also
just gives you that value. That value is also included in the forest
plot (at the bottom).

The predicted (average) effect will no longer be the same for each
study only if you include covariates in the model.

Best,
Wolfgang

--
Wolfgang Viechtbauer, Ph.D., Statistician
Department of Psychiatry and Psychology
School for Mental Health and Neuroscience
Faculty of Health, Medicine, and Life Sciences
Maastricht University, P.O. Box 616 (VIJV1)
6200 MD Maastricht, The Netherlands
+31 (43) 388-4170 | http://www.wvbauer.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Alma Wilflinger
> Sent: Thursday, December 05, 2013 22:06
> To: R help r-help@r-project.org
> Subject: [R] How do I print predicted effect sizes in forest plot?
>
> Hi,
>
> I am struggling a bit with creating a forest plot containing the
> predicted effect size. As seen in other studies, these effect sizes
> are shown per study, usually as a light grey diamond - which is what
> I want to achieve.
>
> The calls I use are:
>
> iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error,
>     slab=Study_Name, subset=(Country == "AUT"), data=cma_iat, method="HS")
> summary.rma(iat_result)
> #not sure how to use it or if needed
> #predict(iat_result)
> forest(iat_result)
>
> At the end I am getting the forest plot as is, without the predicted
> values. I am not sure if I need the predict function and how to use
> it? - the predict function delivers the same values as already
> computed in the rma object.
>
> I checked the manual for package metafor but was not able to find out
> how to print the predicted values per study.
>
> kind regards,
> Alma
Re: [R] How do I print predicted effect sizes in forest plot?
One more thing ... You used the command:

iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error,
    slab=Study_Name, subset=(Country == "AUT"), data=cma_iat, method="HS")

This probably does not do what you want it to do. First of all, if you
specify vi, there is no need to specify sei (or vice-versa). One is
sufficient. But more crucially, I assume 'Mean' is what it says it is -
a mean of a certain variable X. And I assume that 'Variance_rounded' is
the variance of said variable X. But vi is used to specify the
*sampling variance* of yi (or sei is used to specify the standard
error), which, for a mean, is the variance divided by N (and the
standard error is the SD divided by the square root of N):

http://en.wikipedia.org/wiki/Standard_error_of_the_mean#Standard_error_of_the_mean

So, my hunch is that you are not supplying the right information to the
rma() function.

Best,
Wolfgang
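To make Wolfgang's point concrete, here is a small sketch of converting per-study variances into sampling variances of the mean before the model is fitted. The three studies and all their numbers are made up purely for illustration, and the column names are assumptions; the rma() call is only attempted if metafor happens to be installed.

```r
# Hypothetical per-study summaries (made-up numbers, for illustration only)
studies <- data.frame(Study_Name = c("A", "B", "C"),
                      Mean       = c(0.40, 0.55, 0.30),
                      Variance   = c(0.90, 1.20, 0.80),   # variance of X within each study
                      N          = c(50, 80, 40))

studies$vi  <- studies$Variance / studies$N   # sampling variance of each mean
studies$sei <- sqrt(studies$vi)               # equivalently SD / sqrt(N)

if (requireNamespace("metafor", quietly = TRUE)) {
  # Only one of vi / sei is needed
  res <- metafor::rma(yi = Mean, vi = vi, slab = Study_Name,
                      data = studies, method = "HS")
  print(res)
}
```

Passing the raw variance as vi would make every study look far less precise than it is, which changes both the weights and the width of the intervals in the forest plot.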
[R] tune an support vector machine
Hej all,

I am currently trying to tune an SVM in R using the package e1071,
which works pretty well. I do a grid search over the parameters and
get the best possible parameters for classification. Here is my sample
code:

type <- sample(c(-1,1) , 20, replace = TRUE )
weight <- sample(c(20:50),20, replace=TRUE)
height <- sample(c(100:200),20, replace=TRUE)
width <- sample(c(30:50),20,replace=TRUE)
volume <- sample(c(1000:5000),20,replace=TRUE)

data <- cbind(type,weight,height,width,volume)
train <- as.data.frame(data)

library(e1071)

features <- c("weight","height","width","volume")

(formula <- as.formula(paste("type ~ ", paste(features, collapse= " + "))))

svmtune = tune.svm(formula, data=train, kernel="radial", cost=2^(-2:5), gamma=2^(-2:1),cross=10)
summary(svmtune)

My question is whether there is a way to tune the features as well. In
other words, what I want to do is try all possible combinations of
features for the SVM, for example use only (volume), or (weight,
height), or (height, volume, width) and so on, and get the best
combination back.

Best wishes
Uwe
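I am not aware of a built-in switch for this, but with only four features an exhaustive search is cheap: enumerate every non-empty subset with combn(), build one formula per subset, and tune each in turn. The sketch below makes several assumptions of its own: the simulated training data, 5-fold cross-validation via tune.control(), and picking the subset with the lowest cross-validated error. The tuning part only runs if e1071 is installed.

```r
features <- c("weight", "height", "width", "volume")

# All non-empty subsets of the feature names (2^4 - 1 of them)
subsets <- unlist(lapply(seq_along(features),
                         function(k) combn(features, k, simplify = FALSE)),
                  recursive = FALSE)
formulas <- lapply(subsets, function(s) reformulate(s, response = "type"))

if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(42)                                   # arbitrary seed
  train <- data.frame(type   = factor(rep(c(-1, 1), each = 20)),
                      weight = sample(20:50, 40, replace = TRUE),
                      height = sample(100:200, 40, replace = TRUE),
                      width  = sample(30:50, 40, replace = TRUE),
                      volume = sample(1000:5000, 40, replace = TRUE))
  # Best cross-validated error found for each feature subset
  errs <- sapply(formulas, function(f) {
    e1071::tune.svm(f, data = train, kernel = "radial",
                    cost = 2^(-2:5), gamma = 2^(-2:1),
                    tunecontrol = e1071::tune.control(cross = 5))$best.performance
  })
  best <- subsets[[which.min(errs)]]
  best
}
```

With many features the number of subsets doubles with each one added, so beyond a dozen or so a stepwise or wrapper-style search would be needed instead. Selecting features and parameters on the same cross-validation folds also tends to give optimistic error estimates.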
Re: [R] Generating restricted numbers
What value do you want d to take on if it is outside that interval?

Here is an example where, if d is outside the interval, it is assigned
to be one of the interval endpoints.

minx <- 0
for(i in 2:T) {
    x <- alpha * d[i-1] + e[i]
    maxx <- a*(T-i)
    if(x < minx) {
        d[i] <- minx
    } else {
        if(x > maxx) {
            d[i] <- maxx
        } else d[i] <- x
    }
}

Jean

On Thu, Dec 5, 2013 at 10:19 PM, gncl dzgn guncelduz...@hotmail.com wrote:
> Hello everyone,
>
> I'm trying to generate a sequence that consists of random numbers and
> the following algorithm works well
>
> ###
> a <- 0.08
> b <- 0.01
> T <- 90
> t <- 0:T
> alpha <- 1
> e <- rnorm(T, mean = 0, sd = 0.1)
> d <- c( runif(1,0, a*T), rep(0, T-1) )
> for (i in 2:T) {
>     d[i] <- alpha * d[i-1] + e[i]
> }
> plot(d, type="l")
> ##
>
> But I have to add this restriction
>
> each d on day t must satisfy to belong to the time-dependent interval
> [0, a*(T-t)] . For example, d on day 50 can be minimal 0 and maximal
> a*40.
>
> I hope, somebody helps me.
>
> Best regards
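The same clamping can be written a little more compactly with min() and max(). This is just a self-contained restatement of the loop above using the constants from the original post; the seed is arbitrary and, as in the loop above, the starting value d[1] is left as originally drawn.

```r
set.seed(123)                        # arbitrary seed
a <- 0.08
T <- 90                              # name kept from the post; it masks the TRUE shorthand
alpha <- 1
e <- rnorm(T, mean = 0, sd = 0.1)

d <- c(runif(1, 0, a * T), rep(0, T - 1))
for (i in 2:T) {
  upper <- a * (T - i)                       # time-dependent upper bound
  proposal <- alpha * d[i - 1] + e[i]
  d[i] <- min(max(proposal, 0), upper)       # clamp into [0, upper]
}
range(d[-1])
```

Clamping makes the process sit exactly on a boundary for stretches of time; if that is undesirable, redrawing e[i] until the proposal falls inside the interval is another option, at the cost of changing the error distribution.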
Re: [R] How do I print predicted effect sizes in forest plot?
At 21:05 05/12/2013, Alma Wilflinger wrote:
> Hi,
>
> I am struggling a bit with creating a forest plot containing the
> predicted effect size. As seen in other studies, these effect sizes
> are shown per study, usually as a light grey diamond - which is what
> I want to achieve.
>
> The calls I use are:
>
> iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error,
>     slab=Study_Name, subset=(Country == "AUT"), data=cma_iat, method="HS")

Alma

You do not need to specify both vi and sei, as one is sufficient, and
you do not need ni as well. I realise that is not the question you
asked (which Wolfgang has already answered).

> summary.rma(iat_result)
> #not sure how to use it or if needed
> #predict(iat_result)
> forest(iat_result)
>
> At the end I am getting the forest plot as is, without the predicted
> values. I am not sure if I need the predict function and how to use
> it? - the predict function delivers the same values as already
> computed in the rma object.
>
> I checked the manual for package metafor but was not able to find out
> how to print the predicted values per study.
>
> kind regards,
> Alma

Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
Re: [R] Double Infinite Integration
Aya Anas aanas at feps.edu.eg writes:
> Hello all,
>
> I need to perform the following integration where the integrand is
> the product of three functions:
>
>     f(x)g(y)z(x,y)
>
> the limits of x are (0,inf) and the limits of y are (-inf,inf).
>
> Could this be done using R?

There is a saying: Don't ask "Can this be done in R?", ask "How is it
done?"

Extracting function f(x) from the inner integral may not always be the
best idea. And applying package 'cubature' will not work as
adaptIntegrate() does not really handle non-finite interval limits.

As an example, let us assume the functions are

    f <- function(x) x
    g <- function(y) y^2
    h <- function(x, y) exp(-(x^2+y^2))

Define a function that calculates the inner integral:

    F1 <- function(x) {
        fun <- function(y) f(x) * g(y) * h(x, y)
        integrate(fun, -Inf, Inf)$value
    }
    F1 <- Vectorize(F1)    # required when using integrate()

We have to check that integrate() is indeed capable of computing this
integrand over an infinite interval.

    F1(c(0:4))             # looks good
    ## [1] 0.00e+00 3.260247e-01 3.246362e-02 3.281077e-04 3.989274e-07

Now integrate this function over the second (infinite) interval.

    integrate(F1, 0, Inf)
    ## 0.4431135 with absolute error < 2.4e-06

Correct, as the integral is equal to sqrt(pi)/4 ~ 0.44311346...

If we extract f(x) from the inner integral, the value of the integral
and the computation times will be the same, but the overall handling
will be slightly more complicated.

> I tried using the function integrate 2 times, but it didn't work:
>
> z <- function(x,y) {
> }
> f <- function(x){
>     rr <- "put here the function in x" * integrate(function(y) z(x, y), -Inf, Inf)$value
>     return(rr)
> }
> rr2 <- integrate(function(x) f(x), 0, Inf)$value
> print(rr2)
>
> I didn't get any output at all!!!
>
> Thanks,
> Aya
Re: [R] Merging different columns in one matrix
Hi,
Maybe this helps:

dat1 <- read.table(text="a a b b c c
x y x y x y
12 34 256 25 5 32
5 45 23 452 21 45",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE)

mat1 <- matrix(0,5,5,dimnames=list(NULL,c("x",letters[1:4])))
mat1[,1] <- sort(unique(as.numeric(unlist(dat1[-1,which(dat1=="x",arr.ind=TRUE)[,2]]))))
dat1New <- dat1[-1,which(dat1=="x",arr.ind=TRUE)[,2]]
dat2New <- dat1[-1,which(dat1=="y",arr.ind=TRUE)[,2]]
mat1[,2:4] <- sapply(seq_len(ncol(dat1New)),function(i) {x1 <- dat2New[match(mat1[,1],dat1New[,i]),i]
                                                          x1[is.na(x1)] <- 0
                                                          as.numeric(x1)})
mat1
#       x  a   b  c d
#[1,]   5 45   0 32 0
#[2,]  12 34   0  0 0
#[3,]  21  0   0 45 0
#[4,]  23  0 452  0 0
#[5,] 256  0  25  0 0

A.K.

> Hello everyone, I have a dataframe made as follows:
>
>   a   a   b   b   c   c
>   x   y   x   y   x   y
>  12  34 256  25   5  32
>   5  45  23 452  21  45
> ... ... ... ... ... ...
>
> My intention is to create just one matrix made as follows
>
>   x   a   b   c   d
>   5  45   0  32   0
>  12  34   0   0   0
>  21   0   0  45   0
>  23   0 452   0   0
> 256 ... ... ... ...
>
> As you can see, I want in the first column all the values collected
> from all the x columns, ordered. In the other columns I want the
> y-values related to every letter (a-b-c...). For example, the first
> value in the x column is 5 (the smallest). It is present in the a
> x-values (first matrix), so in the second table I report its related
> y-value (45). However, 5 is not present in the b x-values, so I report
> a 0 in the second table. And so on.
>
> I don't know if it's a difficult task, but I had several problems with
> the double header handling and the data. I looked for some clues on
> the internet but documentation is very fragmented and lacking. (So,
> in addition, any recommendation for good R books?)
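Another way to get the same table, sketched here under the assumption that the (x, y) pairs have already been stacked into a long data frame with a group label, is to let xtabs() do the spreading. The long layout and the column name grp are my own; the numbers are the ones from the question.

```r
# The six (x, y) pairs from the question, one row each, labelled by column group
long <- data.frame(grp = rep(c("a", "b", "c"), each = 2),
                   x   = c(12, 5, 256, 23, 5, 21),
                   y   = c(34, 45, 25, 452, 32, 45))

# Rows: sorted unique x values; columns: groups; cells: y, with 0 where absent
wide <- xtabs(y ~ x + grp, data = long)
wide

# As an ordinary data frame with x as a proper column
out <- data.frame(x = as.numeric(rownames(wide)),
                  as.data.frame.matrix(wide),
                  row.names = NULL)
out
```

Two caveats: xtabs() sums y when the same x occurs twice within one group, and a genuine y of 0 cannot be told apart from a missing pair. An empty group such as d does not appear unless grp is made a factor with that level.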
[R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
I am having trouble understanding how to use sapply (or similar
functions) with a user-defined function with multiple parameters.

I have the following functions defined:

q1.ans <- function(x)
{
    retVal = 0
    if (x == 1) {
        retVal = 1
    } else if (x ==2) {
        retVal = 2
    }
    return (retVal)
}

q2.ans <- function(x)
{
    retVal = 0
    if (x == 1) {
        retVal = 1
    } else if (x ==2) {
        retVal = 3
    }
    return (retVal)
}

q3.ans <- function(x)
{
    retVal = 0
    if (x == 1) {
        retVal = 2
    } else if (x ==2) {
        retVal = 3
    }
    return (retVal)
}

evaluate.questions <- function(q.1,q.2,q.3)
{
    a <- q1.ans(q.1)
    b <- q2.ans(q.2)
    c <- q3.ans(q.3)
    retVal = 0 # Set default value to be no preference
    # The following code only implements those values from the state
    # machine that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
    if (a == 0) {
        if (b == 1) {
            if (c == 1) {
                retVal = 1 # State machine ID 5
            }
        } else if (b == 2) {
            if (c == 2) {
                retVal = 2 # State machine ID 9
            }
        }
    } else if (a == 1) {
        if (b == 0) {
            if (c == 1) {
                retVal = 1 # State machine ID 11
            }
        } else if (b == 1) {
            retVal = 1 # State machine ID's 13-15, value of C doesn't matter
        } else if (b == 2) {
            if (c == 1) {
                retVal = 1 # State machine ID 17
            } else if (c == 2) {
                retVal = 2 # State machine ID 18
            }
        }
    } else if (a == 2) {
        if (b == 0) {
            if (c == 2) {
                retVal = 2 # State machine ID 21
            }
        } else if (b == 1) {
            if (c == 1) {
                retVal = 1 # State machine ID 23
            } else if (c == 2) {
                retVal = 2 # State machine ID 24
            }
        } else if (b == 2) {
            retVal = 2 # State machine ID's 25-27, value of C doesn't matter
        }
    }
    return (retVal)
}

And a data set that looks like this:

ID,Q1,Q2,Q3
1,2,2,2
2,2,1,1
3,1,1,1
4,1,2,2
5,2,2,1
6,1,2,1
...

I have been researching and it appears that I should be using the
sapply function to apply the evaluate.questions function above to each
row in the data frame, like this:

preferences <- sapply(df, evaluate.questions, function(x,y,z) evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Unfortunately this doesn't work, and the problem appears to be that the
sapply function is not feeding the parameters to the evaluate.questions
function as I expect. Can someone provide some guidance on what I am
doing wrong?

This is the error message I am getting:

Error in x == 1 :
  comparison (1) is possible only for atomic and list types
In addition: Warning messages:
In if (x == 1) { :
  the condition has length > 1 and only the first element will be used
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
Hi

The warning is due to the fact that if() takes only a single scalar value, not an entire vector. Maybe you could explain more clearly what result you expect. I bet that there is a vectorised solution to your problem, but I am lost in your ifs and cannot follow what the output should be.

Please use dput(head(df)) when showing input data and clearly describe the intended result.

Regards
Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Walter Anderson
Sent: Friday, December 06, 2013 4:44 PM
To: r-help@r-project.org
Subject: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

[...]
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
Thank you for your response!

I am attempting to determine a preference from the answers to three binomial questions:

q.1) 1 or 2
q.2) 1 or 3
q.3) 2 or 3

However, the questions are coded with either a 1 or 2 (though no answer is also possible), and the first three functions (q#.ans) convert those values to the 1, 2, or 3 shown above and generate one of the following results for each row of the table: 0 (no preference), or 1, 2, 3, which indicates the preference indicated by the question.

The ifs implement the following state conditions:

# ID  A  B  C  Preference
#  1  0  0  0  None
#  2  0  0  1  None
#  3  0  0  2  None
#  4  0  1  0  None
#  5  0  1  1  Option 1
#  6  0  1  2  None
#  7  0  2  0  None
#  8  0  2  1  None
#  9  0  2  2  Option 2
# 10  1  0  0  None
# 11  1  0  1  Option 1
# 12  1  0  2  None
# 13  1  1  0  Option 1
# 14  1  1  1  Option 1
# 15  1  1  2  Option 1
# 16  1  2  0  None
# 17  1  2  1  Option 1
# 18  1  2  2  Option 2
# 19  2  0  0  None
# 20  2  0  1  None
# 21  2  0  2  Option 2
# 22  2  1  0  None
# 23  2  1  1  Option 1
# 24  2  1  2  Option 2
# 25  2  2  0  Option 2
# 26  2  2  1  Option 2
# 27  2  2  2  Option 2

The if statement only implements those values from the state machine that show a preference (IDs 5, 9, 11, 13-15, 17-18, 21, 23-27).

On 12/06/2013 09:59 AM, PIKAL Petr wrote:
> Hi
>
> The warning is due to the fact that if() takes only a single scalar value, not an entire vector. Maybe you could explain more clearly what result you expect. I bet that there is a vectorised solution to your problem, but I am lost in your ifs and cannot follow what the output should be.
>
> Please use dput(head(df)) when showing input data and clearly describe the intended result.
>
> Regards
> Petr
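The state table in the message above can also be stored as a lookup array rather than as nested ifs. What follows is only a sketch, assuming A, B and C each take the values 0, 1 or 2 exactly as in that table; the names pref, states and lookup.preference are invented for illustration and do not appear anywhere in the thread.

```r
# Preference lookup indexed by (A, B, C), each in 0:2; 0 stands for "None".
pref <- array(0, dim = c(3, 3, 3))

# Rows of the state table that show a preference: A, B, C, preference.
states <- rbind(
  c(0, 1, 1, 1),                                  # ID 5
  c(0, 2, 2, 2),                                  # ID 9
  c(1, 0, 1, 1),                                  # ID 11
  c(1, 1, 0, 1), c(1, 1, 1, 1), c(1, 1, 2, 1),    # IDs 13-15
  c(1, 2, 1, 1),                                  # ID 17
  c(1, 2, 2, 2),                                  # ID 18
  c(2, 0, 2, 2),                                  # ID 21
  c(2, 1, 1, 1),                                  # ID 23
  c(2, 1, 2, 2),                                  # ID 24
  c(2, 2, 0, 2), c(2, 2, 1, 2), c(2, 2, 2, 2)     # IDs 25-27
)
# A three-column matrix index addresses one array cell per row;
# the +1 shifts the 0:2 codes to R's 1-based positions.
pref[states[, 1:3] + 1] <- states[, 4]

# Vectorised lookup: a, b and c may be whole columns of equal length.
lookup.preference <- function(a, b, c) pref[cbind(a, b, c) + 1]

lookup.preference(c(0, 1, 2, 0), c(1, 2, 1, 0), c(1, 2, 2, 0))
```

Because the lookup is a single indexing operation, no if() ever sees a vector, which is the situation the warning in this thread complains about.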
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
Hi

So the first step is over. Anyway, is there any problem with using dput as I suggested? Instead of using your data I need to generate my own.

A <- sample(0:2, 10, replace=T)
B <- sample(0:2, 10, replace=T)
C <- sample(0:2, 10, replace=T)
df <- data.frame(A,B,C)

> -----Original Message-----
> From: Walter Anderson [mailto:wandrso...@gmail.com]
> Sent: Friday, December 06, 2013 5:11 PM
> To: PIKAL Petr; r-help@r-project.org
> Subject: Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
>
> Thank you for your response!
>
> I am attempting to determine a preference from the answers to three binomial questions; q.1) 1 or 2 q.2) 1 or 3 q.3) 2 or 3
>
> However, the questions are coded with either a 1 or 2 (though no answer is also possible) and the first three functions (q#.ans) convert those values to the 1,2,or 3 shown above

Instead of those tricky ifs (uff uff) you can use either of these:

df[df[,2]==2,2] <- 3
df$C <- as.numeric(as.character(factor(df$C, labels=c(0,2,3))))
df
   A B C
1  0 3 3
2  0 1 2
3  0 3 2
4  1 0 3
5  1 0 3
6  2 3 2
7  1 3 2
8  2 3 3
9  1 1 0
10 0 0 3

> and generate one of the following result for each row of the table; 0 - no preference, or 1,2,3 which indicates the preference indicated by the question
>
> The if's implement the following state conditions:
> [...]

And here I am lost again. Please, can you clearly state how you want to choose preferences based on the values in those three columns?

Regards
Petr
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
> I have been researching and it appears that I should be using the sapply function to apply the evaluate.question function above to each row in the data frame like this

Read the documentation more closely: sapply(dataFrame, func) applies func() to each column, not row, of dataFrame.

> preferences <- sapply(df, evaluate.questions, function(x,y,z) evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Furthermore, sapply(X = dataFrame, FUN = func, extraArgument) calls func(dataFrame[, i], extraArgument) for i in seq_len(ncol(dataFrame)). One problem is that FUN=evaluate.questions takes 3 arguments and you give it only 2. Another problem is that the third argument you pass to sapply is a function (of 3 arguments) and FUN is not expecting any of its arguments to be functions.

It may be easier for you to not use sapply here, but to use for-loops and come up with something that works. (Write tests that will indicate whether it works or not in a variety of situations.) Then transform it to use things like ifelse() and sapply() to make it more readable and run faster.

> Unfortunately this doesn't work and the problem appears that the sapply function is not feeding the parameters to the evaluate.questions function as I expect. Can someone provide some guidance on what I am doing wrong?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Walter Anderson
> Sent: Friday, December 06, 2013 7:44 AM
> To: r-help@r-project.org
> Subject: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
>
> [...]
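The column-versus-row point made in this part of the thread can be seen on a small data frame. The sketch below reuses the six sample rows from the original question; score3 is a made-up stand-in for any scalar function of three arguments and is not the poster's evaluate.questions.

```r
# Toy data in the shape shown in the question (values copied from it).
df <- data.frame(ID = 1:6,
                 Q1 = c(2, 2, 1, 1, 2, 1),
                 Q2 = c(2, 1, 1, 2, 2, 2),
                 Q3 = c(2, 1, 1, 2, 1, 1))

# sapply() over a data frame visits columns: FUN is called once per column,
# each time with the whole column as its single argument.
col.lengths <- sapply(df, length)
col.lengths

# Hypothetical scalar function of three arguments.
score3 <- function(q1, q2, q3) q1 + 10 * q2 + 100 * q3

# mapply() walks several vectors in parallel, so FUN receives the three
# values belonging to one row on each call.
row.scores <- mapply(score3, df$Q1, df$Q2, df$Q3)
row.scores
```

mapply() is used here because it passes one element from each vector per call, which is the row-wise behaviour the question was after; whether it is the best fit for the real data is a separate question.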
[R] Gene Ontology Profiling on Single Data Set with Different Species?
Hey everyone, I have a list of genes for which I would like to get Gene Ontology profiles (i.e. what are the most common GO terms). First I had a look at topGO, but since that compares two data sets, which I don’t have, it wasn’t right for this purpose. I then found goProfiles, which seems to do exactly what I wanted, but there is one problem: the genes I have don’t all come from the same organism, so there’s no organism annotation package. Do you know of any other R package that would do the trick if I give it my list of genes and their GO terms? Or do I have to create my own annotation package and then use goProfiles? Regards, Sarah - Sarah Pohl PhD student Helmholtz Centre for Infection Research eMail: sarah.p...@helmholtz-hzi.de Helmholtz-Zentrum für Infektionsforschung GmbH | Inhoffenstraße 7 | 38124 Braunschweig | www.helmholtz-hzi.de Das HZI ist seit 2007 zertifiziertes Mitglied im audit berufundfamilie Vorsitzende des Aufsichtsrates: MinDir’in Bärbel Brumme-Bothe, Bundesministerium für Bildung und Forschung Stellvertreter: MinDirig Rüdiger Eichel, Niedersächsisches Ministerium für Wissenschaft und Kultur Geschäftsführung: Prof. Dr. Dirk Heinz Gesellschaft mit beschränkter Haftung (GmbH) Sitz der Gesellschaft: Braunschweig Handelsregister: Amtsgericht Braunschweig, HRB 477 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mixed model ANCOVA
Hi,

I want to set up a mixed model ANCOVA but cannot find a way to do it. There is:

* 1 subject factor (random, between subjects) called Subject
* 3 categorical within-subjects factors called Emotion, Sex, Race
* 1 continuous covariate (**WITHIN subjects**) called Score
* a continuous dependent variable called logRT

I need a nice and clean table with p-values and effect sizes for each factor and the relevant interactions.

Which function should I use? I am guessing lmer from lme4, but I could not find any example on the forums or in my manual from Gaël Millot.

Here is a wild guess:

ModelRT <- lmer(logRT ~ Race + Sex + Emotion + Score + Race*Sex + Race*Emotion + Sex*Emotion + Race*Sex*Emotion + (1 | Subject))

Would that be correct?

Thank you,
laurie

[[alternative HTML version deleted]]
[R] Help - Trace of matrices
Dear,

I need to calculate the following equation:

tr(Sigma^-1 %*% D.Sigma)

I know only Sigma (positive definite) and D.Sigma (derivative of Sigma). A naive code is

sum(diag(solve(Sigma,D.Sigma)))

but these matrices are dense and of big dimension (1 x 1), and I need to evaluate this equation many times. What is the better way to evaluate this equation in R? Note that I need only the diagonal. I think it is possible to calculate only the diagonal, but how?

--
Wagner Hugo Bonat
LEG - Laboratório de Estatística e Geoinformação
UFPR - Universidade Federal do Paraná

[[alternative HTML version deleted]]
[R] Easy Uplift Tree Classify Error
Does anyone know if the error being generated when trying to predict test set data in the Easy Uplift Tree package is something fixable by the user, or is this a bug in the program making the package essentially non-operable?

This is from the package documentation and fails on the last step of applying the model to the test set:

install.packages("EasyUpliftTree")
library(EasyUpliftTree)
library(survival)
data(colon)

#APPEARS TO WORK
sample.data <- na.omit(colon[colon$rx != "Lev" & colon$etype == 2, ])
treat <- ifelse(sample.data$rx == "Lev+5FU", 1, 0)
y <- ifelse(sample.data$status == 0, 1, 0)
x <- sample.data[, c(4:9, 11:14)]
x$v1 <- factor(x$sex)
x$v2 <- factor(x$obstruct)
x$v3 <- factor(x$perfor)
x$v4 <- factor(x$adhere)
x$v5 <- factor(x$differ)
x$v6 <- factor(x$extent)
x$v7 <- factor(x$surg)
x$v8 <- factor(x$node4)
index <- 1:nrow(x)
train.index <- index[(index%%2 == 0)]
test.index <- index[index%%2 != 0]
y.train <- y[train.index]
x.train <- x[train.index, ]
treat.train <- treat[train.index]
y.test <- y[test.index]
x.test <- x[test.index, ]
treat.test <- treat[test.index]
uplift.tree <- buildUpliftTree(y.train, treat.train, x.train)
print(uplift.tree)

#FAILS
apply(1:nrow(x.test), function(i) classify(uplift.tree, x.test[i, ]))
#Error in match.fun(FUN) : argument FUN is missing, with no default
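The error text quoted above is what base R's apply() reports when it is called with only two positional arguments: the function ends up in the MARGIN slot and FUN is left empty. The sketch below reproduces that with a made-up data frame and a hypothetical classify.row function; it does not use EasyUpliftTree, so it says nothing about whether the package's classify() behaves correctly once it is actually called.

```r
# Small made-up test set; column names are illustrative only.
x.test <- data.frame(age = c(40, 65, 52), nodes = c(1, 5, 3))

# Hypothetical stand-in for a per-row classifier (not the package's classify()).
classify.row <- function(row) if (row$nodes > 2) 1 else 0

# apply(X, MARGIN, FUN): with two arguments the function lands in MARGIN
# and FUN is missing, which triggers the error from match.fun().
msg <- tryCatch(
  apply(1:nrow(x.test), function(i) classify.row(x.test[i, ])),
  error = function(e) conditionMessage(e)
)
msg

# sapply(X, FUN) takes exactly these two arguments and iterates over the indices.
pred <- sapply(seq_len(nrow(x.test)), function(i) classify.row(x.test[i, ]))
pred
```

If the documentation example was meant to loop over row indices, swapping apply for sapply in that last line would be the first thing to try.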
Re: [R] Help - Trace of matrices
A fast computation I use is based on the following:

A <- matrix(rnorm(16), ncol = 4)
B <- matrix(rnorm(16), ncol = 4)
C <- A %*% B
sum(diag(C))
### This is less expensive to compute when the matrix multiplication is expensive
sum(A * t(B))

So, it just uses the elementwise calculations and sums over all cells.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Wagner Bonat
Sent: Friday, December 06, 2013 12:02 PM
To: r-help@r-project.org
Subject: [R] Help - Trace of matrices

> Dear, I need to calculate the following equation tr(Sigma^-1 %*% D.Sigma). I know only Sigma (positive definite) and D.Sigma (derivative of Sigma), a naive code is sum(diag(solve(Sigma,D.Sigma))) but these matrices are dense and big dimension (1 x 1), and I need to evaluate this equation many times. What is the better way to evaluate this equation in R? Note that I need only the diagonal, I think is possible to calculate only the diagonal, but how?
>
> --
> Wagner Hugo Bonat
> LEG - Laboratório de Estatística e Geoinformação
> UFPR - Universidade Federal do Paraná
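The identity in the reply above, tr(A %*% B) = sum(A * t(B)), carries over to the original question if A is taken to be the inverse of Sigma. The sketch below checks this numerically on a small random example. Sigma and D.Sigma here are simulated stand-ins, and going through chol2inv(chol(Sigma)) is one assumed way to get the inverse of a positive definite matrix, not something proposed in the thread.

```r
set.seed(1)
n <- 5
# Simulated symmetric positive definite Sigma and a symmetric D.Sigma.
Sigma   <- crossprod(matrix(rnorm(n * n), n)) + diag(n)
D.Sigma <- crossprod(matrix(rnorm(n * n), n))

# The naive version from the question.
naive <- sum(diag(solve(Sigma, D.Sigma)))

# tr(A %*% B) = sum(A * t(B)) with A = Sigma^-1, obtained from the Cholesky factor.
Sigma.inv <- chol2inv(chol(Sigma))
fast <- sum(Sigma.inv * t(D.Sigma))

all.equal(naive, fast)
```

The inverse still has to be formed once, so any saving would come from reusing Sigma.inv across many D.Sigma matrices for the same Sigma; whether that matches the poster's setting is not stated in the thread.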
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
On 12/06/2013 10:43 AM, William Dunlap wrote:
>> I have been researching and it appears that I should be using the sapply function to apply the evaluate.question function above to each row in the data frame like this
> Read the documentation more closely: sapply(dataFrame, func) applies func() to each column, not row, of dataFrame.

I misunderstood. I thought it applied the func to each row... My mistake.

>> preferences <- sapply(df, evaluate.questions, function(x,y,z) evaluate.questions(df['Q1'],df['Q2'],df['Q3']))
> Furthermore, sapply(X = dataFrame, FUN = func, extraArgument) calls func(dataFrame[, i], extraArgument) for i in seq_len(ncol(dataFrame)). One problem is that FUN=evaluate.questions takes 3 arguments and you give it only 2. Another problem is that the third argument you pass to sapply is a function (of 3 arguments) and FUN is not expecting any of its arguments to be functions.

I will need to think about this; I am not sure I understand. I really don't seem to understand how any of the apply functions work.

> It may be easier for you to not use sapply here, but to use for-loops and come up with something that works. (Write tests that will indicate whether it works or not in a variety of situations.) Then transform it to use things like ifelse() and sapply() to make it more readable and run faster.

I have already tested my functions by using a for loop, and they work. Here is the for loop I use:

for (indx in 1:length(df$ID)) {
    df$Preference[indx] <- evaluate.questions(df$Q1[indx], df$Q2[indx], df$Q3[indx])
}

I understand that such for loops aren't 'best practice' in R and am trying to learn its approach.

Thank you for the suggestions!

>> Unfortunately this doesn't work and the problem appears that the sapply function is not feeding the parameters to the evaluate.questions function as I expect. Can someone provide some guidance on what I am doing wrong?
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
Re: [R] mixed model ANCOVA
laurie bayet lauriebayet at gmail.com writes:

> Hi, I want to set up a mixed model ANCOVA but cannot find a way to do it. There is:
> * 1 subject factor (random, between subjects) called Subject
> * 3 categorical within subjects factors called Emotion, Sex, Race
> * 1 continuous covariate (**WITHIN subjects**) called Score and
> * a continuous dependent variable called logRT
> I need a nice and clean table with p-values and effect sizes for each factors and relevant interactions. Which function should I use? I am guessing lmer from lme4 but could not find any example on the forums or on my manual from Gaël Millot. Here is a wild guess:
>
> ModelRT <- lmer(logRT ~ Race + Sex + Emotion + Score + Race*Sex + Race*Emotion + Sex*Emotion + Race*Sex*Emotion + (1 | Subject))
>
> Would that be correct? Thank you, laurie

* This might be better on r-sig-mixed-mod...@r-project.org

* In R '*' indicates main effects plus all interactions (':' is for an interaction only), so you can abbreviate your formula to

ModelRT <- lmer(logRT ~ Race*Sex*Emotion + (1 | Subject))

or using lme from the nlme package:

ModelRT <- lme(logRT~Race*Sex*Emotion, random=~1|Subject)

* You should strongly consider passing an explicit 'data' argument rather than picking up the variables from the workspace

* See ?pvalues in lme4 for some of your choices about getting tables of p-values and effect sizes (e.g. with auxiliary functions from the car, lmerTest, or pbkrtest packages).
Beware that lme will give you denominator and degrees of freedom, but the degrees of freedom may very likely be miscalculated for your within-subject continuous covariate * You should strongly consider whether you need to include among-subject variance in the within-subject factors in your model [see the two refs below] @article{barr_random_2013, title = {Random effects structure for confirmatory hypothesis testing: Keep it maximal}, volume = {68}, issn = {0749-{596X}}, shorttitle = {Random effects structure for confirmatory hypothesis testing}, url = {http://www.sciencedirect.com/science/article/pii/S0749596X12001180}, doi = {10.1016/j.jml.2012.11.001}, abstract = {Linear mixed-effects models ({LMEMs)} have become increasingly prominent in psycholinguistics and related areas. However, many researchers do not seem to appreciate how random effects structures affect the generalizability of an analysis. Here, we argue that researchers using {LMEMs} for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades. Through theoretical arguments and Monte Carlo simulation, we show that {LMEMs} generalize best when they include the maximal random effects structure justified by the design. The generalization performance of {LMEMs} including data-driven random effects structures strongly depends upon modeling criteria and sample size, yielding reasonable results on moderately-sized samples when conservative criteria are used, but with little or no power advantage over maximal models. Finally, random-intercepts-only {LMEMs} used on within-subjects and/or within-items data from populations where subjects and/or items vary in their sensitivity to experimental manipulations always generalize worse than separate F1 and F2 tests, and in many cases, even worse than F1 alone. 
Maximal {LMEMs} should be the ‘gold standard’ for confirmatory hypothesis testing in psycholinguistics and beyond.}, number = {3}, urldate = {2013-09-26}, journal = {Journal of Memory and Language}, author = {Barr, Dale J. and Levy, Roger and Scheepers, Christoph and Tily, Harry J.}, month = apr, year = {2013}, keywords = {Generalization, Linear mixed-effects models, Monte Carlo simulation, statistics}, pages = {255--278} } @article{schielzeth_conclusions_2009, title = {Conclusions beyond support: overconfident estimates in mixed models}, volume = {20}, issn = {1045-2249, 1465-7279}, shorttitle = {Conclusions beyond support}, url = {http://beheco.oxfordjournals.org/content/20/2/416}, doi = {10.1093/beheco/arn145}, abstract = {Mixed-effect models are frequently used to control for the nonindependence of data points, for example, when repeated measures from the same individuals are available. The aim of these models is often to estimate fixed effects and to test their significance. This is usually done by including random intercepts, that is, intercepts that are allowed to vary between individuals. The widespread belief is that this controls for all types of pseudoreplication within individuals. Here we show that this is not the case, if the aim is to estimate effects that vary within individuals and individuals differ in their response to these effects. In these cases, random intercept models give overconfident estimates leading to conclusions that are not supported by the data. By allowing individuals to differ in the slopes of their responses, it is possible to account for the nonindependence of data points that pseudoreplicate
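The formula shorthand described in the reply above ('*' for main effects plus all interactions, ':' for an interaction only) can be checked in base R without fitting any model. The sketch below compares only the factor part of the two formulas; the covariate Score from the original question is deliberately left out of both sides, and the object names full and long are illustrative.

```r
# terms() expands a formula symbolically; the variables need not exist.
full <- attr(terms(logRT ~ Race * Sex * Emotion), "term.labels")
full

# The spelled-out factor terms from the original question.
long <- attr(terms(logRT ~ Race + Sex + Emotion + Race*Sex + Race*Emotion +
                     Sex*Emotion + Race*Sex*Emotion), "term.labels")

# Both spellings produce the same set of terms.
setequal(full, long)

# ':' on its own yields only the interaction term.
attr(terms(logRT ~ Race:Sex), "term.labels")
```

This only confirms that the two spellings describe the same fixed-effect terms; it says nothing about the random-effects structure discussed in the references.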
Re: [R] Open multiple files using a loop
Hi Chris,

Maybe this helps.

#Suppose the working directory is `FirstLevel`
D <- dir(recursive=TRUE)
D
#[1] "S1/S1data.txt" "S2/S2data.txt" "S3/S3data.txt"
sapply(D, function(x) nrow(read.table(x, sep="", header=TRUE)))
#S1/S1data.txt S2/S2data.txt S3/S3data.txt
#           20            20            20
res <- do.call(rbind, lapply(D, function(x) read.table(x, sep="", header=TRUE)))
dim(res)
#[1] 60  2

A.K.

Dear R/Arun

I would like to open 50 different text files (S1data; S2data; S3data etc.) and rbind() them into a single data.frame or matrix. Is there a way of doing this with a loop or in some other time-saving manner?

`S1data` <- read.table("~/fmridata/FirstLevel/S1/S1data", header=T, quote="\"")
`S2data` <- read.table("~/fmridata/FirstLevel/S2/S2data", header=T, quote="\"")
`S3data` <- read.table("~/fmridata/FirstLevel/S3/S3data", header=T, quote="\"")
etc… to S50

alldata <- rbind(S1data, S2data, S3data etc… to 50)

This type of idea (assuming each file has 10 rows (x50=500) and 25 columns):

subjects <- c(S1,S2,S3 etc… to S50)
alldata <- matrix(nrow = 500, ncol=25, byrow=TRUE)
for(i in 1:50) {
`subject[i]data` <- read.table(~/fmridata/FirstLevel/(subject[i])/subject[i]data, header=T, quote=\)
alldata[i,] <- subject[i]data
}

Thanks,
Chris
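The read-and-stack approach from the reply above can be tried end to end without the original data. The sketch below builds three small files under tempdir() in an S1/S1data.txt style layout (the directory name, column names and values are all made up), then reads them back with list.files(), lapply() and do.call(rbind, ...).

```r
# Build a throwaway tree: FirstLevel/S1/S1data.txt, S2/S2data.txt, S3/S3data.txt.
root <- file.path(tempdir(), "FirstLevel")
for (s in paste0("S", 1:3)) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(subject = s, value = 1:4),
              file.path(root, s, paste0(s, "data.txt")),
              row.names = FALSE, quote = FALSE)
}

# Collect the file paths, read each file, and stack the pieces row-wise.
files <- list.files(root, pattern = "data\\.txt$", recursive = TRUE,
                    full.names = TRUE)
alldata <- do.call(rbind, lapply(files, read.table, header = TRUE))
dim(alldata)
```

Using full.names = TRUE means the code does not depend on the current working directory, which is the one assumption the reply's dir(recursive=TRUE) version makes.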
Re: [R] help with the nested anova formulas
Robert Lynch robert.b.lynch at gmail.com writes:

I am modeling grade as a function of membership in various cohorts. There are four cohorts (NONE, ISE07, ISE08, ISE09) and two types of cohorts, coded as ISE = TRUE (ISE0#) or FALSE (NONE). There is clear collinearity, but that is to be expected. Running the following code

    CutOff <- 0
    fit.base <- lme(fixed = zGrade ~ Rep + COHORT/ISE + P7APrior + Female + White + HSGPA + MATH + AP_TOTAL + Years + EOP + Course,
                    random = ~1|SID,
                    data = share[share$GRADE >= CutOff,])

I get the following error

    Error in MEEM(object, conLin, control$niterEM) :
      Singularity in backsolve at level 0, block 1

but if I take out the /ISE I get no error, and similarly if I take out the COHORT/. I want to test for the effects of the different cohorts within the ISE subset and across ISE vs. NONE. I can send the data (the whole is too large) if you wish.

Please send this to r-sig-mixed-mod...@r-project.org for more discussion. The short answer is that lme can't fit models with rank-deficient fixed effect model matrices -- in other words, there are redundant parameters in your model because COHORT and ISE between them use 6 parameters to model 4 independent quantities.

http://stats.stackexchange.com/questions/35071/what-is-rank-deficiency-and-how-to-deal-with-it
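The rank deficiency described in the reply can be checked directly on the fixed-effects model matrix before calling lme(). The sketch below uses made-up data in which ISE is fully determined by COHORT, as in the question; the column names mirror the post but the values are hypothetical.

```r
# Hypothetical data: four cohorts, with ISE fully determined by COHORT
d <- data.frame(COHORT = factor(rep(c("NONE", "ISE07", "ISE08", "ISE09"), each = 5)))
d$ISE <- d$COHORT != "NONE"

# Model matrix for the nested specification used in the question
X <- model.matrix(~ COHORT / ISE, data = d)
ncol(X)        # number of fixed-effect columns requested
qr(X)$rank     # number of columns that are linearly independent

# With COHORT alone the matrix has full column rank
X2 <- model.matrix(~ COHORT, data = d)
qr(X2)$rank == ncol(X2)
```

Whenever the rank is smaller than the number of columns, some coefficients cannot be estimated separately. One possible way forward (an assumption, not something the thread prescribes) is to fit COHORT alone and express the ISE-versus-NONE comparison as a contrast among the cohort levels.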
Re: [R] p value for mu: anova()
Rosario Garcia Gil M.Rosario.Garcia at slu.se writes:

Hello, I have run an anova analysis for the following model: H_obs = mu + REGION + MANAGEMENT + e. When I run it in ASReml I get the p-value for mu, and, of course, also for the two dependent variables (REGION and MANAGEMENT). When I run it in R, I do not get the p-value for mu. Can someone help me to understand why? And is it possible to estimate the p-value for mu in anova() in R?

You may be wondering why no-one has answered your question ... (1) it's way too vague and (2) the attached file probably got stripped by the mailing list software before anyone saw it. (Even if #2 weren't true, people are unlikely to take the time to answer a really vague question if it means digging into a data file to figure out what's going on.) See for example http://tinyurl.com/reproducible-000

What *exact* code are you running? "An anova analysis" is too vague.

Ben Bolker
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
I understand that such for loops aren't 'best practice' in R and am trying to learn its approach.

sapply() is an encapsulated loop and loops have their place in R. 'Best practice' is a nebulous term, but explicit loops can make code that is hard to understand (by a compiler or by a human) and any loop at the R-code level will generally make code run more slowly. However, depending on your background, explicit loops may be easier for you to write and understand, so you may get an answer faster by using loops. Then transform it to use things like ifelse() and sapply() to make it more readable and run faster.

Changing your 'if' statements to calls to the vectorized 'ifelse' will probably make looping unneeded. E.g., your q1.ans() only works on a scalar, forcing you to use sapply (or the superior vapply) to work on vectors:

    q1.ans <- function(x) {
        retVal = 0
        if (x == 1) {
            retVal = 1
        } else if (x == 2) {
            retVal = 2
        }
        return (retVal)
    }

as in

    > q1.ans(1:3)
    [1] 1
    Warning message:
    In if (x == 1) { :
      the condition has length > 1 and only the first element will be used
    > sapply(1:3, q1.ans)
    [1] 1 2 0

You can change it to work on a vector by using ifelse:

    q1a.ans <- function(x) {
        ifelse(x==1, 1,        # return 1's where x had 1's
            ifelse(x==2, 2,    # return 2's where x had 2's
                0))            # return 0 where x had something else
    }

used as

    > q1a.ans(1:3)
    [1] 1 2 0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Walter Anderson [mailto:wandrso...@gmail.com]
Sent: Friday, December 06, 2013 9:58 AM
To: William Dunlap; r-help@r-project.org
Subject: Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

On 12/06/2013 10:43 AM, William Dunlap wrote:

I have been researching and it appears that I should be using the sapply function to apply the evaluate.question function above to each row in the data frame like this

Read the documentation more closely: sapply(dataFrame, func) applies func() to each column, not row, of dataFrame.
I misunderstood. I thought it applied the func to each row... My mistake.

    preferences <- sapply(df, evaluate.questions, function(x,y,z) evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Furthermore, sapply(X = dataFrame, FUN = func, extraArgument) calls func(dataFrame[, i], extraArgument) for i in seq_len(ncol(dataFrame)). One problem is that FUN=evaluate.questions takes 3 arguments and you give it only 2. Another problem is that the third argument you pass to sapply is a function (of 3 arguments) and FUN is not expecting any of its arguments to be functions.

I will need to think about this, I am not sure I understand. I really don't seem to understand how any of the apply functions work.

It may be easier for you to not use sapply here, but to use for-loops and come up with something that works. (Write tests that will indicate whether it works or not in a variety of situations.) Then transform it to use things like ifelse() and sapply() to make it more readable and run faster.

I already have tested my functions by using a for loop, and they work. Here is the for loop I use.

    for (indx in 1:length(df$ID)) {
        df$Preference <- evaluate.questions(df$Q1[indx],df$Q2[indx],df$Q3[indx])
    }

I understand that such for loops aren't 'best practice' in R and am trying to learn its approach. Thank you for the suggestions!

Unfortunately this doesn't work and the problem appears to be that the sapply function is not feeding the parameters to the evaluate.questions function as I expect. Can someone provide some guidance on what I am doing wrong?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
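For a function of several scalar arguments, mapply() is the apply-family member that walks several vectors in parallel, which is the row-wise behaviour the poster was after. The sketch below is illustrative only: evaluate.questions() is a hypothetical stand-in with an invented rule, since the poster's real function is not shown in the thread.

```r
# Hypothetical stand-in for the poster's three-argument scoring function
evaluate.questions <- function(q1, q2, q3) {
  if (q1 == 1 && q2 == 1) "A" else if (q3 == 2) "B" else "none"
}

df <- data.frame(ID = 1:4,
                 Q1 = c(1, 1, 2, 2),
                 Q2 = c(1, 2, 1, 2),
                 Q3 = c(2, 2, 2, 1))

# mapply() takes the i-th element of each column and calls the function once per row
df$Preference <- mapply(evaluate.questions, df$Q1, df$Q2, df$Q3)

# The same rule written with vectorised ifelse(), with no explicit or hidden loop over rows
df$Preference2 <- ifelse(df$Q1 == 1 & df$Q2 == 1, "A",
                  ifelse(df$Q3 == 2, "B", "none"))
df
```

Note that the vectorised version uses `&` rather than `&&`, because ifelse() needs an element-wise condition.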
Re: [R] model selection with step()
Karen, Look at the help for the drop1() function. ?drop1 There you will see, The hierarchy is respected when considering terms to be added or dropped: all main effects contained in a second-order interaction must remain, and so on. So, for fit2, the step() function will only consider dropping a main effect (e.g., X3) if there are no interactions involving that effect in the model. That's why only after X1:X3 and X2:X3 are dropped, do you see X3 being considered for dropping in your example. Jean On Thu, Dec 5, 2013 at 11:27 PM, Karen Keating keati...@ksu.edu wrote: I am using the step() function to select a model using backward elimination, with AIC as the selection criterion. The full regression model contains three predictors, plus all the second order terms and two-way interactions. The full model is fit via lm() using two different model formulae. One formula uses explicitly defined variables for the second-order and interaction terms and the other formula uses the I(x^2) and colon operators. The fit generated by lm() is exactly the same for both models, but when I pass these fitted models to the step() function, I get two different results. Apparently, step() does not recognize the three main predictors unless the second order and interaction terms are explicitly defined as separate variables. I assigned this problem to my first-year graduate students, not realizing that R would give two different answers. Now I have to re-grade their homework, but I would really like to give them a reasonable explanation for the discrepancy. The complete code is given below. Could anyone shed some light on this mystery? 
Thanks in advance,
Karen Keating
Kansas State University

    # Exercise 9.13, Kutner, Nachtsheim, Neter & Li
    temp <- scan()
    49.0 45.0 36.0 45.0
    55.0 30.0 28.0 40.0
    85.0 11.0 16.0 42.0
    32.0 30.0 46.0 40.0
    26.0 39.0 76.0 43.0
    28.0 42.0 78.0 27.0
    95.0 17.0 24.0 36.0
    26.0 63.0 80.0 42.0
    74.0 25.0 12.0 52.0
    37.0 32.0 27.0 35.0
    31.0 37.0 37.0 55.0
    49.0 29.0 34.0 47.0
    38.0 26.0 32.0 28.0
    41.0 38.0 45.0 30.0
    12.0 38.0 99.0 26.0
    44.0 25.0 38.0 47.0
    29.0 27.0 51.0 44.0
    40.0 37.0 32.0 54.0
    31.0 34.0 40.0 36.0

    dat <- matrix(temp, ncol=4, nrow=length(temp)/4, byrow=T)
    colnames(dat) <- c('Y','X1','X2','X3')
    dat <- data.frame(dat)
    attach(dat)

    # second order terms and interactions
    X12 <- X1*X2
    X13 <- X1*X3
    X23 <- X2*X3
    X1sq <- X1^2
    X2sq <- X2^2
    X3sq <- X3^2

    fit1 <- lm(Y ~ X1sq + X2sq + X3sq + X1 + X2 + X3 + X12 + X13 + X23)
    fit2 <- lm(Y ~ I(X1^2) + I(X2^2) + I(X3^2) + X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3)

    sum( abs(fit1$res - fit2$res) )  # 0, so fitted models are the same
    dim(model.matrix(fit1))  # 19 x 10
    dim(model.matrix(fit2))  # 19 x 10
    dim(fit1$model)  # 19 x 10
    dim(fit2$model)  # 19 x 7 -- could this cause the discrepancy?

    back1 <- step(fit1, direction='backward')
    back2 <- step(fit2, direction='backward')
    # Note that 'back1' considers the three primary predictors X1, X2 and X3,
    # while 'back2' does not.
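The marginality rule quoted from ?drop1 can be seen on a much smaller model. The following is a sketch on simulated data (the variable names and coefficients are invented); it fits the same model twice, once with the interaction written using `:` and once with the product stored as an ordinary column, and lists which terms drop1() is willing to consider.

```r
set.seed(1)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + d$x1 + 0.5 * d$x2 + rnorm(30)
d$x12 <- d$x1 * d$x2          # product stored as an ordinary variable

fit_colon   <- lm(y ~ x1 + x2 + x1:x2, data = d)  # interaction structure visible to R
fit_product <- lm(y ~ x1 + x2 + x12,   data = d)  # same fit, structure hidden

rownames(drop1(fit_colon))    # only the interaction is a candidate for removal
rownames(drop1(fit_product))  # all three terms are candidates
```

Because step() relies on the same scope rules, the two formulations can follow different elimination paths even though the fitted values are identical.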
[R] quantiles with approximately the same number of data points within each quantile?
What is a good way to create quantiles with approximately the same number of data points within each quantile?
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
Thanks again! Can the ifelse statement be nested like

    ifelse(condition1,
           ifelse(condition2, yes, no),
           ifelse(condition3, yes, no))

?
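Nesting of this kind does work, since each branch of ifelse() can itself be a vector produced by another ifelse(). A small sketch with invented thresholds and labels:

```r
x <- c(1, 5, 12, 20)

# Outer test splits at 10; each branch is refined by its own inner ifelse()
size <- ifelse(x < 10,
               ifelse(x < 3,  "tiny",   "small"),
               ifelse(x < 15, "medium", "large"))
size
```

Both inner calls are evaluated for the whole vector and the outer call then picks, element by element, from the appropriate one. For more than a few levels, cut() or a lookup table is usually easier to read.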
Re: [R] quantiles with approximately the same number of data points within each quantile?
By default I believe. See http://en.wikipedia.org/wiki/Quantile

Others more erudite may correct me.

On Dec 6, 2013, at 11:47 AM, Anika Masters anika.mast...@gmail.com wrote:

What is a good way to create quantiles with approximately the same number of data points within each quantile?

Don McKenzie
Research Ecologist
Pacific Wildland Fire Science Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington
d...@uw.edu
Re: [R] quantiles with approximately the same number of data points within each quantile?
Hello,

Use function ?quantile. See this example; each group has exactly, not approximately, 25 elements.

    x <- rnorm(100)
    qnt <- quantile(x)
    tapply(x, findInterval(x, qnt, rightmost.closed = TRUE), length)

Hope this helps,

Rui Barradas

On 06-12-2013 19:47, Anika Masters wrote:

What is a good way to create quantiles with approximately the same number of data points within each quantile?
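An equivalent way to get labelled groups is to feed the quantile break points to cut(). This is a sketch on simulated data; the labels Q1 to Q4 are arbitrary, and with heavily tied data the groups would not come out equal in size (duplicated break points would even make cut() fail).

```r
set.seed(42)
x <- rnorm(100)

# Quartile break points, then one labelled bin per observation
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = paste0("Q", 1:4))
table(bin)
```

Changing `probs` to `seq(0, 1, by = 0.1)` and supplying ten labels gives deciles in the same way.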
Re: [R] How to generate a smoothed surface for a three dimensional dataset?
On Dec 5, 2013, at 9:46 PM, A Xi Ma wrote:

The following question is inspired by Jun's problem, which resembles some of my own problems, but goes off on a tangent about applying plot3D from Karline Soetaert.

On Thu, Dec 5, 2013 at 11:52 PM, Bert Gunter gunter.ber...@gene.com wrote:

Your comment that: "I can see the critical point here is to find a right function to make the prediction." is what indicates to me that your critical point is that you have insufficient knowledge and need help. Feel free to disagree, of course.

I don't know if it's true for Jun, but it's definitely true for me - I have insufficient knowledge! I'm out of my depth with surface estimation, but I have to learn how to do it, one way or the other. Currently I'm reading the docs for plot3D. I loaded the package into RStudio and ran some of the examples. The image2D example seems to get its data from a data.frame called volcano with a small v.

Right. The 'volcano' object is a standard data object for demonstration of R graphics. It resides in the datasets package and has a help file: help(volcano)

    imag2D> nr <- nrow(volcano)
    imag2D> nc <- ncol(volcano)
    imag2D> image2D(volcano, x = 1:nr, y = 1:nc, lighting = TRUE,
    imag2D+         main = "volcano", clab = "height, m")

The objects() command shows a Volcano with a big V. The small-v and big-V volcanoes are not the same, because the str command shows:

[snipped superfluous output from an objects() command]

    > str(Volcano)
     num [1:29, 1:21] 100 103 105 108 110 116 120 122 123 118 ...
    > str(volcano)
     num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...

They are both matrices. The Volcano matrix has only one-ninth the number of values. The first small section of the volcano vignette reads:

    1. Intro
    To make this vignette smaller, the size of volcano is reduced:
    # Reduce the resolution
    Volcano <- volcano[seq(1, nrow(volcano), by = 3), seq(1, ncol(volcano), by = 3)]

So that code just selects every third row and every third column of the 'volcano' matrix.
I don't understand how the volcano object works well enough to power the image2D command, but doesn't show up in objects().

It is accessible by functions although it is not visible in the workspace.

    > str(volcano)
     num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
    > 'volcano' %in% ls()
    [1] FALSE

If you want to get it into the workspace, you just use the data() function:

    > data('volcano')
    > 'volcano' %in% ls()
    [1] TRUE    # now visible

At first I thought there was some kind of secret smuggling compartment in memory space, and nr and nc and volcano were all hidden in that secret place. But in fact, nr and nc show up in objects(). So ... I am even less educated than the other newbies on the list, and I'm following along, and I really don't see how R is doing what it's doing. Should I be reading the plot3D .pdf textbooks, or should I give up and go back to some much more basic textbook?

I'm thinking you are not yet ready for plot3D. It's unclear what level of effort you have put into reading and mastering the Introduction to R or whatever text you are using to educate yourself. I certainly do not think that a beginning tutorial in R was the goal that the authors of the plot3D package had in mind. Even before posting to R-help you are expected to have studied the available documentation and learned enough R to be able to answer all the questions you posed. So I suggest studying your copy of Introduction to R that is shipped with every binary of R.

Thanks.

And you should learn to post in plain text. Please do read the Posting Guide.

David Winsemius
Alameda, CA, USA
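The point about 'volcano' being reachable without being listed in the workspace can be checked with a few base-R calls. This is a small sketch that needs only the datasets package, which is attached by default; it also repeats the vignette's subsampling step.

```r
# `volcano` ships with the datasets package, which is on the search path by default
exists("volcano")          # TRUE, even when ls() does not list it
find("volcano")            # reports which attached package or environment holds it
dim(volcano)

# Keep every third row and every third column, as the vignette does
Volcano <- volcano[seq(1, nrow(volcano), by = 3),
                   seq(1, ncol(volcano), by = 3)]
dim(Volcano)
```

R looks names up along the search path (see search()), so an object in an attached package is found even though ls(), which lists only the current workspace, does not show it.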
[R] [R-pkgs] rms 4.1-0
The rms package has had several updates in version 4.1-0:

* Fixed orm.fit to not create penalty matrix if not needed (penalties are not yet implemented anyway)
* Added yscale argument to plot.Predict
* Added Wald test simulation to orm help file
* Added example in help file for plot.anova.rms of adding a line combining the effects of two predictors in dot chart
* Fixed grid interpretation error in survplot.survfit
* Changed plot.anova.rms to use dotchart3 instead of dotchart2
* Fixed bug in summary.rms - was taking reciprocal of effect ratio with orm even if not loglog family (thanks: Yong Hao Pua puayong...@gmail.com)
* Removed link to print.lm, summary.lm in ols.Rd
* Added ntrans argument to plot.anova.rms
* Fixed handling of intercepts in Rq, validate.Rq
* Removed residuals.Glm, residuals.rms (also from Rd, NAMESPACE)
* Removed other .rms methods and other remnants from fooling S+ dispatcher
* Fixed bug in lm.pfit when penalty used (thanks: Yong Hao Pua puayong...@gmail.com)
* Fixed bug in calibrate.default for ols (thanks: Andy Bush)
* Change print.contrast.rms to insert NA for SE if fun is not the identity function
* Added margin argument to plot.anova.rms to print selected stats in right margin of dot chart
* Added anova argument to plot.Predict to allow overall association test statistics to be added to panels
* Fixed bug in val.prob in which the logistic model was re-fitted instead of fixing coefficients at 0,1. This resulted in model statistics (including c-index) to always be favorable even when predictions were worse than chance. Thanks: Kirsten Van Hoorde kirsten.vanhoo...@esat.kuleuven.be
* Fixed bug in survdiffplot where conf.int was always overridden by value from survfit. Thanks: Kamil Fijorek kamilfijo...@gmail.com
* Fixed bug in grid= for survplot.* and survdiffplot. Thanks: Kamil Fijorek
* Fixed rms.s to account for possible offset in names(nmiss). Thanks: Larry Hunsicker
* Fixed psm.s to not compute Dxy if simple right censoring is not in effect. Thanks: I.M. Nolte
* rcs: respect system option fractied, passed to rcspline.eval; can be used to get old behavior
* Gls: as nlme 3.1-113 exports more functions, removed nlme:::

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] R survAUC Package
On Dec 4, 2013, at 7:49 AM, kevinod wrote: I have a concern about the survAUC package option AUC.cd. So shouldn't you be sending this to the package authors? They may or may not be regular readers of R help. It's apackage I have never heard of. I am exploring package functionality, specifically AUC statistics for Cox Regression, for a small academic project When utilizing this package on the ovarian data set within that package I obtain an AUC statistic of 0.3322928. When AUC calculations use a dichotomous outcome such as this, see included R code, the result should lie between 0.5 and 1, not 0.33. Please explain this, I am not certain that the algorithm that is being utilized for this package is correct. Thank you Kevin O’Donnell, MS Work Environment, MS Env Eng., MS Const Project Mgmt Graduate Student Department of Biostatistics Boston University School of Public Health 715 Albany Street Boston, MA 617-480-1677 x11(h=8,w=11) fit = survfit(Surv(futime,fustat) ~ rx) plot(fit, mark.time=FALSE, xscale=365.25,main=Plot of Survival Curves by Prescription Status, xlab='Length of Survival', ylab='Proportion of Individuals who have Survived') lines(fit[1], lwd=3,lty=2:3, xscale=365.24,col=2) lines(fit[2], lwd=2,lty=2:2, xscale=365.24,col=3) legend(.2,.2, c(No treatment, treatment), lwd=3,lty = 2:3) TR2 = ovarian[1:16,] TE2 = ovarian[17:26,] train.fit2 = coxph(Surv(futime, fustat) ~ rx, x=TRUE, y=TRUE, method=efron, data=TR) lp2 = predict(train.fit) lpnew2 = predict(train.fit2, newdata=TE2) Surv.rsp2 = Surv(TR2$futime, TR2$fustat) Surv.rsp.new2 = Surv(TE2$futime, TE2$fustat) times2 = seq(10, 1000, 10) AUC_CD2 = AUC.cd(Surv.rsp2, Surv.rsp.new2, lp2, lpnew2, times2) AUC_hc2 = AUC.hc(Surv.rsp2, Surv.rsp.new2, lpnew2, times2) AUC_sh2 = AUC.sh(Surv.rsp2, Surv.rsp.new2, lp2, lpnew2, times2) AUC_Uno2 = AUC.uno(Surv.rsp2, Surv.rsp.new2, lpnew2, times2) -- View this message in context: http://r.789695.n4.nabble.com/R-survAUC-Package-tp4681638.html Sent from the R help mailing list 
archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using assign with mapply
I have a data frame whose first column contains the names of the variables and whose second column contains the values to assign to them:

    : kkk <- data.frame(vars=c("var1", "var2", "var3"), vals=c(10, 20, 30), stringsAsFactors=F)

If I do

    : assign(kkk$vars[1], kkk$vals[1])

it works

    : var1
    [1] 10

However, if I try with mapply this is what I get:

    : mapply(assign, kkk$vars, kkk$vals)
    var1 var2 var3
      10   20   30
    : var2
    Error: object 'var2' not found

Maybe I have not understood how mapply and assign work. Do you have any comments?

Thanks,

-Sergio.
Re: [R] Wrong date fromat?
Hi,

Try

    vec1 <- 10958:10963
    as.Date(vec1, origin="1960-01-01")
    #[1] "1990-01-01" "1990-01-02" "1990-01-03" "1990-01-04" "1990-01-05"
    #[6] "1990-01-06"

A.K.

I have imported Stata data into R and wanted to convert the date. The format went OK, but the output doesn't represent my data. The head of the imported data is this one

    head(df$date)
    [1] 10958 10959 10960 10961 10962 10963

I tried to convert the date using the zoo package:

    library(zoo)
    df$date <- as.Date(df$date)
    head(df$date)
    [1] "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" "2000-01-07"

However my date starts with January 1, 1990 and the converted data starts from January 2, 2000. What have I done wrong?
Re: [R] Using assign with mapply
On Dec 6, 2013, at 11:27 AM, Julio Sergio Santana wrote:

I have a data frame whose first column contains the names of the variables and whose second column contains the values to assign to them:

    : kkk <- data.frame(vars=c("var1", "var2", "var3"), vals=c(10, 20, 30), stringsAsFactors=F)

If I do

    : assign(kkk$vars[1], kkk$vals[1])

it works

    : var1
    [1] 10

However, if I try with mapply this is what I get:

    : mapply(assign, kkk$vars, kkk$vals)
    var1 var2 var3
      10   20   30
    : var2
    Error: object 'var2' not found

Maybe I have not understood how mapply and assign work. Do you have any comments?

I think you will find that the value returned from the mapply call was a three element list with the desired names and values ... except you then gave that enclosing list no name and it will be garbage-collected. If you want to have 'assign' do its magic into the global environment, then you need to supply 'mapply' a MoreArgs argument on the other side of the ellipsis:

Usage:

    mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

So what happens if you try this:

    mapply(assign, kkk$vars, kkk$vals, MoreArgs = list(envir = .GlobalEnv))

--
David Winsemius
Alameda, CA, USA
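The MoreArgs suggestion can be tried without touching the global workspace by pointing assign() at a scratch environment. The sketch below reuses the data frame from the question; the environment names `e` and `e2` are introduced here purely for illustration, and the second half shows list2env() as an alternative that avoids assign() altogether.

```r
kkk <- data.frame(vars = c("var1", "var2", "var3"),
                  vals = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# A dedicated environment keeps the demo from touching the global workspace;
# pass envir = .GlobalEnv instead to get the behaviour asked for in the thread
e <- new.env()
invisible(mapply(assign, kkk$vars, kkk$vals, MoreArgs = list(envir = e)))
get("var2", envir = e)

# Alternative without assign(): turn the two columns into a named list first
e2 <- list2env(setNames(as.list(kkk$vals), kkk$vars), envir = new.env())
sort(ls(e2))
```

Without an explicit `envir`, assign() writes into whatever frame it happens to be called from, which would explain why the variables never appeared at top level in the original attempt. Keeping the values in a named list or environment is often easier to work with than creating many separate variables.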
Re: [R] Wrong date fromat?
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun
Sent: Friday, December 06, 2013 3:11 PM
To: R help
Subject: Re: [R] Wrong date fromat?

Hi,

Try

    vec1 <- 10958:10963
    as.Date(vec1, origin="1960-01-01")
    #[1] "1990-01-01" "1990-01-02" "1990-01-03" "1990-01-04" "1990-01-05"
    #[6] "1990-01-06"

A.K.

I have imported Stata data into R and wanted to convert the date. The format went OK, but the output doesn't represent my data. The head of the imported data is this one

    head(df$date)
    [1] 10958 10959 10960 10961 10962 10963

I tried to convert the date using the zoo package:

    library(zoo)
    df$date <- as.Date(df$date)
    head(df$date)
    [1] "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" "2000-01-07"

However my date starts with January 1, 1990 and the converted data starts from January 2, 2000. What have I done wrong?

You need to specify an appropriate value for the origin parameter. It looks like as.Date in the zoo package (which masks the as.Date in base) defaults to the Unix epoch value, origin='1970-01-01'. Your Stata values are based on origin='1960-01-01', as the first example specified.

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
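The effect of the origin can be seen by converting the same day counts against both epochs. A minimal sketch using the values from the thread and base R only:

```r
stata_days <- 10958:10963

# Stata daily dates count days from 1960-01-01
as.Date(stata_days, origin = "1960-01-01")

# The same numbers read against the Unix epoch land roughly ten years later
as.Date(stata_days, origin = "1970-01-01")
```

Passing `origin` explicitly avoids depending on whichever default the as.Date method in use happens to apply.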
Re: [R] kmeans clustering on large but sparse matrix
Hi Lishu,

I ran into a similar large-scale problem recently. I used the parallel SGD k-means described in this paper: http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

Let n be the number of samples, k the number of clusters, and m the number of nodes.

1. First, each node reads n / m samples and randomly generates enough 'mini-batches' (the mini-batch size and the number of SGD iterations must be determined beforehand).
2. Sample k / m centers from the samples on each node.
3. Update the centers using the mini-batches generated in the first step. Note that at this stage it is not necessary to hold the sample data on each node.
4. Once the centers are optimized by SGD, compute the distance matrix between samples and centers. I used spherical k-means, so this step can be divided into a series of block matrix multiplications to save memory.

Note that each node only needs to hold part of the sample data and part of the centers, so this method works in a 'regular' MPI environment and does not need a shared-memory architecture. I used pbdMPI to parallelize the algorithm.

Hope this helps.

Wuming

On Wed, Jan 18, 2012 at 3:37 PM, Lishu Liu lishu...@gmail.com wrote:

Hi,

I have a 60k*600k matrix, which exceeds the vector length limit of 2^31-1. But it's rather sparse: only 0.02% of the entries are nonzero. So I saved it as a MatrixMarket (mm) file, which is about 300M in size. I use readMM in the Matrix package to read it in. The data type then becomes dgTMatrix from the 'Matrix' package instead of the common matrix type.

If I run k-means on only part of the data, so that the vector length does not exceed 2^31-1, there's no problem at all, meaning that kmeans in R can handle this type of matrix. If I run it on the entire matrix, R says "too many elements specified".

I have considered the 'bigmemory' and 'biganalytics' packages. But saving the sparse matrix as a common CSV file would take approx. 70G, with 99% of it being 0. I just don't think it's necessary or efficient to treat it as a dense matrix.

Is there any way to deal with the vector length limit? Can I split the whole matrix into smaller ones and then do k-means?

Thanks,
Lishu
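The center update in step 3 of Wuming's reply can be sketched in base R as below. This is only a single-node illustration of the mini-batch update from the linked paper, using plain Euclidean distance on a small dense matrix. It is not the parallel pbdMPI or spherical k-means code described in the reply. The function name `minibatch_kmeans` and its arguments are hypothetical.

```r
# Mini-batch k-means sketch (single node, dense input, Euclidean distance).
# x: numeric matrix with one sample per row; k: number of clusters.
minibatch_kmeans <- function(x, k, batch_size = 50, iters = 100) {
  n <- nrow(x)
  centers <- x[sample.int(n, k), , drop = FALSE]  # random initial centers
  counts <- integer(k)                            # per-center update counts
  for (i in seq_len(iters)) {
    batch <- x[sample.int(n, batch_size, replace = TRUE), , drop = FALSE]
    # Squared distances via |x|^2 + |c|^2 - 2 x.c, one row per batch sample.
    d2 <- outer(rowSums(batch^2), rowSums(centers^2), "+") -
      2 * batch %*% t(centers)
    nearest <- max.col(-d2, ties.method = "first")  # closest center index
    for (j in seq_len(batch_size)) {
      id <- nearest[j]
      counts[id] <- counts[id] + 1L
      eta <- 1 / counts[id]                         # per-center learning rate
      centers[id, ] <- (1 - eta) * centers[id, ] + eta * batch[j, ]
    }
  }
  list(centers = centers, counts = counts)
}

# Toy usage on two well-separated groups of points.
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(200, mean = 5, sd = 0.1), ncol = 2))
fit <- minibatch_kmeans(x, k = 2)
print(fit$centers)
```

Because each iteration touches only `batch_size` rows, the full matrix never has to be converted or held at once. For the sparse case in this thread, the rows of a batch would presumably be pulled from the sparse object and densified one batch at a time, which is an assumption and not something the sketch does.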
Re: [R] tune an support vector machine
Hi Uwe,

It looks like the SVMs in e1071 and kernlab do not support feature selection, but you can take a look at the package penalizedSVM ( http://cran.r-project.org/web/packages/penalizedSVM/penalizedSVM.pdf). Or you can implement SVM-RFE ( http://axon.cs.byu.edu/Dan/778/papers/Feature%20Selection/guyon*.pdf) using the alpha values returned by svm() in e1071 or ksvm() in kernlab.

Wuming

On Fri, Dec 6, 2013 at 7:06 AM, Uwe Bohne balu...@gmx.de wrote:

Hej all,

I am trying to tune an SVM in R using the package e1071, which works pretty well. I do a grid search over the parameters and get the best possible parameters for classification. Here is my sample code:

type <- sample(c(-1, 1), 20, replace = TRUE)
weight <- sample(c(20:50), 20, replace = TRUE)
height <- sample(c(100:200), 20, replace = TRUE)
width <- sample(c(30:50), 20, replace = TRUE)
volume <- sample(c(1000:5000), 20, replace = TRUE)
data <- cbind(type, weight, height, width, volume)
train <- as.data.frame(data)
library(e1071)
features <- c("weight", "height", "width", "volume")
(formula <- as.formula(paste("type ~ ", paste(features, collapse = " + "))))
svmtune <- tune.svm(formula, data = train, kernel = "radial", cost = 2^(-2:5), gamma = 2^(-2:1), cross = 10)
summary(svmtune)

My question is whether there is a way to tune the features. In other words, I want to try all possible combinations of features for the SVM, for example only (volume), or (weight, height), or (height, volume, width), and so on, and get the best combination back.

Best wishes
Uwe
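One way to try every feature combination, as asked in the original post, is to enumerate the subsets with combn() and build one formula per subset. The sketch below is an exhaustive search and not the SVM-RFE approach suggested in the reply. The names `subsets`, `formulas`, `perf`, and `best` are hypothetical. The response is coded as a factor here so that svm() does classification, which is an assumption about the intended task. The tuning loop runs only if e1071 is installed.

```r
features <- c("weight", "height", "width", "volume")

# All non-empty subsets of the features: 4 + 6 + 4 + 1 = 15 of them.
subsets <- unlist(
  lapply(seq_along(features),
         function(m) combn(features, m, simplify = FALSE)),
  recursive = FALSE
)

# One model formula per subset, e.g. type ~ weight + height.
formulas <- lapply(subsets, function(s) reformulate(s, response = "type"))

if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1)
  # Toy data in the spirit of the original post; balanced classes so that
  # every cross-validation fold sees both levels.
  train <- data.frame(
    type   = factor(rep(c(-1, 1), each = 10)),
    weight = sample(20:50, 20, replace = TRUE),
    height = sample(100:200, 20, replace = TRUE),
    width  = sample(30:50, 20, replace = TRUE),
    volume = sample(1000:5000, 20, replace = TRUE)
  )
  # Cross-validated error of the best (cost, gamma) pair for each subset.
  perf <- vapply(formulas, function(f) {
    e1071::tune.svm(f, data = train, kernel = "radial",
                    cost = 2^(0:2), gamma = 2^(-2:0))$best.performance
  }, numeric(1))
  best <- subsets[[which.min(perf)]]
  print(best)
}
```

The number of subsets doubles with every added feature, so this only scales to a handful of features. With as few as 20 rows, the subset with the lowest cross-validated error may also simply be the luckiest one.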