Re: [R] Error on easy way for JoSAE Package
Thank you very much. After using dput() and the easy way

result <- eblup.mse.f.wrap(domain.data = amigo, lme.obj = fit.lme)

I get the following error:

Error in `[.data.frame`(sample.data, , variabs) : undefined columns selected

What should I do?

-- View this message in context: http://r.789695.n4.nabble.com/Error-on-easy-way-for-JoSAE-Package-tp4625684p4630220.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
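The message "undefined columns selected" comes from `[.data.frame` when it is asked for column names that the data frame does not contain. In this case the data frame is the internal `sample.data` and the names are in `variabs`. Which names eblup.mse.f.wrap looks up is an assumption you should check in the JoSAE documentation; presumably they are the variables used in the lme fit and the domain data. The minimal base-R sketch below uses hypothetical column names. It reproduces the message and lists the names that are missing:

```r
# Hypothetical data frame standing in for the data passed to the function
df <- data.frame(area = 1:3, y = c(2.5, 3.1, 4.0))

# Hypothetical set of column names the function might try to select
needed <- c("area", "y", "x1")

# Names that are requested but absent from the data frame
missing_cols <- setdiff(needed, names(df))
print(missing_cols)

# Selecting a missing column reproduces the error message from the post
msg <- tryCatch(df[, needed], error = function(e) conditionMessage(e))
print(msg)
```

Running the same setdiff() check with the real column names of the data frame you pass as domain.data and the variables in your model should show which name the function cannot find.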
[R] code to iterate function apply to matrix
I have the code below and I want to repeat the loop 100 times:

x <- rnorm(60)
mat1 <- matrix(x, nrow = 15, ncol = 4)
trim <- numeric(ncol(mat1))
win <- numeric(ncol(mat1))
ssd <- numeric(ncol(mat1))
for (j in 1:ncol(mat1)) {
  n = length(mat1[, j])
  alpha = 0.1
  k = floor(alpha * n) + 1
  r = k - (alpha * n)
  i = k + 1
  m = n - k
  y1 <- sort(mat1[, j])
  y <- y1[i:m]
  x.low = (1 - r) * y1[k + 1] + r * y1[k]
  x.upp = (1 - r) * y1[n - k] + r * y1[n - k + 1]
  trim[j] = 1 / ((1 - 2 * alpha) * n) * (sum(y) + r * (y1[k] + y1[n - k + 1]))
  win[j] = 1 / n * (sum(y) + k * (x.low + x.upp))
  ssd[j] <- sum((y - win[j])**2) + k * ((y1[k + 1] - win[j])**2 + (y1[n - k] - win[j])**2)
}
trim.mean <- matrix(trim, nrow = 1)
win.mean <- matrix(win, nrow = 1)
sum.sq.dev <- matrix(ssd, nrow = 1)

-- View this message in context: http://r.789695.n4.nabble.com/code-to-iterate-function-apply-to-matrix-tp4630221.html Sent from the R help mailing list archive at Nabble.com.
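One way to repeat the loop 100 times is to wrap its body in a function of the matrix and call that function through replicate(). The sketch below assumes that each repetition should use a fresh 15 x 4 matrix of rnorm(60) draws. The function name trim_win_stats is hypothetical, and its body follows the formulas in the posted loop:

```r
# Trimmed mean, winsorized mean and sum of squared deviations per column,
# following the formulas in the posted loop (hypothetical wrapper name)
trim_win_stats <- function(mat1, alpha = 0.1) {
  p <- ncol(mat1)
  trim <- win <- ssd <- numeric(p)
  for (j in seq_len(p)) {
    n <- length(mat1[, j])
    k <- floor(alpha * n) + 1
    r <- k - alpha * n
    y1 <- sort(mat1[, j])
    y <- y1[(k + 1):(n - k)]
    x.low <- (1 - r) * y1[k + 1] + r * y1[k]
    x.upp <- (1 - r) * y1[n - k] + r * y1[n - k + 1]
    trim[j] <- (sum(y) + r * (y1[k] + y1[n - k + 1])) / ((1 - 2 * alpha) * n)
    win[j] <- (sum(y) + k * (x.low + x.upp)) / n
    ssd[j] <- sum((y - win[j])^2) +
      k * ((y1[k + 1] - win[j])^2 + (y1[n - k] - win[j])^2)
  }
  rbind(trim.mean = trim, win.mean = win, sum.sq.dev = ssd)
}

set.seed(123)
# 100 repetitions, each on a fresh 15 x 4 matrix (assumption: new data each time)
res <- replicate(100, trim_win_stats(matrix(rnorm(60), nrow = 15, ncol = 4)))
dim(res)                    # 3 4 100: statistic x column x repetition
apply(res, c(1, 2), mean)   # average of each statistic over the 100 runs
```

res[, , 1] holds the three 1 x 4 results of the first run. Returning them in one matrix keeps the statistics of a run together.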
Re: [R] Regression Analysis or Anova?
Hello Andrea,

I don't know if I can help you (probably not, I'm a beginner myself), but you would make it a lot easier for those who can if you posted a self-contained script in this forum that shows what you're trying to do. Use dput() to dump your dataset in text form.

Good luck,
robert

On Tue, May 15, 2012 at 10:49 PM, Andrea Sica aerdna.s...@gmail.com wrote:

Dear all,

I hope to be as clear as I can. Let's say I have a dataset with 10 variables, where 4 of them represent a certain phenomenon that I call Y. The other 6 represent another phenomenon that I call X. Each of those 10 variables contains 37 units, which are the respondents of my analysis (a survey). Since all the questions are based on a Likert scale, they are qualitative variables. The scale is from 0 to 7 for all of them, but there are -1 and -2 values where the answer is missing, so the scale actually goes from -2 to 7.

What I want to do is calculate the regression between my Y (which contains 4 variables in this case and 37 answers for each variable) and my X (which contains 6 variables instead and the same number of respondents). I know that for qualitative analyses I should use ANOVA instead of regression, although I have read somewhere that it is even possible to do the regression. Until now I have tried this:

apply(Y, 1, function(Y) mean(Y[Y > 0]))  # average per row (respondent), without the negative values
Y.reg <- c(apply(Y, 1, function(Y) mean(Y[Y > 0])))  # the vector Y: 1 variable with 37 numbers
apply(X, 1, function(X) mean(X[X > 0]))
X.reg <- c(apply(X, 1, function(X) mean(X[X > 0])))  # the vector X: 1 variable with 37 numbers
reg1 <- lm(Y.reg ~ X.reg)  # the first regression
summary(reg1)  # see the results

Call:
lm(formula = Y.reg ~ X.reg)

Residuals:
     Min       1Q   Median       3Q      Max
-2.26183 -0.49434 -0.02658  0.37260  2.08899

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.2577     0.4986   8.539 4.46e-10 ***
X.reg         0.1008     0.1282   0.786    0.437
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7827 on 35 degrees of freedom
Multiple R-squared: 0.01736, Adjusted R-squared: -0.01072
F-statistic: 0.6182 on 1 and 35 DF, p-value: 0.437

layout(matrix(1:4, 2, 2))  # graphical approach
plot(reg1)

(please see the attached PDF). As you can see, I get a very low R^2, even though I have reduced Y and X to one averaged variable each instead of using their 4 and 6 variables, and I have left out the negative values.

If I use ANOVA instead, I have this problem:

Ymatrix <- as.matrix(Y)
Xmatrix <- as.matrix(X)
# where both Y and X are in their original form, i.e. composed of several variables (4 and 6) and including the negative values

Error in UseMethod("anova") : no applicable method for 'anova' applied to an object of class "c('matrix', 'integer', 'numeric')"

To be honest, a few days ago I succeeded in using anova, but unfortunately I do not remember how and I did not save the command anywhere. What I would like to know is:
- First of all, am I wrong in how I approach my problem?
- What do you think about the regression output?
- Finally, how can I do the ANOVA, if I have to?

I really hope I have been clear. Thank you all for any kind of help.

Best,
Andrea
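Two points from the post can be sketched in base R. First, only -1 and -2 code missing answers and 0 is a valid scale point, so a filter like Y[Y > 0] also discards genuine zeros. Recoding the negative values to NA and using rowMeans(na.rm = TRUE) keeps the zeros. Second, anova() needs a fitted model object, such as the result of lm(). That is why calling it on a plain matrix fails with "no applicable method". The data below are simulated stand-ins for the survey (37 respondents, 4 Y items, 6 X items), not Andrea's data:

```r
set.seed(1)
# Simulated stand-in data: Likert answers in -2..7, where -1 and -2 mean "missing"
Y <- as.data.frame(matrix(sample(-2:7, 37 * 4, replace = TRUE), ncol = 4))
X <- as.data.frame(matrix(sample(-2:7, 37 * 6, replace = TRUE), ncol = 6))

# Recode the missing-value codes to NA, keeping 0 as a valid answer
Y[Y < 0] <- NA
X[X < 0] <- NA

# Per-respondent averages, ignoring missing answers
Y.reg <- rowMeans(Y, na.rm = TRUE)
X.reg <- rowMeans(X, na.rm = TRUE)

reg1 <- lm(Y.reg ~ X.reg)
summary(reg1)
anova(reg1)   # works: anova() has a method for fitted "lm" objects

# anova() on a bare matrix has no method, as in the post
msg <- tryCatch(anova(as.matrix(X)), error = function(e) conditionMessage(e))
print(msg)
```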
[R] order a data frame by date with order
Hi,

I have a rather large data frame (7000 rows with 28 columns) which I want to sort by date. Below is an example of the data frame. The date column is called DT, is a factor and looks like this:

> class(res.merge$DT)
[1] "factor"
> head(res.merge$DT)
[1] 17.3.2012 13:54:02 17.3.2012 14:00:07 17.3.2012 14:30:25 17.3.2012 15:01:15
[5] 17.3.2012 15:32:14 17.3.2012 16:01:29
2530 Levels: 1.4.2012 00:00:52 1.4.2012 00:30:29 ... 9.5.2012 15:30:50

res.merge is the unordered data frame. Now I want to order the data frame with:

res.ordered <- res.merge[order(as.POSIXct(as.character(res.merge$DT), format = "%d.%m.%Y %H:%M:%S")), ]

This does work. However, for some reason there are always two entries that end up at the bottom of the data frame for no obvious reason (see below; 09.05.2012 is the most recent date). This happens with different data frames. The two entries at the end are always 25.3.2012 02:00:xx and 25.3.2012 02:30:xx. Can anybody tell me what the problem is? Any help is most appreciated.

Best,
Benedikt

> res.ordered[2545:2549,]
                     DT Typ  NOD     Day_s DOW_s   Time_s     Long      Lat
2547  9.5.2012 14:30:56 GPS 1893  9.5.2012    We 14:30:00 7.452218 46.43579
2548  9.5.2012 15:02:09 GPS 1893  9.5.2012    We 15:00:35 7.451983 46.43583
2549  9.5.2012 15:30:50 GPS 1893  9.5.2012    We 15:30:00 7.451973 46.43597
1845 25.3.2012 02:00:18 GPS 1848 25.3.2012    So 02:00:01 7.454266 46.45414
1846 25.3.2012 02:30:16 GPS 1848 25.3.2012    So 02:30:00 7.454413 46.45437
     Height TOF Status FO_GPS GPS_N AOT     Day_e DOW_e   Time_e   BV Temp  SOG
2547 1182.8   3      A      1   143  55  9.5.2012    We 14:30:56 3735   31 0.09
2548 1182.8   3      A      1   143  94  9.5.2012    We 15:02:09 3637   32 0.02
2549 1176.5   3      A      1   143  50  9.5.2012    We 15:30:50 3730   29 0.17
1845 1295.2   3      A      1   151  17 25.3.2012    So 02:00:18 3715    7 0.18
1846 1287.3   3      A      1   144  16 25.3.2012    So 02:30:16 3720    8 0.14
     Heading  SAE  HAE BW_2 BW_3  X..
2547   24.90 3.81 9.47 3666 3625 9.08
2548    7.86 0.51 7.17 3593 3586 9.11
2549  344.72 2.86 4.10 3662 3623 9.12
1845  335.54 3.53 5.63 3618 3618 0.81
1846   75.37 5.44 8.96 3618 3618 0.81

-- View this message in context: http://r.789695.n4.nabble.com/order-a-data-frame-by-date-with-order-tp4630225.html Sent from the R help mailing list archive at Nabble.com.
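A quick way to see why rows end up at the bottom: order() puts NA last by default (na.last = TRUE). Any timestamp that as.POSIXct() cannot convert is therefore sorted after all valid ones. The minimal sketch below reports such rows. To keep the illustration deterministic, one hypothetical timestamp is deliberately unparseable. In this thread, the affected entries turned out to be resolved by parsing with tz = "GMT":

```r
# Hypothetical timestamps in the poster's day.month.year format;
# the third one is deliberately unparseable
dt <- c("17.3.2012 13:54:02", "9.5.2012 15:30:50", "unknown",
        "1.4.2012 00:00:52")
parsed <- as.POSIXct(dt, format = "%d.%m.%Y %H:%M:%S", tz = "GMT")

# Rows that failed to convert; order() places these last
bad <- which(is.na(parsed))
print(dt[bad])
print(order(parsed))   # 1 4 2 3: the NA entry comes last
```

Running which(is.na(...)) on the real res.merge$DT conversion should point directly at the two 25.3.2012 rows if they fail to convert.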
[R] Help needed for efficient way to loop through rows and columns
Dear R-helpers,

I am trying to write a script that iterates through a data frame that looks like this:

Example dataset called sample:

names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AB", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "BB", "AB", NA)
AorB <- c("A", "A", "A", "B")
sample <- data.frame(names, X, Y, Z, AorB)

For a given row:
if AorB == "A", then AA = 2, AB = 1, BA = 1, BB = 0
if AorB == "B", then AA = 0, AB = 1, BA = 1, BB = 2

I've been trying to write this using apply and ifelse statements in the hope that my code runs quickly, but I'm afraid I've made a big mess. See below:

apply(sample, 1, function(i) {
  ifelse(sample$AorB[i] == "A",
    (ifelse(sample[i,] == "AA", sample[i,] <- 2,
      ifelse(sample[i,] == "AB" || sample[i,] == "BA", sample[i,] <- 1,
        ifelse(sample[i,] == "BB", sample[i,] <- 0, sample[i,] <- NA)))),
    ifelse(sample$AorB[i,] == "B"),
    (ifelse(sample[i,] == "AA", sample[i,] <- 0,
      ifelse(sample[i,] == "AB" || sample[i,] == "BA", sample[i,] <- 1,
        ifelse(sample[i,] == "BB", sample[i,] <- 2, sample[i,] <- NA)
})

Any advice?
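Two things break the posted attempt. First, inside apply() the argument i is the row's contents (coerced to a character vector), not a row index. Second, || is meant for single TRUE/FALSE values, not vectors. The score for each cell is simply the number of copies of the row's AorB letter in the genotype. That means the recoding can be vectorized with two lookup tables and no loop. The sketch below uses the posted example data. The names score_A, score_B and recode are hypothetical:

```r
# The example data from the post; stringsAsFactors = FALSE keeps the codes as text
sample <- data.frame(names = c("S1", "S2", "S3", "S4"),
                     X = c("BB", "AB", "AB", "AA"),
                     Y = c("BB", "BB", "AB", "AA"),
                     Z = c("BB", "BB", "AB", NA),
                     AorB = c("A", "A", "A", "B"),
                     stringsAsFactors = FALSE)

# Lookup tables: score of each genotype when the reference allele is A or B
score_A <- c(AA = 2, AB = 1, BA = 1, BB = 0)
score_B <- c(AA = 0, AB = 1, BA = 1, BB = 2)

# Recode one genotype column; ref holds each row's AorB value
recode <- function(g, ref) {
  g <- as.character(g)   # factor codes would otherwise index by position
  # unknown codes and NA look up to NA
  unname(ifelse(ref == "A", score_A[g], score_B[g]))
}

geno_cols <- c("X", "Y", "Z")
sample[geno_cols] <- lapply(sample[geno_cols], recode, ref = sample$AorB)
sample
```

Each column is handled in one vectorized ifelse() call, so the cost grows with the number of columns rather than with the number of cells visited in R code.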
[R] Wrong Q3 + Mean.
Hi.

> a
[1] 13 13 14 14 15 15 16 20 21 26
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    14.0    15.0    16.7    19.0    26.0
> mean(a)
[1] 16.7
> quantile(a)
  0%  25%  50%  75% 100%
  13   14   15   19   26

Clearly, this is not right. My instructor and I have no idea why the program does that. I removed the program from the computer, installed it again, and it still shows the mistake. It is also strange that I chose English as the installation language, but the program is in German (my OS is in German). Please help, because otherwise I cannot solve any problems with R. Using Win7 and R version 2.15.0 (2012-03-30).

Retep

-- View this message in context: http://r.789695.n4.nabble.com/Wrong-Q3-Mean-tp4630223.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Wrong Q3 + Mean.
On Wed, May 16, 2012 at 12:22 AM, Retep32 retepdel...@web.de wrote:

Clearly, this is not right. My Instructor and I have no idea why the program does that.

Really? It is not at all clear to me what makes this not right. Have you tried looking at the documentation for quantile (which you can access by typing ?quantile or help(quantile))? There are multiple algorithms for calculating quantiles. In practice they often yield quite similar results. They can behave rather differently in some cases, though, particularly for very small datasets like those common in class exercises. You can compare the 9 varieties by running this:

sapply(1:9, function(i) quantile(a, type = i))

which for me yields:

     [,1] [,2] [,3] [,4] [,5]  [,6] [,7]     [,8]    [,9]
0%     13   13   13 13.0   13 13.00   13 13.00000 13.0000
25%    14   14   13 13.5   14 13.75   14 13.91667 13.9375
50%    15   15   15 15.0   15 15.00   15 15.00000 15.0000
75%    20   20   20 18.0   20 20.25   19 20.08333 20.0625
100%   26   26   26 26.0   26 26.00   26 26.00000 26.0000

Perhaps one of those is what you are looking for (rows are quantiles; each column uses a different algorithm, types 1 through 9, respectively).

Hope this helps,

Josh
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
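The numbers in the original post are consistent with R's default algorithm. The mean is 167 / 10 = 16.7. quantile() defaults to type = 7, which, for a sorted sample x_1, ..., x_n, interpolates at position h = (n - 1)p + 1. For n = 10 and p = 0.75 this gives h = 7.75, so Q3 = x_7 + 0.75 (x_8 - x_7) = 16 + 0.75 * 4 = 19. By contrast, type = 6 uses h = (n + 1)p = 8.25 and gives 20 + 0.25 * 1 = 20.25, the value in the type-6 column of Josh's table. The small sketch below reproduces the type-7 rule by hand; the function name q7 is hypothetical:

```r
a <- c(13, 13, 14, 14, 15, 15, 16, 20, 21, 26)

# Type-7 sample quantile: linear interpolation at position h = (n - 1) * p + 1
q7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1
  lo <- floor(h)
  hi <- min(lo + 1, n)   # guard so that p = 1 does not index past the end
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

sum(a) / length(a)                            # 16.7, the same as mean(a)
sapply(c(0, 0.25, 0.5, 0.75, 1), q7, x = a)  # 13 14 15 19 26, as quantile(a)
quantile(a, 0.75, type = 6)                   # 20.25
```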
Re: [R] order a data frame by date with order
Is this a daylight saving time problem? Check your time zone and see when the change occurred; these times might not be legal.

Sent from my iPad

On May 16, 2012, at 3:27, Benedikt Gehr benedikt.g...@ieu.uzh.ch wrote:

The two entries at the end are always 25.3.2012 02:00:xx and 25.3.2012 02:30:xx. Can anybody tell me what the problem is?
[R] Change the order of variables in a linear model
Hello,

the following lines

m <- matrix(c(1,1,9, 1,2,6, 1,3,7, 2,1,4, 2,2,5, 2,3,1, 3,1,2, 3,2,-1, 3,3,-2), 9, 3, byrow = TRUE, dimnames = list(NULL, cbind('A','B','Y')))
md <- as.data.frame(m)
md$A <- as.factor(md$A)
md$B <- as.factor(md$B)
mm <- model.matrix(Y ~ A + B + A:B, data = md)

produce

> mm
  (Intercept) A2 A3 B2 B3 A2:B2 A3:B2 A2:B3 A3:B3
1           1  0  0  0  0     0     0     0     0
2           1  0  0  1  0     0     0     0     0
3           1  0  0  0  1     0     0     0     0
4           1  1  0  0  0     0     0     0     0
5           1  1  0  1  0     1     0     0     0
6           1  1  0  0  1     0     0     1     0
7           1  0  1  0  0     0     0     0     0
8           1  0  1  1  0     0     1     0     0
9           1  0  1  0  1     0     0     0     1
attr(,"assign")
[1] 0 1 1 2 2 3 3 3 3
attr(,"contrasts")
attr(,"contrasts")$A
[1] "contr.treatment"

attr(,"contrasts")$B
[1] "contr.treatment"

However, instead of the order

(Intercept) A2 A3 B2 B3 A2:B2 A3:B2 A2:B3 A3:B3

I'd like to have

(Intercept) A2 A3 B2 B3 A2:B2 A2:B3 A3:B2 A3:B3

that is, the order of the A:B interaction variables is changed (A3:B2 and A2:B3 swap places). Is there a way to freely position variables in a model?

Thank you,
Frank

-- View this message in context: http://r.789695.n4.nabble.com/Change-the-order-of-variables-in-a-linear-model-tp4630230.html Sent from the R help mailing list archive at Nabble.com.
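Instead of changing the formula, you can work with the model matrix directly. It is an ordinary matrix, so its columns can be reordered by name and the reordered matrix passed to lm(). The sketch below uses the posted data, rebuilt with c() for the column names. The coefficients are the same as for Y ~ A + B + A:B; only their order differs:

```r
m <- matrix(c(1,1,9, 1,2,6, 1,3,7, 2,1,4, 2,2,5, 2,3,1, 3,1,2, 3,2,-1, 3,3,-2),
            9, 3, byrow = TRUE, dimnames = list(NULL, c("A", "B", "Y")))
md <- as.data.frame(m)
md$A <- as.factor(md$A)
md$B <- as.factor(md$B)
mm <- model.matrix(Y ~ A + B + A:B, data = md)

# Desired column order: interaction columns grouped by the level of A
wanted <- c("(Intercept)", "A2", "A3", "B2", "B3",
            "A2:B2", "A2:B3", "A3:B2", "A3:B3")
mm2 <- mm[, wanted]

# "- 1" because mm2 already contains an intercept column
fit2 <- lm(md$Y ~ mm2 - 1)
coef(fit2)
```

Subsetting drops the "assign" and "contrasts" attributes. Functions that rely on them, such as anova() term grouping, will therefore treat each column as a separate term.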
[R] correlation among variables in the same subset
Dear all,

I have created a subset from my dataset, which contains 6 variables. I need to compute the correlations among all of them, preferably without doing it one pair at a time. Is there any command that lets me do it for all of them at the same time?

Thank you so much in advance.
Andrea
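cor() accepts a whole data frame or matrix of numeric columns and returns the full correlation matrix in one call. The sketch below uses a simulated 6-variable subset; the data frame sub is hypothetical. If some answers are NA, use = "pairwise.complete.obs" is one option:

```r
set.seed(42)
# Hypothetical subset: 37 rows, 6 numeric variables
sub <- as.data.frame(matrix(rnorm(37 * 6), ncol = 6))

# All pairwise correlations at once (a 6 x 6 matrix)
r <- cor(sub)
round(r, 2)

# The same, but tolerating missing values pair by pair
r_na <- cor(sub, use = "pairwise.complete.obs")
```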
Re: [R] Wrong Q3 + Mean.
At 08:22 16/05/2012, Retep32 wrote:

Clearly, this is not right. My Instructor and I have no idea why the program does that.

If you have no idea why R does something, you could try reading the documentation, which tells you in some detail (in this case) what R is doing:

?quantile

I removed the program from the computer, installed it again and it still shows the mistake. It is also strange that I chose English as the installation language, but the program is in German (my OS is in German).

It used English during installation though, right? So it did what you asked.

Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
[R] tm package: problem of TermDocumentMatrix and minWordLength
Dear All,

The following code illustrates the problem:

[R code]
require(tm)
exampledoc <- c("R is good", "R is really good")
examplecorpus <- Corpus(VectorSource(exampledoc), encoding = "UTF-8")
dtm <- DocumentTermMatrix(examplecorpus, control = list(minWordLength = 1))
as.matrix(dtm)
[/R code]

The terms "R" and "is" were not included in the dtm, even though the control parameter minWordLength was set to 1:

    Terms
Docs good really
   1    1      0
   2    1      1

Can you reproduce this problem? The following is my sessionInfo:

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] tm_0.5-7.1

loaded via a namespace (and not attached):
[1] compiler_2.15.0 slam_0.1-23     tools_2.15.0

Regards,
CH
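The output is exactly what a minimum word length of 3 would produce: "R" has 1 character and "is" has 2. Newer tm releases document this filter as a wordLengths control option with default c(3, Inf), so control = list(wordLengths = c(1, Inf)) may be what is needed. Whether tm 0.5-7.1 still honours minWordLength is an assumption to check in ?termFreq. The base-R sketch below illustrates the length filter itself; it is not tm's code, and the helper name is hypothetical:

```r
docs <- c("R is good", "R is really good")

# Lower-case and split on whitespace, roughly like a simple tokenizer
tokens <- strsplit(tolower(docs), "[[:space:]]+")

# Hypothetical helper: distinct tokens with at least min_len characters
terms_with_min_length <- function(tok, min_len) {
  sort(unique(unlist(lapply(tok, function(t) t[nchar(t) >= min_len]))))
}

terms_with_min_length(tokens, 3)  # "good" "really": matches the posted dtm
terms_with_min_length(tokens, 1)  # "good" "is" "r" "really"
```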
Re: [R] confidence intervals for nls or nls2 model
On Tue, May 15, 2012 at 11:20 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Tue, May 15, 2012 at 8:08 PM, Francisco Mora Ardila fm...@oikos.unam.mx wrote:

Hi all,

I have fitted a model using the nls function to these data:

> x
[1]   1   0   0   4   3   5  12  10  12 100 100 100
> y
 [1]  1.281055090  1.563609934  0.001570796  2.291579783  0.841891853
 [6]  6.553951324 14.243274230 14.519899320 15.066473610 21.728809880
[11] 18.553054450 23.722637370

The model fitted is:

modellogis <- nls(y ~ SSlogis(x, a, b, c))

It runs OK. Then I calculate confidence intervals for the actual data using:

dataci <- predict(as.lm(modellogis), interval = "confidence")

But I don't get smooth curves when plotting it, so I want to get other confidence vectors based on a new x vector by defining new data for the predictions:

x0 <- seq(0, 15, 1)
dataci <- predict(as.lm(modellogis), newdata = data.frame(x = x0), interval = "confidence")

But it does not work: I get the same initial confidence interval. Any ideas on how to get confidence and prediction intervals using new x data on a previous model?

as.lm is a linear model between the response variable and the gradient of the nonlinear model. As we see below, x is not part of that linear model, so x can't be in newdata when predicting from the tangent model. We can only make predictions at the original x points. For other x's we could use interpolation; see ?approx. (?spline can also work in smooth cases, but in the example provided the function has a kink and that won't work well with splines.)

> as.lm(modellogis)$model
              y          a             b             c  (offset)
1   1.281055090 0.06601796 -4.411829e-01  1.168928e+00  1.397153
2   1.563609934 0.04798815 -3.268846e-01  9.766080e-01  1.015584
3   0.001570796 0.04798815 -3.268846e-01  9.766080e-01  1.015584
4   2.291579783 0.16311227 -9.767241e-01  1.597189e+00  3.451981
5   0.841891853 0.12203013 -7.665928e-01  1.512752e+00  2.582551
6   6.553951324 0.21464369 -1.206154e+00  1.564573e+00  4.542552
7  14.243274230 0.74450055 -1.361047e+00 -1.455630e+00 15.756031
8  14.519899320 0.59707858 -1.721353e+00 -6.770205e-01 12.636107
9  15.066473610 0.74450055 -1.361047e+00 -1.455630e+00 15.756031
10 21.728809880 1.00000000 -2.943955e-13 -9.073765e-12 21.163223
11 18.553054450 1.00000000 -2.943955e-13 -9.073765e-12 21.163223
12 23.722637370 1.00000000 -2.943955e-13 -9.073765e-12 21.163223

I have added a FAQ to the home page since this isn't the first time this question has come up:
http://nls2.googlecode.com#FAQs

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
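As an alternative to interpolating, the tangent-plane (delta-method) approximation that as.lm is based on can be evaluated directly at new x values. The standard error of the fitted curve at x is sqrt(g' V g), where g is the gradient of the model with respect to the parameters and V = vcov(fit). SSlogis() returns that gradient as an attribute, so no nls2 is needed. At the original x values this should agree with predict(as.lm(...), interval = "confidence"), but treat that as an assumption. The sketch below fits the posted data with parameter names Asym, xmid, scal instead of a, b, c; logis_band is a hypothetical helper:

```r
x <- c(1, 0, 0, 4, 3, 5, 12, 10, 12, 100, 100, 100)
y <- c(1.281055090, 1.563609934, 0.001570796, 2.291579783, 0.841891853,
       6.553951324, 14.243274230, 14.519899320, 15.066473610, 21.728809880,
       18.553054450, 23.722637370)
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal))

# Hypothetical helper: pointwise confidence band for the fitted curve,
# using the tangent-plane (delta-method) approximation
logis_band <- function(fit, newx, level = 0.95) {
  cf <- coef(fit)
  val <- SSlogis(newx, cf[["Asym"]], cf[["xmid"]], cf[["scal"]])
  g <- attr(val, "gradient")                   # n x 3: d(curve) / d(parameter)
  se <- sqrt(rowSums((g %*% vcov(fit)) * g))   # sqrt of diag(g V g')
  tq <- qt(1 - (1 - level) / 2, df.residual(fit))
  mu <- as.vector(val)
  data.frame(x = newx, fit = mu, lwr = mu - tq * se, upr = mu + tq * se)
}

x0 <- seq(0, 15, 1)
band <- logis_band(fit, x0)
head(band)
```

For a prediction interval you would add the residual variance, sigma(fit)^2 (or summary(fit)$sigma^2 in older R), to se^2 before taking the square root.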
Re: [R] Splus equivalent of reshape in R
On 05/16/2012 01:18 PM, Santosh wrote:

Hello R/S-PLUS users,
I am posting in the R discussion group in the hope of a wider response than I received from the S-PLUS user groups. Is there any function available in S-PLUS 8.2 that is equivalent to reshape in R? Below is a sample dataset. The size of the dataset (both rows and columns) may vary...

Hi Santosh,
You may be able to use the code of the function rep_n_stack from the prettyR package in S-PLUS. It does what you want, and since it is written in R source code, it may run in S-PLUS. Just extract the code and source it into S-PLUS.

Jim
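If neither reshape() nor rep_n_stack() is available, wide-to-long stacking only needs rep() and unlist(). The sketch below uses a hypothetical function name and toy data. It is written in base R and has not been checked under S-PLUS 8.2, so portability is an assumption:

```r
# Hypothetical helper: stack the value columns of a wide data frame into long form
stack_wide <- function(df, id_col, value_cols) {
  n <- nrow(df)
  data.frame(id = rep(df[[id_col]], times = length(value_cols)),
             variable = rep(value_cols, each = n),
             value = unlist(df[value_cols], use.names = FALSE),
             stringsAsFactors = FALSE)
}

# Toy wide data: one row per subject, one column per time point
wide <- data.frame(ID = 1:3, T1 = c(10, 11, 12), T2 = c(20, 21, 22))
long <- stack_wide(wide, "ID", c("T1", "T2"))
long
```

Because value_cols is an argument, the same function copes with a varying number of columns, which the question mentions.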
Re: [R] order a data frame by date with order
Hi,

many thanks for your answer. If I set tz = "GMT" it does the job. Great!

Thanks, cheers
Benedikt

On 16.05.2012 12:20, jholtman [via R] wrote:

Is this a daylight saving time problem? Check your timezone and see when it occurred; these times might not be legal.
--
Benedikt Gehr
Ph.D. Student
Institute of Evolutionary Biology and Environmental Studies
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Office 13 J 36b
Phone: +41 (0)44 635 49 72
http://www.ieu.uzh.ch/staff/phd/gehr.html

-- View this message in context: http://r.789695.n4.nabble.com/order-a-data-frame-by-date-with-order-tp4630225p4630237.html Sent from the R help mailing list archive at Nabble.com.
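The fix reported in this thread was to parse the timestamps with tz = "GMT". In a zone with daylight saving time (presumably Central European time here), clocks jumped from 02:00 to 03:00 on 25 March 2012. Local times such as 25.3.2012 02:30:16 therefore do not exist and may be converted to NA, which order() then sorts last. GMT has no such gap. The sketch below uses timestamps from the post:

```r
# Timestamps from the post, stored as a factor as in res.merge$DT
dt <- factor(c("9.5.2012 15:30:50", "25.3.2012 02:00:18",
               "25.3.2012 02:30:16", "17.3.2012 13:54:02"))
fmt <- "%d.%m.%Y %H:%M:%S"

# GMT has no daylight-saving gap, so every timestamp converts
parsed <- as.POSIXct(as.character(dt), format = fmt, tz = "GMT")
print(parsed)

# Sorted chronologically, including the two 25 March entries
ord <- order(parsed)
print(as.character(dt)[ord])
```

One consequence is that the values are treated as GMT clock readings. Time differences across the changeover are therefore not adjusted, which does not matter for sorting.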
Re: [R] How to sum and group data by DATE in data frame
Michael Weylandt wrote:

Can you provide a reproducible example?

Of course, Michael. Consider the following time series:

11/2/2011 14:30 123.53
11/2/2011 15:00 123.78
11/2/2011 15:30 124.24
11/2/2011 16:00 124.2
11/2/2011 16:30 124.07
11/2/2011 17:00 123.91
11/2/2011 17:30 123.44
11/2/2011 18:00 123.0616
11/2/2011 18:30 123.06
11/2/2011 19:00 123.13
11/2/2011 19:30 123.745
11/2/2011 20:00 123.96
11/2/2011 20:30 123.99
11/2/2011 21:00 123.99
11/3/2011 14:30 124.3
11/3/2011 15:00 124.38
11/3/2011 15:30 124.67
11/3/2011 16:00 125.19
11/3/2011 16:30 124.9
11/3/2011 17:00 125.27
11/3/2011 17:30 125.5
11/3/2011 18:00 125.58
11/3/2011 18:30 125.91
11/3/2011 19:00 125.8
11/3/2011 19:30 125.83
11/3/2011 20:00 126.215
11/3/2011 20:30 126.25
11/3/2011 21:00 126.25
11/4/2011 14:30 124.901
11/4/2011 15:00 124.43
11/4/2011 15:30 124.4654
11/4/2011 16:00 124.46
11/4/2011 16:30 124.68
11/4/2011 17:00 124.86
11/4/2011 17:30 124.73
11/4/2011 18:00 125.22
11/4/2011 18:30 125.48
11/4/2011 19:00 125.5601
11/4/2011 19:30 125.4091
11/4/2011 20:00 125.15
11/4/2011 20:30 125.43
11/4/2011 21:00 125.481
11/7/2011 15:30 125.91
11/7/2011 16:00 125.29
11/7/2011 16:30 124.79
11/7/2011 17:00 124.77
11/7/2011 17:30 124.7
11/7/2011 18:00 124.37
11/7/2011 18:30 124.56
11/7/2011 19:00 124.86
11/7/2011 19:30 125.3
11/7/2011 20:00 125.59
11/7/2011 20:30 125.95
11/7/2011 21:00 125.73
11/7/2011 21:30 126.27
11/7/2011 22:00 126.26
11/8/2011 15:30 127.33
11/8/2011 16:00 126.37
11/8/2011 16:30 126.46
11/8/2011 17:00 126
11/8/2011 17:30 126.06
11/8/2011 18:00 126.2662
11/8/2011 18:30 126.23
11/8/2011 19:00 126.4499
11/8/2011 19:30 127.12
11/8/2011 20:00 127.48
11/8/2011 20:30 127.49
11/8/2011 21:00 127.69
11/8/2011 21:30 127.88
11/8/2011 22:00 127.88
11/9/2011 15:30 124.51
11/9/2011 16:00 124.42
11/9/2011 16:30 124.92
11/9/2011 17:00 125.18
11/9/2011 17:30 125.23
11/9/2011 18:00 124.81
11/9/2011 18:30 125.07
11/9/2011 19:00 124.61
11/9/2011 19:30 123.8869
11/9/2011 20:00 123.24
11/9/2011 20:30 123.3329
11/9/2011 21:00 123.6
11/9/2011 21:30 123.19
11/9/2011 22:00 123.161

The row names are dates plus hours; the data column holds the time series' values.

-- View this message in context: http://r.789695.n4.nabble.com/How-to-sum-and-group-data-by-DATE-in-data-frame-tp903708p4630228.html Sent from the R help mailing list archive at Nabble.com.
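The row names hold the date plus the time, so the date part can be extracted with as.Date(); strptime ignores trailing characters such as the time. The values can then be summed per day with tapply() or aggregate(). The sketch below uses the first two readings of 2 and 3 November from the post:

```r
# First two readings of 2 and 3 November from the post, with the timestamps as row names
ts_df <- data.frame(value = c(123.53, 123.78, 124.3, 124.38),
                    row.names = c("11/2/2011 14:30", "11/2/2011 15:00",
                                  "11/3/2011 14:30", "11/3/2011 15:00"))

# Date part of "month/day/year hour:minute"; the time part is ignored
day <- as.Date(rownames(ts_df), format = "%m/%d/%Y")

# Sum of the values per calendar day, as a named vector
daily <- tapply(ts_df$value, day, sum)
print(daily)

# The same result as a data frame
aggregate(value ~ day, data = data.frame(value = ts_df$value, day = day), FUN = sum)
```

Replacing sum with mean, max or any other summary function gives the corresponding daily statistic.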
[R] Re : Wrong Q3 + Mean.
Hi,

you could probably check ?quantile, particularly the 'type' argument.

Best Regards,
Pascal

----- Original Message -----
From: Retep32 retepdel...@web.de
To: r-help@r-project.org
Cc:
Sent: Wednesday, 16 May 2012, 16:22
Subject: [R] Wrong Q3 + Mean.
[R] finding mean and SD for a log-normal distribution
Dear R experts, allow me to ask a quick question: I have a mean value of 6 and an SD of 3 describing my distribution. I would like to convert this distribution into the log-normal distribution that would best describe it when resimulated. Currently I am using other software to estimate the respective mean and SD on the log scale, and the results are: mean 1.6667 and SD 0.47071. Then, to best reproduce my original distribution in R, I use the following commands: c <- rlnorm(5000, 1.6667, 0.47071) d <- exp(c) mean(c) sd(c) and the results for mean and SD are 5.92 and 2.94 (original 6 and 3), respectively, which I am reasonably happy with. I would like to become independent of the other software I use, but I am unable to figure out how to generate the values 1.6667 and 0.47071 using R. Could someone please help me with this question? Thanks, Andras
Re: [R] write data using xlsReadWrite
Hi, I have changed it to this, but I get an error that I couldn't fix. Do you have any idea why? file <- system.file("D:\\FYP\\image\\Cropped Images\\user61", "forgerUser61.xlsx", package = "xlsx") wb <- loadWorkbook("forgerUser61.xlsx") sheets <- getSheets(wb) sheet <- sheets[["all"]] res <- readRows(sheet, startRow=4, endRow=5, startColumn=2, endColumn=3) Error in readRows(sheet, startRow = 4, endRow = 5, startColumn = 2, endColumn = 3) : attempt to apply non-function
Re: [R] problem with get() inside of lme()
You can achieve that with a combination of as.formula and paste. library(nlme) data(petrol, package = "MASS") lme(as.formula(paste(Y.VAR, "~EP")), random = ~1|No, data = petrol) Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Original message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of chuck.01 Sent: Saturday, 12 May 2012 23:28 To: r-help@r-project.org Subject: Re: [R] problem with get() inside of lme() Here is an example: library(nlme) library(lme4) library(MASS) data(petrol) # a variable for one of the columns in petrol Y.VAR <- "Y" # This works: lmer(get(Y.VAR)~EP +(1|No), data=petrol) # This doesn't: lme(get(Y.VAR)~EP, random= ~1|No, data=petrol) # but this does: lme(Y~EP, random= ~1|No, data=petrol) I'd really like to use the variable... again, this is inside a function. Any idea how to solve this? Thanks for your time and expertise, Chuck chuck.01 wrote: Please note that I edited the original message to say: length(with(new3, perm.score))==length(with(new3, get(TRAIT1))) [1] TRUE chuck.01 wrote: Hi, The following lines of code are inside a function, where TRAIT1 is a function argument naming a column of the data.frame new3.
This works just fine: m2 <- lmer(get(TRAIT1) ~ perm.score + (1|site), data=new3) but this will not work: m3 <- lme(get(TRAIT1) ~ perm.score, random= ~1|site, data=new3) I get the following error: Error in model.frame.default(formula = ~TRAIT1 + perm.score + site, data = list( : variable lengths differ (found for 'perm.score') It seems to be putting TRAIT1 on the left side of the equation, and even if I am wrong about that, the claim in the error that the lengths differ is still not true: length(with(new3, perm.score))==length(with(new3, get(TRAIT1))) [1] TRUE Any ideas on either what is going on, or how I can fix this? I'm not including example data or the function because I am hoping they are not needed. Please let me know if I am wrong. DISCLAIMER: The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
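Thierry's suggestion also works in the function setting Chuck describes. Build the fixed-effects formula from the column name as a string before calling lme(), rather than calling get() inside the formula. This is a minimal sketch using the MASS petrol data from the thread. fit_trait() is a hypothetical helper name.

```r
library(nlme)
data(petrol, package = "MASS")

# Hypothetical helper: fit a random-intercept model for any response column.
# The response name arrives as a character string, as TRAIT1 does in the thread.
fit_trait <- function(trait, data) {
  fixed <- as.formula(paste(trait, "~ EP"))  # e.g. Y ~ EP
  lme(fixed, random = ~ 1 | No, data = data)
}

m <- fit_trait("Y", petrol)
fixef(m)  # fixed-effect estimates for Y ~ EP
```

Because lme() receives an ordinary formula object, it sees the real column name when it builds its model frame. get() never enters the formula.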
[R] ANCOVA power
Dear list members: I am trying to calculate power for an ANCOVA analysis. I have found different solutions, such as power.t.test and power.anova.test, but they seem to refer to the ANOVA part of the ANCOVA. My model is of the form: lm(y ~ myfactor + x1 + x2 + x2*myfactor) where myfactor is a factor. I am mainly interested in calculating the power of the significance test for the interaction term between x2 and the factor. I would appreciate your help. Xan.
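Neither power.t.test() nor power.anova.test() targets a single term in a model with covariates. One common approach, which is not from this thread, is simulation:

1. Generate data under an assumed effect size.
2. Fit the model with and without the x2-by-factor interaction.
3. Count how often the F-test for the interaction rejects.

Everything about the data-generating model below is a hypothetical assumption: two factor levels, standard-normal covariates, unit error variance, and the effect sizes. Replace these with values that match the actual study.

```r
# Hypothetical simulation settings; replace with values that match your design.
sim_interaction_power <- function(n_per_group = 20, slope_diff = 0.5,
                                  n_sim = 500, alpha = 0.05) {
  pvals <- replicate(n_sim, {
    myfactor <- factor(rep(c("a", "b"), each = n_per_group))
    x1 <- rnorm(2 * n_per_group)
    x2 <- rnorm(2 * n_per_group)
    # The x2 slope differs by slope_diff between the two factor levels
    y <- 1 + 0.5 * x1 + 0.5 * x2 +
      slope_diff * x2 * (myfactor == "b") + rnorm(2 * n_per_group)
    reduced <- lm(y ~ myfactor + x1 + x2)
    full <- lm(y ~ myfactor + x1 + x2 + x2:myfactor)
    anova(reduced, full)[2, "Pr(>F)"]  # F-test for the interaction term
  })
  mean(pvals < alpha)  # share of simulated data sets where H0 is rejected
}

set.seed(1)
sim_interaction_power(n_per_group = 30, slope_diff = 0.6, n_sim = 200)
```

The nested-model comparison isolates the interaction, so the result estimates power for that term alone. Increase n_sim for a more precise estimate.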
Re: [R] Problem to resolve a step for reading a large TXT and, split in several file
Hello, Your bug is obvious: each pass through the loop you read twice and write only once, so the file pointer keeps moving forward... Use something like while (length(pv <- readLines(con, n=n)) > 0) { # note that this line changed. i <- i + 1 write.table(pv, file = paste(fileNames.temp.1, "_", i, ".txt", sep = ""), sep = "\t") } (or put the line with read.table where you have readLines.) Anyway, I don't like it very much. If you know the number of lines in the input file, it would be much better to use integer division and modulus to determine how many times and how much to read. Something like n <- 100 passes <- number.of.lines.in.file %/% n remaining <- number.of.lines.in.file %% n for(i in seq.int(passes)){ [ ... read n lines at a time, process them ... ] } if(remaining){ n <- remaining [ ... read what's left ... ] } If you do not know how many lines there are in the file, see (package::function) parser::nlines R.utils::countLines Hope this helps, Rui Barradas On 16-05-2012 11:00, r-help-requ...@r-project.org wrote: Date: Tue, 15 May 2012 22:16:42 +0200 From: gianni lavaredo gianni.lavar...@gmail.com To: r-help@r-project.org Subject: [R] Problem to resolve a step for reading a large TXT and split in several file Dear researchers, It's the first time I am trying to resolve this problem. I have a TXT file with 1408452 rows.
I wish to split it into files where each file has 1,000,000 rows, with the following procedure: # split in two files, one with 1,000,000 rows and one with 408,452 rows file <- "09G001_72975_7575_25_4025.txt" fileNames <- strsplit(as.character(file), ".", fixed = TRUE) fileNames.temp.1 <- unique(as.vector(do.call(rbind, fileNames)[, 1])) con <- file(file, open = "r") # n is the number of rows n <- 100 i <- 0 while (length(readLines(con, n=n)) > 0) { i <- i + 1 pv <- read.table(con, header=F, sep="\t", nrow=n) write.table(pv, file = paste(fileNames.temp.1, "_", i, ".txt", sep = ""), sep = "\t") } close(con) When I use 1,000,000, I get only 09G001_72975_7575_25_4025_1.txt in the directory (with 100 rows) and not 09G001_72975_7575_25_4025_2.txt (with 408,452). I don't understand where my bug is. Furthermore, when I want to split into, for example, 3 files (where n is 469484 = 1408452/3), I get this message: *Error in read.table(con, header = F, sep = "\t", nrow = n) : no lines available in input* Thanks for all the help, and sorry for the disturbance.
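Following Rui's diagnosis, here is a minimal sketch of the loop with a single read per pass. It copies lines verbatim with writeLines() instead of read.table()/write.table(). That avoids the double read, and it also avoids the quotes and row names that write.table() adds. split_file() and its prefix argument are hypothetical names. Each chunk of n lines is held in memory once.

```r
# Sketch: split a text file into pieces of at most n lines each.
# Each pass reads one chunk exactly once; the original loop read twice per pass
# (once in the while() test and once in read.table()).
split_file <- function(path, n, prefix = sub("\\.txt$", "", path)) {
  con <- file(path, open = "r")
  on.exit(close(con))
  out_files <- character(0)
  i <- 0
  while (length(chunk <- readLines(con, n = n)) > 0) {
    i <- i + 1
    out_file <- paste0(prefix, "_", i, ".txt")
    writeLines(chunk, out_file)  # lines copied as-is: no quoting, no row names
    out_files <- c(out_files, out_file)
  }
  out_files
}
```

For the file in the post, split_file("09G001_72975_7575_25_4025.txt", 1000000) should write a first file of 1,000,000 lines and a second of 408,452, assuming one row per line and no header.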
Re: [R] Automating R for Hypothesis Testing
Rui- Just a quick question. I understand your comment on using ANOVA, but doesn't this only test for similarity of the means? We are trying to see whether the same model can fit two or three months, and therefore has a similar slope and intercept. The ANOVA would only do one part of this, correct, with the F-test? Thanks! Meredith Rui Barradas wrote: Hello, I'm glad it helped. As for your second question, I don't know, but I'm not very comfortable with the way you're doing things. Why subtract the coefficients of model 1 from model 2? And why the dummy? Why set model 1 to zero? Isn't it better to use anova's F? After all, it's designed for this, for the linear model... And if you really want/need the dummy, wouldn't a nested anova do it? (F statistic, once again.) anova(model1, model2) is simple and, statistically speaking, seems to me much better. (I especially don't like the subtraction bit.) Rui Barradas meredith wrote: Rui- Thanks, this definitely helps, just one quick question. How would you code the values of chi_fm and chi_fms to change based on the degrees of freedom of each model H(i)? Meredith Rui Barradas wrote: Hello, Yes, it does help. Now we can see your data and what you're doing. What follows is a suggestion on what you could do, not a full solution. (You forgot to say what X1 is, but I don't think it's important to understand the suggestion.) (If I'm wrong, say something.) milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE, stringsAsFactors=FALSE) # list of data.frames, one per month ls1 <- split(milwaukeephos, milwaukeephos$month) #- if you want to keep the models, not needed if you don't. # (you probably don't) modelH <- vector("list", 12) modelHa <- vector("list", 12) modelH2 <- vector("list", 12) modelH2a <- vector("list", 12) #- values to record, these are needed, create them beforehand. chi_fm <- numeric(12) chi_fms <- numeric(12) # seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){ month_this <- seq_months[i] month_next <- seq_months[i + 1] lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg) lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow) modelH[[i]] <- lm(lload ~ lflow) # If you don't want to keep the models, use modelH only # ( without [[i]] ) # and do the same with X1 # rest of your code for first test goes here chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm) # and the same for the second test chi_fms[i] <- ...etc... } Hope this helps, Rui Barradas meredith wrote: dput: http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv milwaukeephos.csv # Feb-march modelH_febmarch <- lm(llfeb_march~lffeb_march) modelHa_febmarch <- lm(llfeb_march~X1feb_mar+lffeb_march) anova(modelHa_febmarch) coefficients(modelH_febmarch) (Intercept) lffeb_march -2.429890 1.172821 coefficients(modelHa_febmarch) (Intercept) X1feb_mar lffeb_march -2.8957776 -0.5272793 1.3016303 bres_fm <- matrix(c(-2.429890,0,1.172821),nrow=3) bunres_fm <- matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3) bfm <- t(bunres_fm-bres_fm) fmvect <- seq(1,1,length=34) X1a_febmar <- seq(0,0,length=9) # dummy variable step 1 X1b_febmar <- seq(1,1,length=25) # dummy variable step 2 X1feb_mar <- c(X1a_febmar,X1b_febmar) #dummy variable creation # Test Stat Equation for Chisq fmxx <- cbind(fmvect,X1feb_mar,lffeb_march) tfmx <- t(fmxx) xcom_fm <- (tfmx %*% fmxx) xinv_fm <- ginv(xcom_fm) var_fm <- xinv_fm*0.307 chi_fm <- bfm %*% var_fm %*% (bunres_fm-bres_fm) chi_fm # chisq value for recording; if less than CV, move on to slope modification modelH2_febmarch <- lm(llfeb_march~X3feb_march) modelH2a_febmarch <- lm(llfeb_march~X3feb_march+X4feb_march) anova(modelH2a_febmarch) coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test (Intercept) X3feb_march 5.342130 1.172821 coefficients(modelH2a_febmarch) (Intercept) X3feb_march X4feb_march 5.2936263 1.0353752 0.2407557 # Test Stat bsres_fm <- matrix(c(5.342130,1.172821,0),nrow=3) bsunres_fm <- matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
bsfm <- t(bsunres_fm-bsres_fm) #X matrix fmxs <- cbind(fmvect,X3feb_march,X4feb_march) tfmxs <- t(fmxs) xcoms_fm <- (tfmxs %*% fmxs) xinvs_fm <- ginv(xcoms_fm) var_fms <- xinvs_fm*0.341 chi_fms <- bsfm %*% var_fms %*% (bsunres_fm-bsres_fm) chi_fms # Record Chisq value Does this help? Here lffeb_march is the combination of Feb and March log flows, and llfeb_march is the combination of Feb and March log loads. X3: lffeb_march-mean(feb_march) X4: X1*X3 Thanks Rui Barradas wrote: Hello, I'm not at all sure I understand your problem. Does this describe it? test first model for months 1 and 2; if test statistic less than critical value { test second model for months 1 and 2; print results of the first and second tests? just one of them? } move on to months 2 and 3, etc., until months 12 and 1
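On Meredith's question: an F-test from anova() on two nested lm() fits is not limited to comparing means. The restricted model can be a single line for both months. The full model can let both intercept and slope differ by month (a Chow-type comparison). The F statistic then tests the intercept and slope differences jointly. This sketch uses simulated data. The column names and values are illustrative, not the milwaukeephos data.

```r
set.seed(42)
n <- 30
d <- data.frame(
  month = factor(rep(c("Feb", "Mar"), each = n)),
  lflow = rnorm(2 * n, mean = 5)
)
# Simulated log loads that share one intercept and slope across both months
d$lload <- -2.4 + 1.17 * d$lflow + rnorm(2 * n, sd = 0.5)

common   <- lm(lload ~ lflow, data = d)          # one line for both months
separate <- lm(lload ~ lflow * month, data = d)  # own intercept and slope per month
cmp <- anova(common, separate)
cmp  # the F-test has 2 numerator df: intercept shift + slope shift
```

In a loop over month pairs like Rui's, the same two lm() calls and one anova() would replace the hand-built chi-square statistics.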
Re: [R] Help needed for efficient way to loop through rows and columns
Can you show us what you want the final data.frame to look like? You've created five variables stored as factors, and you seem to be trying to change those to numeric values? Is that correct? Since AB and BA are always set to 1, you could just replace those values globally rather than mess with the ifelse commands for those values. Only AA and BB are affected by the value of AorB. Your apply() function processes the data.frame by row, so i is a vector consisting of all the values in the row. You seem to be coding as if i were a single integer (as in a for loop). -- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Priya Bhatt Sent: Wednesday, May 16, 2012 3:08 AM To: r-help@r-project.org Subject: [R] Help needed for efficient way to loop through rows and columns Dear R-helpers: I am trying to write a script that iterates through a data frame that looks like this: Example dataset called sample: names <- c("S1", "S2", "S3", "S4") X <- c("BB", "AB", "AB", "AA") Y <- c("BB", "BB", "AB", "AA") Z <- c("BB", "BB", "AB", NA) AorB <- c("A", "A", "A", "B") sample <- data.frame(names, X, Y, Z, AorB) For a given row: if AorB == "A", then AA = 2, AB = 1, BA = 1, BB = 0; if AorB == "B", then AA = 0, AB = 1, BA = 1, BB = 2. I've been trying to write this using apply and ifelse statements in hopes that my code runs quickly, but I'm afraid I've made a big mess. See below: apply(sample, 1, function(i) { ifelse(sample$AorB[i] == "A", (ifelse(sample[i,] == "AA", sample[i,] <- 2 , ifelse(sample[i,] == "AB" || sample[i,] == "BA" , sample[i,] <- 1, ifelse(sample[i,] == "BB", sample[i,] <- 0, sample[i,] <- NA )) ) ) , ifelse(sample$AorB[i,] == "B"), (ifelse(sample[i,] == "AA", sample[i,] <- 0 , ifelse(sample[i,] == "AB" || sample[i,] == "BA" , sample[i,] <- 1, ifelse(sample[i,] == "BB", sample[i,] <- 2, sample[i,] <- NA) }) Any advice?
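David makes two points: AB and BA always score 1, and only AA and BB depend on AorB. Building on these, the recoding can be done one whole column at a time with a lookup table, without apply() and nested ifelse(). This sketch uses Priya's example data. The data frame is renamed genotypes here (an arbitrary choice) so that it does not mask base::sample(). stringsAsFactors = FALSE keeps the codes as character strings.

```r
genotypes <- data.frame(
  names = c("S1", "S2", "S3", "S4"),
  X = c("BB", "AB", "AB", "AA"),
  Y = c("BB", "BB", "AB", "AA"),
  Z = c("BB", "BB", "AB", NA),
  AorB = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE  # factor codes would break the name-based lookup
)

# Score tables for the two cases, taken from the rules in the post
score_A <- c(AA = 2, AB = 1, BA = 1, BB = 0)
score_B <- c(AA = 0, AB = 1, BA = 1, BB = 2)

recode <- function(g, aorb) {
  g <- as.character(g)
  # Look up every element at once; NA genotypes give NA scores
  ifelse(aorb == "A", score_A[g], score_B[g])
}

geno_cols <- c("X", "Y", "Z")
genotypes[geno_cols] <- lapply(genotypes[geno_cols], recode, aorb = genotypes$AorB)
genotypes
```

Each column is recoded in one vectorized call, so there is no per-row loop at all.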
Re: [R] correlation among variables in the same subset
?cor e.g., x <- data.frame(rnorm(5), rnorm(5), rnorm(5), rnorm(5), rnorm(5)) cor(x) Best, Michael On Wed, May 16, 2012 at 6:52 AM, Andrea Sica aerdna.s...@gmail.com wrote: Dear all, I have created a subset from my dataset, which contains 6 variables. I need to compute the correlations among all of them, preferably without doing it pair by pair. Is there a command that lets me do it for all of them at the same time? Thank you so much in advance. Andrea
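Michael's cor(x) returns the full correlation matrix in one call. If missing answers are coded as -1 and -2, as Andrea described in her earlier regression thread, those codes should become NA first. Otherwise they enter the correlations as real scores. The data below are simulated stand-ins, not Andrea's survey.

```r
set.seed(1)
# Hypothetical survey-style data: 37 respondents, 6 items scored -2..7,
# where -1 and -2 mark missing answers
x <- as.data.frame(matrix(sample(-2:7, 37 * 6, replace = TRUE), ncol = 6))
x[x < 0] <- NA

# One call gives the whole 6 x 6 matrix; "pairwise.complete.obs" uses every
# respondent who answered both items of each pair
r <- cor(x, use = "pairwise.complete.obs")
round(r, 2)
```

For ordinal Likert items, cor() also accepts method = "spearman".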
Re: [R] tm package: problem of TermDocumentMatrix and minWordLength
Try this: dtm <- DocumentTermMatrix(examplecorpus, control = list(wordLengths=c(1,100))) On Wed, May 16, 2012 at 6:22 AM, C.H. chainsawti...@gmail.com wrote: Dear All, The following code illustrates the problem. [R code] require(tm) exampledoc <- c("R is good", "R is really good") examplecorpus <- Corpus(VectorSource(exampledoc), encoding = "UTF-8") dtm <- DocumentTermMatrix(examplecorpus, control = list(minWordLength = 1)) as.matrix(dtm) [/R code] The terms "R" and "is" were not included in the dtm, even though the control parameter minWordLength was set to 1. Terms Docs good really 1 1 0 2 1 1 Can you reproduce this problem? The following is my sessionInfo: sessionInfo() R version 2.15.0 (2012-03-30) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] tm_0.5-7.1 loaded via a namespace (and not attached): [1] compiler_2.15.0 slam_0.1-23 tools_2.15.0 Regards, CH
Re: [R] How to sum and group data by DATE in data frame
Fascinating... dput() has never given me anything that looks like that. I would have expected something much more like z <- structure(c(123.53, 123.78, 124.24, 124.2, 124.07, 123.91, 123.44, 123.0616, 123.06, 123.13, 123.745, 123.96, 123.99, 123.99, 124.3, 124.38, 124.67, 125.19, 124.9, 125.27, 125.5, 125.58, 125.91, 125.8, 125.83, 126.215, 126.25, 126.25, 124.901, 124.43, 124.4654, 124.46, 124.68, 124.86, 124.73, 125.22, 125.48, 125.5601, 125.4091, 125.15, 125.43, 125.481, 125.91, 125.29, 124.79, 124.77, 124.7, 124.37, 124.56, 124.86, 125.3, 125.59, 125.95, 125.73, 126.27, 126.26, 127.33, 126.37, 126.46, 126, 126.06, 126.2662, 126.23, 126.4499, 127.12, 127.48, 127.49, 127.69, 127.88, 127.88, 124.51, 124.42, 124.92, 125.18, 125.23, 124.81, 125.07, 124.61, 123.8869, 123.24, 123.3329, 123.6, 123.19, 123.161), index = structure(c(1320258600, 1320260400, 1320262200, 1320264000, 1320265800, 1320267600, 1320269400, 1320271200, 1320273000, 1320274800, 1320276600, 1320278400, 1320280200, 1320282000, 1320345000, 1320346800, 1320348600, 1320350400, 1320352200, 1320354000, 1320355800, 1320357600, 1320359400, 1320361200, 1320363000, 1320364800, 1320366600, 1320368400, 1320431400, 1320433200, 1320435000, 1320436800, 1320438600, 1320440400, 1320442200, 1320444000, 1320445800, 1320447600, 1320449400, 1320451200, 1320453000, 1320454800, 1320697800, 1320699600, 1320701400, 1320703200, 1320705000, 1320706800, 1320708600, 1320710400, 1320712200, 1320714000, 1320715800, 1320717600, 1320719400, 1320721200, 1320784200, 1320786000, 1320787800, 1320789600, 1320791400, 1320793200, 1320795000, 1320796800, 1320798600, 1320800400, 1320802200, 1320804000, 1320805800, 1320807600, 1320870600, 1320872400, 1320874200, 1320876000, 1320877800, 1320879600, 1320881400, 1320883200, 1320885000, 1320886800, 1320888600, 1320890400, 1320892200, 1320894000), class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo") which is about 100x more convenient. With that, aggregate(z, as.Date(time(z)), sum) and aggregate(z,
format(time(z), "%m %d"), sum) give different results (at least in my time zone), so try the latter (it seems to be what you were probably looking for). If that doesn't nail it down, I'll need you to answer the questions I asked in my previous email. Best, Michael On Wed, May 16, 2012 at 6:14 AM, Cren oscar.soppe...@bancaakros.it wrote: Michael Weylandt wrote: Can you provide a reproducible example? Of course, Michael. Consider the following time series: 11/2/2011 14:30 123.53 11/2/2011 15:00 123.78 11/2/2011 15:30 124.24 11/2/2011 16:00 124.2 11/2/2011 16:30 124.07 11/2/2011 17:00 123.91 11/2/2011 17:30 123.44 11/2/2011 18:00 123.0616 11/2/2011 18:30 123.06 11/2/2011 19:00 123.13 11/2/2011 19:30 123.745 11/2/2011 20:00 123.96 11/2/2011 20:30 123.99 11/2/2011 21:00 123.99 11/3/2011 14:30 124.3 11/3/2011 15:00 124.38 11/3/2011 15:30 124.67 11/3/2011 16:00 125.19 11/3/2011 16:30 124.9 11/3/2011 17:00 125.27 11/3/2011 17:30 125.5 11/3/2011 18:00 125.58 11/3/2011 18:30 125.91 11/3/2011 19:00 125.8 11/3/2011 19:30 125.83 11/3/2011 20:00 126.215 11/3/2011 20:30 126.25 11/3/2011 21:00 126.25 11/4/2011 14:30 124.901 11/4/2011 15:00 124.43 11/4/2011 15:30 124.4654 11/4/2011 16:00 124.46 11/4/2011 16:30 124.68 11/4/2011 17:00 124.86 11/4/2011 17:30 124.73 11/4/2011 18:00 125.22 11/4/2011 18:30 125.48 11/4/2011 19:00 125.5601 11/4/2011 19:30 125.4091 11/4/2011 20:00 125.15 11/4/2011 20:30 125.43 11/4/2011 21:00 125.481 11/7/2011 15:30 125.91 11/7/2011 16:00 125.29 11/7/2011 16:30 124.79 11/7/2011 17:00 124.77 11/7/2011 17:30 124.7 11/7/2011 18:00 124.37 11/7/2011 18:30 124.56 11/7/2011 19:00 124.86 11/7/2011 19:30 125.3 11/7/2011 20:00 125.59 11/7/2011 20:30 125.95 11/7/2011 21:00 125.73 11/7/2011 21:30 126.27 11/7/2011 22:00 126.26 11/8/2011 15:30 127.33 11/8/2011 16:00 126.37 11/8/2011 16:30 126.46 11/8/2011 17:00 126 11/8/2011 17:30 126.06 11/8/2011 18:00 126.2662 11/8/2011 18:30 126.23 11/8/2011 19:00 126.4499 11/8/2011 19:30 127.12 11/8/2011 20:00 127.48 11/8/2011 20:30
127.49 11/8/2011 21:00 127.69 11/8/2011 21:30 127.88 11/8/2011 22:00 127.88 11/9/2011 15:30 124.51 11/9/2011 16:00 124.42 11/9/2011 16:30 124.92 11/9/2011 17:00 125.18 11/9/2011 17:30 125.23 11/9/2011 18:00 124.81 11/9/2011 18:30 125.07 11/9/2011 19:00 124.61 11/9/2011 19:30 123.8869 11/9/2011 20:00 123.24 11/9/2011 20:30 123.3329 11/9/2011 21:00 123.6 11/9/2011 21:30 123.19 11/9/2011 22:00 123.161 The row names are dates plus hours; the data column holds the time series' values.
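Michael notes that the two aggregate() calls disagree, and time zones are the reason: where a day ends depends on the zone in which the timestamps are interpreted. The same grouping can be sketched in base R without zoo. Parse the month/day/year stamps from Cren's post with an explicit time zone, then group on a formatted date string. The choice of UTC below is an assumption; use the zone the data were actually recorded in.

```r
# A few rows from Cren's series; stamps are month/day/year hour:minute
raw <- data.frame(
  stamp = c("11/2/2011 14:30", "11/2/2011 15:00", "11/3/2011 14:30"),
  value = c(123.53, 123.78, 124.3),
  stringsAsFactors = FALSE
)
# Parse and format in one explicit zone (UTC is an assumption) so that
# day boundaries do not shift with the machine's local time zone
raw$time <- as.POSIXct(raw$stamp, format = "%m/%d/%Y %H:%M", tz = "UTC")
raw$day <- format(raw$time, "%Y-%m-%d", tz = "UTC")

daily <- aggregate(value ~ day, data = raw, FUN = sum)
daily
```

With the full series, each day's row is the sum of its half-hourly values.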
Re: [R] Reading Excel Formulas as values
I can't replicate your problem. I created a spreadsheet in Excel 2007 consisting of three columns: numbers from 1 to 15, rand(), and the sum of the first two columns. Using all the defaults with read.xlsx() (package: xlsx), I get the values of each column, and using keepFormulas=TRUE, I get the formulas as factors. I don't get any NAs. I can also place a formula on the second sheet that accesses data from the first sheet without any problems. I haven't tried Excel 2010. Could your formulas be accessing data from another spreadsheet? -- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Smith Sent: Tuesday, May 15, 2012 3:11 PM To: r-help@r-project.org Subject: [R] Reading Excel Formulas as values When I read Excel files using the read.xlsx() command, any cells that have formulas in them come up as NA. Is there a way to read just the numeric value of each cell without using the Paste Values command in Excel? I need to read in hundreds of Excel spreadsheets and compile them automatically into one large super spreadsheet, which is why I cannot reformat each sheet manually.
Re: [R] confidence intervals for nls or nls2 model
Thanks! Now it is clear. Francisco On Wed, 16 May 2012 07:32:56 -0400, Gabor Grothendieck wrote: On Tue, May 15, 2012 at 11:20 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Tue, May 15, 2012 at 8:08 PM, Francisco Mora Ardila fm...@oikos.unam.mx wrote: Hi all, I have fitted a model using the nls function to these data: x [1] 1 0 0 4 3 5 12 10 12 100 100 100 y [1] 1.281055090 1.563609934 0.001570796 2.291579783 0.841891853 [6] 6.553951324 14.243274230 14.519899320 15.066473610 21.728809880 [11] 18.553054450 23.722637370 The model fitted is: modellogis <- nls(y~SSlogis(x,a,b,c)) It runs OK. Then I calculate confidence intervals for the actual data using: dataci <- predict(as.lm(modellogis), interval = "confidence") But I don't get smooth curves when plotting it, so I want to get other confidence vectors based on a new x vector, by defining new data for the predictions: x0 <- seq(0,15,1) dataci <- predict(as.lm(modellogis), newdata=data.frame(x=x0), interval = "confidence") But it does not work: I get the same initial confidence interval. Any ideas on how to get confidence and prediction intervals for new x data from a previously fitted model? as.lm is a linear model between the response variable and the gradient of the nonlinear model, and as we see below, x is not part of that linear model, so x can't be in newdata when predicting from the tangent model. We can only make predictions at the original x points. For other x's we could use interpolation. See ?approx (?spline can also work in smooth cases, but in the example provided the function has a kink, and that won't work well with splines.)
as.lm(modellogis)$model y a b c (offset) 1 1.281055090 0.06601796 -4.411829e-01 1.168928e+00 1.397153 2 1.563609934 0.04798815 -3.268846e-01 9.766080e-01 1.015584 3 0.001570796 0.04798815 -3.268846e-01 9.766080e-01 1.015584 4 2.291579783 0.16311227 -9.767241e-01 1.597189e+00 3.451981 5 0.841891853 0.12203013 -7.665928e-01 1.512752e+00 2.582551 6 6.553951324 0.21464369 -1.206154e+00 1.564573e+00 4.542552 7 14.243274230 0.74450055 -1.361047e+00 -1.455630e+00 15.756031 8 14.519899320 0.59707858 -1.721353e+00 -6.770205e-01 12.636107 9 15.066473610 0.74450055 -1.361047e+00 -1.455630e+00 15.756031 10 21.728809880 1. -2.943955e-13 -9.073765e-12 21.163223 11 18.553054450 1. -2.943955e-13 -9.073765e-12 21.163223 12 23.722637370 1. -2.943955e-13 -9.073765e-12 21.163223 I have added a FAQ to the home page, since this isn't the first time this question has come up: http://nls2.googlecode.com#FAQs -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com -- Francisco Mora Ardila Doctoral student, Centro de Investigaciones en Ecosistemas, Universidad Nacional Autónoma de México
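Gabor suggests interpolating the tangent-model intervals. There is an alternative, which this thread does not propose. The same linearization (the delta method) can be applied directly at new x values. Compute the gradient of the logistic curve with respect to its parameters analytically, then combine it with vcov() of the fit to get pointwise standard errors. This sketch uses the data from the post and assumes the nls() fit converges, as it did for Francisco. The parameters are named Asym, xmid and scal (the SSlogis argument names) instead of a, b, c. logis_ci() is a hypothetical helper.

```r
x <- c(1, 0, 0, 4, 3, 5, 12, 10, 12, 100, 100, 100)
y <- c(1.281055090, 1.563609934, 0.001570796, 2.291579783, 0.841891853,
       6.553951324, 14.243274230, 14.519899320, 15.066473610, 21.728809880,
       18.553054450, 23.722637370)
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal))

# Pointwise confidence band at arbitrary new x via the delta method
logis_ci <- function(fit, newx, level = 0.95) {
  cf <- coef(fit)
  Asym <- cf[["Asym"]]; xmid <- cf[["xmid"]]; scal <- cf[["scal"]]
  e <- exp((xmid - newx) / scal)
  D <- 1 + e
  p <- Asym / D  # fitted logistic curve, as in SSlogis
  # Analytic gradient w.r.t. (Asym, xmid, scal), same order as coef(fit)
  G <- cbind(Asym = 1 / D,
             xmid = -Asym * e / (scal * D^2),
             scal = Asym * e * (xmid - newx) / (scal^2 * D^2))
  se <- sqrt(rowSums((G %*% vcov(fit)) * G))  # sqrt(diag(G V G'))
  tq <- qt(1 - (1 - level) / 2, df.residual(fit))
  cbind(x = newx, fit = p, lwr = p - tq * se, upr = p + tq * se)
}

x0 <- seq(0, 15, by = 0.5)
band <- logis_ci(fit, x0)
head(band)
```

At the original x values this should match the tangent-model intervals from as.lm(). Between those points, it evaluates the linearized band directly instead of interpolating.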
Re: [R] Error on easy way for JoSAE Package
On May 16, 2012, at 1:33 AM, ana24maria wrote: Thank you very much. After using dput and the easy way (result <- eblup.mse.f.wrap(domain.data = amigo, lme.obj = fit.lme)), I got the following error: Error in `[.data.frame`(sample.data, , variabs) : undefined columns selected What John was asking you to do was just to type this at your console: dput(amigo) ... and then copy the output into an email and send that to the list. Your first posting had data that was ambiguous as to content, as well as mangled by the various email clients and servers that processed it on the path to our eyes. What should I do? You should also read the Posting Guide: http://www.R-project.org/posting-guide.html -- David Winsemius, MD West Hartford, CT
Re: [R] finding mean and SD for a log-normal distribution
On 16.05.2012 12:37, Andras Farkas wrote: Dear R Expert, allow me to ask a quick question: I have a mean value of 6 and an SD of 3 describing my distribution. I would like to convert this distribution into a log-normal distribution that would best describe it when resimulated using a log-normal distribution. Currently I am using other software to estimate the respective mean and SD on the log scale, and the results are: 1.6667 and SD 0.47071. Then, to best reproduce my original distribution in R, I use the following commands: c <- rlnorm(5000,1.6667,0.47071) d <- exp(c) mean(c) sd(c) and the results for mean and SD are 5.92 and 2.94 (original 6 and 3), respectively, which I am reasonably happy with. I would like to become independent of the other software I use, but am unable to figure out how to generate the values of 1.6667 and 0.47071 using R. Could someone please help me with this question? Just make use of a textbook: meanlog <- log(6) - 0.5 * log(1 + 9/(6^2)) sdlog <- sqrt(log(1 + 9/(6^2))) Uwe Ligges thanks, Andras
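Uwe's formulas are the standard moment relations for the log-normal distribution. If the variable has mean m and standard deviation s, then sdlog^2 = log(1 + s^2/m^2) and meanlog = log(m) - sdlog^2/2. lnorm_params() below is a hypothetical helper that wraps them. For m = 6, s = 3 they give meanlog of about 1.680 and sdlog of about 0.472. These differ slightly from the 1.6667 and 0.47071 reported by the other software, whose method the thread does not describe. Note also that rlnorm() already returns values on the original scale, so no exp() step is needed after it.

```r
# Method-of-moments conversion from the mean and SD of a log-normal variable
# to the meanlog/sdlog parameters that rlnorm() expects (Uwe's formulas)
lnorm_params <- function(m, s) {
  sigma2 <- log(1 + s^2 / m^2)
  c(meanlog = log(m) - sigma2 / 2, sdlog = sqrt(sigma2))
}

p <- lnorm_params(6, 3)
p  # meanlog about 1.680, sdlog about 0.472

set.seed(1)
sims <- rlnorm(5000, p[["meanlog"]], p[["sdlog"]])
c(mean = mean(sims), sd = sd(sims))  # close to 6 and 3, up to sampling noise
```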
Re: [R] code to iterate function apply to matrix
On 16.05.2012 08:11, umai88 wrote: I got this code below and I want to repeat the loop 100 times.. And what is the problem? What are you aiming at? Uwe Ligges x <- rnorm(60) mat1 <- matrix(x,nrow=15,ncol=4) trim <- numeric(ncol(mat1)) win <- numeric(ncol(mat1)) ssd <- numeric(ncol(mat1)) for(j in 1:ncol(mat1)) { n=length(mat1[,j]) alpha=0.1 k=floor(alpha*n)+1 r=k-(alpha*n) i=k+1 m=n-k y1 <- sort(mat1[,j]) y <- y1[i:m] x.low=(1-r)*y1[k+1]+r*y1[k] x.upp=(1-r)*y1[n-k]+r*y1[n-k+1] trim[j] =1/((1-2*alpha)*n)*(sum(y)+r*(y1[k]+y1[n-k+1])) win[j]=1/n*(sum(y)+k*(x.low+x.upp)) ssd[j] <- sum((y-win[j])**2)+k*( (y1[k+1]-win[j])**2 + (y1[n-k]-win[j])**2 ) } trim.mean <- matrix(trim, nrow=1) win.mean <- matrix(win, nrow=1) sum.sq.dev <- matrix(ssd, nrow=1)
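Uwe's question stands, because the goal of the repetition is not stated. If the aim is to rerun the same computation on 100 freshly simulated 15 x 4 matrices, one way is to wrap the loop body in a function and call it through replicate(). Drawing new data in every repetition is an assumption about what is wanted. The formulas below are transcribed unchanged from the post. trim_win_stats() is a hypothetical name.

```r
# The poster's per-column computations wrapped in a function (alpha = 0.1 as in the post).
# Returns a 3 x ncol(mat) matrix with rows trim, win, ssd.
trim_win_stats <- function(mat, alpha = 0.1) {
  one_col <- function(v) {
    n <- length(v)
    k <- floor(alpha * n) + 1
    r <- k - alpha * n
    y1 <- sort(v)
    y <- y1[(k + 1):(n - k)]
    x.low <- (1 - r) * y1[k + 1] + r * y1[k]
    x.upp <- (1 - r) * y1[n - k] + r * y1[n - k + 1]
    trim <- (sum(y) + r * (y1[k] + y1[n - k + 1])) / ((1 - 2 * alpha) * n)
    win <- (sum(y) + k * (x.low + x.upp)) / n
    ssd <- sum((y - win)^2) + k * ((y1[k + 1] - win)^2 + (y1[n - k] - win)^2)
    c(trim = trim, win = win, ssd = ssd)
  }
  apply(mat, 2, one_col)
}

set.seed(123)
# 100 repetitions, each on a fresh 15 x 4 matrix of N(0, 1) draws
sims <- replicate(100, trim_win_stats(matrix(rnorm(60), nrow = 15, ncol = 4)))
dim(sims)                   # statistic x column x repetition
apply(sims, c(1, 2), mean)  # average of each statistic per column over the runs
```

replicate() stacks the 3 x 4 results into a 3 x 4 x 100 array, so any summary across repetitions is a single apply() call.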
Re: [R] How to use the value of rect to determine the location of legend
On 16.05.2012 02:13, Gundala Viswanath wrote: Given the attached plot, Nothing came through. How can I position the centre text with Mean and SD so that it is placed exactly under "emp."? The current code I have is this:
L = list(bquote(Em.Mean == .(new_avg)), bquote(Em.SD == .(new_std)), bquote(Th.Mean == .(theor_avg)), bquote(Th.SD == .(theor_sd)))
Not reproducible. Uwe Ligges
legend("topright", c(kids, "emp."), cex = 0.7, bty = "n", col = c(cm.colors(6), "red"), pch = c(rep(19, 6), -5), lty = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0), )
# How can I position this one?
legend("topcenter", cex = 0.5, bty = "n", legend = sapply(L, as.expression))
-G.V.
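legend() invisibly returns a list whose rect component holds the box's width w, height h, and its left and top coordinates, which is what the subject line hints at. A sketch with made-up plot data and placeholder statistics (new_avg, new_std, theor_avg and theor_sd stand in for the poster's variables): the second legend is centred horizontally on the first box and starts just below it. Note that "topcenter" is not one of legend()'s position keywords ("top" is).

```r
set.seed(1)
plot(rnorm(50), rnorm(50), xlim = c(-3, 3), ylim = c(-3, 3))

# Placeholder statistics standing in for the poster's variables
new_avg <- 0.02; new_std <- 1.01; theor_avg <- 0; theor_sd <- 1
L <- list(bquote(Em.Mean == .(new_avg)), bquote(Em.SD == .(new_std)),
          bquote(Th.Mean == .(theor_avg)), bquote(Th.SD == .(theor_sd)))

# First legend: keep its (invisible) return value
leg1 <- legend("topright", legend = c(paste("kid", 1:6), "emp."),
               col = c(cm.colors(6), "red"), pch = c(rep(19, 6), NA),
               lty = c(rep(0, 6), 1), cex = 0.7, bty = "n")
r <- leg1$rect  # list(w, h, left, top) in user coordinates

# Second legend: centred under the first box (xjust = 0.5),
# with its top edge at the bottom of the first box (yjust = 1)
leg2 <- legend(x = r$left + r$w / 2, y = r$top - r$h,
               legend = sapply(L, as.expression),
               xjust = 0.5, yjust = 1, cex = 0.5, bty = "n")
```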
Re: [R] caret: Error when using rpart and CV != LOOCV
More information is needed to be sure, but it is most likely that some of the resampled rpart models produce the same prediction for every hold-out sample (likely the result of no viable split being found). Almost every incarnation of R^2 requires the variance of the predictions, so this particular failure mode results in a division by zero. Try using your own summary function (see ?trainControl) and put a print(summary(data$pred)) in there to verify my claim. Max On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote: Hi, I have the following problem when trying to build an rpart model with any resampling method except LOOCV. Originally I wanted to use k-fold partitioning, but every partitioning except LOOCV throws the following warning: Warning message: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, : There were missing values in resampled performance measures. Below are some simplified test cases which reproduce the warning on my system. Question: What does this error mean? How can I avoid it?
System information:
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rpart_3.1-52 caret_5.15-023 foreach_1.4.0 cluster_1.14.2 reshape_0.8.4
[6] plyr_1.7.1 lattice_0.20-6
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
[5] tools_2.15.0
--- Simplified test case I: throws the warning ---
library(caret)
data(trees)
formula = Volume ~ Girth + Height
train(formula, data = trees, method = 'rpart')
--- Simplified test case II: every other CV method also throws the warning, for example 'cv': ---
library(caret)
data(trees)
formula = Volume ~ Girth + Height
tc = trainControl(method = 'cv')
train(formula, data = trees, method = 'rpart', trControl = tc)
--- Simplified test case III: the only CV method that works is 'LOOCV': ---
library(caret)
data(trees)
formula = Volume ~ Girth + Height
tc = trainControl(method = 'LOOCV')
train(formula, data = trees, method = 'rpart', trControl = tc)
---
Thanks! -- Dominik Bruhn mailto: domi...@dbruhn.de
-- Max
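Max's diagnosis can be illustrated without caret: when a resampled model predicts the same value for every hold-out sample, the predictions have zero variance and the correlation-based R^2 is NA. The constant prediction below is simulated, and printingSummary is a hypothetical name for the custom summary function he suggests; plugging it into train() assumes caret's trainControl(summaryFunction = ...) and defaultSummary(data, lev, model) as in current caret versions.

```r
# Hold-out responses and a constant prediction, as a tree with no split
# (a stump predicting the training mean) would produce.
obs  <- head(trees$Volume)
pred <- rep(mean(obs), length(obs))

# Zero variance in the predictions makes cor(), and hence R^2, undefined
r2 <- suppressWarnings(cor(obs, pred)^2)
print(r2)

# Custom summary function: print the spread of the predictions in each
# resample, then fall back to caret's default RMSE / R^2 computation.
printingSummary <- function(data, lev = NULL, model = NULL) {
  print(summary(data$pred))
  caret::defaultSummary(data, lev, model)
}
# Usage (requires caret):
# tc <- trainControl(method = "cv", summaryFunction = printingSummary)
# train(Volume ~ Girth + Height, data = trees, method = "rpart", trControl = tc)
```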
Re: [R] Need your help setting $R_check_force_suggests = FALSE on Windows system
On 15.05.2012 19:23, Zhiqiu Hu wrote: Dear friends, I want to make the following change to the R settings on a Windows 7 desktop: $R_check_force_suggests = FALSE. You can change it globally in the operating system's defaults for environment variables, or, for the current session in the Windows command shell (cmd), you can simply say set _R_CHECK_FORCE_SUGGESTS_=FALSE Note the underscores and the upper-case spelling! Uwe Ligges Since I have no experience with Unix, I don't know how to make the suggestions in "Writing R Extensions" work on Windows. I would appreciate it if you could help me figure out the Windows equivalent of the following settings. *** In addition to the available command line options, R CMD check also allows customization by setting (Perl) configuration variables in a configuration file, the location of which can be specified via the --rcfile option and defaults to $HOME/.R/check.conf provided that the environment variable HOME is set. The following configuration variables are currently available. $R_check_force_suggests If true, give an error if suggested packages are not available. Default: true. *** Installation paths on my desktop: C:\Rtools C:\Program Files\R\R-2.15.0\bin\x64 Thank you very much. Noah
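The same variable can also be set from inside an R session with Sys.setenv(); processes started from that session, such as an R CMD check launched via system(), then inherit it. A sketch; the tarball name is a hypothetical placeholder:

```r
# Note the leading and trailing underscores and the upper-case spelling.
Sys.setenv("_R_CHECK_FORCE_SUGGESTS_" = "FALSE")
Sys.getenv("_R_CHECK_FORCE_SUGGESTS_")

# Child processes started from this session inherit the setting, e.g.
# system("R CMD check mypkg_1.0.tar.gz")   # 'mypkg_1.0.tar.gz' is hypothetical
```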
[R] kolmogorov-Smirnov critical values
Hi! Does anyone know how to obtain critical values for the K-S statistic using R? Thanks, Alex
[R] Merging multiple data sets
Hello R users, I have four data sets in the directory "D:/Bharat Warule/Rdata_file": output_data_prod_1.rda, output_data_prod_2.rda, output_data_prod_3.rda and output_data_prod_4.rda. Each data set is large, with about 343297 rows and close to 50 columns. For example:
x1 <- data.frame(x11=c(1,2,3,4,5),x112=c(10,10,10,10,10))
x2 <- data.frame(x11=c(1,2,3,4,5),x122=c(20,20,20,20,20))
x3 <- data.frame(x11=c(1,2,3,4,5),x132=c(30,30,30,30,30))
x4 <- data.frame(x11=c(1,2,3,4,5),x142=c(40,40,40,40,40))
x5 <- data.frame(x11=c(1,2,3,4,5),x152=c(50,50,50,50,50))
for(i in 1:5){
name <- paste('x',i,sep='')
name1 <- paste(name,"rda",sep='.')
save(name, file = name1)
}
I want to merge these data sets into one data set, but I don't know where I am going wrong. Please help me. Thanks for your help.
subsetname <- "x1"
file_no <- 4
output_data_prod <- data.frame()
for(n in 1:file_no){
myfile <- gsub("( )", "", paste(subsetname , "_", n,".rda"))
temp_data <- load(file = myfile)
data_22 <- get(temp_data)
if(dim(output_data_prod)[1]==0){output_data_prod <- data_22
}else{
output_data_prod <- merge(inData1 = output_data_prod, inData2 = data_22 ,type = "inner", all=FALSE , by =c("x11"))}
}
Bharat Warule, Cypress Analytica, Pune
[R] Hmisc improveProb() and PredictABEL reclassification () function and continuous NRI
Dear Sirs, I am working with the R packages Hmisc and PredictABEL to estimate the NRI from my Cox models with and without a specific biomarker. According to Pencina et al. (Statistics in Medicine 2010, DOI: 0.1002/sim.4085), a continuous, category-free NRI (NRI(>0)) should be used when there is no obvious reason to categorize risk, for example for the risk of future cardiovascular events in patients with established cardiovascular disease. My question is therefore: which value(s) in the output of Hmisc or PredictABEL should be used as the continuous NRI? Does the continuous NRI equal the total NRI in the output? Yours sincerely, Gard Frodahl T. Svingen, PhD student, University of Bergen, Bergen, Norway
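For a binary outcome, the category-free NRI(>0) counts any increase or decrease in predicted risk: NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]. The sketch below computes it directly from two vectors of predicted probabilities so that it can be compared with package output. It ignores censoring, so for Cox models it only applies to risks evaluated at a fixed time point with known event status; the function name and the toy data are hypothetical.

```r
# Hypothetical helper: category-free (continuous) NRI for a binary outcome.
# p_old, p_new: predicted risks without / with the new marker; y: 0/1 outcome.
nri_continuous <- function(p_old, p_new, y) {
  up   <- p_new > p_old
  down <- p_new < p_old                        # ties count as neither up nor down
  ev   <- y == 1
  nri_ev <- mean(up[ev]) - mean(down[ev])      # events: moving up is good
  nri_ne <- mean(down[!ev]) - mean(up[!ev])    # non-events: moving down is good
  c(nri = nri_ev + nri_ne, nri.ev = nri_ev, nri.ne = nri_ne)
}

# Toy data: both events move up, both non-events move down
y     <- c(1, 1, 0, 0)
p_old <- c(0.2, 0.5, 0.3, 0.4)
p_new <- c(0.3, 0.6, 0.2, 0.3)
nri_continuous(p_old, p_new, y)
```

The category-free NRI therefore ranges from -2 to 2, which is one way to tell it apart from a category-based NRI in package output.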
Re: [R] vector w/o arithmetic addition for boxplot
On 15.05.2012 23:47, rl269 wrote: Hello, I am having trouble getting R to treat individual numeric vectors as separate groups in a box plot of the residuals of a linear regression. It performs arithmetic addition on the 16 individual variables for which I want individual box plots. I have 16 race*treatment variables that were created from cleaned data frames for race and treatment independently: t1W, t1B ... t4W, t4B, t4H, t4O. class(t1W) returns "numeric" (1000 observations of 1s and 0s). To create the box plot I am using boxplot(residuals(IRR) ~ treatRace_clean), where I have tried treatRace_clean as both of the following: treatRace_clean <- as.factor(as.vector(t1W + t1B + t1H + t1O + t2W + t2B + t2H + t2O + t3W + t3B + t3H + t3O + t4W + t4B + t4H + t4O)) treatRace_clean <- as.vector(c(t1W, t1B, t1H, t1O, t2W, t2B, t2H, t2O, t3W, t3B, t3H + t3O, t4W, t4B, t4H, t4O)) Actually, I have no idea what you are really aiming at; reproducible code and a precise description would help a lot. Uwe Ligges However, I keep getting this error message: Error: $ operator is invalid for atomic vectors Thoughts?
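Adding the 0/1 indicator variables collapses them into a single 0/1 vector, which is why the separate groups disappear. One alternative, assuming every observation has exactly one indicator equal to 1, is to bind the indicators into a matrix and use the name of the column holding the 1 as the group label. The data below are simulated, with only four groups standing in for the sixteen:

```r
set.seed(1)
groups <- c("t1W", "t1B", "t2W", "t2B")    # stand-ins for the 16 race*treatment groups
truth  <- sample(groups, 40, replace = TRUE)

# One 0/1 indicator column per group, like t1W, t1B, ... in the question
ind <- sapply(groups, function(g) as.numeric(truth == g))

# The name of the column holding the 1 in each row becomes the factor level
treatRace <- factor(colnames(ind)[max.col(ind, ties.method = "first")],
                    levels = groups)

resid <- rnorm(40)                         # placeholder for residuals(IRR)
boxplot(resid ~ treatRace)                 # one box per group
```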
[R] clusters in zero-inflated negative binomial models
Dear all, I want to build a model in R based on animal collection data that look like the following:
Nr Village District Site Survey Species Count
1  AX      A        F    Dry    B       0
2  AY      A        V    Wet    A       5
3  BX      B        F    Wet    B       1
4  BY      B        V    Dry    B       0
Each data point is one collection unit in a certain Village, District, Site and Survey for a certain Species. 'Count' is the number of animals collected in that unit. Zero animals may be collected in a unit because of very low densities, but also because of climatic conditions (wind, rain, etc.), so we would expect an excess of zeroes. I have verified that the data are overdispersed (variance much larger than the mean), so a zero-inflated negative binomial model seems the most suitable model in this case. To be sure, I will compare the zero-inflated model with the standard negative binomial model using the Vuong test. The models will be fitted for each species separately. For these models I can use glm.nb() (from MASS) and zeroinfl() from the package pscl, something like this (after selecting the subset B <- subset(data, Species == "B")): NB = glm.nb(formula = Count ~ District + Site + Survey, data = B) ZINB = zeroinfl(formula = Count ~ District + Site + Survey, dist = "negbin", data = B) vuong(NB, ZINB) I have tried this and it works very elegantly. However, the animal collections were only done in 4 districts, and in each district 3 villages were chosen (a total of 12 villages). This should be included in the design. The survey package allows this for the standard negative binomial model, but it seems to me that it is not possible for the zero-inflated NB. So my question is twofold: 1. Is a zero-inflated NB possible in the survey package? If yes, how? 2. If not, how can I build a zero-inflated NB model that takes into account the clustering of the observations (animal counts) in villages and the clustering of the villages in districts? Thank you very much for the help.
Re: [R] Help needed for efficient way to loop through rows and columns
Hello, Your data.frame is composed exclusively of factors, but try this (I've changed the name to 'sampl', because 'sample' is an R function):
# logical index vectors
iA <- sampl$AorB == "A"
iB <- sampl$AorB == "B"
new.sampl <- data.frame( apply(sampl, 2, function(x){
  iAA <- x == "AA"
  iBB <- x == "BB"
  x[ iA & iAA ] <- 2
  x[ iA & iBB ] <- 0
  #
  x[ iB & iAA ] <- 0
  x[ iB & iBB ] <- 2
  #
  x[ x %in% c("AB", "BA") ] <- 1
  x} ))
Hope this helps, Rui Barradas
Priya Bhatt wrote: Dear R-helpers: I am trying to write a script that iterates through a data frame that looks like this. Example dataset called sample:
names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AB", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "BB", "AB", NA)
AorB <- c("A", "A", "A", "B")
sample <- data.frame(names, X, Y, Z, AorB)
For a given row, if AorB == "A", then AA = 2, AB = 1, BA = 1, BB = 0; if AorB == "B", then AA = 0, AB = 1, BA = 1, BB = 2. I've been trying to write this using apply and ifelse statements in the hope that my code runs quickly, but I'm afraid I've made a big mess. See below:
apply(sample, 1, function(i) { ifelse(sample$AorB[i] == "A", (ifelse(sample[i,] == "AA", sample[i,] <- 2 , ifelse(sample[i,] == "AB" || sample[i,] == "BA" , sample[i,] <- 1, ifelse(sample[i,] == "BB", sample[i,] <- 0, sample[i,] <- NA )) ) ) , ifelse(sample$AorB[i,] == "B"), (ifelse(sample[i,] == "AA", sample[i,] <- 0 , ifelse(sample[i,] == "AB" || sample[i,] == "BA" , sample[i,] <- 1, ifelse(sample[i,] == "BB", sample[i,] <- 2, sample[i,] <- NA) })
Any advice?
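As an alternative to the logical indexing in the reply above, a pair of named lookup vectors can do the recoding with one step per column. A sketch on the example data from the question (stored as factors, as in 2012-era data.frame() defaults); converting to character first matters because indexing a named vector with a factor uses the integer codes rather than the labels. The names score_A and score_B are illustrative.

```r
sampl <- data.frame(names = c("S1", "S2", "S3", "S4"),
                    X = c("BB", "AB", "AB", "AA"),
                    Y = c("BB", "BB", "AB", "AA"),
                    Z = c("BB", "BB", "AB", NA),
                    AorB = c("A", "A", "A", "B"),
                    stringsAsFactors = TRUE)   # factors, as in the question

# Scores depend on whether the row is of type "A" or "B"
score_A <- c(AA = 2, AB = 1, BA = 1, BB = 0)
score_B <- c(AA = 0, AB = 1, BA = 1, BB = 2)

new.sampl <- sampl
for (col in c("X", "Y", "Z")) {
  g <- as.character(sampl[[col]])          # labels, not factor codes; NA stays NA
  new.sampl[[col]] <- ifelse(sampl$AorB == "A", score_A[g], score_B[g])
}
new.sampl
```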
Re: [R] kolmogorov-Smirnov critical values
On Wed, May 16, 2012 at 9:52 AM, aramos ara...@fep.up.pt wrote: Hi! Does anyone know how to obtain critical values for the K-S statistic using R? Take a look at ?ks.test and at the source code of ks.test to see how R does it. OSS (open-source software) is super helpful for these sorts of things. Michael Thanks, Alex
[R] install ggplot2 package
Has anyone tried to install the ggplot2 package recently? I tried to install it on my new system and ran into trouble:
install.packages("ggplot2")
Installing package(s) into 'C:/Program Files/R/R-2.14.2/library' (as 'lib' is unspecified)
also installing the dependency 'scales'
trying URL 'http://cran.case.edu/bin/windows/contrib/2.14/scales_0.2.0.zip'
Warning in install.packages : cannot open: HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'http://cran.case.edu/bin/windows/contrib/2.14/scales_0.2.0.zip'
Warning in install.packages : download of package 'scales' failed
trying URL 'http://cran.case.edu/bin/windows/contrib/2.14/ggplot2_0.9.0.zip'
Warning in install.packages : cannot open: HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'http://cran.case.edu/bin/windows/contrib/2.14/ggplot2_0.9.0.zip'
Warning in install.packages : download of package 'ggplot2' failed
Thanks, Ming
Ming Yang, PhD, Xerox Research Center Webster, 800 Phillips Rd (MS:0147-11B), Webster, NY 14580, Ph: (585) 422-2375, Fx: (585) 231-8404
Re: [R] double buffering in windows() not working
Fine for me, and I cannot investigate anything since not a single piece of reproducible code was given. Uwe Ligges On 15.05.2012 23:20, Daniel Carr wrote: I have double-buffered animations that I show in class. They used to work but now flash. The default windows() option is buffered = TRUE. Just in case, I tried windows(buffered = TRUE), but this made no difference. I am not sure when the change occurred. An older R 2.11 installation in one classroom worked; R 2.14.1, R 2.14.2 and R 2.15 don't work on my computer. Some of the animations add to the plot, for example using points and segments. I thought that might be triggering the buffer swap, but just drawing filled circles causes flashing. I am using XP and the 32-bit version. I think I tried it with Windows 7 and still had the problem. My RSeek search did not turn up any recent problem related to buffering. Thanks in advance for any help, Dan
Re: [R] rpart - predict terminal nodes for new observations
On 15.05.2012 16:30, tudor wrote: Dear useRs: Is there a way to predict the terminal node associated with a new data entry in an rpart environment? In the example below, if I had a new data entry with an AM of 5, I would like to link it to terminal node 2. My searches led to http://tolstoy.newcastle.edu.au/R/e4/help/08/07/17702.html but I do not seem to be able to operationalize Professor Ripley's suggestions. Use the predict() function. Uwe Ligges Many thanks. Tudor
tree.prune
n= 2400
node), split, n, loss, yval, (yprob)
      * denotes terminal node
1) root 2400 779 0 (0.6754167 0.3245833)
  2) AM< 6.5 1428 254 0 (0.8221289 0.1778711) *
  3) AM>=6.5 972 447 1 (0.4598765 0.5401235)
    6) P>=10.39666 390 86 0 (0.7794872 0.2205128) *
    7) P< 10.39666 582 143 1 (0.2457045 0.7542955) *
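For a classification tree, predict() returns class probabilities or classes rather than node numbers. One commonly used workaround, sketched below on the kyphosis data shipped with rpart, is to overwrite the yval column of the fitted object's frame with the node numbers (the row names of frame) and then request type = "vector", which returns frame$yval for the leaf each row falls into. This relies on rpart internals rather than a documented interface, so treat it as an assumption to verify against your rpart version; predict_node is a hypothetical name.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Hypothetical helper: terminal node number for each row of newdata.
# Works on a local copy of 'fit', so the original object is unchanged.
predict_node <- function(fit, newdata) {
  fit$frame$yval <- as.numeric(rownames(fit$frame))  # node numbers as "predictions"
  predict(fit, newdata = newdata, type = "vector")
}

new_obs <- data.frame(Age = c(5, 120), Number = c(3, 5), Start = c(14, 3))
predict_node(fit, new_obs)
```

On the training data the result should agree with the node numbers implied by fit$where.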
Re: [R] Merging multiple data sets
On 16.05.2012 15:51, Bharat Warule wrote: Hello R users, I have four data sets in the directory "D:/Bharat Warule/Rdata_file": output_data_prod_1.rda, output_data_prod_2.rda, output_data_prod_3.rda and output_data_prod_4.rda. Each data set is large, with about 343297 rows and close to 50 columns. For example:
x1 <- data.frame(x11=c(1,2,3,4,5),x112=c(10,10,10,10,10))
x2 <- data.frame(x11=c(1,2,3,4,5),x122=c(20,20,20,20,20))
x3 <- data.frame(x11=c(1,2,3,4,5),x132=c(30,30,30,30,30))
x4 <- data.frame(x11=c(1,2,3,4,5),x142=c(40,40,40,40,40))
x5 <- data.frame(x11=c(1,2,3,4,5),x152=c(50,50,50,50,50))
for(i in 1:5){
name <- paste('x',i,sep='')
name1 <- paste(name,"rda",sep='.')
save(name, file = name1)
To fix this part, use: save(list = name, file = name1)
}
I want to merge these data sets into one data set, but I don't know where I am going wrong. Please help me. Thanks for your help.
subsetname <- "x1"
file_no <- 4
output_data_prod <- data.frame()
for(n in 1:file_no){
myfile <- gsub("( )", "", paste(subsetname , "_", n,".rda"))
To match the above: myfile <- gsub("( )", "", paste(subsetname, n, ".rda", sep = ""))
temp_data <- load(file = myfile)
data_22 <- get(temp_data)
if(dim(output_data_prod)[1]==0){output_data_prod <- data_22
}else{
output_data_prod <- merge(inData1 = output_data_prod,
Nonsense, the arguments of merge are called x and y rather than inData1 and inData2. Uwe Ligges
inData2 = data_22 ,type = "inner", all=FALSE , by =c("x11"))}
}
Bharat Warule, Cypress Analytica, Pune
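Putting Uwe's corrections together: save each data frame under its own name with save(list = ...), load every file into a fresh environment, and merge them all on x11 with Reduce() and the real merge() arguments. A sketch that writes to tempdir() instead of the poster's directory; the helper name read_rda is hypothetical:

```r
x1 <- data.frame(x11 = c(1, 2, 3, 4, 5), x112 = c(10, 10, 10, 10, 10))
x2 <- data.frame(x11 = c(1, 2, 3, 4, 5), x122 = c(20, 20, 20, 20, 20))
x3 <- data.frame(x11 = c(1, 2, 3, 4, 5), x132 = c(30, 30, 30, 30, 30))
x4 <- data.frame(x11 = c(1, 2, 3, 4, 5), x142 = c(40, 40, 40, 40, 40))
x5 <- data.frame(x11 = c(1, 2, 3, 4, 5), x152 = c(50, 50, 50, 50, 50))

dir <- tempdir()
for (i in 1:5) {
  name <- paste0("x", i)
  # list = name saves the object called "x1", not the string "x1"
  save(list = name, file = file.path(dir, paste0(name, ".rda")))
}

# load() returns the names of the objects it restored
read_rda <- function(path) {
  env <- new.env()
  obj <- load(path, envir = env)
  get(obj[1], envir = env)
}

files  <- file.path(dir, paste0("x", 1:5, ".rda"))
# Inner join (merge's default all = FALSE) of all files on the key column
merged <- Reduce(function(a, b) merge(a, b, by = "x11"), lapply(files, read_rda))
merged
```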
Re: [R] kolmogorov-Smirnov critical values
On 16.05.2012 15:52, aramos wrote: Hi! Does anyone know how to obtain critical values for the K-S statistic using R? ks.test(.)$statistic Uwe Ligges Thanks, Alex
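ks.test(.)$statistic is the observed statistic D, which is what gets compared with a critical value. For the one-sample test, a commonly used large-sample approximation of the critical value is sqrt(-log(alpha/2)/2)/sqrt(n) (about 1.358/sqrt(n) at alpha = 0.05); for small n, a Monte Carlo quantile of D under the null is a simple alternative. A sketch of both; the function names are hypothetical and the simulation assumes a fully specified continuous null, here N(0,1):

```r
# Asymptotic critical value for the one-sample K-S statistic D_n
ks_crit_asymptotic <- function(n, alpha = 0.05) {
  sqrt(-log(alpha / 2) / 2) / sqrt(n)
}

# Monte Carlo critical value: (1 - alpha) quantile of D simulated under H0
ks_crit_mc <- function(n, alpha = 0.05, nsim = 2000) {
  d <- replicate(nsim, ks.test(rnorm(n), "pnorm")$statistic)
  unname(quantile(d, 1 - alpha))
}

ks_crit_asymptotic(50)   # about 0.192
set.seed(1)
ks_crit_mc(50)
```

Reject H0 at level alpha when the observed D exceeds the critical value.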
Re: [R] install ggplot2 package
It looks like there might be a mirror problem; use chooseCRANmirror() to select a different mirror. Best, Michael On Wed, May 16, 2012 at 10:21 AM, Yang, Ming ming.y...@xerox.com wrote: Has anyone tried to install the ggplot2 package recently? I tried to install it on my new system and ran into trouble: install.packages("ggplot2") ... Warning in install.packages : download of package 'ggplot2' failed Thanks, Ming
Re: [R] install ggplot2 package
Looks like your mirror was in an inconsistent state. It seems to have been fixed by a completed rsync in the meantime ... Uwe Ligges On 16.05.2012 16:21, Yang, Ming wrote: Has anyone tried to install the ggplot2 package recently? I tried to install it on my new system and ran into trouble: install.packages("ggplot2") ... Warning in install.packages : download of package 'ggplot2' failed Thanks, Ming
Re: [R] install ggplot2 package
Hi Yang, Did you try a different CRAN mirror? Best, Ista On Wed, May 16, 2012 at 10:21 AM, Yang, Ming ming.y...@xerox.com wrote: Has anyone tried to install the ggplot2 package recently? I tried to install it on my new system and ran into trouble: install.packages("ggplot2") ... Warning in install.packages : download of package 'ggplot2' failed Thanks, Ming
[R] simple data.frame question
Dear friends - I hope you will forgive me another simple question, illustrated by:
ID <- c(1,1,1,2,2,3,3,3)
PERIOD <- c(1,2,3,2,3,1,2,3)
X <- runif(8,0,10)
FF <- data.frame(ID=ID,PERIOD=PERIOD,X=X)
I need a row with X = NA inserted as the fourth row, so that ID and PERIOD become 1,1,1,2,2,2,3,3,3 and 1,2,3,1,2,3,1,2,3, respectively. How can I use the pattern in ID and PERIOD to find the missing X and insert NA? Best wishes, Troels Ring, Aalborg, Denmark
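One way to do this is to build the complete ID x PERIOD grid with expand.grid() and left-join the data onto it with merge(all.x = TRUE); any combination missing from FF then gets X = NA. A sketch with the question's data (a seed is added so the run is reproducible):

```r
set.seed(1)
ID     <- c(1, 1, 1, 2, 2, 3, 3, 3)
PERIOD <- c(1, 2, 3, 2, 3, 1, 2, 3)
X      <- runif(8, 0, 10)
FF     <- data.frame(ID = ID, PERIOD = PERIOD, X = X)

# Every ID/PERIOD combination that should exist
full <- expand.grid(PERIOD = sort(unique(FF$PERIOD)), ID = sort(unique(FF$ID)))

# Left join: rows of 'full' without a match in FF get X = NA;
# merge() sorts the result by ID, then PERIOD
FF2 <- merge(full[, c("ID", "PERIOD")], FF, by = c("ID", "PERIOD"), all.x = TRUE)
FF2
```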
Re: [R] finding mean and SD for a log-normal distribution
On May 16, 2012, at 6:37 AM, Andras Farkas wrote: Dear R experts, allow me to ask a quick question: I have a mean of 6 and an SD of 3 describing my distribution. I would like to convert this into the log-normal distribution that best describes it when resimulated. Currently I use other software to estimate the corresponding mean and SD on the log scale, and the results are 1.6667 and 0.47071. Then, to reproduce my original distribution in R, I use the following commands: c <- rlnorm(5000, 1.6667, 0.47071); d <- exp(c); mean(c); sd(c) I get a better match to those values with: distrib <- rlnorm(50, 1.682, 0.47071) (Bad practice to use 'c' as an object name.) and the results for mean and SD are 5.92 and 2.94 (original: 6 and 3), which I am reasonably happy with. I would like to become independent of the other software, but I cannot figure out how to compute the values 1.6667 and 0.47071 in R. Could someone please help me with this question? You need to review your resources on statistical distributions. The Wikipedia article has the needed transformations for parameters between the log and untransformed scales under the section entitled "Arithmetic moments". So that was the basis for this test:
# mu for LN
log(6) - 0.5*log(1+9/6^2)
[1] 1.680188
# sigma for LN
sqrt( log( 1 +9/6^2))
[1] 0.4723807
c <- rlnorm(50,1.680188,0.4723807)
d <- exp(c)
# Expected value
mean(c)
[1] 5.99303
# SD
sd(c)
[1] 2.996532
So my half-assed approximation was in better agreement with theory than your other software. On the other hand, you haven't really given us much background for this estimation process, so it's not possible to offer a solid value judgment. R has packages that do distribution fitting: MASS has fitdistr, and there is a fitdistrplus package and others, I believe. There's a monograph about R's facilities, but at the moment I cannot put my hands on my copy.
There is a Distributions TaskView: http://cran.r-project.org/web/views/Distributions.html -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
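The "Arithmetic moments" conversion David uses can be wrapped in a small helper. This is a minimal sketch; the name lnorm_params() is hypothetical and not from any package. It inverts E[X] = exp(mu + sigma^2/2) and Var[X] = (exp(sigma^2) - 1) exp(2 mu + sigma^2), which gives sigma^2 = log(1 + s^2/m^2) and mu = log(m) - sigma^2/2.

```r
# Convert the arithmetic mean m and SD s of a positive variable into the
# meanlog/sdlog parameters expected by rlnorm().
lnorm_params <- function(m, s) {
  sigma2 <- log(1 + s^2 / m^2)            # variance on the log scale
  c(meanlog = log(m) - sigma2 / 2,        # mu = log(m) - sigma^2 / 2
    sdlog   = sqrt(sigma2))
}

p <- lnorm_params(6, 3)
p   # meanlog about 1.680188, sdlog about 0.472381

# Check by simulation; a large n keeps the sampling noise small
set.seed(1)
draws <- rlnorm(1e6, meanlog = p[["meanlog"]], sdlog = p[["sdlog"]])
c(mean = mean(draws), sd = sd(draws))   # should be close to 6 and 3
```

With only 50 draws, as in the commands above, the simulated mean and SD will wander noticeably from 6 and 3 from run to run, so a larger sample is a fairer check of the parameters.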
[R] replacing with NA
Dear R users, I was wondering how I can replace the values of one vector depending on the values of another vector in the same row. For example, how can I replace the value of x below with NA when the value of z in the same row is NA?

x <- 1:20
z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16, 17, NA, 12, 10)

Many thanks, Mintewab

From: Mintewab Bezabih
Sent: 15 May 2012 15:53
To: r-help@r-project.org
Cc: r-help@r-project.org
Subject: missing observations

Dear R users, I have missing observations in my data that I remove in my analysis. I am able to run my code all right, but I want the non-missing values to be correctly identified, and therefore I want to carry my id vector along into my results. Since the vector of ids has no role in the analysis, I don't know how to include it. Here is my reproducible example; id is the vector I want to add to the analysis somehow so that my missing values are identified. I cannot use na.action, and that is why I have to drop my missing observations beforehand.

library(fields)
x <- 1:20
y <- runif(20)
z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16, 17, NA, 12, 10)
id <- 1:20
mydataset <- data.frame(x, y, z)
temperature <- mydataset[complete.cases(mydataset), ]
x <- temperature[, c(1)]
y <- temperature[, c(2)]
z <- temperature[, c(3)]
tpsfit <- Tps(cbind(x, y), z, scale.type="unscaled")

Many thanks as always. Regards, Mintewab
Re: [R] replacing with NA
x[is.na(z)] <- NA

This might give you a nasty bug if x and z are different lengths, though; just a heads-up.

Michael

On Wed, May 16, 2012 at 12:55 PM, Mintewab Bezabih mintewab.beza...@economics.gu.se wrote:
> Dear R users, I was wondering how I can replace the values of one vector depending on the values of another vector in the same row. For example, how can I replace the value of x below with NA when the value of z in the same row is NA?
> x <- 1:20
> z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16, 17, NA, 12, 10)
> Many thanks, Mintewab
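Michael's one-liner with a length check added, followed by a sketch for Mintewab's earlier "missing observations" question. If id is stored as a column of the data frame, it is filtered by the same complete.cases() call, so the retained ids can simply be read back. The Tps() fit itself is left out here; this only shows the bookkeeping, and the object names are illustrative.

```r
x <- 1:20
z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16, 17, NA, 12, 10)

# Guard against the length-mismatch pitfall: with logical indexing,
# a shorter index would be silently recycled
stopifnot(length(x) == length(z))
x[is.na(z)] <- NA

# Keep id inside the data frame so it is subset along with the data
set.seed(1)
mydataset <- data.frame(id = 1:20, x = 1:20, y = runif(20), z = z)
kept <- mydataset[complete.cases(mydataset[, c("x", "y", "z")]), ]
kept$id   # ids of the rows actually used; rows 8 and 18 are dropped
```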
Re: [R] confidence intervals for nls or nls2 model
If you want confidence bands based on new x values, you can do it. I have a post with the steps to do this. It's written in Portuguese, but the R code is useful. http://ridiculas.wordpress.com/2011/05/19/bandas-de-confianca-para-modelo-de-regressao-nao-linear/ Best, Walmes. == Walmes Marques Zeviani LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W) Departamento de Estatística - Universidade Federal do Paraná fone: (+55) 41 3361 3573 VoIP: (3361 3600) 1053 1173 e-mail: wal...@ufpr.br twitter: @walmeszeviani homepage: http://www.leg.ufpr.br/~walmes linux user number: 531218 ==
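One common way to get such bands from a plain nls() fit is the delta method. The sketch below is not necessarily the method in Walmes' post: the data are simulated, and the model y = a exp(-b x) and its starting values are hypothetical. It linearises the model around the estimates, so the band is approximate.

```r
# Simulated data and model, purely for illustration
set.seed(42)
d <- data.frame(x = 1:20)
d$y <- 10 * exp(-0.2 * d$x) + rnorm(20, sd = 0.2)
fit <- nls(y ~ a * exp(-b * x), data = d, start = list(a = 8, b = 0.1))

# Model function written out so it can be differentiated numerically
f <- function(theta, x) theta[["a"]] * exp(-theta[["b"]] * x)
theta <- coef(fit)
V <- vcov(fit)
newx <- seq(1, 20, length.out = 50)

# Forward-difference gradient of the predictions w.r.t. each parameter
eps <- 1e-6
G <- sapply(names(theta), function(p) {
  th <- theta
  th[[p]] <- th[[p]] + eps
  (f(th, newx) - f(theta, newx)) / eps
})

# Delta method: Var(yhat(x)) is approximately g(x)' V g(x) for each new x
se   <- sqrt(rowSums((G %*% V) * G))
fv   <- f(theta, newx)
tq   <- qt(0.975, df.residual(fit))
band <- data.frame(x = newx, fit = fv, lwr = fv - tq * se, upr = fv + tq * se)
head(band)
```

Using a t quantile with the residual degrees of freedom is one convention; a normal quantile is also seen in practice.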
[R] how to disable the error message in read.table(): no lines available in input
Dear Researchers, I am looking for a way to disable the error message in read.table() when the lines are empty, similar to warn = TRUE in readLines():

Error in read.table(con, header = F, sep = , nrow = n) :
  no lines available in input

Thanks for all suggestions. Gianni
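read.table() has no switch for this as far as I know, but tryCatch() can turn that particular error into a NULL return. A minimal sketch; read_table_or_null() is a hypothetical wrapper. It matches the English text of the message, so it may need adjusting in a translated locale, and it re-raises any other error instead of hiding it.

```r
# Return NULL instead of stopping when the input has no lines;
# all other read.table() errors are passed through unchanged.
read_table_or_null <- function(...) {
  tryCatch(
    read.table(...),
    error = function(e) {
      if (grepl("no lines available", conditionMessage(e), fixed = TRUE)) NULL
      else stop(e)   # re-raise anything else
    }
  )
}

empty <- tempfile()
file.create(empty)
res <- read_table_or_null(empty, header = FALSE)
is.null(res)   # TRUE for an empty file
```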
Re: [R] simple data.frame question
On May 16, 2012, at 11:56 AM, Troels Ring wrote:
> Dear friends - I hope you will forgive me another simple question, illustrated by
> ID <- c(1,1,1,2,2,3,3,3)
> PERIOD <- c(1,2,3,2,3,1,2,3)
> X <- runif(8,0,10))

Extraneous paren removed:

> FF <- data.frame(ID=ID,PERIOD=PERIOD,X=X)
> I need the fourth value of X to be NA, with ID and PERIOD updated to 1,1,1,2,2,2,3,3,3 and 1,2,3,1,2,3,1,2,3 respectively. How do I use the pattern in ID and PERIOD to find the missing X and put NA there?

ffnew=merge(x=expand.grid(1:3,1:3),
+ y=FF, by =1:2, all.x=TRUE)
ffnew
  Var1 Var2         X
1    1    1 6.6294571
2    1    2 0.5749111
3    1    3 8.7895630
4    2    1        NA
5    2    2 5.7213062
6    2    3 6.1030507
7    3    1 8.9182841
8    3    2 4.2823937
9    3    3 8.8249263

> Best wishes
> Troels Ring, Aalborg, Denmark

David Winsemius, MD
West Hartford, CT
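A variant of David's merge()/expand.grid() approach, sketched under the assumption that every ID should have every PERIOD: naming the grid columns after FF keeps ID and PERIOD as column names (instead of Var1/Var2), and taking the levels from the data avoids hard-coding 1:3.

```r
ID     <- c(1, 1, 1, 2, 2, 3, 3, 3)
PERIOD <- c(1, 2, 3, 2, 3, 1, 2, 3)
set.seed(1)
X  <- runif(8, 0, 10)
FF <- data.frame(ID = ID, PERIOD = PERIOD, X = X)

# Every ID x PERIOD combination, named like FF so merge() keeps the names
grid  <- expand.grid(ID = sort(unique(FF$ID)), PERIOD = sort(unique(FF$PERIOD)))
ffnew <- merge(grid, FF, by = c("ID", "PERIOD"), all.x = TRUE)
ffnew <- ffnew[order(ffnew$ID, ffnew$PERIOD), ]   # explicit ID-then-PERIOD order
ffnew   # the ID 2 / PERIOD 1 row has X = NA
```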
Re: [R] simple data.frame question
Thanks a lot - beautiful. Troels
[R] Finding words that are within +/- X words of KRAS using tm package or other means
Hello All, This will probably be easy for some but isn't for me. I am currently working on a text mining exercise. I want to be able to predict whether cancer patients got KRAS testing and, if so, whether the test yielded a result of wild type/negative or mutant/positive. I've begun with a bag-of-words approach that looks at the count of specific terms in the medical records and then uses some of those as predictors. This works great for predicting whether or not patients got tested. It's not so good, though, when it comes to predicting the outcome of testing. The trouble is that patients can have a reference to KRAS testing and also have a lot of references to, say, "positive" where that term has nothing to do with the result of their KRAS testing.

So I'd like to be able to identify the number of instances in a patient's medical record where relevant terms like "wild type", "negative", "mutant", or "positive" come either shortly before or shortly after "KRAS". It would be great if there is a way to do this within the tm package; I've found it very helpful for preparing my data thus far. If not, though, I have a data frame that contains the patient number in one column and the patient's complete text medical record in another, so some sort of regular expression would likely work just fine.

Here are some examples of the sort of thing I'm looking to count:

Received KRAS testing results on xx/xx/. Test results indicate the presence of a mutation.
Tumor is KRAS negative
KRAS (mutated)
Tumor is positive for KRAS mutation

And here's an example of something I want to ignore:

Will conduct KRAS testing prior to initiation of therapy. ... (Several lines of material) ... Bilirubin positive.

A couple of things stand out here. The first is that I need to be able to pick up on variations of the relevant terms. For example, that means being able to identify that either "mutant" or "mutated" came in close proximity to KRAS.

The other thing is that while increasing the number of words to look forward and backward will identify more valid cases, it will also tend to identify more invalid ones. For example, looking as many as 12 words after KRAS will lead to correct identification of:

Received KRAS testing results on xx/xx/. Test results indicate the presence of a mutation.

but also incorrect identification of:

Will conduct KRAS testing prior to initiation of therapy. Note that patient was positive for Lynch mutation.

I think I will need to keep the window short in order to obtain the best results. It would be nice if I could easily increase or decrease the number of words to look forward and backward, though. It would also be good if I could, say, select a relatively small number of words to look forward and a larger number of words to look backward. Having gotten to the end of this description, it occurs to me this is actually harder than I thought. If one of you gurus could help me out, that would be greatly appreciated. Thanks, Paul
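A base-R sketch of the window count (it does not use tm): split a record into lower-case words, find each "kras", and count target words within `before` words preceding and `after` words following it, with the two window sizes set independently. The function name count_near() and the target prefixes are hypothetical; the prefixes are regular expressions, so "muta" catches mutant, mutated and mutation. Negations ("not positive") are not handled, and overlapping windows around two nearby KRAS mentions can count the same word twice.

```r
# Count target words within `before` words before and `after` words after
# each occurrence of `anchor` in a single text string.
count_near <- function(text, targets, anchor = "kras", before = 3, after = 5) {
  # lower-case, turn punctuation into spaces, split on whitespace
  words <- strsplit(gsub("[^[:alnum:] ]", " ", tolower(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  hits  <- which(words == anchor)
  pattern <- paste0("^(", paste(targets, collapse = "|"), ")")
  n <- 0
  for (h in hits) {
    lo <- max(1, h - before)
    hi <- min(length(words), h + after)
    window <- words[setdiff(lo:hi, h)]   # neighbours, excluding KRAS itself
    n <- n + sum(grepl(pattern, window))
  }
  n
}

targets <- c("muta", "positiv", "negativ", "wild")
count_near("Tumor is positive for KRAS mutation", targets)   # 2
count_near("Will conduct KRAS testing prior to initiation of therapy. Bilirubin positive.", targets)   # 0
```

For a data frame of records (column names hypothetical), something like vapply(records$text, count_near, numeric(1), targets = targets, before = 3, after = 8) gives one count per patient; the Lynch example above shows why a long `after` window picks up unrelated "positive" mentions.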
[R] transfer R objects back to console/command line
Dear R community, is there any way to invoke R in batch mode, do some calculations, and get the values of some R variables back into the (bash) shell? I only managed to get some output saved into a text file with:

R --slave --args 2 2 < test.R > test2.R

test.R contains:

a <- as.numeric(commandArgs()[4])
b <- as.numeric(commandArgs()[5])
c=a*b

Is there any way to access the contents of c on the command line after running R? Cheers, Jannis
Re: [R] transfer R objects back to console/command line
Take a look at this SO question: http://stackoverflow.com/questions/10575005/output-a-boolean-from-an-rscript-into-a-bash-variable None of the solutions are Boolean-specific, so you should be good with them (the key is printing and capturing). Michael
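A minimal sketch of the print-and-capture approach, assuming Rscript is on the PATH; the script name test.R and the helper multiply_args() are hypothetical. commandArgs(trailingOnly = TRUE) returns only the arguments that follow the script name, and cat() prints the bare value without the "[1]" prefix that print() would add, so the shell captures just the number.

```r
# Hypothetical script test.R; call it from bash as: result=$(Rscript test.R 2 2)
multiply_args <- function(args) {
  a <- as.numeric(args[1])
  b <- as.numeric(args[2])
  a * b
}

# Only the arguments given after the script name (no R startup flags)
cli_args <- commandArgs(trailingOnly = TRUE)
if (length(cli_args) >= 2) {
  # write the bare number to stdout for the shell's $( ... ) to capture
  cat(multiply_args(cli_args), "\n", sep = "")
}
```

From bash, result=$(Rscript test.R 2 2) followed by echo "$result" should then print 4.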
Re: [R] kolmogorov-Smirnov critical values
On Wed, May 16, 2012 at 06:52:48AM -0700, aramos wrote:
> Hi! Does anyone know how to obtain critical values for the K-S statistic using R?

Hi. I do not know whether there is a function for this. However, the following randomized approach allows one to extract a table of statistic/p.value pairs from ks.test() for fixed sample sizes.

n1 <- 30
n2 <- 50
d <- 1
res <- matrix(nrow=d, ncol=2)
for (i in seq.int(length=d)) {
    x1 <- runif(n1) + runif(1)
    x2 <- runif(n2) + runif(1)
    out <- ks.test(x1, x2)
    res[i, 1] <- out$statistic
    res[i, 2] <- out$p.value
}
# drop = FALSE keeps tab a matrix even if only one row remains
tab <- unique(res[order(res[, 1]), , drop = FALSE])
colnames(tab) <- c("statistic", "p.val")

If you are mainly interested in the range of the p-values for relatively close distributions, then replace

x1 <- runif(n1) + runif(1)
x2 <- runif(n2) + runif(1)

by

x1 <- runif(n1)
x2 <- runif(n2)

Part of the obtained table is

      statistic        p.val
[39,]    0.3000 5.642910e-02
[40,]    0.3067 4.815638e-02
[41,]    0.3133 4.091424e-02
[42,]    0.3200 3.466530e-02
[43,]    0.3267 2.925540e-02
[44,]    0.3333 2.458672e-02
[45,]    0.3400 2.060188e-02
[46,]    0.3467 1.719140e-02
[47,]    0.3533 1.428992e-02
[48,]    0.3600 1.183727e-02
[49,]    0.3667 9.767969e-03

Hope this helps. Petr Savicky.
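If an approximate critical value is enough, the large-sample Kolmogorov distribution gives one in closed form, with no simulation. The helpers ks_crit_2sample() and ks_crit_1sample() below are hypothetical names. The rule is: reject at level alpha when D > c(alpha) sqrt((n1 + n2)/(n1 n2)), with c(alpha) = sqrt(-log(alpha/2)/2) (about 1.358 for alpha = 0.05); the one-sample test uses c(alpha)/sqrt(n) instead. For n1 = 30 and n2 = 50 this gives about 0.3136. In Petr's table the ks.test() p-value already falls below 0.05 at D = 0.3067, so at these sample sizes the asymptotic cutoff is somewhat conservative.

```r
# Asymptotic (large-sample) critical values of the Kolmogorov-Smirnov statistic D
ks_c_alpha <- function(alpha) sqrt(-log(alpha / 2) / 2)   # about 1.358 at alpha = 0.05

ks_crit_2sample <- function(alpha, n1, n2) {
  ks_c_alpha(alpha) * sqrt((n1 + n2) / (n1 * n2))
}

ks_crit_1sample <- function(alpha, n) {
  ks_c_alpha(alpha) / sqrt(n)
}

ks_crit_2sample(0.05, 30, 50)   # about 0.3136
ks_crit_1sample(0.05, 100)      # about 0.1358
```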
Re: [R] Scraping a web page.
Thanks Gabor, nifty regexp. I never used strapplyc before, and I am sure this will become a nice addition to my toolkit. KW

Message: 5
Date: Tue, 15 May 2012 07:55:33 -0400
From: Gabor Grothendieck ggrothendi...@gmail.com
To: Keith Weintraub kw1...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Scraping a web page.

On Tue, May 15, 2012 at 7:06 AM, Keith Weintraub kw1...@gmail.com wrote:
> Thanks, that was very helpful. I am using readLines and grep. If grep isn't powerful enough I might end up using the XML package, but I hope that won't be necessary.

This only uses readLines and strapplyc (from gsubfn). It scrapes the relevant strings from your post on nabble, and by modifying URL and pat you can likely get it to work with whatever the format of your original files is:

library(gsubfn)
URL <- "http://r.789695.n4.nabble.com/Scraping-a-web-page-tp4630005.html"
L <- readLines(URL)
pat <- 'br/quot;/en/Ships.*-(\\d{7}).htmlquot;'
strapplyc(L, pat, simplify = c)

The result from the last line is:
[1] 8605507 8122830

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
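For anyone without gsubfn, base R's regexec() and regmatches() can pull out the same kind of capture group. The input lines and the pattern below are hypothetical stand-ins, since the real page's HTML is not reproduced in this thread; adapt the pattern to the actual markup.

```r
# Hypothetical input lines resembling scraped HTML (not the real page)
L <- c('<a href="/en/Ships/Some-Vessel-8605507.html">',
       'no match on this line',
       '<a href="/en/Ships/Other-Vessel-8122830.html">')

pat <- "/en/Ships.*-([0-9]{7})\\.html"
m   <- regmatches(L, regexec(pat, L))   # per line: full match, then capture group
ids <- vapply(m[lengths(m) > 0], `[`, character(1), 2)   # keep the capture only
ids   # "8605507" "8122830"
```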
[R] Optimization problem
Hi, I'm dealing with an optimization problem. I'm using 'optim' to maximize the output of a function, given some restrictions on the input. I would like to know if there is a way to impose some restrictions on 'intermediate variables' of the function. An example:

fx = function (x) {
  s <- 0
  for (i in 1:3) {
    s <- x[i]^3 + s
  }
  s
}
nlin <- 3
optim(rep(4,3), fx, method="L-BFGS-B", lower=rep(-10,nlin), upper=rep(10,nlin))

It would return '-10' for all variables. I want, however, a solution satisfying mean(x)7. Please don't analyse this specific example, but rather the logic of satisfying a criterion for the mean of the input (with thousands of variables). My real problem involves price elasticity, and I want to find the price increase for each individual that would give me the maximum total profit margin while respecting a minimum retention of clients. Thank you very much, John Mayer
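Because a bound on mean(x) is linear in x, stats::constrOptim() can handle it together with the box limits, all written as ui %*% x - ci >= 0. A sketch under one stated assumption: the comparison operator in "mean(x)7" was lost from the message, so mean(x) >= 7 is assumed here; flip the sign of that row of ui and of its ci entry for "<=". The starting point must lie strictly inside the feasible region. Like optim(), this minimises; to maximise, negate the objective and gradient. For nonlinear constraints on intermediate quantities, packages such as alabama or nloptr may be worth a look.

```r
fx <- function(x) sum(x^3)   # same objective as above, vectorised
gr <- function(x) 3 * x^2    # its gradient

n  <- 3
ui <- rbind(diag(n),         #  x_i >= -10
            -diag(n),        # -x_i >= -10, i.e. x_i <= 10
            rep(1 / n, n))   #  mean(x) >= 7  (assumed direction)
ci <- c(rep(-10, n), rep(-10, n), 7)

# Start strictly inside the feasible region: mean 8 > 7, inside the box
res <- constrOptim(theta = rep(8, n), f = fx, grad = gr, ui = ui, ci = ci)
res$par     # close to c(7, 7, 7)
res$value   # close to 3 * 7^3 = 1029
```

With thousands of variables the constraint matrix grows, but it is still just one extra row for the mean plus two per variable for the box.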
Re: [R] caret: Error when using rpart and CV != LOOCV
Thanks, Max, for your answer.

First, I do not understand your post. Why is it a problem if two of the predictions match? From the formula for calculating R^2, I can see that there will be a division by zero iff the total sum of squares is 0. This is only true if the predictions of all the predicted points from the test set are equal to the mean of the test set. Why should this happen?

Anyway, I wrote the following code to check what you tried to tell me:

library(caret)
data(trees)
formula=Volume~Girth+Height
customSummary <- function (data, lev = NULL, model = NULL) {
  print(summary(data$pred))
  return(defaultSummary(data, lev, model))
}
tc=trainControl(method='cv', summaryFunction=customSummary)
train(formula, data=trees, method='rpart', trControl=tc)

This outputs:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.45   18.45   18.45   30.12   35.95   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.69   22.69   22.69   32.94   38.06   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37
[cut many values like this]
Warning:
In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
  There were missing values in resampled performance measures.

As I didn't understand your post, I don't know if this confirms your assumption. Thanks anyway, Dominik

On 16/05/12 17:30, Max Kuhn wrote:
> More information is needed to be sure, but it is most likely that some of the resampled rpart models produce the same prediction for the hold-out samples (likely the result of no viable split being found). Almost every incarnation of R^2 requires the variance of the prediction. This particular failure mode would result in a divide by zero. Try using your own summary function (see ?trainControl) and put a print(summary(data$pred)) in there to verify my claim.
> Max
>
> On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote:
>> Hi, I have the following problem when trying to build an rpart model using anything but LOOCV. Originally, I wanted to use k-fold partitioning, but every partitioning except LOOCV throws the following warning:
>>
>> Warning message:
>> In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
>>   There were missing values in resampled performance measures.
>>
>> Below are some simplified test cases which reproduce the warning on my system.
>> Question: What does this error mean? How can I avoid it?
>>
>> System information:
>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> locale:
>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] rpart_3.1-52 caret_5.15-023 foreach_1.4.0 cluster_1.14.2 reshape_0.8.4
>> [6] plyr_1.7.1 lattice_0.20-6
>> loaded via a namespace (and not attached):
>> [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
>> [5] tools_2.15.0
>>
>> --- Simplified test case I: throws the warning ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> train(formula, data=trees, method='rpart')
>>
>> --- Simplified test case II: every other CV method also throws the warning, for example 'cv' ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> tc=trainControl(method='cv')
>> train(formula, data=trees, method='rpart', trControl=tc)
>>
>> --- Simplified test case III: the only CV method which works is 'LOOCV' ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> tc=trainControl(method='LOOCV')
>> train(formula, data=trees, method='rpart', trControl=tc)
>>
>> Thanks!
>> --
>> Dominik Bruhn
>> mailto: domi...@dbruhn.de

--
Dominik Bruhn
mailto: domi...@dbruhn.de
Re: [R] caret: Error when using rpart and CV != LOOCV
Sorry for the follow-up, but I dug deeper into the problem. My text on R^2 was wrong: in my opinion, and at least according to Wikipedia, R^2 yields a division by zero iff SStot (the total sum of squares) is zero. SStot is the sum of the squared differences between the observed (not the predicted) values and the mean of the observed values. As this value does not depend on the predicted/modelled values, the occurrence of a division by zero cannot depend on the model, only on the data itself. In short, to get SStot = 0 (and therefore a division by zero), you would need a training dataset where every value equals the mean of the training set, i.e. a constant dataset. My input, and also my training set, is far from being constant, so where is the error? Thanks again! Dominik

--
Dominik Bruhn
mailto: domi...@dbruhn.de
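Max's point can be illustrated without caret. When a tree finds no viable split, every prediction in a hold-out fold is the same value, which is exactly what the third summary above shows (30.37 throughout). If R^2 is computed as the squared correlation between observed and predicted values, as caret's postResample() (used by defaultSummary()) did in the versions I am aware of, the correlation is undefined when the predictions have zero variance, so that fold's R^2 is NA. The SStot-based formula would not fail here. In the sketch, the observations are just the first rows of trees and the constant prediction mimics such a fold.

```r
obs  <- trees$Volume[1:6]             # held-out observations (not constant)
pred <- rep(30.37, length(obs))       # a tree with no split: one value for all

# Correlation-based R^2: cor() returns NA (with a warning) when sd(pred) is 0
r2_cor <- suppressWarnings(cor(obs, pred))^2
r2_cor   # NA

# The SStot-based R^2 is finite here (and strongly negative)
r2_ss <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
r2_ss
```

So the NA comes from the variance of the predictions, not of the observations, which is why it appears only for some folds.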
[R] Job posting - Statistical Consultant - Univ. of Texas at Austin
All, Just to get the word out: We are looking for a new Statistical Consultant at the Division of Statistics and Scientific Computation here at the University of Texas at Austin. Please pass along to any colleagues who might be interested... http://ssc.utexas.edu/people/employment Thanks, Michael
Re: [R] kolmogorov-Smirnov critical values
I think that command will give me the observed value of the statistic, not quantiles from the K-S distribution!
[R] fitting t copula with fixed dof
I need to fit a t copula with a fixed number of degrees of freedom, let's say 4. I do not want to estimate the dof together with the correlation matrix; instead, I want to fix the dof at 4 and estimate only the correlation matrix in the optimization routine. Is anyone aware of such an estimation method in R? The packages and functions that I know of can't do this estimation, and I searched online but couldn't find anything. I will appreciate any help/comments. Best Regards, Ibrahim Ergen
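If I remember correctly, the copula package's tCopula() has a df.fixed argument, so fitting tCopula(dim = d, df = 4, df.fixed = TRUE) with fitCopula() may already do this; please check its documentation. A package-free alternative, sketched below, is Kendall's tau inversion. For elliptical copulas, including the t copula with any dof, rho = sin(pi tau / 2), so the correlation matrix can be estimated without touching the degrees of freedom. This is a rank-based moment estimator, not the maximum-likelihood fit you describe. The simulated bivariate t data (rho = 0.5, df = 4) are only for illustration.

```r
set.seed(123)
n   <- 2000
dof <- 4
rho <- 0.5

# Simulate a bivariate t sample with correlation rho and dof = 4 (base R only)
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
w  <- sqrt(rchisq(n, dof) / dof)   # common mixing variable per observation
x  <- cbind(z1, z2) / w            # each row divided by its own w

# Kendall's tau inversion: R_hat = sin(pi * tau / 2), elementwise
tau   <- cor(x, method = "kendall")
R_hat <- sin(pi * tau / 2)
R_hat   # off-diagonal should be near 0.5
```

The resulting R_hat can then be held fixed alongside dof = 4 in whatever t-copula likelihood or simulation you use next.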
Re: [R] kolmogorov-Smirnov critical values
Thanks, I've already done that!! What is OSS?
Re: [R] Help needed for efficient way to loop through rows and columns
Yes, here it is. I actually convert them all to strings, initially using options(stringsAsFactors=F) at the top of my code. This is what the initial data frame looks like (please note this is a toy dataset):

names X  Y  Z  AorB
S1    BB BB BB A
S2    AA BB BB A
S3    AB AB AA B
S4    AA AA NA B

And the code to create this initial data frame is:

names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AA", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "BB", "AA", NA)
AorB <- c("A", "A", "B", "B")
sample <- data.frame(names, X, Y, Z, AorB)

The final data frame should look like:

names X Y Z  AorB
S1    0 0 0  A
S2    2 0 0  A
S3    1 1 0  B
S4    0 0 NA B

You're right! I should be able to globally change all ABs and BAs to 1s. Thanks :) I'm not exactly sure how to change AA and BB depending on AorB for each row, though. Thoughts? Thanks for your help thus far, David. Best, Priya

On Wed, May 16, 2012 at 6:53 AM, David L Carlson dcarl...@tamu.edu wrote:
> Can you show us what you want the final data.frame to look like? You've created five variables stored as factors and you seem to be trying to change those to numeric values? Is that correct? Since AB and BA are always set to 1, you could just replace those values globally rather than mess with the ifelse commands for those values. Only AA and BB are affected by the value of AorB. Your apply() function processes the data.frame by row, so i is a vector consisting of all the values in the row. You seem to be coding as if i was a single integer (as in a for loop).
> --
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Priya Bhatt
> Sent: Wednesday, May 16, 2012 3:08 AM
> To: r-help@r-project.org
> Subject: [R] Help needed for efficient way to loop through rows and columns
>
>> Dear R-helpers: I am trying to write a script that iterates through a data frame that looks like this:
>> Example dataset called sample:
>> names <- c("S1", "S2", "S3", "S4")
>> X <- c("BB", "AB", "AB", "AA")
>> Y <- c("BB", "BB", "AB", "AA")
>> Z <- c("BB", "BB", "AB", NA)
>> AorB <- c("A", "A", "A", "B")
>> sample <- data.frame(names, X, Y, Z, AorB)
>> For a given row: if AorB == "A", then AA = 2, AB = 1, BA = 1, BB = 0; if AorB == "B", then AA = 0, AB = 1, BA = 1, BB = 2.
>> I've been trying to write this using apply and ifelse statements in hopes that my code runs quickly, but I'm afraid I've made a big mess. See below:
>> apply(sample, 1, function(i) { ifelse(sample$AorB[i] == "A", (ifelse(sample[i,] == "AA", sample[i,] <- 2 , ifelse(sample[i,] == "AB" || sample[i,] == "BA" , sample[i,] <- 1, ifelse(sample[i,] == "BB", sample[i,] <- 0, sample[i,] <- NA )) ) ) , ifelse(sample$AorB[i,] == "B"), (ifelse(sample[i,] == "AA", sample[i,] <- 0 , ifelse(sample[i,] == "AB" || sample[i,] == "BA" , sample[i,] <- 1, ifelse(sample[i,] == "BB", sample[i,] <- 2, sample[i,] <- NA) })
>> Any advice?
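A vectorised sketch of the recoding, without apply() or nested ifelse(), assuming the only genotype codes are AA, AB, BA, BB or NA. Every cell is looked up in a named vector of codes for the "A" case, then the rows where AorB is "B" are flipped with 2 - code, which swaps 2 and 0 and leaves 1 (and NA) alone. The lookup vector code_A is illustrative; sample is the name used in the thread, although it masks base::sample().

```r
# Toy data from the thread; stringsAsFactors = FALSE keeps genotypes as strings
sample <- data.frame(names = c("S1", "S2", "S3", "S4"),
                     X = c("BB", "AA", "AB", "AA"),
                     Y = c("BB", "BB", "AB", "AA"),
                     Z = c("BB", "BB", "AA", NA),
                     AorB = c("A", "A", "B", "B"),
                     stringsAsFactors = FALSE)

geno_cols <- c("X", "Y", "Z")
geno <- as.matrix(sample[, geno_cols])

# Codes for AorB == "A"; NA (or any unknown string) looks up to NA
code_A <- c(AA = 2, AB = 1, BA = 1, BB = 0)
coded <- matrix(unname(code_A[as.vector(geno)]),
                nrow = nrow(geno), dimnames = list(NULL, geno_cols))

# For AorB == "B" the scale is reversed
is_B <- sample$AorB == "B"
coded[is_B, ] <- 2 - coded[is_B, ]

result <- data.frame(names = sample$names, coded, AorB = sample$AorB)
result
```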
Re: [R] kolmogorov-Smirnov critical values
Open source software (what you're driving).

Michael

On Wed, May 16, 2012 at 12:27 PM, aramos ara...@fep.up.pt wrote:
Thanks, I've already done that!! What is OSS?
Re: [R] simple data.frame question
Hi Troels,

Not sure this is what you want.

X <- runif(9, 0, 10)
FF1 <- data.frame(ID = c(1,2,3)[rep(c(1,1,1,2,2,2,3,3,3))], PERIOD = c(1,2,3)[rep(c(1,2,3), times = 3)], X = X)
FF1$X[4] <- NA
FF1
  ID PERIOD          X
1  1      1 8.27119347
2  1      2 9.64698097
3  1      3 2.74132386
4  2      1         NA
5  2      2 4.29322683
6  2      3 5.09269667
7  3      1 4.07936332
8  3      2 7.41808455
9  3      3 0.01558664

A.K.

----- Original Message -----
From: Troels Ring tr...@gvdnet.dk
To: r-help@r-project.org
Sent: Wednesday, May 16, 2012 11:56 AM
Subject: [R] simple data.frame question

Dear friends - I hope you will forgive me another simple question, illustrated by

ID <- c(1,1,1,2,2,3,3,3)
PERIOD <- c(1,2,3,2,3,1,2,3)
X <- runif(8,0,10)
FF <- data.frame(ID=ID,PERIOD=PERIOD,X=X)

I need the fourth value of X to be NA, with ID and PERIOD updated to 1,1,1,2,2,2,3,3,3 and 1,2,3,1,2,3,1,2,3 respectively. How do I use the pattern in ID and PERIOD to find the missing X and put NA there?

Best wishes
Troels Ring, Aalborg, Denmark
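A more general way to find the missing ID/PERIOD combination, without hard-coding which row is absent, is to build the complete grid and left-join the observed data onto it. This is a sketch using base R's expand.grid() and merge(all.x = TRUE). It assumes every ID should have every PERIOD that occurs anywhere in the data. The names `full` and `FF2` are illustrative.

```r
set.seed(1)
ID     <- c(1, 1, 1, 2, 2, 3, 3, 3)
PERIOD <- c(1, 2, 3, 2, 3, 1, 2, 3)
X      <- runif(8, 0, 10)
FF     <- data.frame(ID = ID, PERIOD = PERIOD, X = X)

# Every ID x PERIOD combination that should be present.
full <- expand.grid(PERIOD = sort(unique(FF$PERIOD)), ID = sort(unique(FF$ID)))

# Left join: combinations missing from FF get X = NA.
FF2 <- merge(full, FF, by = c("ID", "PERIOD"), all.x = TRUE)
FF2 <- FF2[order(FF2$ID, FF2$PERIOD), ]   # ID-major order, as in the question
rownames(FF2) <- NULL
FF2
```

Here the gap at ID 2 / PERIOD 1 becomes row 4 with X = NA. The same code also handles several missing combinations at once.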
[R] TukeyHSD plot error
Hi,

I am seeking help with an error when running the example from the R documentation for TukeyHSD. The error occurs with any example I run, from any textbook or website. Thank you...

plot(TukeyHSD(fm1, "tension"))
Error in plot(confint(as.glht(x)), ylim = c(0.5, n.contrasts + 0.5), ...) :
  error in evaluating the argument 'x' in selecting a method for function 'plot': Error in UseMethod("vcov") :
  no applicable method for 'vcov' applied to an object of class "NULL"

?TukeyHSD
require(graphics)
summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
            Df Sum Sq Mean Sq F value  Pr(>F)
wool         1    451   450.7   3.339 0.07361 .
tension      2   2034  1017.1   7.537 0.00138 **
Residuals   50   6748   135.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

TukeyHSD(fm1, "tension", ordered = TRUE)
  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

$tension
     diff        lwr      upr     p adj
M-H  4.72 -4.6311985 14.07564 0.4474210
L-H 14.72  5.3688015 24.07564 0.0011218
L-M 10.00  0.6465793 19.35342 0.0336262

plot(TukeyHSD(fm1, "tension"))
Error in plot(confint(as.glht(x)), ylim = c(0.5, n.contrasts + 0.5), ...) :
  error in evaluating the argument 'x' in selecting a method for function 'plot': Error in UseMethod("vcov") :
  no applicable method for 'vcov' applied to an object of class "NULL"

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

attached base packages:
[1] grid tcltk splines stats graphics grDevices datasets utils methods base

other attached packages:
 [1] RcmdrPlugin.HH_1.1-30 HH_2.2-30 latticeExtra_0.6-19 RColorBrewer_1.0-5
 [5] leaps_2.9 multcomp_1.2-12 mvtnorm_0.9-9992 NADA_1.5-4
 [9] ggplot2_0.9.0 Rcmdr_1.8-3 car_2.0-12 nnet_7.3-1
[13] DAAG_1.12 survival_2.36-12 randomForest_4.6-6 rpart_3.1-52
[17] RODBC_1.3-5 tree_1.0-29 spatstat_1.25-5 mgcv_1.7-13
[21] sciplot_1.0-9 spdep_0.5-45 coda_0.14-6 deldir_0.0-16
[25] maptools_0.8-14 foreign_0.8-49 nlme_3.1-103 MASS_7.3-17
[29] boot_1.3-4 sp_0.9-98 odesolve_0.9-9 mcmc_0.8
[33] lme4_0.999375-42 Matrix_1.0-6 lattice_0.20-6 chron_2.3-42
[37] akima_0.5-7 rcom_2.2-3.1.1 rscproxy_1.3-1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 memoise_0.1 munsell_0.3 plyr_1.7.1
 [7] proto_0.3-9.2 reshape2_1.2.1 scales_0.2.0 stats4_2.15.0 stringr_0.6 tools_2.15.0
[R] error code trying to extract second column from coeftest output
I want to use the standard error values in the summary that is produced using coeftest, but I am getting an error code - any ideas?

library(lmtest)
coeftest(lmodT_WBHO)

t test of coefficients:

    Estimate Std. Error t value  Pr(>|t|)
t1W  5.94819    0.17072 34.8410 < 2.2e-16 ***
t2W  6.56216    0.17438 37.6322 < 2.2e-16 ***
t3W  6.08252    0.16525 36.8082 < 2.2e-16 ***
t4W  6.18041    0.17028 36.2949 < 2.2e-16 ***
t1B  5.5        0.50566 10.8768 < 2.2e-16 ***
t2B  5.65000    0.53034 10.6535 < 2.2e-16 ***
t3B  4.52381    0.51756  8.7406 < 2.2e-16 ***
t4B  4.38095    0.51756  8.4646 < 2.2e-16 ***
t1H  5.05000    0.53034  9.5221 < 2.2e-16 ***
t2H  4.8        0.55903  8.5465 < 2.2e-16 ***
t3H  5.52632    0.54412 10.1564 < 2.2e-16 ***
t4H  4.71429    0.63388  7.4372 2.236e-13 ***
t1O  5.17647    0.57524  8.9988 < 2.2e-16 ***
t2O  5.81818    0.50566 11.5060 < 2.2e-16 ***
t3O  6.5        0.63388 10.2543 < 2.2e-16 ***
t4O  5.71429    0.63388  9.0147 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

se1 <- coeftest(lmodT_WBHO)$coef[,2]
Error in coeftest(lmodT_WBHO)$coef : $ operator is invalid for atomic vectors
[R] Getting reliable financial ratios
Check out this site: http://www.gummy-stuff.org/Yahoo-data.htm

It shows how to download a .csv file with the data you might want. Here is an example URL:

http://finance.yahoo.com/d/quotes.csv?s=XOM+BBDb.TO+JNJ+MSFT&f=snd1l1yrr2

The r2 in the above URL means P/E ratio. You should be able to automate this in R pretty easily. An endeavor I leave to the reader.

Good luck,
KW
Re: [R] kolmogorov-Smirnov critical values
On May 16, 2012, at 12:27 PM, aramos wrote:
Thanks, I've already done that!!

But the illustration of how you get the statistics is in the code. Describe what you want: number of samples, one- versus two-sided, two-sample versus comparison to a theoretical distribution, and which table columns should be used. Then someone can probably help.

David Winsemius, MD
West Hartford, CT
[R] Scraping a web page.
Duncan,

Thanks for the advice. It turns out that the web pages are pretty well behaved. I ended up using readHTMLTable, str_select, grep, gsub and readLines. When I have time I am going to convert my code to use the HTML parser and the more robust getNodeSet method that you mention below.

Thanks for your detailed reply,
KW

Date: Tue, 15 May 2012 21:02:05 -0700
From: Duncan Temple Lang dun...@wald.ucdavis.edu
To: r-help@r-project.org
Subject: Re: [R] Scraping a web page.

Hi Keith

Of course, it doesn't necessarily matter how you get the job done if it actually works correctly. But for a general approach, it is useful to use general tools, and doing so can lead to more correct, more robust, and more maintainable code.

Since htmlParse() in the XML package can both retrieve and parse the HTML document,

doc = htmlParse(the.url)

is much more succinct than using curlPerform(). However, if you want to use RCurl, just use

txt = getURLContent(the.url)

and that replaces

h = basicTextGatherer()
curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
h$value()

If you have parsed the HTML document, you can find the <a> nodes that have an href attribute that starts with /en/Ships via

hrefs = unlist(getNodeSet(doc, "//a[starts-with(@href, '/en/Ships')]/@href"))

The result is a character vector, and you can extract the relevant substrings with substring() or gsub() or any wrapper of those functions.

There are many benefits of parsing the HTML, including not falling foul of "as far as I can tell the <a> tag is always on its own line" turning out not to be true.

D.

On 5/15/12 4:06 AM, Keith Weintraub wrote:
Thanks, that was very helpful. I am using readLines and grep. If grep isn't powerful enough I might end up using the XML package, but I hope that won't be necessary.
Thanks again,
KW

On May 14, 2012, at 7:18 PM, J Toll wrote:
On Mon, May 14, 2012 at 4:17 PM, Keith Weintraub kw1...@gmail.com wrote:
Folks, I want to scrape a series of web-page sources for strings like the following:

/en/Ships/A-8605507.html
/en/Ships/Aalborg-8122830.html

which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact, all I want is the (exactly) 7-digit number before .html.

The good news is that as far as I can tell the <a> tag is always on its own line, so some kind of line-by-line grep should suffice once I figure out the following: what is the best package/command to use to get the source of a web page? I tried using something like:

if(url.exists("http://www.omegahat.org/RCurl")) {
  h = basicTextGatherer()
  curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
  # Now read the text that was cumulated during the query response.
  h$value()
}

which works, except that I get one long streamed HTML doc without the line breaks.

You could use:

h <- readLines("http://www.omegahat.org/RCurl")

-- or --

download.file(url = "http://www.omegahat.org/RCurl", destfile = "tmp.html")
h = scan("tmp.html", what = "", sep = "\n")

and then use grep or the XML package for processing.

HTH
James
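For the line-by-line route Keith describes, the 7-digit IDs can be pulled out with base R regular expressions once the page source is in a character vector (for example, from readLines()). In the sketch below, `page` is a made-up stand-in for that output so that no network access is needed. The patterns assume the hrefs always end in "-<7 digits>.html", as in the two examples in the question. As Duncan notes, parsing with the XML package is more robust if that assumption ever breaks.

```r
# Made-up stand-in for readLines(the.url) output.
page <- c('<div><table>',
          '<a href="/en/Ships/A-8605507.html">A</a>',
          '<a href="/en/Ships/Aalborg-8122830.html">Aalborg</a>',
          '<a href="/en/Other/x-1234567.html">x</a>',
          '</table></div>')

# Keep only the /en/Ships/... hrefs (regmatches() drops non-matching lines).
hrefs <- regmatches(page, regexpr('/en/Ships/[^"]*\\.html', page))

# Capture exactly seven digits between the last "-" and ".html".
ids <- sub('^.*-([0-9]{7})\\.html$', '\\1', hrefs)
ids
```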
Re: [R] finding mean and SD for a log-normal distribution
On May 16, 2012, at 12:37, Andras Farkas wrote:
Dear R Expert, allow me to ask a quick question: I have a mean value of 6 and an SD of 3 describing my distribution. I would like to convert this distribution into the log-normal distribution that would best describe it when resimulated. Currently I am using another software package to estimate the respective mean and SD on the log scale, and the results are 1.6667 and SD 0.47071. Then, to best reproduce my original distribution in R, I use the following commands:

c <- rlnorm(5000, 1.6667, 0.47071)
d <- exp(c)
mean(c)
sd(c)

and the results for mean and SD are 5.92 and 2.94 (original 6 and 3), respectively, which I am reasonably happy with. I would like to become independent of the other software, but am unable to figure out how to generate the values of 1.6667 and 0.47071 using R. Could someone please help me with this question?

Perhaps this was what you were looking for:

d <- log(c)
mean(d)
[1] 1.675003
sd(d)
[1] 0.4656469

Taking exp() of a log-normal rarely makes much sense. More commonly, you take log() to get a normal distribution.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
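To get the log-scale parameters directly from a target mean m and SD s (without simulating first), the standard log-normal moment relations can be inverted: sigma^2 = log(1 + s^2/m^2) and mu = log(m) - sigma^2/2. This is a sketch of that method-of-moments conversion. For m = 6 and s = 3 it gives roughly meanlog 1.680 and sdlog 0.472. These values are close to, but not identical with, the 1.6667 and 0.47071 from the other software, which may use a different estimator (not known from the thread).

```r
# Target moments on the original (natural) scale, from the question.
m <- 6
s <- 3

# Method-of-moments conversion for a log-normal distribution:
#   sigma^2 = log(1 + s^2 / m^2),  mu = log(m) - sigma^2 / 2
sigma <- sqrt(log(1 + s^2 / m^2))
mu    <- log(m) - sigma^2 / 2
c(meanlog = mu, sdlog = sigma)

# Sanity check by simulation (results vary slightly with the seed).
set.seed(123)
sim <- rlnorm(5000, meanlog = mu, sdlog = sigma)
c(mean = mean(sim), sd = sd(sim))
```

By construction, exp(mu + sigma^2/2) returns the target mean exactly, and sqrt((exp(sigma^2) - 1) * exp(2*mu + sigma^2)) returns the target SD exactly.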
[R] trouble with ifelse statement
Hello,

I apologize in advance for not providing sample data; I'm very new to R and can't easily generate appropriate sample data quickly. I'm hoping someone can offer advice without it.

The code below works and does what I want it to do: for a given row in my data frame where the variable peak.cort is 'max', it sets another variable, max.cort, to the value of a third variable, cortisol, for that row.

index <- raw.saliva.data$peak.cort == 'max'
raw.saliva.data$max.cort[index] <- (raw.saliva.data$cortisol[index])

Now, I want to execute this only if the value of a fourth variable, sample, is > 1 and < 5. I tried to add an ifelse statement to the code above so that it looks like this:

index <- raw.saliva.data$peak.cort == 'max'
raw.saliva.data$max.cort[index] <- ifelse(sample>1 && sample<5, raw.saliva.data$cortisol[index], NA)

and I get this error:

Error in sample > 1 : comparison (6) is possible only for atomic and list types

I can't figure out how to fix this problem. Any advice is appreciated. Thank you.

--
Melissa
[R] survival survfit with newdata
Dear all,

I am confused by the behaviour of survfit with the newdata option. I am using the latest version, R 2.15.0. In the simple example below I am building a coxph model on 90 patients and trying to predict 10 patients. Unfortunately, the survival curve at the end is for 90 patients. Could somebody from the survival package please confirm whether this behaviour is expected? I cannot find a way of using 'newdata' with really new data.

Thanks in advance.
DK

x <- matrix(rnorm(100*20), 100, 20)
time <- runif(100, min = 0, max = 7)
status <- sample(c(0, 1), 100, replace = TRUE)
trainX <- x[11:100, ]
trainTime <- time[11:100]
trainStatus <- status[11:100]
testX <- x[1:10, ]
coxph.model <- coxph(Surv(trainTime, trainStatus) ~ trainX)
sfit <- survfit(coxph.model, newdata = data.frame(testX))
dim(sfit$surv)
[1] 90 90
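A likely explanation (hedged, not confirmed in the thread): newdata is matched to the model formula by variable name. The formula refers to the matrix `trainX`, and the data frame passed as newdata has columns X1...X20 instead, so `trainX` is not found there. R then falls back to the 90-row training matrix in the workspace, which would produce 90 curves. The sketch below instead puts time, status and named covariates in data frames and fits with `data =`, so that `newdata = test` maps column by column. The column names V1...V20 and the data frame names `dat`, `train` and `test` are illustrative. The survival package ships with R.

```r
library(survival)

set.seed(42)
x <- matrix(rnorm(100 * 20), 100, 20)
colnames(x) <- paste0("V", 1:20)          # named covariates so newdata can match them
dat <- data.frame(time   = runif(100, min = 0, max = 7),
                  status = sample(c(0, 1), 100, replace = TRUE),
                  x)

train <- dat[11:100, ]
test  <- dat[1:10, ]

# '.' means all columns other than those already used on the left-hand side.
fit  <- coxph(Surv(time, status) ~ ., data = train)
sfit <- survfit(fit, newdata = test)

dim(sfit$surv)   # rows = time points from the training data, columns = 10 new patients
```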
Re: [R] clusters in zero-inflated negative binomial models
Lies Durnez ldurnez at itg.be writes:

I want to build a model in R based on animal collection data that look like the following:

Nr Village District Site Survey Species Count
1  AX      A        F    Dry    B       0
2  AY      A        V    Wet    A       5
3  BX      B        F    Wet    B       1
4  BY      B        V    Dry    B       0

Each data point shows one collection unit in a certain Village, District, Site, and Survey for a certain Species. 'Count' is the number of animals collected in that collection unit. It is possible that zero animals are collected in that unit because of very low densities, but also because of climatic conditions (wind, rain, etc.), so we would expect an excess of zeroes. I have tested that the data are overdispersed (variance much bigger than mean), so a zero-inflated negative binomial model seems the most suitable model in this case.

[snip snip snip]

However, the animal collections were only done in 4 districts, and in each district 3 villages were chosen (a total of 12 villages). This should be included in the design. The package survey allows this for the standard negative binomial model, but it seems to me that it is not possible for the zero-inflated NB. So, my question is two-fold:
1. Is a zero-inflated NB possible in the survey package? If yes, how?
2. If not, how can I build a zero-inflated NB model that takes into account the clustering of the observations (animal counts) in villages and the clustering of the villages in districts?

Treating villages and districts as random effects (clusters) basically puts you in the domain of generalized linear mixed models. You can use the glmmADMB package to fit zero-inflated, mixed negative binomial models. You can also use the MCMCglmm package to fit lognormal-Poisson models, which are another form of overdispersed count data model (it depends how strongly you require that the actual model be NB, as opposed to just a reasonable model for overdispersed count data).
Four districts is not very many for estimating an among-district variance (which is basically what you are doing when you fit a clustered/mixed model), so I might suggest using district as a fixed effect, but then using district:village as the random effect (i.e. the interaction between district and village, or village alone if villages are uniquely labeled).

http://glmm.wikidot.com/faq may be useful.

I would suggest that you send follow-ups to the r-sig-mixed-models at r-project.org mailing list.
Re: [R] Optimization problem
There are a couple of options. First, if you want the mean to equal 7, then the sum must equal 21, so you can let optim play with only 2 of the variables and set the 3rd to 21 - s1 - s2. If you want the mean to be greater than 7, then just put in a test: if the mean is less than 7, return -Inf or another really small number; if the mean is large enough, go on to compute the function that you want to maximize.

Also note that you don't need the loop; it can be replaced with sum(s^3).

On Wed, May 16, 2012 at 10:44 AM, Pacin Al jok...@gmail.com wrote:
Hi, I'm dealing with an optimization problem. I'm using 'optim' to maximize the output of a function, given some restrictions on the input. I would like to know if there is a way to impose some restrictions on 'intermediate variables' of the function. An example:

fx = function (x) {
  s <- 0
  for (i in 1:3) {
    s <- x[i]^3 + s
  }
  s
}
optim(rep(4,3), fx, method="L-BFGS-B", lower=rep(-10,nlin), upper=rep(10,nlin))

It would return '-10' for all variables. I want, however, a solution satisfying mean(x) > 7. Please don't analyse this specific example, but the logic of satisfying a criterion on the mean of the input (with thousands of variables). My real problem involves price elasticity, and I want to find the price increase for each individual that would give me the maximum total profit margin while respecting a minimum retention of clients.

Thank you very much,
John Mayer

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
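Both of Greg's suggestions can be sketched as follows. One point that is known: optim() minimises by default, which is consistent with the reported -10 for every variable; add control = list(fnscale = -1) to maximise instead. The sketch keeps the default minimisation so that it can be compared with that result. The function names `f2` and `penalised` are illustrative. Note that L-BFGS-B requires finite function values, so the "return -Inf/Inf" variant is paired here with the default Nelder-Mead method, which tolerates them.

```r
# Objective from the question; sum(x^3) replaces the explicit loop.
fx <- function(x) sum(x^3)

# Option 1: mean(x) == 7  <=>  sum(x) == 21, so optimise over two free
# coordinates and derive the third. The box bounds apply only to the two
# free coordinates, not to the derived one.
f2 <- function(s12) fx(c(s12, 21 - sum(s12)))
res1 <- optim(c(5, 5), f2, method = "L-BFGS-B",
              lower = rep(-10, 2), upper = rep(10, 2))
x1 <- c(res1$par, 21 - sum(res1$par))
x1   # should be close to c(7, 7, 7) for this objective

# Option 2: reject infeasible points outright (mean below 7 or outside [-10, 10]).
penalised <- function(x) {
  if (mean(x) < 7 || any(abs(x) > 10)) return(Inf)  # worst value when minimising
  fx(x)
}
res2 <- optim(rep(8, 3), penalised)   # default Nelder-Mead; start must be feasible
res2$par
```

With thousands of variables, Option 2 with Nelder-Mead will be slow. Option 1 (eliminating one variable to satisfy an equality constraint) scales better with a gradient-based method.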
[R] variable spatial correlation
Hello,

I used correlogram from the spatial package to determine the correlation scale for my data, but just looking with the bare eye, it seems that the correlation scale varies over the domain. Can someone suggest the best way to handle that problem?

Thanks,
Mark
Re: [R] trouble with ifelse statement
It seems like your problem is that R can't find your variable sample and is instead finding its own sample() function, which can't be compared to an integer and is causing your problem. It seems likely that sample is part of raw.saliva.data? If that's the case, change sample --> raw.saliva.data$sample.

However, I'm thinking that this isn't going to do quite what you really want, because sample is long (corresponding to all the rows, not just the subset by index) -- you could probably do something like this instead:

raw.saliva.data <- within(raw.saliva.data, max.cort[index] <- ifelse((sample > 1 & sample < 5)[index], cortisol[index], NA))

Since you're telling R to look within() raw.saliva.data, lookup should probably work for you. Note that we have to restrict both the (sample) and (cortisol) parts of ifelse() to just the rows in index for this to work (else things get lined up wrong). Note finally that you also have to reassign back to raw.saliva.data for this to have an effect.

Hope this helps,
Michael

On Wed, May 16, 2012 at 5:01 PM, Melissa Rosenkranz melissarosenkr...@gmail.com wrote:
Hello, I apologize in advance for not providing sample data...
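Since no data were posted, here is a small made-up data frame that shows the same logic without ifelse(): combine all three conditions into one logical vector with the vectorised `&`, and use that same vector on both sides of the assignment so that the rows line up. The column `sample.no` is a hypothetical rename of `sample`, which avoids any confusion with the base function sample(). The cortisol values are invented for illustration.

```r
# Made-up toy data; 'sample.no' stands in for the original 'sample' column.
saliva <- data.frame(
  peak.cort = c("max", "min", "max", "max", "max"),
  cortisol  = c(10.2, 4.1, 12.5, 9.8, 7.3),
  sample.no = c(1, 2, 3, 4, 6)
)
saliva$max.cort <- NA_real_

# One logical vector, built with the vectorised '&' (not '&&').
keep <- saliva$peak.cort == "max" & saliva$sample.no > 1 & saliva$sample.no < 5

# The same index on both sides keeps the rows aligned; all other rows stay NA.
saliva$max.cort[keep] <- saliva$cortisol[keep]
saliva
```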
Re: [R] error code trying to extract second column from coeftest output
I recommend that you troubleshoot your own problem using the str function... for example, str(coeftest(lmodT_WBHO)). The error message is not a code... it is perfectly readable English, and it is telling you that the result of calling coeftest is not a list with parts that can be pulled out using the $ operator.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
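Following Jeff's str() suggestion: the object returned by coeftest() is a numeric matrix (with class "coeftest"), not a list, so columns are taken with matrix indexing rather than $. The sketch below uses the built-in cars data and a simple lm() as a stand-in for lmodT_WBHO, which is not available here. It assumes the lmtest package is installed.

```r
library(lmtest)

fit <- lm(dist ~ speed, data = cars)   # stand-in for lmodT_WBHO
ct  <- coeftest(fit)

str(ct)   # a numeric matrix with dimnames and class "coeftest"

# Matrix indexing instead of '$': by column name or by position.
se <- ct[, "Std. Error"]               # equivalently ct[, 2]
se
```

With the default covariance matrix, these standard errors match sqrt(diag(vcov(fit))).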
Re: [R] TukeyHSD plot error
Hmmm, I can't reproduce, but I'm not really sure why that would happen... is there any way you can test this in a --vanilla R session? (That's the UNIX-y way to start a totally clean session; not sure exactly how to achieve that on Windows.) Does this happen if you just run example(TukeyHSD) directly, or only when you copy and paste the commands yourself? Hopefully we'll be able to track this down, but my initial guess is that it's some nasty combination of all the packages you have loaded.

Michael
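One way to test Michael's package-conflict guess without restarting: the traceback calls as.glht(), which is not part of stats. This suggests that a plot method from one of the attached add-on packages (HH and multcomp are both loaded) is being dispatched instead of the one in stats. The sketch below lists every visible definition of plot.TukeyHSD and then calls the stats version explicitly with `:::`. This is a diagnostic workaround under that assumption, not a fix for the conflicting package.

```r
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
tk  <- TukeyHSD(fm1, "tension", ordered = TRUE)

# Show every object named plot.TukeyHSD that R can see, and where each lives.
getAnywhere("plot.TukeyHSD")

# Call the method shipped with the stats package directly, bypassing
# whichever method S3 dispatch currently picks up.
stats:::plot.TukeyHSD(tk)
```

If the direct call works while plot(tk) fails, the masking package is the likely culprit, and detaching it (or starting a clean session) should confirm this.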
[R] Unable to install package
Hi, I get the following error while installing a package. Can someone please help?

install.packages("memisc")
Warning in install.packages :
  argument 'lib' is missing: using 'C:/Users/ravi/Documents/R/R-2.15.0'
Warning in install.packages :
  downloaded length 8255 != reported length 200
Error in install.packages : Line starting '<!DOCTYPE html PUBLI ...' is malformed!

thanks
[R] triangular matrices input/output
Hi,

Is there any package that deals with triangular matrices? For example, ways of inputting an upper (lower) triangular matrix? Or of converting a vector of length 6 to an upper (lower) triangular matrix (by row/column)?

Thanks!

-
PhD candidate in Statistics
Big R Fan
Big LEGO Fan
Big sTaTs Fan
Re: [R] trouble with ifelse statement
Hello, 'sample' is a really bad name for a variable, it's already taken, it's an R function. sample1 sample5 # '' is not vectorized, it's '' you want. # Without 'ifelse' raw.saliva.data$max.cort[index] - raw.saliva.data$cortisol[index sample 1 sample 5] Negate this last conjunction if you want to set the other 'max.cort' to NA, !(index sample 1 sample 5) And, finally, this is untested. Give a small dataset example, including 'sample' (after calling it something else). Hope this helps, Rui Barradas la mer wrote Hello, I apologize in advance for not providing sample data, I'm a very new to R and can't easily generate appropriate sample data quickly. I'm hoping someone can offer advice without it. This code below works and does what I want it to do, which is for a given row in my dataframe, where the variable peak.cort = max, it makes the value of another variable max.cort = to match the value of a third variable cortisol for that row. * index - raw.saliva.data$peak.cort == 'max' raw.saliva.data$max.cort[index] - (raw.saliva.data$cortisol[index]) * Now, I want to execute this function only if the value of a fourth variable, sample is 1 and 5. I tried to add an ifelse statement to the code above so that it looks like this: * index - raw.saliva.data$peak.cort == 'max' raw.saliva.data$max.cort[index] - ifelse(sample1 sample5, raw.saliva.data$cortisol[index], NA) * and I get this error: Error in sample 1 : comparison (6) is possible only for atomic and list types I can't figure out how to fix this problem. Any advice is appreciated. Thank you. -- *Melissa* [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/trouble-with-ifelse-statement-tp4630309p4630316.html Sent from the R help mailing list archive at Nabble.com. 
[R] Updating Neural Networks
Hi useRs,

I apologize if I've missed some documentation somewhere, but I can't seem to find anything related to this question. For an ensemble/data-mining problem, I'm trying to train a neural network on my data set and have it output predictions (or coefficients) after varying numbers of epochs (preferably using nnet, as that package seems the most user-friendly to me, but I'm open to other packages too). Is there a way to do this, or would I need to rerun nnet for every different epoch value I wish to consider?

Thanks so much for your help!

Josh
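No reply to this question appears here. One possible approach, sketched under assumptions: nnet() accepts starting weights through its Wts argument and stores the fitted weights in $wts, so a network can be trained in chunks of maxit iterations with predictions recorded at each checkpoint. Note that maxit counts optimizer iterations (nnet optimizes with BFGS), not epochs in the stochastic-gradient sense, and restarting the optimizer from saved weights is not guaranteed to reproduce one uninterrupted run of the same total length. The data and checkpoint values are made up.

```r
library(nnet)

set.seed(42)
# Hypothetical toy regression data
n <- 100
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2 + rnorm(n, sd = 0.1)

checkpoints <- c(10, 20, 50)          # cumulative iteration counts (assumed values)
extra <- diff(c(0, checkpoints))      # iterations to add before each checkpoint

preds <- list()
fit <- NULL
for (i in seq_along(checkpoints)) {
  if (is.null(fit)) {
    # First chunk: random starting weights
    fit <- nnet(y ~ x1 + x2, data = dat, size = 5, linout = TRUE,
                maxit = extra[i], trace = FALSE)
  } else {
    # Later chunks: restart the optimizer from the previous chunk's weights
    fit <- nnet(y ~ x1 + x2, data = dat, size = 5, linout = TRUE,
                maxit = extra[i], trace = FALSE, Wts = fit$wts)
  }
  preds[[as.character(checkpoints[i])]] <- predict(fit, dat)
}
```

The two branches exist because passing Wts = NULL is not the same as omitting Wts: nnet() only draws random starting weights when the argument is missing.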
Re: [R] triangular matrices input/output
The Matrix package provides good support for many special sorts of matrices, but here it looks like you probably don't need that additional machinery for such a small case:

makeUpper <- function(vec, diag = FALSE){
  n <- (-1 + sqrt(1 + 8*length(vec)))/2
  stopifnot(isTRUE(all.equal(n, as.integer(n))))
  if(!diag) n <- n + 1
  mat <- matrix(0, ncol = n, nrow = n)
  mat[upper.tri(mat, diag)] <- vec
  mat
}

I think this does what you want, and it's not too hard to generalize to lower triangular. E.g.,

v <- 1:6
makeUpper(v)
makeUpper(v, diag = TRUE)

It's not super well tested though, so caveat lector.

Michael

On Wed, May 16, 2012 at 5:09 PM, casperyc caspe...@hotmail.co.uk wrote:
Hi,

Is there any package that deals with triangular matrices? Say, ways of inputting an upper (lower) triangular matrix? Or converting a vector of length 6 to an upper (lower) triangular matrix (by row/column)?

Thanks!

- ## PhD candidate in Statistics Big R Fan Big LEGO Fan Big sTaTs Fan ##
-- View this message in context: http://r.789695.n4.nabble.com/triangular-matrices-input-output-tp4630310.html
Sent from the R help mailing list archive at Nabble.com.
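Michael notes that makeUpper() generalizes easily to the lower-triangular case. A minimal sketch of one way to do that, with a hypothetical upper argument choosing between upper.tri() and lower.tri(); in both cases the vector is written in R's column-major order.

```r
makeTri <- function(vec, diag = FALSE, upper = TRUE) {
  # Solve n * (n + 1) / 2 == length(vec) for n
  n <- (-1 + sqrt(1 + 8 * length(vec))) / 2
  stopifnot(isTRUE(all.equal(n, round(n))))
  n <- round(n)
  # Without the diagonal, the strict triangle of an (n + 1) x (n + 1) matrix
  # holds length(vec) cells
  if (!diag) n <- n + 1
  mat <- matrix(0, nrow = n, ncol = n)
  idx <- if (upper) upper.tri(mat, diag = diag) else lower.tri(mat, diag = diag)
  mat[idx] <- vec   # filled column by column
  mat
}

makeTri(1:6, diag = TRUE)                  # rows: 1 2 4 / 0 3 5 / 0 0 6
makeTri(1:6, diag = TRUE, upper = FALSE)   # rows: 1 0 0 / 2 4 0 / 3 5 6
```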
Re: [R] Lattice: Add abline to Single Value qqmath() Plot
On Tue, 15 May 2012, ilai wrote:

Apologies in advance if I misinterpret "R console insists that I retype ...", but it actually makes more sense (than sourcing a script) to use the groups argument (see the last example in ?qqmath), as in 4 groups in each of 30 panels, or allow.multiple=T, outer=T if you really want separate panels for each transformation. You can also use layout=c(1,1,120) if you need each in a separate page, or say c(3,2,20) for 20 pages of 6 panels each, etc. Regarding your script, there is a syntax error:

ilai,

Grouping doesn't do what's needed, but the split() function does. Thanks for pointing me in that direction.

Rich

-- 
Richard B. Shepard, Ph.D. | Integrity - Credibility - Innovation
Applied Ecosystem Services, Inc. | Helping Ensure Our Clients' Futures
http://www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863
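Rich reports that split() did what he needed. A minimal sketch of that approach, assuming long-format data with one value column and one grouping column (the data frame, its column names and the simulated values are hypothetical): split() yields one vector per group, and each becomes its own qqmath() plot whose panel function draws the points and then adds a reference line with panel.qqmathline(), following the pattern in the ?qqmath examples.

```r
library(lattice)

set.seed(1)
# Hypothetical long-format data: one measured value per row, tagged by site
dat <- data.frame(site  = rep(c("A", "B", "C"), each = 30),
                  value = rlnorm(90))

# split() gives one numeric vector per site
by.site <- split(dat$value, dat$site)

# One single-panel qqmath() plot per site, each with a quartile reference line
plots <- lapply(names(by.site), function(s) {
  qqmath(~ v, data = data.frame(v = by.site[[s]]),
         main = paste("Site", s),
         panel = function(x, ...) {
           panel.qqmath(x, ...)       # the sample quantiles
           panel.qqmathline(x, ...)   # line through the 1st and 3rd quartiles
         })
})

# print(plots[[1]]) draws a plot; printing them in turn on a pdf() device
# gives one page per site.
```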
Re: [R] triangular matrices input/output
Do leave the posts for anyone else who might google the same question. (I don't think you really could delete the post anyway, perhaps only on one mirror.) You could probably use some combination of rev() and t() to fill by row, but I haven't thought through the geometry all the way yet.

Michael

On May 16, 2012, at 8:13 PM, YUProf caspe...@hotmail.co.uk wrote:

Hi Michael,

I have figured out a 'super' easy way myself and already deleted the post. It can be done using (no package necessary):

d=c(3,6,2,1,4,5)
x=matrix(,3,3)
# by column,
x[!lower.tri(x)]=d

I am still trying very hard to think of a way to fill it by row, as I sometimes have to!

THANKS!

Chen
==
Mr Chen YU
PhD candidate in Statistics
School of Mathematics, Statistics and Actuarial Science, University of Kent
D7/D Woolf College, The Pavilion, Giles Lane, Canterbury, Kent CT2 7BQ
Mobile: +44(0)7725003559
==
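For the by-row question, one approach (using t() rather than rev()) relies on the fact that reading the lower triangle column by column visits the same positions, transposed, as reading the upper triangle row by row. Filling lower.tri() and then transposing therefore gives an upper triangular matrix filled by row. A sketch using Chen's vector d, with 0 instead of NA below the diagonal:

```r
d <- c(3, 6, 2, 1, 4, 5)

# Upper triangular (including the diagonal), filled column by column
by.col <- matrix(0, 3, 3)
by.col[!lower.tri(by.col)] <- d

# Upper triangular, filled row by row:
# fill the lower triangle column by column, then transpose
by.row <- matrix(0, 3, 3)
by.row[lower.tri(by.row, diag = TRUE)] <- d
by.row <- t(by.row)

by.col   # rows: 3 6 1 / 0 2 4 / 0 0 5
by.row   # rows: 3 6 2 / 0 1 4 / 0 0 5
```

The same trick in reverse (fill upper.tri() by column, then transpose) gives a lower triangular matrix filled by row.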
[R] subreddit for R related stuff
Thought there might be some Redditors lurking on these mailing lists. I created a subreddit for R (and by extension Bioconductor) discussions, links, etc.

http://www.reddit.com/r/Rsoftware/

This will be the first and only shameless plug.

-Robert

Robert M. Flight, Ph.D.
University of Louisville Bioinformatics Laboratory
University of Louisville
Louisville, KY
PH 502-852-1809 (HSC)
PH 502-852-0467 (Belknap)
EM robert.fli...@louisville.edu
EM rfligh...@gmail.com
robertmflight.blogspot.com
bioinformatics.louisville.edu/lab
github.com/rmflight/general/wiki

The most exciting phrase to hear in science, the one that heralds new discoveries, is not "Eureka!" (I found it!) but "That's funny ..." - Isaac Asimov
[R] using XML package to read RSS
Hi,

I'm trying to use the XML package to read an RSS feed. To get started, I was trying to use this post as an example:

http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/

I can replicate the beginning section of the post, but when I try to use another RSS feed I have an issue. The RSS feed I would like to use is:

URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"
library(XML)
doc <- xmlTreeParse(URL)
src <- xpathApply(xmlRoot(doc), "//entry")

I get an empty list rather than a list of each of the entries:

> src
list()
attr(,"class")
[1] "XMLNodeSet"

I'm not sure how to fix this. Any suggestions? Do I need to provide a namespace, or is the RSS malformed?

Thanks,

James
Re: [R] using XML package to read RSS
Hi James.

Yes, you need to identify the namespace in the query, e.g.

getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes. (getNodeSet() is more convenient to use when you don't specify a function to apply to the nodes. Also, you don't need xmlRoot(doc), as it works on the entire document with the query "//".)

BTW, you want to use xmlParse() and not xmlTreeParse().

D.
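A minimal offline sketch of Duncan's advice, assuming the XML package is installed; the tiny Atom document below is made up and stands in for the SEC feed. Because the feed declares a default namespace, the query binds an arbitrary prefix (here "x") to the Atom namespace URI and uses it on every element name in the path.

```r
library(XML)

# A tiny Atom document standing in for the SEC feed (content is made up)
atom <- '<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>'

# xmlParse() builds an internal document that XPath queries can run on
doc <- xmlParse(atom, asText = TRUE)

# Bind the prefix "x" to the Atom namespace and use it in the query
ns <- c(x = "http://www.w3.org/2005/Atom")
entries <- getNodeSet(doc, "//x:entry", ns)

# Apply a function to the matches: the title text of each entry
titles <- xpathSApply(doc, "//x:entry/x:title", xmlValue, namespaces = ns)
titles
```

The prefix name itself is arbitrary; only the URI has to match the xmlns declaration in the feed.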
Re: [R] caret: Error when using rpart and CV != LOOCV
Dominik,

See this line:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37

The variance of the predictions is zero. caret calculates R^2 as the squared correlation between the observed data and the predictions, which uses sd(pred), and that is zero here. I believe that the same would occur with other formulas for R^2.

Max

On Wed, May 16, 2012 at 11:54 AM, Dominik Bruhn domi...@dbruhn.de wrote:

Thanks Max for your answer.

First, I do not understand your post. Why is it a problem if two of the predictions match? From the formula for calculating R^2 I can see that there will be a DivByZero iff the total sum of squares is 0. This is only true if the predictions of all the predicted points from the test set are equal to the mean of the test set. Why should this happen?

Anyway, I wrote the following code to check what you tried to tell me:
---
library(caret)
data(trees)
formula=Volume~Girth+Height

customSummary <- function (data, lev = NULL, model = NULL) {
  print(summary(data$pred))
  return(defaultSummary(data, lev, model))
}

tc=trainControl(method='cv', summaryFunction=customSummary)
train(formula, data=trees, method='rpart', trControl=tc)
---

This outputs:
---
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.45   18.45   18.45   30.12   35.95   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.69   22.69   22.69   32.94   38.06   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37
[cut many values like this]
Warning:
In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
  There were missing values in resampled performance measures.
---

As I didn't understand your post, I don't know if this confirms your assumption.

Thanks anyway,
Dominik

On 16/05/12 17:30, Max Kuhn wrote:

More information is needed to be sure, but it is most likely that some of the resampled rpart models produce the same prediction for the hold-out samples (likely the result of no viable split being found). Almost every incarnation of R^2 requires the variance of the prediction. This particular failure mode would result in a divide by zero. Try using your own summary function (see ?trainControl) and put a print(summary(data$pred)) in there to verify my claim.

Max

On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote:

Hi,

I got the following problem when trying to build an rpart model and using anything but LOOCV. Originally, I wanted to use k-fold partitioning, but every partitioning except LOOCV throws the following warning:

Warning message:
In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
  There were missing values in resampled performance measures.

Below are some simplified test cases which reproduce the warning on my system.

Question: What does this error mean? How can I avoid it?

System information:
---
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2 reshape_0.8.4
[6] plyr_1.7.1     lattice_0.20-6

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
[5] tools_2.15.0
---

Simplified test case I: throws the warning
---
library(caret)
data(trees)
formula=Volume~Girth+Height
train(formula, data=trees, method='rpart')
---

Simplified test case II: every other CV method also throws the warning, for example using 'cv':
---
library(caret)
data(trees)
formula=Volume~Girth+Height
tc=trainControl(method='cv')
train(formula, data=trees, method='rpart', trControl=tc)
---

Simplified test case III: the only CV method which works is 'LOOCV':
---
library(caret)
data(trees)
formula=Volume~Girth+Height
tc=trainControl(method='LOOCV')
train(formula, data=trees, method='rpart', trControl=tc)
---

Thanks!

-- 
Dominik Bruhn
mailto:
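Max's explanation can be checked without caret. A minimal sketch, assuming (as Max states) that the resampled R^2 is the squared correlation between the observed values and the predictions: when every hold-out prediction is identical, as in the 30.37 fold above, sd(pred) is zero, cor() returns NA (with a warning), and that NA is what surfaces as "missing values in resampled performance measures". RMSE stays finite. The observed and predicted values below are made up.

```r
# Hypothetical observed values for one hold-out fold
obs <- c(10.3, 16.4, 21.0, 24.2, 33.8)

# Squared-correlation R^2 and RMSE for one fold
fold_metrics <- function(obs, pred) {
  r2 <- suppressWarnings(cor(obs, pred)^2)   # NA when sd(pred) == 0
  rmse <- sqrt(mean((obs - pred)^2))
  c(RMSE = rmse, Rsquared = r2)
}

fold_metrics(obs, rep(30.37, 5))                     # constant predictions: Rsquared is NA
fold_metrics(obs, c(12.0, 15.1, 22.5, 25.0, 31.9))   # varying predictions: finite Rsquared
```

A single-split rpart tree that finds no viable split predicts one constant for the whole fold, which is exactly the first case.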
Re: [R] using XML package to read RSS
Brilliant! Thank you so much. I never would have figured out specifying the namespace like that. I had tried:

src <- xpathApply(xmlRoot(doc), "//entry", namespaces = "http://www.w3.org/2005/Atom")

but that wasn't working.

Thanks again,

James