Re: [R] Implementing step-wise linear regression
Hello Troy.

A tiny question (without answering yours): why did you choose to do it this way instead of using ?step or ?stepAIC?

Best,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Mon, Jan 24, 2011 at 3:47 AM, Troy S troysocks-tw...@yahoo.com wrote:

> Dear R fans,
>
> I am trying to do step-wise linear regression using the F-test to decide
> which variables to admit. Ewout Steyerberg suggests using the F-test for
> this purpose. I first build a model with no variables using lm(y ~ 1) and
> then a model with one variable that is a strong predictor using lm(y ~ x).
> When I call var.test on these two models, I do not get a significant
> p-value (0.07). But a summary of the second model gives an F-test p-value
> that is very small.
>
> My questions are:
> Should I be using var.test to run the F-test to decide which variable to add next?
> What is the difference between the F-test run by var.test and summary.lm?
> Has step-wise model building using the F-test been programmed already?
>
> Thanks!
> Troy

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] How to measure/rank “variable importance” when using rpart?
Hello all,

When building a CART model (specifically a classification tree) using rpart, it is sometimes interesting to know the importance of the various variables introduced to the model.

Thus, my question is: *What common measures exist for ranking/measuring the importance of the variables participating in a CART model? And how can they be computed using R (for example, when using the rpart package)?*

For example, here is some dummy code, created so you might show your solutions on it. This example is structured so that it is clear that variables x1 and x2 are important, while (in some sense) x1 is more important than x2 (since x1 should apply to more cases, and thus have more influence on the structure of the data, than x2).

    set.seed(31431)
    n <- 400
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    x3 <- rnorm(n)
    x4 <- rnorm(n)
    x5 <- rnorm(n)
    X <- data.frame(x1, x2, x3, x4, x5)
    y <- sample(letters[1:4], n, T)
    y <- ifelse(X[,2] < -1, "b", y)
    y <- ifelse(X[,1] < 0, "a", y)
    require(rpart)
    fit <- rpart(y ~ ., X)
    plot(fit); text(fit)
    info.gain.rpart(fit) # your function - telling us on each variable how important it is

(references are always welcomed)

Thanks!
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
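One possible starting point, offered as a sketch rather than a definitive answer: recent versions of rpart store a `variable.importance` component on the fitted object, which sums the improvement in the splitting criterion over the splits (including surrogate splits) in which each variable takes part. The comparison operators in the posted code were lost in the archive, so the directions used below (`< -1` and `< 0`) are an assumption, as is wrapping the response in `factor()`.

```r
library(rpart)

# Rebuild the dummy data from the post (operator directions are assumed)
set.seed(31431)
n <- 400
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rnorm(n), x5 = rnorm(n))
y <- sample(letters[1:4], n, replace = TRUE)
y <- ifelse(X[, 2] < -1, "b", y)
y <- ifelse(X[, 1] < 0, "a", y)

# Fit a classification tree; the response must be a factor for method = "class"
dat <- cbind(y = factor(y), X)
fit <- rpart(y ~ ., data = dat, method = "class")

# Named numeric vector of importances (NULL on rpart versions that lack it)
importance <- fit$variable.importance

# Express each variable's importance as a percentage of the total
round(100 * importance / sum(importance), 1)
```

If the component is missing in your installed version, `summary(fit)` still lists the primary and surrogate splits from which such a measure could be assembled by hand.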
Re: [R] Vectorization
On Sun, Jan 23, 2011 at 07:29:16PM -0800, eric wrote:

> Is there a way to vectorize this loop or a smarter way to do it?
>
> y
>  [1]  0.003990746 -0.037664639  0.005397999  0.010415496  0.003500676
>  [6]  0.001691775  0.008170774  0.011961998 -0.016879531  0.007284486
> [11] -0.015083581 -0.006645958 -0.013153103  0.028148639 -0.005724317
> [16] -0.027408025  0.014767422 -0.001619691  0.018334730 -0.009747171
>
> x <- numeric(length(y))
> for (i in 1:length(y)) {
>   x[i] <- ifelse(i==1, 10000*(1+y[i]), (1+y[i])*x[i-1])
> }
>
> x
>  [1] 10039.907  9661.758  9713.912  9815.087  9849.447  9866.110  9946.724
>  [8] 10065.706  9895.802  9967.888  9817.536  9752.289  9624.016  9894.919
> [15]  9838.278  9568.630  9709.934  9694.207  9871.948  9775.724
>
> Basically trying to see how the equity of an investment changes after each
> return period. Start with $10,000 and a series of returns over time.
> Figure out the equity after each time period (return).

Hello.

The loop computes a cumulative product. The initial value can be applied as a common multiplier. So, z in the following should be equal to x up to machine rounding error.

    y <- c(
     0.003990746, -0.037664639,  0.005397999,  0.010415496,  0.003500676,
     0.001691775,  0.008170774,  0.011961998, -0.016879531,  0.007284486,
    -0.015083581, -0.006645958, -0.013153103,  0.028148639, -0.005724317,
    -0.027408025,  0.014767422, -0.001619691,  0.018334730, -0.009747171)

    x <- numeric(length(y))
    for (i in 1:length(y)) {
      x[i] <- ifelse(i==1, 10000*(1+y[i]), (1+y[i])*x[i-1])
    }

    z <- 10000*cumprod(1 + y)
    max(abs(x - z))
    # [1] 1.818989e-12

Petr Savicky.
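The same idea can be wrapped in a small helper. This is only a sketch: the function name `equity_curve` and its `start` argument are made up for illustration and do not appear in the thread.

```r
# Equity after each period, given a vector of per-period returns.
# 'start' is the initial capital (10,000 in the original question).
equity_curve <- function(returns, start = 10000) {
  start * cumprod(1 + returns)
}

# First three returns from the thread
y <- c(0.003990746, -0.037664639, 0.005397999)
eq <- equity_curve(y)

# The explicit loop, kept only to check the vectorized version against it
x <- numeric(length(y))
prev <- 10000
for (i in seq_along(y)) {
  x[i] <- prev * (1 + y[i])
  prev <- x[i]
}
max(abs(x - eq))  # difference should be at the level of rounding error
```

Using `seq_along(y)` rather than `1:length(y)` keeps the loop safe when `y` is empty.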
Re: [R] sensitivity logical operators in R
On Sun, Jan 23, 2011 at 11:13:11PM +0100, Marc Jekel wrote:

> Hello R Fans,
>
> Another question for the community that really frightened me today. The
> following logical comparison produces a FALSE as output:
>
> t = sum((c(.7,.69,.68,.67,.66)-.5)*c(1,1,-1,-1,1))
> tt = sum((c(.7,.69,.68,.67,.66)-.5)*c(1,-1,1,1,-1))
>
> t == tt
>
> This is really strange behavior. Most likely this has something to do with
> how R represents numbers internally and the possible sensitivity of a
> computer? Does anyone know when this strange behavior occurs and how to
> fix it?

The number 0.7 has an infinite expansion in binary

    0.1011001100110011001100110011...

so it is rounded in the standard numeric data type, which is used for the speed needed in complex computations.

If you know in advance that the result has at most 2 decimal places, then round(, digits=2) yields the correct comparison

    round(t, 2) == round(tt, 2)
    # [1] TRUE

although 0.2 is also not exactly representable. Both sides are rounded to the same representable number.

See also http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy for other examples.

Petr Savicky.
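When the number of decimal places is not known in advance, a comparison with a tolerance is a common alternative to rounding. A minimal sketch using base R's `all.equal()`; the variables are renamed to `a` and `b` here only to avoid masking the base function `t()`.

```r
a <- sum((c(.7, .69, .68, .67, .66) - .5) * c(1, 1, -1, -1, 1))
b <- sum((c(.7, .69, .68, .67, .66) - .5) * c(1, -1, 1, 1, -1))

a == b                    # exact comparison; reported as FALSE in the thread
isTRUE(all.equal(a, b))   # equality up to a small relative tolerance
abs(a - b) < 1e-9         # the same idea with an explicit absolute tolerance
```

`all.equal()` returns a character description rather than FALSE when the values differ, which is why it is wrapped in `isTRUE()` before being used in a condition.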
Re: [R] glitch in building R package
On 22.01.2011 00:16, Horace Tso wrote:

> I follow Alan Lenarcic's very helpful tutorial on building an R package
> for Windows (XP), which can be found at
> http://www.stat.columbia.edu/~gelman/stuff_for_blog/AlanRPackageTutorial.pdf.
> The package involves a small dll compiled from some very simple C++ code.

1. Although the tutorial was certainly very helpful at the time it was written, some parts are outdated these days. Please read "Writing R Extensions" and the "R Installation and Administration" manual.

2. You probably forgot to tell your package to do something that corresponds to dyn.load, either in a .FirstLib or in a NAMESPACE directive.

Best,
Uwe Ligges

> The build process seemed to work smoothly until I installed. Then I got an
> error saying the C function was not in the load table. This is rather
> mysterious because I've been able to call this function from R with
> dyn.load("name.dll"). So the dll is working.
>
> The install error says:
>
> C:\R-test>R CMD INSTALL --build FirstPack_0.1.tar.gz
> * installing to library 'c:/R/R-2.12.0/library'
> * installing *source* package 'FirstPack' ...
> ** libs
> cygwin warning:
>   MS-DOS style path detected: c:/R/R-2.12.0/etc/i386/Makeconf
>   Preferred POSIX equivalent is: /cygdrive/c/R/R-2.12.0/etc/i386/Makeconf
>   CYGWIN environment variable option "nodosfilewarning" turns off this warning.
>   Consult the user's guide for more details about POSIX paths:
>     http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
> g++ -Ic:/R/R-2.12.0/include -O2 -Wall -c XDemo.cpp -o XDemo.o
> g++ -Ic:/R/R-2.12.0/include -O2 -Wall -c XDemo_main.cpp -o XDemo_main.o
> g++ -shared -s -static-libgcc -o FirstPack.dll tmp.def XDemo.o XDemo_main.o -Lc:/R/R-2.12.0/bin/i386 -lR
> installing to c:/R/R-2.12.0/library/FirstPack/libs/i386
> ** R
> ** data
> Warning: empty 'data' directory
> ** preparing package for lazy loading
> Error in .C("DemoAutoCor", OutVec = as.double(vector("numeric", OutLength)), :
>   C symbol name "DemoAutoCor" not in load table
> ERROR: lazy loading failed for package 'FirstPack'
> * removing 'c:/R/R-2.12.0/library/FirstPack'
>
> Here is how I built the package. I have the directory structure as
> described in "Writing R Extensions" and I issued the following command at
> the DOS prompt:
>
> C:\R-test>R CMD build FirstPack
> * checking for file 'FirstPack/DESCRIPTION' ... OK
> * preparing 'FirstPack':
> * checking DESCRIPTION meta-information ... OK
> * cleaning src
> cygwin warning:
>   MS-DOS style path detected: C:/R-test/FirstPack_0.1.tar
>   Preferred POSIX equivalent is: /cygdrive/c/R-test/FirstPack_0.1.tar
>   CYGWIN environment variable option "nodosfilewarning" turns off this warning.
>   Consult the user's guide for more details about POSIX paths:
>     http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
> [snip: same cygwin warning repeated]
> Warning in readLines(ldpath) :
>   incomplete final line found on 'FirstPack/DESCRIPTION'
> * checking for LF line-endings in source and make files
> * checking for empty or unneeded directories
> WARNING: directory 'FirstPack/data' is empty
> * building 'FirstPack_0.1.tar.gz'
> [snip: same cygwin warning repeated twice]
>
> Thanks in advance.
>
> H
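To make the second point concrete, here is a hedged sketch of the two usual ways a package asks R to load its compiled code. The package name `FirstPack` comes from the thread; the file name `zzz.R` is only a common convention, and nothing here is taken from Horace's actual sources. Note also that `.First.lib` applies to packages without a NAMESPACE, as was still possible in the R 2.12 era; current versions of R require a NAMESPACE, where the `useDynLib` directive is the usual route.

```r
## Option 1: a NAMESPACE file in the package's top-level directory containing
## the directive below (shown as a comment because NAMESPACE is not R code
## that you would source):
##
##   useDynLib(FirstPack)

## Option 2: for a package without a NAMESPACE, a load hook in an R file
## such as FirstPack/R/zzz.R. library.dynam() is the package-aware
## counterpart of dyn.load().
.First.lib <- function(libname, pkgname) {
  library.dynam("FirstPack", pkgname, libname)
}
```

Either way, it can also help to pass `PACKAGE = "FirstPack"` in the `.C()` call so the symbol lookup is restricted to the package's own DLL.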
[R] From two polynomials one multivariate
Hello,

I have a function that creates polynomials (i.e. legendre.polynomials). I want to use it to create polynomials in variable x and in variable y.

    legendre.polynomials(2)
    [[1]]
    1

    [[2]]
    x

    [[3]]
    -0.5 + 1.5*x^2

Ideally I would receive the same output but for another variable (e.g. y). Then, having two sets of polynomials, one in x and one in y, I can create the multivariate polynomial I want.

I checked the multipolynom package but it cannot do what I am looking for.

Best Regards
Alex
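If a numerical evaluation is enough (rather than a symbolic polynomial object), one possible approach is to evaluate the Legendre polynomials directly and multiply the values for x and y. The sketch below uses only base R and Bonnet's recurrence; the function names `legendre_eval` and `legendre_xy` are invented for illustration and are not part of any package mentioned above.

```r
# Evaluate the Legendre polynomial of degree n at the points x, using
# Bonnet's recurrence: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
legendre_eval <- function(n, x) {
  p_prev <- rep(1, length(x))          # P_0(x) = 1
  if (n == 0) return(p_prev)
  p_curr <- x                          # P_1(x) = x
  if (n == 1) return(p_curr)
  for (k in 1:(n - 1)) {
    p_next <- ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    p_prev <- p_curr
    p_curr <- p_next
  }
  p_curr
}

# Bivariate product term P_i(x) * P_j(y)
legendre_xy <- function(i, j, x, y) {
  legendre_eval(i, x) * legendre_eval(j, y)
}

legendre_eval(2, 0.5)        # same as -0.5 + 1.5 * 0.5^2
legendre_xy(2, 1, 0.5, 2)    # P_2(0.5) * P_1(2)
```

A full bivariate basis up to some degree can then be built by looping over the pairs (i, j).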
Re: [R] Passing in arguments into function
Hi,

If you have the formula stored in a string, you could also use as.formula in your call to lm, like this:

    form <- "x ~ y + z"
    lm(as.formula(form))

HTH,
Ivan

On 1/23/2011 21:38, Joshua Wiley wrote:

> Hi Paul,
>
> You need to pass the formula object, not a string. If you have a function
> that is passing one of its arguments down to lm(), just pass the argument
> directly, no need to do anything special. Here are some examples using a
> built-in dataset:
>
> ## wrapper function
> foo <- function(fooform, ...) {
>   summary(lm(formula = fooform, ...))
> }
>
> ## seeing it in action
> foo(mpg ~ hp * wt, data = mtcars)
>
> ## save a formula in an object
> myform <- mpg ~ hp * wt
>
> ## pass the object to foo() which passes it down
> foo(myform, data = mtcars)
>
> ## pass the formula object myform directly to lm()
> summary(lm(myform, data = mtcars))
>
> Does one of those answer your question or do what you want?
>
> Hope this helps,
> Josh
>
> On Sun, Jan 23, 2011 at 8:46 AM, Paul Evans p.evan...@yahoo.com wrote:
>
> > Hi,
> >
> > I had a function that looked like:
> >
> > diff <- lm(x ~ y + z)
> >
> > How can I pass the argument to the 'lm' function on the fly? E.g., if I
> > pass it in as a string (e.g. "x ~ y + z"), then the lm function treats
> > it as a string and not a proper argument.
> >
> > many thanks

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
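A short sketch of the string route on the same built-in dataset Josh uses, to show that both approaches end in the same fit. The object names are illustrative; `reformulate()` is added as one base R option for building a formula from a vector of variable names.

```r
# Formula held as a string, converted explicitly before the call to lm()
form_string <- "mpg ~ hp * wt"
fit_from_string <- lm(as.formula(form_string), data = mtcars)

# The same model written with a literal formula
fit_direct <- lm(mpg ~ hp * wt, data = mtcars)

# Both fits should give the same coefficients
all.equal(coef(fit_from_string), coef(fit_direct))

# Building a formula from variable names held in a character vector
form_built <- reformulate(c("hp", "wt"), response = "mpg")
fit_built <- lm(form_built, data = mtcars)
```

Converting explicitly with `as.formula()` keeps the intent visible at the call site, which is handy when the string is assembled elsewhere in the program.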
Re: [R] Gstat error message.
On 23.01.2011 21:47, Kamina Chororoka wrote:

> Hi,
>
> I am a student at the University of Twente (ITC). I am using R packages
> for my data analysis, but for the last few weeks I have been getting the
> following error message when trying to work on variograms or kriging:
>
> Error : .onLoad failed in loadNamespace() for 'gstat', details:
>   call: fun(...)
>   error: .Random.seed is not an integer vector but of type 'list'
> Error: package/namespace load failed for 'gstat'

You have a .Random.seed in your workspace that is obviously not compatible with gstat. Hence type

    rm(.Random.seed)

and try again.

Uwe Ligges

> I have tried several options in vain. I have tried to reinstall, and to
> load the extension from the local drive, but again in vain. I would like
> to have technical assistance from your desk.
>
> Looking forward to hearing from you soon.
>
> Kamina Chororoka.
[R] how to slice a zoo object
Hi,

Would anyone have any pointers on how to slice up a large zoo table? I have the following structure:

    > str(ZOO_OBJ)
     zoo [1:632, 1:83] 30.4 30.4 30.4 30.4 30.3 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:83] "COL1" "COL2" "COL3" "COL4" ...
     - attr(*, "index")= POSIXct[1:632], format: "2009-05-01 01:00:00" "2009-05-02 01:00:00" ...

and I would just like to take only arbitrary columns, i.e. another zoo object with only the columns COL2, COL5, etc.

I've tried various syntactical combinations such as those for data.frames and also tried manipulating the coredata(). What would be nice would be the ability to do this by column names and not their indexes.

Any help appreciated and thanks in advance,
Blair
Re: [R] no font could be found for family Arial
Please do read the posting guide: what OS, what version of R, what graphics device?

At a guess this was Mac OS X (and this was the wrong list) and you need to repair your Mac OS fonts. There are threads on R-sig-mac about that every couple of months, including this month.

On Sun, 23 Jan 2011, emmats wrote:

> I was re-running some code that I hadn't run in a couple of months to make
> barplots in R. I didn't change a single thing in the script, but the plots
> wouldn't work this time around. The plot itself (the bars and axes) will
> graph in the window, but no text appears. In the console it says I have a
> number of errors, all of which say: no font could be found for family
> 'Arial'.
>
> I have not knowingly changed anything in R and I would like to be able to
> make barplots with labels and titles again. Does anyone know how to fix
> this? Thank you!

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] [R-sig-ME] Question on overdispersion
Dear Jarrod,

Recently you suggested overcoming overdispersion in a binomial model by fitting an observation-level random effect. I was wondering if this method has been described anywhere, so that I can put a reference in my report.

Thanks a lot,

Greetings
David Jansen

> Hi Thierry + nameless,
>
> It is not necessary to expand the binomial into Bernoulli trials (nor
> advisable if n and/or the binomial size are large). You can just fit
> observation-level random effects:
>
> dataset$resid <- as.factor(1:dim(dataset)[1])
> fit3 <- glmer(cbind(male_chick_no, female_chick_no) ~ 1 + (1|FemaleID) + (1|resid),
>               data = dataset, family = binomial)
>
> gives the same answer as fit2
>
> Cheers,
> Jarrod

-- 
David Jansen
PhD student in vocal communication in banded mongoose
Re: [R] An introduction to R: 6.3.2
On Jan 23, 2011, at 19:32, MM wrote:

> "The $ notation, such as accountants$statef,"
>
> shouldn't it be
>
> "The $ notation, such as accountants$home,"
>
> instead?

Yes... (Not strictly incorrect, but confusing when we have

    accountants <- data.frame(home=statef, loot=incomes, shot=incomef)

a couple of lines earlier.)

Fixed for r-devel.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] how to slice a zoo object
I've solved my own problem using ZOO_OBJ[,"COL2", "COL5"]. The trick was preceding the list of names with a comma, as described in the standard ts document.

On Mon, Jan 24, 2011 at 9:51 AM, Blair Sutton blai...@gmail.com wrote:

> Would anyone have any pointers on how to slice up a large zoo table?
> [snip]
Re: [R] how to slice a zoo object
On Mon, Jan 24, 2011 at 4:51 AM, Blair Sutton blai...@gmail.com wrote:

> Would anyone have any pointers on how to slice up a large zoo table?
> [snip]
> and I would just like to take only arbitrary columns, i.e. another zoo
> object with only the columns COL2, COL5, etc.
> [snip]
> What would be nice would be the ability to do this by column names and
> not their indexes.

It works the same as matrices:

    > library(zoo)
    > z <- zoo(cbind(A = 1:4, B = 5:8, C = 9:12))
    > z[, c("A", "C")]
      A  C
    1 1  9
    2 2 10
    3 3 11
    4 4 12

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] how to slice a zoo object
My previous post had a typo. So as not to confuse future readers of the list, it should have read ZOO_OBJ[,cbind("COL2", "COL5")].

Thanks Gabor.

On Mon, Jan 24, 2011 at 10:54 AM, Blair Sutton blai...@gmail.com wrote:

> I've solved my own problem using ZOO_OBJ[,"COL2", "COL5"].
> [snip]
Re: [R] Passing in arguments into function
It need not be a formula; an object that can be coerced to one (such as a character string) also works:

    x <- rnorm(100)
    y <- rnorm(100)
    z <- rnorm(100)
    lm("x ~ y + z")

On Sun, Jan 23, 2011 at 2:46 PM, Paul Evans p.evan...@yahoo.com wrote:

> Hi,
>
> I had a function that looked like:
>
> diff <- lm(x ~ y + z)
>
> How can I pass the argument to the 'lm' function on the fly? E.g., if I
> pass it in as a string (e.g. "x ~ y + z"), then the lm function treats it
> as a string and not a proper argument.
>
> many thanks

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] using loglog link in VGAM or creating loglog link for GLM
On Sun, 2011-01-23 at 10:56 -0800, torbjore wrote:

> I think you guys make it more difficult than it has to be. Estimating the
> probability of success with a loglog link is equivalent to estimating the
> probability of failure with a cloglog link, so all you have to do is
> change the response variable accordingly (and then you can interpret
> parameter estimates (or any contrasts in general) as hazard ratios)...

Err, isn't that **exactly** what I said?! (Maybe the "you guys" wasn't aimed at me, but there were only two responses in that thread, one of them from me... and you haven't quoted what you were referring to.)

> Though, I wish there was a loglog link in glm and lmer so you wouldn't
> have to do this.

You are, in effect, asking other people to maintain more code for you when you can easily alter the coding for the response.

G

-- 
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
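The recoding both posters describe can be sketched with simulated data. The sketch takes the log-log link to be g(p) = -log(-log(p)); with the opposite sign convention the coefficients below would not need negating. The data, the true coefficients and all object names are invented for illustration.

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Simulate successes from a log-log model: P(success) = exp(-exp(-eta))
eta <- 0.3 + 0.8 * x
success <- rbinom(n, 1, exp(-exp(-eta)))

# Recode the response and fit the complementary log-log model to failures
failure <- 1 - success
fit_fail <- glm(failure ~ x, family = binomial(link = "cloglog"))

# Coefficients on the log-log scale for success are the negated estimates
loglog_coef <- -coef(fit_fail)

# Fitted success probabilities, and a check that they sit on the log-log
# scale with the negated linear predictor
p_success <- 1 - fitted(fit_fail)
loglog <- function(p) -log(-log(p))
all.equal(unname(loglog(p_success)),
          unname(-predict(fit_fail, type = "link")))
```

The final comparison is an algebraic identity rather than a statistical result: if q = 1 - p, then log(-log(1 - q)) = log(-log(p)), which is the log-log link of p with the sign flipped.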
Re: [R] Extracting information from text data
On 2011-01-23 19:28, Deb Midya wrote:

> Hi R-Users,
>
> Thanks in advance. I am using R-2.12.0 on Windows XP.
>
> I am trying to produce an n x m matrix from text data stored in different
> files, where n is the number of words (say w1, w2, …, wn) and m is the
> number of documents (say d1, d2, …, dm).
>
> A. Using package tm
>
> I am using package tm to do the job. I have provided the code below:
>
> my.corpus <- Corpus(DirSource(my.path), readerControl = list(reader=readPlain))
> In readLines(y, encoding = x$Encoding) :
>   incomplete final line found on 'M:\textmine/slr.txt'

So it looks like your slr.txt file has a problem. Inspect it with your editor.

> x <- TermDocMatrix(my.corpus)
> Error: could not find function "TermDocMatrix"

Where did you get the idea that package tm has this function? I see a function TermDocumentMatrix(). As you can see, R provides a very helpful reminder that you should check the name of the function.

Peter Ehlers

> B. Using package(s) other than tm
>
> Once again, thank you very much for the time you have given.
>
> Regards,
> Deb
>
> The code:
>
> library(tm)
> my.path <- 'M:\\textmine'
> my.corpus <- Corpus(DirSource(my.path), readerControl = list(reader=readPlain))
> x <- TermDocMatrix(my.corpus)
> x
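For part B of the question (doing it without tm), a words-by-documents count matrix can be sketched in base R alone. The three tiny documents below are made up and stand in for text read from files; the tokenization rule (lower-case, split on anything that is not a letter) is a deliberately crude assumption.

```r
# Hypothetical in-memory documents; in practice these would come from
# readLines() on each file in the directory
docs <- c(d1 = "The cat sat on the mat",
          d2 = "The dog sat",
          d3 = "Cat and dog")

# Tokenize: lower-case, split on non-letters, drop empty tokens
tokens <- strsplit(tolower(docs), "[^a-z]+")
tokens <- lapply(tokens, function(tok) tok[nzchar(tok)])

# Vocabulary: all distinct words, sorted
terms <- sort(unique(unlist(tokens)))

# Count each term in each document; rows are words, columns are documents
tdm <- vapply(tokens,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms
tdm
```

This skips everything tm does beyond counting (stop words, stemming, sparse storage), so it is only practical for small collections.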
Re: [R] Implementing step-wise linear regression
Tal Galili tal.galili at gmail.com writes:

> Hello Troy. A tiny question (without answering your question), why did
> you choose to do it this way instead of using ?step or ?stepAIC ?

[snip snip]

> > My questions are:
> > Should I be using var.test to run the F-test to decide which variable
> > to add next?
> > What is the difference between the F-test run by var.test and
> > summary.lm?

var.test isn't what you want at all; it is for comparing variances among *populations*.

> > Has step-wise model building using the F-test been programmed already?

Not completely (?step uses AIC). However, add1(..., test="F") is probably what you're looking for.

Please note that stepwise regression is *strongly* deprecated by many statisticians, e.g.

http://www.stata.com/support/faqs/stat/stepwise.html

Harrell, Frank. 2001. Regression Modeling Strategies. Springer.

Mundry, Roger, and Charles L. Nunn. 2009. Stepwise Model Fitting and Statistical Inference: Turning Noise into Signal Pollution. The American Naturalist 173, no. 1 (January 1): 119-123. doi:10.1086/593303. http://www.journals.uchicago.edu/doi/abs/10.1086/593303.

Whittingham, Mark J., Philip A. Stephens, Richard B. Bradbury, and Robert P. Freckleton. 2006. Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology 75, no. 5: 1182-1189.
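A minimal sketch of the two points above, using the built-in mtcars data as a stand-in for Troy's y and x (the choice of mpg, wt, hp and qsec is purely illustrative). It compares the nested models with `anova()`, which is the F-test that `summary.lm` reports for a one-predictor model, and then runs one forward step with `add1(..., test = "F")`. The caveats about stepwise selection in the message above apply to this code as well.

```r
# Null model and a one-predictor model
null_fit <- lm(mpg ~ 1, data = mtcars)
one_fit  <- lm(mpg ~ wt, data = mtcars)

# F-test for the nested pair (this, not var.test, is the relevant test)
cmp <- anova(null_fit, one_fit)
f_anova <- cmp$F[2]

# The overall F statistic printed by summary() is the same quantity here
f_summary <- unname(summary(one_fit)$fstatistic["value"])

# One forward step: F-test for adding each candidate to the null model
step1 <- add1(null_fit, scope = ~ wt + hp + qsec, test = "F")
step1
```

Repeating the `add1()` call on the updated model, with the remaining candidates as scope, gives a forward selection by F-test one step at a time.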
[R] How to carry out a hierarchical cluster analysis
I would appreciate any information about how to carry out a hierarchical cluster analysis that clusters subjects. I need to find clusters of subjects that share many variables. I know that the fcp package has a lot of options for carrying out the regular hierarchical cluster analysis (clusters of variables).

Thanks in advance

Juan Hernández
Facultad de Psicología
La Laguna University
Canary Islands
Spain
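One common base R route, sketched here on the built-in USArrests data (where each row, a state, plays the role of a subject): `dist()` computes distances between rows, so `hclust()` on its result clusters subjects. The choice of Euclidean distance on standardized variables, average linkage and four groups is arbitrary and only for illustration.

```r
# Rows are the units to be clustered; standardize the variables first so
# that no single variable dominates the distance
subjects <- scale(USArrests)

# Pairwise distances between rows (subjects), then agglomerative clustering
d <- dist(subjects)
hc <- hclust(d, method = "average")

# Cut the tree into a chosen number of groups and tabulate group sizes
groups <- cutree(hc, k = 4)
table(groups)

# For comparison: clustering the variables instead, using a
# correlation-based dissimilarity between columns
hc_vars <- hclust(as.dist(1 - abs(cor(USArrests))), method = "average")
```

`plot(hc)` draws the dendrogram for the subjects, which helps when deciding where to cut the tree.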
Re: [R] Function comparable to cutpt.coxph from Survival Analysis using S
Quoting Frank H:

> These relationships rarely occur in nature...

I agree; I have seen cutpoint relationships only a handful of times in 25 years of medical work.

A better approach is to look at the data using a smoothing spline:

    options(na.action=na.exclude)
    fit <- coxph(Surv(time,status) ~ age + weight + pspline(x))
    temp <- predict(fit, type='terms')
    plot(x, temp[,3])

My primary goal has always been to learn rather than to test. It is common to see upper or lower thresholds, but not cutpoints.

Terry T.
Re: [R] Problem reading PostgreSQL data with RODBC
I think this is a problem with quotes. If you look closely, you see:

    seiz.df <- sqlFetch(chnl, 'source.MAIN')
    ...
    'source.main': table not found on channel

You asked for MAIN, but your db can't find main. If you use

    seiz.df <- sqlFetch(chnl, '\"source\".\"MAIN\"')

your problem should be gone.

Bart
[R] writeRaster with raster package
Hello, I have a problem writing a raster with the raster package. I have the raster object mask which has the following geoinformation: R output mask class : RasterLayer filename: E:/Daten/FE/HyMAP/Luxembourg_2010/Kehlen_Useldange/mosaik/LUX_LC_noOverlap_mask nrow: 5198 ncol: 2813 ncell : 14621974 min value : 0 max value : 255 projection : +proj=utm +zone=31 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0 extent : 710938, 722190, 5501612, 5522404 (xmin, xmax, ymin, ymax) resolution : 4, 4 (x, y) --- If I now use writeRaster to write the file in Envi format, the geoinformation is lost for the object written to the file BUT not for the object in the work space. The content in the file is correct but I do net get the right header information about the geocoding. --- writeRaster(mask,mask, format=ENVI) class : RasterLayer filename: E:/Daten/FE/HyMAP/Luxembourg_2010/Kehlen_Useldange/mosaik/refdata/mask.envi nrow: 5198 ncol: 2813 ncell : 14621974 min value : 0 max value : 1 projection : +proj=utm +zone=31 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0 extent : 0, 2813, -5198, 0 (xmin, xmax, ymin, ymax) resolution : 1, 1 (x, y) mask class : RasterLayer filename: E:/Daten/FE/HyMAP/Luxembourg_2010/Kehlen_Useldange/mosaik/LUX_LC_noOverlap_mask nrow: 5198 ncol: 2813 ncell : 14621974 min value : 0 max value : 255 projection : +proj=utm +zone=31 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0 extent : 710938, 722190, 5501612, 5522404 (xmin, xmax, ymin, ymax) resolution : 4, 4 (x, y) --- Anybody knows what happens here and what can be done to fix that. Any help is highly appreciated! Thanx in advance! Ben -- View this message in context: http://r.789695.n4.nabble.com/writeRaster-with-raster-package-tp3233871p3233871.html Sent from the R help mailing list archive at Nabble.com. 
[R] Linear mixed model: question about t-values
Dear all, I have a question about the output of a linear mixed model fitted in R using the nlme package. In particular, what are the t-values that are given in the output, how are they calculated, and on what test are they based? I guess it cannot be a simple Student t-test; otherwise, how could a simple Student t-test test for significance of interactions? I cannot find this information in any of the R help resources on linear mixed models, and I also checked a few books.

Example: part of the R output

    model.b <- lme(diff ~ age + height, random = ~1 | field/replicate)

    Linear mixed-effects model fit by REML
    Fixed effects: diff ~ age + height_cm
                    Value Std.Error  DF   t-value p-value
    (Intercept) 172.83559 27.094642 107  6.378958  0.
    ageyoung     -5.28206 11.239981   4 -0.469935  0.6629
    height_cm    -0.67662  0.450183 107 -1.502997  0.1358

Thank you very much for your help! Best wishes, Olja
[R] Problem with factor analysis
Hi all, I am using the example on page 737 of The R Book by Michael J Crawley to plot factor loadings against each other (in a multivariate analysis). However, the following line of code

    plot(loadings(model)[,1], loadings(model)[,2], pch=16, xlab="Factor 1", ylab="Factor 2")

throws an error message:

    Error in plot.window(...) : need finite 'xlim' values
    In addition: Warning messages:
    1: In min(x) : no non-missing arguments to min; returning Inf
    2: In max(x) : no non-missing arguments to max; returning -Inf
    3: In min(x) : no non-missing arguments to min; returning Inf
    4: In max(x) : no non-missing arguments to max; returning -Inf

I have tried putting in values for the x and y limits, but even then no points appear in my plots. Does anyone know what I am doing wrong? Grateful for any help.

Simon Hayward
[R] arima/arima0 function
Does the arima/arima0 function use the state-space form of the model equation even when fitting with the CSS method?

Regards, Christoph
[R] Detrended fluctuation analysis
Hi all, I was using DFA() in the fractal package to examine a set of time series data, and I was not sure what the H estimate in the summary table means. Is it the alpha of the power-law equation?
[R] augPred and NAs
I have a problem using augPred from the nlme package on a data set with NAs. Consider this example, modified from the augPred help page:

    library(nlme)
    Orthodont[100,]$age = NA   # Insert NA somewhere in the data set
    fm1 <- lme(Orthodont, random = ~1, na.action = na.exclude)   # Can still fit with lme
    augPred(fm1)   # Error...!? Also with options na.rm=T or na.action=na.exclude
    Error in if (from == to) rep.int(from, length.out) else as.vector(c(from, : missing value where TRUE/FALSE needed

Any help on how to handle NAs in such cases is most appreciated. Neither Pinheiro & Bates nor Google helped...

Thanks, Morten Pedersen
[R] LTA
Hi everyone, does anyone know if there is a package to run Latent Transition Analysis (LTA) in R?

Regards! -- Sebastián Daza sebastian.d...@gmail.com
Re: [R] Offset - usersplits function package RPART
What is the meaning of offset?

As in glm(), an offset is a variable on the right-hand side of the equation that is a fixed part of the predictor. When doing linear regression, or rpart with continuous Y, there is no need for an offset; adding +offset(x3) on the right-hand side of the equation gives the same result as (y - x3) on the left. But when y is a classification variable this is not so.

I am blanking on the reference, but there has been a paper in the last 2 years that did this nicely. The response was a yes/no variable, and there were both continuous predictors like age that were best handled with a logistic model, and genotype ones for which rpart was desired. The author fit the logistic model and used the resulting linear predictor as an offset in rpart, then used the genotype classes found by rpart as an offset in the logistic model, and so on back and forth, with convergence in 2-3 iterations.

Terry Therneau
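The linear-regression equivalence described in this message can be checked numerically. The sketch below uses simulated data (the names y, x1 and x3 are made up for illustration) and compares the two fits. It covers only the continuous-response case; it does not illustrate the classification case, where the message says the equivalence fails.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + x3 + rnorm(n)

# x3 enters with its coefficient fixed at 1 ...
fit_offset <- lm(y ~ x1 + offset(x3))
# ... which is the same as moving it to the left-hand side
fit_shift  <- lm(I(y - x3) ~ x1)

# The estimated intercept and slope agree
all.equal(coef(fit_offset), coef(fit_shift))
```

With a classification response there is no "y - x3" to form, which is why an explicit offset is needed there.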
[R] lm() and post hoc test
Dear Alex, I have read your question about post hoc tests:

Let's assume that according to Anova(lm(y~a*b, data=d)) the a:b interaction is significant, and I would like to know if there are specific combinations of a and b levels that differ from the control group. Are there any caveats against simply looking at the p-values in the output generated by summary(lm(y~a*b,data=d))? The documentation for summary.lm says nothing about multiple comparison correction, so I assume I need to do that myself, is that correct?

I have the same problem with interpreting the output of summary(), and more generally with choosing the best post hoc test for a model fitted with lm(). I think that the R help is quite insufficient... Which command do you normally use?

Thanks a lot, Francesco Nutini
[R] How to simulate a variable Xt=Wit+0.5Wit-1 with Wit~U(0,2)
Dear all, I simulate a panel data set:

    n <- 10
    t <- 5
    nt <- n*t
    pData <- data.frame(id = rep(paste("JohnDoe", 1:n, sep = "."), each = t), time = rep(1981:1985, n))
    rho <- 0.99   # simulate alphai correlated with the xi
    print(rho)
    alphai <- rnorm(n, mean=0, sd=1)   # alphai simulation
    x <- as.matrix(rnorm(nt, 1))   # xi simulation
    akro <- kronecker(alphai, matrix(1,t,1))   # kronecker of alphai
    cormat <- matrix(c(1,rho,rho,1), nrow=2, ncol=2)   # correlation matrix
    cormat.chold <- chol(cormat)   # Cholesky transformation of correlation matrix
    akrox <- cbind(akro, x)
    ax <- akrox %*% cormat.chold
    ai <- as.matrix(ax[,1])
    pData$alphai <- as.vector(ai)
    xcorr <- as.matrix(ax[,2:(1+ncol(x))])
    pData$xcorrei <- as.vector(xcorr)
    library(plm)
    pData <- plm.data(pData, index = c("id", "time"))
    pData

But now I need a variable Xt=Wit+0.5Wit-1 with Wit~U(0,2). The code I tried to use is:

    for (i in 1:n) {
      p <- i*t
      m <- (i-1)*t+1
      for (j in m:p) {
        xt <- arima.sim(n=nt, list(ar=c(0.5)))
      }
    }

Is this the correct way to simulate the AR(1), without the assumption Wit~U(0,2)? How do I simulate the variable with the assumption Wit~U(0,2), and put it in my data frame correctly?

Comments and tips are welcome. Regards, Carlos Brás, Statistics Portugal-INE
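Read literally, Xt=Wit+0.5Wit-1 is a first-order moving average of the uniform draws rather than an AR(1), so one possible approach is to draw the W values directly instead of calling arima.sim. The sketch below is only one reading of the request: the names n_id, n_time, sim_x and panel are made up for illustration, and it assumes each individual gets its own independent series with one extra start-up draw for the first lag.

```r
set.seed(42)
n_id   <- 10   # individuals
n_time <- 5    # periods per individual

# One series for one individual: X_s = W_s + 0.5 * W_{s-1}, W ~ U(0, 2)
sim_x <- function(n_time) {
  w <- runif(n_time + 1, min = 0, max = 2)  # W_0, W_1, ..., W_T
  w[-1] + 0.5 * w[-(n_time + 1)]            # current draw plus half the previous one
}

# replicate() returns an n_time x n_id matrix; as.vector() stacks it
# individual by individual, matching an id column built with each = n_time
x_it <- as.vector(replicate(n_id, sim_x(n_time)))

panel <- data.frame(id   = rep(seq_len(n_id), each = n_time),
                    time = rep(1981:1985, n_id),
                    x    = x_it)
head(panel)
```

Because each W lies in (0, 2), every simulated value lies in (0, 3).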
[R] Help with expression
I have a problem with expressions. I am trying to create a title where the parameter of interest is displayed as a Greek character. Which parameter is being considered is stored in a character variable. As an example, if I have

    param <- "alpha"

and then do

    plot(0, 0, main = bquote(Parameter==.(param)))

then in the title I get Parameter = alpha, whereas I want the Greek character alpha.

David Scott

-- David Scott, Department of Statistics, The University of Auckland, PB 92019, Auckland 1142, NEW ZEALAND. Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055. Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018. Director of Consulting, Department of Statistics
Re: [R] How to measure/rank ?variable importance when using rpart?
--- included message
Thus, my question is: *What common measures exists for ranking/measuring variable importance of participating variables in a CART model? And how can this be computed using R (for example, when using the rpart package)*
---end

Consider the following printout from rpart:

    summary(rpart(time ~ age + ph.ecog + pat.karno, data=lung))

    Node number 1: 228 observations, complexity param=0.03665178
      mean=305.2325, MSE=44176.93
      left son=2 (81 obs) right son=3 (147 obs)
      Primary splits:
          pat.karno < 75   to the left,  improve=0.03661157, (3 missing)
          ph.ecog   < 1.5  to the right, improve=0.03620793, (1 missing)
          age       < 75.5 to the right, improve=0.01606491, (0 missing)
      Surrogate splits:
          ph.ecog < 1.5  to the right, agree=0.787, adj=0.392, (3 split)
          age     < 72.5 to the right, agree=0.680, adj=0.089, (0 split)

In Breiman, Friedman, Olshen, Stone, the canonical CART book, for this split

    the pat.karno variable would get .0366 points,
    ph.ecog would get .0366 * .392 points,
    age would get .0366 * .089 points.

The reason for adding in surrogates is to account for redundant variables. Suppose for instance that x1=height but so is x10, just measured on a different day. They won't be exactly the same, so one will get picked over the other at any given split; but at the end they should get the same importance score.

This calculation is added up over all the splits to get a variable importance. So -- all the necessary ingredients are present. Someone just needs to write the importance function :-)

Terry T.
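The per-node bookkeeping described in this message can be written out as plain arithmetic. The sketch below only reproduces the numbers quoted from the printout for node 1. It is not the importance function itself, and the object names are made up for illustration.

```r
# Values copied from the printout for node 1
improve <- 0.03661157                       # improvement of the primary split (pat.karno)
adj     <- c(ph.ecog = 0.392, age = 0.089)  # adjusted agreement of each surrogate

# The primary variable gets the full improvement;
# each surrogate gets the improvement scaled by its adjusted agreement
node_importance <- c(pat.karno = improve, improve * adj)
round(node_importance, 5)
```

A full importance score would repeat this for every node of the tree and sum the contributions per variable.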
Re: [R] How to measure/rank ?variable importance when using rpart?
Hi Terry, I've actually already written such a function (based on an old similar question I once asked on this list), which I attached below to this e-mail. But I have a few problems with my function:

1) I wasn't sure how to include the surrogate variable importance level in the function (how to access them from the rpart object; how many of them to ask for in the original call to rpart, since its default is 5, and whether that is enough; and how they should be presented in the final count: should all of these numbers be mixed together?).

2) I'm not sure which split type (error function) makes this a valid method of measuring variable importance. For example, should we always use information gain for this function (e.g. rpart(..., parms = list(split = "information"))), or will this also work with the Gini index?

Here is the function I've written so far:

    info.gain.rpart <- function(fit1, to_plot = T,
                                ylab = "sum of all the improvement (in fit$split[, 'improve'])",
                                main = "Information per variable", ..., sort = T, col)
    {
      info_gain <- tapply(fit1$splits[, "improve"], rownames(fit1$splits), sum)
      # let's order info_gain according to the original order of the letters in the data.frame
      # needed function:
      order.x.by.y <- function(x, y) order(match(x, y))
      # this function gets x/y and returns the order of x so it will be like y
      x_names <- names(attr(fit1, "xlevels"))   # the original names of the elements
      info_gain_order <- order.x.by.y(names(info_gain), x_names)   # the needed new order
      info_gain <- info_gain[info_gain_order]
      length_info_gain <- length(info_gain)
      # info.gain <- info.gain[c(8,1:7)]
      if(missing(col)) col <- rep("grey", length_info_gain)
      if(length(col) < length_info_gain) col <- rep(col, length_info_gain)
      if(sort)
      {
        ss <- order(info_gain, decreasing = T)
        info_gain <- info_gain[ss]
        col <- col[ss]   # this way we can notice which belongs to which stem...
      }
      if(to_plot) barplot(info_gain, ylab = ylab, main = main, col = col, ...)
      return(info_gain)
    }

Thanks, Tal

Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) --

On Mon, Jan 24, 2011 at 4:53 PM, Terry Therneau thern...@mayo.edu wrote: [snip]
Re: [R] How to carry out a hierarchical cluster analysis
This is a nice tutorial on doing this: http://www.r-tutor.com/gpu-computing/clustering/hierarchical-cluster-analysis

Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) --

On Mon, Jan 24, 2011 at 3:50 PM, Juan Andres Hernandez jhernandezcabr...@gmail.com wrote: hierarchical cluster analysis
Re: [R] How to measure/rank variable importance when using rpart?
Check out caret::varImp.rpart(). It's described in the original CART book. Andy From: Tal Galili Hello all, When building a CART model (specifically classification tree) using rpart, it is sometimes interesting to know what is the importance of the various variables introduced to the model. Thus, my question is: *What common measures exists for ranking/measuring variable importance of participating variables in a CART model? And how can this be computed using R (for example, when using the rpart package)* For example, here is some dummy code, created so you might show your solutions on it. This example is structured so that it is clear that variable x1 and x2 are important while (in some sense) x1 is more important then x2 (since x1 should apply to more cases, thus make more influence on the structure of the data, then x2). set.seed(31431) n - 400 x1 - rnorm(n) x2 - rnorm(n) x3 - rnorm(n) x4 - rnorm(n) x5 - rnorm(n) X - data.frame(x1,x2,x3,x4,x5) y - sample(letters[1:4], n, T) y - ifelse(X[,2] -1 , b, y) y - ifelse(X[,1] 0 , a, y) require(rpart) fit - rpart(y~., X) plot(fit); text(fit) info.gain.rpart(fit) # your function - telling us on each variable how important it is (references are always welcomed) Thanks! Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachme...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with expression
On Tue, 25 Jan 2011, David Scott wrote: I have a problem with expressions. I am trying to create a title where the parameter of interest is displayed as a Greek character. Which parameter is being considered is stored in a character variable. As an example, if I have

    param <- "alpha"

Use

    param <- as.name("alpha")

HTH, Chuck

and then do

    plot(0, 0, main = bquote(Parameter==.(param)))

then in the title I get Parameter = alpha, whereas I want the Greek character alpha. David Scott

Charles C. Berry, Dept of Family/Preventive Medicine, cbe...@tajo.ucsd.edu, UC San Diego, http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
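To make the difference concrete, the sketch below builds the title both ways from a character variable. The names title_text and title_greek are made up for illustration; the rest follows the as.name() suggestion in this thread.

```r
param <- "alpha"

# With a character string, bquote() splices in the literal text "alpha"
title_text <- bquote(Parameter == .(param))

# With a symbol, plotmath sees the name alpha and draws the Greek letter
title_greek <- bquote(Parameter == .(as.name(param)))

plot(0, 0, main = title_greek)
```

The second expression is the same object one would get by typing quote(Parameter == alpha) by hand, which is why the conversion works for any parameter name stored as a string.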
Re: [R] Ordering box plots
On Sun, 2011-01-23 at 17:37 -0600, Stuart Luppescu wrote: [snip]

Thanks to Ben and Dennis for their help, but right after I sent the original message, I figured out how to solve my problem. I noticed that boxplot() has an at= argument. To get the box locations, I used a line like this:

    box.locs <- order(order(aggregate(meas.tab[,meas.name], by=list(meas.tab$unit), mean, na.rm=T)$x))

-- Stuart Luppescu -=- slu .at. ccsr.uchicago.edu University of Chicago -=- CCSR 才文と智奈美の父 -=- Kernel 2.6.36-gentoo-r5

If you give people a linear model function you give them something dangerous. -- John Fox, useR! 2004, Vienna (May 2004)
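The order(order(...)) idiom above turns group means into ranks, which then serve as x positions for the boxes. Here is a small self-contained sketch of the same idea. It uses the built-in InsectSprays data as a stand-in, since meas.tab is not available here.

```r
# Mean insect count per spray type
means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# order(order(x)) gives the rank of each element when there are no ties,
# so the spray with the smallest mean is placed at position 1
box.locs <- order(order(means))

# Draw each box at the position given by the rank of its mean
boxplot(count ~ spray, data = InsectSprays, at = box.locs)
```

With ties among the means, rank() and order(order()) can differ, so the double order is the safer choice when distinct integer positions are needed.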
Re: [R] Help with expression
Thanks. Exactly what I wanted. As usual, I played around with all sorts of things to try and get the expression right, but never thought of as.name. David Scott On 25/01/2011 4:32 a.m., Charles C. Berry wrote: On Tue, 25 Jan 2011, David Scott wrote: I have a problem with expressions. I am trying to create a title where the parameter of interest is displayed as a Greek character. Which parameter is being considered is stored in a character variable. As an example, if I have param- alpha param- as.name(alpha) HTH, Chuck and then do plot(0, 0, main = bquote(Parameter==.(param))) then in the title I get Parameter = alpha, whereas I want the Greek character alpha. David Scott -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. BerryDept of Family/Preventive Medicine cbe...@tajo.ucsd.eduUC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear mixed model: question about t-values
Kostenko, Olga <O.Kostenko at nioo.knaw.nl> writes: I have a question about the output of linear mixed model fitted in R using nlme package. In particular, what are the t-values that are given in an output, how are they calculated and based on what test? I guess it cannot be a simple Student t-test, otherwise how can the simple Student t-test test for significance of interactions, right? I cannot find this information in any of R help resources on linear-mixed models and I also checked few books.

The t-statistics are the ratio of the Value (parameter estimate) and Std.Error columns; the p-value is a 2-sided test against the null hypothesis that this t-statistic is drawn from a standard t distribution with DF degrees of freedom, i.e.

    2*pt(abs(tstat), df=DF, lower.tail=FALSE)

You will find this referred to in the literature as a Wald test. For more information the best reference is Pinheiro and Bates 2000 (Springer).
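As a check, the formula above can be applied to the two non-intercept rows of the table quoted in the original question. The numbers below are copied from that output; the object names are made up for illustration.

```r
# Columns copied from the lme() output in the original question
value  <- c(ageyoung = -5.28206,  height_cm = -0.67662)
std_er <- c(ageyoung = 11.239981, height_cm = 0.450183)
df     <- c(ageyoung = 4,         height_cm = 107)

tstat <- value / std_er                                  # Wald t-statistics
pval  <- 2 * pt(abs(tstat), df = df, lower.tail = FALSE) # two-sided p-values

round(cbind(tstat, pval), 4)
```

Up to rounding, this should reproduce the t-value and p-value columns of that table.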
[R] error with source(): invalid 'times' value
hi I am seeing a strange behavior I can't understand... doing: source(/tmp/RFile.r,echo=TRUE) Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - : invalid 'times' value traceback() 3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)) 2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = , collapse = \n) 1: source(/tmp/RFile.r, echo = TRUE) But the file I am trying to source is very simple... see: $ more /tmp/RFile.r ### ### chunk number 1: ### #line 516 VolStocksDec2010.Rnw path-~/Dropbox/FAO/Papers/Volatility only pathMarkov-~/Dropbox/FAO/Markov Model/ library(zoo) Any idea where it can come from? It works fine when echo=FALSE... I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel? Thanks a lot Matthieu sessionInfo() R version 2.12.1 (2010-12-16) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=fr_CH.utf8 LC_NUMERIC=C [3] LC_TIME=fr_CH.utf8LC_COLLATE=fr_CH.utf8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=fr_CH.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_CH.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base loaded via a namespace (and not attached): [1] grid_2.12.1 lattice_0.19-17 Matrix_0.999375-45 [4] nnet_7.3-1 tsDyn_0.7-40tseries_0.10-23 [7] tseriesChaos_0.1-11 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] pairs(), no axis labels/values for upper panel?
Dear all, I want to draw a graph that contains the scatterplot matrix in the lower panel and coefficients in the upper panel. I used and adapted the example for the function pairs, but cannot figure out how to suppress the values and ticks on the upper panel (the values should only be on the lower panel). The upper panel looks odd to me this way. Any hints?

Thanks in advance, Steffen

Here is an example of what the graph looks like:

    data(mtcars)

    # upper panel: Spearman correlation, approximate 95% CI and N as text
    panel.cor <- function(a, b, digits=2, ...)
    {
      usr <- par("usr"); on.exit(par(usr = usr))   # restore the coordinates on exit
      par(usr = c(0, 1, 0, 1))
      x <- cbind(a, b)
      x <- na.omit(x)
      n <- nrow(x)
      pp <- c(0.025, 0.975)
      corx <- cor(x, method="s")[1, 2]
      CI <- c(tanh(atanh(corx) + qnorm(pp)/sqrt((n - 3)/1.06)))
      txt1 <- paste("rho =", format(c(corx, 0.123456789), digits=digits)[1])
      txt2 <- paste("(", format(c(CI, 0.123456789)[1], digits=digits), "; ",
                    format(c(CI, 0.123456789)[2], digits=digits), ")", sep="")
      txt3 <- paste("N =", round(n, 0))
      txt <- paste(txt1, "\n", "95%KI ", txt2, "\n", txt3, sep="")
      text(0.5, 0.5, txt, cex=.8)
    }

    # diagonal panel: grey box behind the variable name
    diag.cor <- function(a, b, ...)
    {
      usr <- par("usr"); on.exit(par(usr = usr))
      par(usr = c(0, 1, 0, 1))
      rect(0, 0, 1, 1, col="grey")
    }

    pairs(mtcars[1:4], upper.panel=panel.cor, diag.panel=diag.cor, label.pos=0.5)
Re: [R] error with source(): invalid 'times' value
On Mon, Jan 24, 2011 at 12:07 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote: hi I am seeing a strange behavior I can't understand... doing: source(/tmp/RFile.r,echo=TRUE) Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - : invalid 'times' value traceback() 3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)) 2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = , collapse = \n) 1: source(/tmp/RFile.r, echo = TRUE) But the file I am trying to source is very simple... see: $ more /tmp/RFile.r ### ### chunk number 1: ### #line 516 VolStocksDec2010.Rnw path-~/Dropbox/FAO/Papers/Volatility only pathMarkov-~/Dropbox/FAO/Markov Model/ library(zoo) Any idea where it can come from? It works fine when echo=FALSE... I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel? Does this work? source(/tmp/RFile.r, echo = TRUE, prompt.echo = NULL, continue.echo = + ) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] error with source(): invalid 'times' value
Le 24. 01. 11 18:22, Gabor Grothendieck a écrit : On Mon, Jan 24, 2011 at 12:07 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote: hi I am seeing a strange behavior I can't understand... doing: source(/tmp/RFile.r,echo=TRUE) Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - : invalid 'times' value traceback() 3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)) 2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = , collapse = \n) 1: source(/tmp/RFile.r, echo = TRUE) But the file I am trying to source is very simple... see: $ more /tmp/RFile.r ### ### chunk number 1: ### #line 516 VolStocksDec2010.Rnw path-~/Dropbox/FAO/Papers/Volatility only pathMarkov-~/Dropbox/FAO/Markov Model/ library(zoo) Any idea where it can come from? It works fine when echo=FALSE... I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel? Does this work? source(/tmp/RFile.r, echo = TRUE, prompt.echo = NULL, continue.echo = + ) Thanks for your quick answer! Unfortunately, it does not change: source(/tmp/RFile.r, echo = TRUE, prompt.echo = NULL, continue.echo = + ) Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - : invalid 'times' value traceback() 3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)) 2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = , collapse = \n) 1: source(/tmp/RFile.r, echo = TRUE, prompt.echo = NULL, continue.echo = + ) note this is not a systematic problem, but can't say exactly when/why it works or not... thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Implementing step-wise linear regression
FWIW, I think it fair to say that modern statistical practice generally views stepwise regression as a bad idea, especially in the hands of non-experts like yourself. The procedures you describe are dangerous: they have an uncomfortably high chance of choosing the wrong variables and leading to wildly overoptimistic assessments of the predictive value of the variables that are chosen. This leads to scientifically irreproducible results, otherwise known as nonsense (in polite company; I use another impolite term when I am not being nice).

Shrinkage in its various manifestations is a much better way to achieve parsimony. See, e.g., the elasticnet, glmnet, pspline, mgcv, penalized, ... R packages and the MachineLearning task view on CRAN for various approaches and implementations. Better yet, consult a local, knowledgeable statistician to help you with this.

Cheers, Bert

On Mon, Jan 24, 2011 at 12:03 AM, Tal Galili tal.gal...@gmail.com wrote: Hello Troy. A tiny question (without answering your question), why did you choose to do it this way instead of using ?step or ?stepAIC ? Best, Tal

Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) --

On Mon, Jan 24, 2011 at 3:47 AM, Troy S troysocks-tw...@yahoo.com wrote: Dear R fans, I am trying to do step-wise linear regression using the F-test to decide which variables to admit. Ewout Steyerberg suggests using the F-test for this purpose. I first build a model using no variables using lm(y ~ 1) and then using one variable that is a strong predictor using lm(y ~ x). When I call var.test on these two models, I do not get a significant p-value (0.07). But a summary of the second model gives a F-test p-value that is very small. My questions are: Should I be using var.test to run the F-test to decide which variable to add next? What is the difference between the F-test run by var.test and summary.lm? Has step-wise model building using the F-test been programmed already? Thanks! Troy

-- Bert Gunter, Genentech Nonclinical Biostatistics
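Leaving aside whether stepwise selection is advisable, the quoted question about var.test versus summary.lm can be illustrated on simulated data. The sketch below is an illustration under stated assumptions, not part of the original replies: the data and object names are made up. It shows that the F-test for adding a term is the nested-model comparison given by anova() or add1(). var.test() applied to two lm fits compares their residual variances, which is a different test.

```r
set.seed(123)
n <- 50
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n)

fit0 <- lm(y ~ 1)   # no predictors
fit1 <- lm(y ~ x)   # one predictor

# Nested-model F-test: does adding x reduce the residual sum of squares?
nested <- anova(fit0, fit1)
nested

# The overall F-statistic reported by summary.lm is the same test here
summary(fit1)$fstatistic

# add1() runs this F-test for each candidate term in the scope
add1(fit0, scope = ~ x, test = "F")
```

With a single added predictor, the F value in the anova() table equals the one printed by summary(fit1).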
Re: [R] error with source(): invalid 'times' value
It sounds like you have some invalid expressions. Dump out the values of 'leading' and 'length(dep) - leading'. Learn some simple debugging techniques. One is to set options(error=utils::recover) so that on the error you can use the browser to examine what the values are.

On Mon, Jan 24, 2011 at 12:07 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote:

hi

I am seeing a strange behavior I can't understand... doing:

    > source("/tmp/RFile.r", echo=TRUE)
    Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - :
      invalid 'times' value
    > traceback()
    3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading))
    2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = "", collapse = "\n")
    1: source("/tmp/RFile.r", echo = TRUE)

But the file I am trying to source is very simple... see:

    $ more /tmp/RFile.r
    ###
    ### chunk number 1:
    ###
    #line 516 "VolStocksDec2010.Rnw"
    path<-"~/Dropbox/FAO/Papers/Volatility only"
    pathMarkov<-"~/Dropbox/FAO/Markov Model/"
    library(zoo)

Any idea where it can come from? It works fine when echo=FALSE... I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel?

Thanks a lot
Matthieu

    > sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: i486-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=fr_CH.utf8       LC_NUMERIC=C
     [3] LC_TIME=fr_CH.utf8        LC_COLLATE=fr_CH.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=fr_CH.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=fr_CH.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices datasets utils methods base

    loaded via a namespace (and not attached):
    [1] grid_2.12.1 lattice_0.19-17 Matrix_0.999375-45
    [4] nnet_7.3-1 tsDyn_0.7-40 tseries_0.10-23
    [7] tseriesChaos_0.1-11

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Re: [R] error with source(): invalid 'times' value
ok, thanks Jim

The problem comes from length(dep) < leading, so we get a negative number...

    > length(dep)
    [1] 183
    > c(leading, length(dep) - leading)
    [1]  516 -333

But 183 seems to be the right number:

    $ wc -l /tmp/RFile.r
    183 /tmp/RFile.r

So now I need to understand what this dep is, and why leading is larger than its length... I tried to check the source code (:-)) but could not get it... any idea?

Thanks a lot
Matthieu
Re: [R] error with source(): invalid 'times' value
On Mon, Jan 24, 2011 at 12:29 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote:

On 24.01.11 18:22, Gabor Grothendieck wrote:

Does this work?

    source("/tmp/RFile.r", echo = TRUE, prompt.echo = NULL, continue.echo = "+ ")

Thanks for your quick answer! Unfortunately, it does not change:

    > source("/tmp/RFile.r", echo = TRUE, prompt.echo = NULL, continue.echo = "+ ")
    Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - :
      invalid 'times' value
    > traceback()
    3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading))
    2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = "", collapse = "\n")
    1: source("/tmp/RFile.r", echo = TRUE, prompt.echo = NULL, continue.echo = "+ ")

Note this is not a systematic problem, but I can't say exactly when/why it works or not...

Check getOption("prompt.echo") and getOption("continue") and try different values for the prompt.echo= and continue.echo= arguments of source. I am able to get your times error by using

    source("myfile.R", echo = TRUE, continue.echo = NULL)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
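The two ways of hitting this error that came up in the thread can be reproduced outside source(). This is only a sketch: it calls rep.int() directly with the values reported above (516 and 183), and try_rep is a hypothetical helper name, not part of R. The first call fails because the second repeat count is negative; the second fails because a NULL prompt drops out of c(), leaving one string for two counts.

```r
# Values reported in the thread: leading was 516, the file has 183 lines.
leading <- 516
n_lines <- 183
counts <- c(leading, n_lines - leading)   # 516 and -333

# Hypothetical helper: return the error message instead of stopping.
try_rep <- function(x, times) {
  tryCatch(rep.int(x, times), error = function(e) conditionMessage(e))
}

neg   <- try_rep(c("> ", "+ "), counts)   # negative repeat count
short <- try_rep(c("> ", NULL), c(1, 2))  # NULL vanishes: 1 string, 2 counts
ok    <- try_rep(c("> ", "+ "), c(1, 2))  # one prompt, two continuation marks

print(neg)
print(short)
print(ok)
```

Both failures surface as the same 'times' error, which may be why changing the prompt arguments did not help in Matthieu's case: there the count itself was negative.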
[R] sm.density.compare, date on x axis?
Is it possible to generate a density plot comparing several (6 or 7) groups of data and have the x-axis in date format (e.g.: %d%b, 10Mar)? When I try to input the data in date format I get an error saying this function only allows 1-d data.

Thank you,
Claudia
[R] Find the sign
Hello :)

I wanted to write an expression to check when x and y have the same sign, and I wrote the following:

    if ((x > 0 && y > 0) || (x < 0 && y < 0))

which looks pretty ugly to me. Could you please suggest a better way to do that?

Regards
Alex
Re: [R] error with source(): invalid 'times' value
Do 'str(dep)' to see what dep is and where it comes from. If you have the 'options' set as I suggested, you can do this examination when the error occurs.

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Re: [R] Find the sign
Try this:

    !diff(sign(c(x, y)))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Find the sign
On Monday, January 24, 2011 07:18:03 pm Alaios wrote:

Hello :) I wanted to write an expression to check when x and y have the same sign and I wrote the following:

    if ((x > 0 && y > 0) || (x < 0 && y < 0))

which looks pretty ugly to me. Can you please suggest me a better way for that?

Regards
Alex

    if( sign( x ) == sign( y ) )
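The suggestions in this thread differ slightly from the original condition when a value is zero, so a small comparison may be useful. This is a sketch only; same_sign and same_sign_strict are hypothetical helper names. sign() returns -1, 0 or 1, so comparing signs treats two zeros as "the same sign", while the original test (both positive or both negative) does not.

```r
# Hypothetical helper names, for illustration only.
same_sign <- function(x, y) sign(x) == sign(y)  # zeros match each other
same_sign_strict <- function(x, y) x * y > 0    # both > 0 or both < 0

x <- c(2, -2, 0, -1.5)
y <- c(3,  3, 0, -4)

res_sign   <- same_sign(x, y)         # TRUE FALSE TRUE  TRUE
res_strict <- same_sign_strict(x, y)  # TRUE FALSE FALSE TRUE

print(res_sign)
print(res_strict)
```

Both helpers are vectorised; inside if() they need length-one inputs, and an NA in either argument gives NA rather than TRUE or FALSE. The product form can also underflow to zero for extremely small values, so the sign() comparison is the safer default.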
[R] Relative Importance Package question
I have installed the latest relaimpo library (from their website, with the 8 functions). When using pmvd as a type, i.e. type = c("pmvd") in calc.relimp, I get the error message

    ...could not find function "pmvdcalc"

(this is in R version 2.12.1). Can anyone help?

Paul

Prof P Rheeder
School of Health Systems and Public Health
Faculty of Health Sciences
University of Pretoria
Room 6:12 HW Snyman North
Tel: 012 354 1488
Fax: 012 354 1750
Mobile: 082 779 3054
Re: [R] Problem with factor analysis
Does anyone know what I am doing wrong?

Could be a lot or could be a little, but we have to guess, because you haven't given us the important information. That you are following Crawley is of little or no interest. We need to know what _you_ did. What is model and what's in it?

    ##
    str(model)
    attributes(model)

If you fitted your model using factanal then loadings(model)[,1] will fail with the following error message:

    ##
    > loadings(factanal(m1, factors=3)[,1])
    Error in factanal(m1, factors = 3)[, 1] : incorrect number of dimensions

Even if you did not see such a message it seems likely that model is in the wrong format for loadings to extract anything useful from it.

Regards, Mark.
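For comparison, here is a sketch of the call with the subscript applied to the loadings rather than to the factanal object. The data are the small example from the ?factanal help page as I recall it, used here only as a stand-in since the original poster's model was not shown; fit and first_factor are illustrative names.

```r
# Example data as given on the ?factanal help page (stand-in data only).
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fit <- factanal(m1, factors = 3)

# Subscript the loadings matrix, not the fitted object:
first_factor <- loadings(fit)[, 1]
print(first_factor)

# The misplaced bracket subscripts the factanal object (a list) and fails:
bad <- tryCatch(loadings(factanal(m1, factors = 3)[, 1]),
                error = function(e) conditionMessage(e))
print(bad)
```

The only difference between the working and failing calls is where the closing parenthesis of loadings() sits relative to [, 1].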
[R] how to get loglik parameter from splm package?
The splm package is an R implementation of spatial panel data models, and the logLik value is the most important piece of information for the splm methods. But I found that the logLik is always NULL, and it is hard to trust an splm estimate when the logLik is NULL. Does anyone know the splm package and how to get the right logLik? Please help me. Thanks
[R] how many records can R handle
How many records can the R recursive partitioning software handle? We are analyzing 5,000,000 medical records, looking at 100 risk factors for the outcome of interest.

Richard H. White, MD
Hibbard E. Williams Endowed Professor of Medicine
Chief, Division of General Medicine
Director, Anticoagulation Service
UC Davis Medical Center
Suite 2400 PSSB
4150 V Street
Sacramento, CA, 95817
Phone 916-734-7005
FAX 916-734-2732
Email: rhwh...@ucdavis.edu
[R] Train error:: subscript out of bonds
Hi,

I am trying to construct a svmpoly model using the caret package (please see code below). Using the same data, without changing any setting, I am just changing the seed value. Sometimes it constructs the model successfully, and sometimes I get an "Error in indexes[[j]] : subscript out of bounds". For example, when I set the seed to 357, the following code produced results for only 8 iterations, and on the 9th iteration it stopped with the "subscript out of bounds" error. I don't understand why. Any help would be great, thanks.

    ###
    for (i in 1:10) {
      fit1 <- NULL
      x <- NULL
      x <- which(number == i)
      trainset <- d[-x, ]
      testset <- d[x, ]
      train1 <- trainset[, -ncol(trainset)]
      train1 <- train1[, -(1)]
      test_t <- testset[, -ncol(testset)]
      species_test <- as.factor(testset[, ncol(testset)])
      test_t <- test_t[, -(1)]
      # CARET::TRAIN
      fit1 <- train(train1, as.factor(trainset[, ncol(trainset)]), "svmpoly",
                    trControl = trainControl((method = "cv"), 10, verboseIter = F),
                    tuneLength = 3)
      pred <- predict(fit1, test_t)
      t_train[[i]] <- table(predicted = pred, observed = testset[, ncol(testset)])
      tune_result[[i]] <- fit1$results
      tune_best <- fit1$bestTune
      scale1[i] <- tune_best[[3]]
      degree[i] <- tune_best[[2]]
      c1[i] <- tune_best[[1]]
    }
[R] ggplot2 - ribbon
Dear List,

I am having trouble setting the transparency of a ribbon in ggplot2 and was wondering if anybody had any suggestions. So far I have a plot exactly as I want it, and I want to add a ribbon connecting the ymax and ymin, which I do with the following command:

    m10 + stat_summary(geom="ribbon", fun.ymin=min, fun.ymax=max)

However, the ribbon is a dark grey and I want to make it much lighter or change the colour. I was wondering if anybody had any ideas? I have kept the code shown brief as the full script is quite long; however, I can expand if required.

Thanks
Sam
[R] tolerance limits for nls predicted values
Greetings,

I would like to calculate tolerance limits for a series of predicted values from nonlinear regression models. I've been using the tolerance package, but the self-starting functions are not in the derivatives table. When I spell out the functions and supply starting values, I repeatedly get an error message regardless of the starting values I use (see output below). Does anyone have a solution for this problem?

    > oh_fit2 <- nls(MBB ~ SSfpl(AGE, a, b, c, d))
    > oh_fit2
    Nonlinear regression model
      model: MBB ~ SSfpl(AGE, a, b, c, d)
       data: parent.frame()
         a      b      c      d
     9.308 36.420 67.982 12.815
     residual sum-of-squares: 46834

    Number of iterations to convergence: 9
    Achieved convergence tolerance: 8.727e-06
    > tol <- nlregtol.int(oh_fit2, side = 2, alpha = 0.05, P = 0.95)
    The following object(s) are masked from 'temp (position 3)':
        a, b, c, d
    Error in deriv.formula(form, beta.names) :
      Function 'SSfpl' is not in the derivatives table

    > oh_fit3 <- nls(MBB ~ a + ((b - a)/(1 + exp((c-AGE)/d))), start = list(a = 10, b = 35, c = 65, d = 10))
    > oh_fit3
    Nonlinear regression model
      model: MBB ~ a + ((b - a)/(1 + exp((c - AGE)/d)))
       data: parent.frame()
         a      b      c      d
     9.307 36.421 67.982 12.815
     residual sum-of-squares: 46834

    Number of iterations to convergence: 9
    Achieved convergence tolerance: 7.474e-06
    > tol <- nlregtol.int(oh_fit3, side = 2, alpha = 0.05, P = 0.95)
    Error: Error in nls routine. Consider different starting estimates of the parameters. Type help(nls) for more options.

Thank you in advance!
Grant.

___
Grant M. Domke
Forest Inventory and Analysis
U.S. Forest Service
Northern Research Station
1992 Folwell Ave.
St. Paul, MN 55108
Ph: 651.649.5138
Fax: 651.649.5140
[R] R factorial microarray analysis with a looped design
Hi

I'm using the Limma package to analyze microarray data from a 2x2 factorial loop design.

    u ---- t
    ||     ||
    v ---- v.t

The samples hybridized to my arrays are: u/t, t/v.t, v.t/t, v.t/v, v/u, and u/v. My design matrix is:

               v    t  v.t
    v          1    0    0
    t          0    1    0
    v+v.t      1    0    1
    -t-v.t     0   -1   -1
    -v        -1    0    0
    -v-v.t    -1    0   -1

I need to find the main effect t, the main effect v, and the interaction effect v.t. I have tried several different contrast matrices, but none have worked so far, since I need to separate t and v from the arrays with v.t on them (and vice versa). Any help would be appreciated.

(The array design and design matrix are from Nature Reviews Genetics 3, 579-588 (August 2002) | doi:10.1038/nrg863 by Yang and Speed.)
[R] crazy loop error.
Dear R-users,

This is a loop which is part of a bigger script. I managed to isolate the error in this loop and simplified it to the bare minimum and made it self-contained.

    a<-c(2,3,4,5,5,5,6,6,6,7)
    for(n in 1:10)
    {
      print(paste("n: ",n))
      z1<-a[n]
      #make a list container
      ldata<-list()
      t=1
      while(z1==a[n])
      {
        #add dataframes to list
        ldata[[t]]<-paste("hello")
        n=n+1
        t=t+1
      }
      print("--End of while loop---")
      for(y in 1:length(ldata))
      {
        print(ldata[[y]])
      }
      print(paste("n: ",n))
      print("**End of for loop")
    }

This script has a vector a, a for-loop, and a nested while-loop. The for-loop runs from 1 to the length of a. At every number of a, it enters the while-loop and a "hello" is saved into the list ldata. If the next number in the vector a is different from the previous one, the while-loop is exited and the saved "hello" is printed. If the next number in vector a is the same as before, it keeps looping inside the while-loop and several "hello"s are printed together.

The run-time error is

    Error in while (z1 == a[n]) { : missing value where TRUE/FALSE needed

That's because an NA creeps in somewhere. The problem can be seen far before that. The full output from the run is below. A lot of stuff was printed to help with the debugging. At n=4, there are three repeats of 5, therefore "hello" is printed 3 times. n then becomes 7. Then when the for-loop returns to the top, n miraculously becomes 5. How's that!!?? From then on, everything goes wrong. I cannot figure out the problem.

    [1] "n:  1"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "n:  2"
    [1] "**End of for loop"
    [1] "n:  2"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "n:  3"
    [1] "**End of for loop"
    [1] "n:  3"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "n:  4"
    [1] "**End of for loop"
    [1] "n:  4"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "hello"
    [1] "hello"
    [1] "n:  7"
    [1] "**End of for loop"
    [1] "n:  5"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "hello"
    [1] "n:  7"
    [1] "**End of for loop"
    [1] "n:  6"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "n:  7"
    [1] "**End of for loop"
    [1] "n:  7"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "hello"
    [1] "hello"
    [1] "n:  10"
    [1] "**End of for loop"
    [1] "n:  8"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "hello"
    [1] "n:  10"
    [1] "**End of for loop"
    [1] "n:  9"
    [1] "--End of while loop---"
    [1] "hello"
    [1] "n:  10"
    [1] "**End of for loop"
    [1] "n:  10"
    Error in while (z1 == a[n]) { : missing value where TRUE/FALSE needed

Mr Stuck-up. Thanks for any help.

Roy
Re: [R] Train error:: subscript out of bonds
Put options(error=utils::recover) in the script and, when the error occurs, use the browser to examine the values to see where the error is.

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
[R] Setting bioconductor repository in .Rprofile. Is there a permanent way?
I currently set the Bioconductor repository in my .Rprofile using this code (which needs editing for every version number change of Bioconductor):

    # Choose repositories
    repos <- structure(c(CRAN="http://streaming.stat.iastate.edu/CRAN",
                         CRANextra="http://www.stats.ox.ac.uk/pub/RWin",
                         BioCsoft="http://www.bioconductor.org/packages/2.7/bioc",
                         Rforge="http://r-forge.r-project.org"))
    options(repos=repos)
    rm(repos)

I'd like to avoid editing the version number. One hack to do so is this code that adds all repositories:

    setRepositories(ind=1:10)
    r <- getOption("repos")
    r <- r[!is.na(r)]
    options(repos=r)

Is there a simpler way? I've searched for quite a while without finding an answer.

Incidentally, the help page for options says: A Bioconductor mirror can be selected by setting options("BioC_mirror"): the default value is "http://www.bioconductor.org". The word "default" is a bit confusing here, because when I start R, I see:

    R> options()$BioC_mirror
    NULL

--
Kevin Wright
Re: [R] error with source(): invalid 'times' value
Put a space after the # in the line

    #line 516

to avoid the problem. A similar problem also appears in parse():

    > parse(text="#line 102\nlog(pi)\n")
    Error in `Encoding<-`(`*tmp*`, value = character(0)) :
      'value' must be of positive length
    > parse(text="# line 102\nlog(pi)\n")
    expression(log(pi))
    attr(,"srcfile")
    <text>
    attr(,"wholeSrcref")
    # line 102
    log(pi)

(I'm still using 2.12.0.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Re: [R] crazy loop error.
Roy,

I have no idea what you're actually trying to do here, but it looks like there would be a more natural R'ish way if you're concerned about grouping consecutive elements of 'a'. At any rate, within your while loop you're incrementing n by 1. Eventually n will be 10, which becomes 11 when you add 1 to it, and a[11] is NA, thus the error you receive...

Roy Mathew wrote:
> Dear R-users,
>
> This is a loop which is part of a bigger script. I managed to isolate the error in this loop and simplified it to the bare minimum and made it self-contained.
>
>     a <- c(2,3,4,5,5,5,6,6,6,7)
>     for(n in 1:10) {
>       print(paste("n: ",n))
>       z1 <- a[n]
>       #make a list container
>       ldata <- list()
>       t=1
>       while(z1==a[n]) {
>         #add dataframes to list
>         ldata[[t]] <- paste("hello")
>         n=n+1
>         t=t+1
>       }
>       print("--End of while loop---")
>       for(y in 1:length(ldata)) {
>         print(ldata[[y]])
>       }
>       print(paste("n: ",n))
>       print("**End of for loop")
>     }
>
> This script has a vector a, a for-loop, and a nested while-loop. The for-loop runs from 1 to the length of a. At every element of a, it enters the while-loop and a "hello" is saved into the list ldata. If the next number in the vector a differs from the previous one, the while-loop is exited and the saved "hello" is printed. If the next number in vector a is the same as before, it loops inside the while-loop and several hellos are printed together.
>
> The run-time error is
>
>     Error in while (z1 == a[n]) { : missing value where TRUE/FALSE needed
>
> That's because an NA creeps in somewhere. The problem can be seen far before that. The full output from the run is below. A lot of stuff was printed to help with the debugging. At n=4, there are three repeats of 5, therefore hello is printed 3 times. n then becomes 7. Then when the for-loop returns to the top, n miraculously becomes 5. How's that!!?? From then on, everything goes wrong. I cannot figure out the problem.
>
>     [1] "n:  1"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "n:  2"
>     [1] "**End of for loop"
>     [1] "n:  2"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "n:  3"
>     [1] "**End of for loop"
>     [1] "n:  3"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "n:  4"
>     [1] "**End of for loop"
>     [1] "n:  4"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "hello"
>     [1] "hello"
>     [1] "n:  7"
>     [1] "**End of for loop"
>     [1] "n:  5"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "hello"
>     [1] "n:  7"
>     [1] "**End of for loop"
>     [1] "n:  6"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "n:  7"
>     [1] "**End of for loop"
>     [1] "n:  7"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "hello"
>     [1] "hello"
>     [1] "n:  10"
>     [1] "**End of for loop"
>     [1] "n:  8"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "hello"
>     [1] "n:  10"
>     [1] "**End of for loop"
>     [1] "n:  9"
>     [1] "--End of while loop---"
>     [1] "hello"
>     [1] "n:  10"
>     [1] "**End of for loop"
>     [1] "n:  10"
>     Error in while (z1 == a[n]) { : missing value where TRUE/FALSE needed
>
> Mr Stuck-up. Thanks for any help.
> Roy
Re: [R] Setting bioconductor repository in .Rprofile. Is there a permanent way?
On 01/24/2011 10:45 AM, Kevin Wright wrote:
> I currently set the Bioconductor repository in my .Rprofile using this code (which needs editing for every version number change of Bioconductor):
>
>     # Choose repositories
>     repos <- structure(c(CRAN="http://streaming.stat.iastate.edu/CRAN",
>                          CRANextra="http://www.stats.ox.ac.uk/pub/RWin",
>                          BioCsoft="http://www.bioconductor.org/packages/2.7/bioc",
>                          Rforge="http://r-forge.r-project.org"))
>     options(repos=repos)
>     rm(repos)
>
> I'd like to avoid editing the version number. One hack to do so is this code that adds all repositories.
>
>     setRepositories(ind=1:10)
>     r <- getOption("repos")
>     r <- r[!is.na(r)]
>     options(repos=r)
>
> Is there a simpler way? I've searched for quite a while without finding an answer.
>
> Incidentally, the help page for options says: A Bioconductor mirror can be selected by setting options("BioC_mirror"): the default value is "http://www.bioconductor.org". The word "default" is a bit confusing here, because when I start R, I see:

one possibility is to source('http://bioconductor.org/biocLite.R') in .Rprofile, after which biocinstallRepos() provides the correct bioc repositories for the version of R in use; it does clutter the .GlobalEnv a little and would be irritating if, e.g., on a laptop, internet access were slow or not reliable. For the latter I wrote

    makeActiveBinding("biocLite", local({
        env <- new.env()
        function() {
            if (!exists("biocLite", envir=env, inherits=FALSE)) {
                evalq(source("http://bioconductor.org/biocLite.R", local=TRUE), env)
            }
            env[["biocLite"]]
        }
    }), .GlobalEnv)

which doesn't make the connection until one accesses the biocLite variable.

Martin

>     R> options()$BioC_mirror
>     NULL

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
Re: [R] how many records can R handle
Exactly what do you want to do with the data? Is it 5M rows with 100 columns of data? Do you want to read it all in at once? If so, and if the columns are numeric, you will need 4GB to hold one copy, and you will need to be running a 64-bit version of R. If you want to do any processing with everything in memory, I would suggest you have at least 16GB of real memory, since copies may be made while processing.

Can you put this in a database and read in only the columns you need? I can handle 5M rows with 10 columns on my laptop with 2GB easily. So it all depends on the problem you are trying to solve.

On Mon, Jan 24, 2011 at 12:34 PM, Richard White rhwh...@ucdavis.edu wrote:
> How many records can the R recursive partitioning software handle? We are analyzing 5,000,000 medical records looking at 100 risk factors for the outcome of interest
>
> Richard H. White, MD
> Hibbard E. Williams Endowed Professor of Medicine
> Chief, Division of General Medicine
> Director, Anticoagulation Service
> UC Davis Medical Center
> Suite 2400 PSSB
> 4150 V Street
> Sacramento, CA, 95817
> Phone 916-734-7005
> FAX 916-734-2732
> Email: rhwh...@ucdavis.edu

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
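The 4GB figure above follows from simple arithmetic, which can be sketched in R. This is only a back-of-the-envelope check, assuming every column is stored as a double (8 bytes per value); integer or factor columns would take roughly half that, and the variable names here are just for illustration.

```r
# Rough memory estimate for 5M rows x 100 numeric (double) columns.
rows <- 5e6
cols <- 100
bytes_per_double <- 8

bytes <- rows * cols * bytes_per_double
cat("one copy:", bytes / 1e9, "GB (", round(bytes / 2^30, 2), "GiB )\n")

# Sanity check on a small matrix: object.size() should report
# about rows * cols * 8 bytes plus a small fixed header.
small <- matrix(0, nrow = 1e4, ncol = 100)
small_bytes <- as.numeric(object.size(small))
cat("1e4 x 100 matrix:", small_bytes, "bytes\n")
```

Since fitting functions usually copy their input at least once, the working memory needed is typically a multiple of the single-copy figure, which is where the suggestion of 16GB comes from.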
Re: [R] error with source(): invalid 'times' value
On 11-01-24 12:07 PM, Matthieu Stigler wrote:
> hi
>
> I am seeing a strange behavior I can't understand... doing:
>
>     source("/tmp/RFile.r", echo=TRUE)
>     Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) -  :
>       invalid 'times' value
>     traceback()
>     3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading))
>     2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = "", collapse = "\n")
>     1: source("/tmp/RFile.r", echo = TRUE)
>
> But the file I am trying to source is very simple... see:
>
>     $ more /tmp/RFile.r
>     ###
>     ### chunk number 1:
>     ###
>     #line 516 "VolStocksDec2010.Rnw"
>     path<-"~/Dropbox/FAO/Papers/Volatility only"
>     pathMarkov<-"~/Dropbox/FAO/Markov Model/"
>     library(zoo)
>
> Any idea where it can come from? It works fine when echo=FALSE...
>
> I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel?

There is no such version, but this looks like a bug that was fixed in 2.12.1. Are you using 2.12.0? (I might be wrong about the timing of the fix; if you're using 2.12.1, try 2.12.1-patched.)

Duncan Murdoch

> Thanks a lot
> Matthieu
>
>     sessionInfo()
>     R version 2.12.1 (2010-12-16)
>     Platform: i486-pc-linux-gnu (32-bit)
>
>     locale:
>      [1] LC_CTYPE=fr_CH.utf8       LC_NUMERIC=C
>      [3] LC_TIME=fr_CH.utf8        LC_COLLATE=fr_CH.utf8
>      [5] LC_MONETARY=C             LC_MESSAGES=en_US.UTF-8
>      [7] LC_PAPER=fr_CH.utf8       LC_NAME=C
>      [9] LC_ADDRESS=C              LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=fr_CH.utf8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats graphics grDevices datasets utils methods base
>
>     loaded via a namespace (and not attached):
>     [1] grid_2.12.1 lattice_0.19-17 Matrix_0.999375-45
>     [4] nnet_7.3-1 tsDyn_0.7-40 tseries_0.10-23
>     [7] tseriesChaos_0.1-11
Re: [R] sensitivity logical operators in R
Hi again,

I have checked the same code (see below) using MATLAB. It produces the same error (i.e., equal numbers are evaluated as unequal). Am I missing something?

Thanks for help!
Marc

Marc Jekel wrote:
> Hello R Fans,
>
> Another question for the community that really frightened me today. The following logical comparison produces FALSE as output:
>
>     t = sum((c(.7,.69,.68,.67,.66)-.5)*c(1,1,-1,-1,1))
>     tt = sum((c(.7,.69,.68,.67,.66)-.5)*c(1,-1,1,1,-1))
>     t == tt
>
> This is really strange behavior. Most likely this has something to do with how R represents numbers internally and the possible sensitivity of a computer? Does anyone know when this strange behavior occurs and how to fix it?
>
> Thank you all! This list is a pleasure!!!
> Marc
Re: [R] Setting bioconductor repository in .Rprofile. Is there a permanent way?
Of course, before posting my question, I did RTFM and RTFcode and RTFmailinglists. The key word in my question was "simpler". I rejected copying a modified version of the 'repositories' file to my home directory since it has changed numerous times with the addition of R-forge etc.

Here is another option. More lines of code, but it doesn't add unneeded repositories.

    pp <- file.path(R.home("etc"), "repositories")
    rr <- tools:::.read_repositories(pp)
    repos <- structure(c(CRAN="http://streaming.stat.iastate.edu/CRAN",
                         CRANextra="http://www.stats.ox.ac.uk/pub/RWin",
                         BioCsoft=rr["BioCsoft","URL"],
                         Rforge="http://r-forge.r-project.org"))
    options(repos=repos)
    rm(pp, rr, repos)

Martin, I appreciated your clever trick of evaluating on demand.

Kevin

--
Kevin Wright
Re: [R] Find the sign
if (x*y > 0) {...}

On Mon, Jan 24, 2011 at 1:27 PM, Rainer Schuermann rainer.schuerm...@gmx.net wrote:
> On Monday, January 24, 2011 07:18:03 pm Alaios wrote:
>> Hello :)
>> I wanted to write an expression to check when x and y have the same sign and I wrote the following:
>>
>>     if ((x>0 && y>0) || (x<0 && y<0))
>>
>> which looks pretty ugly to me. Can you please suggest a better way to do that?
>>
>> Regards
>> Alex
>
>     if( sign( x ) == sign( y ) )
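The two suggestions in this thread are not quite equivalent, and a short sketch may make the difference concrete. The helper names below (same_sign, same_nonzero_sign) are made up for illustration; the only base function relied on is sign(). Two cases differ: zeros, and products so small that they underflow to 0 in double precision.

```r
# sign() returns -1, 0 or 1, so comparing signs never overflows or underflows.
same_sign <- function(x, y) sign(x) == sign(y)

# Variant that, like the original (x>0 && y>0) || (x<0 && y<0),
# treats zero as having no sign.
same_nonzero_sign <- function(x, y) sign(x) * sign(y) > 0

x <- 1e-200
y <- 1e-200
product_test <- x * y > 0        # the product underflows to 0
sign_test    <- same_sign(x, y)  # signs are still compared correctly

zero_equal   <- same_sign(0, 0)          # both signs are 0, so they match
zero_strict  <- same_nonzero_sign(0, 0)  # zero has no sign here

# Both helpers are vectorised, unlike an if() with && and ||.
vec <- same_nonzero_sign(c(-1, 2, 0), c(-3, -4, 5))
print(list(product_test, sign_test, zero_equal, zero_strict, vec))
```

Which variant is appropriate depends on whether zero should count as matching itself; the original expression with && and || corresponds to the stricter one.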
Re: [R] Setting bioconductor repository in .Rprofile. Is there a permanent way?
It is easier than that. Use

    http://www.bioconductor.org/packages/release/bioc

or

    http://www.bioconductor.org/packages/devel/bioc

/Henrik
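Putting Henrik's suggestion into the .Rprofile snippet from the original question might look like the sketch below. It assumes the "release" URL keeps pointing at the current Bioconductor release, as the reply implies; the other repository URLs are simply those from Kevin's post and are not checked here.

```r
# .Rprofile fragment: version-independent Bioconductor repository.
# The BioCsoft entry uses the "release" alias instead of a fixed
# version number such as 2.7, so it needs no editing on upgrade.
repos <- c(
  CRAN      = "http://streaming.stat.iastate.edu/CRAN",
  CRANextra = "http://www.stats.ox.ac.uk/pub/RWin",
  BioCsoft  = "http://www.bioconductor.org/packages/release/bioc",
  Rforge    = "http://r-forge.r-project.org"
)
options(repos = repos)
rm(repos)
```

Replacing "release" with "devel" in the BioCsoft entry would select the development branch instead.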
Re: [R] sensitivity logical operators in R
Marc,

You have been given the answer already, and a solution. See the R FAQ 7.31. As you have discovered, this issue is not specific to R. In order to eliminate this problem entirely, you will need a computer system with infinite precision.

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Re: [R] sensitivity logical operators in R
kognDisso wrote:
> Hi again,
>
> I have checked the same code (see below) using MATLAB. It produces the same error (i.e., equal numbers are evaluated as unequal). Do I miss something?

1. It is NOT an error.
2. The numbers are NOT equal.
3. Please read FAQ 7.31.
4. Do t - tt and you will see something like

    [1] -2.220446e-16

as the answer.

Berend

--
View this message in context: http://r.789695.n4.nabble.com/sensitivity-logical-operators-in-R-tp3233109p3234921.html
Sent from the R help mailing list archive at Nabble.com.
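For readers who want the fix that FAQ 7.31 points to, here is a minimal sketch using the numbers from the original post. The variables are renamed s1 and s2 (to avoid masking the base function t()); all.equal() and its default tolerance are base R, while the explicit 1e-9 threshold is just an arbitrary choice for illustration.

```r
s1 <- sum((c(.7, .69, .68, .67, .66) - .5) * c(1,  1, -1, -1,  1))
s2 <- sum((c(.7, .69, .68, .67, .66) - .5) * c(1, -1,  1,  1, -1))

# Exact comparison is fragile: the two sums accumulate different
# rounding errors because the terms are added in a different pattern.
print(s1 == s2)
print(s1 - s2)   # a difference on the order of machine epsilon

# Compare with a tolerance instead.
near_equal <- isTRUE(all.equal(s1, s2))   # default tolerance about 1.5e-8
near_manual <- abs(s1 - s2) < 1e-9        # explicit threshold
print(c(near_equal, near_manual))
```

Note that all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() before being used in a condition.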
Re: [R] ggplot2 - ribbon
On 1/24/2011 7:44 AM, Sam wrote:
> Dear List,
>
> I am having trouble setting the transparency of a ribbon in ggplot2 and was wondering if anybody had any suggestions. So far I have a plot exactly as I want it, and I want to add a ribbon connecting the ymax and ymin, which I do with the following command:
>
>     m10 + stat_summary(geom="ribbon", fun.ymin=min, fun.ymax=max)
>
> However, the ribbon is a dark grey and I want to make it much lighter or change the colour. I was wondering if anybody had any ideas? I have kept the code used so far brief as it is quite long; however, I can expand if required.

To see what is happening, a simple toy example works; the details of your specific graph aren't needed.

    p <- ggplot(mtcars, aes(x=carb, y=mpg)) + geom_point()
    p + stat_summary(geom="ribbon", fun.ymin=min, fun.ymax=max)
    p + stat_summary(geom="ribbon", fun.ymin=min, fun.ymax=max, alpha=0.2, fill="blue")

You can set the transparency (alpha) of the fill and the color of the fill (fill) to whatever you want. You can also set the color of the lines around the fill with colour. As fixed (not data determined) values, they are not inside an aes call.

> Thanks
> Sam

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
Re: [R] Setting bioconductor repository in .Rprofile. Is there a permanent way?
Now that is _simple_. Thanks.

Kevin

--
Kevin Wright
Re: [R] LTA
Sebastian,

There are a number of packages that fit hidden (or latent) Markov models, which are in most regards identical to latent transition analysis.

Best,
Ingmar

2011/1/24 Sebastián Daza sebastian.d...@gmail.com
> Hi everyone,
> Does anyone know if there is a package to run Latent Transition Analysis using R?
> Regards!
> --
> Sebastián Daza
> sebastian.d...@gmail.com
[R] Strange result from sort: sort(c("aa", "ff")) gives "ff" "aa" with R.2.12.1 on windows 7
Dear list,

Please consider the following calls of sort:

    sort(c("a","f"))
    [1] "a" "f"
    sort(c("f","a"))
    [1] "a" "f"
    sort(c("aa","ff"))
    [1] "ff" "aa"
    sort(c("ff","aa"))
    [1] "ff" "aa"

The last two results look strange to me. Is that a bug??? The result seems to come from calls to order:

    order(c("a","f"))
    [1] 1 2
    order(c("f","a"))
    [1] 2 1
    order(c("aa","ff"))
    [1] 2 1
    order(c("ff","aa"))
    [1] 1 2

I get the same results on R.2.12.1, R.2.11.1 and R.2.13.0 on Windows 7. However, on Linux I get the right answer (the answer I expected). From the help pages I get the impression that there might be an issue with the locale, but I didn't understand the details. Can anyone tell me what is going on here, please?

Regards
Søren

    sessionInfo()
    R version 2.12.1 Patched (2010-12-27 r53883)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
    [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
    [5] LC_TIME=Danish_Denmark.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] SHDtools_1.0

    sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_DK.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_DK.utf8        LC_COLLATE=en_DK.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_DK.utf8
     [7] LC_PAPER=en_DK.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_DK.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
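A likely explanation, offered here as an assumption rather than something confirmed in the thread, is that the Danish collation rules (LC_COLLATE=Danish_Denmark.1252 above) treat "aa" as the letter "å", which sorts at the end of the alphabet. If that is the cause, sorting by plain byte order sidesteps it. The sketch below shows two ways to do so in current versions of R; method = "radix" postdates the R versions mentioned in the post.

```r
x <- c("ff", "aa")

# Option 1: radix sort always uses C-locale (byte order) collation,
# regardless of the session's LC_COLLATE setting.
by_radix <- sort(x, method = "radix")

# Option 2: switch the collation locale to "C" for the comparison,
# then restore the previous setting.
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
by_c_locale <- sort(x)
invisible(Sys.setlocale("LC_COLLATE", old))

print(by_radix)
print(by_c_locale)
```

Byte-order sorting puts all upper-case letters before lower-case ones, so it is a workaround for mixed-language data rather than a replacement for locale-aware collation.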
Re: [R] error with source(): invalid 'times' value
Hi

Well, this is the output of str(dep) on a small example:

    str(dep)
     chr [1:8] "###" ...
    Browse[1]> dep
    [1] "###"
    [2] "### chunk number 1:"
    [3] "###"
    [4] "#line 516 \"VolStocksDec2010.Rnw\""
    [5] "path<-\"~/Dropbox/FAO/Papers/Volatility only\""
    [6] "pathMarkov<-\"~/Dropbox/FAO/Markov Model/\""
    [7] "library(zoo)"
    [8] ""

It seems quite accurate... I guess the problem comes from leading... even in this smaller example, it is still the same number (516) as in the test with the bigger source doc...

Can you reproduce this on your machine? I can reproduce it on two Linux Ubuntu 10.4 machines, R 2.12.1...

Thanks!!

On 24.01.11 19:18, jim holtman wrote:
> Do 'str(dep)' to see what dep is and where it comes from. If you have the 'options' set as I suggested, you can do this examination when the error occurs.
>
> On Mon, Jan 24, 2011 at 12:41 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote:
>> ok, thanks Jim
>>
>> The problem comes from length(dep) < leading, so we get a negative number...
>>
>>     length(dep)
>>     [1] 183
>>     c(leading, length(dep) - leading)
>>     [1] 516 -333
>>
>> But 183 seems to be the right number:
>>
>>     $ wc -l /tmp/RFile.r
>>     183 /tmp/RFile.r
>>
>> So now I need to understand what this dep is, and why leading is bigger than its length... I tried to check the source code (:-)) but could not get it... any idea?
>>
>> Thanks a lot
>> Matthieu
>>
>> On 24.01.11 18:29, jim holtman wrote:
>>> It sounds like you have some invalid expressions. Dump out the values of 'leading' and 'length(dep) - leading'. Learn some simple debugging techniques. One is to set
>>>
>>>     options(error=utils::recover)
>>>
>>> so that on the error you can use the browser to examine what the values are.
[R] Is there any way to get score vector in each iteration in glm??
Hello everyone,

I am doing a hypothesis test and I need the score vector at each iteration of glm (family=binomial). How could I do it? I tried the trace option, but it gives me just the AIC of each iteration, nothing more.

Thanks
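One way to get at the per-iteration score is to run the fitting iterations by hand. The sketch below does this for a logistic regression (binomial family, logit link) on simulated data; it is not glm()'s internal code, the starting value of zero differs from glm()'s own, and the names (scores, info, step) are chosen for illustration. For the logit link the score is U(beta) = X'(y - mu) and the information is X'WX with W = mu(1 - mu).

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)                      # design matrix with intercept

beta <- c(0, 0)                       # starting value (an assumption)
scores <- list()
for (it in 1:25) {
  mu    <- plogis(drop(X %*% beta))   # fitted probabilities
  score <- drop(crossprod(X, y - mu)) # score vector at the current beta
  scores[[it]] <- score
  W     <- mu * (1 - mu)              # binomial variance weights
  info  <- crossprod(X, W * X)        # Fisher information X'WX
  step  <- solve(info, score)         # Newton / Fisher scoring step
  beta  <- beta + step
  if (max(abs(step)) < 1e-10) break
}

fit <- glm(y ~ x, family = binomial)
print(do.call(rbind, scores))         # one row per iteration
print(rbind(by_hand = beta, glm = coef(fit)))
```

Because the starting values differ, the intermediate score vectors will not match glm()'s own iterates, but both procedures should arrive at the same estimate, where the score is essentially zero. The score under a null hypothesis can be obtained the same way by evaluating crossprod(X, y - mu) at the restricted estimate.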
Re: [R] crazy loop error.
On Mon, Jan 24, 2011 at 07:16:58PM +0100, Roy Mathew wrote:
> At n=4, there are three repeats of 5, therefore hello is printed 3 times. n then becomes 7. Then when the for-loop returns to top, n miraculously becomes 5. Hows that!!??

Hi.

The for-loop

    for (i in 1:k)

uses an internal index, which counts the repetitions. This is necessary, since the control over a loop like

    for (i in c(1,1,1,1))

cannot be based on the variable i only. Hence, changing i inside the body does not influence the next iteration of the loop. For example, the following loop always makes m*n repetitions, although using the same variable in nested loops is definitely not recommended.

    m <- 3
    n <- 5
    for (i in seq(length=m)) {
        for (i in seq(length=n)) {
            cat("*")
        }
        cat("\n")
    }

Hope this helps.

Petr Savicky.
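The behaviour described above, and two ways around it, can be sketched as follows. This is only an illustration built on Roy's vector a; the variable names (seen, runs, groups) are invented here. The first part shows that assigning to the loop variable does not change which value the next pass receives; the rest groups consecutive equal elements with rle() and, alternatively, with a while-loop whose index is entirely under the script's control.

```r
a <- c(2, 3, 4, 5, 5, 5, 6, 6, 6, 7)

# 1. Changing the loop variable only affects the rest of the current pass.
seen <- integer(0)
for (n in 1:3) {
  seen <- c(seen, n)
  n <- n + 5          # overwritten again at the start of the next pass
}
print(seen)

# 2. Run lengths of consecutive equal values, without any explicit loop.
runs <- rle(a)
print(data.frame(value = runs$values, count = runs$lengths))

# 3. Explicit version: a while-loop with a manually advanced index,
#    guarded so that it never reads past the end of 'a'.
groups <- list()
n <- 1
while (n <= length(a)) {
  end <- n
  while (end < length(a) && a[end + 1] == a[n]) end <- end + 1
  groups[[length(groups) + 1]] <- a[n:end]
  n <- end + 1        # jump past the whole run
}
str(groups)
```

The guard end < length(a) is what prevents the comparison against NA that triggered the "missing value where TRUE/FALSE needed" error in the original script.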
Re: [R] error with source(): invalid 'times' value
indeed this makes the trick! quite strange... is this a known bug/issue? thanks! Matthieu Le 24. 01. 11 19:48, William Dunlap a écrit : Put a space after the # in the line #line 516 to avoid the problem. A similar problem also appears in parse(). parse(text=#line 102\nlog(pi)\n) Error in `Encoding-`(`*tmp*`, value = character(0)) : 'value' must be of positive length parse(text=# line 102\nlog(pi)\n) expression(log(pi)) attr(,srcfile) text attr(,wholeSrcref) # line 102 log(pi) (I'm still using 2.12.0.) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman Sent: Monday, January 24, 2011 10:19 AM To: Matthieu Stigler Cc: r-help@r-project.org Subject: Re: [R] error with source(): invalid 'times' value Do 'str(dep)' to see what dep is and where it comes from. If you have the 'options' set as I suggested, you can do this examination when the error occurs. On Mon, Jan 24, 2011 at 12:41 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote: ok, thanks Jim The problem comes from length(dep)leading, so we get negative number... length(dep) [1] 183 c(leading, length(dep) - leading) [1] 516 -333 But 183 seems to be the right number: $ wc -l /tmp/RFile.r 183 /tmp/RFile.r So now need to understand what is this dep, and why it has a bigger length... tried to check source code (:-)) but could not get it... any idea? Thanks a lot Matthieu Le 24. 01. 11 18:29, jim holtman a écrit : It sounds like you have some invalid expressions. Dump out the values of 'leading' and 'length(dep) - leading'. Learn some simple debugging techniques. One is to set options(error=utils::recover) so that on the error you can use the browser to examine what the values are. On Mon, Jan 24, 2011 at 12:07 PM, Matthieu Stigler matthieu.stig...@gmail.comwrote: hi I am seeing a strange behavior I can't understand... 
doing: source(/tmp/RFile.r,echo=TRUE) Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - : invalid 'times' value traceback() 3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)) 2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = , collapse = \n) 1: source(/tmp/RFile.r, echo = TRUE) But the file I am trying to source is very simple... see: $ more /tmp/RFile.r ### ### chunk number 1: ### #line 516 VolStocksDec2010.Rnw path-~/Dropbox/FAO/Papers/Volatility only pathMarkov-~/Dropbox/FAO/Markov Model/ library(zoo) Any idea where it can come from? It works fine when echo=FALSE... I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel? Thanks a lot Matthieu sessionInfo() R version 2.12.1 (2010-12-16) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=fr_CH.utf8 LC_NUMERIC=C [3] LC_TIME=fr_CH.utf8LC_COLLATE=fr_CH.utf8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=fr_CH.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_CH.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base loaded via a namespace (and not attached): [1] grid_2.12.1 lattice_0.19-17 Matrix_0.999375-45 [4] nnet_7.3-1 tsDyn_0.7-40tseries_0.10-23 [7] tseriesChaos_0.1-11 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] error with source(): invalid 'times' value
On 24.01.11 20:43, Duncan Murdoch wrote:
> On 11-01-24 12:07 PM, Matthieu Stigler wrote:
>> Hi,
>>
>> I am seeing a strange behavior I can't understand. Doing:
>>
>>   source("/tmp/RFile.r", echo=TRUE)
>>   Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - :
>>     invalid 'times' value
>>   traceback()
>>   3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading))
>>   2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = "", collapse = "\n")
>>   1: source("/tmp/RFile.r", echo = TRUE)
>>
>> But the file I am trying to source is very simple, see:
>>
>>   $ more /tmp/RFile.r
>>   ###
>>   ### chunk number 1:
>>   ###
>>   #line 516 VolStocksDec2010.Rnw
>>   path<-"~/Dropbox/FAO/Papers/Volatility only"
>>   pathMarkov<-"~/Dropbox/FAO/Markov Model/"
>>   library(zoo)
>>
>> Any idea where it can come from? It works fine when echo=FALSE.
>>
>> I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel?
>
> There is no such version, but this looks like a bug that was fixed in 2.12.1. Are you using 2.12.0? (I might be wrong about the timing of the fix; if you're using 2.12.1, try 2.12.1-patched.)

Indeed 2.12.1, sorry for the imprecision! I will give 2.12.1-patched a try, although I am not sure how to install it on Linux (should I compile it?).

Thanks!!
Re: [R] error with source(): invalid 'times' value
-Original Message- From: mat [mailto:matthieu.stig...@gmail.com] Sent: Monday, January 24, 2011 2:09 PM To: William Dunlap Cc: jim holtman; r-help@r-project.org; murdoch.dun...@gmail.com Subject: Re: [R] error with source(): invalid 'times' value indeed this makes the trick! quite strange... is this a known bug/issue? It is known now. The problem arises while printing the output of parse because the srcref attribute contains the numbers from those #line number entries and the printing routine gets confused if the numbers are out of the range 1 through the number of lines of text. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com thanks! Matthieu Le 24. 01. 11 19:48, William Dunlap a écrit : Put a space after the # in the line #line 516 to avoid the problem. A similar problem also appears in parse(). parse(text=#line 102\nlog(pi)\n) Error in `Encoding-`(`*tmp*`, value = character(0)) : 'value' must be of positive length parse(text=# line 102\nlog(pi)\n) expression(log(pi)) attr(,srcfile) text attr(,wholeSrcref) # line 102 log(pi) (I'm still using 2.12.0.) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman Sent: Monday, January 24, 2011 10:19 AM To: Matthieu Stigler Cc: r-help@r-project.org Subject: Re: [R] error with source(): invalid 'times' value Do 'str(dep)' to see what dep is and where it comes from. If you have the 'options' set as I suggested, you can do this examination when the error occurs. On Mon, Jan 24, 2011 at 12:41 PM, Matthieu Stigler matthieu.stig...@gmail.com wrote: ok, thanks Jim The problem comes from length(dep)leading, so we get negative number... length(dep) [1] 183 c(leading, length(dep) - leading) [1] 516 -333 But 183 seems to be the right number: $ wc -l /tmp/RFile.r 183 /tmp/RFile.r So now need to understand what is this dep, and why it has a bigger length... tried to check source code (:-)) but could not get it... 
any idea? Thanks a lot Matthieu Le 24. 01. 11 18:29, jim holtman a écrit : It sounds like you have some invalid expressions. Dump out the values of 'leading' and 'length(dep) - leading'. Learn some simple debugging techniques. One is to set options(error=utils::recover) so that on the error you can use the browser to examine what the values are. On Mon, Jan 24, 2011 at 12:07 PM, Matthieu Stigler matthieu.stig...@gmail.comwrote: hi I am seeing a strange behavior I can't understand... doing: source(/tmp/RFile.r,echo=TRUE) Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - : invalid 'times' value traceback() 3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)) 2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = , collapse = \n) 1: source(/tmp/RFile.r, echo = TRUE) But the file I am trying to source is very simple... see: $ more /tmp/RFile.r ### ### chunk number 1: ### #line 516 VolStocksDec2010.Rnw path-~/Dropbox/FAO/Papers/Volatility only pathMarkov-~/Dropbox/FAO/Markov Model/ library(zoo) Any idea where it can come from? It works fine when echo=FALSE... I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel? 
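The failing call can be reproduced outside source() from the two numbers reported in this thread. A minimal sketch, assuming only the quoted values (leading = 516 from the "#line 516" directive, length(dep) = 183); the prompt strings "> " and "+ " are stand-ins for prompt.echo and continue.echo, and n_dep is a name made up for this illustration:

```r
leading <- 516L   # line number picked up from the "#line 516" directive
n_dep   <- 183L   # number of deparsed lines, as reported by length(dep)

# The second count goes negative as soon as 'leading' exceeds length(dep)
times <- c(leading, n_dep - leading)
print(times)

# rep.int() rejects a negative repeat count; capture the message instead of stopping
res <- tryCatch(
  rep.int(c("> ", "+ "), times),
  error = function(e) conditionMessage(e)
)
print(res)
```

This only shows why a line number larger than the file length ends in an error; it says nothing about how later R versions handle "#line" directives.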
Re: [R] Strange result from sort: sort(c(aa, ff)) gives ff aa with R.2.12.1 on windows 7
On Mon, 24 Jan 2011, Søren Højsgaard wrote:
> Dear list,
>
> Please consider the following call of sort:
>
>   sort(c("a","f"))
>   [1] "a" "f"
>   sort(c("f","a"))
>   [1] "a" "f"
>   sort(c("aa","ff"))
>   [1] "ff" "aa"
>   sort(c("ff","aa"))
>   [1] "ff" "aa"
>
> The last two results look strange to me. Is that a bug???

It seems that you and your OS disagree about Danish, and I'm in no position to know which is correct. But this is not an R issue: the sorting is done by OS services.

> The result seems to come from calls to order:
>
>   order(c("a","f"))
>   [1] 1 2
>   order(c("f","a"))
>   [1] 2 1
>   order(c("aa","ff"))
>   [1] 2 1
>   order(c("ff","aa"))
>   [1] 1 2
>
> I get the same results on R.2.12.1, R.2.11.1 and R.2.13.0 on Windows 7. However on Linux, I get the right answer (the answer I expected). From the help pages I get the impression that there might be an issue about locale, but I didn't understand the details. Can anyone tell me what goes on here, please?

I recall that 'aa' used to sort at the end of the alphabet in Danish telephone books, so it seems the sort used on Windows thinks so too. See ?Comparison for some further details.

What I don't understand is that someone resident in Denmark finds this strange. I get exactly the same in a Danish locale on Mac OS X, for example:

  sort(c("aa","ff"))
  [1] "ff" "aa"

and also on my Linux box (Fedora 14 with LC_COLLATE=da_DK.utf8):

  sort(c("aa","ff"))
  [1] "ff" "aa"

en_DK is not a Danish locale (it is English in Denmark). If you want an English sort, try an English locale for LC_COLLATE (there may well be several, hence 'an').

> Regards
> Søren
>
>   sessionInfo()
>   R version 2.12.1 Patched (2010-12-27 r53883)
>   Platform: i386-pc-mingw32/i386 (32-bit)
>
>   locale:
>   [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
>   [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>   [5] LC_TIME=Danish_Denmark.1252
>
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base
>
>   other attached packages:
>   [1] SHDtools_1.0
>
>   sessionInfo()
>   R version 2.12.1 (2010-12-16)
>   Platform: i686-pc-linux-gnu (32-bit)
>
>   locale:
>    [1] LC_CTYPE=en_DK.utf8       LC_NUMERIC=C
>    [3] LC_TIME=en_DK.utf8        LC_COLLATE=en_DK.utf8
>    [5] LC_MONETARY=C             LC_MESSAGES=en_DK.utf8
>    [7] LC_PAPER=en_DK.utf8       LC_NAME=C
>    [9] LC_ADDRESS=C              LC_TELEPHONE=C
>   [11] LC_MEASUREMENT=en_DK.utf8 LC_IDENTIFICATION=C
>
>   attached base packages:
>   [1] stats graphics grDevices utils datasets methods base

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
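For readers who want a sort order that does not depend on the collation locale, two options are sketched below. This is an illustration rather than part of the original replies: method = "radix" assumes a reasonably recent R (it is not available in the 2.12 series discussed here), and the variable names are made up for the example.

```r
x <- c("ff", "aa")

# method = "radix" compares strings as in the C locale, whatever LC_COLLATE is set to
by_bytes <- sort(x, method = "radix")
print(by_bytes)

# Alternatively, switch the collation locale for a moment and restore it afterwards
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
by_c_locale <- sort(x)
invisible(Sys.setlocale("LC_COLLATE", old))
print(by_c_locale)
```

Both variants sort by character code, which also puts upper-case letters before lower-case ones, so they are not a drop-in replacement for an English locale.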
Re: [R] How to measure/rank variable importance when using rpart?
Hi Andy, Thank you for your response. I've already came by this function but also noticed that the help file states that: This method does *not* currently provide classspecific measures of importance when the *response is a factor*. Which is the case I need to deal with. Any suggestions as to how to adjust this function for the factor-response case? Best, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Mon, Jan 24, 2011 at 5:21 PM, Liaw, Andy andy_l...@merck.com wrote: Check out caret::varImp.rpart(). It's described in the original CART book. Andy From: Tal Galili Hello all, When building a CART model (specifically classification tree) using rpart, it is sometimes interesting to know what is the importance of the various variables introduced to the model. Thus, my question is: *What common measures exists for ranking/measuring variable importance of participating variables in a CART model? And how can this be computed using R (for example, when using the rpart package)* For example, here is some dummy code, created so you might show your solutions on it. This example is structured so that it is clear that variable x1 and x2 are important while (in some sense) x1 is more important then x2 (since x1 should apply to more cases, thus make more influence on the structure of the data, then x2). set.seed(31431) n - 400 x1 - rnorm(n) x2 - rnorm(n) x3 - rnorm(n) x4 - rnorm(n) x5 - rnorm(n) X - data.frame(x1,x2,x3,x4,x5) y - sample(letters[1:4], n, T) y - ifelse(X[,2] -1 , b, y) y - ifelse(X[,1] 0 , a, y) require(rpart) fit - rpart(y~., X) plot(fit); text(fit) info.gain.rpart(fit) # your function - telling us on each variable how important it is (references are always welcomed) Thanks! 
Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attach...{{dropped:16}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] crazy loop error.
Roy Mathew wrote:
> Thanks for the reply Erik,
>
> As you mentioned, grouping consecutive elements of 'a' was my idea. I am unaware of any R'ish way to do it. It would be nice if someone in the community knows this.

Is this the idea you're trying to execute? It uses ?rle and ?mapply.

  a <- c(2,3,5,5,5,6,6,7)
  mapply(rep, "hello", rle(a)$lengths, USE.NAMES = FALSE)
  [[1]]
  [1] "hello"

  [[2]]
  [1] "hello"

  [[3]]
  [1] "hello" "hello" "hello"

  [[4]]
  [1] "hello" "hello"

  [[5]]
  [1] "hello"
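If the goal is to collect the elements of 'a' themselves rather than a placeholder string, the run lengths from rle() can be turned into a grouping index. A small sketch along the same lines; run_id and groups are names chosen for this illustration:

```r
a <- c(2, 3, 5, 5, 5, 6, 6, 7)

runs <- rle(a)                                        # run lengths: 1 1 3 2 1
run_id <- rep(seq_along(runs$lengths), runs$lengths)  # 1 2 3 3 3 4 4 5
groups <- split(a, run_id)                            # one list element per run

str(groups)
```

Each list element can then be processed with lapply() instead of an explicit index loop.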
[R] Masking commands - Permutation in gregmisc and e1071
I am using the function permutations from the package *gregmisc*. However, I am also making use of the package *e1071*, which also contains a function called permutations. I want to use the function permutations from the *gregmisc* package, but the other package is masking it. This happens both when I load *e1071* before *gregmisc* and when I load it afterwards. Is there a specific command to use permutations from one package and not the other?

Many thanks.
Re: [R] Masking commands - Permutation in gregmisc and e1071
On Mon, Jan 24, 2011 at 2:47 PM, Yanika Borg akina...@gmail.com wrote:
> I am using the function permutations from the package *gregmisc*. However, I am also making use of the package *e1071*, which also contains a function called permutations. I want to use the function permutations from the *gregmisc* package, however, the other package is masking this function. This happens both when I load the *e1071* package before *gregmisc* and when I load *e1071* after I load *gregmisc*. Is there any specific command to use the permutation from one package and not the other please?

To specify the package when you call a function, use package::function(...), for example gregmisc::permutations.

Peter
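The effect of the :: operator can be tried without installing either package. The sketch below is a stand-in, not the poster's setup: it masks stats::sd with a dummy function defined in the workspace and then reaches the original through its namespace.

```r
# A user-defined function that masks stats::sd on the search path
sd <- function(x) "masked version"

masked   <- sd(c(1, 2, 3))          # finds the workspace definition first
explicit <- stats::sd(c(1, 2, 3))   # always the function from package stats

print(masked)
print(explicit)

rm(sd)  # remove the mask again
```

The same pattern applies to any pair of packages exporting the same name; which package actually provides permutations in a given installation is best checked with find("permutations").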
Re: [R] How to measure/rank variable importance when using rpart?
Hello Andy and other R-help readers, I've just realized that your function *does* answer my needs at full. (That's what happens when reading something late at night I guess...) Thanks again Andy for your help! Best, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Tue, Jan 25, 2011 at 12:33 AM, Tal Galili tal.gal...@gmail.com wrote: Hi Andy, Thank you for your response. I've already came by this function but also noticed that the help file states that: This method does *not* currently provide classspecific measures of importance when the *response is a factor*. Which is the case I need to deal with. Any suggestions as to how to adjust this function for the factor-response case? Best, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Mon, Jan 24, 2011 at 5:21 PM, Liaw, Andy andy_l...@merck.com wrote: Check out caret::varImp.rpart(). It's described in the original CART book. Andy From: Tal Galili Hello all, When building a CART model (specifically classification tree) using rpart, it is sometimes interesting to know what is the importance of the various variables introduced to the model. Thus, my question is: *What common measures exists for ranking/measuring variable importance of participating variables in a CART model? And how can this be computed using R (for example, when using the rpart package)* For example, here is some dummy code, created so you might show your solutions on it. This example is structured so that it is clear that variable x1 and x2 are important while (in some sense) x1 is more important then x2 (since x1 should apply to more cases, thus make more influence on the structure of the data, then x2). 
set.seed(31431) n - 400 x1 - rnorm(n) x2 - rnorm(n) x3 - rnorm(n) x4 - rnorm(n) x5 - rnorm(n) X - data.frame(x1,x2,x3,x4,x5) y - sample(letters[1:4], n, T) y - ifelse(X[,2] -1 , b, y) y - ifelse(X[,1] 0 , a, y) require(rpart) fit - rpart(y~., X) plot(fit); text(fit) info.gain.rpart(fit) # your function - telling us on each variable how important it is (references are always welcomed) Thanks! Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachments, contains information of Merck Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates Direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
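As a complement to the caret function recommended in this thread, later releases of rpart store an importance vector on the fitted object itself. The sketch below rebuilds the dummy data under stated assumptions: the comparison operators in the two ifelse() calls were lost in the archive, so "<" is a guess in both places, and the response is converted to a factor so that a classification tree is fitted. It shows rpart's own measure, not necessarily the one computed by caret::varImp.

```r
library(rpart)  # recommended package, normally shipped with R

set.seed(31431)
n <- 400
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rnorm(n), x5 = rnorm(n))

y <- sample(letters[1:4], n, TRUE)
y <- ifelse(X[, 2] < -1, "b", y)  # ASSUMPTION: "<" (operator lost in the archive)
y <- ifelse(X[, 1] < 0, "a", y)   # ASSUMPTION: "<" (operator lost in the archive)
X$y <- factor(y)

fit <- rpart(y ~ ., data = X, method = "class")

# Named numeric vector; only present when the tree has at least one split
imp <- fit$variable.importance
print(round(imp / sum(imp), 3))   # rescaled to sum to 1 for easier reading
```

The component is missing in the rpart versions that were current when this thread was written, so on an old installation fit$variable.importance may simply be NULL.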
[R] determining the order in which points are plotted
I make plenty of scatterplots, especially using scatterplot.matrix from library(car). One thing I don't know how to do is determine which points are plotted last. Sometimes I plot a large number of points for multiple groups represented by different colors. I would like to guarantee that points that are far from the centroid of their group are plotted last. This way they will be visible because they won't be buried under a pile of points from another group.

As it stands, it looks like scatterplot.matrix (and maybe other plotting functions) lays down one group at a time, in order, so that the first group is most likely to be buried under later groups. I try to sort factor levels so that the biggest groups go first, and that helps a little, but it isn't the complete solution I'm looking for.

Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
Re: [R] error with source(): invalid 'times' value
On 11-01-24 5:09 PM, mat wrote:
> On 24.01.11 20:43, Duncan Murdoch wrote:
>> On 11-01-24 12:07 PM, Matthieu Stigler wrote:
>>> Hi,
>>>
>>> I am seeing a strange behavior I can't understand. Doing:
>>>
>>>   source("/tmp/RFile.r", echo=TRUE)
>>>   Error in rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - :
>>>     invalid 'times' value
>>>   traceback()
>>>   3: rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading))
>>>   2: paste(rep.int(c(prompt.echo, continue.echo), c(leading, length(dep) - leading)), dep, sep = "", collapse = "\n")
>>>   1: source("/tmp/RFile.r", echo = TRUE)
>>>
>>> But the file I am trying to source is very simple, see:
>>>
>>>   $ more /tmp/RFile.r
>>>   ###
>>>   ### chunk number 1:
>>>   ###
>>>   #line 516 VolStocksDec2010.Rnw
>>>   path<-"~/Dropbox/FAO/Papers/Volatility only"
>>>   pathMarkov<-"~/Dropbox/FAO/Markov Model/"
>>>   library(zoo)
>>>
>>> Any idea where it can come from? It works fine when echo=FALSE.
>>>
>>> I am using R 2.12, on Ubuntu Linux 10.4 (R from CRAN), full session info below. Should I rather send this to r-devel?
>>
>> There is no such version, but this looks like a bug that was fixed in 2.12.1. Are you using 2.12.0? (I might be wrong about the timing of the fix; if you're using 2.12.1, try 2.12.1-patched.)
>
> Indeed 2.12.1, sorry for imprecision! I will give a try to 2.12.1-patched, although I am not so sure how I can install it (should I compile) on linux...

Bill Dunlap has already confirmed that this is not what was fixed (or what was fixed never made it into the sources). I'll get to it, but not for a couple of weeks.

Duncan Murdoch

> thanks!!
Re: [R] determining the order in which points are plotted
On Jan 24, 2011, at 6:49 PM, Mike Miller wrote:
> I make plenty of scatterplots, especially using scatterplot.matrix from library(car). One thing I don't know how to do is determine which points are plotted last. Sometimes I plot a large number of points for multiple groups represented by different colors.

?points

> I would like to guarantee that points that are far from the centroid of their group are plotted last. This way they will be visible because they won't be buried under a pile of points from another group.
>
> As it stands, it looks like scatterplot.matrix (and maybe other plotting functions) lay down a group at a time, in order, so that the first group is most likely to be buried under later groups. I try to sort factor levels so that the biggest groups go first, and that helps a little, but it isn't the complete solution I'm looking for.
>
> Thanks in advance.
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> Minnesota Center for Twin and Family Research
> Department of Psychology
> University of Minnesota

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] determining the order in which points are plotted
On Mon, 24 Jan 2011, David Winsemius wrote:
> On Jan 24, 2011, at 6:49 PM, Mike Miller wrote:
>> I make plenty of scatterplots, especially using scatterplot.matrix from library(car). One thing I don't know how to do is determine which points are plotted last. Sometimes I plot a large number of points for multiple groups represented by different colors.
>
> ?points

Thanks for the tip. I guess I would make vectors for x, y and col in the desired order and the first elements would be plotted first:

  Graphical parameters ‘pch’, ‘col’, ‘bg’, ‘cex’ and ‘lwd’ can be vectors (which will be recycled as needed) giving a value for each point plotted. If lines are to be plotted (e.g. for ‘type = "b"’) the first element of ‘lwd’ is used.

Suppose I'm plotting 10,000 points in a 10 x 10 scatterplot matrix (roughly what I'm actually doing). That's a total of 1 million points. It might take a while, but I can wait. However, I'm not sure how to get the coordinates right for additional points in a scatterplot matrix. Maybe I need to study that source code.

I did figure out recently how to use transparent points to get the axes right. Color #ff00 does that trick for me -- that's white color with zero opaqueness, full transparency.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
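The reordering idea described in this message can be sketched in base graphics. The example is hypothetical throughout: simulated data, two groups, and variable names invented for the illustration. It relies only on the fact that plot() and points() draw their points in the order given, so sorting by distance from the group centroid puts the outlying points on top.

```r
set.seed(1)
g <- factor(rep(c("A", "B"), each = 200))         # two made-up groups
x <- rnorm(400, mean = ifelse(g == "A", 0, 1))
y <- rnorm(400, mean = ifelse(g == "A", 0, 1))

# Distance of every point from the centroid of its own group
d <- sqrt((x - ave(x, g))^2 + (y - ave(y, g))^2)
o <- order(d)                                     # nearest first, farthest last

pdf(NULL)  # null device, so the sketch runs without opening a window
plot(x[o], y[o], col = c("red", "blue")[g[o]], pch = 16)
invisible(dev.off())
```

For a scatterplot matrix the same row permutation would presumably be applied to the whole data frame and to the colour vector before calling the plotting function, but whether a given function such as scatterplot.matrix keeps that order is something to verify against its source.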
Re: [R] determining the order in which points are plotted
On 2011-01-24 16:39, Mike Miller wrote:
> On Mon, 24 Jan 2011, David Winsemius wrote:
>> On Jan 24, 2011, at 6:49 PM, Mike Miller wrote:
>>> I make plenty of scatterplots, especially using scatterplot.matrix from library(car). One thing I don't know how to do is determine which points are plotted last. Sometimes I plot a large number of points for multiple groups represented by different colors.
>>
>> ?points
>
> Thanks for the tip. I guess I would make vectors for x, y and col in the desired order and the first elements would be plotted first:
>
>   Graphical parameters ‘pch’, ‘col’, ‘bg’, ‘cex’ and ‘lwd’ can be vectors (which will be recycled as needed) giving a value for each point plotted. If lines are to be plotted (e.g. for ‘type = "b"’) the first element of ‘lwd’ is used.
>
> Suppose I'm plotting 10,000 points in a 10 x 10 scatterplot matrix (roughly what I'm actually doing). That's a total of 1 million points. It might take a while, but I can wait. However, I'm not sure how to get the coordinates right for additional points in a scatterplot matrix. Maybe I need to study that source code.

10 x 10 strikes me as pretty near the limit of usefulness of a pairs plot. You might want to investigate the xysplom() function in pkg:HH. You'll have to write your own panel function, possibly subsetting your data with the scale() function.

Peter Ehlers

> I did figure out recently how to use transparent points to get the axes right. Color #ff00 does that trick for me -- that's white color with zero opaqueness, full transparency.
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> Minnesota Center for Twin and Family Research
> Department of Psychology
> University of Minnesota
[R] R package rating site?
We should really have an R package rating site with comments, reviews and the like, as people do for apps or movies. Does anyone know of a site trying to do this? If I remember correctly, this was talked about a few R user conferences ago, but I am not sure whether anything was ever implemented.
Re: [R] R package rating site?
http://crantastic.org/

On 01/24/2011 09:08 PM, zubin wrote:
> We should really have an R package rating site, comments, reviews or such, like folks do for apps or movie reviews. Does anyone know of a site trying to do this. If i remember correctly a few R user conferences ago this was talked about but not sure if anything was ever implemented.
Re: [R] crazy loop error.
Thanks for the reply Erik,

As you mentioned, grouping consecutive elements of 'a' was my idea. I am unaware of any R'ish way to do it. It would be nice if someone in the community knows this.

The error resulting in the NA was pretty easy to fix, and my loop works, but the results are still wrong (new script below). Ideally it should print a single hello for the single letters, a grouped '3 hellos' for the fives, a grouped '2 hellos' for the sixes, etc.

Based on the run results, if the value of n is being tracked, it changes quite unpredictably. Can someone explain how the value of n changes from the end of the loop to the top without anything being done to it? I cannot figure out what I am doing wrong.

  a<-c(2,3,5,5,5,6,6,7)

  for(n in 1:length(a))
  {
    print(paste("n: ",n))
    z1<-a[n]
    print(paste("z1:",z1))
    #make a list container
    ldata<-list()
    t=1
    while(z1==a[n])
    {
      #add dataframes to list
      ldata[[t]]<-paste("hello")
      n=n+1
      t=t+1
      if(n>length(a))
      {
        break;
      }
    }
    print("--End of while loop---")
    for(y in 1:length(ldata))
    {
      print(ldata[[y]])
    }
    print(paste("n: ",n))
    print("**End of for loop")
  }

Thanks,
Roy
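On the question of why n appears to change by itself: a for loop in R takes its values from the sequence evaluated once at the start, and assigns the next element to the loop variable at the top of every iteration, so any change made to n inside the body is overwritten. A short sketch of that behaviour, followed by a while-based variant in which the index really is under the programmer's control; seen, i, j and group_sizes are names made up for this illustration:

```r
# Part 1: assignments to the loop variable do not affect the iteration
seen <- integer(0)
for (n in 1:3) {
  seen <- c(seen, n)  # record the value n has at the top of the iteration
  n <- n + 10         # changed inside the body, overwritten on the next pass
}
print(seen)

# Part 2: a while loop advances the index only where the code says so
a <- c(2, 3, 5, 5, 5, 6, 6, 7)
i <- 1
group_sizes <- integer(0)
while (i <= length(a)) {
  j <- i
  # extend j to the last element of the current run of equal values
  while (j < length(a) && a[j + 1] == a[i]) j <- j + 1
  group_sizes <- c(group_sizes, j - i + 1)
  i <- j + 1          # jump to the start of the next run
}
print(group_sizes)
```

The second part yields the same run lengths as rle(a)$lengths, which is usually the shorter way to get them.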