Re: [R] thanks
See man R. Example from a shell: echo -e "pdf(file=\"test.pdf\")\nplot(1:10,11:20)\ndev.off(dev.cur())" > cmd.R; R -s < cmd.R (write a file of commands for R, then feed R with it). On Tuesday 11 January 2005 15:59, Cserháti Mátyás wrote: Dear all, Thanks to the three people who sent me answers to my question. Got the problem solved. Great! Now, another question of mine: I would like to run an R script from the Linux prompt. Is there any way to do this? The reason is that the calculation I'm doing takes a few hours, and I would like to automate it. Or does it mean that I have to run source within the R prompt? Or is there a way to do the automation within the R prompt? Thanks, Matthew u.i. Köszi, Zoli! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Tristan LEFEBURE Laboratoire d'écologie des hydrosystèmes fluviaux (UMR 5023) Université Lyon I - Campus de la Doua Bat. Darwin C 69622 Villeurbanne - France Phone: (33) (0)4 26 23 44 02 Fax: (33) (0)4 72 43 15 23 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] (no subject)
Hello, My name is Graham. I am an engineering student and my lecturer wishes us to get a numerical summary of some data. He said to use the command numerical.summary(Data), which didn't work; he suggested we try library(s20x) first, which came up with an error on my console. I have version 2.0.1 of R and I don't understand what to do. As this is part of an assignment I would really appreciate some advice. Regards - Graham __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] global objects not overwritten within function
I would suggest reading the posting guide (http://www.r-project.org/posting-guide.html) and giving a reproducible example, with the error message that you received. As is, I have no idea what you are doing here, and certainly cannot run this code. You use ... as an argument to your functions (why, I have no idea), but then use ... within your function seemingly to mean "code was omitted" rather than using your function argument. What ...obj... means I have no idea. One point though: if the f() function does not take any arguments, then why are you using the special R object ... as an argument? Furthermore, why not just do the assignment inside of fct() instead of calling another function that just runs code? Please reference the rest of the R Language Definition for information on correct usage of ..., especially the chapter on functions, and include a script that can be run from start to finish by anyone at an R prompt without having to decipher what the missing code does, or what the ellipsis is doing in your context. My guess as to what is going on here is that you are trying to use dynamic scoping, when R uses lexical scoping. If you are an S user, this will be a change. The f() function is stored in .GlobalEnv, so it is not aware of any objects stored in the fct() environment. When you run f(), it looks for an obj object in its environment, probably can't find one, and then looks for the obj object in the global environment. If it finds it there, it assigns it to itself, basically doing nothing. Once again, the R language guide will explain this. You could solve this by either embedding the f() function inside of fct(), passing in the obj object instead of relying on dynamic scoping (which R doesn't use), or, probably preferably, not having an f() function at all, as all it does is call another function. Also, I'd reference ?"<<-" for perhaps a cleaner way of doing global assignments. Using this alone may solve your problems, as it may force you to scope your code correctly. -Original Message- From: bogdan romocea [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 11, 2005 9:26 AM To: r-help@stat.math.ethz.ch Subject: [R] global objects not overwritten within function Dear useRs, I have a function that creates several global objects with assign("obj", obj, .GlobalEnv), and which I need to run iteratively in another function. The code is similar to f <- function(...) { assign("obj", obj, .GlobalEnv) } fct <- function(...) { for (i in 1:1000) { ... f(...) ...obj... rm(obj) #code fails without this line } } I don't understand why f(), when run in a for() loop inside fct(), does not overwrite the global object 'obj'. If I don't delete 'obj' after I use it, the code fails - the same objects created by the first iteration are used by subsequent iterations. I checked ?assign and the Evaluation chapter in 'R Language Definition' but still don't understand why the above happens. Can someone briefly explain or suggest something I should read? By the way, I don't want to use 'better' techniques (lists, functions that return values instead of creating global objects etc) - I want to create global objects with f() and overwrite them again and again within fct(). Thank you, b. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
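To make the scoping point concrete, here is a minimal, self-contained sketch (not the original poster's code; the object and function names are invented) showing a global object being overwritten on every iteration, once with assign() and once with the superassignment operator <<- referenced above:

f <- function(x) assign("obj", x, envir = .GlobalEnv)   # explicit assignment into the global environment

fct <- function(n = 3) {
  for (i in seq_len(n)) {
    f(i * 10)      # overwrites the global 'obj' on every iteration
    print(obj)     # lexical scoping finds it in .GlobalEnv
  }
}
fct()              # prints 10, 20, 30; afterwards obj is 30

g <- function(x) obj <<- x   # <<- walks up the enclosing environments to .GlobalEnv
g(99)
obj                          # now 99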
Re: [R] Graphical table in R
On Tue, 2005-01-11 at 14:59 +, Dan Bolser wrote: On 10 Jan 2005, Peter Dalgaard wrote: Dan Bolser [EMAIL PROTECTED] writes: Cheers. This is really me just being lazy (as usual). The latex function in Hmisc allows me to make a .ps file, then grab a screen shot of that ps and make a .png file. I would just like to use plot so I can wrap it in a png command and not have to use the 'screen shot' in between. A screen shot of a ps file? That sounds ... weird. If you can view it, presumably you have Ghostscript and that can do png files. The thing is the ps file has the wrong size, so I end up with a small table in the corner of a big white page (using the ImageMagick convert function). I haven't tried Ghostscript (don't know the cmd). I could set the paper size correctly if I knew the size of my table, but I don't know how to calculate that beforehand and feed it into the latex commands (Hmisc). Seems like I should roll my own table with the plot command and 'primitives' (like demo(mathplot)) - I just hoped that someone had already done the hard work for me and I could type something like... plot.xtable(x), x = any R object that makes sense to have a tabular output. Seems like such a function done correctly could be useful for helping people write up (hem) analysis. Thanks again for the help everyone. Dan. Dan, I think that taking Peter's/Thomas' solution provides a substantial level of flexibility in formatting. I wish that I had thought of that approach... :-) For example: plot(1:10, type = "n"); txt <- capture.output(ftable(UCBAdmissions)); par(family = "mono"); text(4, 8, paste(txt, collapse = "\n")); text(4, 6, paste(txt, collapse = "\n"), cex = 0.75); text(4, 4, paste(txt, collapse = "\n"), cex = 0.5). Using par(cex) in the call to text() and modifying the x,y coordinates will enable you to place the table anywhere within the plot region and also adjust the overall size of the table by modifying the font size. You can also use the 'adj' and 'pos' arguments in the call to text() to adjust the placement of the table, so rather than being centered on x,y (the default) it could be moved accordingly. See ?text for more information. Finally, you can even put a frame around the table by crudely using strwidth() and strheight(). Some additional hints on this would be available by reviewing the code for legend()... # Do this for the first table (assumes 'cex = 1'): # Get table width and add 10%: table.w <- max(strwidth(txt)) * 1.1 # Get table height (not including space between rows): table.h <- sum(strheight(txt)) rect(4 - (table.w / 2), 8 - (table.h), 4 + (table.w / 2), 8 + (table.h)) It would take some work to combine all of this into a single function, providing for additional flexibility in positioning, frame line types/color/width, adjusting for 'cex' and so on. It could be done though... This is, in effect, taking an entire R character object and plotting it. Does that help? Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RODBC package -- sqlQuery(channel,.....,nullstring=0) still gives NA's
R-help, I'm using the RODBC package to retrieve data from an ODBC database which contains NA's. By using the argument nullstring = "0" in sqlQuery() I expect to coerce them to numeric, but I still get NA's in my select. I'm running on Windows XP, R version 2.0.1 (platform i386-pc-mingw32, arch i386, os mingw32, release date 2004-11-15). Thank you __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Meeker's SPLIDA Reliability in R
Hi, Would anyone be aware of an R package implementing the functionality found in Meeker's SPLIDA software written for S-Plus? I don't know if anyone has tried to port the s/w to R directly, or if equivalent functions are available within another package. Thanks in Advance. - Colin Colin Cunningham D1C Ramp Statistician Intel Corporation 971.214.6623 [EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] thanks
On Tue, Jan 11, 2005 at 03:59:58PM +0100, Cserháti Mátyás wrote: I would like to run an R script from the Linux prompt. Is there any way to do this? The reason is that the calculation I'm doing takes a few hours, and I would like to automate it. Or does it mean that I have to run source within the R prompt? Or is there a way to do the automation within the R prompt? The standard way (well, my usual way, anyway) is to just use I/O redirection: linux> R --vanilla < stuff.r is, for the most part (see below), equivalent to starting R and typing source("stuff.r") at the R prompt. The --vanilla option is necessary to suppress any interactive questions concerning workspace saving (i.e. the "Save workspace image? [y/n/c]" thing); differences between the automated and the interactive form may be due to your script depending on some saved environment, or some stuff in your init files. I'd like to encourage you to automate your calculations, as this enhances not only convenience but also reproducibility of your results. Best regards, Jan -- +- Jan T. Kim ---+ | *NEW* email: [EMAIL PROTECTED] | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-= hierarchical systems are for files, not for humans =-* __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] global objects not overwritten within function
On Tue, 11 Jan 2005, bogdan romocea wrote: Dear useRs, I have a function that creates several global objects with assign("obj", obj, .GlobalEnv), and which I need to run iteratively in another function. The code is similar to f <- function(...) { assign("obj", obj, .GlobalEnv) } fct <- function(...) { for (i in 1:1000) { ... f(...) ...obj... rm(obj) #code fails without this line } } I don't understand why f(), when run in a for() loop inside fct(), does not overwrite the global object 'obj'. If I don't delete 'obj' after I use it, the code fails - the same objects created by the first iteration are used by subsequent iterations. I checked ?assign and the Evaluation chapter in 'R Language Definition' but still don't understand why the above happens. Can someone briefly explain or suggest something I should read? By the way, I don't want to use 'better' techniques (lists, functions that return values instead of creating global objects etc) - I want to create global objects with f() and overwrite them again and again within fct(). Since you are not using ... in the sense it is used in R, we have little idea of what your real code looks like and so what it does. Can you please give a small real example that fails. Here is one that works, yet has all the features I can deduce from your non-code: f <- function(x) assign("obj", x, pos = .GlobalEnv) fct <- function() { for (i in 1:2) { x <- i + 3; f(x); print(obj) } } fct() [1] 4 [1] 5 obj [1] 5 -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] help on integrate function
here is a function I wrote: cdfest = function(t, lambda, delta, x, y) { a1 = mean(x < t); a2 = mean(x < t - delta); a3 = mean(y1 < t); s = ((1 - lambda)*a1 + lambda*a2 - a3)^2; s } When I try to integrate over t, I get this message: integrate(cdfest, 0, 4, lambda = 0.3, delta = 1, x = x, y = y1) Error in integrate(cdfest, 0, 4, lambda = 0.3, delta = 1, x = x, y = y1) : evaluation of function gave a result of wrong length but the function is definitely in one dimension. What is wrong? Any suggestions are welcome. thanks __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] useR 2005 ?
Dear R-Help-List, are there any plans to organize a useR conference in 2005? Best, Roland + This mail has been sent through the MPI for Demographic Rese...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] thanks
Cserháti Mátyás wrote: Dear all, Thanks to the three people who sent me answers to my question. Got the problem solved. Great! Now, another question of mine: I would like to run an R script from the Linux prompt. Is there any way to do this? The reason is that the calculation I'm doing takes a few hours, and I would like to automate it. Or does it mean that I have to run source within the R prompt? Or is there a way to do the automation within the R prompt? Thanks, Matthew u.i. Köszi, Zoli! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html See a) the manual "An Introduction to R", Appendix B; b) inside R, type: ?BATCH; c) outside R, type: R CMD BATCH --help Uwe Ligges __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
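A minimal sketch of the batch route Uwe describes (the script name and its contents are made up for illustration): put the long-running commands in a file and run it non-interactively, so it can run unattended for hours.

## contents of analysis.R (hypothetical):
##   result <- replicate(100, mean(rnorm(1e5)))
##   save(result, file = "result.RData")

R CMD BATCH --no-save analysis.R analysis.Rout            # printed output goes to analysis.Rout
nohup R CMD BATCH --no-save analysis.R analysis.Rout &    # keep it running after you log out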
[R] getting variable names from formula
R-list, 1. Given a formula (f) with variables referencing some data set (dat), is there any easier/faster way than this to get the names (in character form) of the variables on the RHS of '~'? dat <- data.frame(x1 = x1 <- rnorm(100,0,1), x2 = x2 <- rnorm(100,0,1), y = x1 + x2 + rnorm(100,0,1)); f <- y ~ x1 + x2; mf <- model.frame(f, data = dat); mt <- attr(mf, "terms"); predvarnames <- attr(mt, "term.labels"); predvarnames [1] "x1" "x2" - 2. Also, is there an easy/fast way to do it without having the data set (dat) available? That is, not using 'model.frame', which requires 'data'? I understand that one approach for this is to use the way formulas are stored as 'list's. For example, this works: predvarnames <- character(); for (i in 2:length(f[[3]])) { predvarnames <- c(predvarnames, as.character(f[[3]][[i]])) }; predvarnames [1] "x1" "x2" but is there a better way? Thanks, Danny __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Graphical table in R
On Tue, 11 Jan 2005, Marc Schwartz wrote: [... code for plotting a captured ftable() with text(), strwidth() and strheight() snipped; see the earlier message ...] It would take some work to combine all of this into a single function, providing for additional flexibility in positioning, frame line types/color/width, adjusting for 'cex' and so on. It could be done though... This is, in effect, taking an entire R character object and plotting it. Does that help? Marc It certainly fits the bill. I will give it a go, but I may stick with the latex() functions in Hmisc. Thanks for all the help, it is a really elegant solution in the end :) Dan. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
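Along the lines of the "single function" Marc mentions, a rough sketch of what such a wrapper could look like (the function name and defaults are invented, not an existing R function):

plotTextTable <- function(txt, x = 5, y = 5, cex = 1, pad = 1.1, ...) {
    ## txt: a character vector, e.g. capture.output(ftable(UCBAdmissions))
    op <- par(family = "mono"); on.exit(par(op))
    text(x, y, paste(txt, collapse = "\n"), cex = cex, ...)
    w <- max(strwidth(txt, cex = cex)) * pad      # widest line, plus padding
    h <- sum(strheight(txt, cex = cex)) * pad     # stacked line heights, plus padding
    rect(x - w/2, y - h/2, x + w/2, y + h/2)      # frame around the table
    invisible(NULL)
}
## usage:
## plot(1:10, type = "n")
## plotTextTable(capture.output(ftable(UCBAdmissions)), x = 5, y = 6, cex = 0.8)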
[R] Standard error for the area under a smoothed ROC curve?
Hello, I am making some use of ROC curve analysis. I found much help on the mailing list, and I have used the Area Under the Curve (AUC) functions from the ROC package in the Bioconductor project... http://www.bioconductor.org/repository/release1.5/package/Source/ROC_1.0.13.tar.gz However, I read here... http://www.medcalc.be/manual/mpage06-13b.php "The 95% confidence interval for the area can be used to test the hypothesis that the theoretical area is 0.5. If the confidence interval does not include the 0.5 value, then there is evidence that the laboratory test does have an ability to distinguish between the two groups (Hanley & McNeil, 1982; Zweig & Campbell, 1993)." But beyond that, the above article is short on details. Can anyone tell me how to calculate the CI of the AUC calculation? I read this... http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf which talks about resampling (by showing R code), but I can't understand what is going on, or what is calculated (the example given is specific to microarray analysis, I think). I think a general AUC CI function would be a good addition to the ROC package. One more thing: in calculating the AUC I see the splines function is recommended over the approx function, here... http://tolstoy.newcastle.edu.au/R/help/04/10/6138.html How would I rewrite the following AUC functions (adapted from the Bioconductor source) to use splines (or approxfun or splinefun)? spe # Specificity [1] 0.02173913 0.13043478 0.21739130 0.32608696 0.43478261 0.54347826 [7] 0.65217391 0.76086957 0.89130435 1.00000000 1.00000000 1.00000000 [13] 1.00000000 sen # Sensitivity [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.9302326 0.8139535 [8] 0.6976744 0.5581395 0.4418605 0.3488372 0.2325581 0.1162791 trapezint(1-spe, sen) my.integrate(1-spe, sen) ## Functions ## Nicked (and modified) from the ROC package in Bioconductor. trapezint <- function (x, y, a = 0, b = 1) { if (x[1] > x[length(x)]) { x <- rev(x); y <- rev(y) }; y <- y[x >= a & x <= b]; x <- x[x >= a & x <= b]; if (length(unique(x)) < 2) return(NA); ya <- approx(x, y, a, ties = max, rule = 2)$y; yb <- approx(x, y, b, ties = max, rule = 2)$y; x <- c(a, x, b); y <- c(ya, y, yb); h <- diff(x); lx <- length(x); 0.5 * sum(h * (y[-1] + y[-lx])) } my.integrate <- function (x, y, t0 = 1) { f <- function(j) approx(x, y, j, rule = 2, ties = max)$y; integrate(f, 0, t0)$value } Thanks for any pointers, Dan. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] help on integrate function
On Tue, 11 Jan 2005, Francisca Xuan wrote: here is a function I wrote: cdfest = function(t, lambda, delta, x, y) { a1 = mean(x < t); a2 = mean(x < t - delta); a3 = mean(y1 < t); s = ((1 - lambda)*a1 + lambda*a2 - a3)^2; s } when I try to integrate over t, I got this message: integrate(cdfest, 0, 4, lambda = 0.3, delta = 1, x = x, y = y1) Error in integrate(cdfest, 0, 4, lambda = 0.3, delta = 1, x = x, y = y1) : evaluation of function gave a result of wrong length but the function is definitely in one dimension. what is wrong? Please read the help page: f: an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error. Your function does not do that: it returns a scalar for a vector input of length > 1, as the message clearly says. PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
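A common fix, sketched under the assumption that x and y1 are numeric data vectors already in the workspace, is to make the integrand accept a vector t and return a vector of the same length, e.g. with sapply():

## vectorised version of the integrand: one output element per element of t
cdfest <- function(t, lambda, delta, x, y) {
    sapply(t, function(tt) {
        a1 <- mean(x < tt)
        a2 <- mean(x < tt - delta)
        a3 <- mean(y < tt)
        ((1 - lambda) * a1 + lambda * a2 - a3)^2
    })
}
## integrate(cdfest, 0, 4, lambda = 0.3, delta = 1, x = x, y = y1)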
[R] Breslow Day Test
Breslow-Day test A statistical test for the homogeneity of odds ratios. Homogeneity In systematic reviews homogeneity refers to the degree to which the results of studies included in a review are similar. Clinical homogeneity means that, in studies included in a review, the participants, interventions and outcome measures are similar or comparable. Studies are considered statistically homogeneous if their results vary no more than might be expected by the play of chance. See heterogeneity. Odds ratio (OR) The ratio of the odds of an event in the experimental (intervention) group to the odds of an event in the control group. Odds are the ratio of the number of people in a group with an event to the number without an event. Thus, if a group of 100 people had an event rate of 0.20, 20 people had the event and 80 did not, and the odds would be 20/80 or 0.25. An odds ratio of one indicates no difference between comparison groups. For undesirable outcomes an OR that is less than one indicates that the intervention was effective in reducing the risk of that outcome. When the event rate is small, odds ratios are very similar to relative risks. http://www.cochrane.dk/cochrane/handbook/contents.htm Bye, Judit [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] getting variable names from formula
Daniel Almirall wrote: R-list, 1. Given a formula (f) with variables referencing some data set (dat), is there any easier/faster way to get the names (in character form) of the variables on the RHS of '~'? [... example using model.frame() and attr(mt, "term.labels") snipped; see the earlier message ...] 2. Also, is there an easy/fast way to do it without having the data set (dat) available? [...] but is there a better way? Thanks, Danny That's exactly what the all.vars function does. If you apply it to the formula you get all the names of variables referenced in the formula. If you only want the right hand side then apply it to the third component of the formula: f <- y ~ x1 + x2; all.vars(f) [1] "y" "x1" "x2" all.vars(f[[3]]) [1] "x1" "x2" __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] transcan() from Hmisc package for imputing data
Hello: I have been trying to impute missing values of a data frame which has both numerical and categorical values using the function transcan() with little luck. Would you be able to give me a simple example where a data frame is fed to transcan and it spits out a new data frame with the NA values filled up? Or is there any other function that i could use? Thank you avneet = I believe in equality for everyone, except reporters and photographers. ~Mahatma Gandhi __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] getting variable names from formula
maybe something like: f <- y ~ x1 + x2; attr(terms(f), "term.labels") but this won't work if you have a more complex formula (e.g., f <- y ~ x1*x2 + I(x1^2)) and you want only c("x1", "x2"). I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/16/336899 Fax: +32/16/337015 Web: http://www.med.kuleuven.ac.be/biostat http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm - Original Message - From: Daniel Almirall [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Tuesday, January 11, 2005 9:55 PM Subject: [R] getting variable names from formula [... original question snipped; see the earlier message ...] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] (no subject)
Hi, On Wed, 12 Jan 2005, Bigby wrote: Hello, numerical.summary(Data), which didn't work; he suggested we try library(s20x) first, which came up with an error on my console. I have version 2.0.1 of R s20x is a package written by the Department of Statistics at the University of Auckland. It is used for their STATS 201/208 courses. It is not on CRAN, so you may need to contact them for it. But you can get most of it using other commands; from memory it simply combines several other R functions, such as summary(), quantile(), etc. HTH, Kevin Ko-Kang Kevin Wang PhD Student Centre for Mathematics and its Applications Building 27, Room 1004 Mathematical Sciences Institute (MSI) Australian National University Canberra, ACT 0200 Australia Homepage: http://wwwmaths.anu.edu.au/~wangk/ Ph (W): +61-2-6125-2431 Ph (H): +61-2-6125-7407 Ph (M): +61-40-451-8301 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
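In the meantime, a rough stand-in built only from base R functions (this is just a sketch of the kind of summary such a course function typically prints, not the s20x code):

numSummary <- function(x) {
    x <- x[!is.na(x)]                                  # drop missing values
    c(n = length(x), mean = mean(x), sd = sd(x),
      quantile(x, c(0, 0.25, 0.5, 0.75, 1)))           # five-number summary
}
## e.g. numSummary(iris$Sepal.Length)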
Re: [R] Destructor for S4 objects?
Hi Robert, It looks like there is no way to explicitly make an S4 object call a function when it is garbage collected unless you resort to tricks with reg.finalizer. It turns out that Prof. Ripley's reply (thanks!!) had enough hints in it that I was able to get the effect I wanted by using R's external pointer facility. In fact it works quite nicely. In a nutshell, I create a C++ object (with new) and then wrap its pointer with an R external pointer using SEXP rExtPtr = R_MakeExternalPtr( cPtr, aTag, R_NilValue); where cPtr is the C++/C pointer to the object and aTag is an R symbol describing the pointer type [e.g. SEXP aTag = install("this_is_a_tag_for_a_pointer_to_my_object")]. The final argument is a value to protect. I don't know what this means, but all of the examples I saw use R_NilValue. If you want a C++ function to be called when R loses the reference to the external pointer (actually when R garbage collects it, or when R quits), do R_RegisterCFinalizerEx( rExtPtr, (R_CFinalizer_t)functionToBeCalled, TRUE ); The TRUE means that R will call functionToBeCalled if the pointer is still around when R quits. I guess if you set it to FALSE, then you are assuming that your shell can delete memory and/or release resources when R quits. So return this external pointer to R (the function that new'ed it was called by .Call or something similar) and stick it in a slot of your object. Then when your object is garbage collected, functionToBeCalled will be called. The slot would have the type externalptr. functionToBeCalled contains the code to delete the C++ pointer or release resources, for example... SEXP functionToBeCalled( SEXP rExtPtr ) { // Get the C++ pointer MyThing* ptr = (MyThing*) R_ExternalPtrAddr(rExtPtr); // Delete it delete ptr; // Clear the external pointer R_ClearExternalPtr(rExtPtr); return R_NilValue; } And there you have it. There doesn't seem to be any official documentation on this stuff (at least none that I could find). The best references I found are on the R developers web page. See the links within some notes on _references, external objects, or mutable state_ for R and a _simple implementation_ of external references and finalization. Note that the documents are slightly out of date (the function names have apparently been changed somewhat). The latter one has some examples that are very helpful. And as Prof. Ripley pointed out, RODBC uses this facility too, so look at that code. Hope this was useful. Good luck. --- Adam Adam Lyon (lyon-at-fnal.gov) Fermi National Accelerator Laboratory Computing Division / D0 Experiment __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
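For the pure-R route mentioned at the top (no compiled code at all), the reg.finalizer "trick" can look roughly like this — a sketch with invented class and slot names, where the finalizer is attached to an environment held in a slot of the S4 object:

library(methods)

setClass("Handle", representation(env = "environment"))

newHandle <- function() {
    e <- new.env()
    reg.finalizer(e, function(env) {
        cat("Handle finalized: release the resource here\n")   # clean-up hook
    })
    new("Handle", env = e)
}

h <- newHandle()
rm(h)
invisible(gc())   # the finalizer fires once the environment is garbage collected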
Re: [R] RODBC package -- sqlQuery(channel,.....,nullstring=0) stillgives NA's
PLEASE do read the help page, which says nullstring: character string to be used when reading 'SQL_NULL_DATA' character items from the database. ^^^ so this does not apply to numeric items. You can of course easily change numeric NAs to 0s, if you want to. On Wed, 12 Jan 2005, Luis Rideau Cruz wrote: There is something strange in R behaviour (perhaps). Your negative remarks are not appreciated. I have run the same select in Oracle SQL*Plus (version 10.1.0.2.0) and the output comes out with NULLs (which is what it ought to be). But in R I still get the same result with NAs (no matter whether I use na.strings or nullstring arguments). An output example follows below, using na.strings=0 and nullstring=0 (sorry about the indents): Length 2003 2002 2001 2000 1999 1998 1997 1996 1995 1 32 NA1 NA NA NA NA NA2 NA 2 343 NA NA NA NA NA NA6 NA 3 35 NA NA NA NA2 NA NA NA NA 4 36 NA 12 NA NA 10 NA NA1 NA 5 3733 NA NA4 NA NA 31 NA 6 382411 126 NA 11 NA 7 394 1355 348 NA 58 13 Length 2003 2002 2001 2000 1999 1998 1997 1996 1995 32 1 2 34 3 6 35 2 3612 10 1 37 3 34 31 38 2 4 1 1 12 611 39 4 13 5 5 34 858 13 Best, Luis Prof Brian Ripley [EMAIL PROTECTED] 12/01/2005 09:14:22 On Tue, 11 Jan 2005, Luis Rideau Cruz wrote: R-help, I'm using the RODBC package to retrieve data from an ODBC database which contains NA's. By using the argument nullstring = 0 in sqlQuery() I expect to coerce them to numeric but still get NA's in my select. You need to read the help page (as the posting guide asks): it says na.strings: character string(s) to be mapped to 'NA' when reading character data. which is the opposite of what you are saying you want to do. An ODBC database cannot contain NA's. It may contain NULLs, and it may contain "NA", so we have no idea what you mean. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
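For completeness, the post-hoc conversion Prof. Ripley mentions is a one-liner once the result is in a data frame (the query, table and column names below are hypothetical; 'channel' is assumed to be an open RODBC connection from odbcConnect()):

library(RODBC)
res <- sqlQuery(channel, "select * from mytable")   # NULLs arrive as NA in numeric columns
res[is.na(res)] <- 0                                # replace every NA with 0
## or for a single column only: res$n2003[is.na(res$n2003)] <- 0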
Re: [R] RODBC package -- sqlQuery(channel,.....,nullstring=0)stillgives NA's
(1) I do read the posting guide (the fact that I misread or misunderstood something does not imply not reading). (2) I could change NAs to 0 (I know), but I have previously (with older versions of R and SQL*Plus) used the same select with the right output (namely with 0s). (3) AFAIK "strange" is not a negative remark, and it does not seem so to me at the very least, but that is always a matter of taste. (4) Thank you for your replies, but the door is still open so as to know a solution to the select without coercing NAs to 0s after retrieving the data. Best, Luis Prof Brian Ripley [EMAIL PROTECTED] 12/01/2005 11:21:33 PLEASE do read the help page, which says nullstring: character string to be used when reading 'SQL_NULL_DATA' character items from the database. so this does not apply to numeric items. You can of course easily change numeric NAs to 0s, if you want to. [... rest of the quoted exchange snipped; see the previous messages ...] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Breslow Day Test
On Tue, 11 Jan 2005 10:45:48 -0500 Palos, Judit [EMAIL PROTECTED] wrote: Breslow-Day test A statistical test for the homogeneity of odds ratios. [..some definitions..] Your message was not particularly clear, but if you were looking for R code to do a Breslow-Day test, Google found this for you: http://www.math.montana.edu/~jimrc/classes/stat524/Rcode/breslowday.test.r HTH, Tobias PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Standard error for the area under a smoothed ROC curve?
Dan Bolser wrote: Hello, I am making some use of ROC curve analysis. I have used the Area Under the Curve (AUC) functions from the ROC package in the Bioconductor project. Can anyone tell me how to calculate the CI of the AUC calculation? [... rest of the question, including the trapezint() and my.integrate() code, snipped; see the earlier message ...] Thanks for any pointers, Dan. I don't see why the above formulas are being used. The Bamber-Hanley-McNeil-Wilcoxon-Mann-Whitney nonparametric method works great. Just get the U statistic (concordance probability) used in Wilcoxon. As Somers' Dxy rank correlation coefficient is 2*(C - 0.5), where C is the concordance or ROC area, the Hmisc package function rcorr.cens uses U-statistic methods to get the standard error of Dxy. You can easily translate this to a standard error of C. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
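A sketch of that translation (the marker and outcome vectors below are simulated placeholders): since C = Dxy/2 + 0.5, the standard error of C is half the reported standard error of Dxy.

library(Hmisc)
set.seed(1)
marker  <- rnorm(100)                       # hypothetical test values
disease <- rbinom(100, 1, plogis(marker))   # hypothetical 0/1 group labels

rc  <- rcorr.cens(marker, disease)          # U-statistic based concordance
auc <- rc["C Index"]
se  <- rc["S.D."] / 2                       # "S.D." is the SE of Dxy; halve it for C
auc + c(-1, 1) * qnorm(0.975) * se          # approximate 95% CI for the AUC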
Re: [R] transcan() from Hmisc package for imputing data
avneet singh wrote: Hello: I have been trying to impute missing values of a data frame which has both numerical and categorical values using the function transcan(), with little luck. Would you be able to give me a simple example where a data frame is fed to transcan and it spits out a new data frame with the NA values filled in? Or is there any other function that I could use? Thank you avneet It's in the help file for transcan. But multiple imputation is much better, and transcan does not do multiple imputation as well as the newer Hmisc function aregImpute. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
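A minimal sketch of the aregImpute route (the data frame and variable names are invented; see ?aregImpute and ?fit.mult.impute in Hmisc for the real details):

library(Hmisc)
set.seed(2)
d <- data.frame(age  = rnorm(200, 50, 10),
                sex  = factor(sample(c("m", "f"), 200, replace = TRUE)),
                chol = rnorm(200, 5, 1))
d$chol[sample(200, 30)] <- NA                 # introduce some missing values

imp <- aregImpute(~ age + sex + chol, data = d, n.impute = 5)

## one completed data set, using the first set of imputations:
completed <- d
completed$chol[is.na(d$chol)] <- imp$imputed$chol[, 1]

## or carry the imputation uncertainty into a model fit:
fit <- fit.mult.impute(chol ~ age + sex, lm, imp, data = d)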
[R] [survey] R for Reporting - the R Output MAnager (ROMA) project
Hi R UseRs, I am interested in providing reporting abilities to R and have initiated a project called R Output MAnager (ROMA). My starting point was my R2HTML package, which provides (rough) HTML exportation. I began by trying to mimic it for LaTeX but quickly realized that it was a bad idea. Thus, I started a new package again from scratch and spent a lot of time reading about this topic, looking at what other software does (SAS ODS, SPlus SPXML, ...) and studying technologies and formats: XML+XSLT, LyX, DocBook, RTF, ... What follows is a description of my plans. This email is targeted at interested useRs, in order to get feedback on it. It comes with a little survey at the end that will be useful to me to target my package. If you are also interested in reporting (output, formats, exchange, ...), please read the following and answer the survey. If not, you can skip this message - apologies for sending it to R-help, I hope you don't mind. --- As a matter of fact, I have implemented something that shows promise (according to me). Currently, from the following output description: *** data(iris); mm = as.matrix(iris[1:5,1:4]); out = emptyContent(); out = out + Section("A title here"); out = out + diag(2); out = out + Comment("comment: yes!"); out = out + list(un=1, pi); out = out + "Then a boolean: " + TRUE; out = out + Section("Default matrix", level=2); out = out + mm; out = out + Section("Custom matrix" + Footnote("It works!"), level=2); out = out + ROMA(mm, style="custommatrix", rowstyle=paste("color", row(mm)[,1]%%2, sep=""), align="left"); out = out + Section("An other title"); out = out + ROMAgenerated() # ROMAgenerated is a predefined function *** You can generate a proper HTML file with the following command: Export(out) (see result: http://www.stat.ucl.ac.be/ROMA/sample.htm) The same output object could be exported to (tex+dvi+ps+pdf) with: Export(out, driver="latex") (see result: http://www.stat.ucl.ac.be/ROMA/sample.pdf / change the extension for the other formats: tex and ps) --- Survey --- IMPORTANT: ONLY REPLY TO ME, NOT TO THE R-HELP MAILING LIST. Simply fill in the questions you want to answer: 1. I am interested in reporting abilities for R [ ] Definitely [ ] Rather yes [ ] Rather no [ ] Not at all 2. I have some knowledge about these different formats / specifications [ ] rtf [ ] LaTeX [ ] LyX [ ] html [ ] css [ ] xHTML [ ] XML [ ] XSLT [ ] DocBook 3. I have some knowledge about these tools [ ] SAS ODS [ ] SPlus SPXML library [ ] XSLT + XSL-FO chain 4. I would be especially interested in the following formats (multiple choices possible) [ ] rtf [ ] tex [ ] lyx [ ] XML, with a DTD specific to R [ ] XML, with the DTD from SPlus (compatible with the SPXML library) [ ] XML, DocBook flavor [ ] HTML + css (good xHTML) [ ] Word (doc) [ ] OpenOffice (oo) [ ] Plain text [ ] Other: 4bis: If several formats, the best (according to me and my needs) one would be: 5. The approach is to fully separate content from formatting, so XML would be an ideal output format. Nevertheless, few people who use R will also master XSLT to produce nicely formatted output. Thus, a way to handle styles (bold, colors, fonts, etc.) from R would also be great. It may not be a priority. Statistical output does have some specific issues: mathematics, complicated tables, graphs, and so on. For each of the following items, please tell me how important the issue is for you: 0: I don't need that (and think I will never need it) 1: Not really important ... 5: Crucial - I can't live without that point any more 5.1 - Being able to read the document in any OS. Importance: __ 5.2 - Having an object that describes the output within R (as in the example), so that I could add/remove things and re-export it. Importance: __ 5.3 - Being able to define basic formatting also within R (bold, colors, fonts, and so on). Importance: __ 5.4 - Being able to include mathematics, as (La)TeX code or MathML. Importance: __ 5.5 - Being able to build complicated tables, with merged cells, embedded lists, and eventually sub-tables. Importance: __ 6. Here are some conceptual objects that a report may contain. Are there any more you can think of which may be important? Tables (containing Rows and Cells), Lists, Titles, Footnotes, Comments, Abbreviations / Acronyms, Code, Links, Graphs, Layout (to have 2 or 3 columns), Mathematics (equations), Table of Contents, Index. Other that could be added: 7. Two different tools allow one to create dynamic or similar documents: Sweave (for LaTeX and HTML) and Rpad (HTML, with a server). I would be interested in being able to describe the structure of a document that would be exportable to: 7.1 - Sweave [ ] Yes [ ] No 7.2 - Rpad [ ] Yes [ ] No If you are interested in contributing to the project, please let me know also.
Re: [R] thanks
On Tue, Jan 11, 2005 at 04:24:11PM +0100, Lefebure Tristan wrote: example from a shell: echo -e "pdf(file=\"test.pdf\")\nplot(1:10,11:20)\ndev.off(dev.cur())" > cmd.R; R -s < cmd.R (write a file of commands for R, then feed R with it) This may be on the verge of becoming off-topic, but let me remark that the technique proposed here should be used for illustrative purposes only. For real life, use pipes: echo 'print(mean(rnorm(10)));' | R --vanilla This is equivalent to echo 'print(mean(rnorm(10)));' > cmd.R; R --vanilla < cmd.R *as long as only one shell is executing this sequence at any given time*. The reason I mention this here is that I've seen it happen a few times that this temporary command file approach has made it from examples into shell scripts of which then, later on, multiple instances were run at a time, resulting in very rare, very irreproducible, and most inexplicable erroneous results. Best regards, Jan -- +- Jan T. Kim ---+ | *NEW* email: [EMAIL PROTECTED] | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-= hierarchical systems are for files, not for humans =-* __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] CUSUM SQUARED structural breaks approach?
On Tue, 11 Jan 2005 19:33:41 + Rick Ram wrote: Groundwork for the choice of break method in my specific application has already been done - otherwise I would need to reinvent the wheel (make a horribly detailed comparison of the performance of break approaches in the context of modelling post break). If it interests you, Pesaran & Timmermann (2002) compared CUSUM squared, Bai & Perron and a time-varying approach to detect singular previous breaks in reverse ordered financial time series so as to update a forecasting model. Yes, I know that paper. And if I recall correctly they are mainly interested in modelling the time period after the last break. For this, the reverse ordered recursive CUSUM approach works because they essentially look back in time to see when their predictions break down. And for their application looking for variance changes also makes sense. The approach is surely valid and sound in this context... but it might be possible to do something better (though I would have to look much closer at the particular application to have an idea what might be a way to go). This works fine, i.e. the plot looks correct. The problem is how to appropriately normalise these to rescale them to what the CUSUM squared procedure expects (this looks to be a different and more complicated procedure than the normalisation used for the basic CUSUM). I am from an IT background and am slightly illiterate in terms of math notation... guidance from anyone would be appreciated. I just had a brief glance at BDE75, page 154, Section 2.4. If I haven't missed anything important on reading it very quickly, you just need to do something like the following (a reproducible example, based on data from strucchange, using a notation similar to BDE's): ## load GermanM1 data and model library(strucchange) data(GermanM1) M1.model <- dm ~ dy2 + dR + dR1 + dp + ecm.res + season ## compute squared recursive residuals w2 <- recresid(M1.model, data = GermanM1)^2 ## compute CUSUM of squares process sr <- ts(cumsum(c(0, w2))/sum(w2), end = end(GermanM1$dm), freq = 12) ## the border (r-k)/(T-k) border <- ts(seq(0, 1, length = length(sr)), start = start(sr), freq = 12) ## nice plot plot(sr, xaxs = "i", yaxs = "i", main = "CUSUM of Squares") lines(border, col = grey(0.5)) lines(0.4 + border, col = grey(0.5)) lines(-0.4 + border, col = grey(0.5)) Instead of 0.4 you would have to use the appropriate critical values from Durbin (1969) if my reading of the paper is correct. hth, Z Does anyone know if this represents some commonly performed type of normalisation that exists in another function? I will hunt out the 1969 paper for the critical values, but prior to doing this I am a bit confused as to how they will be implemented/interpreted... the CUSUM squared plot does/should run diagonally up from left to right and there are two straight lines that one would put around this from the critical values. Hence, a different interpretation/implementation of confidence levels than in other contexts. I realise this is not just an R thing but a problem with my theoretical background. Thanks for the detailed reply! Rick. But depending on the model and hypothesis you want to test, another technique than CUSUM of squares might be more appropriate and also available in strucchange. hth, Z Any help or pointers about where to look would be more than appreciated! Hopefully I have just missed something obvious in the package... Many thanks, Rick R. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] [survey] R for Reporting - the R Output MAnager (ROMA) project
Your example is sequential, ignoring the tree-like structure of most documents. Why not go via a DOM or similar XML-ish structure? While I'd never advocate general-purpose XML as a user format, as you note, that is what XSLT is for, and using XML as an electronic internal document representation would provide a potentially more scalable system (i.e. use XML and the DOM internally, but provide a simple API to it). The other advantage would be that you could attach a dependency DAG via a second set of marked edges of the document graph/tree to allow for selective regeneration of results. But then, this project isn't on my to-do list this year :-). best, -tony On Wed, 12 Jan 2005 14:16:57 +0100, Eric Lecoutre [EMAIL PROTECTED] wrote: Hi R UseRs, I am interested in providing reporting abilities to R and have initiated a project called R Output MAnager (ROMA). [... rest of the proposal and survey snipped; see the previous message ...]
Re: [R] useR 2005 ?
On Tue, 2005-01-11 at 17:39 +0100, Rau, Roland wrote: Dear R-Help-List, are there any plans to organize a useR conference in 2005? Best, Roland As I understand it, no. The next one will be in 2006, so it will be every other year, interleaved with the DSC meeting the odd years. Information on past DSC meetings is here: http://www.ci.tuwien.ac.at/Conferences/DSC.html I have not seen anything posted yet for DSC 2005, unless I missed it someplace. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] (no subject)
hi, I am trying to grow a classification tree on some data, but I have a little problem. In order to do so I have to use a function like tree in R and on the internet help(tree) I get the following: The left-hand-side (response) should be either a numerical vector when a regression tree will be fitted or a factor, when a classification tree is produced I would like to know what is a factor in R, is it numerical value with no formula or just a word?? Thanks in advance Nicolas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] What is a factor? [was :(no subject)]
On Wed, 2005-01-12 at 15:17 +0100, [EMAIL PROTECTED] wrote: hi, I am trying to grow a classification tree on some data, but I have a little problem. In order to do so I have to use a function like tree in R and on the internet help(tree) I get the following: The left-hand-side (response) should be either a numerical vector when a regression tree will be fitted or a factor, when a classification tree is produced I would like to know what is a factor in R, is it numerical value with no formula or just a word?? Thanks in advance Nicolas See ?factor and/or Chapter 4 Ordered and Unordered Factors in An Introduction to R. Also, you might want to look into the 'rpart' package for an alternative to 'tree'. rpart is included in the base R distribution: library(rpart) ?rpart HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] (no subject)
Dear help desk and R community, I have a problem with how R 2.0 handles RAM, maybe a bug. I had used R 1.9 since January 2004 with large data sets on a Mac OS X G5 with 1 GB of RAM without problems. Then I moved to 2.0 and found myself short of RAM and using virtual memory. I tried running the program from a terminal window to avoid the GUI, but it was the same. The annoying part is that even if I delete big objects from the workspace, the RAM consumption does not decrease (looking at the percentage of usage in ps or the actual value in Activity Monitor). Only after a long time (half an hour) does the RAM consumption decrease somewhat. When I use the workspace browser in the GUI and press refresh, the RAM consumption fluctuates each time, both decreasing and increasing, even though the workspace does not change. For example I can go from using 270 MB of memory to 430 MB (or the reverse) simply by pressing refresh several times. The value is stable once I stop pressing refresh. thanks saverio __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
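One small check worth running in a case like this (a sketch, not a fix for the GUI behaviour): after removing a large object, call the garbage collector explicitly and look at its report; freed memory is not necessarily returned to the operating system right away, so gc() output gives a more reliable picture of R's own usage than ps or Activity Monitor. The matrix below is just a stand-in for a large object.

x <- matrix(rnorm(1e6), ncol = 10)   # roughly 8 Mb
print(object.size(x))
rm(x)
gc()                                 # run the collector and report R's memory usage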
Re: [R] Breslow Day Test
On Wed, 12 Jan 2005, Tobias Verbeke wrote: On Tue, 11 Jan 2005 10:45:48 -0500 Palos, Judit [EMAIL PROTECTED] wrote: Breslow-Day test A statistical test for the homogeneity of odds ratios. [..some definitions..] Your message was not particularly clear, but if you were looking for R code to do a Breslow-Day test, Google found this for you: There is code for meta-analyses, including a test of homogeneity that I think is the same as the Breslow-Day one, in the rmeta package. The package does forest plots, too. -thomas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] (no subject)
I think you will find all the documentation in the help files. ?factor gives: The function 'factor' is used to encode a vector as a factor (the terms 'category' and 'enumerated type' are also used for factors). If 'ordered' is 'TRUE', the factor levels are assumed to be ordered. For compatibility with S there is also a function 'ordered'. 'is.factor', 'is.ordered', 'as.factor' and 'as.ordered' are the membership and coercion functions for these classes. Usage: factor(x, levels = sort(unique.default(x), na.last = TRUE), labels = levels, exclude = NA, ordered = is.ordered(x)) ordered(x, ...) etc... It is a categorical variable, whose levels (values) are strings. To get help: type ?functionname or, if you are under Windows, see the menu Help\Html help and look under Packages. What you will want first are the Base and Statistics packages. Anne - Original Message - From: [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Sent: Wednesday, January 12, 2005 3:17 PM Subject: [R] (no subject) hi, I am trying to grow a classification tree on some data, but I have a little problem. In order to do so I have to use a function like tree in R and on the internet help(tree) I get the following: The left-hand-side (response) should be either a numerical vector when a regression tree will be fitted or a factor, when a classification tree is produced I would like to know what is a factor in R, is it numerical value with no formula or just a word?? Thanks in advance Nicolas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
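A minimal illustration of the point (with made-up data): a factor is a categorical variable, and using one as the response is what makes rpart() grow a classification rather than a regression tree.

library(rpart)
y <- factor(c("yes", "no", "yes", "no", "yes", "no"))   # categorical, not numeric
x <- c(1.2, 3.4, 0.8, 2.9, 1.1, 3.0)
is.factor(y)    # TRUE
levels(y)       # "no" "yes"
fit <- rpart(y ~ x, method = "class", control = rpart.control(minsplit = 2))
fit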
[R] changing langage
Hi all, I've got a small, practical question, which until now I couldn't solve (otherwise I wouldn't mail it, right?). First of all, I'm talking about R 2.0.1 on a WinXP system (using the default graphical interface, 'Rgui'). When I make plots using dates on the x-axis, it puts the labels in Dutch, which is nice (since it's my mother tongue) unless I want them in English... Is there a way to change this behaviour? (Can I change the labels etc. to English?) tnx, Kurt Sys __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] defining lower part of distribution with covariate
I try again - perhaps it is analysis of covariance with treatment (thio, ultiva) as two categories and antime as covariate. On the basis of such a model, is the probability of GCS <= 12 then larger with thio treatment? Dear friends, forgive me a simple question, possibly related to quantreg, but I failed to get it done and hope for basic instruction. I have two sets of observed Glasgow coma scores at admission to ICU after operation, and accompanying time of anesthesia (in hours). Thio is cheap and perhaps old fashioned, and ultiva expensive and rapidly terminated. The problem is to estimate the probability of GCS 12 or lower on the two treatments after taking time of anesthesia into account (antime), which is longer for thio. How would I do that in the best way? Best wishes Troels Ring, MD Aalborg, Denmark

thio
      GCS antime
 [1,]  14    4.5
 [2,]  15    7.5
 [3,]  11    7.5
 [4,]  15    4.5
 [5,]  14    4.5
 [6,]  15    3.5
 [7,]  15    5.5
 [8,]  14    5.5
 [9,]  15    3.5
[10,]  14    8.5
[11,]  13    4.5
[12,]  12    5.5
[13,]  15    3.5
[14,]  13    6.5
[15,]   9    8.5
[16,]  15    6.5

ultiva
      GCS antime
 [1,]  15    4.5
 [2,]  15    4.5
 [3,]  15    2.5
 [4,]  15    3.5
 [5,]  15    3.5
 [6,]  12    5.5
 [7,]  15    4.5
 [8,]  15    3.5
 [9,]  15    8.5
[10,]  13    4.5
[11,]  14    3.5
[12,]  14    4.5
[13,]  15    4.5
[14,]  14    2.5
[15,]  15    4.5
[16,]  15    3.5
[17,]  15    3.5
[18,]  14    4.5
[19,]  14    4.5
[20,]  15    4.5

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Kolmogorov-Smirnof test for lognormal distribution with estimated parameters
Hi Kwabena, I once did a simulation, generating normally distributed values (500 values) and calculating a KS test with estimated parameters. Repeating this test 10'000 times I got about 1 significant test (at a level of alpha=0.05 I would expect about 500 significant tests by chance). So I think that if you estimate the parameters from the data, the fit is too good and the distribution used for the test statistic is not adequate, as is indicated in the help page you cited. There (in the help page) is some literature, but it is not easy stuff to read. Furthermore I know of no implementation of a KS test which accounts for this estimation of the parameters. I recommend a graphical tool instead of a test: x <- rlnorm(100); qqnorm(log(x)). See also ?qqnorm and ?qqplot. If you insist on testing a theoretical distribution, be aware that a non-significant test does not mean that your data have the tested distribution (especially if you have few data, there is no power in the test to detect deviations from the theoretical distribution, and the conclusion that the data fit well is a trap). If there are enough data I'd prefer a chi-square test to the KS test (but even there I use graphical tools instead). See ?chisq.test. For this test you have to specify classes, and this is subjective (you can't avoid it). You can reduce the DF of the expected chi-square distribution (under H_0) by the number of parameters estimated from the data and will get better results: DF = number of classes - 1 - estimated parameters. I think this test is more powerful than the KS test, particularly if you must estimate the parameters from data. Regards, Christoph -- Christoph Buser [EMAIL PROTECTED] Seminar fuer Statistik, LEO C11 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-5414 fax: 632-1228 http://stat.ethz.ch/~buser/ Kwabena Adusei-Poku writes: Hello all, Would somebody be kind enough to show me how to do a KS test in R for a lognormal distribution with ESTIMATED parameters. The R function ks.test() says the parameters specified must be prespecified and not estimated from the data. Is there a way to correct this when one uses estimated parameters? Regards, Kwabena. Kwabena Adusei-Poku University of Goettingen Institute of Statistics and Econometrics Platz der Goettingen Sieben 5 37073 Goettingen Germany Tel: +49-(0)551-394794 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
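A sketch of the simulation Christoph describes (exact counts will vary with the seed): feed ks.test() parameters estimated from the same data and the rejection rate at a nominal 5% level comes out far lower, i.e. the test is much too conservative when the parameters are not prespecified.

set.seed(1)
nsim <- 1000
nrej <- 0
for (i in 1:nsim) {
  x <- rnorm(500)
  p <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
  if (p < 0.05) nrej <- nrej + 1
}
nrej / nsim   # typically well below the nominal 0.05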
[R] model.response error
When I installed R 2.0.1 (replacing 1.9.0) for Windows, code using model.response began acting up. Here are the first several lines of code I had been tweaking for a spatial model (the code is mostly that of Roger Bivand -- I am adapting it to a slightly different data structure and the problem I'm sure is with my changes, not his code).

command.name <- function (formula, data = list(), weights, na.action = na.fail,
    type = "lag", quiet = TRUE, zero.policy = FALSE, tol.solve = 1e-07,
    tol.opt = .Machine$double.eps^0.5, sparsedebug = FALSE)
{
    mt <- terms(formula, data = data)
    mf <- lm(formula, data, na.action = na.action, method = "model.frame")
    na.act <- attr(mf, "na.action")
    if (!is.matrix.csr(weights))
        cat("\nWarning: weights matrix not in sparse form\n")
    switch(type,
        lag = if (!quiet) cat("\nSpatial lag model\n"),
        mixed = if (!quiet) cat("\nSpatial mixed autoregressive model\n"),
        stop("\nUnknown model type\n"))
    if (!quiet) cat("Jacobian calculated using weights matrix eigenvalues\n")
    y <- model.response(mf, "numeric")
    if (any(is.na(y))) stop("NAs in dependent variable")
    x <- model.matrix(mt, mf)
    if (any(is.na(x))) stop("NAs in independent variable")
    if (nrow(x) != nrow(weights)) stop("Input data and weights have different dimensions")
    n <- nrow(x)
    m <- ncol(x)

When it reads the Y variable in the command: y <- model.response(mf, "numeric") the error it gives is: Error in model.response(mf, "numeric") : No direct or inherited method for function "model.response" for this call. The problem is puzzling me because it is not something I encountered when I was running the same code in 1.9.0, but it is causing problems in 2.0.1. Thanks, and any comments on debugging the error are welcome. Jim Well I AM missing the back of my head.you COULD cut me a little slack! -Homer Simpson __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Kolmogorov-Smirnof test for lognormal distribution with estimated parameters
Christoph Buser wrote: Hi Kwabena I did once a simulation, generating normal distributed values (500 values) and calculating a KS test with estimated parameters. For 1 times repeating this test I got about 1 significant tests (on a level alpha=0.05 I'm expecting about 500 significant tests by chance) So I think if you estiamte the parameters from the data, you fit to good and the used distribution of the test statistic is not adequate as it is indicated in the help page you cited. There (in the help page) is some literature, but it is no easy stuff to read. Furthermore I know no implementation of an KS test which accounts for this estimation of the parameter. I recommend a graphical tool instead of a test: x - rlnorm(100) qqnorm(log(x)) See also ?qqnorm and ?qqplot. If you insist on testing a theoretical distribution be aware that a non significant test does not mean that your data has the tested distribution (especially if you have few data, there is no power in the test to detect deviations from the theoretical distribution and the conclusion that the data fits well is trappy) If there are enough data I'd prefer a chi square test to the KS test (but even there I use graphical tools instead). See ?chisq For this test you have to specify classes and this is subjective (you can't avoid this). You can reduce the DF of the expected chi square distribution (under H_0) by the number of estimated parameters from the data and will get better results. DF = number of classes - 1 - estimated parameters I think this test is more powerful than the KS test, particularly if you must estimate the parameters from data. Regards, Christoph It is also a good idea to ask why one compares against a known distribution form. If you use the empirical CDF to select a parametric distribution, the final estimate of the distribution will inherit the variance of the ECDF. The main reason statisticians think that parametric curve fits are far more efficient than nonparametric ones is that they don't account for model uncertainty in their final confidence intervals. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] changing langage
Kurt Sys [EMAIL PROTECTED] writes: Hi all, I've got a small, practical question, which untill now I couldn't solve (otherwhise I wouldn't mail it, right?) First of all, I'm talking about R 2.0.1 on a winxp system (using the default graphical interface being 'Rgui'). When I make plots, using dates on the x-axis, it puts the labels in Dutch, which is nice (since it's my mother tongue) unless I want them in English... Is there a way to change this behaviour? (Can I change the labels etc to English?) This type of stuff works on Linux at least:

Sys.setlocale("LC_ALL", "da_DK")   # or "en_GB", or ...
plot(date, ...)

-- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Kolmogorov-Smirnof test for lognormal distribution with estimated parameters
For the KS-test of normality with estimated parameters see ?lillie.test in package nortest. Best, Christian On Wed, 12 Jan 2005, Christoph Buser wrote: Hi Kwabena I did once a simulation, generating normal distributed values (500 values) and calculating a KS test with estimated parameters. For 1 times repeating this test I got about 1 significant tests (on a level alpha=0.05 I'm expecting about 500 significant tests by chance) So I think if you estiamte the parameters from the data, you fit to good and the used distribution of the test statistic is not adequate as it is indicated in the help page you cited. There (in the help page) is some literature, but it is no easy stuff to read. Furthermore I know no implementation of an KS test which accounts for this estimation of the parameter. I recommend a graphical tool instead of a test: x - rlnorm(100) qqnorm(log(x)) See also ?qqnorm and ?qqplot. If you insist on testing a theoretical distribution be aware that a non significant test does not mean that your data has the tested distribution (especially if you have few data, there is no power in the test to detect deviations from the theoretical distribution and the conclusion that the data fits well is trappy) If there are enough data I'd prefer a chi square test to the KS test (but even there I use graphical tools instead). See ?chisq For this test you have to specify classes and this is subjective (you can't avoid this). You can reduce the DF of the expected chi square distribution (under H_0) by the number of estimated parameters from the data and will get better results. DF = number of classes - 1 - estimated parameters I think this test is more powerful than the KS test, particularly if you must estimate the parameters from data. Regards, Christoph -- Christoph Buser [EMAIL PROTECTED] Seminar fuer Statistik, LEO C11 ETH (Federal Inst. Technology)8092 Zurich SWITZERLAND phone: x-41-1-632-5414fax: 632-1228 http://stat.ethz.ch/~buser/ Kwabena Adusei-Poku writes: Hello all, Would somebody be kind enough to show me how to do a KS test in R for a lognormal distribution with ESTIMATED parameters. The R function ks.test()says the parameters specified must be prespecified and not estimated from the data Is there a way to correct this when one uses estimated data? Regards, Kwabena. Kwabena Adusei-Poku University of Goettingen Institute of Statistics and Econometrics Platz der Goettingen Sieben 5 37073 Goettingen Germany Tel: +49-(0)551-394794 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html *** Christian Hennig Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg [EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/ ### ich empfehle www.boag-online.de __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] changing langage
Kurt Sys kurt.sys at pandora.be writes: : : Hi all, : : I've got a small, practical question, which untill now I couldn't solve : (otherwhise I wouldn't mail it, right?) First of all, I'm talking about : R 2.0.1 on a winxp system (using the default graphical interface being : 'Rgui'). : When I make plots, using dates on the x-axis, it puts the labels in : Dutch, which is nice (since it's my mother tongue) unless I want them in : English... Is there a way to change this behaviour? (Can I change the : labels etc to English?)

Here is an example:

R> Sys.setlocale("LC_TIME", "en-us")
[1] "English_United States.1252"
R> format(ISOdate(2004, 1:12, 1), "%B")
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"
R> Sys.setlocale("LC_TIME", "du-be")
[1] "Dutch_Netherlands.1252"
R> format(ISOdate(2004, 1:12, 1), "%B")
 [1] "januari"   "februari"  "maart"     "april"     "mei"       "juni"
 [7] "juli"      "augustus"  "september" "oktober"   "november"  "december"
R> R.version.string # XP
[1] "R version 2.1.0, 2005-01-02"

For more codes, google for: Microsoft language codes and look at the first result that is on a Microsoft site. This may or may not change your labels depending on precisely what you are doing. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] changing langage
It uses the language set by LC_TIME: see ?Sys.setlocale and ?format.Date which references it. On Wed, 12 Jan 2005, Kurt Sys wrote: Hi all, I've got a small, practical question, which untill now I couldn't solve (otherwhise I wouldn't mail it, right?) First of all, I'm talking about R 2.0.1 on a winxp system (using the default graphical interface being 'Rgui'). When I make plots, using dates on the x-axis, it puts the labels in Dutch, which is nice (since it's my mother tongue) unless I want them in English... Is there a way to change this behaviour? (Can I change the labels etc to English?) -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] model.response error
On Wed, 12 Jan 2005, Bang wrote: When I installed R 2.0.1 (replacing 1.9.0) for Windows, a code using model.response began acting up. Here are the first several lines of a code I had been tweaking for a spatial model (the code is mostly that of Roger Bivand--I am adapting it to a slightly different data structure and the problem I'm sure is with my changes, not his code). I don't think it's the R versions, rather the SparseM versions. I think what is happening is the SparseM generic for model.response is being picked up. In the current NAMESPACE file in the spdep package I now have: importFrom(stats, model.matrix, model.response) but I'm not sure that your function is in a package. You will probably need to say that both model.response and model.matrix are from stats, at least this should give you a lead. Best wishes, Roger command name - function (formula, data = list(), weights, na.action = na.fail, type = lag, quiet = TRUE, zero.policy = FALSE, tol.solve = 1e-07, tol.opt = .Machine$double.eps^0.5, sparsedebug = FALSE) { mt - terms(formula, data = data) mf - lm(formula, data, na.action = na.action, method = model.frame) na.act - attr(mf, na.action) if (!is.matrix.csr(weights)) cat(\nWarning: weights matrix not in sparse form\n) switch(type, lag = if (!quiet) cat(\nSpatial lag model\n), mixed = if (!quiet) cat(\nSpatial mixed autoregressive model\n), stop(\nUnknown model type\n)) if (!quiet) cat(Jacobian calculated using weights matrix eigenvalues\n) y - model.response(mf, numeric) if (any(is.na(y))) stop(NAs in dependent variable) x - model.matrix(mt, mf) if (any(is.na(x))) stop(NAs in independent variable) if (nrow(x) != nrow(weights)) stop(Input data and weights have different dimensions) n - nrow(x) m - ncol(x) When it reads the Y variable in the command: y - model.response(mf, numeric) The error it gives is: Error in model.response(mf, numeric) : No direct or inherited method for function model.response for this call The problem is puzzling me because it is not something I encountered when I was running the same code in 1.9.0, but is causing problems in 2.0.1 Thanks, and any comments on debugging the error are welcome. Jim Well I AM missing the back of my head.you COULD cut me a little slack! -Homer Simpson __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93 e-mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
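One possible workaround (a sketch, untested against SparseM): call the stats versions of the extractors explicitly, so that a generic defined in another package cannot mask them, i.e. inside the function use y <- stats::model.response(mf, "numeric") and x <- stats::model.matrix(mt, mf). A small self-contained check that the namespace-qualified calls behave as expected:

mf <- model.frame(Sepal.Length ~ Sepal.Width, data = iris)
mt <- attr(mf, "terms")
y  <- stats::model.response(mf, "numeric")
x  <- stats::model.matrix(mt, mf)
str(y); dim(x)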
RE: [R] defining lower part of distribution with covariate
Troels: It would be best if you discussed this with a local statistician to make sure that the data and analysis are properly addressing the scientific issues. Perhaps that is why no one replied to your previous post. Also, this is primarily a **statistical** issue, not really an **R issue**. Having said that, I'll take a stab at it ... Probably the most important thing to say is that there is probably not much that these data can tell you, as you only have 36 cases in all and only 3 are <= 12. While this probably represented a **lot** of work for you, the simple fact is that when trying to understand what influences dichotomous probabilities, you generally need lots of data (hundreds of cases, typically). Note: This remark may be subject to correction by wiser statisticians. Next, the nature of your response, GCS. It appears to be a subjective rating score that is probably best modeled as an ordered categorical response, which in R is called an ordered factor. Dichotomizing it to <=12 / >12 loses information. Treating it as a continuous response (quantreg/ancova) seems inappropriate for your data. Finally, the model. Considering GCS to be an ordered category, a reasonable modeling strategy seems to be proportional odds logistic regression, which models the GCS response as a linear function of the anesthesia times and types (which encompasses your ancova ideas). The results from this model would then allow you to calculate the <=12 probability if you chose to do so. This model can be fit using the polr() function in the MASS package. However, I again urge you to discuss this with a local statistically knowledgeable resource -- and not to expect too much from such rather meager data. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Troels Ring (by way of Troels Ring [EMAIL PROTECTED]) Sent: Wednesday, January 12, 2005 9:11 AM To: R-help Subject: [R] defining lower part of distribution with covariate I try again - perhaps it is analysis of covariance with treatment (thio, ultiva) as two categories and antime as covariate. On the basis of such a model, is the probability of GCS <= 12 then larger with thio treatment? Dear friends, forgive me a simple question, possibly related to quantreg but I failed to get it done and hope for basic instruction. I have two sets of observed Glasgow coma scores at admission to ICU after operation, and accompanying time of anesthesia (in hours). Thio is cheap and perhaps old fashioned, and ultiva expensive and rapidly terminated. The problem is to estimate the probability of GCS 12 or lower on the two treatments after taking time of anesthesia into account (antime) which is longer for thio. How would I do that in the best way ? Best wishes Troels Ring, MD Aalborg, Denmark

thio
      GCS antime
 [1,]  14    4.5
 [2,]  15    7.5
 [3,]  11    7.5
 [4,]  15    4.5
 [5,]  14    4.5
 [6,]  15    3.5
 [7,]  15    5.5
 [8,]  14    5.5
 [9,]  15    3.5
[10,]  14    8.5
[11,]  13    4.5
[12,]  12    5.5
[13,]  15    3.5
[14,]  13    6.5
[15,]   9    8.5
[16,]  15    6.5

ultiva
      GCS antime
 [1,]  15    4.5
 [2,]  15    4.5
 [3,]  15    2.5
 [4,]  15    3.5
 [5,]  15    3.5
 [6,]  12    5.5
 [7,]  15    4.5
 [8,]  15    3.5
 [9,]  15    8.5
[10,]  13    4.5
[11,]  14    3.5
[12,]  14    4.5
[13,]  15    4.5
[14,]  14    2.5
[15,]  15    4.5
[16,]  15    3.5
[17,]  15    3.5
[18,]  14    4.5
[19,]  14    4.5
[20,]  15    4.5

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
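To make Bert's suggestion concrete, here is a minimal sketch of a proportional-odds fit with MASS::polr(). The data frame, its column names (gcs, antime, treat) and the values themselves are invented for illustration only; they are not the original measurements.

library(MASS)
set.seed(42)
n      <- 40
treat  <- factor(rep(c("thio", "ultiva"), each = n/2))
antime <- round(runif(n, 2.5, 8.5), 1)                      # hours of anesthesia (simulated)
g      <- pmin(15, pmax(9, round(15 - 0.3*antime - 0.8*(treat == "thio") + rnorm(n))))
dat    <- data.frame(gcs = factor(g, levels = sort(unique(g)), ordered = TRUE),
                     antime = antime, treat = treat)
fit <- polr(gcs ~ treat + antime, data = dat, Hess = TRUE)
summary(fit)
# Fitted class probabilities; P(GCS <= 12) is the sum over the levels <= 12
probs <- predict(fit, newdata = dat, type = "probs")
p12   <- rowSums(probs[, as.numeric(colnames(probs)) <= 12, drop = FALSE])
tapply(p12, dat$treat, mean)   # average fitted P(GCS <= 12) by treatment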
Re: [R] transfer function models
I don't know what SAS does, but transfer functions are essentially MA/AR from an ARMA model, so you should be able to get what you want from the various ARMA estimation tools in R. Paul Gilbert Samuel Kemp (Comp) wrote: Hi, Does anyone know of a function in R that can estimate the parameters of a transfer function model with added noise like in SAS? Thanks in advance, Sam. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
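In base R, one common starting point (a sketch, not a full Box-Jenkins transfer-function estimator) is arima() with the input supplied through xreg: that fits a regression on the input with ARMA errors, and lagged copies of the input can be added as further xreg columns to approximate the transfer-function dynamics by hand. The simulated series below are only for illustration.

set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))                 # input series
y <- 2 + 1.5 * x + as.numeric(arima.sim(model = list(ma = 0.5), n = 200))   # output with MA(1) noise
fit1 <- arima(y, order = c(0, 0, 1), xreg = x)
fit1
# input and its first lag as two regressors:
x0 <- x[-1]; x1 <- x[-length(x)]
fit2 <- arima(y[-1], order = c(0, 0, 1), xreg = cbind(x0, x1))
fit2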
Re: [R] Standard error for the area under a smoothed ROC curve?
Dan Bolser wrote: On Wed, 12 Jan 2005, Frank E Harrell Jr wrote: Dan Bolser wrote: Hello, I am making some use of ROC curve analysis. I find much help on the mailing list, and I have used the Area Under the Curve (AUC) functions from the ROC package in the Bioconductor project... http://www.bioconductor.org/repository/release1.5/package/Source/ ROC_1.0.13.tar.gz However, I read here... http://www.medcalc.be/manual/mpage06-13b.php The 95% confidence interval for the area can be used to test the hypothesis that the theoretical area is 0.5. If the confidence interval does not include the 0.5 value, then there is evidence that the laboratory test does have an ability to distinguish between the two groups (Hanley & McNeil, 1982; Zweig & Campbell, 1993). But aside from early on, the above article is short on details. Can anyone tell me how to calculate the CI of the AUC calculation? I read this... http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf which talks about resampling (by showing R code), but I can't understand what is going on, or what is calculated (the example given is specific to microarray analysis I think). I think a general AUC CI function would be a good addition to the ROC package. One more thing, in calculating the AUC I see the splines function is recommended over the approx function. Here... http://tolstoy.newcastle.edu.au/R/help/04/10/6138.html How would I rewrite the following AUC functions (adapted from Bioconductor source) to use splines (or approxfun or splinefun)?

> spe  # Specificity
 [1] 0.02173913 0.13043478 0.21739130 0.32608696 0.43478261 0.54347826
 [7] 0.65217391 0.76086957 0.89130435 1.00000000 1.00000000 1.00000000
[13] 1.00000000
> sen  # Sensitivity
 [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.9302326 0.8139535
 [8] 0.6976744 0.5581395 0.4418605 0.3488372 0.2325581 0.1162791
> trapezint(1-spe, sen)
> my.integrate(1-spe, sen)

## Functions
## Nicked (and modified) from the ROC package in Bioconductor.

trapezint <- function (x, y, a = 0, b = 1)
{
    if (x[1] > x[length(x)]) {
        x <- rev(x)
        y <- rev(y)
    }
    y <- y[x >= a & x <= b]
    x <- x[x >= a & x <= b]
    if (length(unique(x)) < 2)
        return(NA)
    ya <- approx(x, y, a, ties = max, rule = 2)$y
    yb <- approx(x, y, b, ties = max, rule = 2)$y
    x <- c(a, x, b)
    y <- c(ya, y, yb)
    h <- diff(x)
    lx <- length(x)
    0.5 * sum(h * (y[-1] + y[-lx]))
}

my.integrate <- function (x, y, t0 = 1)
{
    f <- function(j) approx(x, y, j, rule = 2, ties = max)$y
    integrate(f, 0, t0)$value
}

Thanks for any pointers, Dan. I don't see why the above formulas are being used. The Bamber-Hanley-McNeil-Wilcoxon-Mann-Whitney nonparametric method works great. Just get the U statistic (concordance probability) used in Wilcoxon. As Somers' Dxy rank correlation coefficient is 2*(C - 0.5), where C is the concordance or ROC area, the Hmisc package function rcorr.cens uses U statistic methods to get the standard error of Dxy. You can easily translate this to a standard error of C. I am sure I could do this easily, except I can't. The good thing about ROC is that I understand it (I can see it). I know why the area means what it means, and I could even imagine how sampling the data could give a CI on the area. However, I don't know why the area under the ROC curve is well known to be equivalent to the numerator of the Mann-Whitney U statistic - from http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf Nor do I know how to calculate the numerator of the Mann-Whitney U statistic. This is clear in the original Bamber or Hanley-McNeil articles.
The ROC area is a linear translation of the mean rank of predicted values in one of the two outcome groups. The little somers2 function in Hmisc shows this:

##S function somers2
##
## Calculates concordance probability and Somers' Dxy rank correlation
## between a variable X (for which ties are counted) and a binary
## variable Y (having values 0 and 1, for which ties are not counted).
## Uses short cut method based on average ranks in two groups.
##
## Usage:
##
##   somers2(X, Y)
##
## Returns vector whose elements are C Index, Dxy, n and missing, where
## C Index is the concordance probability and Dxy = 2(C Index - .5).
##
## F. Harrell  28 Nov 90    6 Apr 98: added weights

somers2 <- function(x, y, weights=NULL, normwt=FALSE, na.rm=TRUE)
{
  if(length(y) != length(x)) stop('y must have same length as x')
  y <- as.integer(y)
  wtpres <- length(weights)
  if(wtpres && (wtpres != length(x)))
    stop('weights must have same length as x')
  if(na.rm) {
    miss <- if(wtpres) is.na(x + y + weights) else is.na(x + y)
    nmiss <- sum(miss)
    if(nmiss > 0) {
      miss <- !miss
      x <- x[miss]
      y <- y[miss]
      if(wtpres) weights <- weights[miss]
    }
  } else nmiss <- 0
  u <- sort(unique(y))
  if(any(y %nin% 0:1)) stop('y must be binary')   ## 7dec02
  if(wtpres) {
    if(normwt) weights <-
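For the confidence-interval question, here is a sketch of the rank shortcut together with the Hanley & McNeil (1982) standard error; the predictions and outcomes are made up, and the Hmisc functions mentioned above should give the same C.

set.seed(1)
y    <- rep(0:1, c(60, 40))                    # invented 0/1 outcomes
pred <- rnorm(100, mean = y)                   # invented predicted scores
r    <- rank(pred)
n1   <- sum(y == 1); n0 <- sum(y == 0)
auc  <- (mean(r[y == 1]) - (n1 + 1)/2) / n0    # Wilcoxon/Mann-Whitney shortcut to the ROC area
q1   <- auc / (2 - auc)
q2   <- 2 * auc^2 / (1 + auc)
se   <- sqrt((auc*(1 - auc) + (n1 - 1)*(q1 - auc^2) + (n0 - 1)*(q2 - auc^2)) / (n0 * n1))
c(AUC = auc, lower = auc - 1.96*se, upper = auc + 1.96*se)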
Re: [R] global objects not overwritten within function
Apparently the message below wasn't posted on R-help, so I'm sending it again. Sorry if you received it twice. --- bogdan romocea [EMAIL PROTECTED] wrote: Date: Tue, 11 Jan 2005 17:31:42 -0800 (PST) From: bogdan romocea [EMAIL PROTECTED] Subject: Re: [R] global objects not overwritten within function Thank you to everyone who replied. I had no idea that ... means something in R, I only wanted to make the code look simpler. I'm pasting below the functional equivalent of what took me yesterday a couple of hours to debug. Function f() takes several arguments (that's why I want to have the code as a function) and creates several objects. I then need to use those objects in another function fct(), and I want to overwrite them to save memory (they're pretty large). It appears that Robert's guess (dynamic/lexical scoping) explains what's going on. I've noticed though another strange (to me) issue: without indexing (such as obj1 <- obj1[obj1 > 0], which I need to use though), fct() prints the expected values even without removing the objects after each iteration. However, after indexing is introduced, rm() must be used to make fct() return the intended output. How would that be explained? Kind regards, b.

f <- function(read, position){
  obj1 <- 5 * read[position]:(read[position]+5)
  obj2 <- 7 * read[position]:(read[position]+5)
  assign("obj1", obj1, .GlobalEnv)
  assign("obj2", obj2, .GlobalEnv)
}

fct <- function(input){
  for (i in 1:5) {
    f(input, i)
    obj1 <- obj1[obj1 > 0]
    obj2 <- obj2[obj2 > 0]
    print(obj1)
    print(obj2)
    # rm(obj1, obj2)   # get intended results with this line
  }
}

a <- 1:10
fct(a)

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
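For comparison, here is a sketch of the list-returning version the repliers alluded to, which sidesteps the global assignments and the need for rm(): f() hands both objects back in a list and fct() overwrites its own local copies on every iteration.

f <- function(read, position) {
  list(obj1 = 5 * read[position]:(read[position] + 5),
       obj2 = 7 * read[position]:(read[position] + 5))
}

fct <- function(input) {
  for (i in 1:5) {
    res  <- f(input, i)
    obj1 <- res$obj1[res$obj1 > 0]
    obj2 <- res$obj2[res$obj2 > 0]
    print(obj1)
    print(obj2)
  }
}

a <- 1:10
fct(a)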
Re: [R] changing langage [SOLVED]
To all that replied, thanks... I have a clue where I can change the settings. tnx, Kurt Sys - Oorspronkelijk bericht - Van : Gabor Grothendieck [mailto:[EMAIL PROTECTED] Verzonden : woensdag , januari 12, 2005 05:26 PM Aan : r-help@stat.math.ethz.ch Onderwerp : Re: [R] changing langage Kurt Sys kurt.sys at pandora.be writes: : : Hi all, : : I've got a small, practical question, which untill now I couldn't solve : (otherwhise I wouldn't mail it, right?) First of all, I'm talking about : R 2.0.1 on a winxp system (using the default graphical interface being : 'Rgui'). : When I make plots, using dates on the x-axis, it puts the labels in : Dutch, which is nice (since it's my mother tongue) unless I want them in : English... Is there a way to change this behaviour? (Can I change the : labels etc to English?) Here is an example: R Sys.setlocale(LC_TIME, en-us) [1] English_United States.1252 R format(ISOdate(2004,1:12,1),%B) [1] January February March April May June [7] July AugustSeptember October November December R Sys.setlocale(LC_TIME, du-be) [1] Dutch_Netherlands.1252 R format(ISOdate(2004,1:12,1),%B) [1] januari februari maart april mei juni [7] juli augustus september oktober november december R R.version.string # XP [1] R version 2.1.0, 2005-01-02 For more codes, google for: Microsoft language codes and look at the first result that is on a Microsoft site. This may or may not change your labels depending on precisely what you are doing. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] gbm
Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. thanks, Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Finding seasonal peaks in a time series....
I have a seasonal time series. I want to calculate the annual mean value of the time series at its peak (say the mean of the three values before the peak, the peak, and the three values after the peak). The peak of the time series might change cycle slightly from year to year.

# E.g.,
nPts <- 254
foo <- sin((2 * pi * 1/24) * 1:nPts)
foo <- foo + rnorm(nPts, 0, 0.05)
bar <- ts(foo, start = c(1980,3), frequency = 24)
plot(bar)
start(bar)
end(bar)

# I want to find the peak value from each year, and then get the mean of the values on either side.
# So, if the peak value in the year 1981 is
max.in.1981 <- max(window(bar, start = c(1981,1), end = c(1981,24)))
# e.g., cycle 7 or 8
window(bar, start = c(1981,1), end = c(1981,24)) == max.in.1981
# E.g. if the highest value in 1981 is in cycle 8 I want
mean.in.1981 <- mean(window(bar, start = c(1981,5), end = c(1981,11)))
plot(bar)
points(ts(mean.in.1981, start = c(1981,8), frequency = 24), col = "red", pch = "+")

Is there a way to automate this for each year? How can I return the cycle of the max value by year? Thanks in advance. -DC __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
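One way to automate it, continuing the example above (a sketch): group the series by calendar year, locate the cycle of each year's maximum, and average the window of plus/minus 3 observations around it.

yr  <- floor(time(bar))
res <- sapply(unique(yr), function(y) {
  idx <- which(yr == y)
  k   <- idx[which.max(bar[idx])]           # index of that year's peak
  win <- max(1, k - 3):min(length(bar), k + 3)
  c(year = y, peak.cycle = cycle(bar)[k], peak.mean = mean(bar[win]))
})
t(res)   # one row per year: the cycle of the maximum and the windowed mean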
Re: [R] gbm
I just got 25 hits from www.r-project.org - search - R site search. Might one or more of these help you? If they don't solve your problem, I suggest you try the posting guide! http://www.R-project.org/posting-guide.html;. If that still doesn't solve your problem, it should help you phrase your question to increase the chances of getting a helpful reply. hope this helps. spencer graves Weiwei Shi wrote: Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. thanks, Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] gbm
Weiwei Shi [EMAIL PROTECTED] writes: Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. What is gbm? Green Belt Movement? Georgie Boy Manufacturing? I'm serious! Well, only sort of, but try Google on gbm and you'll find those two expansions and several others like them. I suppose you mean Gradient Boosting Machine, or Generalized Boosted regression Models. Have you followed up on the references and examples on its help page? -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] gbm
You can also check out: http://www.i-pensieri.com/gregr/gbm.shtml There are reference papers on there, too. HTH, Danny On Wed, 12 Jan 2005, Peter Dalgaard wrote: Weiwei Shi [EMAIL PROTECTED] writes: Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. What is gbm? Green Belt Movement? Georgie Boy Manufacturing? I'm serious! Well, only sort of, but try Google on gbm and you'll find those two expansions and several others like them. I suppose you mean Gradient Boosting Machine, or Generalized Boosted regression Models. Have you followed up on the references and examples on its help page? -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Off Topic: Statistical philosophy rant
R-Listers. The following is a rant originally sent privately to Frank Harrell in response to remarks he made on this list. The ideas are not new or original, but he suggested I share it with the list, as he felt that it might be of wider interest, nonetheless. I have real doubts about this, and I apologize in advance to those who agree that I should have kept my remarks private. In view of this, if you wish to criticize my remarks on list, that's fine, but I won't respond (I've said enough already!). I would be happy to discuss issues (a little) further off list with anyone who wishes to bother, but not on list. Also, Frank sent me a relevant reference for those who might wish to read a more thoughtful consideration of the issues: @ARTICLE{far92cos, author = {Faraway, J. J.}, year = 1992, title = {The cost of data analysis}, journal = J Comp Graphical Stat, volume = 1, pages = {213-229}, annote = {bootstrap; validation; predictive accuracy; modeling strategy; regression diagnostics;model uncertainty} } I welcome further relevant references, pro or con! Finally, I need to emphasize that these are clearly my very personal views and do not reflect those of my company or colleagues. Cheers to all ... --- The relevant portion of Frank's original comment was in a thread about K-S tests for the goodness of fit of a parametric distribution: ... If you use the empirical CDF to select a parametric distribution, the final estimate of the distribution will inherit the variance of the ECDF. The main reason statisticians think that parametric curve fits are far more efficient than nonparametric ones is that they don't account for model uncertainty in their final confidence intervals. -- Frank Harrell My reply: That's a perceptive remark, but I would go further... You mentioned **model** uncertainty. In fact, in any data analysis in which we explore the data first to choose a model, fit the model (parametric or non..), and then use whatever (pivots from parametric analysis; bootstrapping;...) to say something about model uncertainty, we're always kidding ourselves and our colleagues because we fail to take into account the considerable variability introduced by our initial subjective exploration and subsequent choice of modeling strategy. One can only say (at best) that the stated model uncertainty is an underestimate of the true uncertainty. And very likely a considerable underestimate because of the model choice subjectivity. Now I in no way wish to discourage or abridge data exploration; only to point out that we statisticians have promulgated a self-serving and unrealistic view of the value of formal inference in quantifying true scientific uncertainty when we do such exploration -- and that there is therefore something fundamentally contradictory in our own rhetoric and methods. Taking a larger view, I think this remark is part of the deeper epistemological issue of characterizing what can be scientifically known or, indeed, defining the difference between science and art, say. My own view is that scientific certainty is a fruitless concept: we build models that we benchmark against our subjective measurements (as the measurements themselves depend on earlier scientific models) of reality. Insofar as data can limit or support our flights of modeling fancy, they do; but in the end, it is neither an objective process nor one whose uncertainty can be strictly quantified. 
In creating the illusion that statistical methods can overcome these limitations, I think we have both done science a disservice and relegated ourselves to an isolated, fringe role in scientific inquiry. Needless to say, opposing viewpoints to such iconoclastic remarks are cheerfully welcomed. Best regards, Bert Gunter __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] gbm
Hi, there: Thanks a lot for everyone's prompt replies. In detail, I am facing a huge amount of data: over 10,000 cases and 400 variables. This project is very challenging and interesting to me. I tried rpart, which gives me some promising results but not good enough. So I am trying randomForest and gbm now. My plan for using gbm is like this: rt <- rpart(...); gbm(formula(rt), ...). Does this work? (My first question) Another concern of mine for gbm is scalability, since I realize R seems to load all the data into memory. (My second question) But I believe the idea above will run very slowly. (I think I might try TreeNet, though I don't like it since it is commercial.) BTW, sampling might be a good idea, but it does not seem a good idea for my project from previous experiments. I read some references mentioned earlier by helpers before I sent my first email. But I still appreciate any help. You guys are so nice! BTW, gbm means gradient boosting modeling :) Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Off Topic: Statistical philosophy rant
On Wed, 12 Jan 2005, Berton Gunter wrote: R-Listers. The following is a rant originally sent privately to Frank Harrell in response to remarks he made on this list. The ideas are not new or original, but he suggested I share it with the list, as he felt that it might be of wider interest, nonetheless. I have real doubts about this, and I apologize in advance to those who agree that I should have kept my remarks private. In view of this, if you wish to criticize my remarks on list, that's fine, but I won't respond (I've said enough already!). I would be happy to discuss issues (a little) further off list with anyone who wishes to bother, but not on list. Also, Frank sent me a relevant reference for those who might wish to read a more thoughtful consideration of the issues: @ARTICLE{far92cos, author = {Faraway, J. J.}, year = 1992, title = {The cost of data analysis}, journal = J Comp Graphical Stat, volume = 1, pages = {213-229}, annote = {bootstrap; validation; predictive accuracy; modeling strategy; regression diagnostics;model uncertainty} } I welcome further relevant references, pro or con! Finally, I need to emphasize that these are clearly my very personal views and do not reflect those of my company or colleagues. Cheers to all ... --- The relevant portion of Frank's original comment was in a thread about K-S tests for the goodness of fit of a parametric distribution: ... If you use the empirical CDF to select a parametric distribution, the final estimate of the distribution will inherit the variance of the ECDF. The main reason statisticians think that parametric curve fits are far more efficient than nonparametric ones is that they don't account for model uncertainty in their final confidence intervals. -- Frank Harrell My reply: That's a perceptive remark, but I would go further... You mentioned **model** uncertainty. In fact, in any data analysis in which we explore the data first to choose a model, fit the model (parametric or non..), and then use whatever (pivots from parametric analysis; bootstrapping;...) to say something about model uncertainty, we're always kidding ourselves and our colleagues because we fail to take into account the considerable variability introduced by our initial subjective exploration and subsequent choice of modeling strategy. One can only say (at best) that the stated model uncertainty is an underestimate of the true uncertainty. And very likely a considerable underestimate because of the model choice subjectivity. Now I in no way wish to discourage or abridge data exploration; only to point out that we statisticians have promulgated a self-serving and unrealistic view of the value of formal inference in quantifying true scientific uncertainty when we do such exploration -- and that there is therefore something fundamentally contradictory in our own rhetoric and methods. Taking a larger view, I think this remark is part of the deeper epistemological issue of characterizing what can be scientifically known or, indeed, defining the difference between science and art, say. My own view is that scientific certainty is a fruitless concept: we build models that we benchmark against our subjective measurements (as the measurements themselves depend on earlier scientific models) of reality. Insofar as data can limit or support our flights of modeling fancy, they do; but in the end, it is neither an objective process nor one whose uncertainty can be strictly quantified. I totally agree with the above and I am totally unqualified to comment on the below. 
You (and others) might find these papers interesting... http://www.santafe.edu/~chaos/chaos/pubs.htm Specifically papers like... Synchronizing to the Environment: Information Theoretic Constraints on Agent Learning. http://www.santafe.edu/~cmg/papers/stte.pdf Is Anything Ever New? Considering Emergence. http://www.santafe.edu/~cmg/papers/EverNew.pdf Observing Complexity and The Complexity of Observation http://www.santafe.edu/~cmg/papers/OCACO.pdf What Lies Between Order and Chaos? http://www.santafe.edu/~cmg/papers/wlboac.pdf And probably many more. In creating the illusion that statistical methods can overcome these limitations, I think we have both done science a disservice and relegated ourselves to an isolated, fringe role in scientific inquiry. Needless to say, opposing viewpoints to such iconclastic remarks are cheerfully welcomed. Does it make any difference to the mass of Saturn? Dan. Best regards, Bert Gunter __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] gbm
From: Weiwei Shi Hi, there: Thanks a lot for all people' prompt replies. In detail, I am facing a huge amount of data: over 10,000 and 400 vars. This project is very challenging and interesting to me. I tried rpart which gives me some promising results but not good enough. So I am trying randomForest and gbm now. My plan of using gbm is like this: rt-rpart(...) gbm(formula(rt)...) Does this work? (My first question) Given a machine with sufficient memory and CPU speed, yes. My another CONCERN FOR GBM is the scalability since I realize R seems to load all the data into memory. (My second question) We have dealt with data larger than what you described. One thing to avoid is the use of the formula interface if you have _lots_ (like, hundreds) of variables. gbm.fit(), I believe, was created for that reason. But I believe the idea above will run very slowly. (I think I might try TreeNet, though I don't like it since it is commercial.). BTW, sampling might be a good idea, but it does not seem a good idea for my project from previous experiments. To me being commercial is not a crime. I judge software on quality, ease of use, access to source (if I need it), etc. To me, TreeNet failed on several of those criteria, but it works just fine for some people. I read some reference mentioned earlier by helpers before I sent my first email. But I still appreciate any helps. You guys are so nice! That's no excuse for not following the posting guide, right? BTW, gbm means gradient boosting modeling :) No. I believe Greg calls it `generalized boosting models'. Andy Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
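A minimal sketch of the gbm.fit() route Andy mentions (argument names quoted from memory, check ?gbm.fit before relying on them): the predictors go in as a data frame or matrix and the response as a 0/1 vector, so no very long formula has to be parsed. The simulated data are only a small stand-in for the real 10,000 x 400 problem.

library(gbm)
set.seed(1)
n <- 5000; p <- 20
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))
fit  <- gbm.fit(x, y, distribution = "bernoulli", n.trees = 500,
                interaction.depth = 3, shrinkage = 0.05, verbose = FALSE)
best <- gbm.perf(fit, method = "OOB", plot.it = FALSE)
head(predict(fit, newdata = x, n.trees = best, type = "response"))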
RE: [R] Changing the ranges for the axis in image()
There's something that you're not telling me. If you want something other than your data, why use your data? If you have another set of data that you wish to overlay on the image then you are going to have to scale one of the data sources to match the other. I'm not sure where your problem is, but the code below might prove useful in understanding how the plotting occurs. I assume that image always uses the -.25, 1.25 limits, but this would need to be confirmed. I assume that when you talk about trying to use label, you are referring to the Hmisc package. I don't use this function so I can't give advice about it. However I think you probably need to be more familiar before you will get the best out of Frank's package. Someone else on the list may be able to help in the use of this function.

x <- matrix(c(1,1,0,1,0,1,0,1,1), 3, 3)
# dummy secondary data
y <- runif(20) * 20
z <- runif(20) * 20
y1 <- (y/(20/1.5)) - 0.25   # rescale y from 0 to 20 to -.25 to 1.25
z1 <- (z/(20/1.5)) - 0.25
x; y; z; y1; z1
par(mfrow = c(1,3))
image(x, xlim = c(-0.25,1.25), ylim = c(-0.25,1.25))
image(x, xlim = c(0,0.5), ylim = c(0.1,0.9), axes = FALSE)
points(y1, z1)
image(x, axes = FALSE, xlim = c(-0.25,1.25), ylim = c(-0.25,1.25))
points(y1, z1)
rect(0, 0.1, 0.5, 0.9)
axis(2, at = seq(-0.25, 1.25, length = 5), labels = seq(0, 20, length = 5))

Tom -Original Message- From: Costas Vorlow [mailto:[EMAIL PROTECTED] Sent: Wednesday, 12 January 2005 6:22 PM To: Mulholland, Tom Subject: Re: [R] Changing the ranges for the axis in image() Dear Tom, Thanks. What happens though if I want an entirely different range than that of my data? I am trying with label() but it doesn't work properly. Best, Costas Mulholland, Tom wrote: Setting axes = FALSE does not remove the axes, you can therefore still set the limits using xlim and ylim.

x <- matrix(c(1,1,0,1,0,1,0,1,1), 3, 3)
par(mfrow = c(1,2))
image(x)
image(x, xlim = c(0.5,0.8), ylim = c(0.1,0.9), axes = FALSE)

Tom -Original Message- From: Costas Vorlow [ mailto:[EMAIL PROTECTED] Sent: Tuesday, 11 January 2005 9:29 PM To: r-help@stat.math.ethz.ch Subject: [R] Changing the ranges for the axis in image() Dear all, I can not find/understand the solution to this from the help pages: Say we have the following script: x <- matrix(c(1,1,0,1,0,1,0,1,1), 3, 3); image(x) How can I change the ranges on the vertical and horizontal axis to a range of my own or at least place a box frame around the image if I choose to use axes=FALSE? Apologies for such a basic question and thanks beforehand for your answers. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- This e-mail contains information intended for the addressee only. It may be confidential and may be the subject of legal and/or professional Privilege. Any dissemination, distribution, copyright or use of this communication without prior permission of the addressee is strictly prohibited. --- Costas E. Vorlow | Tel: +44 (0)191 33 45727 Durham Business School | Fax: +44 (0)191 33 45201 Room (324), University of Durham, | email: K.E.Vorloou(at)durham.ac.uk Mill Hill Lane, | or : costas(at)vorlow.org Durham DH1 3LB, UK. | http://www.vorlow.org http://ssrn.com/author=341149 | replace (at) with @ for my email Fingerprint: B010 577A 9EC3 9185 08AE 8F22 1A48 B4E7 9FA6 C31A How empty is theory in presence of fact!
(Mark Twain, 1889) [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
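On the side question of a box frame with axes = FALSE, a minimal base-graphics sketch (not from the original thread; the label ranges here are made up): box() draws the frame, and axis() can map arbitrary labels onto image()'s default 0..1 coordinates.

x <- matrix(c(1,1,0,1,0,1,0,1,1), 3, 3)
image(x, axes = FALSE)
box()                                    # frame around the plotting region
# label the 0..1 image coordinates with an arbitrary range, e.g. 0..60:
axis(1, at = seq(0, 1, length = 4), labels = seq(0, 60, length = 4))
axis(2, at = seq(0, 1, length = 4), labels = seq(0, 60, length = 4))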
[R] multivariate diagnostics
Hi, there. I have two questions about diagnostics in multivariate statistics. 1. Is there any diagnostic tool to check whether a multivariate sample comes from a multivariate normal distribution? If there is one, is there a function for it in R? 2. Is there any function for testing whether two multivariate distributions are the same, i.e. a multivariate extension of the Kolmogorov-Smirnov test? Thanks for your help. Yulei

Yulei He
1586 Murfin Ave. Apt 37
Ann Arbor, MI 48105-3135
[EMAIL PROTECTED]
734-647-0305 (H)
734-763-0421 (O)
734-763-0427 (O)
734-764-8263 (fax)

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
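On question 1, a common informal diagnostic that needs only base R is a Q-Q plot of squared Mahalanobis distances against chi-squared quantiles. A minimal sketch on simulated data (the data here are made up, and this is a visual check rather than a formal test):

set.seed(1)
X  <- matrix(rnorm(200 * 5), 200, 5)        # 200 observations, 5 variables
d2 <- mahalanobis(X, colMeans(X), cov(X))   # squared Mahalanobis distances
qqplot(qchisq(ppoints(nrow(X)), df = ncol(X)), d2,
       xlab = "Chi-squared quantiles",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)   # points near this line are consistent with multivariate normality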
RE: [R] Off Topic: Statistical philosophy rant
I have often noted that statistics can't prove a damn thing, but they can be really useful in disproving something. Having spent most of the 80s and half of the 90s with the Australian Bureau of Statistics finding out how these numbers are collected, I am disconcerted at the apparent disregard for measurement issues such as bias, input error, questionnaire design, etc. ... Science wars ... the real world ... and the not-so-real world.

Having only recently discovered what our esteemed J. Baron does, I should say that a lot of his work requires us to ask how we use (abuse?) the tools we have. Having said that, some of my most influential work has come from data exploration within fields where I would describe myself as a complete novice. Using only the phrase "the data seem to indicate relationship x with y", or some variant, and asking whether this is an accepted norm, has produced some unexpected paradigm shifts.

Someone on the list has a footline of something along the lines of "All models are wrong, but some of them are useful"; I think this is attributed to Box. As most of us know, some of the advice on this list is more sage than others. All of which is to say that the manner in which we deal with non-model uncertainty affects the degree to which we do a disservice to science and to ourselves. I think you are being unduly pessimistic, but then again I might just be a cynic masquerading as a realist.

Tom

-Original Message- ...

That's a perceptive remark, but I would go further... You mentioned **model** uncertainty. In fact, in any data analysis in which we explore the data first to choose a model, fit the model (parametric or non-), and then use whatever (pivots from parametric analysis; bootstrapping; ...) to say something about model uncertainty, we are always kidding ourselves and our colleagues, because we fail to take into account the considerable variability introduced by our initial subjective exploration and subsequent choice of modelling strategy. One can only say (at best) that the stated model uncertainty is an underestimate of the true uncertainty, and very likely a considerable underestimate because of the subjectivity of the model choice.

Now, I in no way wish to discourage or abridge data exploration; only to point out that we statisticians have promulgated a self-serving and unrealistic view of the value of formal inference in quantifying true scientific uncertainty when we do such exploration, and that there is therefore something fundamentally contradictory in our own rhetoric and methods.

Taking a larger view, I think this remark is part of the deeper epistemological issue of characterizing what can be scientifically known or, indeed, defining the difference between science and art, say. My own view is that scientific certainty is a fruitless concept: we build models that we benchmark against our subjective measurements of reality (subjective, since the measurements themselves depend on earlier scientific models). Insofar as data can limit or support our flights of modelling fancy, they do; but in the end, it is neither an objective process nor one whose uncertainty can be strictly quantified. In creating the illusion that statistical methods can overcome these limitations, I think we have both done science a disservice and relegated ourselves to an isolated, fringe role in scientific inquiry.

Needless to say, opposing viewpoints to such iconoclastic remarks are cheerfully welcomed.
Best regards, Bert Gunter

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Finding seasonal peaks in a time series....
You might find breakpoints() in the strucchange package helpful.

Tom

-Original Message-
From: Dr Carbon [mailto:[EMAIL PROTECTED]
Sent: Thursday, 13 January 2005 6:19 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Finding seasonal peaks in a time series

I have a seasonal time series. I want to calculate the annual mean value of the time series at its peak (say the mean of the three values before the peak, the peak, and the three values after the peak). The peak of the time series might shift cycle slightly from year to year.

# E.g.,
nPts <- 254
foo <- sin((2 * pi * 1/24) * 1:nPts)
foo <- foo + rnorm(nPts, 0, 0.05)
bar <- ts(foo, start = c(1980, 3), frequency = 24)
plot(bar)
start(bar)
end(bar)
# I want to find the peak value from each year, and then get the mean
# of the values on either side.
# So, if the peak value in the year 1981 is
max.in.1981 <- max(window(bar, start = c(1981, 1), end = c(1981, 24)))
# e.g., cycle 7 or 8
window(bar, start = c(1981, 1), end = c(1981, 24)) == max.in.1981
# E.g., if the highest value in 1981 is in cycle 8, I want
mean.in.1981 <- mean(window(bar, start = c(1981, 5), end = c(1981, 11)))
plot(bar)
points(ts(mean.in.1981, start = c(1981, 8), frequency = 24), col = "red", pch = "+")

Is there a way to automate this for each year? How can I return the cycle of the max value by year? Thanks in advance. -DC

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
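A base-R sketch of the automation asked for above (not from the thread), using the 'bar' series from the example: tapply() over the year gives the cycle of each year's maximum, and a second pass takes the mean of the values within 3 cycles of it (edge years are truncated to the available range).

yr  <- floor(time(bar))        # year of each observation
cyc <- cycle(bar)              # cycle (1..24) of each observation
# cycle of the max value in each year:
peak.cycle <- tapply(seq_along(bar), yr,
                     function(i) cyc[i][which.max(bar[i])])
# mean of the peak and the three values on either side:
peak.mean <- tapply(seq_along(bar), yr, function(i) {
    j <- i[which.max(bar[i])]  # index of the yearly peak
    mean(bar[max(1, j - 3):min(length(bar), j + 3)])
})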
RE: [R] Please unsubscribe me from your list
You stand a better chance if you do it yourself: https://stat.ethz.ch/mailman/listinfo/r-help

-Original Message-
From: Kevin Ita [mailto:[EMAIL PROTECTED]
Sent: Thursday, 13 January 2005 2:25 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Please unsubscribe me from your list

Please unsubscribe me from your list. Thank you. Kevin

[[alternative HTML version deleted]]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Finding seasonal peaks in a time series....
Sorry, I didn't read the question properly. Please disregard; my mind was elsewhere.

Tom

-Original Message-
From: Mulholland, Tom
Sent: Thursday, 13 January 2005 10:52 AM
To: Dr Carbon; r-help@stat.math.ethz.ch
Subject: RE: [R] Finding seasonal peaks in a time series

You might find breakpoints() in the strucchange package helpful.

Tom

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] [R-pkgs] New package: MatchIt
We would like to announce the release of our software MatchIt, now available on CRAN. MatchIt implements a variety of matching methods for causal inference. Abstract: MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2004) for improving parametric statistical models by preprocessing data with nonparametric matching methods. MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also easily fits into existing research practices since, after preprocessing data with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, but produce inferences with substantially more robustness and less sensitivity to modeling assumptions. MatchIt is an R program, and also works seamlessly with Zelig. For more information, please see http://gking.harvard.edu/matchit/. Comments and suggestions are welcome. Sincerely, Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
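As a hedged illustration of the workflow described above (not taken from the announcement), using the lalonde example data that ships with MatchIt; the function names follow the package, but the covariate and model choices here are arbitrary:

library(MatchIt)
data(lalonde)                                 # example data included with MatchIt
m.out  <- matchit(treat ~ age + educ + re74,  # propensity-score matching
                  data = lalonde, method = "nearest")
m.data <- match.data(m.out)                   # extract the matched sample
# then fit whatever parametric model you would have used anyway:
fit <- lm(re78 ~ treat + age + educ + re74, data = m.data)
summary(fit)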