Re: [R] Tinn-R user guide (latex sources) available on GitHub
Thank you for doing this! On Wed, Nov 27, 2013 at 11:22 AM, Jose Claudio Faria joseclaudio.fa...@gmail.com wrote: Dear users, The Tinn-R User Guide is written entirely in LaTeX, and the idea behind making it available on GitHub is to gather contributions from multiple users. If you find something that you would like to include or improve, please feel free to make it better. This User Guide has been developed under both OSes, Windows and Linux. Under Windows we have been using Tinn-R as editor and MiKTeX as compiler. Under Linux we have been using Vim (with the LaTeX-Box plugin) as editor and TeX Live as compiler. Link: https://github.com/jcfaria/Tinn-R-User-Guide Regards, -- Jose Claudio Faria Estatistica UESC/DCET/Brasil joseclaudio.faria at gmail.com Telefones: 55(73)3680.5545 - UESC 55(73)9100.7351 - TIM 55(73)8817.6159 - OI __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Mathematical Modelling, Statistics and Bio-Informatics tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Should there be an R-beginners list?
StackOverflow certainly has its merits, although I miss a bit the good ol' Oxford sarcasm gems you find on this list. This said: a beginners' list is a bad, bad idea. The first rule in my classes is RTFI (Read The Fucking Internetzz). Anybody using R should be able to do a basic Google search, and a beginners' list is not going to help them learn that. If beginners make the effort of following the posting guidelines, netiquette or any other guide to getting help on the internet, they can safely use this list. Cheers Joris On Wed, Nov 27, 2013 at 9:47 AM, Rolf Turner r.tur...@auckland.ac.nz wrote: On 11/25/13 09:04, Rich Shepard wrote: On Sun, 24 Nov 2013, Yihui Xie wrote: Mailing lists are good for a smaller group of people, and especially good when more focused on discussions on development (including bug reports). The better place for questions is a web forum. I disagree. Mailing lists push messages to subscribers, while web fora require one to use a browser, log in, then pull messages. Not nearly as convenient. Well expressed Rich. I agree with you completely. cheers, Rolf Turner -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Mathematical Modelling, Statistics and Bio-Informatics tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Problems when Apply a script to a list
Where exactly did you put the sink() statement? I tried it with 1000 dataframes and I have no problem whatsoever. Cheers Joris On Fri, Aug 27, 2010 at 6:56 AM, ev...@aueb.gr wrote: Joris, thank you very much for your help. It is very helpful for me. I still have a problem with the sink stack, although I put sink() after my last printed result. I want to ask you one more question. Do you know how I can have, in the sink output file, a message indicating which dataset of the list the results belong to? When I used for instead of apply I had cat("\n***\nEstimation \n\nDataset Sim : ", i) Thank you in advance Evgenia Joris Meys writes: Answers below. On Thu, Aug 26, 2010 at 11:20 AM, Evgenia ev...@aueb.gr wrote: Dear users, ***I have a function f to simulate data from a model (the example below is used only to show my problems)

f <- function(n, mean1){
  a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
  b <- matrix(runif(n), ncol = 5)
  data <- rbind(a, b)
  out <- data
  out
}

*I want to simulate 1000 datasets (here only 5), so I use

S <- list()
for (i in 1:5){ S[[i]] <- f(n = 10, mean1 = 0) }

**I have a very complicated function for the estimation of a model which I want to apply to each one of the above simulated datasets

fun <- function(data){
  data <- as.matrix(data)
  sink('Example.txt', append = TRUE)
  cat("\n***\nEstimation \n\nDataset Sim : ", i)
  d <- data %*% t(data)
  s <- solve(d)
  print(s)
  out <- s
  out
}

results <- list()
for (i in 1:5){ results[[i]] <- fun(data = S[[i]]) }

My questions are: 1) for some datasets the system is computationally singular, and this causes execution of the for loop to stop. That way I only have results up to the point where the problem happens. How can I skip such a step and get results for all the other datasets for which fun is applicable? see ?try, or ?tryCatch.
I'd do something along the lines of

for (i in 1:5){
  tmp <- try(fun(data = S[[i]]))
  results[[i]] <- if (is(tmp, "try-error")) NA else tmp
}

Alternatively, you could also use lapply:

results <- lapply(S, function(x){
  tmp <- try(fun(data = x))
  if (is(tmp, "try-error")) NA else tmp
})

2) After some runs of my program, I receive the error message Error in sink("output.txt") : sink stack is full. How can I solve this problem, as I want to have results of my program for 1000 datasets. That is because you never empty the sink. Add sink() after the last line you want in the file. This flushes the sink buffer to the file. Otherwise R keeps everything in memory, and that gets too full after a while. 3) Is using for the correct way to run my program for a list? See the lapply solution. Thanks a lot Evgenia -- View this message in context: http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html Sent from the R help mailing list archive at Nabble.com. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
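To make the advice in this thread concrete, here is a small self-contained sketch combining try(), a single sink()/sink() pair, and a cat() label per dataset. The function f_est, the data, and the output file are hypothetical stand-ins for the poster's fun() and Example.txt, not the original code:

```r
# Hypothetical sketch of the try()/sink() advice above: one sink()
# opened before the loop, one sink() closing it afterwards, and a
# cat() label so the output file shows which dataset each result
# belongs to. f_est stands in for the poster's estimation function.
set.seed(1)
S <- c(lapply(1:4, function(i) matrix(rnorm(10), ncol = 5)),
       list(matrix(1, nrow = 2, ncol = 5)))   # last dataset is singular

f_est <- function(data) solve(data %*% t(data))

outfile <- tempfile(fileext = ".txt")
sink(outfile)
results <- lapply(seq_along(S), function(i) {
  cat("\n***\nEstimation\n\nDataset Sim :", i, "\n")
  tmp <- try(f_est(S[[i]]), silent = TRUE)
  if (inherits(tmp, "try-error")) NA else tmp
})
sink()   # closing the sink exactly once keeps the sink stack from filling

results[[5]]   # NA: the singular dataset was skipped instead of aborting
```

The key point is that the sink is opened and closed outside the loop, so the sink stack never grows, while try() turns the "computationally singular" error into an NA entry.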
Re: [R] Change value of a slot of an S4 object within a method.
OK, I admit it will never win a beauty prize, but I won't either, so we go pretty well together. Seriously, I am aware this is about the worst way to solve such a problem. But as I said, rewriting the class definitions wasn't an option any more. I'll definitely take your advice for the next project. Cheers Joris On Wed, Aug 25, 2010 at 9:36 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: Howdy, My eyes start to gloss over on their first encounter of `substitute` ... add enough `eval`s and (even) an `expression` (!) to that, and you'll probably see me running for the hills ... but hey, if it makes sense to you, more power to you ;-) Glad you found a fix that works, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Problems when Apply a script to a list
Answers below. On Thu, Aug 26, 2010 at 11:20 AM, Evgenia ev...@aueb.gr wrote: Dear users, ***I have a function f to simulate data from a model (the example below is used only to show my problems)

f <- function(n, mean1){
  a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
  b <- matrix(runif(n), ncol = 5)
  data <- rbind(a, b)
  out <- data
  out
}

*I want to simulate 1000 datasets (here only 5), so I use

S <- list()
for (i in 1:5){ S[[i]] <- f(n = 10, mean1 = 0) }

**I have a very complicated function for the estimation of a model which I want to apply to each one of the above simulated datasets

fun <- function(data){
  data <- as.matrix(data)
  sink('Example.txt', append = TRUE)
  cat("\n***\nEstimation \n\nDataset Sim : ", i)
  d <- data %*% t(data)
  s <- solve(d)
  print(s)
  out <- s
  out
}

results <- list()
for (i in 1:5){ results[[i]] <- fun(data = S[[i]]) }

My questions are: 1) for some datasets the system is computationally singular, and this causes execution of the for loop to stop. That way I only have results up to the point where the problem happens. How can I skip such a step and get results for all the other datasets for which fun is applicable? see ?try, or ?tryCatch. I'd do something along the lines of

for (i in 1:5){
  tmp <- try(fun(data = S[[i]]))
  results[[i]] <- if (is(tmp, "try-error")) NA else tmp
}

Alternatively, you could also use lapply:

results <- lapply(S, function(x){
  tmp <- try(fun(data = x))
  if (is(tmp, "try-error")) NA else tmp
})

2) After some runs of my program, I receive the error message Error in sink("output.txt") : sink stack is full. How can I solve this problem, as I want to have results of my program for 1000 datasets. That is because you never empty the sink. Add sink() after the last line you want in the file. This flushes the sink buffer to the file. Otherwise R keeps everything in memory, and that gets too full after a while. 3) Is using for the correct way to run my program for a list? See the lapply solution.
Thanks a lot Evgenia -- View this message in context: http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html Sent from the R help mailing list archive at Nabble.com. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
[R] Passing arguments between S4 methods fails within a function:bug? example with raster package.
Dear all, This problem came up initially while debugging a function, but it seems to be a more general problem in R. I hope I'm wrong, but I can't find another explanation. Let me illustrate with the raster package. For a RasterLayer object (which inherits from Raster), there is a method xyValues defined with the signature (object=RasterLayer, xy=matrix). There is also a method with signature (object=Raster, xy=vector). The only thing this method does is change xy into a matrix and then pass on to the next method using callGeneric again. Arguments are passed along. Now this all works smoothly, as long as you stay in the global environment:

require(raster)
a <- raster()
a[] <- 1:ncell(a)
origin <- c(-80, 50)
eff.dist <- 10
unlist(xyValues(a, xy = origin, buffer = eff.dist))
[1] 14140 14141 14500 14501

Now let's make a very basic test function:

test <- function(x, orig.point){
  eff.distance <- 10
  p <- unlist(xyValues(x, xy = orig.point, buffer = eff.distance))
  return(p)
}

This gives the following result:

test(a, origin)
Error in .local(object, xy, ...) : object 'eff.distance' not found

Huh? Apparently, eff.distance got lost somewhere in the parse tree (am I saying this correctly?). The funny thing is, when we change origin to a matrix:

origin <- matrix(origin, ncol = 2)
unlist(xyValues(a, xy = origin, buffer = eff.dist))
[1] 14140 14141 14500 14501
test(a, origin)
[1] 14140 14141 14500 14501

it all works again! So something goes wrong with passing the arguments from one method to another using callGeneric. Is this a bug in R, or am I missing something obvious? The relevant code from the raster package:

setMethod("xyValues", signature(object='Raster', xy='vector'),
  function(object, xy, ...) {
    if (length(xy) == 2) {
      callGeneric(object, matrix(xy, ncol=2), ...)
    } else {
      stop('xy coordinates should be a two-column matrix or data.frame, or a vector of two numbers.')
    }
  }
)

setMethod("xyValues", signature(object='RasterLayer', xy='matrix'),
  function(object, xy, method='simple', buffer=NULL, fun=NULL, na.rm=TRUE) {
    if (dim(xy)[2] != 2) {
      stop('xy has wrong dimensions; it should have 2 columns')
    }
    if (! is.null(buffer)) {
      return( .xyvBuf(object, xy, buffer, fun, na.rm=na.rm) )
    }
    if (method=='bilinear') {
      return(.bilinearValue(object, xy))
    } else if (method=='simple') {
      cells <- cellFromXY(object, xy)
      return(.readCells(object, cells))
    } else {
      stop('invalid method argument. Should be simple or bilinear.')
    }
  }
)

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
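The vector-to-matrix redispatch pattern described above can be reproduced without the raster package. The sketch below is a hypothetical stand-alone analogue (generic name xyDemo and all classes are illustrative); on current R versions the named argument survives callGeneric() inside a function:

```r
# Minimal stand-alone sketch of the dispatch pattern from the post:
# a vector method coerces xy to a matrix and redispatches via
# callGeneric(), passing extra named arguments through '...'.
library(methods)

setGeneric("xyDemo", function(object, xy, ...) standardGeneric("xyDemo"))

setMethod("xyDemo", signature(object = "numeric", xy = "numeric"),
          function(object, xy, ...) {
            # coerce the vector to a one-row matrix and redispatch
            callGeneric(object, matrix(xy, ncol = 2), ...)
          })

setMethod("xyDemo", signature(object = "numeric", xy = "matrix"),
          function(object, xy, buffer = NULL, ...) {
            list(xy = xy, buffer = buffer)
          })

wrapper <- function(obj, point) {
  local.buffer <- 10          # mirrors eff.distance in the post
  xyDemo(obj, point, buffer = local.buffer)
}

res <- wrapper(1, c(-80, 50))
res$buffer   # 10: the named argument survived the redispatch
```

If this sketch fails inside wrapper() but works at top level, you are seeing the same evaluation problem the post describes.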
[R] Change value of a slot of an S4 object within a method.
Dear all, I have an S4 class with a slot extra, which is a list. I want to be able to add an element called name to that list, containing the object value (which can be a vector, a dataframe or another S4 object). Obviously

setMethod("add.extra", signature = c("PM10Data", "character", "vector"),
  function(object, name, value){
    object@extra[[name]] <- value
  }
)

Contrary to what I would expect, the line

eval(eval(substitute(expression(object@extra[[name]] <- value))))

gives the error: Error in object@extra[[name]] <- value : object 'object' not found. Substitute apparently doesn't work any more in S4 methods... I found a work-around by calling the initializer every time, but this hurts performance. Returning the object is also not an option, as I'd have to remember to assign it each time to the original name. Basically I'm trying to do some call by reference with S4, but don't see how I should do that. How would I be able to tackle this problem in an efficient and elegant way? Thank you in advance Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Change value of a slot of an S4 object within a method.
Hi Steve, thanks for the tip. I'll definitely take a closer look at your solution for implementation in future work. But right now I don't have the time to start rewriting my class definitions. Luckily, I found where exactly things were going wrong. After reading the documentation on evaluation in R, I figured out I have to explicitly specify the environment where substitute should look, namely parent.frame(1). I still don't completely understand why exactly, but it does the job. Thus

eval(eval(substitute(expression(object@extra[[name]] <- value))))

should become

eval(eval(substitute(expression(object@extra[[name]] <- value), env = parent.frame(1))))

Tried it out and it works. Cheers Joris On Wed, Aug 25, 2010 at 6:21 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: Hi Joris, On Wed, Aug 25, 2010 at 11:56 AM, Joris Meys jorism...@gmail.com wrote: Dear all, I have an S4 class with a slot extra, which is a list. I want to be able to add an element called name to that list, containing the object value (which can be a vector, a dataframe or another S4 object). Obviously

setMethod("add.extra", signature = c("PM10Data", "character", "vector"),
  function(object, name, value){
    object@extra[[name]] <- value
  }
)

Contrary to what I would expect, the line

eval(eval(substitute(expression(object@extra[[name]] <- value))))

gives the error: Error in object@extra[[name]] <- value : object 'object' not found. Substitute apparently doesn't work any more in S4 methods... I found a work-around by calling the initializer every time, but this hurts performance. Returning the object is also not an option, as I'd have to remember to assign it each time to the original name. Basically I'm trying to do some call by reference with S4, but don't see how I should do that. How would I be able to tackle this problem in an efficient and elegant way? In lots of my own S4 classes I define a slot called .cache, which is an environment, for this exact purpose.
Using this solution for your scenario might look something like this:

setMethod("add.extra", signature = c("PM10Data", "character", "vector"),
  function(object, name, value) {
    object@.cache$extra[[name]] <- value
  })

I'm not sure what your entire problem looks like, but to get your extra list, or a value from it, you could:

setMethod("extra", signature = "PM10Data",
  function(object, name = NULL) {
    if (!is.null(name)) {
      object@.cache$extra[[name]]
    } else {
      object@.cache$extra
    }
  })

... or something like that. The last thing you have to be careful of is that you have to make sure that each new("PM10Data") object initializes its *own* cache:

setClass("PM10Data", representation(..., .cache = 'environment'))

setMethod("initialize", "PM10Data", function(.Object, ..., .cache = new.env()) {
  callNextMethod(.Object, .cache = .cache, ...)
})

Does that make sense? -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
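The cache-environment pattern suggested above can be run end to end. The sketch below is illustrative only: the class contents (a values slot) are assumptions, and only the .cache mechanics follow the suggestion in the thread:

```r
# Runnable sketch of the .cache-environment pattern described above.
# The 'values' slot is hypothetical; the point is that environments
# have reference semantics, so add.extra() mutates the object in
# place without the caller having to reassign the result.
library(methods)

setClass("PM10Data",
         representation(values = "numeric", .cache = "environment"))

setMethod("initialize", "PM10Data",
          function(.Object, ..., .cache = new.env()) {
            .cache$extra <- list()        # every object gets its own cache
            callNextMethod(.Object, .cache = .cache, ...)
          })

setGeneric("add.extra",
           function(object, name, value) standardGeneric("add.extra"))
setMethod("add.extra", "PM10Data", function(object, name, value) {
  # modifying the environment is visible to the caller immediately
  object@.cache$extra[[name]] <- value
  invisible(object)
})

x <- new("PM10Data", values = c(1, 2, 3))
add.extra(x, "note", "stored by reference")
x@.cache$extra$note   # "stored by reference"
```

Note the custom initialize method: without it, every new("PM10Data") would share the single environment stored in the class prototype.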
[R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???
Dear all, I run a gamm with the following call:

result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF + weekend + s(day) + s(julday),
                   correlation = corCAR1(form = ~ day | month), data = tmp))

with mgcv version 1.6.2. No stress about the data, the error is not data-related. I get:

Error in isS4(x) : object 'VM' not found

How so? I did define the dataframe to be used, and the dataframe contains a variable VM:

str(tmp)
'data.frame': 4014 obs. of 12 variables:
 $ values : num 73.45 105.45 74.45 41.45 -4.55 ...
 $ dates  : Class 'Date' num [1:4014] 9862 9863 9864 9865 9866 ...
 $ year   : num -5.65 -5.65 -5.65 -5.65 -5.65 ...
 $ day    : num -178 -177 -176 -175 -174 ...
 $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ julday : num -2241 -2240 -2239 -2238 -2237 ...
 $ weekend: num -0.289 -0.289 -0.289 0.711 0.711 ...
 $ VM     : num 0.139 -1.451 0.349 0.839 -0.611 ...
 $ RH     : num 55.2 61.4 59.8 64.1 60.7 ...
 $ TT     : num -23.4 -23.6 -19.5 -16.1 -15.3 ...
 $ PP     : num 6.17 4.27 -4.93 -9.23 -2.63 ...
 $ RF     : Ord.factor w/ 3 levels "None"<"2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, means)= num

Any idea what I'm missing here? Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???
Dear all, it gets even weirder. After restarting R, the code I used works just fine. The call is generated in a function that I debugged using browser(). The problem is solved, but I have no clue whatsoever how that error came about. It must have something to do with namespaces, but the origin is obscure. I tried to regenerate the error, but didn't succeed. Does somebody have an idea where I should look for a cause? Cheers Joris On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys jorism...@gmail.com wrote: Dear all, I run a gamm with the following call:

result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF + weekend + s(day) + s(julday),
                   correlation = corCAR1(form = ~ day | month), data = tmp))

with mgcv version 1.6.2. No stress about the data, the error is not data-related. I get:

Error in isS4(x) : object 'VM' not found

How so? I did define the dataframe to be used, and the dataframe contains a variable VM:

str(tmp)
'data.frame': 4014 obs. of 12 variables:
 $ values : num 73.45 105.45 74.45 41.45 -4.55 ...
 $ dates  : Class 'Date' num [1:4014] 9862 9863 9864 9865 9866 ...
 $ year   : num -5.65 -5.65 -5.65 -5.65 -5.65 ...
 $ day    : num -178 -177 -176 -175 -174 ...
 $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ julday : num -2241 -2240 -2239 -2238 -2237 ...
 $ weekend: num -0.289 -0.289 -0.289 0.711 0.711 ...
 $ VM     : num 0.139 -1.451 0.349 0.839 -0.611 ...
 $ RH     : num 55.2 61.4 59.8 64.1 60.7 ...
 $ TT     : num -23.4 -23.6 -19.5 -16.1 -15.3 ...
 $ PP     : num 6.17 4.27 -4.93 -9.23 -2.63 ...
 $ RF     : Ord.factor w/ 3 levels "None"<"2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, means)= num

Any idea what I'm missing here?
Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???
Follow-up: I finally succeeded in more or less reproducing the error. The origin lay in the fact that I accidentally loaded a function while in browser mode debugging that very function. So something went very much wrong with the namespaces. Serves me right... Cheers Joris On Wed, Jul 28, 2010 at 2:27 PM, Joris Meys jorism...@gmail.com wrote: Dear all, it gets even weirder. After restarting R, the code I used works just fine. The call is generated in a function that I debugged using browser(). The problem is solved, but I have no clue whatsoever how that error came about. It must have something to do with namespaces, but the origin is obscure. I tried to regenerate the error, but didn't succeed. Does somebody have an idea where I should look for a cause? Cheers Joris On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys jorism...@gmail.com wrote: Dear all, I run a gamm with the following call:

result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF + weekend + s(day) + s(julday),
                   correlation = corCAR1(form = ~ day | month), data = tmp))

with mgcv version 1.6.2. No stress about the data, the error is not data-related. I get:

Error in isS4(x) : object 'VM' not found

How so? I did define the dataframe to be used, and the dataframe contains a variable VM:

str(tmp)
'data.frame': 4014 obs. of 12 variables:
 $ values : num 73.45 105.45 74.45 41.45 -4.55 ...
 $ dates  : Class 'Date' num [1:4014] 9862 9863 9864 9865 9866 ...
 $ year   : num -5.65 -5.65 -5.65 -5.65 -5.65 ...
 $ day    : num -178 -177 -176 -175 -174 ...
 $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ julday : num -2241 -2240 -2239 -2238 -2237 ...
 $ weekend: num -0.289 -0.289 -0.289 0.711 0.711 ...
 $ VM     : num 0.139 -1.451 0.349 0.839 -0.611 ...
 $ RH     : num 55.2 61.4 59.8 64.1 60.7 ...
 $ TT     : num -23.4 -23.6 -19.5 -16.1 -15.3 ...
 $ PP     : num 6.17 4.27 -4.93 -9.23 -2.63 ...
 $ RF     : Ord.factor w/ 3 levels "None"<"2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, means)= num

Any idea what I'm missing here?
Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] R^2 in loess and predict?
I do not agree with your interpretation of the adjusted R^2. The R^2 is no more than the ratio of the explained variance to the total variance, expressed in sums of squares. The adjusted R^2 is adjusted for the degrees of freedom, and can only be used for selection purposes. Its interpretation for the final model is hard, and it is definitely not a measure of how well the model describes the population. For a loess regression this can be calculated as well. But loess is a local regression technique, highly flexible, and highly dependent on the window you use. In these cases, R^2 (or any other goodness-of-fit measure) tells you even less: you can get an R^2 of 1 if you choose the window small enough. If you want to do inference on nonlinear regression techniques, I strongly suggest you use generalized additive models, e.g. from the package mgcv. There you can use the framework of likelihood ratio tests to assess goodness of fit by comparing models. Cheers Joris On Fri, Jul 9, 2010 at 10:42 AM, Ralf B ralf.bie...@gmail.com wrote: Parametric regression produces R^2 as a measure of how well the model predicts the sample and adjusted R^2 as a measure of how well it models the population. What is the equivalent for non-parametric regression (e.g. the loess function)? Ralf -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
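The point above about the window size can be illustrated numerically. The pseudo-R^2 below (one minus the residual over total sum of squares) is an assumed, illustrative definition of what "R^2 for loess" would mean, not an official loess statistic:

```r
# Illustrative sketch: a sum-of-squares pseudo-R^2 for loess rises as
# the span (the smoothing window) shrinks, as claimed in the reply.
# The pseudo_r2 definition is an assumption for illustration only.
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.5)

pseudo_r2 <- function(span) {
  fit <- loess(y ~ x, span = span)
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

pseudo_r2(0.75)   # wide window: moderate fit
pseudo_r2(0.10)   # narrow window: higher pseudo-R^2, nearly interpolating
```

Because shrinking the span always reduces the in-sample residuals, the statistic says more about the chosen window than about how well the model would describe the population.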
Re: [R] Non-parametric regression
Just to be correct: gam is mentioned on the page Tal linked to, but it is a semi-parametric approach using maximum likelihood. The advice stays valid, though. Another thing: you detected non-normality, but could you use a Poisson distribution, for example? The framework of generalized linear models and generalized additive models allows you to deal with non-normality of your data. In any case, I suggest you contact a statistician nearby for guidance. Cheers Joris On Fri, Jul 9, 2010 at 10:26 AM, Tal Galili tal.gal...@gmail.com wrote: From reviewing the first Google page of results for Non-parametric regression R, I hope this link will prove useful: http://socserv.mcmaster.ca/jfox/Courses/Oxford-2005/R-nonparametric-regression.html Contact Details: Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) On Fri, Jul 9, 2010 at 11:01 AM, Ralf B ralf.bie...@gmail.com wrote: I have two data sets, each a vector of 1000 numbers, each vector representing a distribution (i.e. 1000 numbers, each of which represents a frequency at one point on a scale between 1 and 1000). For simplification, here is a short version with only 5 points.

a <- c(8, 10, 8, 12, 4)
b <- c(7, 11, 8, 10, 5)

Leaving the obvious discussion about causality aside for a moment, I would like to see how well I can predict b from a using a regression. Since I do not know anything about the distribution type and already discovered non-normality, I cannot use parametric regression or anything GLM for that matter. How should I proceed in using non-parametric regression to model vector a and see how well it predicts b? Perhaps you could extend the given lines into a short example script to give me an idea? Are there any other options?
Best, Ralf -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
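The GAM suggestion in the reply above can be sketched with mgcv (assumed available, as it ships with R); the data, variable names, and the use of an approximate F-test via anova() are illustrative assumptions, not the poster's analysis:

```r
# Hedged sketch of the GAM route suggested above: fit two nested
# generalized additive models with mgcv and compare their fit with
# an approximate test via anova(), instead of relying on an R^2.
library(mgcv)
set.seed(3)
x1 <- runif(200)
x2 <- runif(200)
y  <- sin(2 * pi * x1) + rnorm(200, sd = 0.3)   # y depends on x1 only

m0 <- gam(y ~ s(x1))
m1 <- gam(y ~ s(x1) + s(x2))
anova(m0, m1, test = "F")   # does adding s(x2) improve the fit?
```

This model-comparison framework is what replaces the sample R^2 when judging whether extra smooth terms are warranted.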
Re: [R] split with list
One solution is to put these unwanted entries to "":

repor$`9853312`[1:2, 2:3] <- ""

Cheers Joris On Fri, Jul 9, 2010 at 12:18 PM, n.via...@libero.it n.via...@libero.it wrote: Dear List, I would like to ask you something concerning a nicer print of the R output. I have a big data frame with the following structure:

CFISCALE RAGSOCB ANNO VAR1 VAR2
9853312  astra   2005  6   45
9853312  astra   2006 78   45
9853312  astra   2007 55   76
9653421  geox    2005 35   89
9653421  geox    2006 24   33
9653421  geox    2007 54   55

The first thing I did was to split my data frame by CFISCALE. The result is that R transformed my data frame into a list. The second step was to transpose each element of my list:

repo <- split(rep, rep$CFISCALE)
repor <- lapply(repo, function(x){ t(x) })

When I print my list the format is the following:

$`9853312`
         1       2       3
CFISCALE 9853312 9853312 9853312
RAGSOCB  astra   astra   astra
ANNO     2005    2006    2007
VAR1     6       78      55
VAR2     45      45      76

Is there a way to remove the first row, I mean 1, 2, 3, and to have just one CFISCALE and RAGSOCB? For the second problem I tried to use unique, but it seems that it doesn't work for lists. So what I would like to get is:

$`9853312`
CFISCALE 9853312
RAGSOCB  astra
ANNO     2005 2006 2007
VAR1     6    78   55
VAR2     45   45   76

This is because I next run xtable on my list in order to get a table in LaTeX, which I would like to be in a nice format. Thanks a lot for your attention!
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
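[Editor's sketch, not from the thread: one way to get the layout the poster asks for is to keep the constant columns (CFISCALE, RAGSOCB) once per group and transpose only the year-varying columns. The data below are made up to mimic the structure in the question.]

```r
# Hypothetical data mimicking the poster's data frame
rep <- data.frame(
  CFISCALE = c(9853312, 9853312, 9853312, 9653421, 9653421, 9653421),
  RAGSOCB  = c("astra", "astra", "astra", "geox", "geox", "geox"),
  ANNO     = c(2005, 2006, 2007, 2005, 2006, 2007),
  VAR1     = c(6, 78, 55, 35, 24, 54),
  VAR2     = c(45, 45, 76, 89, 33, 55)
)
repor <- lapply(split(rep, rep$CFISCALE), function(x) {
  body <- t(x[, c("ANNO", "VAR1", "VAR2")])  # transpose only varying columns
  colnames(body) <- NULL                     # drops the 1, 2, 3 header
  list(CFISCALE = x$CFISCALE[1],             # constants kept once
       RAGSOCB  = x$RAGSOCB[1],
       values   = body)
})
repor[["9853312"]]
```

Each list element can then be fed to xtable piecewise (header line plus values matrix).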
Re: [R] random sample from arrays
Could you elaborate? Both x <- 1:4 set <- matrix(nrow = 50, ncol = 11) for(i in c(1:11)){ set[,i] <- sample(x,50) print(c(i, "-", set), quote = FALSE) } and x <- 1:4 set <- matrix(nrow = 50, ncol = 11) for(i in c(1:50)){ set[i,] <- sample(x,11) print(c(i, "-", set), quote = FALSE) } run perfectly fine on my computer. Cheers On Fri, Jul 9, 2010 at 3:10 PM, Assa Yeroslaviz fry...@gmail.com wrote: Hi Joris, I guess I did it wrong again, but your example didn't work either. I still get the error message, but replicate works just fine. I can even replicate the whole array lines. THX Assa On Thu, Jul 8, 2010 at 15:20, Joris Meys jorism...@gmail.com wrote: Don't know what exactly you're trying to do, but you make a matrix with 11 columns and 50 rows, then treat it as a vector. On top of that, you try to fill 50 rows/columns with 50 values. Of course that doesn't work. Did you check the warning messages when running the code? Either do : for(i in c(1:11)){ set[,i] <- sample(x,50) print(c(i, "-", set), quote = FALSE) } or for(i in c(1:50)){ set[i,] <- sample(x,11) print(c(i, "-", set), quote = FALSE) } Or just forget about the loop altogether and do : set <- replicate(11, sample(x,50)) or set <- t(replicate(50, sample(x,11))) cheers On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello R users, I'm trying to extract random samples from a big array I have. I have a data frame of over 40k lines and would like to produce around 50 random samples of around 200 lines each from this array. 
this is the matrix ID xxx_1c xxx__2c xxx__3c xxx__4c xxx__5T xxx__6T xxx__7T xxx__8T yyy_1c yyy_1c _2c 1 A_512 2.150295 2.681759 2.177138 2.142790 2.115344 2.013047 2.115634 2.189372 1.643328 1.563523 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322 3 A_152 2.063276 2.160961 2.067549 2.059732 2.656416 2.075775 2.033982 2.111937 1.606340 1.548940 4 A_163 9.570761 10.448615 9.432859 9.732615 10.354234 10.993279 9.160038 9.104121 10.079177 9.828757 5 A_184 3.574271 4.680859 4.517047 4.047096 3.623668 3.021356 3.559434 3.156093 4.308437 4.045098 6 A_199 7.593952 7.454087 7.513013 7.449552 7.345718 7.367068 7.410085 7.022582 7.668616 7.953706 ... I tried to do it with a for loop: genelist <- read.delim("/user/R/raw_data.txt") rownames(genelist) <- genelist[,1] genes <- rownames(genelist) x <- 1:4 set <- matrix(nrow = 50, ncol = 11) for(i in c(1:50)){ set[i] <- sample(x,50) print(c(i, "-", set), quote = FALSE) } which basically does the trick, but I just can't save the results outside the loop. After having the random sets of lines, it wasn't a problem to extract the lines from the arrays using subset. genSet1 <- sample(x,50) random1 <- genes %in% genSet1 subsetGenelist <- subset(genelist, random1) Is there a different way of creating these random vectors, or of saving the loop results outside the loop so I can work with them? Thanks a lot Assa [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
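[Editor's sketch of what the poster actually describes: drawing 50 random samples of 200 rows each from a large data frame. Rows can be sampled by index, so no cell-by-cell loop is needed. The data frame below is a made-up stand-in.]

```r
# Stand-in for the poster's ~40k-line data frame
set.seed(1)
genelist <- data.frame(ID    = paste0("A_", 1:40000),
                       value = rnorm(40000))
# 50 independent samples of 200 rows each, sampled by row index
samples <- lapply(1:50, function(i)
  genelist[sample(nrow(genelist), 200), ])
length(samples)      # number of sampled data frames
nrow(samples[[1]])   # rows per sample
```

Keeping the samples in a list (instead of printing inside a loop) is what makes them available after the loop finishes.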
Re: [R] relation in aggregated data
Depending on the data and the research question, a meta-analytic approach might be appropriate. You can see every campaign as a study. See the package metafor for an example. You can only draw very general conclusions, but at least your inference will be closer to correct. Cheers Joris On Thu, Jul 8, 2010 at 10:03 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Thank you Actually when I do this myself I always try to make day or week averages if possible. However this was done by one of my colleagues and basically the aggregation was done on the basis of campaigns. There are 4 to 6 campaigns per year, and sometimes there is an apparent relationship in the aggregated data and sometimes there is not. My opinion is that I can not say much about exact relations until I have other clues, like expected underlying laws of physics. Thanks again Best regards Petr Joris Meys jorism...@gmail.com napsal dne 07.07.2010 17:33:55: Your examples are pretty extreme... Combining 120 data points into 4 points is of course never going to give a result. Try : fac <- rep(1:8, each=15) xprum <- tapply(x, fac, mean) yprum <- tapply(y, fac, mean) plot(xprum, yprum) The relation is not obvious, but visible. Yes, you lose information. Yes, your hypothesis changes. But in the case you describe, averaging the x-values for every day (so you get an average linked to 1 y value) seems like a possibility, given you take that into account when formulating the hypothesis. Optimally, you should take the standard error of the average into account for the analysis, but this is complicated, often not done, and in most cases ignoring this issue does not influence the results to the extent that it becomes important. YMMV Cheers On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL petr.pi...@precheza.cz wrote: Dear all My question is more on statistics than on R, however it can be demonstrated by R. It is about the pros and cons of trying to find a relationship in aggregated data. 
I can have two variables which can be related, and I measure them regularly during some time (let's say a year), but I can not measure them at the same time (e.g. I can not measure x and the respective value of y; usually I have 3 or more values of x and only one value of y per day). I can make aggregated values (let's say quarterly). My questions are: 1. Is such an approach sound? Can I use it? 2. What could be the problems? 3. Is there any other method to inspect variables which can be related but which you can not directly measure at the same time? My opinion is that it is not very sound to inspect aggregated values, and there can be many traps, especially if there are only a few aggregated values. Below you can see my examples. If you have some opinion on this issue, please let me know. Best regards Petr Let us have a relation x/y set.seed(555) x <- rnorm(120) y <- 5*x+3+rnorm(120) plot(x, y) As you can see, there is a clear relation which can be seen from the plot. Now I make a factor for aggregation. fac <- rep(1:4, each=30) xprum <- tapply(x, fac, mean) yprum <- tapply(y, fac, mean) plot(xprum, yprum) The relationship is completely gone. Now let us make other fake data xn <- runif(120)*rep(1:4, each=30) yn <- runif(120)*rep(1:4, each=30) plot(xn, yn) There is no visible relation; xn and yn are independent but related to the aggregation factor. xprumn <- tapply(xn, fac, mean) yprumn <- tapply(yn, fac, mean) plot(xprumn, yprumn) Here you can see a perfect relation which is only due to the aggregation factor. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
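[Editor's sketch of the meta-analytic idea suggested in the thread: treat each campaign as its own "study", estimate the relation within each campaign, then pool the per-campaign estimates with inverse-variance weights. The metafor package does this properly via rma(); the base-R version below illustrates the mechanics on made-up data with a true slope of 5.]

```r
set.seed(42)
campaign <- rep(1:6, each = 20)         # six hypothetical campaigns
x <- rnorm(120)
y <- 5 * x + 3 + rnorm(120)
# slope estimate and its standard error within each campaign
per_campaign <- lapply(split(data.frame(x, y), campaign), function(d)
  summary(lm(y ~ x, data = d))$coefficients["x", c("Estimate", "Std. Error")])
est <- sapply(per_campaign, `[[`, "Estimate")
se  <- sapply(per_campaign, `[[`, "Std. Error")
w   <- 1 / se^2
pooled <- sum(w * est) / sum(w)         # fixed-effect pooled slope
pooled                                  # close to the true slope of 5
```

Unlike averaging x and y first, this keeps the within-campaign relation intact and only then combines the evidence across campaigns.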
Re: [R] package installation for Windows 7
Hi, I am running Windows 7 and R 2.11.1, and everything is installing just fine for me. Did you install R in the Program Files folder? If so, uninstall and try to re-install R in another folder (e.g. c:\R\R2.11.1\ like on my computer). I noticed in the past that the access control of Windows treats the Program Files folder differently from other folders. Cheers Joris On Thu, Jul 8, 2010 at 1:15 PM, Dave Bickel davidbickel.com+re...@gmail.com wrote: Neither biocLite nor the GUI menus can install packages on my system. Here is relevant output: version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 11.1 year 2010 month 05 day 31 svn rev 52157 language R version.string R version 2.11.1 (2010-05-31) source("http://bioconductor.org/biocLite.R") BioC_mirror = http://www.bioconductor.org Change using chooseBioCmirror(). biocLite("ALL") Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] ALL Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed source("http://bioconductor.org/biocLite.R") BioC_mirror = http://www.bioconductor.org Change using chooseBioCmirror(). Warning messages: 1: In safeSource() : Redefining ‘biocinstall’ 2: In safeSource() : Redefining ‘biocinstallPkgGroups’ 3: In safeSource() : Redefining ‘biocinstallRepos’ biocLite() Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] affy affydata affyPLM affyQCReport [5] annaffy annotate Biobase biomaRt [9] Biostrings DynDoc gcrma genefilter [13] geneplotter GenomicRanges hgu95av2.db limma [17] marray multtest vsn xtable Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) 
: argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed biocLite() Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] affy affydata affyPLM affyQCReport [5] annaffy annotate Biobase biomaRt [9] Biostrings DynDoc gcrma genefilter [13] geneplotter GenomicRanges hgu95av2.db limma [17] marray multtest vsn xtable Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed library(Biobase) Error in library(Biobase) : there is no package called 'Biobase' biocLite("Biobase") Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] Biobase Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed source("http://bioconductor.org/biocLite.R") BioC_mirror = http://www.bioconductor.org Change using chooseBioCmirror(). Warning messages: 1: In safeSource() : Redefining ‘biocinstall’ 2: In safeSource() : Redefining ‘biocinstallPkgGroups’ 3: In safeSource() : Redefining ‘biocinstallRepos’ biocLite("Biobase") Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] Biobase Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed utils:::menuInstallLocal() # Install package(s) from local zip files... Error in if (ok) { : missing value where TRUE/FALSE needed utils:::menuInstallPkgs() # Install package(s)... 
--- Please select a CRAN mirror for use in this session --- Error in if (ok) { : missing value where TRUE/FALSE needed I would appreciate any assistance. David -- David R. Bickel, PhD Associate Professor Ottawa Institute of Systems Biology Biochem., Micro. and I. Department Mathematics and Statistics Department University of Ottawa 451 Smyth Road Ottawa, Ontario K1H 8M5 http://www.statomics.com Office Tel: (613) 562-5800 ext. 8670 Office Fax: (613) 562-5185 Office Room: RGN 4510F (Follow the signs to the elevator, and take it to the fourth floor. Turn left and go all the way to the end of the hall, and enter the door to the OISB area.) Lab Tel.: (613) 562-5800 ext. 8304 Lab Room: RGN 4501T __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9
Re: [R] how to determine order of arima bt acf and pacf
Did you read these already? http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf http://www.barigozzi.eu/ARIMA.pdf Cheers Joris On Thu, Jul 8, 2010 at 11:45 AM, vijaysheegi vijay.she...@gmail.com wrote: Hi R community, I am new to R. I have some doubts on ACF and PACF. Applying acf and pacf to a data frame or table, how do I determine the ARIMA orders that suit my example? Please assist me on this, and also please give me a link for the same so that I can try to understand the solution. Thanks in advance vijaysheegi student -- View this message in context: http://r.789695.n4.nabble.com/how-to-determine-order-of-arima-bt-acf-and-pacf-tp2282053p2282053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
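[Editor's sketch, not from the thread: the usual workflow the references above describe is to inspect the ACF/PACF of the series and read off candidate orders. A simulated AR(2) series makes the pattern concrete.]

```r
set.seed(123)
# Simulate a stationary AR(2) process: y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + e_t
y <- arima.sim(model = list(ar = c(0.6, 0.3)), n = 500)
acf(y)     # tails off slowly, as expected for an AR process
pacf(y)    # roughly cuts off after lag 2, suggesting AR(2)
fit <- arima(y, order = c(2, 0, 0))
coef(fit)  # ar1, ar2 and the intercept
```

For an MA(q) process the roles reverse: the ACF cuts off after lag q while the PACF tails off.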
Re: [R] strsplit(dia ma, \\b) splits characterwise
I guess this is expected behaviour, although counterintuitive. \b represents an empty string indicating a word boundary, but it is coerced to character and is thus simply the empty string. This means the output you get is the same as strsplit("dia ma", "", perl=TRUE) [[1]] [1] "d" "i" "a" " " "m" "a" I'd use the separating character as split in strsplit, e.g. strsplit("dia ma", "\\s") [[1]] [1] "dia" "ma" If you need the space in the list as well, you'll have to work around it, I guess. test <- as.vector(gregexpr("\\b", "dia ma", perl=TRUE)[[1]]) test [1] 1 4 5 7 apply(embed(test,2), 1, function(x) substr("dia ma", x[2], x[1]-1)) [1] "dia" " " "ma" It would be nice if special characters like \b were recognized by strsplit as well, though. Cheers Joris On Thu, Jul 8, 2010 at 10:15 AM, Suharto Anggono Suharto Anggono suharto_angg...@yahoo.com wrote: \b is a word boundary. But, unexpectedly, strsplit("dia ma", "\\b") splits character by character. strsplit("dia ma", "\\b") [[1]] [1] "d" "i" "a" " " "m" "a" strsplit("dia ma", "\\b", perl=TRUE) [[1]] [1] "d" "i" "a" " " "m" "a" How can that be? This is the output of 'gregexpr'. gregexpr("\\b", "dia ma") [[1]] [1] 1 2 3 4 5 6 attr(,"match.length") [1] 0 0 0 0 0 0 gregexpr("\\b", "dia ma", perl=TRUE) [[1]] [1] 1 4 5 7 attr(,"match.length") [1] 0 0 0 0 The output from gregexpr("\\b", "dia ma", perl=TRUE) is what I expect. I expect 'strsplit' to split at those points. This is in Windows. R was installed from binary. sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base R 2.8.1 shows the same 'strsplit' behavior, but the behavior of default 'gregexpr' (i.e. perl=FALSE) is different. 
strsplit("dia ma", "\\b") [[1]] [1] "d" "i" "a" " " "m" "a" strsplit("dia ma", "\\b", perl=TRUE) [[1]] [1] "d" "i" "a" " " "m" "a" gregexpr("\\b", "dia ma") [[1]] [1] 1 4 5 7 attr(,"match.length") [1] 0 0 0 0 gregexpr("\\b", "dia ma", perl=TRUE) [[1]] [1] 1 4 5 7 attr(,"match.length") [1] 0 0 0 0 sessionInfo() R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
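[Editor's sketch of an alternative to the gregexpr/embed/substr workaround above: instead of splitting on the zero-width \b, extract the alternating word and non-word runs directly with regmatches(), which keeps the space as its own token.]

```r
x <- "dia ma"
# \w+ matches word runs, \W+ matches the separators between them
out <- regmatches(x, gregexpr("\\w+|\\W+", x))[[1]]
out   # "dia" " " "ma"
```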
Re: [R] package installation for Windows 7
As far as my experience goes, there is no need whatsoever to run R as an administrator to install any package, including BioConductor, provided you stay away from the Program Files folder. Cheers Joris On Thu, Jul 8, 2010 at 2:55 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 08/07/2010 7:15 AM, Dave Bickel wrote: Neither biocLite nor the GUI menus can install packages on my system. This is probably a permissions problem. Are you running R as an administrator when you try the install? If not, you won't be able to install to the default location, but you should be able to set up your own library somewhere else, and install there. The error message should be fixed; we shouldn't give Error in if (ok) { : missing value where TRUE/FALSE needed but I doubt if that's the source of your problem. BTW, in future Bioconductor questions should go to their list. In this case it looks as though it's not a BioC problem, but that just means there's no need to list all the biocLite.R material at the start: short simple bug reports are easier to deal with. Duncan Murdoch Here is relevant output: version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 11.1 year 2010 month 05 day 31 svn rev 52157 language R version.string R version 2.11.1 (2010-05-31) source("http://bioconductor.org/biocLite.R") BioC_mirror = http://www.bioconductor.org Change using chooseBioCmirror(). biocLite("ALL") Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] ALL Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed source("http://bioconductor.org/biocLite.R") BioC_mirror = http://www.bioconductor.org Change using chooseBioCmirror(). 
Warning messages: 1: In safeSource() : Redefining ‘biocinstall’ 2: In safeSource() : Redefining ‘biocinstallPkgGroups’ 3: In safeSource() : Redefining ‘biocinstallRepos’ biocLite() Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] affy affydata affyPLM affyQCReport [5] annaffy annotate Biobase biomaRt [9] Biostrings DynDoc gcrma genefilter [13] geneplotter GenomicRanges hgu95av2.db limma [17] marray multtest vsn xtable Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed biocLite() Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] affy affydata affyPLM affyQCReport [5] annaffy annotate Biobase biomaRt [9] Biostrings DynDoc gcrma genefilter [13] geneplotter GenomicRanges hgu95av2.db limma [17] marray multtest vsn xtable Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed library(Biobase) Error in library(Biobase) : there is no package called 'Biobase' biocLite("Biobase") Using R version 2.11.1, biocinstall version 2.6.7. Installing Bioconductor version 2.6 packages: [1] Biobase Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed source("http://bioconductor.org/biocLite.R") BioC_mirror = http://www.bioconductor.org Change using chooseBioCmirror(). Warning messages: 1: In safeSource() : Redefining ‘biocinstall’ 2: In safeSource() : Redefining ‘biocinstallPkgGroups’ 3: In safeSource() : Redefining ‘biocinstallRepos’ biocLite("Biobase") Using R version 2.11.1, biocinstall version 2.6.7. 
Installing Bioconductor version 2.6 packages: [1] Biobase Please wait... Warning in install.packages(pkgs = pkgs, repos = repos, ...) : argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11' Error in if (ok) { : missing value where TRUE/FALSE needed utils:::menuInstallLocal() # Install package(s) from local zip files... Error in if (ok) { : missing value where TRUE/FALSE needed utils:::menuInstallPkgs() # Install package(s)... --- Please select a CRAN mirror for use in this session --- Error in if (ok) { : missing value where TRUE/FALSE needed I would appreciate any assistance. David __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied
Re: [R] random sample from arrays
Don't know what exactly you're trying to do, but you make a matrix with 11 columns and 50 rows, then treat it as a vector. On top of that, you try to fill 50 rows/columns with 50 values. Of course that doesn't work. Did you check the warning messages when running the code? Either do : for(i in c(1:11)){ set[,i] <- sample(x,50) print(c(i, "-", set), quote = FALSE) } or for(i in c(1:50)){ set[i,] <- sample(x,11) print(c(i, "-", set), quote = FALSE) } Or just forget about the loop altogether and do : set <- replicate(11, sample(x,50)) or set <- t(replicate(50, sample(x,11))) cheers On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello R users, I'm trying to extract random samples from a big array I have. I have a data frame of over 40k lines and would like to produce around 50 random samples of around 200 lines each from this array. this is the matrix ID xxx_1c xxx__2c xxx__3c xxx__4c xxx__5T xxx__6T xxx__7T xxx__8T yyy_1c yyy_1c _2c 1 A_512 2.150295 2.681759 2.177138 2.142790 2.115344 2.013047 2.115634 2.189372 1.643328 1.563523 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322 3 A_152 2.063276 2.160961 2.067549 2.059732 2.656416 2.075775 2.033982 2.111937 1.606340 1.548940 4 A_163 9.570761 10.448615 9.432859 9.732615 10.354234 10.993279 9.160038 9.104121 10.079177 9.828757 5 A_184 3.574271 4.680859 4.517047 4.047096 3.623668 3.021356 3.559434 3.156093 4.308437 4.045098 6 A_199 7.593952 7.454087 7.513013 7.449552 7.345718 7.367068 7.410085 7.022582 7.668616 7.953706 ... I tried to do it with a for loop: genelist <- read.delim("/user/R/raw_data.txt") rownames(genelist) <- genelist[,1] genes <- rownames(genelist) x <- 1:4 set <- matrix(nrow = 50, ncol = 11) for(i in c(1:50)){ set[i] <- sample(x,50) print(c(i, "-", set), quote = FALSE) } which basically does the trick, but I just can't save the results outside the loop. 
After having the random sets of lines, it wasn't a problem to extract the lines from the arrays using subset. genSet1 <- sample(x,50) random1 <- genes %in% genSet1 subsetGenelist <- subset(genelist, random1) Is there a different way of creating these random vectors, or of saving the loop results outside the loop so I can work with them? Thanks a lot Assa [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using nlm or optim
Without data I can't check, but try : mle(nll, start=list(c=0.01, z=2.1, s=200), fixed=list(V=Var, M=Mean)) With a random dataset I get : Mean <- rnorm(136) Var <- 1 + rnorm(136)^2 mle(nll, start=list(c=0.01, z=2.1, s=200), fixed=list(V=Var, M=Mean)) Error in optim(start, f, method = method, hessian = TRUE, ...) : initial value in 'vmmin' is not finite This might be just a data problem, but again, I'm not sure. Cheers Joris On Thu, Jul 8, 2010 at 3:11 AM, Anita Narwani anitanarw...@gmail.com wrote: Hello, I am trying to use nlm to estimate the parameters that minimize the following function: Predict <- function(M,c,z){ + v = c*M^z + return(v) + } M is a variable and c and z are parameters to be estimated. I then write the negative loglikelihood function assuming normal errors: nll <- function(M,V,c,z,s){ n <- length(Mean) logl <- -.5*n*log(2*pi) - .5*n*log(s) - (1/(2*s))*sum((V-Predict(Mean,c,z))^2) return(-logl) } When I put the Mean and Variance (variables with 136 observations) into this function, and estimates for c, z, and s, it outputs the estimate for the normal negative loglikelihood given the data, so I know that this works. However, I am unable to use mle to estimate the parameters c, z, and s. I do not know how or where the data, i.e. Mean (M) and Variance (V), should enter into the mle function. I have tried variations on mle(nll, start=list(c=0.01, z=2.1, s=200)) including mle(nll, start=list(M=Mean, V=Var, c=0.01, z=2.1, s=200)) I keep getting errors and am quite certain that I just have a syntax error in the script because I don't know how to enter the variables into MLE. Thanks for your help, Anita. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
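[Editor's sketch, not the thread's exact code: another way to wire the data into stats4::mle is to let the negative log-likelihood close over the data vectors, so that mle only sees the parameters to be estimated. The data below are made up; names follow the thread.]

```r
library(stats4)
set.seed(1)
Mean <- abs(rnorm(136)) + 0.5                # stand-in for the real data
Var  <- 2 * Mean^1.8 + rnorm(136, sd = 0.2)
# negative log-likelihood: only parameters are arguments,
# Mean and Var are taken from the enclosing environment
nll <- function(c, z, s) {
  v <- c * Mean^z                            # the power-function model
  n <- length(Mean)
  -(-.5 * n * log(2 * pi) - .5 * n * log(s) -
      (1 / (2 * s)) * sum((Var - v)^2))
}
# bounded optimizer keeps s (the variance parameter) positive
fit <- mle(nll, start = list(c = 1, z = 1, s = 1),
           method = "L-BFGS-B",
           lower = c(c = 1e-3, z = 1e-3, s = 1e-3))
coef(fit)
```

The `fixed` argument of mle is meant for parameters held constant during optimization, not for passing data, which is why the attempts quoted above fail.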
Re: [R] package installation for Windows 7
On Thu, Jul 8, 2010 at 3:39 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 08/07/2010 9:26 AM, David Bickel wrote: Yes, the User into which I logged in before launching RGui is an Administrator. Correct, the problem is not limited to Bioconductor packages. On Windows 7, it's not enough to have the user be an administrator, you need to run programs specifically with administrative rights. You do this by right-clicking on the icon and choosing Run as administrator. The reason for this is that Program Files is a protected directory, and changes can only be made by programs that get administrator rights (which is not the same as running a program in an administrator account). Without those administrator rights, R cannot write to the default folder for the packages, as that one is also inside Program Files. Changing Rprofile.site also becomes a hassle, as do so many other tweaks. Actually, the only programs that really belong there are Microsoft-related applications. If there's no strict need to place a program there, I stay away from that folder as far as possible. Vista has the same problem, by the way, and obviously the same solutions. Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
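[Editor's sketch of the workaround discussed in this thread: keep a personal package library outside the protected Program Files tree so no administrator rights are needed. The path below is purely illustrative; adjust it to your own setup.]

```r
# Create a user-writable library and put it first on the search path
lib <- file.path(Sys.getenv("HOME"), "R", "my-library")
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))   # new installs land in `lib` first
.libPaths()[1]
# install.packages("Biobase", lib = lib)  # then install as usual
```

Putting the `.libPaths()` call in a startup file such as .Rprofile makes the personal library available in every session.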
Re: [R] Different goodness of fit tests leads to contradictory conclusions
The first two options are GOF tests for ungrouped data; the latter two can only be used for grouped data. In my experience, the G^2 test is more influenced by this than the X^2 test (it gives -wrongly- significant deviations more easily when used for ungrouped data). If you started from binary data, you can only use the Hosmer tests. Cheers Joris On Wed, Jul 7, 2010 at 4:00 PM, Kiyoshi Sasaki skiyoshi2...@yahoo.com wrote: I am trying to test goodness of fit for my logistic regression using several options as shown below: the Hosmer-Lemeshow test (whose function I borrowed from a previous post), the Hosmer-le Cessie omnibus lack-of-fit test (also borrowed from a previous post), the Pearson chi-square test, and the deviance test. All the tests, except the deviance test, produced p-values well above 0.05. Would anyone please teach me why the deviance test produces such a small p-value (0.001687886), suggesting lack of fit, while the other tests suggest good fit? Did I do something wrong? Thank you for your time and help! 
Kiyoshi mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = logit), data = no.NA, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) # Option 1: Hosmer-Lemeshow test mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = logit), data = no.NA, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) hosmerlem <- function (y, yhat, g = 10) { cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), include.lowest = T) obs <- xtabs(cbind(1 - y, y) ~ cutyhat) expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat) chisq <- sum((obs - expect)^2/expect) P <- 1 - pchisq(chisq, g - 2) c("X^2" = chisq, Df = g - 2, "P(Chi)" = P) } hosmerlem(no.NA$repcnd, fitted(mod.fit)) X^2 Df P(Chi) 7.8320107 8.000 0.4500497 # Option 2 - Hosmer-le Cessie omnibus lack of fit test: library(Design) lrm.GOF <- lrm(formula = no.NA$repcnd ~ no.NA$svl, data = no.NA, y = T, x = T) resid(lrm.GOF, type = "gof") Sum of squared errors Expected value|H0 SD Z P 48.3487115 48.3017399 0.1060826 0.4427829 0.6579228 # Option 3 - Pearson chi-square p-value: pp <- sum(resid(lrm.GOF, typ = "pearson")^2) 1-pchisq(pp, mod.fit$df.resid) [1] 0.4623282 # Option 4 - Deviance (G^2) test: 1-pchisq(mod.fit$deviance, mod.fit$df.resid) [1] 0.001687886 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
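Joris's point about ungrouped data can be illustrated with a small simulation (hypothetical data, not Kiyoshi's): even when the logistic model is exactly right, the residual deviance of ungrouped binary responses is not chi-squared distributed, so the G^2 "p-value" is not trustworthy.

```r
# Sketch with simulated data: the model below is exactly correct, yet the
# deviance-based p-value for ungrouped binary responses is meaningless,
# because the residual deviance is not chi-squared distributed in this case.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + 2 * x))   # true model: logit(p) = -1 + 2x
fit <- glm(y ~ x, family = binomial)
p.dev <- 1 - pchisq(deviance(fit), df.residual(fit))
p.dev  # do not interpret this as a goodness-of-fit p-value
```

With grouped data (replicated covariate patterns) the deviance is approximately chi-squared, which is the distinction drawn above.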
Re: [R] relation in aggregated data
Your examples are pretty extreme... Combining 120 data points into 4 points is of course never going to give a result. Try:

fac <- rep(1:8, each = 15)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
plot(xprum, yprum)

The relation is not obvious, but it is visible. Yes, you lose information. Yes, your hypothesis changes. But in the case you describe, averaging the x-values for every day (so you get one average linked to one y value) seems like a possibility, provided you take that into account when formulating the hypothesis. Optimally you should also take the standard error of the averages into account in the analysis, but this is complicated, often not done, and in most cases ignoring it does not influence the results to the extent that it becomes important. YMMV Cheers On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL petr.pi...@precheza.cz wrote: Dear all My question is more about statistics than about R, but it can be demonstrated in R. It is about the pros and cons of trying to find a relationship in aggregated data. I have two variables which may be related, and I measure them regularly over some period (say a year), but I cannot measure them at the same time (e.g. I cannot measure x and the respective value of y; usually I have 3 or more values of x and only one value of y per day). I can make aggregated values (say quarterly). My questions are: 1. Is such an approach sound? Can I use it? 2. What could the problems be? 3. Is there any other method to inspect variables which may be related but which you cannot measure at the same time? My opinion is that it is not very sound to inspect aggregated values, and that there can be many traps, especially if there are only a few aggregated values. Below you can see my examples. If you have an opinion on this issue, please let me know. Best regards Petr

Let us have a relation x/y:

set.seed(555)
x <- rnorm(120)
y <- 5*x + 3 + rnorm(120)
plot(x, y)

As you can see, there is a clear relation visible in the plot.
Now I make a factor for aggregation:

fac <- rep(1:4, each = 30)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
plot(xprum, yprum)

The relationship is completely gone. Now let us make other, fake data:

xn <- runif(120)*rep(1:4, each = 30)
yn <- runif(120)*rep(1:4, each = 30)
plot(xn, yn)

There is no visible relation; xn and yn are independent but both related to the aggregation factor.

xprumn <- tapply(xn, fac, mean)
yprumn <- tapply(yn, fac, mean)
plot(xprumn, yprumn)

Here you can see a perfect relation which is due only to the aggregation factor. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
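The two effects Petr describes (a real relation washed out, and a spurious one created) can also be checked numerically rather than by eye; a minimal sketch along the lines of his simulation and Joris's 8-group suggestion:

```r
# Correlation of the raw data versus coarser and finer aggregates.
set.seed(555)
x <- rnorm(120)
y <- 5 * x + 3 + rnorm(120)
cor(x, y)                                    # strong relation in the raw data
fac4 <- rep(1:4, each = 30)                  # 4 coarse aggregates
cor(tapply(x, fac4, mean), tapply(y, fac4, mean))
fac8 <- rep(1:8, each = 15)                  # 8 finer aggregates
cor(tapply(x, fac8, mean), tapply(y, fac8, mean))
```

With only 4 (or even 8) aggregated points the correlation estimate is extremely unstable, which is exactly why the picture changes so radically between the plots.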
Re: [R] problems with write.table, involving loops paste statement
To be precise: according to my testing, everything is written to every folder, so each folder ends up holding only the last chunk. There is absolutely no need whatsoever to use l_ply here. And in any case, take as much as possible outside the loop, like the library() call and the computation of max.col. The following is _a_ solution - not the most optimal, but as close as feasible to your code:

A_var_df <- data.frame(index = 1:length(seq(1.0, -0.9, by = -0.1)),
                       from = seq(1.0, -0.9, by = -0.1),
                       to = seq(0.9, -1.0, by = -0.1))
# I make some data and make sure I can adjust the number of dirs and the steps
Dchr1 <- matrix(rep(1:100, each = 10), ncol = 100)
dirs <- 20
max.col <- ncol(Dchr1)
steps <- ceiling(max.col/dirs)
cols <- seq(1, max.col, by = steps)
for (i in 1:length(A_var_df[,1])) {
  k <- cols[i]
  print(as.data.frame(Dchr1[, k:min(k + steps, max.col)]))
  print(paste("./A_", A_var_df[i,1], "/k.csv", sep = ""))
  # I use print here just to show which data frame goes to which directory;
  # you can reconstruct the write.table call yourself.
}

Cheers Joris On Wed, Jul 7, 2010 at 9:32 PM, CC turtysm...@gmail.com wrote: Hi! I want to write portions of my data (3573 columns at a time) to the twenty folders I have available, titled A_1 to A_20, such that the first 3573 columns go to folder A_1, the next 3573 to folder A_2, and so on. The code below ensures that the data is written into all 20 folders, but only the last iteration of the loop (the last 3573 columns) is being written into ALL of the folders (A_1 to A_20), rather than in the sequential order that I would like. How can I fix this? Thank you!
Code:

A_var_df <- data.frame(index = 1:length(seq(1.0, -0.9, by = -0.1)),
                       from = seq(1.0, -0.9, by = -0.1),
                       to = seq(0.9, -1.0, by = -0.1))
for (i in 1:length(A_var_df[,1])) {
  library(plyr)
  max.col <- ncol(Dchr1)
  l_ply(seq(1, max.col, by = 3573), function(k)
    write.table(as.data.frame(Dchr1[, k:min(k + 3572, max.col)]),
                paste("./A_", A_var_df[i,1], "/k.csv", sep = ""),
                sep = ",", row.names = FALSE, quote = FALSE)
  )
}

-- Thanks, CC -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
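The underlying bug is easy to see without touching the file system: for every i, the inner l_ply runs over all chunks, so folder A_i receives every chunk, each overwriting the previous one under the same name. A sketch of the intended one-chunk-per-directory pairing, with a toy matrix and hypothetical sizes (the real data uses 3573-column chunks):

```r
# Toy stand-in for the real Dchr1: each directory index i gets exactly
# one contiguous block of columns.
Dchr1  <- matrix(seq_len(10 * 100), nrow = 10)   # 10 x 100 toy matrix
dirs   <- 20
steps  <- ceiling(ncol(Dchr1) / dirs)            # columns per directory
starts <- seq(1, ncol(Dchr1), by = steps)
for (i in seq_along(starts)) {
  k <- starts[i]
  chunk <- Dchr1[, k:min(k + steps - 1, ncol(Dchr1)), drop = FALSE]
  fname <- paste("./A_", i, "/chunk.csv", sep = "")
  # write.table(as.data.frame(chunk), fname, sep = ",",
  #             row.names = FALSE, quote = FALSE)
}
```

Note that `k + steps - 1` avoids the one-column overlap that `k + steps` would produce.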
Re: [R] when all I have is a contingency table....
See Ted's answer for the histogram (I'd go with barplot()). For most statistical procedures there is a weighted version (e.g. weighted.mean() for the mean), and your counts are valid weights for most procedures. Cheers Joris On Wed, Jul 7, 2010 at 10:39 PM, Andrei Zorine zoav1...@gmail.com wrote: Hello, I just need a hint here: suppose I have no raw data, only a frequency table, and I want to run basic statistical procedures on it: a histogram, descriptive statistics, etc. How do I do this with R? For example, how do I plot a histogram for this table, for a sample of size 60?

Value Count
    1    10
    2     8
    3    12
    4     9
    5    14
    6     7

Thanks, A.Z. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
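For the table in the question, a minimal sketch of both suggestions (barplot of the counts, and counts used as weights):

```r
# Frequency table from the question (values 1..6, 60 observations total).
tab <- data.frame(value = 1:6, count = c(10, 8, 12, 9, 14, 7))
barplot(tab$count, names.arg = tab$value)     # bar chart of the frequencies
m <- weighted.mean(tab$value, w = tab$count)  # mean of the 60 observations
# or expand the table back into raw data when a procedure has no weights:
raw <- rep(tab$value, tab$count)
c(m, mean(raw))  # both 3.5
```

Expanding with rep() is the general fallback: any procedure that expects raw data can then be applied directly.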
Re: [R] acf
Your question is a bit confusing. acfresidfit is an object whose origin we don't know. With your test file I arrive at the first set of correlations (but with integer lag headings):

residfit <- read.table("fileree2_test_out.txt")
acf(residfit)
acfresid <- acf(residfit)
acfresid
Autocorrelations of series 'residfit', by lag
     0      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16
 1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042  0.078 -0.029  0.028 -0.016 -0.021 -0.109
    17     18     19     20     21     22     23     24     25
 0.000 -0.038 -0.006  0.015 -0.032 -0.002  0.014 -0.226 -0.030

Could you please check where the object acfresidfit comes from and how you generated it? Cheers Joris On Tue, Jul 6, 2010 at 9:47 AM, nuncio m nunci...@gmail.com wrote: Hi list, I have the following code to compute the acf of a time series: acfresid <- acf(residfit), where residfit is the series. When I type acfresid at the prompt the following is displayed:

Autocorrelations of series 'residfit', by lag
0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
 1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042
0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500
 0.078 -0.029  0.028 -0.016 -0.021 -0.109  0.000 -0.038 -0.006  0.015 -0.032
1.8333 1.9167 2.0000 2.0833
-0.002  0.014 -0.226 -0.030

residfit is a time series object at monthly interval (0.0833). Here I understand R computed the correlation at lags 0 to 2 years. What is surprising to me is that if I type acfresidfit at the prompt the following is displayed:

Autocorrelations of series 'residfit', by lag
     0      1      2      3      4      5      6      7      8      9     10
 1.000 -0.004  0.011  0.041 -0.056  0.019 -0.052 -0.027 -0.008 -0.012 -0.034
    11     12     13     14     15     16     17     18     19     20     21
 0.024 -0.005  0.006 -0.045  0.031 -0.035 -0.011 -0.021 -0.020 -0.010 -0.007
    22     23     24     25
-0.038  0.017  0.051  0.038

From the headers I understand both are autocorrelations computed at the same lags, but the correlations are different. Where am I going wrong, and which is the correct one?
The file residfit is also attached (filename: fileree2_test_out.txt). Thanks nuncio -- Nuncio.M Research Scientist National Center for Antarctic and Ocean research Head land Sada Vasco da Gamma Goa-403804 -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
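The difference in lag headings (0, 1, 2, ... versus 0, 0.0833, ...) is purely a matter of the tsp attribute of the input; the correlations themselves do not depend on it. A small sketch with simulated data:

```r
# Same data, once as a plain vector and once as a monthly ts object:
# only the lag labels differ, the autocorrelations are identical.
set.seed(42)
z  <- rnorm(120)
a1 <- acf(z, plot = FALSE)                      # plain vector: lags 0, 1, 2, ...
a2 <- acf(ts(z, frequency = 12), plot = FALSE)  # monthly series: lags in years
c(a1$lag[2], a2$lag[2])                         # 1 versus 1/12 = 0.0833
all.equal(as.vector(a1$acf), as.vector(a2$acf)) # TRUE: same correlations
```

So if two acf objects report different correlations, as in the question, they must have been computed from different data, which is why Joris asks where acfresidfit comes from.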
Re: [R] timeseries
Difficult to guess why, but I reckon you should use ts() instead of as.ts(); otherwise, set the tsp attribute correctly. E.g.:

x <- cumsum(1 + round(rnorm(20), 2))
as.ts(x)
Time Series:
Start = 1
End = 20
Frequency = 1
 [1]  0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

tseries <- ts(x, frequency = 12, start = c(2004, 3))
tseries
        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004                0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005  10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

tsp(x) <- c(2004 + 2/12, 2005.75, 12)
as.ts(x)
        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004                0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005  10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

See ?ts to get the options right. I suggest using the function ts() rather than assigning the tsp attribute. Cheers Joris On Mon, Jul 5, 2010 at 9:35 AM, nuncio m nunci...@gmail.com wrote: Dear useRs, I am trying to construct a time series using the as.ts function. Surprisingly, when I plot the data, the x axis does not show the time in years; however, if I use ts(data), the time in years is shown on the x axis. Why do the two commands give different results? Thanks nuncio -- Nuncio.M Research Scientist National Center for Antarctic and Ocean research Head land Sada Vasco da Gamma Goa-403804
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculation on series with different time-steps
Look at ?ifelse and ?abs, e.g.:

data_frame$new_column_in_dataframe <- ifelse(
  stage$julian_day == baro$julian_day & abs(stage$time - baro$hour) <= 30,
  stage$stage.cm - baro$pressure,
  NA
)

Cheers Joris On Thu, Jul 1, 2010 at 10:09 PM, Jeana Lee jeana@colorado.edu wrote: Hello, I have two series, one with stream stage measurements every 5 minutes, and the other with barometric pressure measurements every hour. I want to subtract each barometric pressure measurement from the 12 stage measurements closest in time to it (6 stage measurements on either side of the hour). I want to do something like the following, but I don't know the syntax. If the Julian day of the stage measurement is equal to the Julian day of the pressure measurement, AND the absolute value of the difference between the time of the stage measurement and the hour of the pressure measurement is less than or equal to 30 minutes, then subtract the pressure measurement from the stage measurement (and put it in a new column in the stage data frame). if (stage$julian_day == baro$julian_day and |stage$time - baro$hour| <= 30) then (stage$stage.cm - baro$pressure) Can you help me? Thanks, JL -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] Help With ANOVA
We're missing the Samp1 etc. needed to be able to test the code. Where did you get the other p-value? Cheers Joris On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel amitrh...@yahoo.co.uk wrote: Hi, I needed some help with ANOVA. I have a problem with my ANOVA analysis: I have a dataset with a known ANOVA p-value, but I cannot seem to re-create it in R. I have created a list (zzzanova) which contains 1) intensity values, 2) group number (6 different groups), and 3) sample number (54 different samples); this is created by the script in Appendix 1. I then conduct the ANOVA with the command

zzz.aov <- aov(Intensity ~ Group, data = zzzanova)

I get a p-value of Pr(>F) 0.9483218. The expected p-value is 0.00490, so I feel I may be using ANOVA incorrectly or have put in a wrong formula. I am trying to do an ANOVA analysis across all 6 groups. Is there something wrong with my formula? I think I have made a mistake in the formula rather than anything else.

APPENDIX 1

datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
              3.003749, -4.60517, 2.045314, 2.482557, -4.60517, -4.60517, -4.60517,
              -4.60517, 1.592743, -4.60517, -4.60517, 0.91328, -4.60517, -4.60517,
              1.827744, 2.457795, 0.355075, -4.60517, 2.39127, 2.016987, 2.319903,
              1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427,
              1.973118, -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816,
              -4.60517, 0.023703, -4.60517, 2.043382, 1.070586, 2.768289, 1.085169,
              0.959334, -0.02428, -4.60517, 1.371895, 1.533227)

zzzanova <- structure(list(
  Intensity = c(t(Samp1), t(Samp2), t(Samp3), t(Samp4)),
  Group = structure(c(1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,3,
                      4,4,4,4,4,4,4,4,4,4, 5,5,5,5,5,5,5,5,5, 6,6,6,6,6,6,6,6,6),
                    .Label = c("Group1", "Group2", "Group3", "Group4", "Group5", "Group6"),
                    class = "factor"),
  Sample = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
                       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
                       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                       51, 52, 53, 54))),
  .Names = c("Intensity", "Group", "Sample"),
  row.names = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
                39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54),
  class = "data.frame")

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] Help With ANOVA
I still can't reproduce your example. The aov output gives me the following:

anova(aov(Intensity ~ Group, data = zzzanova))
Analysis of Variance Table

Response: Intensity
          Df Sum Sq Mean Sq F value  Pr(>F)
Group      5  98.85  19.771  2.1469 0.07576 .
Residuals 48 442.03   9.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next to that, I noticed you have truncated data, which has implications for the analysis as well. If you use a Kruskal-Wallis test, the p-value becomes larger:

kruskal.test(Intensity ~ Group, data = zzzanova)

        Kruskal-Wallis rank sum test

data:  Intensity by Group
Kruskal-Wallis chi-squared = 6.6955, df = 5, p-value = 0.2443

which is to be expected, as you have almost 50% truncated data. So a p-value of 0.005 seems very wrong to me. Cheers Joris On Tue, Jul 6, 2010 at 7:11 PM, Amit Patel amitrh...@yahoo.co.uk wrote: Hi Joris, sorry, I had a misprint in the appendix code in the last email:

datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
              3.003749, -4.60517, 2.045314, 2.482557, -4.60517, -4.60517, -4.60517,
              -4.60517, 1.592743, -4.60517, -4.60517, 0.91328, -4.60517, -4.60517,
              1.827744, 2.457795, 0.355075, -4.60517, 2.39127, 2.016987, 2.319903,
              1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427,
              1.973118, -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816,
              -4.60517, 0.023703, -4.60517, 2.043382, 1.070586, 2.768289, 1.085169,
              0.959334, -0.02428, -4.60517, 1.371895, 1.533227)

zzzanova <- structure(list(
  Intensity = datalist,
  Group = structure(c(1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,3,
                      4,4,4,4,4,4,4,4,4,4, 5,5,5,5,5,5,5,5,5, 6,6,6,6,6,6,6,6,6),
                    .Label = c("Group1", "Group2", "Group3", "Group4", "Group5", "Group6"),
                    class = "factor"),
  Sample = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
                       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
                       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                       51, 52, 53, 54))),
  .Names = c("Intensity", "Group", "Sample"),
  row.names = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
                39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54),
  class = "data.frame")

Thanks for your reply
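The truncation issue Joris raises generalizes beyond this thread; a hedged sketch on simulated data (note that -4.60517 is log(0.01), i.e. a detection limit on the log scale):

```r
# Simulated example of the caveat: a mass of observations stuck at a floor
# value breaks the normality assumption behind aov(); comparing against a
# rank-based test is a useful sanity check.
set.seed(7)
g <- gl(3, 20)                       # 3 groups of 20
v <- rnorm(60, mean = as.numeric(g)) # group means 1, 2, 3
v[v < 0.5] <- 0.5                    # truncate at a "detection limit"
anova(aov(v ~ g))                    # parametric one-way ANOVA
kruskal.test(v ~ g)                  # rank-based alternative
```

When the two tests disagree badly, the assumption violation, not the smaller p-value, should get the attention.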
Re: [R] PCA and Regression
PCA components are orthogonal by definition, so no, that doesn't make sense at all. Do yourself a favor and get a book on multivariate data analysis. Two books come to mind: obviously the one by Hair and colleagues, called Multivariate Data Analysis, easily found in a university library (or on the internet); the second is An R and S-Plus Companion to Multivariate Analysis by Everitt. You have less chance of finding that one in the library, but it is for sale online. This is a nice introduction as well: https://netfiles.uiuc.edu/miguez/www/Teaching/MultivariateRGGobi.pdf Never use a chainsaw without reading the manual. Cheers Joris On Tue, Jul 6, 2010 at 9:07 PM, Marino Taussig De Bodonia, Agnese agnese.marin...@imperial.ac.uk wrote: Hello, I am currently analyzing responses to questionnaires about general attitudes. I have performed a PCA on my data and have retained two principal components. Now I would like to use the scores of both principal components in a multiple regression. I would like to know if it makes sense to use the scores of one principal component to explain the variance in the scores of another principal component: lm(scores of principal component 1 ~ scores of principal component 2 + age + gender + etc.). Please could you let me know if this is statistically sound? Thank you in advance for your time, Agnese
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
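Joris's orthogonality point can be verified in a few lines (simulated data; prcomp from base stats): scores on distinct principal components are uncorrelated by construction, so one set of scores has, by design, nothing to explain about the other.

```r
# PC scores from prcomp() are centered and mutually orthogonal, hence
# their correlation is zero up to floating-point noise.
set.seed(1)
dat <- matrix(rnorm(100 * 4), ncol = 4)
pr  <- prcomp(dat)
cor(pr$x[, 1], pr$x[, 2])   # zero up to floating-point noise
```

Regressing PC1 scores on age, gender, and other external covariates is a different (and legitimate) question; it is only the PC2-explains-PC1 part that is vacuous.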
[R] S4 classes and debugging - Is there a summary?
Dear all, I'm getting more and more frustrated with the whole S4 thing, and I'm looking for a more advanced summary of how to deal with it. Often I get error messages that don't make sense at all, or the code is not doing what I think it should do. Far too often, inspecting the code requires me to go to the source, which doesn't make it easy to find the bit of code you're interested in. Getting the code with getAnywhere() doesn't always work. debug() doesn't work. Using trace() and browser() is not an option, as I can't find the correct method. E.g.:

library(raster)
getAnywhere(xyValues)
A single object matching 'xyValues' was found
It was found in the following places
  package:raster
  namespace:raster
with value

standardGeneric for "xyValues" defined from package "raster"

function (object, xy, ...)
standardGeneric("xyValues")
<environment: 0x04daf14c>
Methods may be defined for arguments: object, xy
Use showMethods("xyValues") for currently available ones.

showMethods("xyValues")
Function: xyValues (package raster)
object="Raster", xy="data.frame"
object="Raster", xy="SpatialPoints"
object="Raster", xy="vector"
object="RasterLayer", xy="matrix"
object="RasterStackBrick", xy="matrix"

And now...? Is there an overview that actually explains how to get the information you're looking for without strolling through the complete source? Sorry if I sound frustrated, but this is costing me huge amounts of time, to the extent that I'd rather write a custom function than use one from an S4 package if I'm not absolutely sure about what it does and how it achieves it.
Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] S4 classes and debugging - Is there a summary?
Correction: trace() of course works when specifying the signature, so e.g.

trace(xyValues, tracer = browser, signature = c(object = "RasterLayer", xy = "matrix"))

allows me to browse through the function. I should have specified the signature; I overlooked that. Still, if there's a manual on S4 that anybody would like to mention, please go ahead. I find bits and pieces on the internet, but I haven't found an overview yet that deals with all the peculiarities of these classes. Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] S4 classes and debugging - Is there a summary?
On Fri, Jul 2, 2010 at 4:01 PM, Martin Morgan mtmor...@fhcrc.org wrote: selectMethod(xyValues, c('RasterLayer', 'matrix')) would be my choice. Thanks, that's a discovery. I guess I misunderstood the help pages on that one. I don't really have the right experience, but Chambers' 2008 Software for Data Analysis... and Gentleman's 2008 R Programming for Bioinformatics... books would be where I'd start. ?Methods and ?Classes are, I think, under-used. I did read the posting guide ;-) but I have to admit the help pages are not always that clear, which is one of the reasons why S4 gets me frustrated so easily. I also have Chambers' book, and Gentleman's book is here at the department as well. I'll go through the relevant chapters again. Thanks Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
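For readers without the raster package at hand, the showMethods()/selectMethod() machinery discussed in this thread can be explored with a self-contained toy class; all class, generic, and slot names below are made up for illustration:

```r
library(methods)

# A tiny S4 class with one slot, one generic, and one method.
setClass("Grid", representation(cells = "numeric"))
setGeneric("cellValues", function(object, xy, ...) standardGeneric("cellValues"))
setMethod("cellValues", signature(object = "Grid", xy = "numeric"),
          function(object, xy, ...) object@cells[xy])

showMethods("cellValues")    # lists the available signatures
m <- selectMethod("cellValues", signature("Grid", "numeric"))
m                            # the actual method body, ready for inspection

g <- new("Grid", cells = c(10, 20, 30))
cellValues(g, 2)  # 20
```

Once selectMethod() has handed you the method, trace() with the same signature (as in the correction above) drops you into it with browser().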
Re: [R] left end or right end
First of all, read the posting guide carefully: http://www.R-project.org/posting-guide.html Your question is far from clear. When you say that the lengths of P and Q are different, do you mean the length of the data or the difference between start and end? That makes a world of difference. Regarding the statistical test, that depends on what your data represent. Is it possible, for example, for P to fall close to both the left and the right end of Q? You should also specify which test you want to use; then people on the list will be able to tell you whether it is available in R. You can of course construct your own test with the tools R provides, but again, this requires a lot more information. Next to that, the list is actually not intended for statistical advice, but for advice regarding R code. Maybe somebody will join in with some statistical guidance, but if you don't know what to do, you had better consult a statistician at your department. Cheers Joris On Thu, Jul 1, 2010 at 1:53 PM, ravikumar sukumar ravikumarsuku...@gmail.com wrote: Dear all, I am a biologist. I have two sets of distances, P(start1, end1) and Q(start2, end2). I want to know whether P falls close to the right end or the left end of Q. P and Q are of different lengths for each data point, and there is more than one pair of P and Q. Is there any test or function in R to reach a statistically sound conclusion? Thanks for all, Suku
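For what it's worth, the "which end of Q is P near" part can be quantified before any testing. A minimal sketch, where the function name rel_pos and the midpoint summary are hypothetical choices of mine, not anything from the thread:

```r
# Hypothetical sketch: position of the midpoint of P relative to Q.
# Values near 0 mean "at the left end of Q", near 1 mean "at the right end".
rel_pos <- function(start1, end1, start2, end2) {
  mid_P <- (start1 + end1) / 2
  (mid_P - start2) / (end2 - start2)
}
rel_pos(8, 10, 0, 10)   # 0.9, i.e. near the right end of Q
```

Applied over all pairs, such a score would at least turn the question into something a statistician can test.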
Re: [R] run R
Welcome to a new world. You're using the _programming language_ R with thousands of packages, and first of all you should read the introduction material thoroughly : http://cran.r-project.org/doc/manuals/R-intro.pdf http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf Regarding your question : check the help file of source : ?source If you have a script, say foo.R, in a folder c:\bar, do : source("c:/bar/foo.R") Mind the slashes instead of the backslashes. I assume you're on a Windows system... Cheers Joris On Wed, Jun 30, 2010 at 3:08 PM, jorge.conr...@cptec.inpe.br wrote: Hi, I'm starting to use the R package and I have some .R scripts. How can I run these .R scripts? Conrado __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
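A self-contained illustration of the source() advice, using a temporary file in place of the hypothetical c:/bar/foo.R:

```r
# Write a tiny script to a temporary file and run it with source().
tmp <- tempfile(fileext = ".R")
writeLines('msg <- "hello from the sourced script"', tmp)
source(tmp)   # evaluates the script in the current workspace
msg           # the object created by the script is now available
```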
[R] Use of processor by R 32bit on a 64bit machine
Dear all, I've recently purchased a new 64bit system with an Intel i7 quadcore processor. As I understood (maybe wrongly) that to date the 32bit version of R is more stable than the 64bit, I installed the 32bit version and have been happily using it ever since. Now I'm running a whole lot of models, which goes smoothly, and I thought out of curiosity to check how much of the processor I'm using. I would have thought I used 25% (being one core), as on my old dual core R uses 50% of the total processor capacity. Funny enough, it turns out that R is currently using only 12-13% of my CPU, which is about half of what I expected. Did I miss something somewhere? Should I change some settings? I'm running on Windows 7 Enterprise. I looked around already, but I have the feeling I overlooked something. Cheers Joris sessionInfo() R version 2.10.1 (2009-12-14) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] grDevices datasets splines graphics stats tcltk utils methods base other attached packages: [1] svSocket_0.9-48 TinnR_1.0.3 R2HTML_2.0.0 Hmisc_3.7-0 survival_2.35-7 loaded via a namespace (and not attached): [1] cluster_1.12.3 grid_2.10.1 lattice_0.18-3 svMisc_0.9-57 tools_2.10.1 -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Use of processor by R 32bit on a 64bit machine
*slaps forehead* Thanks. So out it goes, that hyperthreading. Who invented hyperthreading on a quad-core anyway? Cheers Joris 2010/6/29 Uwe Ligges lig...@statistik.tu-dortmund.de: On 29.06.2010 15:30, Joris Meys wrote: Dear all, I've recently purchased a new 64bit system with an intel i7 quadcore processor. As I understood (maybe wrongly) that to date the 32bit version of R is more stable than the 64bit, I installed the 32bit version and am happily using it ever since. Now I'm running a whole lot of models, which goes smoothly, and I thought out of curiosity to check how much processor I'm using. I would have thought I used 25% (being one core), as on my old dual core R uses 50% of the total processor capacity. Funny, it turns out that R is currently using only 12-13% of my cpu, which is about half of what I expected. An Intel Core i7 Quadcore has 8 virtual cores since it supports hyperthreading. R uses one of these virtual cores. Note that 2 virtual cores won't be twice as fast since they are running on the same physical core. Hence this is expected. Uwe Ligges Did I miss something somewhere? Should I change some settings? I'm running on a Windows 7 enterprise. I looked around already, but I have the feeling I overlooked something. 
Re: [R] (New) Popularity of R, SAS, SPSS, Stata...
Dear Robert, I've tried to access that link, but to no avail. It seems the server r4stats.com is down, as it doesn't respond. This link got me to the site : http://sites.google.com/site/r4statistics/popularity Cheers Joris On Mon, Jun 28, 2010 at 3:52 PM, Muenchen, Robert A (Bob) muenc...@utk.edu wrote: Greetings Listserv Readers, At http://r4stats.com/popularity I have added plots, data, and/or discussion of: 1. Scholarly impact of each package across the years 2. The number of subscribers to some of the listservs 3. How popular each package is among Google searches across the years 4. Survey results from a Rexer Analytics poll 5. Survey results from a KDnuggets poll 6. A rudimentary analysis of the software skills that employers are seeking Thanks very much to all the folks who helped on this project including: John Fox, Marc Schwartz, Duncan Murdoch, Martin Weiss, John (Jiangtang) HU, Andre Wielki, Kjetil Halvorsen, Dario Solari, Joris Meys, Keo Ormsby, Karl Rexer, and Gregory Piatetsky-Shapiro. If anyone can think of other angles, please let me know. Cheers, Bob = Bob Muenchen (pronounced Min'-chen), Manager Research Computing Support Voice: (865) 974-5230 Email: muenc...@utk.edu Web: http://oit.utk.edu/research, News: http://oit.utk.edu/research/news.php Feedback: http://oit.utk.edu/feedback/ = __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Different standard errors from R and other software
If I understand correctly from their website, discrete choice models are mostly generalized linear models with the common link functions for discrete data? Apart from a few names I didn't recognize, all analyses seem quite standard to me. So I wonder why you would write the log-likelihood yourself for techniques that are implemented in R. Unless I missed something pretty important, or you want to do a specific analysis that wasn't clear to me, you should take a closer look at the possibilities in R for generalized linear (mixed) modelling and so on. Binary choice translates to a simple glm with a logit link. Multinomial choice can be done with e.g. multinom() from the nnet package. Ordered choice can be done with polr() from the MASS package. Nice ones to look at are the packages mgcv, or gamm4 in the case of big datasets. They offer very flexible models that can include random terms, specific variance-covariance structures and non-linear relations in the form of splines. Apologies if this is all obvious and known to you. In that case you might want to specify what exactly it is you are comparing and how exactly you calculated it yourself. Cheers Joris On Fri, Jun 25, 2010 at 11:47 PM, Min Chen chenmin0...@gmail.com wrote: Hi all, Sorry to bother you. I'm estimating a discrete choice model in R using the maxBFGS command. Since I wrote the log-likelihood myself, in order to double check, I ran the same model in Limdep. It turns out that the coefficient estimates are quite close; however, the standard errors are very different. I also computed the Hessian and outer product of the gradients in R using the numDeriv package, but the results are still very different from those in Limdep. Is it the routine to compute the inverse Hessian that causes the difference? Thank you very much! Best wishes. Min -- Min Chen Ph.D.
Candidate Department of Agricultural, Food, and Resource Economics 125 Cook Hall Michigan State University [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
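The three model types mentioned above can be sketched on toy data like this (nnet and MASS ship with a standard R installation; the data here are simulated, not from the original question):

```r
set.seed(1)
x <- rnorm(100)

# binary choice: logistic regression via glm()
y_bin <- rbinom(100, 1, plogis(x))
fit_bin <- glm(y_bin ~ x, family = binomial(link = "logit"))

# multinomial choice via nnet::multinom()
library(nnet)
y_mult <- factor(sample(c("a", "b", "c"), 100, replace = TRUE))
fit_mult <- multinom(y_mult ~ x, trace = FALSE)

# ordered choice via MASS::polr() (response must be an ordered factor)
library(MASS)
y_ord <- cut(x + rnorm(100), 3, ordered_result = TRUE)
fit_ord <- polr(y_ord ~ x)

summary(fit_bin)$coefficients   # estimates with standard errors
```

The standard errors these summaries report come from the usual observed-information calculation, which is the natural baseline to compare against a hand-rolled Hessian.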
Re: [R] Simple qqplot question
Sorry, missed the two-variable thing. Go with the lm solution then, and you can tweak the plot yourself (the confidence intervals are easily obtained via predict(lm.object, interval = "prediction") ). The function qq.plot uses robust regression, but in your case normal regression will do. Regarding the shapes : this just indicates both tails are shorter than expected, so you have a kurtosis smaller than 3 (or a negative excess kurtosis, depending on whether you do the correction or not) Cheers Joris On Fri, Jun 25, 2010 at 4:10 AM, Ralf B ralf.bie...@gmail.com wrote: Short rep: I have two distributions, data and data2, each built from about 3 million data points; they appear similar when looking at densities and histograms. I plotted qqplots for further eye-balling: qqplot(data, data2, xlab = 1, ylab = 2) and get an almost perfect diagonal line, which means they are in fact very alike. Now I tried to check normality using qqnorm -- and I think I am doing something wrong here: qqnorm(data, main = "Q-Q normality plot for 1") qqnorm(data2, main = "Q-Q normality plot for 2") I am getting perfect S-shaped curves (??) for both distributions. Am I missing something here? [ASCII sketch of the S-shaped Q-Q curve omitted] Thanks, Ralf -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
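To see the short-tail interpretation in action, compare a normal sample with a uniform one (simulated data, nothing from the original 3-million-point sets):

```r
set.seed(42)
z <- rnorm(1000)   # normal: qqnorm() gives a near-straight line
u <- runif(1000)   # short tails: qqnorm() gives the S shape
qq_z <- qqnorm(z, plot.it = FALSE)
qq_u <- qqnorm(u, plot.it = FALSE)
# the straighter the plot, the higher the correlation between sample
# and theoretical quantiles
cor(qq_z$x, qq_z$y)
cor(qq_u$x, qq_u$y)
```

Adding qqline() to each plot makes the deviation from the reference line even easier to see by eye.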
Re: [R] Wilcoxon signed rank test and its requirements
As a remark on your histogram : use fewer breaks! This histogram tells you nothing. An interesting function is ?density , e.g. : x <- rnorm(250) hist(x, freq = FALSE) lines(density(x), col = "red") See also these slides, a very nice and short introduction to graphics in R : http://csg.sph.umich.edu/docs/R/graphics-1.pdf 2010/6/25 Atte Tenkanen atte...@utu.fi: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300 You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed. I would like to test whether the mean of the sample differs significantly from the population mean. According to probability theory, this will be the case in 5% of the cases if you repeat your sampling infinitely. But as David asked: why on earth do you want to test that? cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
I've been looking around myself, but I couldn't find any. Maybe somebody will chime in to direct you to the correct places. I also checked the papers, and it seems not too hard to implement. If I find some time, I'll take a look at it next week. For the other two gentlemen, check: http://www.getahead.psu.edu/PDF/EuclideanDistanceMatrixAnalysis.pdf http://www.getahead.psu.edu/PDF/no.1.pdf Cheers Joris On Fri, Jun 25, 2010 at 8:30 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote: In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate free approach to the analysis of form using landmark data which was developed by Subhash Lele and Joan Richstmeier. They also developed a computer program (http://www.getahead.psu.edu/comment/edma.asp) that allow to perform several techniques including EDMA I-II but I wonder is there a package or code available in R to perform EDMA... thanks -- View this message in context: http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Optimizing given two vectors of data
Optim uses vectors of _parameters_, not of data. You add a (likelihood) function, give initial values of the parameters, and get the optimized parameters back. See ?optim and the examples therein. It contains an example for optimization using multiple data columns. Cheers Joris On Fri, Jun 25, 2010 at 8:12 AM, confusedSoul ruchir_402...@infosys.com wrote: I am trying to estimate an Arrhenius-exponential model in R. I have one vector of data containing failure times, and another containing corresponding temperatures. I am trying to optimize a maximum likelihood function given BOTH these vectors. However, the optim command takes only one such vector parameter. How can I pass both vectors into the function? -- View this message in context: http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
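A minimal sketch of the point made above: the data vectors go in through optim()'s '...' argument, while the first argument of the objective function stays the parameter vector. Simulated straight-line data stand in here; an Arrhenius likelihood would follow exactly the same pattern.

```r
# objective takes the parameter vector first, then the data as named args
sse <- function(par, x, y) sum((y - par[1] - par[2] * x)^2)

set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)

# x and y are forwarded to sse() via optim's '...'
fit <- optim(c(0, 0), sse, x = x, y = y)
fit$par   # close to c(2, 3)
```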
Re: [R] i want create script
Please read the posting guide : http://www.R-project.org/posting-guide.html Your question is very vague. One could assume you're completely new to R and want the commands to read a csv file (see ?read.csv) and to write out a table (e.g. ?write.table to write your predicted data in a text format). My guess is you want to run this script in the shell without having to open R, similar to a Perl script. For this, take a look at: http://cran.r-project.org/doc/manuals/R-intro.html#Scripting-with-R http://projects.uabgrid.uab.edu/r-group/wiki/CommandLineProcessing Cheers Joris On Fri, Jun 25, 2010 at 8:26 AM, vijaysheegi vijay.she...@gmail.com wrote: Hi R community, I want to create a script which will take a .csv table as input, do some prediction, and write the output to some file. The input is an Excel sheet containing some tables of data; the output should be a table of predicted data. Will someone help me in this regard? Thanks in advance. I am using R on Windows. Please advise on the procedure to create an Rscript. Regards - Vijay Research student Bangalore India -- View this message in context: http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
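A sketch of that non-interactive workflow; the file names foo.R and out.csv are placeholders, and it assumes Rscript is on the PATH:

```shell
# write a small script that fits a model and saves a table of predictions
cat > foo.R <<'EOF'
dat  <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.0, 9.9))
pred <- predict(lm(y ~ x, data = dat))
write.csv(data.frame(pred), "out.csv", row.names = FALSE)
EOF

# run it from the shell, no interactive R session needed
Rscript foo.R          # older alternative: R CMD BATCH foo.R
```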
Re: [R] Wilcoxon signed rank test and its requirements
2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu: The central limit theorem doesn't help. It just addresses type I error, not power. Frank I don't think I stated otherwise. I am aware of the fact that the Wilcoxon has an asymptotic relative efficiency greater than 1 compared to the t-test in the case of skewed distributions. Apologies if I caused more confusion. The problem with the Wilcoxon is twofold, as far as I understood this data correctly : - there are quite a few ties - the Wilcoxon assumes under the null that the distributions are the same, not only the location. The influence of unequal variances and/or shapes of the distribution is enhanced in the case of unequal sample sizes. The central limit theorem means that : - the t-test will do correct inference in the presence of ties - unequal variances can be taken into account using the modified (Welch) t-test, both in the case of equal and unequal sample sizes For these reasons, I would personally use the t-test for comparing two samples from the described population. Your mileage may vary. Cheers Joris On 06/25/2010 04:29 AM, Joris Meys wrote: As a remark on your histogram : use fewer breaks! This histogram tells you nothing. An interesting function is ?density , e.g. : x <- rnorm(250) hist(x, freq = FALSE) lines(density(x), col = "red") See also these slides, a very nice and short introduction to graphics in R : http://csg.sph.umich.edu/docs/R/graphics-1.pdf 2010/6/25 Atte Tenkanen atte...@utu.fi: Is there anything for me? There is a lot of data, n=2418, but there are also a lot of ties. My sample n≈250-300 You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the mean is almost certainly normally distributed. I would like to test whether the mean of the sample differs significantly from the population mean. According to probability theory, this will be the case in 5% of the cases if you repeat your sampling infinitely.
But as David asked: why on earth do you want to test that? cheers Joris -- Frank E Harrell Jr Professor and ChairmanSchool of Medicine Department of Biostatistics Vanderbilt University -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
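The point about ties can be checked directly: on a small, heavily tied sample, wilcox.test() cannot compute an exact p-value and warns before falling back on a normal approximation, while t.test() is unaffected (simulated data, not the thread's n=2418 sample):

```r
set.seed(7)
x <- sample(1:5, 40, replace = TRUE)    # small sample, many ties

tt <- t.test(x, mu = 3)                        # unaffected by ties
wt <- suppressWarnings(wilcox.test(x, mu = 3)) # warns about exact p with ties
c(t = tt$p.value, wilcoxon = wt$p.value)
```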
Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?
Thanks for the link, very interesting book. Yet I couldn't find the part about EDMA. It would have surprised me anyway, as the input of multidimensional scaling is one matrix with Euclidean distances between your observations, whereas in EDMA the data consist of a number of distance matrices. Quite a different thing, if you ask me. Neither cmdscale nor isoMDS nor its derived functions (e.g. metaMDS in the vegan package) are going to be of any help. Now I come to think of it, vegan has a procrustes function, but I'm not sure if it is generalized enough to be of use in EDMA. Cheers Joris On Fri, Jun 25, 2010 at 7:42 PM, Kjetil Halvorsen kjetilbrinchmannhalvor...@gmail.com wrote: There is a freely downloadable and very relevant (readable) book at https://ccrma.stanford.edu/~dattorro/mybook.html Convex Optimization and Euclidean Distance Geometry, and it indeed names EDMA as a form of multidimensional scaling (or maybe the other way around). You should have a look at the code for multidimensional scaling in R. Kjetil On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote: thanks for your interest Joris Gokhan OCAKOGLU Uludag University Faculty of Medicine Department of Biostatistics http://www20.uludag.edu.tr/~biostat/ocakoglui.htm -- View this message in context: http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Average 2 Columns when possible, or return available value
Just want to add that if you want to clean out the NA rows in a matrix or data frame, take a look at ?complete.cases. Can be handy with big datasets. I got curious, so I just ran the code given here on a big dataset, before and after removing NA rows. I have to be honest, this is surely an illustration of the power of rowMeans. I'm amazed myself. DF <- data.frame( A=rep(DF$A,1), B=rep(DF$B,1) ) system.time(apply(DF,1,mean,na.rm=TRUE)) user system elapsed 13.26 0.06 13.46 system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1)) user system elapsed 0.03 0.00 0.03 system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean, + na.rm=TRUE)[,-1])) + ) Timing stopped at: 227.84 1.03 249.31 (I got impatient and pressed escape) DF <- DF[complete.cases(DF),] system.time(apply(DF,1,mean,na.rm=TRUE)) user system elapsed 0.39 0.00 0.39 system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1)) user system elapsed 0.01 0.00 0.02 system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean, + na.rm=TRUE)[,-1])) + ) user system elapsed 10.01 0.07 13.40 Cheers Joris On Sat, Jun 26, 2010 at 1:08 AM, emorway emor...@engr.colostate.edu wrote: Forum, Using the following data: DF <- read.table(textConnection("A B 22.60 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 102.00 NA 19.20 NA 19.20 NA NA NA NA NA NA NA 11.80 NA 7.62 NA NA NA NA NA NA NA NA NA NA NA 75.00 NA NA NA 18.30 18.2 NA NA NA NA 8.44 NA 18.00 NA NA NA 12.90 NA"), header = TRUE) closeAllConnections() The second column is a duplicate reading of the first column, and when two values are available, I would like to average columns 1 and 2 (example code below). But if there is only one reading, I would like to retain it, but I haven't found a good way to exclude NA's using the following code: t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1])) Currently, row 24 is the only row with a returned value.
I'd like the result to return column A if it is the only available value, and average where possible. Of course, if both columns are NA, NA is the only possible result. The result I'm after would look like this (row 24 is an avg): 22.60 NA NA NA NA NA NA NA 102.00 19.20 19.20 NA NA NA 11.80 7.62 NA NA NA NA NA 75.00 NA 18.25 NA NA 8.44 18.00 NA 12.90 This is a small example from a much larger data frame, so if you're wondering what the deal is with list(), that will come into play for the larger problem I'm trying to solve. Respectfully, Eric -- View this message in context: http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
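On a toy version of Eric's data, rowMeans() with na.rm = TRUE does exactly what he asks for: average where both readings exist, keep the single reading otherwise. One caveat worth noting: an all-NA row yields NaN rather than NA.

```r
DF <- data.frame(A = c(22.60, NA, 18.30, 8.44),
                 B = c(NA,    NA, 18.20, NA))
res <- rowMeans(DF, na.rm = TRUE)
res   # single value kept, pair averaged, all-NA row becomes NaN
```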
Re: [R] All a column to a data frame with a specific condition
merge(sum_plan_a, data, by = "plan_a") Cheers Joris On Sat, Jun 26, 2010 at 2:27 AM, Yi liuyi.fe...@gmail.com wrote: Hi, folks, Please first look at the code: plan_a=c('apple','orange','apple','apple','pear','bread') plan_b=c('bread','bread','orange','bread','bread','yogurt') value=1:6 data=data.frame(plan_a,plan_b,value) library(plyr) library(reshape) mm=melt(data, id=c('plan_a','plan_b')) sum_plan_a=cast(mm,plan_a~variable,sum) ### I would like to add a new column to the data.frame named 'data', with the same sum of value for the same type of plan_a ### The result should come up like this: plan_a plan_b value sum_plan_a 1 apple bread 1 8 2 orange bread 2 2 3 apple orange 3 8 4 apple bread 4 8 5 pear bread 5 5 6 bread yogurt 6 6 Any tips? Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
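An alternative sketch that needs neither reshape nor merge: base R's ave() computes the per-group sums and recycles them back onto the original rows in one step.

```r
plan_a <- c('apple','orange','apple','apple','pear','bread')
plan_b <- c('bread','bread','orange','bread','bread','yogurt')
value  <- 1:6
data   <- data.frame(plan_a, plan_b, value)

# sum of 'value' within each level of plan_a, aligned with the rows
data$sum_plan_a <- ave(data$value, data$plan_a, FUN = sum)
data$sum_plan_a   # 8 2 8 8 5 6
```

Unlike merge(), ave() preserves the original row order, which matches the expected output in the question.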
Re: [R] Popularity of R, SAS, SPSS, Stata...
I had taken the opposite tack with Google Trends by subtracting keywords like: SAS -shoes -airlines -sonar... but never got as good results as that beautiful X code for search. When you see the end-of-semester panic bumps in traffic, you know you're nailing it! I have to eat those words already. The R code for search that showed a peak every December did not have quotes around it, so it was searching for those three words, not the complete phrase. When you add the quotes, the peaks vanish. Don't swallow! You're looking through search terms, not through web pages. R code for regression, regression code R etc. are all valid searches, no quotation marks needed. http://www.google.com/insights/search/#q=code%20for%20r%2Ccode%20for%20SAS%2Ccode%20for%20SPSS%2Ccode%20for%20matlabcmpt=q This one is nice too. You can see that the bump in the autumn semester for R is replacing the one for Matlab. Then in the spring semester Matlab stays high but R drops. And both the US and India always have a very large search index, whereas the rest of the world barely registers. Which leads me to the conclusion that: 1) the results are probably coming from google.com, excluding local versions, and 2) in the US (and India), statistics is mainly taught in the autumn semester. Given the fact that daylight has a beneficial effect on emotional well-being, the unpopularity of statistics is likely caused by unfortunate scheduling. Forget Excel. Google rocks! ;-) Cheers Joris Once you go the phrase route, you gain precision but end up with zero counts on various phrases. I avoided that by combining them with + to get enough to plot.
The resulting graph shows SAS dominant until mid-2006, when SPSS takes the top position, followed by R, SAS, Stata in order: http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20manual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22%2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22spss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22spss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22stata%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22cmpt=q This might be a good one to add to http://r4stats.com/popularity Bob I see that there's a car, the R Code Mustang, that adding 'for' gets rid of. Thanks for getting me back on a topic that I had given up on! Bob -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys Sent: Thursday, June 24, 2010 7:56 PM To: Dario Solari Cc: r-help@r-project.org Subject: Re: [R] Popularity of R, SAS, SPSS, Stata... Nice idea, but quite sensitive to search terms, if you compare your result on ... code with ... code for: http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20forcmpt=q On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote: First: excuse my English. My opinion: a useful source for measuring popularity can be Google Insights for Search - http://www.google.com/insights/search/# Every person using software like R, SAS, SPSS needs first to learn it. So he probably makes a web search for a manual, a tutorial, a guide. One can measure the share of this kind of search query. These kinds of results can be useful to determine trends in popularity.
Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q Example 2: R software, SAS software, SPSS software http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q Example 3: R code, SAS code, SPSS code http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q Example 4: R graph, SAS graph, SPSS graph http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q Example 5: R regression, SAS regression, SPSS regression http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q Some examples are cross-software (learning needs: Example 1); others can be biased by the traditional use of that software (in SPSS you usually don't manipulate graphs, I think) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys
Re: [R] count data with a specific range
see ?levels, e.g.:

x <- rnorm(10)
y <- cut(x, c(-10, 0, 10))
levels(y) <- c("-10-0", "0-10")

cheers Joris On Thu, Jun 24, 2010 at 4:14 AM, Yi liuyi.fe...@gmail.com wrote: Yeap. It works. Just to make the result more beautiful. One more question. The interval is shown as (0,10]. Is there a way to change it into the format 0-10? Thanks. On Wed, Jun 23, 2010 at 6:12 PM, Joris Meys jorism...@gmail.com wrote: see ?cut Cheers Joris On Thu, Jun 24, 2010 at 2:57 AM, Yi liuyi.fe...@gmail.com wrote: I would like to prepare the data for barplot. But I only have the data frame now.

x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)
data <- data.frame(x1, x2, x3)

Is there a way to put the data within specific ranges? The expected result is as follows: range x1 x2 x3 -10-0 2 5 1 (# points in this range) 0-10 7 9 6 ... I know the table function but I do not know how to deal with the range issue. Thanks in advance. Yi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
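[Editor's note: to go from cut() straight to the count table the original poster asked for, here is a minimal sketch; the labels argument of cut() names the intervals directly, so no separate levels()<- step is needed. The break points and labels are just illustrative.]

```r
set.seed(1)
x1 <- rnorm(10, mean = 2)
# cut() bins the values; labels= names the bins in the desired "lo-hi" format
y <- cut(x1, breaks = c(-10, 0, 10), labels = c("-10-0", "0-10"))
# table() then counts how many points fall in each range, ready for barplot()
table(y)
```

The same two lines applied to each of x1, x2, x3 (e.g. via sapply) give the columns of the wanted range-by-variable table.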
Re: [R] Question on WLS (gls vs lm)
Isn't that exactly what you would expect when using a _generalized_ least squares compared to a normal least squares? GLS is not the same as WLS. http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm Cheers Joris On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote: Hi all, I understand that gls() uses generalized least squares, but I thought that maybe the optimum weights from gls might be used as weights in lm (as shown below), but apparently this is not the case. See:

library(nlme)
f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))
aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)
summary(f1)$tTable; summary(f2)

So, the two models with the very same weights do differ (in terms of standard errors). Could you please explain why? Are these different types of weights? Many thanks in advance, Stats Wolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcoxon signed rank test and its requirements
One way of looking at it is doing a sign test after subtraction of the mean. For symmetrical data sets the mean equals the median, so you expect about as many values above the mean as below it. There is a sign test somewhere in one of the packages, but it's easily done using binom.test as well:

set.seed(12345)
x1 <- rnorm(100)
x2 <- rpois(100, 2)
binom.test(sum(x1 - mean(x1) > 0), length(x1))

Exact binomial test data: sum(x1 - mean(x1) > 0) and length(x1) number of successes = 56, number of trials = 100, p-value = 0.2713 alternative hypothesis: true probability of success is not equal to 0.5 95 percent confidence interval: 0.4571875 0.6591640 sample estimates: probability of success 0.56

binom.test(sum(x2 - mean(x2) > 0), length(x2))

Exact binomial test data: sum(x2 - mean(x2) > 0) and length(x2) number of successes = 37, number of trials = 100, p-value = 0.01203 alternative hypothesis: true probability of success is not equal to 0.5 95 percent confidence interval: 0.2755666 0.4723516 sample estimates: probability of success 0.37

Cheers Joris On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen atte...@utu.fi wrote: PS. Maybe I can somehow try to transform the data and check it, for example, using the skewness function of the timeDate package? Thanks. What I have had to ask is: how do you test that the data is symmetric enough? If it is not, is it ok to use some data transformation? when it is said: "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value." On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote: Hi all, I have a distribution, and take a sample of it.
Then I compare that sample with the mean of the population, like here in a Wilcoxon signed rank test with continuity correction:

wilcox.test(Sample, mu = mean(All), alt = "two.sided")

Wilcoxon signed rank test with continuity correction data: AlphaNoteOnsetDists V = 63855, p-value = 0.0002093 alternative hypothesis: true location is not equal to 0.4115136

wilcox.test(Sample, mu = mean(All), alt = "greater")

Wilcoxon signed rank test with continuity correction data: AlphaNoteOnsetDists V = 63855, p-value = 0.0001047 alternative hypothesis: true location is greater than 0.4115136

What assumptions are needed for the population? Wikipedia says: "The Wilcoxon signed-rank test is a _non-parametric_ statistical hypothesis test for..." and it also talks about the assumptions. What can we say according to these results? The p-value for "less" is 0.999. That the p-values for "less" and "greater" seem to sum up to one, and that the p-value of "greater" is half of that for "two.sided". You shouldn't ask what we can say. You should ask yourself "What was the question, and is this test giving me an answer to that question?" Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on WLS (gls vs lm)
Indeed, WLS is a special case of GLS, where the error covariance matrix is a diagonal matrix. OLS is a special case of GLS, where the error is considered homoscedastic and all weights are equal to 1. And I now realize that varIdent() indeed makes a diagonal covariance matrix, so the results should in fact be the same. Sorry for missing that one. A closer inspection shows that the results don't differ too much. The fitting method differs between the two functions; lm.wfit uses the QR decomposition, whereas gls() uses restricted maximum likelihood. In Asymptopia, they should give the same result. Cheers Joris On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com wrote: Thanks for the reply. Yes, they do differ, but does not gls() with the weights argument (correlation being unchanged) make the special version of GLS, as this sentence from the page you provided says: "The method leading to this result is called Generalized Least Squares estimation (GLS), of which WLS is just a special case"? Best, Stats Wolf On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com wrote: Isn't that exactly what you would expect when using a _generalized_ least squares compared to a normal least squares? GLS is not the same as WLS. http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm Cheers Joris On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote: Hi all, I understand that gls() uses generalized least squares, but I thought that maybe the optimum weights from gls might be used as weights in lm (as shown below), but apparently this is not the case. See:

library(nlme)
f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))
aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)
summary(f1)$tTable; summary(f2)

So, the two models with the very same weights do differ (in terms of standard errors).
Could you please explain why? Are these different types of weights? Many thanks in advance, Stats Wolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2
On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza eugeniusz.kal...@polsl.pl wrote: Dear useRs, Thanks for any advice

# I do not know where to find examples of how to mark groups
# based on signal occurrence in an additional variable: cf. variable c2.
# How to perform different calculations for groups defined by (split by the occurrence of) characteristic data in c2?
# First example of simple data
# example 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
c0
c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c1
c2 <- rbind(NA, "Start1", NA, NA, "Stop1", NA, NA, NA, NA, NA, NA, "Start2", NA, NA, NA, NA, "Stop2")
c2
C.df <- data.frame(cbind(c0, c1, c2))
colnames(C.df) <- c("c0", "c1", "c2")
C.df
# preparation of a form for explaining the needed result (the next 3 lines are not needed indeed, they are only to explain how to obtain the final result)
c3 <- rbind(NA, "Start1", "Start1", "Start1", "Start1", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2")
c4 <- rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2")
C.df <- data.frame(cbind(c0, c1, c2, c3, c4))
colnames(C.df) <- c("c0", "c1", "c2", "c3", "c4")
C.df$c5 <- paste(C.df$c3, C.df$c4, sep = "-")
C.df

Now this is something I don't get. The list Start2-Stop2 starts way before Start2, actually at Stop1. Sure that's what you want? I took the liberty of showing how to get the data between start and stop for every entry, and how to apply functions to it. If you don't get the code, look at ?lapply ?apply ?grep I also adjusted your example, as you caused all variables to be factors by using cbind inside the data.frame function. Never do this unless you're really sure you have to. But I can't think of a case where that would be beneficial... ...
C.df <- data.frame(c0, c1, c2)
C.df
# find positions
Start <- grep("Start", C.df$c2)
Stop <- grep("Stop", C.df$c2)
# create indices
idx <- apply(cbind(Start, Stop), 1, function(i) i[1]:i[2])
names(idx) <- paste("Start", 1:length(Start), "-Stop", 1:length(Start), sep = "")
# Apply the function summary and get a list back named by the interval.
out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
out

If you really need to start Start2 right after Stop1, you can use a similar approach. Cheers Joris # NEEDED RESULTS # needed result # for Start1-Stop1: mean(20,30,40,50) # for Start2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) #mean: c1 c3 c4 c5 20 Start1 Stop1 Start1-Stop1 25.48585 Start2 Stop2 Start2-Stop2 #sum # for Start1-Stop1: sum(20,30,40,50) # for Start2-Stop2: sum(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) #sum: c1 c3 c4 c5 140 Start1 Stop1 Start1-Stop1 280.3444 Start2 Stop2 Start2-Stop2 # for Start1-Stop1: max(20,30,40,50) # for Start2-Stop2: max(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) #max: c1 c3 c4 c5 50 Start1 Stop1 Start1-Stop1 60 Start2 Stop2 Start2-Stop2 # place of max in Start1-Stop1: 4th element in group Start1-Stop1 # place of max in Start2-Stop2: 2nd element in group Start2-Stop2 c0 c3 c4 c5 4 Start1 Stop1 Start1-Stop1 2 Start2 Stop2 Start2-Stop2 Thanks for any suggestion, Kaluza [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question on WLS (gls vs lm)
Thanks! I was getting confused as well; Wolf really had a point. I have to admit that this is all a bit counterintuitive. I would expect those weights to be the inverse of the variances, as GLS uses the inverse of the variance-covariance matrix. I finally found the needed information in the help files (?nlme::varWeights), after you pointed out the problem and after strolling through quite a part of the help files... Does this mean that the GLS algorithm uses 1/sd as weights (can't imagine that...), or is this one of the things in R that just are like that? Cheers Joris On Thu, Jun 24, 2010 at 3:20 PM, Viechtbauer Wolfgang (STAT) wolfgang.viechtba...@stat.unimaas.nl wrote: The weights in 'aa' are the inverse standard deviations. But you want to use the inverse variances as the weights:

aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2

And then the results are essentially identical. Best, -- Wolfgang Viechtbauer http://www.wvbauer.com/ Department of Methodology and Statistics Tel: +31 (0)43 388-2277 School for Public Health and Primary Care Office Location: Maastricht University, P.O. Box 616 Room B2.01 (second floor) 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck) -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Stats Wolf Sent: Thursday, June 24, 2010 15:00 To: Joris Meys Cc: r-help@r-project.org Subject: Re: [R] Question on WLS (gls vs lm) Indeed, they should give the same results, and hence I was worried to see that the results were not the same. Suffice it to look at the standard errors and p-values: they do differ, and the differences are not really that small. Thanks, Stats Wolf On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys jorism...@gmail.com wrote: Indeed, WLS is a special case of GLS, where the error covariance matrix is a diagonal matrix. OLS is a special case of GLS, where the error is considered homoscedastic and all weights are equal to 1.
And I now realize that varIdent() indeed makes a diagonal covariance matrix, so the results should in fact be the same. Sorry for missing that one. A closer inspection shows that the results don't differ too much. The fitting method differs between the two functions; lm.wfit uses the QR decomposition, whereas gls() uses restricted maximum likelihood. In Asymptopia, they should give the same result. Cheers Joris On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com wrote: Thanks for the reply. Yes, they do differ, but does not gls() with the weights argument (correlation being unchanged) make the special version of GLS, as this sentence from the page you provided says: "The method leading to this result is called Generalized Least Squares estimation (GLS), of which WLS is just a special case"? Best, Stats Wolf On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com wrote: Isn't that exactly what you would expect when using a _generalized_ least squares compared to a normal least squares? GLS is not the same as WLS. http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm Cheers Joris On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote: Hi all, I understand that gls() uses generalized least squares, but I thought that maybe the optimum weights from gls might be used as weights in lm (as shown below), but apparently this is not the case. See:

library(nlme)
f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))
aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)
summary(f1)$tTable; summary(f2)

So, the two models with the very same weights do differ (in terms of standard errors).
Many thanks in advance, Stats Wolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https
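[Editor's note: a minimal sketch pulling this thread's resolution together, assuming nlme is installed (iris ships with R). Squaring the varIdent weights turns inverse standard deviations into the inverse variances that lm() expects, after which the two fits should agree up to the REML-vs-QR fitting difference discussed above.]

```r
library(nlme)

# GLS fit with a separate residual variance per species
f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))

# varWeights are inverse standard deviations; square them for lm()
aa <- attributes(summary(f1)$modelStruct$varStruct)$weights^2
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

# standard errors side by side: essentially identical
cbind(gls = sqrt(diag(vcov(f1))), lm = sqrt(diag(vcov(f2))))
```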
Re: [R] How to say if error
An old-fashioned and I guess also advised-against method would be to use try() itself, e.g.:

set.seed(1)
x <- rnorm(1:10)
y <- letters[1:10]
z <- rnorm(1:10)
for (i in list(x, y, z)) {
  cc <- try(sum(i), silent = TRUE)
  if (is(cc, "try-error")) next
  print(cc)
}

Put silent=FALSE if you want to see the error messages. See also ?try (and ?is) Cheers Joris On Thu, Jun 24, 2010 at 3:34 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 24/06/2010 7:06 AM, Paul Chatfield wrote: I've had a look at the conditions in base and I can't get the ones I've looked at to work, but it is all new to me. For example, I can work the examples for tryCatch, but it won't print a "finally" message for me when I apply it to my model. Even if I could get this to work, I think it would still cause a break, e.g.

for (j in 1:10) {tryCatch(ifelse(j==5, stop(j), j), finally=print("oh dear"))}

Thanks for the suggestion though - any others? I think you don't want to use finally, which is just code that's guaranteed to be executed at the end. You want to catch the errors and continue. For example,

for (j in 1:10) {
  tryCatch(ifelse(j==5, stop(j), print(j)),
           error=function(e) {print("caught error"); print(e)})
}

Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] If-error-resume-next
http://r.789695.n4.nabble.com/How-to-say-if-error-td2266619.html#a2266619 Cheers On Thu, Jun 24, 2010 at 4:30 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote: In VBA there is a syntax "On Error Resume Next" which sometimes acts as very beneficial for ignoring errors. Is there any similar kind of function/syntax here in R as well? Thanks for your attention [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Averaging half hourly data to hourly
See ?aggregate Cheers J On Thu, Jun 24, 2010 at 10:45 AM, Jennifer Wright j.wright...@sms.ed.ac.uk wrote: Hi all, I have some time-series data in half hourly time steps and I want to be able to either average or sum the two half hours together into hours. Any help greatly appreciated. Thanks, Jenny -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
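[Editor's note: a minimal sketch of the ?aggregate approach, assuming the timestamps are POSIXct; the column names and times here are invented for illustration. Truncating each timestamp to its hour gives the grouping variable; swap FUN = mean for FUN = sum to total the two half hours instead.]

```r
# fake half-hourly series: 4 hours at 30-minute steps
times <- seq(as.POSIXct("2010-06-24 00:00", tz = "UTC"),
             by = "30 min", length.out = 8)
dat <- data.frame(time = times, value = rnorm(8))

# group by the hour each reading falls in, then average the half hours
hourly <- aggregate(dat["value"],
                    by = list(hour = format(dat$time, "%Y-%m-%d %H:00")),
                    FUN = mean)
hourly
```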
Re: [R] How to say if error
You could do that using the options, e.g.:

set.seed(1)
x <- rnorm(1:10)
y <- letters[1:10]
z <- rnorm(1:10)
warn <- getOption("warn")
options(warn = 2)
for (i in list(x, y, z)) {
  cc <- try(mean(i), silent = TRUE)
  if (is(cc, "try-error")) next
  print(cc)
}
options(warn = warn)

see ?options under "warn" Cheers Joris On Thu, Jun 24, 2010 at 5:12 PM, Paul Chatfield p.s.chatfi...@reading.ac.uk wrote: On a similar issue, how can you detect a warning in a loop - e.g. the following gives a warning, so I'd like to set up code to recognise that and then carry on in a loop

x <- rnorm(2); y <- c(1, 0)
ff <- glm(y/23 ~ x, family = binomial)

so this would be incorporated into a loop that might be

x <- rnorm(10); y <- rep(c(1, 0), 5)
for (i in 1:10) {
  ee <- glm(y ~ x, family = binomial)
  ff <- glm(y/23 ~ x, family = binomial)
}

from which I would recognise the warning in ff and not those in ee, saving results from ee and not from ff. The last bit would be easy, adding a line if(there_is_a_warning_message) {newvector <- NA} else {use results}, but how do you detect the warning message? Thanks all for your feedback so far, Paul -- View this message in context: http://r.789695.n4.nabble.com/How-to-say-if-error-tp2266619p2267140.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
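[Editor's note: an alternative sketch that catches the warning itself instead of promoting all warnings to errors with options(warn=2), so a clean fit is still returned. The helper name fit_or_na is invented for illustration; withCallingHandlers and invokeRestart("muffleWarning") are base R condition handling.]

```r
# Evaluate an expression; return NA if it raised a warning, its value otherwise.
fit_or_na <- function(expr) {
  warned <- FALSE
  res <- withCallingHandlers(expr,
    warning = function(w) {
      warned <<- TRUE                  # remember that a warning fired
      invokeRestart("muffleWarning")   # suppress it and keep going
    })
  if (warned) NA else res
}

x <- rnorm(10); y <- rep(c(1, 0), 5)
ee <- fit_or_na(glm(y ~ x, family = binomial))      # kept: fits cleanly
ff <- fit_or_na(glm(y / 23 ~ x, family = binomial)) # NA: non-integer successes warning
```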
Re: [R] PD: ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2
Same trick:

c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
c0
c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c1
c2 <- rbind(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
c2
C.df <- data.frame(c0, c1, c2)
C.df
pos <- which(!is.na(C.df$c2))
idx <- sapply(2:length(pos), function(i) pos[i-1]:(pos[i]-1))
names(idx) <- sapply(2:length(pos), function(i) paste(C.df$c2[pos[i-1]], "-", C.df$c2[pos[i]]))
out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
out

2010/6/24 Eugeniusz Kałuża eugeniusz.kal...@polsl.pl: Dear useRs, Thanks for the advice from Joris Meys. Now I will try to think how to make it work for a less specific case, to make the problem more general. Then the result should be displayed for every group between non-empty strings in c2, i.e. not only the result for: #mean: c1 c3 c4 c5 20 Start1 Stop1 Start1-Stop1 25.48585 Start2 Stop2 Start2-Stop2 but also for every group created by the space between the two closest strings in c2, which contains only series of NA, NA, NA, separated from time to time by one string, i.e.: #mean: c1 c3 c4 c5 20 Start1 Stop1 Start1-Stop1 .. Stop1 Start2 Stop1-Start2 25.48585 Start2 Stop2 Start2-Stop2 i.e. to rewrite this maybe for another, simpler version of the command, but also for every group created by the space between the two closest strings in c2 that contains only series of NA, NA, NA, separated from time to time by one string: A, NA, NA, NA, NA, B, NA, NA, NA, C, NA, NA, NA, NA, D, NA, NA, i.e.: #mean: c1 c3 c4 c5 20 A B A-B .. B C B-C 25.48585 C D C-D ... Looking for a more general method (function), grouping between these letters in c2, I will now try to study the solution proposed by Joris Meys. Thanks for the immediate answer, Kaluza -----Original Message----- From: Joris Meys [mailto:jorism...@gmail.com] Sent: Thu 2010-06-24 15:14 To: Eugeniusz Kałuża Cc: r-help@r-project.org Subject: Re: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e.
between start, NA, NA, stop1, start2, NA, stop2 On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza eugeniusz.kal...@polsl.pl wrote: Dear useRs, Thanks for any advice

# I do not know where to find examples of how to mark groups
# based on signal occurrence in an additional variable: cf. variable c2.
# How to perform different calculations for groups defined by (split by the occurrence of) characteristic data in c2?
# First example of simple data
# example 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
c0
c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c1
c2 <- rbind(NA, "Start1", NA, NA, "Stop1", NA, NA, NA, NA, NA, NA, "Start2", NA, NA, NA, NA, "Stop2")
c2
C.df <- data.frame(cbind(c0, c1, c2))
colnames(C.df) <- c("c0", "c1", "c2")
C.df
# preparation of a form for explaining the needed result (the next 3 lines are not needed indeed, they are only to explain how to obtain the final result)
c3 <- rbind(NA, "Start1", "Start1", "Start1", "Start1", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2", "Start2")
c4 <- rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2")
C.df <- data.frame(cbind(c0, c1, c2, c3, c4))
colnames(C.df) <- c("c0", "c1", "c2", "c3", "c4")
C.df$c5 <- paste(C.df$c3, C.df$c4, sep = "-")
C.df

Now this is something I don't get. The list Start2-Stop2 starts way before Start2, actually at Stop1. Sure that's what you want? I took the liberty of showing how to get the data between start and stop for every entry, and how to apply functions to it. If you don't get the code, look at ?lapply ?apply ?grep I also adjusted your example, as you caused all variables to be factors by using cbind inside the data.frame function. Never do this unless you're really sure you have to. But I can't think of a case where that would be beneficial... ...
C.df <- data.frame(c0, c1, c2)
C.df
# find positions
Start <- grep("Start", C.df$c2)
Stop <- grep("Stop", C.df$c2)
# create indices
idx <- apply(cbind(Start, Stop), 1, function(i) i[1]:i[2])
names(idx) <- paste("Start", 1:length(Start), "-Stop", 1:length(Start), sep = "")
# Apply the function summary and get a list back named by the interval.
out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
out

If you really need to start Start2 right after Stop1, you can use a similar approach. Cheers Joris # NEEDED RESULTS # needed result # for Start1-Stop1: mean(20,30,40,50) # for Start2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T) #mean: c1 c3 c4 c5 20 Start1 Stop1 Start1-Stop1 25.48585
Re: [R] Wilcoxon signed rank test and its requirements
I do agree that one should not rely solely on sources like Wikipedia and GraphPad, although they contain a lot of valuable information. This said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test, the differences should not be too far away from symmetrical. It just needs some reflection on how the statistic is calculated. If you have an asymmetrical distribution, you have a lot of small differences with a negative sign and a lot of large differences with a positive sign if you test against the median or mean. Hence the sum of ranks for one side will be higher than for the other, leading eventually to a significant result. An extreme example:

set.seed(100)
y <- rnorm(100, 1, 2)^2
wilcox.test(y, mu = median(y))

Wilcoxon signed rank test with continuity correction data: y V = 3240.5, p-value = 0.01396 alternative hypothesis: true location is not equal to 1.829867

wilcox.test(y, mu = mean(y))

Wilcoxon signed rank test with continuity correction data: y V = 1763, p-value = 0.008837 alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of what location is actually tested in the Wilcoxon test. For the measure of location to be the mean (or median), one has to assume that the distribution of the differences is rather symmetrical, which implies your data have to be distributed somewhat symmetrically. The test is robust against violations of this -implicit- assumption, but in more extreme cases skewness does matter. Cheers Joris On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius dwinsem...@comcast.net wrote: You are being misled. Simply finding a statement on a statistics software website, even one as reputable as GraphPad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing Nonparametric Statistical Methods for Complete and Censored Data by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric.
The above quotation is highly inaccurate. -- David. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcoxon signed rank test and its requirements
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius dwinsem...@comcast.net wrote: On Jun 24, 2010, at 6:09 PM, Joris Meys wrote: I do agree that one should not rely solely on sources like Wikipedia and GraphPad, although they contain a lot of valuable information. This said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test, That is a key point. I was assuming that you were using the paired-sample version of the WSRT and I may have been misleading the OP. For the one-sample situation, the assumption of symmetry is needed, but for the paired-sampling version of the test, the location shift becomes the tested hypothesis, and no assumptions about the form of the hypothesis are made except that they be the same. I believe you mean the form of the distributions. The assumption that the distributions of both samples are the same (or similar, it is a robust test) implies that the differences x_i - y_i are more or less symmetrically distributed. The key point here is that we're not talking about the distribution of the populations/samples (as done in the OP) but about the distribution of the differences. I may not have been clear enough on that one. Cheers Joris Any consideration of median or mean (which will be the same in the case of symmetric distributions) gets lost in the paired test case. -- David. the differences should not be too far from symmetrical. It just needs some reflection on how the statistic is calculated. If you have an asymmetrical distribution, you have a lot of small differences with a negative sign and a lot of large differences with a positive sign if you test against the median or mean. Hence the sum of ranks for one side will be higher than for the other, eventually leading to a significant result.
An extreme example:

set.seed(100)
y <- rnorm(100, 1, 2)^2
wilcox.test(y, mu = median(y))
        Wilcoxon signed rank test with continuity correction
data: y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867
wilcox.test(y, mu = mean(y))
        Wilcoxon signed rank test with continuity correction
data: y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of which location is actually tested in the Wilcoxon test. For the measure of location to be the mean (or median), one has to assume that the distribution of the differences is rather symmetrical, which implies your data has to be distributed somewhat symmetrically. The test is robust against violations of this -implicit- assumption, but in more extreme cases skewness does matter. Cheers Joris On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius dwinsem...@comcast.net wrote: You are being misled. Simply finding a statement on a statistics software website, even one as reputable as GraphPad (???), does not mean that it is necessarily true. My understanding (confirmed reviewing Nonparametric statistical methods for complete and censored data by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate. -- David.
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help, bifurcation diagram efficiency
Basically, don't write loops. Think vectors, matrices, ... The R Inferno by Patrick Burns contains a lot of valuable information on optimizing code: http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf Cheers Joris On Thu, Jun 24, 2010 at 7:51 PM, Tyler Massaro tyler.mass...@gmail.com wrote: Hello all - This code will run, but it bogs down my computer when I run it for finer and finer time increments and more generations. I was wondering if there is a better way to write my loops so that this wouldn't happen. Thanks! -Tyler

#
# Bifurcation diagram
# Using Braaksma system of equations
# We have however used a Fourier analysis
# to get a forcing function similar to
# cardiac action potential...
#
require(odesolve)

# We get this s_of_t function from Maple ws
s_of_t = function(t) {
  (1/10) * (((1/2) + (1/2) * sin((1/4)*pi*t) / abs(sin((1/4)*pi*t))) *
    (6.588315815*sin((1/4)*pi*t) - 1.697435362*sin((1/2)*pi*t) -
     1.570296922*sin((3/4)*pi*t) + 0.3247901958*sin(pi*t) +
     0.7962749105*sin((5/4)*pi*t) + 0.07812230515*sin((3/2)*pi*t) -
     0.3424877143*sin((7/4)*pi*t) - 0.1148306748*sin(2*pi*t) +
     0.1063696962*sin((9/4)*pi*t) + 0.02812403009*sin((5/2)*pi*t)))
}

ModBraaksma = function(t, n, p) {
  dx.dt = (1/0.01) * (n[2] - ((1/2)*n[1]^2 + (1/3)*n[1]^3))
  dy.dt = -(n[1] + p["alpha"]) + 0.032 * s_of_t(t)
  list(c(dx.dt, dy.dt))
}

initial.values = c(0.1, -0.02)
alphamin = 0.01
alphamax = 0.02
alphas = seq(alphamin, alphamax, by = 0.1)
TimeInterval = 100
times = seq(0.001, TimeInterval, by = 0.001)

plot(1, xlim = c(alphamin, alphamax), ylim = c(0, 0.3), type = "n",
     xlab = "Values of alpha", ylab = "Approximate loop size for a limit cycle",
     main = "Bifurcation Diagram")

for (i in 1:length(alphas)) {
  params = c(alpha = alphas[i])
  out = lsoda(initial.values, times, ModBraaksma, params)
  X = out[,2]
  Y = out[,3]
  for (j in 200:length(times)) {
    if (abs(X[j]) < 0.001) {
      points(alphas[i], Y[j], pch = ".")
    }
  }
}

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
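As a footnote to the vectorization advice above, a minimal sketch of how the quoted inner j-loop could be replaced (X, Y, alphas and i as in the quoted code; the short vectors here are synthetic stand-ins, not the model output):

# Instead of testing each time step in a loop, select all qualifying
# indices at once and plot them with a single points() call.
X <- c(0.5, 0.0005, -0.0002, 0.3)   # synthetic stand-in for out[, 2]
Y <- c(0.10, 0.20, 0.25, 0.30)      # synthetic stand-in for out[, 3]
idx <- which(abs(X) < 0.001)        # all j with |X[j]| < 0.001
# in the real code, additionally restrict to the post-transient part:
# idx <- idx[idx >= 200]
# points(rep(alphas[i], length(idx)), Y[idx], pch = ".")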
Re: [R] Simple qqplot question
Also take a look at qq.plot in the package car. It gives you exactly what you want. Cheers Joris On Fri, Jun 25, 2010 at 12:55 AM, Ralf B ralf.bie...@gmail.com wrote: More details... I have two distributions which are very similar. I have plotted density plots already for the two distributions. In addition, I created a qqplot that shows an almost straight line. What I want is a line that represents the ideal case in which the two distributions match perfectly. I would use this line to see how much the errors diverge at different stages of the plot. Ralf On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick ssef...@gmail.com wrote: You are going to have to define the question a little better. Also, please provide a reproducible example. On Thu, Jun 24, 2010 at 4:44 PM, Ralf B ralf.bie...@gmail.com wrote: I am a beginner in R, so please don't step on me if this is too simple. I have two data sets datax and datay for which I created a qqplot qqplot(datax,datay) but now I want a line that indicates the perfect match so that I can see how much the plot diverges from the ideal. This ideal however is not normal, so I think qqnorm and qqline cannot be applied. Perhaps you can help? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Stephen Sefick | Auburn University | | Department of Biological Sciences | | 331 Funchess Hall | | Auburn, Alabama | | 36849 | |___| | sas0...@auburn.edu | | http://www.auburn.edu/~sas0025 | |___| Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K.
Mullis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
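For the "ideal case" line asked about above, a minimal sketch (assuming the two samples are on the same scale, so the ideal is the identity line y = x; the data here are hypothetical stand-ins for datax and datay):

set.seed(1)
datax <- rnorm(200)            # stand-in for the OP's first sample
datay <- rnorm(200, 0, 1.1)    # stand-in for the second sample
qqplot(datax, datay)
abline(0, 1)  # the line y = x: where the points would fall if the
              # two distributions matched perfectly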
Re: [R] Install package automatically if not there?
I've never seen R Perl'd this nicely before. On Thu, Jun 24, 2010 at 10:32 PM, Albert-Jan Roskam fo...@yahoo.com wrote: require(pkg) || install.packages(pkg) Cheers!! Albert-Jan ~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~ --- On Thu, 6/24/10, Ralf B ralf.bie...@gmail.com wrote: From: Ralf B ralf.bie...@gmail.com Subject: [R] Install package automatically if not there? To: r-help@r-project.org r-help@r-project.org Date: Thursday, June 24, 2010, 9:25 PM Hi fans, is it possible for a script to check if a library has been installed? I want to automatically install it if it is missing, to avoid scripts crashing when running on a new machine... Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
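The one-liner above works because require() returns FALSE for a missing package. A slightly more explicit variant (a sketch only: "stats" is a placeholder for the package you actually want, and requireNamespace() is the check available in more recent R versions):

pkg <- "stats"  # placeholder package name
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)  # only runs when the package is missing
}
library(pkg, character.only = TRUE)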
Re: [R] Install package automatically if not there?
If you want to make changes more permanent, you should take a look at the Rprofile.site file. This one gets loaded by R at the startup of the console. You can set the CRAN mirror there too, if you want. Cheers Joris On Fri, Jun 25, 2010 at 1:32 AM, Joshua Wiley jwiley.ps...@gmail.com wrote: Hello Ralf, Glad it works for you. As far as avoiding any prompting if packages are out-of-date; I am not sure. It honestly seems like a risky idea to me to have old packages being overwritten without a user knowing. However, I added a few lines of code here to set the CRAN mirror, and revert it back to whatever it was when the function ends, to make it as unobtrusive as possible.

load.fun <- function(x) {
  old.repos <- getOption("repos")
  on.exit(options(repos = old.repos)) # this resets the repos option when the function exits
  new.repos <- old.repos
  new.repos["CRAN"] <- "http://cran.stat.ucla.edu" # set your favorite CRAN mirror here
  options(repos = new.repos)
  x <- as.character(substitute(x))
  if (isTRUE(x %in% .packages(all.available = TRUE))) {
    eval(parse(text = paste("require(", x, ")", sep = "")))
  } else {
    update.packages() # recommended before installing so that dependencies are the latest version
    eval(parse(text = paste("install.packages('", x, "')", sep = "")))
    eval(parse(text = paste("require(", x, ")", sep = "")))
  }
}

Best regards, Josh On Thu, Jun 24, 2010 at 3:34 PM, Ralf B ralf.bie...@gmail.com wrote: Thanks, works great. By the way, is there a way to go even further and avoid prompting (e.g. picking a particular server by default and updating without further prompt)? I could understand if such a feature does not exist for security reasons... Thanks again, Ralf On Thu, Jun 24, 2010 at 5:12 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: On Thu, Jun 24, 2010 at 1:51 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: Hello Ralf, This is a little function that you may find helpful. If the package is already installed, it just loads it; otherwise it updates the existing packages and then installs the required package.
As in require(), 'x' does not need to be quoted.

load.fun <- function(x) {
  x <- as.character(substitute(x))
  if (isTRUE(x %in% .packages(all.available = TRUE))) {
    eval(parse(text = paste("require(", x, ")", sep = "")))
  } else {
    update.packages() # recommended before installing so that dependencies are the latest version
    eval(parse(text = paste("install.packages('", x, "')", sep = "")))
  }
}

I miscopied the last line; it should be

###
load.fun <- function(x) {
  x <- as.character(substitute(x))
  if (isTRUE(x %in% .packages(all.available = TRUE))) {
    eval(parse(text = paste("require(", x, ")", sep = "")))
  } else {
    update.packages() # recommended before installing so that dependencies are the latest version
    eval(parse(text = paste("install.packages('", x, "')", sep = "")))
    eval(parse(text = paste("require(", x, ")", sep = "")))
  }
}
###

HTH, Josh On Thu, Jun 24, 2010 at 12:25 PM, Ralf B ralf.bie...@gmail.com wrote: Hi fans, is it possible for a script to check if a library has been installed? I want to automatically install it if it is missing, to avoid scripts crashing when running on a new machine... Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student Health Psychology University of California, Los Angeles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
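To make the Rprofile.site suggestion above concrete, a hedged sketch of what such an entry might look like (the file location varies by installation, and the mirror URL is just an example):

# Lines like these in Rprofile.site (e.g. under R_HOME/etc/) run at
# console startup: pre-set a CRAN mirror so install.packages() never
# has to prompt for one.
local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.stat.ucla.edu"  # example mirror
  options(repos = r)
})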
Re: [R] Popularity of R, SAS, SPSS, Stata...
Nice idea, but quite sensitive to search terms, if you compare your result on ... code with ... code for: http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20forcmpt=q On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote: First: excuse my English. My opinion: a useful source for measuring popularity can be Google Insights for Search - http://www.google.com/insights/search/# Every person using software like R, SAS or SPSS needs first to learn it. So probably he makes a web search for a manual, a tutorial, a guide. One can measure the share of this kind of search query. This kind of result can be useful to determine trends of popularity. Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22cmpt=q Example 2: R software, SAS software, SPSS software http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22cmpt=q Example 3: R code, SAS code, SPSS code http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22cmpt=q Example 4: R graph, SAS graph, SPSS graph http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22cmpt=q Example 5: R regression, SAS regression, SPSS regression http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22cmpt=q Some examples are cross-software (learning needs - Example 1), others can be biased by the traditional use of that software (in SPSS usually you don't manipulate graphs, I think) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Popularity of R, SAS, SPSS, Stata...
dangit, tab in the way... On Fri, Jun 25, 2010 at 1:56 AM, Joris Meys jorism...@gmail.com wrote: Nice idea, but quite sensitive to search terms, if you compare your result on ... code with ... code for: http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20forcmpt=q This one actually shows quite nicely when students start panicking about their projects. Cheers Joris On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote: First: excuse my English. My opinion: a useful source for measuring popularity can be Google Insights for Search - http://www.google.com/insights/search/# Every person using software like R, SAS or SPSS needs first to learn it. So probably he makes a web search for a manual, a tutorial, a guide. One can measure the share of this kind of search query. This kind of result can be useful to determine trends of popularity. Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide, SPSS tutorial/manual/guide http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22cmpt=q Example 2: R software, SAS software, SPSS software http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22cmpt=q Example 3: R code, SAS code, SPSS code http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22cmpt=q Example 4: R graph, SAS graph, SPSS graph http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22cmpt=q Example 5: R regression, SAS regression, SPSS regression http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22cmpt=q Some examples are cross-software (learning needs - Example 1), others can be biased by the traditional use of that software
(in SPSS usually you don't manipulate graphs, I think) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] write a loop for tallies
It would help if you placed r <- 0; s <- 0 etc. outside the loop. The same goes for cat(...). And get rid of the sum(r), sum(s) and so on; that's doing nothing (r, s, ... are single numbers). This said: see Peter Langfelder's response. Cheers Joris

# see ?table for a better approach
r <- 0
s <- 0
t <- 0
u <- 0
v <- 0
a <- for (i in 1:length(n)) {
  ifelse(n[i] == 3000, r <- r+1,
  ifelse(n[i] == 4000, s <- r+1,
  ifelse(n[i] == 5000, t <- r+1,
  ifelse(n[i] == 6000, u <- r+1,
  ifelse(n[i] == 7000, v <- r+1, NA)))))
}
cat("r = ", r, "\n")
cat("s = ", s, "\n")
cat("t = ", t, "\n")
cat("u = ", u, "\n")
cat("v = ", v, "\n")

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
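The ?table hint in the comment above can be spelled out; a sketch with a made-up n (wrapping n in factor() with explicit levels makes unseen values show up as zero counts):

n <- c(3000, 4000, 4000, 7000, 3000, 5000)  # example data
tab <- table(factor(n, levels = c(3000, 4000, 5000, 6000, 7000)))
tab  # one count per level; the unseen 6000 is reported as 0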
Re: [R] save scores from sem
I should have specified: lavaan is not familiar to me. I'm also not familiar enough with the sem package to tell you how to obtain the scores, but all information regarding the fit is in the object. With that, it shouldn't be too difficult to get the scores you want. This paper might give you some more information, in case you didn't know it yet: http://socserv.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf On a side note, sem with a single latent variable might be seen as a factor analysis with one component, but definitely not as a PCA. A PCA is constructed based on the total variance, rendering an orthogonal space with as many dimensions as there are variables. Not so for a FA, as the matrix used to calculate the eigenvectors and eigenvalues is a reduced matrix, in essence only taking into account part of the variation in the data for calculation of the loadings. This makes PCA absolutely defined, but FA only up to a rotation. On a second side note, using the saved scores in some subsequent analysis is in my view completely against the idea behind sem. Structural equation modelling combines those observed variables exactly to be able to take the variation on the combined latent variable into account. If you use those latent variables as input in a second analysis, you lose the information regarding the variation. Cheers Joris On Wed, Jun 23, 2010 at 9:53 AM, Steve Powell st...@promente.net wrote: Dear Joris, thanks for your reply - it is the sem case which interests me. Suppose for example I use sem to construct a CFA for a set of variables, with a single latent variable; then this could be equivalent to a PCA with a single component, couldn't it? From the PCA I could save the scores as new variables; is there an equivalent with sem? This would be particularly useful if e.g. in sem I let some of the errors covary and then wanted to use the saved scores in some subsequent analysis.
By the way, lavaan is at cran.r-project.org/web/packages/lavaan/index.html Best Wishes Steve www.promente.org | skype stevepowell99 | +387 61 215 997 On Tue, Jun 22, 2010 at 7:08 PM, Joris Meys jorism...@gmail.com wrote: PCA and factor analysis are implemented in the core R distribution, no extra packages needed. When using princomp, the scores are in the object:

pr.c <- princomp(USArrests, cor = TRUE)
pr.c$scores # gives you the scores

See ?princomp. When using factanal, you can ask for regression scores or Bartlett scores. See ?factanal. Mind you, you will get different -i.e. more correct- results than those obtained by SPSS. I don't understand what you mean by scores in the context of structural equation modelling. Lavaan is unknown to me. Cheers Joris On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell st...@promente.net wrote: Dear expeRts, sorry for such a newbie question - in PCA/factor analysis e.g. in SPSS it is possible to save scores from the factors. Is it analogously possible to save the implied scores from the latent variables in a measurement model or structural model, e.g. using the sem or lavaan packages, to use in further analyses? Best wishes Steve Powell www.promente.org | skype stevepowell99 | +387 61 215 997 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
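To make the scores discussion above concrete, a small sketch on the built-in USArrests data (illustrative only; a single factor on four variables is a toy model, not a recommendation):

pr.c <- princomp(USArrests, cor = TRUE)
head(pr.c$scores)  # principal component scores, one row per state
fa <- factanal(USArrests, factors = 1, scores = "regression")
head(fa$scores)    # regression scores for the single factor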
Re: [R] (help) This is an R workspace memory processing question
http://lmgtfy.com/?q=R+large+data+sets Cheers Joris On Wed, Jun 23, 2010 at 6:38 AM, 나여나 dllm...@hanmail.net wrote: Hi all! This is an R workspace memory processing question. Is there a method by which R can process 10GB of data in 500MB units? my work environment : R version : 2.11.1 OS : WinXP Pro sp3 Thanks and best regards. Park, Young-Ju from Korea. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about a program
Most of the computation time is spent in the function qmvnorm. You can win a little bit by optimizing the code, but the gain is relatively small. You can also decrease the interval used to evaluate qmvnorm to win some speed there. As you look for the upper tail, there is no need to evaluate the function at negative numbers. E.g.:

pair_kFWER2 <- function(m, alpha, rho, k, pvaluessort) {
  library(mvtnorm)
  cc_z <- numeric(m)
  Var <- matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2, byrow = T)
  lpart <- 1:k       # first part
  hpart <- (k+1):m   # second part
  # make first part of the cc_z
  cc_z[lpart] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail = "upper",
                         interval = c(0, 4), sigma = Var)$quantile
  # make second part of the cc_z
  cc_z[hpart] <- sapply(hpart, function(i) {
    qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, interval = c(0, 4),
            tail = "upper", sigma = Var)$quantile
  })
  crit_cons <- 1 - pnorm(cc_z)
  # tests whether the values of crit_cons[i] are smaller than the
  # values pvaluessort[i] and returns a vector saying at which
  # positions that does not occur; tail takes the last position.
  # I guess that's what you wanted.
  out <- tail(which(!(crit_cons < pvaluessort)), 1)
  return(out)
}

system.time(replicate(100, pair_kFWER(m, alpha, rho, k, pvaluessort)))
   user  system elapsed
   5.97    0.04    6.04
system.time(replicate(100, pair_kFWER2(m, alpha, rho, k, pvaluessort)))
   user  system elapsed
   4.43    0.00    4.44

Check whether the function works correctly. It gives the same result as the other one in my tests. Only in the case where your function returns 0 does the modified one return integer(0). Cheers Joris On Wed, Jun 23, 2010 at 2:21 PM, li li hannah@gmail.com wrote: Dear all, I have the following program for a multiple comparison procedure. There are two functions for the two steps. The first step is to calculate the critical values, while the second step is the actual procedure [see below: program with two functions]. This works fine. However, I want to put them into one function for the convenience of later use [see below: program with one function].
Somehow the big function works extremely slowly. For example I chose

m <- 10
rho <- 0.1
k <- 2
alpha <- 0.05
pvaluessort <- sort(1 - pnorm(rnorm(10)))

Is there anything that I did wrong? Thank you! Hannah

## Program with two functions
## first step
library(mvtnorm)
cc_f <- function(m, rho, k, alpha) {
  cc_z <- numeric(m)
  var <- matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2, byrow = T)
  for (i in 1:m) {
    if (i <= k) {
      cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail = "upper", sigma = var)$quantile
    } else {
      cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, tail = "upper", sigma = var)$quantile
    }
  }
  cc <- 1 - pnorm(cc_z)
  return(cc)
}

## second step
pair_kFWER <- function(m, crit_cons, pvaluessort) {
  k <- 0
  while ((k < m) && (crit_cons[m-k] < pvaluessort[m-k])) {
    k <- k + 1
  }
  return(m - k)
}
###

## Program with one function
pair_kFWER <- function(m, alpha, rho, k, pvaluessort) {
  library(mvtnorm)
  cc_z <- numeric(m)
  var <- matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2, byrow = T)
  for (i in 1:m) {
    if (i <= k) {
      cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail = "upper", sigma = var)$quantile
    } else {
      cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, tail = "upper", sigma = var)$quantile
    }
  }
  crit_cons <- 1 - pnorm(cc_z)
  k <- 0
  while ((k < m) && (crit_cons[m-k] < pvaluessort[m-k])) {
    k <- k + 1
  }
  return(m - k)
}
#

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Comparing distributions
A qqplot would indeed help. ?ks.test for more formal testing, but be aware: you should also think about what you call similar distributions. See the following example:

set.seed(12345)
x1 <- c(rnorm(100), rnorm(150, 3.3, 0.7))
x2 <- c(rnorm(140, 1, 1.2), rnorm(110, 3.3, 0.6))
x3 <- c(rnorm(140, 2, 1.2), rnorm(110, 4.3, 0.6))

d1 <- density(x1)
d2 <- density(x2)
d3 <- density(x3)

xlim <- 1.2 * c(min(x1, x2, x3), max(x1, x2, x3))
ylim <- 1.2 * c(0, max(d1$y, d2$y, d3$y))

op <- par(mfrow = c(1, 3))
plot(d1, xlim = xlim, ylim = ylim)
lines(d2, col = "red")
lines(d3, col = "blue")
qqplot(x1, x2)
qqplot(x2, x3)
par(op)

# formal testing
ks.test(x1, x2)
ks.test(x2, x3)

# relocate x3
x3b <- x3 - mean(x3 - x2)
x3c <- x3 - mean(x3 - x1)

# formal testing
ks.test(x2, x3b)
ks.test(x1, x3c)

# test location
t.test(x2 - x1)
t.test(x3 - x2)
t.test(x3 - x1)

Cheers Joris

On Wed, Jun 23, 2010 at 9:33 PM, Ralf B ralf.bie...@gmail.com wrote: I am trying to do something in R and would appreciate a push in the right direction. I hope some of you experts can help. I have two distributions obtained from 1 datapoints each (about 1 datapoints each, non-normal with multi-modal shape (when eye-balling densities), but other than that I know little about their distribution). When plotting the two distributions together I can see that the two densities are alike, with a certain distance to each other (e.g. 50 units on the X axis). I tried to plot a simplified picture of the density plot below:

[ASCII sketch: two similar multi-modal density curves, drawn with * and +, the + curve shifted to the right along the X axis]

What I would like to do is to formally test their similarity, or otherwise measure it more reliably than just showing and discussing a plot. Is there a general approach other than using a Mann-Whitney test, which is very strict and seems to assume a perfect match? Is there a test that takes in a certain 'band' (e.g. 50, 100, 150 units on X), or are there any other similarity measures that could give me a statistic about how close these two distributions are to each other?
All I can say from eye-balling is that they seem to follow each other, and it appears that one distribution is shifted by an amount from the other. Any ideas? Ralf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcoxon signed rank test and its requirements
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote: Hi all, I have a distribution, and take a sample of it. Then I compare that sample with the mean of the population, like here in the Wilcoxon signed rank test with continuity correction:

wilcox.test(Sample, mu = mean(All), alt = "two.sided")

Wilcoxon signed rank test with continuity correction
data: AlphaNoteOnsetDists
V = 63855, p-value = 0.0002093
alternative hypothesis: true location is not equal to 0.4115136

wilcox.test(Sample, mu = mean(All), alt = "greater")

Wilcoxon signed rank test with continuity correction
data: AlphaNoteOnsetDists
V = 63855, p-value = 0.0001047
alternative hypothesis: true location is greater than 0.4115136

What assumptions are needed for the population? Wikipedia says: "The Wilcoxon signed-rank test is a _non-parametric_ statistical hypothesis test for..." It also talks about the assumptions. What can we say according to these results? The p-value for "less" is 0.999.

That the p-values for "less" and "greater" seem to sum up to one, and that the p-value for "greater" is half of that for "two.sided". You shouldn't ask what we can say. You should ask yourself: "What was the question, and is this test giving me an answer to that question?" Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
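[Editor's note] The relationship Joris points out between the one-sided and two-sided p-values is easy to check on simulated data; a minimal sketch (the sample below is made up for illustration, it is not Atte's data):

```r
set.seed(42)
x <- rnorm(30, mean = 0.3)   # hypothetical sample, shifted away from 0

p.two  <- wilcox.test(x, mu = 0, alternative = "two.sided")$p.value
p.gr   <- wilcox.test(x, mu = 0, alternative = "greater")$p.value
p.less <- wilcox.test(x, mu = 0, alternative = "less")$p.value

# the two one-sided p-values sum to (essentially) one, and the
# two-sided p-value is twice the smaller one-sided p-value
c(p.gr + p.less, p.two / min(p.gr, p.less))
```

For the exact test the sum slightly exceeds 1 (by P(V = v) at the observed statistic), which is why it is only "essentially" one.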
Re: [R] Probabilities from survfit.coxph:
On Wed, Jun 23, 2010 at 9:03 PM, Parminder Mankoo pkmanko...@gmail.com wrote: Hello: In the example below (or for censored data) using survfit.coxph, can anyone point me to a link or a pdf as to how the probabilities appearing in bold under summary(pred$surv) are calculated? Do these represent a cumulative probability distribution in time (not including censored time)?

Predicted survival. Hence the column name "survival"... Cheers Joris

Thanks very much, parmee

fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
pred <- survfit(fit, newdata = data.frame(age = 60))
summary(pred)

time n.risk n.event survival std.err lower 95% CI upper 95% CI
  59     26       1    0.978  0.0240        0.932        1.000
 115     25       1    0.952  0.0390        0.878        1.000
 156     24       1    0.917  0.0556        0.814        1.000
 268     23       1    0.880  0.0704        0.752        1.000
 329     22       1    0.818  0.0884        0.662        1.000
 353     21       1    0.760  0.0991        0.588        0.981
 365     20       1    0.698  0.1079        0.516        0.945
 431     17       1    0.623  0.1187        0.429        0.905
 464     15       1    0.549  0.1248        0.352        0.858
 475     14       1    0.480  0.1267        0.286        0.805
 563     12       1    0.382  0.1332        0.193        0.757
 638     11       1    0.297  0.1292        0.127        0.697

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] list operation
Another variation on the same theme:

lst <- list(m = c('a','b','c'), n = c('c','a'), l = c('a','bc'))
set <- c('a','c')

f <- function(lst, set) sapply(lst, function(x) sum(set %in% x) == length(set))

i <- f(lst, set)
names(i[i])

Doesn't serve anybody but keeps my mind fresh. For long lists, you might benefit from first calculating the length of set, to avoid having to do that n times for a list of length n. Cheers Joris

On Wed, Jun 23, 2010 at 11:01 PM, Peter Alspach peter.alsp...@plantandfood.co.nz wrote: Tena koe Yu One possibility:

lst[sapply(lst, function(x) length(x[x %in% c('a','c')]) == 2)]

HTH ... Peter Alspach

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Yuan Jian Sent: Thursday, 24 June 2010 1:35 a.m. To: r-help@r-project.org Subject: [R] list operation

Hi, it seems a simple problem, but I can not find a clear way. I have a list:

lst <- list(m = c('a','b','c'), n = c('c','a'), l = c('a','bc'))
> lst
$m
[1] "a" "b" "c"
$n
[1] "c" "a"
$l
[1] "a"  "bc"

How can I get the list elements that include a given subset? For example, for the given subset {'a','c'}, the answer should be 'm' and 'n'. Thanks, Yu

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
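[Editor's note] Joris's remark about precomputing length(set) can be made concrete; a sketch (the function name contains_set is mine, not from the thread):

```r
lst <- list(m = c('a','b','c'), n = c('c','a'), l = c('a','bc'))
set <- c('a','c')

contains_set <- function(lst, set) {
  n <- length(set)          # computed once instead of once per list element
  sapply(lst, function(x) sum(set %in% x) == n)
}

i <- contains_set(lst, set)
names(i[i])                 # "m" "n"
```

An equally readable variant that skips the counting altogether is sapply(lst, function(x) all(set %in% x)).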
Re: [R] Beginning Eigen System question.
Which other Linear Algebra system, and which function did you use in R? Cheers Joris

On Thu, Jun 24, 2010 at 12:32 AM, rkevinbur...@charter.net wrote: Forgive me if I misunderstand a basic eigensystem, but when I present the following matrix to most any other Linear Algebra system:

1 3 1
1 2 2
1 1 3

I get an answer like:

//$values
//[1] 5.00e+00 1.00e+00 -5.536207e-16
//$vectors
//          [,1]       [,2]       [,3]
//[1,] 0.5773503 -0.8451543 -0.9428090
//[2,] 0.5773503 -0.1690309  0.2357023
//[3,] 0.5773503  0.5070926  0.2357023

But R gives me:

//$values
//[1] 5.00e+00 1.00e+00 -5.536207e-16
//$vectors
//           [,1]       [,2]       [,3]
//[1,] -0.5773503 -0.8451543 -0.9428090
//[2,] -0.5773503 -0.1690309  0.2357023
//[3,] -0.5773503  0.5070926  0.2357023

The only difference seems to be the sign on the first eigenvector. What am I missing? Kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginning Eigen System question.
On Thu, Jun 24, 2010 at 12:41 AM, Joris Meys jorism...@gmail.com wrote: Which other Linear Algebra system, and which function did you use in R? Cheers Joris

Never mind, of course you used eigen()... Eigenvectors are only determined up to a constant. If I'm not mistaken (but check the help files on it), R normalizes the eigenvectors (and so does your other Linear Algebra system). This leaves the eigenvectors defined only up to the sign. Cheers Joris

On Thu, Jun 24, 2010 at 12:32 AM, rkevinbur...@charter.net wrote: Forgive me if I misunderstand a basic eigensystem, but when I present the following matrix to most any other Linear Algebra system:

1 3 1
1 2 2
1 1 3

I get an answer like:

//$values
//[1] 5.00e+00 1.00e+00 -5.536207e-16
//$vectors
//          [,1]       [,2]       [,3]
//[1,] 0.5773503 -0.8451543 -0.9428090
//[2,] 0.5773503 -0.1690309  0.2357023
//[3,] 0.5773503  0.5070926  0.2357023

But R gives me:

//$values
//[1] 5.00e+00 1.00e+00 -5.536207e-16
//$vectors
//           [,1]       [,2]       [,3]
//[1,] -0.5773503 -0.8451543 -0.9428090
//[2,] -0.5773503 -0.1690309  0.2357023
//[3,] -0.5773503  0.5070926  0.2357023

The only difference seems to be the sign on the first eigenvector. What am I missing? Kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
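[Editor's note] The sign ambiguity Joris describes can be verified directly: eigen() returns unit-length eigenvectors, and if v is an eigenvector for lambda, then so is -v. A sketch with Kevin's matrix:

```r
A <- matrix(c(1, 3, 1,
              1, 2, 2,
              1, 1, 3), nrow = 3, byrow = TRUE)
e <- eigen(A)

# every returned eigenvector has unit length
colSums(e$vectors^2)

# flipping the sign still satisfies A v = lambda v,
# so both sign conventions are equally valid answers
v <- e$vectors[, 1]
all.equal(as.vector(A %*% (-v)), e$values[1] * (-v))
```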
-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem to building R (datasets)
Why compile from source? 2.11.1 installs fine on XP from the binary, so that's the more obvious solution. Cheers Joris

On Thu, Jun 24, 2010 at 12:39 AM, Geun Seop Lee clarmas...@gmail.com wrote: Dear all, While I was trying to build the R source, I found an error in the datasets package (there was no error before that):

../../../library/datasets/R/datasets is unchanged
Error in dir.create(Rdatadir, showWarnings = FALSE) :
  file name conversion problem
Calls: <Anonymous> -> <Anonymous> -> dir.create
Execution halted
make[2]: *** [all] Error 1
make[1]: *** [R] Error 1
make: *** [all] Error 2

And it was caused by

@$(ECHO) "tools:::data2LazyLoadDB(\"$(pkg)\", compress=3)" | \
  R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null

in the Makefile.win in the src/datasets directory. I am using Windows XP and tried to compile the 2.11.1 version. I can't imagine how I can solve this problem. Any hints or suggestions will be appreciated. Thank you. Lee.

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] count data with a specific range
See ?cut. Cheers Joris

On Thu, Jun 24, 2010 at 2:57 AM, Yi liuyi.fe...@gmail.com wrote: I would like to prepare the data for a barplot, but I only have the data frame now.

x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)
data <- data.frame(x1, x2, x3)

Is there a way to put the data within a specific range? The expected result is as follows:

range   x1 x2 x3
-10-0    2  5  1   (# points in this range)
0-10     7  9  6
...

I know the table function, but I do not know how to deal with the range issue. Thanks in advance. Yi

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
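[Editor's note] A sketch of how ?cut combines with table() to produce the counts Yi asked for (the breaks are illustrative; since the three vectors have unequal lengths, a list is more natural than a data frame here):

```r
set.seed(1)
x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)

breaks <- c(-10, 0, 10)
counts <- sapply(list(x1 = x1, x2 = x2, x3 = x3),
                 function(x) table(cut(x, breaks = breaks)))
counts                      # one row per range, one column per variable
barplot(counts, beside = TRUE, legend.text = rownames(counts))
```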
Re: [R] {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?
On Mon, Jun 21, 2010 at 10:01 PM, Bert Gunter gunter.ber...@gene.com wrote: 1. This discussion probably belongs on the sig-mixed-models list.

Correct, but I'll keep it here for archive purposes.

2. Your claim is incorrect, I think. The structure of the random errors = model covariance can be parameterized in various ways, and one can try to test the significance of nested parameterizations (for a particular fixed-effects parameterization). Whether you can do so meaningfully, especially in the gamm context, is another issue, but if you check e.g. Bates and Pinheiro, anova for different random-effects parameterizations is advocated, and it is implemented in the anova.lme() nlme function.

It is a matter of debate and interpretation. Mind you, I never said it can't be done. I just wanted to point out that in most cases, it shouldn't be done. As somebody else noted, both Wood and Pinheiro/Bates suggest testing the random components _if the fixed effects are the same_. Yet, even then it gets difficult to formulate the correct hypothesis. In the example of Carlo, H0 is actually not correct:

1) The 7 extra random components are not really 7 extra random components, as they are definitely not independent, but actually all part of the same spline fit.

2) According to my understanding, the degrees of freedom for a likelihood are determined by the number of parameters fitted, not the number of variables in the model. The parameters linked to the random effect are part of the variance structure, and the number of parameters to be fitted in this variance structure does not depend on the number of knots, but on the number of smooths. Indeed, in the gamm model the variance of the parameters b for the random effect is considered to be equal for all b's related to the same smooth.
3) It appears to me that you violate the restriction both Pinheiro/Bates and Wood put on the testing of models with LR: you can only do so if none of the variance parameters are restricted to the edge of the feasible parameter space. Again, as I see it, the model with fewer knots implies the assumption that some variance components are zero. Hence, you cannot use a LR test to compare both models. anova.lme won't carry out the test anyway:

library(mgcv)
library(nlme)

set.seed(123)
x  <- runif(100, 1, 10)                            # regressor
b0 <- rep(rnorm(10, mean = 1, sd = 2), each = 10)  # random intercept
id <- rep(1:10, each = 10)                         # identifier
y  <- b0 + x - 0.1 * x^3 + rnorm(100, 0, 1)        # dependent variable

f1 <- gamm(y ~ s(x, k = 3, bs = "cr"), random = list(id = ~1), method = "ML")
f2 <- gamm(y ~ s(x, k = 10, bs = "cr"), random = list(id = ~1), method = "ML")

> anova(f1$lme, f2$lme)
       Model df      AIC      BIC    logLik
f1$lme     1  5 466.6232 479.6491 -228.3116
f2$lme     2  5 347.6293 360.6551 -168.8146

If you change a random term not related to the splines, it does carry out the test. In this case, you can test the random effects as explained by Pinheiro/Bates.

f3 <- gamm(y ~ s(x, k = 10, bs = "cr"), method = "ML")

> anova(f2$lme, f3$lme)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
f2$lme     1  5 347.6293 360.6551 -168.8146
f3$lme     2  4 446.2704 456.6910 -219.1352 1 vs 2 100.6411  <.0001

As an illustration, the variance of the random effects:

> f1$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
            Xr.1
StdDev: 163.8058

 Formula: ~1 | id %in% g.1
        (Intercept) Residual
StdDev:    1.686341 2.063269

Number of Observations: 100
Number of Groups:
        g.1 id %in% g.1
          1          10

> f2$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
 Structure: pdIdnot
           Xr.11    Xr.12    Xr.13    Xr.14    Xr.15    Xr.16    Xr.17    Xr.18
StdDev: 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349
...

Clearly, the SD for all parameters b is exactly the same, and hence there is only one such parameter in the likelihood function. So both models won't differ in df when using the ML estimation.
This does not mean that the b-coefficients for the random effects cannot be obtained. They are just not part of the minimization in the likelihood function. Or, as Wood said about random effects: you don't estimate them, although you might want to predict them. Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Popularity of R, SAS, SPSS, Stata...
Hehe, You do have a point in not calling R a statistical language. It is indeed far more than that; Yet, I don't agree that statistics is done by stuffy professors. Wished it was so, but alas, last time I looked at my paycheck I had to conclude that I might be stuffy, but I'm far from being paid as a professor... Cheers Joris On Tue, Jun 22, 2010 at 11:34 AM, Patrick Burns pbu...@pburns.seanet.com wrote: I'll expand my statement slightly. Yes, Peter, you are the archetypical stuffy professor. The truth hurts. By any reasonable metric that I've thought of my company name is at least one-third statistics, from which a common (and I think correct) inference would be that I'm not anti-statistics. There are two aspects of why I think that R should not be called a statistical program: marketing and reality. Marketing Identifying with the most dreaded experience in university is not so good for sales. (Reducing stuffiness might reduce the root problem here.) Reality R really is used for more than statistics. Almost all of my use of R is outside the realm of statistics. Maybe the field of statistics should have claim on a lot of that, but as of now that isn't the case. A Fusion R's real competition is not SAS or SPSS, but Excel. As Brian has pointed out before, the vast majority of statistics is actually done in Excel. Is Excel a statistics program? I don't think many people think that -- neither statisticians nor non-statisticians. Pat On 21/06/2010 10:32, Joris Meys wrote: On Mon, Jun 21, 2010 at 11:15 AM, Patrick Burns pbu...@pburns.seanet.com wrote: (Statistics is what stuffy professors do, I just look at my data and try to figure out what it means.) Often those stuffy professors have a reason to do so. When they want an objective view on the data for example, or an objective measure of the significance of a hypothesis. But you're right, who cares about objectiveness these days? It doesn't sell you a paper, does it? 
Cheers Joris -- Patrick Burns pbu...@pburns.seanet.com http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno') -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] {Spam?} RE: {Spam?} RE: {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?
Hi Carlo, there is no possible way you can compare both models using a classical statistical framework, be it ML, REML or otherwise. The assumptions are violated. Regarding the df, see my previous mail. In your case, I'd resort to the AIC/BIC criteria, and if prediction is the main focus, compare the predictive power of both models using a cross-validation approach. Wood also suggests in his book an MCMC approach for more difficult comparisons. Cheers Joris

On Tue, Jun 22, 2010 at 1:31 AM, Carlo Fezzi c.fe...@uea.ac.uk wrote: Hi Christos, thanks for your kind reply, I agree entirely with your interpretation. In the first model comparison, however, anova does seem to work according to our interpretation, i.e. the df are equal in the two models. My intuition is that the anova command does a fixed-effect test rather than a random-effect one. These are the results I get:

> anova(f1$lme, f2$lme)
       Model df      AIC      BIC    logLik
f1$lme     1  5 466.6232 479.6491 -228.3116
f2$lme     2  5 347.6293 360.6551 -168.8146

Hence I was not sure our interpretation was correct. On your second point I totally agree about the appeal of GAMs... However, I am working in a specific application where the quadratic function is the established benchmark, and I think that testing against it will show even more strongly the appeal of a gamm approach. Any idea which bases could work? Finally, thanks for the tip regarding gamm4; unfortunately I need to fit a bivariate smooth, so I cannot use it. Best wishes, Carlo -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Project
Presuming you're talking about Perl, I might be able to help with specific issues, but read the posting guide before you actually call upon our help: http://www.R-project.org/posting-guide.html One book I can definitely recommend is Phil Spector's Data Manipulation with R: http://www.springer.com/statistics/computanional+statistics/book/978-0-387-74730-9 It covers most of what you need to know about data structures in R, as they're quite different from Perl's. Pay attention to vectorization and the use of indices in R, as both concepts differ substantially from Perl. In essence, avoid for-loops as much as you can. One other source you might look at is the R Inferno by Patrick Burns: http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf Cheers Joris

On Tue, Jun 22, 2010 at 4:35 AM, Colton Hunt coltonhu...@gmail.com wrote: I am an intern in Virginia Beach and I am currently working on a project that involves converting a script of Pearl to R. The code takes one input text file and outputs six text files. If you think you can help, I will be able to provide more detailed information. Thank you for your time! Colton Hunt __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do you install R package?
open the R console. type: ?install.packages. press enter. read. say Doh, that's easy. do what you read. cheers Joris On Tue, Jun 22, 2010 at 5:38 PM, Amy Li a...@ucsd.edu wrote: Hi I'm a new user. I need to install some new packages. Can someone show me? Amy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] replication time series using permutations of the each y values
Read the posting guide, please. What's your data structure? That's quite important. As I see it, I can easily get a matrix with what you want by:

x1 <- rep(c("a", "b"), each = 3)
x2 <- rep(c("c", "d", "f"), times = 2)
x3 <- rep("g", 6)
x4 <- rep("h", 6)
result <- rbind(x1, x2, x3, x4)

But that's probably not what you want. Cheers Joris

2010/6/22 Josué Polanco jom...@gmail.com: Dear All, I'm trying to create this: I have these data (a, b, c, etc. are numbers):

1 a b
2 c d f
3 g
4 h
... N

I'm trying to create all the time-series permutations (length = N), taking into account the possibilities for each value. For example (N = 4):

1 a
2 c
3 g
4 h

next,

1 b
2 c
3 g
4 h

next,

1 a
2 d
3 g
4 h

and so on. For this example, there are 2*3 different (possible) time series, but I need to do, e.g., N = 100 and 2^14 different (possible) time series. I am wondering if somebody could give me a hint (suggestion) on how to do this using a function in R. Thank you so much. Best regards -- Josue Polanco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
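[Editor's note] For the actual question, generating every series obtainable by picking one alternative per time point, expand.grid() does the combinatorial work; a sketch with the poster's N = 4 example (the list names t1..t4 are mine):

```r
alternatives <- list(t1 = c("a", "b"),
                     t2 = c("c", "d", "f"),
                     t3 = "g",
                     t4 = "h")

# one row per possible series: 2 * 3 * 1 * 1 = 6 rows
series <- expand.grid(alternatives, stringsAsFactors = FALSE)
series
```

For 2^14 series of length 100 this is still only 16384 rows, so it remains feasible; for much larger problems, generate the rows one at a time instead of materializing the whole grid.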
Re: [R] Duplicate dates in zoo objects
Try this:

> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
> x <- zoo(1:5, x.Date)
> x
2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14
         1          2          3          4          5
> i <- match(unique(index(x)), rev(index(x)))
> x[i]
2003-02-01 2003-02-03 2003-02-07 2003-02-14
         1          2          4          5

Cheers Joris

On Tue, Jun 22, 2010 at 4:35 PM, Research risk2...@ath.forthnet.gr wrote: Hello, I have a zoo time series read from an Excel file which has some dates the same, such as the following example:

02/10/1995 4925.5
30/10/1995 4915.9
23/01/1996 4963.5
23/01/1996 5009.2
04/03/1996 5031.9  # here
04/03/1996 5006.5  # here
03/04/1996 5069.2
03/05/1996 5103.7
31/05/1996 5107.1
01/07/1996 5153.1
02/08/1996 5151.7

Is there a simple way to keep the last price of the ones that have the same dates?

04/03/1996 5031.9
04/03/1996 5006.5

i.e., keep only the 04/03/1996 5006.5 price and discard the previous one... Is there an implicit function that does that, or do I need some sort of recursive algorithm? You can try a solution on this example (for convenience):

x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
x <- zoo(rnorm(5), x.Date)

The zoo object has 2 prices with the same date. Many thanks in advance, Costas

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Duplicate dates in zoo objects
Oops, I was too fast. I meant:

i <- length(x) - match(unique(index(x)), rev(index(x))) + 1

(one has to reverse the indices again). But in any case, the aggregate way is indeed far cleaner. Thanks for pointing that out. Cheers Joris

On Tue, Jun 22, 2010 at 7:24 PM, Achim Zeileis achim.zeil...@uibk.ac.at wrote: On Tue, 22 Jun 2010, Joris Meys wrote:

Try this:
> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
> x <- zoo(1:5, x.Date)
> x
2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14
         1          2          3          4          5
> i <- match(unique(index(x)), rev(index(x)))
> x[i]
2003-02-01 2003-02-03 2003-02-07 2003-02-14
         1          2          4          5

Note that this happens to do the right thing in this particular toy example but not more generally. Simply adding a single observation will make it fail. The aggregate() solution I posted previously is preferred:

R> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
R> x <- zoo(1:6, x.Date)
Warning message:
In zoo(1:6, x.Date) : some methods for zoo objects do not work if the
  index entries in 'order.by' are not unique
R> x
2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14 2003-02-15
         1          2          3          4          5          6
R> aggregate(x, time(x), tail, 1)
2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
         1          2          4          5          6
R> i <- match(unique(index(x)), rev(index(x)))
R> x[i]
2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
         1          2          3          5          6

Best, Z

Cheers Joris

On Tue, Jun 22, 2010 at 4:35 PM, Research risk2...@ath.forthnet.gr wrote: Hello, I have a zoo time series read from an Excel file which has some dates the same, such as the following example:

02/10/1995 4925.5
30/10/1995 4915.9
23/01/1996 4963.5
23/01/1996 5009.2
04/03/1996 5031.9  # here
04/03/1996 5006.5  # here
03/04/1996 5069.2
03/05/1996 5103.7
31/05/1996 5107.1
01/07/1996 5153.1
02/08/1996 5151.7

Is there a simple way to keep the last price of the ones that have the same dates?

04/03/1996 5031.9
04/03/1996 5006.5

i.e., keep only the 04/03/1996 5006.5 price and discard the previous one...
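[Editor's note] The keep-the-last-observation logic also has a compact base-R spelling via duplicated(..., fromLast = TRUE), shown here on plain vectors so it runs without zoo; the same logical index can be applied to a zoo series as x[!duplicated(index(x), fromLast = TRUE)]:

```r
dates  <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
values <- 1:6

# TRUE for the last occurrence of each date, FALSE for earlier duplicates
keep <- !duplicated(dates, fromLast = TRUE)
values[keep]   # 1 2 4 5 6
dates[keep]
```

This matches the aggregate(x, time(x), tail, 1) result from the thread on the same data.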
   Is there a built-in function that does this, or do I need some sort of
   recursive algorithm? You can try a solution on this example (for
   convenience):
  
   x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
   x <- zoo(rnorm(5), x.Date)
  
   The zoo object has 2 prices with the same date.
  
   Many thanks in advance,
   Costas

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
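A side note for readers of this archived thread: besides aggregate(), the same "keep the last observation per date" result can be had with duplicated() in one indexing step. A sketch (not from the original thread; assumes the zoo package is installed):

```r
library(zoo)

x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
x <- zoo(1:6, x.Date)

# keep only the last entry for each duplicated index value
x.last <- x[!duplicated(index(x), fromLast = TRUE)]
x.last
```

Unlike the match()-based attempt, this stays correct when extra observations are added, because duplicated() marks every earlier copy of a repeated date.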
Re: [R] save scores from sem
PCA and factor analysis are implemented in the core R distribution; no extra packages needed. When using princomp(), the scores are stored in the fitted object:

pr.c <- princomp(USArrests, cor = TRUE)
pr.c$scores  # gives you the scores

See ?princomp. When using factanal(), you can ask for regression scores or Bartlett scores; see ?factanal. Mind you, you will get different (i.e. more correct) results than those obtained by SPSS. I don't understand what you mean by scores in the context of structural equation modelling; lavaan is unknown to me.

Cheers
Joris

On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell st...@promente.net wrote:

 Dear expeRts,

 Sorry for such a newbie question - in PCA/factor analysis, e.g. in SPSS,
 it is possible to save the scores from the factors. Is it analogously
 possible to save the implied scores from the latent variables in a
 measurement model or structural model, e.g. using the sem or lavaan
 packages, to use in further analyses?

 Best wishes
 Steve Powell
 www.promente.org | skype stevepowell99 | +387 61 215 997

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
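To illustrate the factanal() scores mentioned above, a minimal sketch on a built-in dataset (the variable selection here is arbitrary, chosen only so that a 2-factor model is estimable; it is not from the original thread):

```r
# factor analysis with regression scores on a built-in dataset
vars <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
fa <- factanal(vars, factors = 2, scores = "regression")

head(fa$scores)  # one row per observation, one column per factor
```

Bartlett scores are obtained the same way with scores = "Bartlett"; see ?factanal for the difference between the two.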
Re: [R] glm
On Tue, Jun 22, 2010 at 1:00 AM, Samuel Okoye samu...@yahoo.com wrote:

 Hi,

 I have the following data

 data1 <- data.frame(count = c(0, 1, 1, 2, 4, 5, 13, 16, 14),
                     weeks = 1:9,
                     treat = c(rep("1mg", 3), rep("5mg", 3), rep("10mg", 3)))

 and I am using library(splines) to fit

 glm.m <- glm(count ~ bs(weeks) + as.factor(treat), family = poisson, data = data1)

 and I am interested in predicting the count variable for weeks 10, 11 and
 12 with treat 10mg and 15mg.

Bad luck for you.

newdat <- data.frame(weeks = rep(10:12, each = 2),
                     treat = rep(c("5mg", "10mg"), times = 3))
preds <- predict(glm.m, type = "response", newdata = newdat, se.fit = TRUE)
cbind(newdat, preds)

gives, as expected:

Warning message:
In bs(weeks, degree = 3L, knots = numeric(0), Boundary.knots = c(1L, :
  some 'x' values beyond boundary knots may cause ill-conditioned bases

  weeks treat       fit    se.fit residual.scale
1    10   5mg  5.934881  5.205426              1
2    10  10mg 12.041639  9.514347              1
3    11   5mg  4.345165  6.924663              1
4    11  10mg  8.816168 15.805171              1
5    12   5mg  2.781063  8.123436              1
6    12  10mg  5.642667 18.221007              1

Watch the standard errors on the predicted values. No, you shouldn't predict outside your data space, especially when using splines. And if you're interested in 15mg, well, you shouldn't put treatment in as a factor to start with.

Cheers
Joris

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
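The alternative hinted at in the last sentence can be sketched as follows: treating dose as a numeric covariate puts 15 mg on the scale of the predictor, so prediction there is interpolation in dose (though still extrapolation in weeks, so the caveat about standard errors stands). This is an illustration, not code from the thread, and the dose values are read off the factor labels:

```r
data1 <- data.frame(count = c(0, 1, 1, 2, 4, 5, 13, 16, 14),
                    weeks = 1:9,
                    dose  = rep(c(1, 5, 10), each = 3))

# dose as a numeric covariate instead of a factor
glm.num <- glm(count ~ weeks + dose, family = poisson, data = data1)

newdat <- data.frame(weeks = rep(10:12, each = 2),
                     dose  = rep(c(10, 15), times = 3))
preds <- predict(glm.num, newdata = newdat, type = "response", se.fit = TRUE)
cbind(newdat, fit = preds$fit, se = preds$se.fit)
```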
Re: [R] Verify the linear regression model used in R ( fundamental theory)
It's normally always specified, except in the case of least-squares linear regression (lm), where it is considered obvious.

Cheers
Joris

On Wed, Jun 23, 2010 at 1:46 AM, Yi liuyi.fe...@gmail.com wrote:

 Hi, folks,

 As I understand it, the least-squares estimate (second-moment assumption)
 and the method of maximum likelihood (full distribution assumption) are
 used for linear regression. I do ?lm, but the help file does not tell me
 the model employed in R. The book 'Introductory Statistics with R'
 indicates that R estimates the parameters using the method of least
 squares; however, it assumes the error is iid N(0, sigma^2). Am I correct?

 Is there any general way (like RSiteSearch()) to find out what the model
 (theory) is for a specific function? Let's say, how to find out the
 assumptions and the model used for rlm.

 Thanks
 Yi

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
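The least-squares claim is easy to verify numerically (a sketch, not from the thread): the coefficients lm() returns match the closed-form solution of the normal equations, beta = (X'X)^(-1) X'y.

```r
set.seed(42)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)

fit <- lm(y ~ x)

# closed-form least-squares solution via the normal equations
X <- cbind(1, x)                      # design matrix with intercept column
beta <- solve(t(X) %*% X, t(X) %*% y)

all.equal(unname(coef(fit)), as.vector(beta))  # TRUE
```

Note that the N(0, sigma^2) assumption is not needed to compute these estimates; it only enters when constructing the t-tests and confidence intervals that summary() reports.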
Re: [R] How to 'understand' R functions besides reading R codes
There is:

1) read the help files
2) read the vignette (see ?vignette)
3) look it up at www.rseek.org
4) check Google
5) look at the references mentioned in the help files (or do this first if you're not familiar with the method)
6) ask here

If still nothing: use another function/method.

Cheers
Joris

On Wed, Jun 23, 2010 at 2:29 AM, Yi liuyi.fe...@gmail.com wrote:

 Apologies for not being clearer earlier. I would like to ask again. Thanks
 to Joris and Mark Leeds for responding.

 Two examples:

 1. The function 'var'. In R, it is the sum of squares divided by (n-1),
 not by n. (I know this from R class.)

 2. The function 'lm'. In R, the residual variance is the residual sum of
 squares divided by (n-2), not by n, the same as in the least-squares
 estimate, even though the assumptions follow those of the method of
 maximum likelihood. (I know this from looking it up in the book
 'Introductory Statistics with R'.)

 But not all functions are explained in books. I am wondering whether
 there is a general way to figure out how an R function works and what the
 underlying theory is, besides looking at the code (typing var or lm at
 the R prompt), because R code is too difficult for me to understand.

 Thanks
 Yi

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
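The two examples in the question are easy to check directly (a quick sketch): the var() denominator can be verified in one line, and typing a function's name without parentheses prints its source.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# var() divides by n - 1 (the unbiased estimator), not by n
all.equal(var(x), sum((x - mean(x))^2) / (n - 1))  # TRUE

# inspect a function's definition
var                # prints the R source of var()
getAnywhere("var") # also finds functions hidden in namespaces
```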
Re: [R] Mahalanobis distance
Say X is your data matrix with the variables; then you could do:

X <- matrix(rnorm(2100), 300, 7)
S <- var(X)
D <- as.dist(apply(X, 1, function(i) mahalanobis(X, i, S)))

(note that mahalanobis() returns squared distances).

Cheers
Joris

On Tue, Jun 22, 2010 at 11:41 PM, yoo hoo freesuccess2...@yahoo.com wrote:

 I am a new R user. I have a question about Mahalanobis distance. I have
 300 rows and 7 columns: the columns are different measurements, and the
 300 rows are genes. Since the genes fall into 4 categories, I used dist()
 with Euclidean distance and cmdscale() to make an MDS plot, but found
 that Mahalanobis distance may be better. How do I use mahalanobis() to
 generate a similar dist object that I can use for an MDS plot?

 A second question: should I calculate the mean of every category for
 every measurement first and build a 4x4 distance matrix, or should I
 calculate pairwise distances first and then find the category means,
 since I only care about the relative positions of the 4 categories in the
 MDS plot?

 Thank you very much.

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
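Continuing toward the poster's MDS goal, a sketch with simulated data (not from the thread; note the sqrt(), since mahalanobis() returns squared distances):

```r
set.seed(1)
X <- matrix(rnorm(2100), 300, 7)
S <- var(X)

# 300 x 300 matrix of squared Mahalanobis distances, row i vs all rows
D2 <- apply(X, 1, function(i) mahalanobis(X, i, S))
D  <- as.dist(sqrt(D2))

# classical multidimensional scaling on those distances
mds <- cmdscale(D, k = 2)
plot(mds, xlab = "MDS 1", ylab = "MDS 2")
```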
Re: [R] compute coefficient of determination (R-squared) for GLM (maximum likelihood)
It is not available for a reason. The correct way would be to use lm() instead, if possible; that function reports an R-squared in the summary. In the case of glm, and if you're absolutely sure about what you're doing, you can use one of the approximations that is used when looking at prediction only, realizing very well that you can't use R-squared to compare models with a different number of variables, and that R-squared doesn't mean what you think it does when using a link function:

x <- 1:100
y <- 1:100 + rnorm(100)
mod <- glm(y ~ x)

# possibility 1
R2 <- cor(y, predict(mod))^2

# possibility 2
R2 <- 1 - sum((y - predict(mod))^2) / sum((y - mean(y))^2)

In the case where you use a link function, you should work on the transformed scale: transform the values of y, and use predict(mod, type = "link") for a correct estimate.

Cheers
Joris

On Mon, Jun 21, 2010 at 12:00 AM, elaine kuo elaine.kuo...@gmail.com wrote:

 Dear all,

 I want to compute the coefficient of determination (R-squared) to
 complement AIC for model selection of a multivariable GLM. However, this
 is not a built-in feature of glm, nor is it available through reviewing
 the questions in the R-help archive.

 Please kindly help, and thanks a lot.

 Elaine

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
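A likelihood-based alternative not shown above is McFadden's pseudo-R-squared, computed from the fitted and null deviances. A sketch on simulated Poisson data (with the usual caveat that pseudo-R-squared values are not comparable to the OLS R-squared):

```r
set.seed(7)
x <- 1:100
y <- rpois(100, lambda = exp(0.02 * x))

mod <- glm(y ~ x, family = poisson)

# McFadden's pseudo-R2: 1 - residual deviance / null deviance
pseudoR2 <- 1 - mod$deviance / mod$null.deviance
pseudoR2
```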
Re: [R] Popularity of R, SAS, SPSS, Stata...
On Mon, Jun 21, 2010 at 11:15 AM, Patrick Burns pbu...@pburns.seanet.com wrote: (Statistics is what stuffy professors do, I just look at my data and try to figure out what it means.) Often those stuffy professors have a reason to do so. When they want an objective view on the data for example, or an objective measure of the significance of a hypothesis. But you're right, who cares about objectiveness these days? It doesn't sell you a paper, does it? Cheers Joris -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Replacing elements of a list over a certain threshold
Magic code:

example[example > threshold] <- threshold

Cheers
Joris

On Mon, Jun 21, 2010 at 12:34 PM, Jim Hargreaves ja...@ipec.co.uk wrote:

 Dear List,

 I have a list of length ~1000 filled with numerics. I need to replace the
 elements of this list that are above a certain numerical threshold with
 the value of the threshold, e.g.

 example <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
 threshold <- 5

 # magic code goes here

 example
 # list(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1)

 I have written a crude script that achieves this but it's very slow. Is
 there a way to do this using some R function?

 Crude script: http://pastebin.com/3KSfi8nD

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Replacing elements of a list over a certain threshold
You shouldn't use sapply/lapply for this; use the indices instead:

set.seed(1)
r <- round(runif(10, 1, 10))
threshold <- 5
head(r)
[1] 3 4 6 9 3 9
system.time(r[r > threshold] <- threshold)
   user  system elapsed
      0       0       0
head(r)
[1] 3 4 5 5 3 5

On Mon, Jun 21, 2010 at 2:03 PM, Christos Argyropoulos argch...@hotmail.com wrote:

 Hi,

 You could use sapply/lapply for such operations:

 r <- round(runif(10, 1, 10))
 head(r)
 [1] 3 7 6 3 2 8
 filt <- function(x, thres) ifelse(x < thres, x, thres)
 system.time(r2 <- sapply(r, filt, thres = 5))
    user  system elapsed
    3.36    0.00    3.66
 head(r2)
 [1] 3 5 5 3 2 5

 To return a list, replace sapply with lapply.

 Christos

 Date: Mon, 21 Jun 2010 11:34:01 +0100
 From: ja...@ipec.co.uk
 To: r-help@r-project.org
 Subject: [R] Replacing elements of a list over a certain threshold

  Dear List,

  I have a list of length ~1000 filled with numerics. I need to replace
  the elements of this list that are above a certain numerical threshold
  with the value of the threshold, e.g.

  example <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
  threshold <- 5

  # magic code goes here

  example
  # list(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1)

  I have written a crude script that achieves this but it's very slow. Is
  there a way to do this using some R function?

  Crude script: http://pastebin.com/3KSfi8nD
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
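For completeness, a sketch (not from the original thread) of the same capping with the vectorised pmin(), applied to the poster's list via unlist():

```r
threshold <- 5
example <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)

# pmin() caps every element at the threshold in one vectorised call
capped <- pmin(unlist(example), threshold)
capped
# [1] 1 2 3 4 5 5 5 5 5 5 5 5 5 5 5 4 3 2 1

# wrap back into a list only if one is really needed
as.list(capped)
```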