Re: [R] Random assignment
Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

> Dear List,
>
> I am doing some simulation in R and need basic help! I have a list of
> animal families for which I know the number of species in each family.
> I am working under the assumption that a species has a 7.48% chance of
> being at risk.

Is this over all families, or within a particular family? If the former,
why does a distinction of family matter?

> I want to simulate the number of species expected to be at risk under a
> random binomial distribution with 10,000 randomizations.

I guess you've stated the p, but what's the n? The number of species in
each family? If you're simulating on a family-by-family basis, then it
would seem that a binomial(nspecies, 0.0748) distribution would be the
reference.

Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional
to size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to
randomly select a species.

Dennis

> I am relatively new to this field and would greatly appreciate an
> idiot-proof response, i.e. how should the data be entered into R? I was
> thinking of using read.table, header = TRUE, where the table has F =
> Family Name, and SP = Number of species in that family?
>
> John
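For readers following along, here is a minimal sketch of the
family-by-family simulation Dennis describes. The data frame fam and its
columns are hypothetical stand-ins for the poster's read.table() input:

fam <- data.frame(F = c("FamA", "FamB"), SP = c(80, 12))  # made-up data

p <- 0.0748
nrep <- 10000
# one column per family, each holding 10,000 simulated at-risk counts
sims <- sapply(fam$SP, function(n) rbinom(nrep, size = n, prob = p))
colnames(sims) <- fam$F
colMeans(sims)  # should be close to p * fam$SP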
Re: [R] Help with R
On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:

> FATAL ERROR: unable to restore saved data in .RDATA

Without more information it's hard to know what exactly went wrong.
Anyway, the message most likely means that the .RData file got corrupted.
Deleting it should solve the problem. Note that this means that you will
get an empty workspace and will have to recreate whatever data was in it
before.

> I decided to uninstall the copy (R 2.11.0) and installed a new version
> (2.11.1) but I'm still receiving the same message. When I click OK, R
> closes.

Re-installation of R will most likely not fix this (unless a change in
the format of the .RData files had occurred - but to my knowledge no such
thing has happened recently).

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
Re: [R] Help with R
On Oct 15, 2010, at 12:37, Philipp Pagel wrote:

> Anyway, the message most likely means that the .RData file got
> corrupted. Deleting it should solve the problem. Note that this means
> that you will get an empty workspace and have to recreate whatever data
> was in it before.

Renaming the file is a better idea. Then at least you have some chance of
being able to load() it into an R session and recover things from the
workspace. No guarantees, though.

-pd

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
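As a concrete illustration of Peter's suggestion, a minimal sketch, run
from the directory containing the workspace (the backup file name is
arbitrary):

file.rename(".RData", "RData_backup")  # keep it out of R's startup path
load("RData_backup")  # may still fail if the file is truly corrupted
ls()                  # inspect whatever could be recovered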
Re: [R] Help with R
Sometimes such a message appears when you try to open an .RData file in
an environment where packages used when the file was created are not
installed. Then it is possible to just install the necessary packages.
Without the whole story it is impossible to say what the real cause of
that error is.

Regards
Petr

r-help-boun...@r-project.org wrote on 15.10.2010 12:45:30:

> Renaming the file is a better idea. Then at least you have some chance
> of being able to load() it into an R session and recover things from
> the workspace. No guarantees, though.
>
> -pd
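If missing packages are indeed the cause, the fix Petr describes might
look like this (the package name is purely illustrative):

install.packages("zoo")  # whatever package the saved objects depend on
library(zoo)
load(".RData")           # retry once the package is available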
[R] creating 'all' sum contrasts
OK, my last question didn't get any replies so I am going to try and ask
a different way.

When I generate contrasts with contr.sum() for a 3-level categorical
variable I get the 2 orthogonal contrasts:

> contr.sum( c(1,2,3) )
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

This provides the contrasts 1-3 and 2-3 as expected. But I also want it
to create 1-2 (i.e. (1-3) - (2-3)). So in general I want all possible
orthogonal contrasts - think of it as the contrasts for all pairwise
comparisons between the levels.

Are there any options for contrasts() or other functions/libraries that
will allow me to do this automatically? I could go through and create new
columns, but I am using this for complex multi-factor experiments with
varying levels per factor and fitting the models within other functions
(e.g. regsubsets()), so an automatic solution using what is already
available would be far preferable.

Michael Hopkins
Algorithm and Statistical Modelling Expert
Upstream
23 Old Bond Street
London W1S 4PZ
Re: [R] Random assignment
Hi Dennis and list,

Thanks for this, and sorry for not providing enough information. First
let me put the study into a bit more context: I know the number of
species at risk in each family; what I am asking is "Is risk random with
respect to family, or do certain families have a disproportionate number
of at-risk species?" My idea was to randomly allocate risk to the
families based on the criteria below (binomial(nspecies, 0.0748)) and
then compare this to the true data and see if there was a significant
difference.

So in answer to your questions (assuming my method is correct!):

> Is this over all families, or within a particular family? If the
> former, why does a distinction of family matter?

Within a particular family - this is because I am looking to see if risk
in the observed data set is random with respect to family, so this will
provide the baseline to compare against.

> I guess you've stated the p, but what's the n? The number of species in
> each family?

This varies largely; for instance I have some families that are monotypic
(with 1 species) and then I have other families with 100+ species.

> Assuming you have multiple families, do you want separate simulations
> per family, or do you want to do some sort of weighting (perhaps
> proportional to size) over all families?

I am assuming I want some sort of weighting. This is because I want to
calculate the number of species expected to be at risk in EACH family
under the random binomial distribution (assuming every species has a
7.48% chance of being at risk).

Thanks
John
Re: [R] Random assignment
Hi John,

The word "species" attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In particular,
I don't see what you need the simulation for. If family F has Fn species,
your random expectation is that p * Fn of them will be at risk (p =
0.0748). The variance on that expectation will be p * (1-p) * Fn. If you
do your simulation, that's the result you'll get.

Perhaps to initially identify families with disproportionate observed
extinction rates all you need is the dbinom function?

Michael

On 15 October 2010 22:29, John Haart anothe...@me.com wrote:

> I know the number of species at risk in each family; what I am asking
> is "Is risk random with respect to family, or do certain families have
> a disproportionate number of at-risk species?" My idea was to randomly
> allocate risk to the families based on the criteria below
> (binomial(nspecies, 0.0748)) and then compare this to the true data and
> see if there was a significant difference.
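Michael's closed-form expectation is easy to compute for every family at
once. A minimal sketch, reusing the hypothetical fam data frame from the
earlier sketch:

p <- 0.0748
fam$expected <- p * fam$SP            # expected number of at-risk species
fam$variance <- p * (1 - p) * fam$SP  # binomial variance of that count
# tail probability of seeing 'obs' or more at-risk species in one family:
# pbinom(obs - 1, fam$SP, p, lower.tail = FALSE)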
Re: [R] Create Arrays
Hi, Doug,

maybe

HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
columnnumbers <- c(0, 0, 0, 3, 0, 0, 0, 2, 0)

TMP <- lapply( seq( columnnumbers), function( i, CN, M) {
    if( CN[i] == 0) as.matrix( M[, i]) else matrix( -1, nrow( M), CN[i])
  }, CN = columnnumbers, M = rbind( HH, TT))

do.call( cbind, TMP)

gets close to what you want (after some adaptation, of course).

HTH -- Gerrit

---------------------------------------------------------------------
AOR Dr. Gerrit Eichner              Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de  Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104            Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109            http://www.uni-giessen.de/cms/eichner
Re: [R] Random assignment
Hi Michael,

Thanks for this - the reason I am following this approach is that it
appeared in a paper I was reading, and I thought it was an interesting
angle to take. The paper is Vamosi & Wilson, 2008. Nonrandom extinction
leads to elevated loss of angiosperm evolutionary history. Ecology
Letters 11: 1047–1053, and the specific method I am following states:

"We calculated the number of species expected to be at risk in each
family under a random binomial distribution in 10 000 randomizations
[generated using R version 2.6.0 (R Development Team 2007)] assuming
every species has a 7.48% chance of being at risk."

I guess the reason I am doing the simulation is because I am not hugely
statistically minded and the paper was asking the same question I am
interested in answering :)

So following your approach - if family F has Fn species, the random
expectation is that p * Fn of them will be at risk (p = 0.0748), with
variance p * (1-p) * Fn:

Family F = Bromeliaceae, with Fn = 80, p = 0.0748
random expectation = p * Fn = 0.0748 * 80 = 5.984
variance = p * (1-p) * Fn = (0.0748 * 0.9252) * 80 = 5.5363968

So the random expectation is that the Bromeliaceae will have 6 species at
risk, if risk is assigned randomly? And if I do this for all the
families, it will be the same as doing the simulation experiment outlined
in the method above?

Thanks
John

On 15 Oct 2010, at 12:49, Michael Bedward wrote:

> If family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn. If you do your simulation, that's the result
> you'll get. Perhaps to initially identify families with
> disproportionate observed extinction rates all you need is the dbinom
> function?
Re: [R] Nonlinear Regression Parameter Shared Across Multiple Data Sets
Looking at the source for nlrob, it looks like it saves the coefficients
from the results of running an nls and then passes those coefficients
back into the next nls request. The issue that it's running into is that
nls returns the coefficients as upper, LOGEC501, LOGEC502, and LOGEC503,
rather than just upper and a vector named LOGEC50. Does anyone know a way
to restructure the formula/start parameter so that coef() returns a
vector instead of each element individually? Right now, I've had to
'hack' nlrob so it recombines similarly named elements into a vector, but
I was wondering if there was a way to accomplish the end goal without
those measures.

Thanks,
Jared

On Wed, Oct 13, 2010 at 3:14 PM, Jared Blashka evilamaran...@gmail.com wrote:

> As an addendum to my question, I'm attempting to apply the solution to
> the robust non-linear regression function nlrob from the robustbase
> package, and it doesn't work in that situation. I'm getting
>
> allRobustFit <- nlrob(Y ~ (upper)/(1+10^(X-LOGEC50[dset])), data=all,
>     start=list(upper=max(all$Y), LOGEC50=c(-8.5,-8.5,-8.5)))
> Error in nls(formula, data = data, start = start, algorithm = algorithm, :
>   parameters without starting value in 'data': LOGEC50
>
> I'm guessing this is because the nlrob function doesn't know what to do
> with a vector for a start value. Am I correct, and is there another
> method of using nlrob in the same way?

On Tue, Oct 12, 2010 at 8:58 AM, Jared Blashka evilamaran...@gmail.com wrote:

> Thanks so much! It works great. I had thought the way to do it relied
> on combining the data sets, but I couldn't figure out how to alter the
> formula to work with the combination.

On Tue, Oct 12, 2010 at 7:07 AM, Keith Jewell k.jew...@campden.co.uk wrote:

> Jared Blashka wrote:
>
>> I'm working with 3 different data sets and applying this non-linear
>> regression formula to each of them.
>>
>> nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers,
>>     start=list(upper=max(std_no_outliers$Y), LOGEC50=-8.5))
>>
>> Previously, all of the regressions were calculated in Prism, but I'd
>> like to be able to automate the calculation process in a script, which
>> is why I'm trying to move to R. The issue I'm running into is that
>> previously, in Prism, I was able to calculate a shared value for a
>> constraint so that all three data sets shared the same value, but have
>> other constraints calculated separately. So Prism would figure out
>> what single value for the constraint in question would work best
>> across all three data sets. For my formula, each data set needs its
>> own LOGEC50 value, but the upper value should be the same across the 3
>> sets. Is there a way to do this within R, or with a package I'm not
>> aware of, or will I need to write my own nls function to work with
>> multiple data sets? Because I've got no idea where to start with that.
>
> An approach which works for me (code below to illustrate principle, not
> tried...)
>
> 1) combine all three data sets into one dataframe with a column (e.g.
>    dset) indicating data set (1, 2 or 3)
> 2) express your formula with upper as single valued and LOGEC50 as a
>    vector indexed by dset, e.g. Y ~ upper/(1+10^(X-LOGEC50[dset]))
> 3) in the start list, make LOGEC50 a vector, e.g. using -8.5 as start
>    for all three LOGEC50 values:
>    start = list(upper=max(all$Y), LOGEC50=c(-8.5, -8.5, -8.5))
>
> Hope that helps,
>
> Keith J
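To make Keith's recipe concrete, here is a self-contained sketch with
simulated data; all object names and numbers are illustrative, not from
the original thread:

# simulate three data sets sharing 'upper' but with different LOGEC50
set.seed(1)
upper <- 100
logec50 <- c(-8.5, -7.9, -9.1)
all <- do.call(rbind, lapply(1:3, function(d) {
  X <- seq(-11, -6, length.out = 30)
  Y <- upper / (1 + 10^(X - logec50[d])) + rnorm(30, sd = 2)
  data.frame(X = X, Y = Y, dset = d)
}))

fit <- nls(Y ~ upper / (1 + 10^(X - LOGEC50[dset])), data = all,
           start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))
coef(fit)  # one shared 'upper'; LOGEC501, LOGEC502, LOGEC503 per data set

Note that coef(fit) returns the vector parameter as separately named
elements (LOGEC501, LOGEC502, LOGEC503), which is exactly the behaviour
Jared runs into when nlrob feeds these names back to nls.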
[R] scores for a new observation from PCAgrid() in pcaPP
Hi,

I am trying to compute scores for a new observation based on a previously
computed PCA by the PCAgrid() function in the pcaPP package. My data has
more variables than observations. Here is an imaginary data set to show
the case (mvrnorm() is from package MASS):

library(MASS)
library(pcaPP)

n.samples <- 30
n.bins <- 1000
x.sim <- rep(0, n.bins)
V.sim <- diag(n.bins)
mtx <- array(dim=c(n.samples, n.bins))
for(i in 1:n.samples) mtx[i,] <- mvrnorm(1, x.sim, V.sim)

With prcomp() I can do the following:

pc.pr2 <- prcomp(mtx, scale=TRUE)
newscr.pr2 <- scale(t(mtx[1,]), pc.pr2$center, pc.pr2$scale) %*% pc.pr2$rotation

The latter computes the scores for the first row of mtx. I can verify
that the scores are the same as computed originally by comparing with

pc.pr2$x[1,]  # that will print out the scores for the first observation

Now, if I try the same with PCAgrid() as follows:

pc.pp2 <- PCAgrid(mtx, k=min(dim(mtx)), scale=mad)
newscr.pp2 <- scale(t(mtx[1,]), pc.pp2$center, pc.pp2$scale) %*% pc.pp2$loadings

the newscr.pp2 do not match the scores in the pc.pp2 object, as can be
verified by comparing with:

pc.pp2$x[1,]

I wonder what I am missing? Or is it so that for the grid method such
computation of scores from the loadings and original observations is not
possible? For the case p<n, i.e. when there are more observations than
variables, the scores computed from loadings and the scores from the
model object match also for the PCAgrid() method, i.e. the behaviour
described above seems to relate to cases where p>n.

Many thanks for any help,
Kari
Re: [R] merging and working with BIG data sets. Is sqldf the best way??
On Fri, Oct 15, 2010 at 6:14 AM, Chris Howden ch...@trickysolutions.com.au wrote:

> Thanks for the advice Gabor,
>
> I was indeed not starting and finishing with sqldf(), which was why it
> was not working for me. Please forgive a blatantly obvious mistake.
>
> I have tried what you suggested and unfortunately R is still having
> problems doing the join. The problem seems to be one of memory, since I
> am receiving the below error message when I run the natural join using
> sqldf:
>
> Error: cannot allocate vector of size 250.0 Mb
> Timing stopped at: 32.8 0.48 34.79

Specify an external database so it uses an external, rather than
in-memory, database. See example 6b (also shown in several other
examples) on the home page:

sqldf(c("create...", "create...", "select..."), dbname = tempfile())

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
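For readers hitting the same wall, a hedged sketch of the pattern Gabor
points at: passing a vector of SQL statements plus dbname = tempfile()
makes sqldf do the work in an on-disk SQLite file rather than in RAM. The
table and column names below are made up:

library(sqldf)
df1 <- data.frame(id = 1:5, a = letters[1:5])  # illustrative data
df2 <- data.frame(id = 3:7, b = LETTERS[3:7])

joined <- sqldf(c(
  "create index idx on df1(id)",                # runs in the temp database
  "select * from main.df1 natural join df2"     # result of last statement returned
), dbname = tempfile())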
Re: [R] Random assignment
Hi John,

I haven't read that particular paper, but in answer to your question...

> So if I do this for all the families it will be the same as doing the
> simulation experiment outlined in the method above?

Yes :)

Michael

On 15 October 2010 23:18, John Haart anothe...@me.com wrote:

> Family F = Bromeliaceae, with Fn = 80, p = 0.0748
> random expectation = p * Fn = 0.0748 * 80 = 5.984
> variance = p * (1-p) * Fn = (0.0748 * 0.9252) * 80 = 5.5363968
>
> So the random expectation is that the Bromeliaceae will have 6 species
> at risk, if risk is assigned randomly?
Re: [R] fast rowCumsums wanted for calculating the cdf
Although I know there is another message in this thread, I am replying to
this message to be able to include the whole discussion to this point.

Gregor mentioned the possibility of extending the compiled code for
cumsum so that it would handle the matrix case. The work by Dirk
Eddelbuettel and Romain Francois on developing the Rcpp package makes it
surprisingly easy to create compiled code for this task. I am copying the
Rcpp-devel list on this in case one of the readers of that list has time
to create a sample implementation before I can get to it. It's midterm
season and I have to tackle the stack of blue books on my desk before
doing fun things like writing code.

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:

> Hi,
>
> You might look at Reduce(). It seems faster. I converted the matrix to
> a list in an incredibly sloppy way (which you should not emulate)
> because I cannot think of the simple way.
>
> probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilites
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
> F[,1] <- probs[,1,drop=TRUE]
> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
>
> system.time(F.slow <- t(apply(probs, 1, cumsum)))
>    user  system elapsed
>  36.758   0.416  42.464
>
> system.time(for (cc in 2:ncol(F)) {
>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
> })
>    user  system elapsed
>   0.980   0.196   1.328
>
> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4],
>   probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
>    user  system elapsed
>   0.420   0.072   0.539
>
> 0.539 seconds for 10 vectors each with 1 million elements does not seem
> bad. For 3 each, on my system it took .014 seconds, which for 200,000
> iterations, I estimated as:
>
> (200000*.014)/60/60
> [1] 0.778
>
> (and this is on a stone-age system with a single-core processor and
> only 756MB of RAM)
>
> It should not be difficult to get the list back to a matrix.
>
> Cheers,
>
> Josh
>
> On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote:
>
>> Dear all,
>>
>> Maybe the easiest solution: Is there anything that speaks against
>> generalizing cumsum from base to cope with matrices (as is done in
>> matlab)? E.g.:
>>
>> cumsum(Matrix, 1)  # equivalent to apply(Matrix, 1, cumsum)
>>
>> The main advantage could be optimized code if the Matrix is extremely
>> non-square (e.g. 100,000 x 10), but the summation is done over the
>> short side (in this case 10). apply would practically yield a loop
>> over 100,000 elements, and vectorization w.r.t. the long side (a loop
>> over 10 elements) provides considerable efficiency gains.
>>
>> Many regards,
>> Gregor
>>
>> On Tue, 12 Oct 2010 10:24:53 +0200, Gregor mailingl...@gmx.at wrote:
>>
>>> Dear all,
>>>
>>> I am struggling with a (currently) cost-intensive problem:
>>> calculating the (non-normalized) cumulative distribution function,
>>> given the (non-normalized) probabilities. Something like:
>>>
>>> probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilites
>>> F <- t(apply(probs, 1, cumsum)) # SLOOOW!
>>>
>>> One (already faster, but for sure not ideal) solution - thanks to
>>> Henrik Bengtsson:
>>>
>>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
>>> F[,1] <- probs[,1,drop=TRUE]
>>> for (cc in 2:ncol(F)) {
>>>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
>>> }
>>>
>>> In my case, probs is a (30,000 x 10) matrix, and I need to iterate
>>> this step around 200,000 times, so speed is crucial. I currently can
>>> make sure to have no NAs, but in order to extend matrixStats, this
>>> could be a nontrivial issue.
>>>
>>> Any ideas for speeding up this - probably routine - task?
>>> Thanks in advance,
>>> Gregor

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
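Joshua's "sloppy" list conversion can be written generically. A minimal
base-R sketch (not from the thread) that feeds any matrix to his Reduce()
approach column by column and rebinds the result:

add <- function(x) Reduce(`+`, x, accumulate = TRUE)

rowCumsums2 <- function(m) {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])  # matrix -> list of columns
  do.call(cbind, add(cols))                              # cumulate and rebind
}
# rowCumsums2(probs) should equal t(apply(probs, 1, cumsum))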
[R] split data with missing data condition
Dear all,

I have data like this:

              x         y
 [1,] 59.74889 3.1317081
 [2,] 38.77629 1.7102589
 [3,]       NA 2.2312962
 [4,] 32.35268 1.3889621
 [5,] 74.01394 1.5361227
 [6,] 34.82584 1.1665412
 [7,] 42.72262 2.7870875
 [8,] 70.54999 3.3917257
 [9,] 59.37573 2.6763249
[10,] 68.87422 1.9697770
[11,] 19.00898 2.0584415
[12,] 60.27915 2.5365194
[13,] 50.76850 2.3943836
[14,]       NA 2.2862790
[15,] 39.01229 1.7924957

and I want to split the data into two sets, a set of non-missing rows and
a set of missing rows. How can I do this?

Many Thanks.
Jumlong

--
Jumlong Vongprasert
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND 34000
Re: [R] creating 'all' sum contrasts
G'day Michael,

On Fri, 15 Oct 2010 12:09:07 +0100, Michael Hopkins
hopk...@upstreamsystems.com wrote:

> OK, my last question didn't get any replies so I am going to try and
> ask a different way.
>
> When I generate contrasts with contr.sum() for a 3-level categorical
> variable I get the 2 orthogonal contrasts:
>
> contr.sum( c(1,2,3) )
>   [,1] [,2]
> 1    1    0
> 2    0    1
> 3   -1   -1

These two contrasts are *not* orthogonal.

> This provides the contrasts 1-3 and 2-3 as expected. But I also want
> it to create 1-2 (i.e. (1-3) - (2-3)). So in general I want all
> possible orthogonal contrasts - think of it as the contrasts for all
> pairwise comparisons between the levels.

You have to decide what you want: the contrasts for all pairwise
comparisons between the levels, or all possible orthogonal contrasts?
The latter is actually not well defined. For a factor with p levels,
there would be p-1 orthogonal contrasts, which are only identifiable up
to rotation, hence there are infinitely many such sets. But there are
p(p-1)/2 pairwise comparisons. So unless p=2, you have to decide what you
want.

> Are there any options for contrasts() or other functions/libraries
> that will allow me to do this automatically?

Look at package multcomp, in particular functions glht and mcp; these
might help.

Cheers,

        Berwin

========================== Full address ============================
Berwin A Turlach                        Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)              +61 (8) 6488 3383 (self)
The University of Western Australia     FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                         e-mail: ber...@maths.uwa.edu.au
Australia                               http://www.maths.uwa.edu.au/~berwin
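For illustration, a minimal sketch of the multcomp route Berwin mentions;
the data and model are made up, and mcp(f = "Tukey") builds all pairwise
comparisons of the factor levels automatically:

library(multcomp)
dat <- data.frame(f = gl(3, 10, labels = c("a", "b", "c")), y = rnorm(30))
mod <- lm(y ~ f, data = dat)
# all pairwise comparisons of the levels of f, with multiplicity adjustment
summary(glht(mod, linfct = mcp(f = "Tukey")))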
Re: [R] split data with missing data condition
you can do the following:

mat <- cbind(x = runif(15, 50, 70), y = rnorm(15, 2))
mat[sample(15, 2), "x"] <- NA

na.x <- is.na(mat[, 1])
mat[na.x, ]
mat[!na.x, ]

I hope it helps.

Best,
Dimitris

On 10/15/2010 2:45 PM, Jumlong Vongprasert wrote:

> I want to split the data into two sets, a set of non-missing rows and a
> set of missing rows. How can I do this?

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
Re: [R] split data with missing data condition
Try this:

a <- read.table(textConnection("x y
59.74889 3.1317081
38.77629 1.7102589
NA 2.2312962
32.35268 1.3889621
74.01394 1.5361227
34.82584 1.1665412
42.72262 2.7870875
70.54999 3.3917257
59.37573 2.6763249
68.87422 1.9697770
19.00898 2.0584415
60.27915 2.5365194
50.76850 2.3943836
NA 2.2862790
39.01229 1.7924957"), header=TRUE)
a <- as.matrix(a)
# good data
a.good <- a[complete.cases(a), , drop=FALSE]
a.bad  <- a[!complete.cases(a), , drop=FALSE]

> a.good
             x        y
 [1,] 59.74889 3.131708
 [2,] 38.77629 1.710259
 [3,] 32.35268 1.388962
 [4,] 74.01394 1.536123
 [5,] 34.82584 1.166541
 [6,] 42.72262 2.787088
 [7,] 70.54999 3.391726
 [8,] 59.37573 2.676325
 [9,] 68.87422 1.969777
[10,] 19.00898 2.058441
[11,] 60.27915 2.536519
[12,] 50.76850 2.394384
[13,] 39.01229 1.792496

> a.bad
      x        y
[1,] NA 2.231296
[2,] NA 2.286279

On Fri, Oct 15, 2010 at 8:45 AM, Jumlong Vongprasert
jumlong.u...@gmail.com wrote:

> I want to split the data into two sets, a set of non-missing rows and a
> set of missing rows. How can I do this?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
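An equivalent one-step variant (not from the thread, just a hedged
alternative): split() can partition the rows by the complete.cases()
flag in a single call:

parts <- split(as.data.frame(a), complete.cases(a))
parts[["TRUE"]]   # rows with no missing values
parts[["FALSE"]]  # rows containing NA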
[R] Color individual leaf labels in dendrogram
Hi,

I have performed a clustering of a matrix and plotted the result with
pltree, see code below. I want to color the labels of the leaves
individually. For example, I want the label name "Node 2" to be plotted
in red. How do I do this?

Sincerely
Henrik

library(cluster)

D <- matrix(nr=4, nc=4)
rownames(D) <- c("Node 1", "Node 2", "Node 3", "Node 4")
D[1,] <- c(0, .6, .1, .7)
D[2,] <- c(.6, 0, .3, .9)
D[3,] <- c(.1, .3, 0, .9)
D[4,] <- c(.7, .9, .9, 0)

C <- agnes(D, diss=TRUE, method="complete")
pltree(C)
Re: [R] split data with missing data condition
Hi

r-help-boun...@r-project.org wrote on 15.10.2010 15:00:46:

> you can do the following:
>
> mat <- cbind(x = runif(15, 50, 70), y = rnorm(15, 2))
> mat[sample(15, 2), "x"] <- NA
>
> na.x <- is.na(mat[, 1])
> mat[na.x, ]
> mat[!na.x, ]

Or if you have missing data in several columns and you want to select
only those rows which are complete, use complete.cases:

mat[complete.cases(mat), ]
mat[!complete.cases(mat), ]

Regards
Petr
Re: [R] [Rcpp-devel] fast rowCumsums wanted for calculating the cdf
Hi,

This would probably deserve some abstraction - we had C++ versions of
apply in our TODO list for some time - but here is a shot:

require( Rcpp )
require( inline )

f.Rcpp <- cxxfunction( signature( x = "matrix" ), '
    NumericMatrix input( x ) ;
    NumericMatrix output = clone<NumericMatrix>( input ) ;
    int nr = input.nrow(), nc = input.ncol() ;
    NumericVector tmp( nr );
    for( int i=0; i<nc; i++){
        tmp = tmp + input.column(i) ;
        NumericMatrix::Column target( output, i ) ;
        std::copy( tmp.begin(), tmp.end(), target.begin() ) ;
    }
    return output ;
', plugin = "Rcpp" )

f.R <- function( x ){
    t(apply(x, 1, cumsum)) # SLOOOW!
}

probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilites
stopifnot( all.equal( f.R( probs ), f.Rcpp( probs ) ) )

require( rbenchmark )
probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilites
bm <- benchmark( f.R( probs ), f.Rcpp( probs ),
    columns = c("test", "elapsed", "relative", "user.self", "sys.self"),
    order = "relative", replications = 10 )
print( bm )

I get this on my iMac with R 2.12.0 and Rcpp as just released to CRAN:

$ Rscript cumsum.R
            test elapsed  relative user.self sys.self
2  f.Rcpp(probs)   4.614       1.0     4.375    0.239
1     f.R(probs)  68.645 14.877556    67.765    0.877

When we implement apply in C++, we will probably leverage loop unrolling
to achieve greater performance.

Romain

On 15/10/10 14:34, Douglas Bates wrote:

> Gregor mentioned the possibility of extending the compiled code for
> cumsum so that it would handle the matrix case. The work by Dirk
> Eddelbuettel and Romain Francois on developing the Rcpp package makes
> it surprisingly easy to create compiled code for this task. I am
> copying the Rcpp-devel list on this in case one of the readers of that
> list has time to create a sample implementation before I can get to it.
Re: [R] interaction contrasts
..by some (extensive) trial and error reordering the contrast matrix and
the reference level, I figured it out myself. For anyone who might find
this helpful when searching for a similar contrast in the future, this
should be the right one:

c2 <- rbind("fac2-effect in A" = c(0,1,0,0,0,0,0,0),
            "fac2-effect in B" = c(0,1,0,0,0,1,0,0),
            "fac2-effect in C" = c(0,1,0,0,0,0,1,0),
            "fac2-effect in D" = c(0,1,0,0,0,0,0,1),
            "fac2-effect, A*B" = c(0,0,0,0,0,1,0,0),
            "fac2-effect, A*C" = c(0,0,0,0,0,0,1,0),
            "fac2-effect, A*D" = c(0,0,0,0,0,0,0,1),
            "fac2-effect, B*C" = c(0,0,0,0,0,-1,1,0),
            "fac2-effect, B*D" = c(0,0,0,0,0,-1,0,1),
            "fac2-effect, C*D" = c(0,0,0,0,0,0,-1,1))
summary(glht(mod, c2))

Kay Cichini wrote:

> hello,
>
> I was shortly asking the list for help with some interaction contrasts
> (see below), for which I had to change the reference level of the model
> on the fly (I read a post that this is possible in multcomp). If
> someone has a clue how this is coded in multcomp's glht() - please
> point me there.
>
> yours, kay
>
> Kay Cichini wrote:
>
>> hello list,
>>
>> I'd very much appreciate help with setting up the contrasts for a
>> 2-factorial crossed design. Here is a toy example:
>>
>> library(multcomp)
>> dat <- data.frame(fac1=gl(4,8,labels=LETTERS[1:4]),
>>                   fac2=rep(c("I","II"),16), y=rnorm(32,1,1))
>> mod <- lm(y~fac1*fac2, data=dat)
>>
>> ## the contrasts I'm interested in:
>> c1 <- rbind("fac2-effect in A" = c(0,1,0,0,0,0,0,0),
>>             "fac2-effect in B" = c(0,1,0,0,0,1,0,0),
>>             "fac2-effect in C" = c(0,1,0,0,0,0,1,0),
>>             "fac2-effect in D" = c(0,1,0,0,0,0,0,1),
>>             "fac2-effect, A*B" = c(0,0,0,0,0,1,0,0),
>>             "fac2-effect, A*C" = c(0,0,0,0,0,0,1,0),
>>             "fac2-effect, A*D" = c(0,0,0,0,0,0,0,1))
>> summary(glht(mod, c1))
>>
>> ## now I want to add the remaining combinations
>> ## fac2, B*C
>> ## fac2, B*D
>> ## fac2, C*D
>> ## to the simultaneous tests, to see whether the effects of fac2
>> ## within the levels of fac1 differ between each combination of the
>> ## levels of fac1, or not ??
>>
>> thanks for any advice!
>>
>> yours, kay

-----
Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck
[R] Error in DGEList function
Hi!

I am a new R user and have no clue about this error (see below) while
using the edgeR package:

Y <- clade_reads
y <- Y[, c(g1,g2)]
grouping <- c( rep(1,length(g1)), rep(2,length(g2)) )
size <- apply(y, 2, sum)

d <- DGEList(data = y, group = grouping, lib.size = size)
Error in DGEList(data = y, group = grouping, lib.size = size) :
  unused argument(s) (data = y)

g1 and g2 are two defined groups. Could anyone kindly interpret this?

Many thanks!

mikecrux
Re: [R] Random assignment
Hello again John,

I was going to suggest that you just use qbinom to generate the expected
number of extinctions. For example, for the family with 80 spp the
central 95% expectation is:

qbinom(c(0.025, 0.975), 80, 0.0748)

which gives 2 - 11 spp. If you wanted to look across a large number of
families you'd need to deal with multiple comparison error, but as a
quick first look it might be helpful.

However, I've just got a copy of the paper and it seems that the authors
are calculating something different to a simple binomial expectation:
they are differentiating between high-risk (red-listed) and low-risk
species within a family. They state that this equation (expressed here in
R-ese)...

choose(N, R) * p^R * b^(N - R)

...gives the probability of an entire family becoming extinct, where N is
the number of spp in the family; R is the number of those that are red
listed; p is the extinction probability for red-list spp (presumably over
some period, but I haven't read the paper properly yet); b is the
extinction probability for other spp. Then, in their simulations, they
hold b constant but play around with a range of values for p.

So this sounds a bit different to what you originally posted as your
objective (?)

Michael

On 15 October 2010 22:49, Michael Bedward michael.bedw...@gmail.com wrote:

> If family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn. If you do your simulation, that's the result
> you'll get.
On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote: Dear List, I am doing some simulation in R and need basic help! I have a list of animal families for which i know the number of species in each family. I am working under the assumption that a species has a 7.48% chance of being at risk. Is this over all families, or within a particular family? If the former, why does a distinction of family matter? I want to simulate the number of species expected to be at risk under a random binomial distribution with 10,000 randomizations. I guess you've stated the p, but what's the n? The number of species in each family? If you're simulating on a family by family basis, then it would seem that a binomial(nspecies, 0.0748) distribution would be the reference. Assuming you have multiple families, do you want separate simulations per family, or do you want to do some sort of weighting (perhaps proportional to size) over all families? The latter is doable, but it would require a two-stage simulation: one to randomly select a family and then to randomly select a species. Dennis I am relatively knew to this field and would greatly appreciate a idiot-proof response, I.e how should the data be entered into R? I was thinking of using read.table, header = T, where the table has F = Family Name, and SP = Number
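A hedged sketch of John's stated plan (the file name is hypothetical; the column names F and SP follow his own scheme): simulate per-family binomial at-risk counts with 10,000 randomizations and summarise each family's null distribution.

fams <- read.table("families.txt", header = TRUE)   # hypothetical file with columns F and SP
nrand <- 10000
sims <- sapply(fams$SP, function(n) rbinom(nrand, size = n, prob = 0.0748))
colnames(sims) <- as.character(fams$F)
# per-family median and central 95% envelope of the expected at-risk counts
apply(sims, 2, quantile, probs = c(0.025, 0.5, 0.975))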
Re: [R] Error in DGEList function
On 10/15/2010 06:17 AM, Ying Ye wrote: Hi! I am a new R user and have no clue of this error (see below) while using the edgeR package: edgeR is a Bioconductor package so please subscribe to the Bioconductor list and ask there. http://bioconductor.org/help/mailing-list/ include the output of the sessionInfo() command after library(edgeR) Martin
Y <- clade_reads
y <- Y[, c(g1, g2)]
grouping <- c(rep(1, length(g1)), rep(2, length(g2)))
size <- apply(y, 2, sum)
d <- DGEList(data = y, group = grouping, lib.size = size)
Error in DGEList(data = y, group = grouping, lib.size = size) : unused argument(s) (data = y)
And g1 and g2 are two defined groups. Could anyone kindly interpret this? Many thanks! mikecrux __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
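A hedged guess to accompany Martin's redirect (an assumption on my part, not from his reply): in edgeR the first formal argument of DGEList() is counts, not data, which would explain the "unused argument(s) (data = y)" error.

# assumed fix: DGEList()'s count matrix argument is named 'counts'
d <- DGEList(counts = y, group = grouping, lib.size = size)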
[R] p-values from coxph?
Dear List, In each iteration of a simulation study, I would like to save the p-value generated by coxph. I fail to see how to address the p-value. Do I have to calculate it myself from the Wald test statistic? Cheers, Paddy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Color individual leaf labels in dendrogram
Henrik, there is an easily adaptable example in this thread: http://r.789695.n4.nabble.com/coloring-leaves-in-a-hclust-or-dendrogram-plot-tt795496.html#a795497 HTH. Bryan * Bryan Hanson Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA On 10/15/10 9:05 AM, Kennedy henrik.aldb...@gmail.com wrote: Hi, I have performed a clustering of a matrix and plotted the result with pltree. See code below. I want to color the labels of the leaves individually. For example I want the label name "Node 2" to be plotted in red. How do I do this? Sincerely Henrik
library(cluster)
D <- matrix(nr=4, nc=4)
rownames(D) <- c("Node 1", "Node 2", "Node 3", "Node 4")
D[1,] <- c(0, .6, .1, .7)
D[2,] <- c(.6, 0, .3, .9)
D[3,] <- c(.1, .3, 0, .9)
D[4,] <- c(.7, .9, .9, 0)
C <- agnes(D, diss=TRUE, method="complete")
pltree(C)
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
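A reconstruction of the idiom that linked thread uses (a sketch, not a quote from it): convert the agnes tree to a dendrogram and recolor the one leaf label with dendrapply().

colLab <- function(n) {
  # if this node is the leaf labelled "Node 2", set its label color to red
  if (is.leaf(n) && attr(n, "label") == "Node 2")
    attr(n, "nodePar") <- c(attr(n, "nodePar"), list(lab.col = "red"))
  n
}
dend <- dendrapply(as.dendrogram(as.hclust(C)), colLab)
plot(dend)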
Re: [R] Create Arrays
Hi Gerrit, Almost it, but I need to insert M[,i] as well as matrix(-1, nrow(M), CN[i]) when CN[i] != 0. I know this is not correct, but can something like the following be done?
HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
c <- c(0, 0, 0, 2, 0, 0, 0, 2, 0)
TMP <- lapply(seq(c), function(i, CN, M) {
  if (CN[i] == 0) as.matrix(M[, i])
  else (matrix(-1, nrow(M), CN[i]) as.matrix(M[, i]))   # not valid syntax - this is the question
}, CN = c, M = rbind(HH, TT))
do.call(cbind, TMP)
Doug -- View this message in context: http://r.789695.n4.nabble.com/Create-Arrays-tp2996706p2997060.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
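A sketch of what Doug seems to be after (my reading, not a reply from the thread): when the count is nonzero, bind the block of -1s and the column of M together with cbind(). The count vector is renamed cn here to avoid masking base::c.

HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
cn <- c(0, 0, 0, 2, 0, 0, 0, 2, 0)
M  <- rbind(HH, TT)
TMP <- lapply(seq_along(cn), function(i) {
  if (cn[i] == 0) M[, i, drop = FALSE]                          # just the column
  else cbind(matrix(-1, nrow(M), cn[i]), M[, i, drop = FALSE])  # -1 block, then the column
})
do.call(cbind, TMP)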
[R] Many Thanks
Dear R-help mailing list and software development team. After using R for a few weeks, I have been exposed to the best of the program. In addition, the R-help mailing list is a great help to new users. I can do my job as I want and get great support from the R-help mailing list. Thanks R-help mailing list. Thanks software development team. Jumlong __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] boxplot issue
Hi! I am trying to produce a graph which shows overlap in latitude for a number of species. I have a dataframe which looks as follows:
       species1 species2 species3 species4
minlat  6147947   612352   627241  6112791
maxlat  7542842   723423   745329  7634921
I want to produce a graph with one horizontal bar for each species, where minlat sets the minimum value and maxlat sets the maximum value for the bar. I want all bars to be stacked on top of each other to show where I have overlap. I have tried using boxplot but it only produces the standard mean and sd of the two values... Thanks! Jonas Josefsson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [Debian][NetBeans] setting NetBeans to work with JRI libraries
I am having a hard time properly setting up NetBeans to work with the JRI libs (http://rosuda.org/JRI/). Most of the instructions I have found so far are written for Eclipse or Windows (or both). I have set the java.library.path variable in the config: customize: VM arguments field, by specifying -Djava.library.path=/home/filip/Pobrane/JRI/. When I run the application I get the following: Cannot find JRI native library! Please make sure that the JRI native library is in a directory listed in java.library.path. java.lang.UnsatisfiedLinkError: /home/fb/Pobrane/JRI/libjri.so: libR.so: cannot open shared object file: No such file or directory (...) So I ran: locate libR with the following result: /usr/local/lib/R/lib/libRblas.so /usr/local/lib/R/lib/libRlapack.so However this command: ls /usr/local/lib/R/lib/ lists: libRblas.so libRlapack.so libR.so I have tried various customisations to the config, copying the libraries, and other desperate moves. I can properly compile and run the rtest.java and rtest2.java examples under the examples section of my JRI installation via the ./run script. It would seem that NetBeans has a problem loading libraries that reference other libraries (libjri is loaded, but not libR). I have compiled R 2.12.0 with --with-blas=-lgoto2 --enable-BLAS-shlib --enable-R-shlib (shared libraries, shared BLAS). Could the BLAS be a problem? -- while(!succeed) { try(); } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] tessellation from biological data in spatstat
Hi, I'm new to this mailing list so apologies if this is too basic. I have confocal images 512x512 from which I have extracted x,y positions of the coordinates of labelled cells, exported from ImageJ as a .csv file. I also have images that define an underlying pattern in the tissue, defined as areas of different pixel values 0 or 255 (also 512x512); I've exported these images as .txt files. I'd like to use spatstat to look at how the individual cells plot onto these patterns.
# I can import my x,y data and do basic stats with spatstat by doing the following:
library(spatstat)
data01 <- read.csv("250810.csv", skip=1)
x <- data01[[2]]
y <- data01[[3]]
data02 <- ppp(x, y, xrange=c(0,512), yrange=c(0,512))
unitname(data02) <- c("pixel", "pixels")
plot(data02)
# etc etc
# I can also import my text images and convert them to a tessellation using the following:
a <- read.table("FOLLICLES.txt")
win <- owin(c(0,512), c(0,512))
unitname(win) <- c("pixel", "pixels")
b <- as.matrix(a, xrange=c(0,512), yrange=c(0,512))
unitname(b) <- c("pixel", "pixels")
c <- im(b, xcol=seq(ncol(b)), yrow=seq(nrow(b)), unitname="pixels")
d <- tess(image = c)
plot(d)
# I can then plot my ppp pattern on top of my tessellation using:
plot(d)
plot(data02, add=TRUE)
# So far so good, but when I run for example:
qb <- quadratcount(data02, tess = d)
# I get the following error:
Warning message: In quadratcount.ppp(data02, tess = d) : Tessellation does not contain all the points of X
# When I investigate further the following is returned for each object:
d
Tessellation
Tessellation is determined by a factor-valued image with 2 levels
window: binary image mask
512 x 512 pixel array (ny, nx)
enclosing rectangle: [0.5, 512.5] x [0.5, 512.5] pixels
data02
planar point pattern: 254 points
window: rectangle = [0, 512] x [0, 512] units
# I don't understand why my tessellation is a different size, but even taking this into account my ppp should plot inside the window, and it does when I do plot(data02, add=TRUE). I've spent ages playing around with redefining the windows but I must be missing something (probably obvious). Any help would be appreciated; I can provide files. Kind regards Richard __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
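A hedged sketch of one likely fix (my reading of the spatstat conventions, not a reply from the thread): im() with xcol=seq(ncol(b)) puts pixel centres at 1..512, so the mask's enclosing rectangle becomes [0.5, 512.5] while the points live in [0, 512]; shifting the pixel centres to 0.5..511.5 lines the two windows up.

# pixel centres at 0.5..511.5 give an enclosing rectangle of [0, 512]
im2  <- im(b, xcol = seq(0.5, 511.5, by = 1), yrow = seq(0.5, 511.5, by = 1),
           unitname = "pixels")
tes2 <- tess(image = im2)
qb   <- quadratcount(data02, tess = tes2)   # should now contain all the points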
[R] How to extract parameter estimates of variance function from lme fit
Dear R-Users, I have a question concerning extraction of parameter estimates of a variance function from an lme fit. To fit my simulated data, we use varConstPower (constant plus power variance function).
fm <- lme(UPDRS ~ time, data = data.simula, random = ~time, method = "ML", weights = varConstPower(fixed = list(power = 1)))
I extract the results of this function by using the following code:
y <- summary(fm)
x <- y$modelStruct$varStruct
x
Variance function structure of class varConstPower representing
       const        power
1.184535e-15 1.000000e+00
These are the constant and power that appear in summary(fm), and that I want to extract. However I got a different value of const when extracting from x:
x$const
    const
-34.36943
Does anyone know why there is such a difference and how to extract the expected value (1.184535e-15)? Thanks in advance for your help. Thu --- THAI Hoai Thu INSERM U738 - Université Paris 7 16 rue Henri Huchard 75018 Paris, FRANCE Tel: 01 57 27 75 39 Email: hoai-thu.t...@inserm.fr __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] boxplot issue
A) You hijacked another thread. On Oct 15, 2010, at 9:50 AM, Jonas Josefsson wrote: Hi! I am trying to produce a graph which shows overlap in latitude for a number of species. I have a dataframe which looks as follows:
       species1 species2 species3 species4
minlat  6147947   612352   627241  6112791
maxlat  7542842   723423   745329  7634921
B) This looks like a data input snafu. It appears that the commas in the original data were not properly specified as delimiters. I want to produce a graph with one horizontal bar for each species where minlat sets the minimum value and maxlat sets the maximum value for the bar. I want all bars to be stacked on top of each other to show where I have overlap. I have tried using boxplot but it only produces the standard mean and sd of the two values... C) boxplot is designed to handle raw data and then pass plotting responsibility off to the bxp function. Once you address your data input issues you can use bxp. -- David Thanks! Jonas Josefsson __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
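Since bxp() still expects box statistics, a simpler route for pure min-max bars may be base segments(); a minimal sketch using the numbers from the post (assuming the intended comma-separated reading of the data):

ranges <- data.frame(species = paste0("species", 1:4),
                     minlat  = c(6147947, 612352, 627241, 6112791),
                     maxlat  = c(7542842, 723423, 745329, 7634921))
plot(0, type = "n", xlim = range(ranges$minlat, ranges$maxlat),
     ylim = c(0.5, nrow(ranges) + 0.5), yaxt = "n", xlab = "Latitude", ylab = "")
axis(2, at = seq_len(nrow(ranges)), labels = ranges$species, las = 1)
# one thick horizontal bar per species, from minlat to maxlat
segments(ranges$minlat, seq_len(nrow(ranges)),
         ranges$maxlat, seq_len(nrow(ranges)), lwd = 10)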
Re: [R] How to extract parameter estimates of variance function from lme fit
Try this: coef(fm$modelStruct$varStruct, uncons = FALSE) On Fri, Oct 15, 2010 at 11:42 AM, Hoai Thu Thai hoai-thu.t...@inserm.fr wrote: Dear R-Users, I have a question concerning extraction of parameter estimates of a variance function from an lme fit. To fit my simulated data, we use varConstPower (constant plus power variance function).
fm <- lme(UPDRS ~ time, data = data.simula, random = ~time, method = "ML", weights = varConstPower(fixed = list(power = 1)))
I extract the results of this function by using the following code:
y <- summary(fm)
x <- y$modelStruct$varStruct
x
Variance function structure of class varConstPower representing
       const        power
1.184535e-15 1.000000e+00
These are the constant and power that appear in summary(fm), and that I want to extract. However I got a different value of const when extracting from x:
x$const
    const
-34.36943
Does anyone know why there is such a difference and how to extract the expected value (1.184535e-15)? Thanks in advance for your help. Thu --- THAI Hoai Thu INSERM U738 - Université Paris 7 16 rue Henri Huchard 75018 Paris, FRANCE Tel: 01 57 27 75 39 Email: hoai-thu.t...@inserm.fr __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
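Why the raw attribute looked so different (a hedged note added here, not part of Henrique's reply): nlme appears to keep varConstPower's const parameter on the log scale internally, so x$const returns the unconstrained value, and uncons = FALSE back-transforms to the natural scale.

exp(-34.36943)                                   # ~1.184535e-15, matching summary(fm)
coef(fm$modelStruct$varStruct, uncons = FALSE)   # const and power on the natural scale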
Re: [R] How to extract parameter estimates of variance function from lme fit
Thank you Henrique!! It works. Thu On 15/10/2010 16:53, Henrique Dallazuanna wrote: coef(fm$modelStruct$varStruct, uncons = FALSE) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] p-values from coxph?
On Oct 15, 2010, at 9:21 AM, Öhagen Patrik wrote: Dear List, In each iteration of a simulation study, I would like to save the p-value generated by coxph. I fail to see how to address the p-value. Do I have to calculate it myself from the Wald test statistic? No. And the most important reason is that would not give you the same value as is print()-ed by coxph(). If you ask for str(print(coxph(...))) you get NULL (after the side-effect of printing). The print function only produces side-effects. On the other hand you can use the summary function and it gives you a richer set of output. Using the first example on the help page for coxph:
str(summary(coxph(Surv(time, status) ~ x + strata(sex), test1)))
List of 12
 $ call        : language coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1)
 $ fail        : NULL
 $ na.action   : NULL
 $ n           : int 7
 $ loglik      : num [1:2] -3.87 -3.33
 $ coefficients: num [1, 1:5] 0.802 2.231 0.822 0.976 0.329
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "x"
  .. ..$ : chr [1:5] "coef" "exp(coef)" "se(coef)" "z" ...
 $ conf.int    : num [1, 1:4] 2.231 0.448 0.445 11.18
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "x"
  .. ..$ : chr [1:4] "exp(coef)" "exp(-coef)" "lower .95" "upper .95"
 $ logtest     : Named num [1:3] 1.087 1 0.297
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ sctest      : Named num [1:3] 1.051 1 0.305
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ rsq         : Named num [1:2] 0.144 0.669
  ..- attr(*, "names")= chr [1:2] "rsq" "maxrsq"
 $ waldtest    : Named num [1:3] 0.95 1 0.329
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ used.robust : logi FALSE
So the fifth element of the coefficients leaf of the list structure has the same p-value as that print()-ed. Try:
summary(fit)$coefficients[5]
[1] 0.3292583
(It does seem to me that the name for that leaf of the fit object is not particularly in accord with what I would have considered "coefficients", but I am really in no solid position to criticize Terry Therneau, to whom we all owe a great deal of gratitude.) -- David. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
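A small follow-on (not from David's reply; the column name is my assumption about summary.coxph's coefficient matrix): addressing the p-value column by name is more robust than by position when the model has several terms.

fit <- coxph(Surv(time, status) ~ x + strata(sex), data = test1)
summary(fit)$coefficients[, "Pr(>|z|)"]   # assumed column name for the Wald p-value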
Re: [R] Many Thanks
And thank YOU for taking the time to express your gratitude. I'm sure all those who regularly take the time to contribute to the list appreciate the appreciation. Andrew Miles On Oct 15, 2010, at 9:49 AM, Jumlong Vongprasert wrote: Dear R-help mailing list and software development team. After using R for a few weeks, I have been exposed to the best of the program. In addition, the R-help mailing list is a great help to new users. I can do my job as I want and get great support from the R-help mailing list. Thanks R-help mailing list. Thanks software development team. Jumlong __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] interaction contrasts
Should you need to do it again, you may want to look at the relevel function. I suppose that would meet the definition of some versions of "on the fly", but once I have a model, rerunning with a different factor leveling is generally pretty painless. -- David. On Oct 15, 2010, at 9:09 AM, Kay Cichini wrote: ..by some (extensive) trial and error reordering the contrast matrix and the reference level I figured it out myself - for anyone who might find this helpful searching for a similar contrast in the future: this should be the right one:
c2 <- rbind("fac2-effect in A" = c(0,1,0,0,0,0,0,0),
            "fac2-effect in B" = c(0,1,0,0,0,1,0,0),
            "fac2-effect in C" = c(0,1,0,0,0,0,1,0),
            "fac2-effect in D" = c(0,1,0,0,0,0,0,1),
            "fac2-effect, A*B" = c(0,0,0,0,0,1,0,0),
            "fac2-effect, A*C" = c(0,0,0,0,0,0,1,0),
            "fac2-effect, A*D" = c(0,0,0,0,0,0,0,1),
            "fac2-effect, B*C" = c(0,0,0,0,0,-1,1,0),
            "fac2-effect, B*D" = c(0,0,0,0,0,-1,0,1),
            "fac2-effect, C*D" = c(0,0,0,0,0,0,-1,1))
summary(glht(mod, c2))
Kay Cichini wrote: hello, I was recently asking the list for help with some interaction contrasts (see below) for which I had to change the reference level of the model on the fly (I read a post that this is possible in multcomp). If someone has a clue how this is coded in multcomp's glht(), please point me there. yours, kay Kay Cichini wrote: hello list, I'd very much appreciate help with setting up the contrasts for a 2-factorial crossed design. Here is a toy example:
library(multcomp)
dat <- data.frame(fac1 = gl(4, 8, labels = LETTERS[1:4]),
                  fac2 = rep(c("I", "II"), 16),
                  y = rnorm(32, 1, 1))
mod <- lm(y ~ fac1*fac2, data = dat)
## the contrasts I'm interested in:
c1 <- rbind("fac2-effect in A" = c(0,1,0,0,0,0,0,0),
            "fac2-effect in B" = c(0,1,0,0,0,1,0,0),
            "fac2-effect in C" = c(0,1,0,0,0,0,1,0),
            "fac2-effect in D" = c(0,1,0,0,0,0,0,1),
            "fac2-effect, A*B" = c(0,0,0,0,0,1,0,0),
            "fac2-effect, A*C" = c(0,0,0,0,0,0,1,0),
            "fac2-effect, A*D" = c(0,0,0,0,0,0,0,1))
summary(glht(mod, c1))
## now i want to add the remaining combinations ## fac2, B*C ## fac2, B*D ## fac2, C*D ## to the simultaneous tests to see whether the effects ## of fac2 within the levels of fac1 differ between ## each combination of the levels of fac1, or not ?? thanks for any advice! yours, kay - Kay Cichini Postgraduate student Institute of Botany Univ. of Innsbruck -- View this message in context: http://r.789695.n4.nabble.com/interaction-contrasts-tp2993845p2996987.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
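A minimal sketch of David's relevel() route, reusing the toy objects from Kay's example (nothing here is from the original posts beyond those names):

dat$fac1 <- relevel(dat$fac1, ref = "B")  # make B the reference level
mod <- update(mod)                        # refit; coefficients now compare against B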
Re: [R] Nonlinear Regression Parameter Shared Across Multiple DataSets
Hi, I've had to do something like that before. It seems to be a feature of nls (in R, but not as I recall in Splus) that it accepts a list with vector components as 'start' values, but flattens the result values to a single vector. I can't spend much time explaining, but here's a fragment of code that might get you started: ---
# Fit the nls, correct coef names lost by nls
val <- nls(formula = formulaIn, data = DataList, start = tCoefs, control = control)
CoefList <- list()                              # initialise CoefList
for (aName in names(tCoefs)) {                  # for each vector of coefficients
  tvec <- get(aName, envir = val$m$getEnv())    # get it from the nls environment
  names(tvec) <- names(tCoefs[[aName]])         # correct its names
  assign(aName, tvec, envir = val$m$getEnv())   # return it to nls
  CoefList[[aName]] <- tvec                     # store in CoefList
}
As I recall, tCoefs is a list of starting values that can have vector components; CoefList ends up as a similar structure, a named list of result values. hth Keith J Jared Blashka evilamaran...@gmail.com wrote in message news:aanlktimprowm-mne9n_bu8ty=jylitq4ewhpph1hy...@mail.gmail.com... Looking at the source for nlrob, it looks like it saves the coefficients from the results of running an nls and then passes those coefficients back into the next nls request. The issue that it's running into is that nls returns the coefficients as upper, LOGEC501, LOGEC502, and LOGEC503, rather than just upper and a vector named LOGEC50. Does anyone know a way to restructure the formula/start parameter so that coef returns a vector instead of each element individually? Right now, I've had to 'hack' nlrob so it recombines similarly named elements into a vector, but was wondering if there was a way to accomplish the end goal without those measures. Thanks, Jared On Wed, Oct 13, 2010 at 3:14 PM, Jared Blashka evilamaran...@gmail.com wrote: As an addendum to my question, I'm attempting to apply the solution to the robust non-linear regression function nlrob from the robustbase package, and it doesn't work in that situation. I'm getting
allRobustFit <- nlrob(Y ~ (upper)/(1 + 10^(X - LOGEC50[dset])), data = all,
                      start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))
Error in nls(formula, data = data, start = start, algorithm = algorithm, : parameters without starting value in 'data': LOGEC50
I'm guessing this is because the nlrob function doesn't know what to do with a vector for a start value. Am I correct, and is there another method of using nlrob in the same way? Thanks, Jared On Tue, Oct 12, 2010 at 8:58 AM, Jared Blashka evilamaran...@gmail.com wrote: Thanks so much! It works great. I had thought the way to do it relied on combining the data sets, but I couldn't figure out how to alter the formula to work with the combination. Jared On Tue, Oct 12, 2010 at 7:07 AM, Keith Jewell k.jew...@campden.co.uk wrote: Jared Blashka evilamaran...@gmail.com wrote in message news:aanlktinffmudugqnkudvr=fmf0wrrtsbjxjexuki_...@mail.gmail.com... I'm working with 3 different data sets and applying this non-linear regression formula to each of them.
nls(Y ~ (upper)/(1 + 10^(X - LOGEC50)), data = std_no_outliers,
    start = list(upper = max(std_no_outliers$Y), LOGEC50 = -8.5))
Previously, all of the regressions were calculated in Prism, but I'd like to be able to automate the calculation process in a script, which is why I'm trying to move to R. The issue I'm running into is that previously, in Prism, I was able to calculate a shared value for a constraint so that all three data sets shared the same value, but have other constraints calculated separately.
So Prism would figure out what single value for the constraint in question would work best across all three data sets. For my formula, each data set needs its own LOGEC50 value, but the upper value should be the same across the 3 sets. Is there a way to do this within R, or with a package I'm not aware of, or will I need to write my own nls function to work with multiple data sets, because I've got no idea where to start with that. Thanks, Jared [[alternative HTML version deleted]] An approach which works for me (code below to illustrate the principle, not tried...) 1) combine all three data sets into one dataframe with a column (e.g. dset) indicating data set (1, 2 or 3) 2) express your formula with upper as single valued and LOGEC50 as a vector indexed by dset, e.g. Y ~ upper/(1 + 10^(X - LOGEC50[dset])) 3) in the start list, make LOGEC50 a vector, e.g. using -8.5 as the start for all three LOGEC50 values: start = list(upper = max(std_no_outliers$Y), LOGEC50 = c(-8.5, -8.5, -8.5)) Hope that helps, Keith J __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
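A self-contained sketch of Keith's indexed-parameter trick, with invented data so it runs on its own (the true parameter values, noise level, and X grid are all made up for illustration):

set.seed(1)
all <- data.frame(X = rep(seq(-11, -6, length.out = 20), 3),
                  dset = rep(1:3, each = 20))
# simulate three curves sharing 'upper' but with different LOGEC50s
all$Y <- 100 / (1 + 10^(all$X - c(-9, -8.5, -8)[all$dset])) + rnorm(60, 0, 2)
fit <- nls(Y ~ upper / (1 + 10^(X - LOGEC50[dset])), data = all,
           start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))
coef(fit)   # upper, LOGEC501, LOGEC502, LOGEC503 - names flattened, as Jared saw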
Re: [R] specify data frame by name
Also look at the get function; it may be a bit more straightforward (and safer if there is any risk of someone specifying 'rm(ls())' as a data frame name). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of darckeen Sent: Thursday, October 14, 2010 11:53 PM To: r-help@r-project.org Subject: Re: [R] specify data frame by name nvm, i figured it out.
dfm <- data.frame(x = 1:10)
testfunc <- function(data = "dfm") {
  dat <- eval.parent(as.symbol(data))
  sum(dat$x)
}
print(testfunc())
-- View this message in context: http://r.789695.n4.nabble.com/specify-data-frame-by-name-tp2996534p2996541.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
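A minimal sketch of Greg's get() variant, reusing darckeen's toy objects:

dfm <- data.frame(x = 1:10)
testfunc <- function(data = "dfm") {
  dat <- get(data, envir = parent.frame())  # fetch the object named by 'data'
  sum(dat$x)
}
testfunc()   # 55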
[R] nominal response model
Is there a way to estimate a nominal response model? To be more specific, let's say I want to calibrate: \pi_{v}(\theta_j)=\frac{e^{\xi_{v}+\lambda_{v}\theta_j}}{\sum_{h=1}^m e^{\xi_{h}+\lambda_{h}\theta_j}} where $\theta_j$ is the dependent variable and I need to estimate $\xi_{h}$ and $\lambda_{h}$ for $h \in \{1,\ldots,m\}$. Thank you, Mauricio Romero Quantil S.A.S. Cel: 3112231150 www.quantil.com.co It is from the earth that we must find our substance; it is on the earth that we must find solutions to the problems that promise to destroy all life here [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
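No reply appears in this digest, so purely as a hedged pointer: the stated model is a multinomial logit in $\theta_j$, and one option is nnet::multinom, which fixes $\xi_1=\lambda_1=0$ for identifiability. Everything below (data included) is invented for illustration.

library(nnet)
set.seed(42)
theta <- rnorm(500)
eta   <- cbind(0, 0.5 + 1.2 * theta, -0.3 + 0.8 * theta)   # m = 3 categories
p     <- exp(eta) / rowSums(exp(eta))
y     <- apply(p, 1, function(pr) sample(3, 1, prob = pr))  # draw a category per subject
fit   <- multinom(factor(y) ~ theta)
coef(fit)   # rows: xi_h (intercept) and lambda_h (slope) for h = 2, 3 vs. category 1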
Re: [R] help with an unbalanced split plot
Hi Dennis, The first thing I did with my data was to explore it with 6 graphs (wet: high, med, and solo; dry: high, med, and solo) and it gave me very interesting patterns: seed size in wet treatments is either negatively correlated (high and medium densities) or flat (solo). But dry treatments are all positively correlated! There is a very interesting switch there. I also figured out why I can't do three-way interactions. I explored the structure of my data with str(mydata) and it shows that water treatment has three levels when it should have just two. Then I went back to the Excel sheet, tried to sort the data by water treatment, and I discovered a single data point from the wet treatment sticking out by itself. That is why R reads three levels, and since it is only one point, there cannot be any stats of course. (A sketch of one way to fix such a stray level follows this thread.) thanks E On Thu, Oct 14, 2010 at 9:27 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: On Thu, Oct 14, 2010 at 7:50 PM, Eugenio Larios elari...@email.arizona.edu wrote: Hi Dennis, thank you very much for your help, I really appreciate it. I forgot to say about the imbalance, yes. I only explained the original set up, sorry. Let me explain. It is because in the process of the experiment, which lasted 3 months, I lost individuals within the plots and I actually ended up losing 2 whole plots (one dry and one wet) and some other individuals in other plots. That still leaves you balanced at the plot level :) Fortunately, you have enough replication. If you have missing subplots within the remaining plots, that would be another source of imbalance at the subplot level, but you should have enough subplots to be able to estimate all of the interactions unless an entire treatment in one set of plots was missing. It's worth graphing your data to anticipate which effects/interactions should be significant; graphs involving the spatial configuration of the plots and subplots would also be worthwhile. My study system has this special feature that allows me to track parental seed sizes in plants germinated in the field, a persistent ring that stays attached to the root even when the plant has germinated, so some of the plants I lost did not have this ring anymore. It happens sometimes but most of the time they have it. Also, some plants disappeared, probably due to predation, etc. That made my experiment imbalanced. That's common. No big deal. Do you think that will change the analysis? Also, do you think I should use least squares ANOVA (perhaps type III due to the imbalance?) instead of LMM? What about the random effects that my blocking has created? Actually, with unbalanced data it's to your advantage to use lme() over ANOVA. Just don't place too much importance on the p-values of tests; even the degrees of freedom are debatable. With unbalanced data, it's hard to predict what the sampling distribution of a given statistic will actually be, so the p-values aren't as trustworthy. You mentioned that you couldn't fit a three-way interaction; given your data configuration, that shouldn't happen. (1) Get two-way tables of water * density, one for the counts and one for the averages, something like
with(mydata, table(water, density))
aggregate(log(fitness) ~ water + density, data = mydata, FUN = mean, na.rm = TRUE)
In the first table, unless you have very low frequencies in some category, your data 'density' should be enough to estimate all the main effects and interactions of interest. The second table is to check that you don't have NaNs or missing cells, etc.
I am new to the R-help website, so I wrote this message to your email, but I would like to post it on the R website - do you know how? Wag answer: I hope so, since I managed to view and respond to your message :) More seriously, in gmail, the window that opens to produce replies has an option 'Reply to all'. I don't know if your e-mail client at UofA has that feature, but if not, you could always cc R-help and put the e-mail address in by hand if necessary. Most mailers are smart enough to auto-complete an address as you type in the name, so you could see if that applies on your system. I keep a separate account for R-help because of the traffic volume - if you intend to subscribe to the list, you might want to do the same. It's not unusual for 75-100 e-mails a weekday to enter your inbox... Thanks again! Eugenio On Thu, Oct 14, 2010 at 5:34 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: On Thu, Oct 14, 2010 at 3:58 PM, Eugenio Larios elari...@email.arizona.edu wrote: Hi Everyone, I am trying to analyze a split plot experiment in the field that was arranged like this: I am trying to measure the fitness consequences of seed size. Factors (X): *Seed size*: a continuous variable, normally distributed. *Water*: Categorical. Levels: wet and dry. *Density*: Categorical. Levels: high, medium and solo. *Plot*: Counts from 1 to 20. The *response variable* (Y) was
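Following up Eugenio's stray third level of water, a sketch under assumptions (the column name and the typo/whitespace cause are guesses on my part):

levels(mydata$water)                       # inspect, e.g. "dry" "wet" "wet "
mydata$water <- gsub("^\\s+|\\s+$", "", as.character(mydata$water))
mydata$water <- factor(mydata$water)       # re-factoring drops the unused level
str(mydata$water)                          # should now show exactly 2 levels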
Re: [R] creating 'all' sum contrasts
On 15 Oct 2010, at 13:55, Berwin A Turlach wrote: G'day Michael, Hi Berwin Thanks for the reply On Fri, 15 Oct 2010 12:09:07 +0100 Michael Hopkins hopk...@upstreamsystems.com wrote: OK, my last question didn't get any replies so I am going to try and ask a different way. When I generate contrasts with contr.sum() for a 3 level categorical variable I get the 2 orthogonal contrasts:
contr.sum( c(1,2,3) )
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1
These two contrasts are *not* orthogonal. I'm surprised. Can you please tell me how you calculated that. This provides the contrasts 1-3 and 2-3 as expected. But I also want it to create 1-2 (i.e. (1-3) - (2-3)). So in general I want all possible orthogonal contrasts - think of it as the contrasts for all pairwise comparisons between the levels. You have to decide what you want: the contrasts for all pairwise comparisons between the levels, or all possible orthogonal contrasts? Well, the pairwise contrasts are the most important, as I am looking for evidence of whether they are zero (i.e. no difference between levels) or not. But see my above comment about orthogonality. The latter is actually not well defined. For a factor with p levels, there would be p-1 orthogonal contrasts, which are only identifiable up to rotation, hence infinitely many such sets. But there are p(p-1) pairwise comparisons. So unless p=2, you have to decide what you want. Well, of course the pairwise comparisons are bi-directional, so in fact only p(p-1)/2 are of interest to me. Are there any options for contrast() or other functions/libraries that will allow me to do this automatically? Look at package multcomp, in particular functions glht and mcp; these might help. Thanks, I will have a look. But I want to be able to do this transparently within lm() using regsubsets() etc., as I am collecting large quantities of summary stats from all possible models to use with a model choice criterion based upon true Bayesian model probabilities. Cheers, Berwin == Full address: Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr) School of Maths and Stats (M019) +61 (8) 6488 3383 (self) The University of Western Australia FAX: +61 (8) 6488 1028 35 Stirling Highway Crawley WA 6009 e-mail: ber...@maths.uwa.edu.au Australia http://www.maths.uwa.edu.au/~berwin Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
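A concrete version of Berwin's glht/mcp pointer (toy data; the Tukey specification yields all p(p-1)/2 pairwise contrasts with a multiplicity adjustment):

library(multcomp)
dat <- data.frame(f = gl(3, 10), y = rnorm(30))
mod <- lm(y ~ f, data = dat)
summary(glht(mod, linfct = mcp(f = "Tukey")))   # contrasts 2-1, 3-1, 3-2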
Re: [R] fast rowCumsums wanted for calculating the cdf
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joshua Wiley Sent: Friday, October 15, 2010 12:23 AM To: Gregor Cc: r-help@r-project.org Subject: Re: [R] fast rowCumsums wanted for calculating the cdf Hi, You might look at Reduce(). It seems faster. I converted the matrix to a list in an incredibly sloppy way (which you should not emulate) because I cannot think of the simple way.
probs <- t(matrix(rep(1:1000), nrow=10)) # matrix with row-wise probabilities
F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
F[,1] <- probs[,1,drop=TRUE]
add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
 36.758   0.416  42.464
system.time(for (cc in 2:ncol(F)) {
  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
})
   user  system elapsed
  0.980   0.196   1.328
system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539
One way to avoid the typing of list(probs[,1], probs[,2], ...) is to use split(probs, col(probs)). However, split() is surprisingly slow in this case. Another approach is to use a matrix multiply, by an ncol(probs) by ncol(probs) matrix with 0's in the lower triangle and 1's elsewhere. Here are 4 approaches I've seen mentioned, made into functions that output the matrix requested:
f1 <- function(x) t(apply(x, 1, cumsum))
f2 <- function(x) {
  if (ncol(x) > 1)
    for (i in 2:ncol(x)) x[, i] <- x[, i] + x[, i - 1]
  x
}
f3 <- function(x) matrix(unlist(Reduce(`+`, split(x, col(x)), accumulate = TRUE)), ncol = ncol(x))
f4 <- function(x) x %*% outer(seq_len(ncol(x)), seq_len(ncol(x)), `<=`)
I tested it as follows:
probs <- matrix(runif(1e7), ncol=10, nrow=1e6)
system.time(v1 <- f1(probs))
   user  system elapsed
  19.45    0.25   16.78
system.time(v2 <- f2(probs))
   user  system elapsed
   0.68    0.03    0.79
system.time(v3 <- f3(probs))
   user  system elapsed
  38.25    0.24   34.47
system.time(v4 <- f4(probs))
   user  system elapsed
   0.70    0.05    0.56
identical(v1,v2)
identical(v1,v3)
identical(v1,v4)
[1] TRUE
If you have a fancy optimized/multithreaded BLAS linked to your version of R then you may find that %*% is much faster (I'm using the precompiled R for Windows). As ncol(x) gets bigger the %*% approach probably loses its advantage. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com .539 seconds for 10 vectors each with 1 million elements does not seem bad. For 3 each, on my system it took .014 seconds, which for 200,000 iterations, I estimated as:
(200000*.014)/60/60
[1] 0.778
(and this is on a stone age system with a single core processor and only 756MB of RAM) It should not be difficult to get the list back to a matrix. Cheers, Josh On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote: Dear all, Maybe the easiest solution: Is there anything that speaks against generalizing cumsum from base to cope with matrices (as is done in matlab)? E.g.: cumsum(Matrix, 1) equivalent to apply(Matrix, 1, cumsum) The main advantage could be optimized code if the Matrix is extremely nonsquare (e.g. 100,000x10), but the summation is done over the short side (in this case 10). apply would practically yield a loop over 100,000 elements, and vectorization w.r.t. the long side (loop over 10 elements) provides considerable efficiency gains.
Many regards, Gregor On Tue, 12 Oct 2010 10:24:53 +0200 Gregor mailingl...@gmx.at wrote: Dear all, I am struggling with a (currently) cost-intensive problem: calculating the (non-normalized) cumulative distribution function, given the (non-normalized) probabilities. Something like:
probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
F <- t(apply(probs, 1, cumsum)) # SLOOOW!
One (already faster, but for sure not ideal) solution - thanks to Henrik Bengtsson:
F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
F[,1] <- probs[,1,drop=TRUE]
for (cc in 2:ncol(F)) {
  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
}
In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step around 200,000 times, so speed is crucial. I currently can make sure to have no NAs, but in order to extend matrixStats, this could be a nontrivial issue. Any ideas for speeding up this - probably routine - task? Thanks in advance, Gregor __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
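A postscript beyond the thread (assuming a current version of the package): matrixStats has since gained a compiled rowCumsums(), which answers Gregor's request directly.

library(matrixStats)
F <- rowCumsums(probs)   # same result as t(apply(probs, 1, cumsum)), without the transpose dance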
[R] Multi-line graph?
Hi, I am relatively new to R but not to graphing, which I used to do in Excel and a few other environments on the job. I'm going back to school for a PhD and am teaching myself R beforehand. So I hope this question is not unacceptably ignorant, but I have perused every entry-level document I can find and so far I'm out of luck. I'm trying to graph some simple music psychology data. Columns are musical intervals, rows are the initials of the subjects. Numbers are in beats per minute (this is the value at which they hear the melodic interval split into two streams). So here's my table:
   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180
I want a multi-line graph where the intervals are on the X axis, and the y axis is the beats per minute, and each subject has a different line. In Excel this would be no problem but I am having trouble in R. The only way I can figure out how to plot this in R is if the columns or rows are taken as variables. But the variable is beats per minute. Any suggestions? I appreciate the help. -Eric -- View this message in context: http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997402.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multi-line graph?
barnhillec wrote: I'm trying to graph some simple music psychology data. Columns are musical intervals, rows are the initials of the subjects. Numbers are in beats per minute (this is the value at which they hear the melodic interval split into two streams). So here's my table:
   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180
I want a multi-line graph where the intervals are on the X axis, and the y axis is the beats per minute, and each subject has a different line. The most difficult part of it (at least that's what my students think) is getting the data into the right format. If you have the chance to start with the data in the long format, use it. What you need is:
init interval beats
GG   Tenth    112
In this case, reformatting almost works with the default version of the melt function in package reshape. It's good that you supplied a data example, but in general it is better if you could provide it in a copy-and-paste format as shown below. I needed more time to reformat the data than to write the rest. Dieter
library(lattice)
library(reshape)
dt = data.frame(
  init = c("GG", "EC", "SQ", "SK"),
  Tenth = c(112, 100, 160, 120),
  Fifth = c(152, 120, 184, 100),
  Third = c(168, 140, NA, 180))
# The data should look like this
# init interval beats
# GG   Tenth    112
# dtlong = melt(dt)  # almost does it, but the column name "variable" is ugly
dtlong = melt(dt, variable_name = "beats")  # better
dtlong
xyplot(value ~ beats, groups = init, data = dtlong, type = "l",
       auto.key = list(lines = TRUE))
-- View this message in context: http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997435.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multi-line graph?
?matplot e.g., copy your data to the clipboard, then:
library(psych)
my.data <- read.clipboard()
my.data
   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180
matplot(t(my.data), type="b")
Bill At 10:27 AM -0700 10/15/10, barnhillec wrote: Hi, I am relatively new to R but not to graphing, which I used to do in Excel and a few other environments on the job. I'm going back to school for a PhD and am teaching myself R beforehand. So I hope this question is not unacceptably ignorant, but I have perused every entry-level document I can find and so far I'm out of luck. I'm trying to graph some simple music psychology data. Columns are musical intervals, rows are the initials of the subjects. Numbers are in beats per minute (this is the value at which they hear the melodic interval split into two streams). So here's my table:
   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180
I want a multi-line graph where the intervals are on the X axis, and the y axis is the beats per minute, and each subject has a different line. In Excel this would be no problem but I am having trouble in R. The only way I can figure out how to plot this in R is if the columns or rows are taken as variables. But the variable is beats per minute. Any suggestions? I appreciate the help. -Eric -- View this message in context: http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997402.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with merging two zoo objects
Dear all, I have the following 2 zoo objects. However, when I try to merge those 2 objects into one, nothing comes out as intended. Please see below the objects as well as the merged object:
dat11
                      V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
dat22
                      V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7
merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages: 1: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length 2: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length
If somebody points out where I went wrong, it would be really great. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Overlaying two png?
I have a program that creates a PNG file using RgoogleMaps with an extent (lonmin, lonmax, latmin, latmax). I also have a contour plot of the same location, same extent, same sized (height/width) PNG file. I'm looking for a way to make the contour semi-transparent and overlay it on the Google map (hybrid map). Since I have 7000 of these to do, an automated process is desired (grin). Any pointers in the right direction? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
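One hedged direction (everything here is assumed: the file names, and that the contour's inputs lon, lat, z are still available rather than only its rendered PNG): skip the pre-rendered contour image and redraw the contour over the basemap with alpha-blended colors, looping the same script over all 7000 maps.

library(png)
map <- readPNG("map_0001.png")                 # hypothetical basemap file
png("overlay_0001.png", width = ncol(map), height = nrow(map))
par(mar = c(0, 0, 0, 0))
plot(c(lonmin, lonmax), c(latmin, latmax), type = "n", axes = FALSE,
     xaxs = "i", yaxs = "i", xlab = "", ylab = "")
rasterImage(map, lonmin, latmin, lonmax, latmax)            # basemap as the backdrop
contour(lon, lat, z, add = TRUE, col = rgb(1, 0, 0, 0.5))   # semi-transparent contour
dev.off()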
Re: [R] Problem with merging two zoo objects
Megh wrote: Dear all, I have the following 2 zoo objects. However, when I try to merge those 2 objects into one, nothing comes out as intended. Please see below the objects as well as the merged object:
merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
Since the simulated example works, it must have to do with your data. Try str(dat11), str(dat22); maybe something strange has crept in. Dieter
library(zoo)
x.date <- as.Date(paste(2003, 02, c(1, 3, 7, 9, 14), sep = "-"))
x <- zoo(matrix(1:10, ncol = 2), x.date)
x
str(x)
y.date <- as.Date(paste(2006, 02, c(1, 3, 7, 9, 14), sep = "-"))
y <- zoo(matrix(11:20, ncol = 2), y.date)
y
str(y)
-- View this message in context: http://r.789695.n4.nabble.com/Problem-with-merging-two-zoo-objects-tp2997472p2997494.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with merging two zoo objects
On Fri, 15 Oct 2010, Megh Dal wrote: Dear all, I have the following 2 zoo objects. However, when I try to merge those 2 objects into one, nothing comes out as intended. Please see below the objects as well as the merged object:
dat11
                      V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
dat22
                      V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7
merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages: 1: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length 2: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length
If somebody points out where I went wrong, it would be really great. merge() does cbind() (among some more general computations); I guess you want rbind(). Try rbind(dat11, dat22). hth, Z Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with merging two zoo objects
On Fri, Oct 15, 2010 at 2:20 PM, Megh Dal megh700...@yahoo.com wrote: Dear all, I have the following 2 zoo objects. However, when I try to merge those 2 objects into one, nothing comes out as intended. Please see below the objects as well as the merged object:
dat11
                      V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
dat22
                      V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7
merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages: 1: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length 2: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length
If somebody points out where I went wrong, it would be really great. If I try it then it works properly, so there is likely something wrong with your dat11 and dat22 objects. If you provide the problem reproducibly one might be able to say more.
Lines1 <- "Date Time V2 V3 V4 V5
+ 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
+ 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
+ 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"
Lines2 <- "Date Time V2 V3 V4 V5
+ 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
+ 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
+ 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"
library(zoo)
dat1 <- read.zoo(textConnection(Lines1), header = TRUE,
+   index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
Warning messages: 1: closing unused connection 8 (Lines2) 2: closing unused connection 7 (Lines1) 3: closing unused connection 5 (Lines2) 4: closing unused connection 4 (Lines1) 5: closing unused connection 3 (Lines2)
dat2 <- read.zoo(textConnection(Lines2), header = TRUE,
+   index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
merge(dat1, dat2)
                    V2.dat1 V3.dat1 V4.dat1 V5.dat1 V2.dat2 V3.dat2 V4.dat2 V5.dat2
2010-10-15 12:09:12      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:09:33      NA      NA      NA      NA    73.9    73.9    73.9    73.9
2010-10-15 12:20:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:30:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:41:03      NA      NA      NA      NA    73.7    73.7    73.7    73.7
2010-10-15 13:43:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:44:15    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:45:51    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:46:21    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:47:27    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:47:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:49:51    73.7    73.7    73.7    73.7      NA      NA      NA      NA
-- Statistics Software Consulting GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with merging two zoo objects
Hi Gabor, please see the attached files, which are in text format. I have opened them in Excel and then used the clipboard to load them into R. Still, it is really unclear what to do. Also, can you please elaborate on the term index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)) in your previous mail (see the short note after the quoted message below)? In the help it is given that: "If FUN is specified then read.zoo calls FUN with the index as the first argument." I really could not connect your syntax with the help. --- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote: From: Gabor Grothendieck ggrothendi...@gmail.com Subject: Re: [R] Problem with merging two zoo objects To: Megh Dal megh700...@yahoo.com Cc: r-h...@stat.math.ethz.ch Date: Saturday, October 16, 2010, 12:11 AM On Fri, Oct 15, 2010 at 2:20 PM, Megh Dal megh700...@yahoo.com wrote: Dear all, I have the following 2 zoo objects. However, when I try to merge those 2 objects into one, nothing comes out as intended. Please see below the objects as well as the merged object:
dat11
                      V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
dat22
                      V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7
merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages: 1: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length 2: In MATCH(x, x) == seq_len(length(x)) : longer object length is not a multiple of shorter object length
If somebody points out where I went wrong, it would be really great. If I try it then it works properly, so there is likely something wrong with your dat11 and dat22 objects. If you provide the problem reproducibly one might be able to say more.
Re: [R] using apply function and storing output
Hi: You need to give a function for rollapply() to apply :) Here's my toy example:

d <- as.data.frame(matrix(rpois(30, 5), nrow = 10))
library(zoo)
d1 <- zoo(d)  # uses row numbers as index

# rolling means of 3 in each subseries (columns)
rollmean(d1, 3)
    V1   V2   V3
2 3.00 5.33 4.33
3 3.33 4.33 4.00
4 2.67 3.67 3.33
5 4.33 3.67 3.33
6 5.00 5.00 5.00
7 4.67 4.67 4.00
8 5.67 3.67 4.00
9 5.00 3.00 2.67

# create new function
cv <- function(x) sd(x)/mean(x)

# use new function in rollapply():
rollapply(d1, 3, FUN = cv)
         V1        V2        V3
2     0.333 0.1082532 0.5329387
3 0.1732051 0.4803845 0.6614378
4 0.2165064 0.5677271 0.4582576
5 0.7418193 0.5677271 0.4582576
6     0.600 0.3464102     0.400
7 0.7525467 0.4948717 0.6614378
8 0.8882158 0.5677271 0.6614378
9 1.0583005     0.333 0.2165064

This is a simplistic example, but it should get you started.

HTH, Dennis

On Fri, Oct 15, 2010 at 2:00 AM, David A. dasol...@hotmail.com wrote:

Hi Dennis and Dimitris, thanks for your answers. I am trying the rollmean() function and also the rollapply() function, because I also want to calculate a CV for the values. For this I created a co.var() function. I am having problems using them.

co.var <- function(x) (
  sd(x)/mean(x)
)

dim(mydata)
[1] 1710  244

xxx <- rollmean(mydata, 3, by = 3)

works fine and creates a vector which I will transform into a matrix. I still have to find out how to store the output in my previously created 570x244 matrix of 0's in an ordered way. But, since the examples in the help page say it's the same, I tried

xxx <- rollapply(mydata, 3, mean, by = 3)
Error in UseMethod("rollapply") :
  no applicable method for 'rollapply' applied to an object of class
  c('matrix', 'integer', 'numeric')

and, with my created function...

xxx <- rollapply(ord_raw_filt.df, 3, FUN = co.var, by = 3)
Error in UseMethod("rollapply") :
  no applicable method for 'rollapply' applied to an object of class
  c('matrix', 'integer', 'numeric')

Can you help me with the error?

Dave

--
Date: Fri, 15 Oct 2010 00:45:08 -0700
Subject: Re: [R] using apply function and storing output
From: djmu...@gmail.com
To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi: Look into the rollmean() function in package zoo.

HTH, Dennis

On Fri, Oct 15, 2010 at 12:34 AM, David A. dasol...@hotmail.com wrote:

Hi list, I have a 1710x244 matrix of numerical values and I would like to calculate the mean of every group of three consecutive values per column, to obtain a new matrix of 570x244. I could get it done using a for loop, but how can I do that using apply functions? In addition to this, do I have to initialize a 570x244 matrix with 0's to store the calculated values, or can the output matrix be generated while calculating the mean values?

Cheers, Dave
[R] scaling on normalized data set
Hi, I am using R and tried to normalize the data within each sample group using RMA. When I imported all the normalized expression data as a single text file and made a boxplot, it showed a discrepancy among the sample groups. I tried to scale them or re-normalize them again, so that they can be used for further analysis. Unfortunately, I did not manage it using the AffyPLM package. Would you please help me out with this problem? Thanks in advance. Best regards, H. Nawaz
Re: [R] bwplot change whiskers position to percentile 5 and P95
Hi David, more info below. Thanks a lot, Christophe

##
library(Hmisc)
library(lattice)
library(fields)
library(gregmisc)
library(quantreg)

str(sasdata03_a)
'data.frame':   109971 obs. of  6 variables:
 $ jaar      : Factor w/ 3 levels "2006","2007",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cat_F     : Factor w/ 6 levels "a","d","g","h",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ montant_oa: num  8087 1960 1573 11502 3862 ...
 $ nombre_cas: int  519 140 111 780 262 125 79 334 97 503 ...
 $ montant_i : num  221 221 221 221 221 ...
 $ nombre_i  : int  12 12 12 12 12 12 12 12 12 12 ...

aggregate.table(sasdata03_a$nombre_cas, sasdata03_a$jaar, sasdata03_a$Cat_F, nobs)
        a    d    g    h    i    k
2006 7568 1911 5351 7430 7377 7217
2007 7499 1914 5378 7334 7275 7138
2008 7524 1953 5488 7292 7192 7130

d2008 <- sasdata03_a[sasdata03_a$Cat_F == "d" & sasdata03_a$jaar == 2008, ]
quantile(d2008$nombre_cas, probs = c(0.95))
   95%
3630.2

#bb# graph on real data #bb#
bwplot(jaar ~ nombre_cas | Cat_F, xlab = "", ylab = "", data = sasdata03_a,
  xlim = c(-100, 3800), layout = c(2, 3), X = sasdata03_a$nombre_i,
  pch = "|", stats = boxplot2.stats, main = "Fig. 1: P95 d 2008=3630",
  par.settings = list(
    plot.symbol = list(alpha = 1, col = "transparent", cex = 1, pch = 20)),
  panel = function(x, y, ..., X, subscripts){
    panel.grid(v = -1, h = 0)
    panel.bwplot(x, y, ..., subscripts = subscripts)
    X <- X[subscripts]
    X <- tapply(X, y, unique)
    Y <- tapply(y, y, unique)
    tg <- table(y)
    panel.points(X, Y, cex = 3, pch = "|", col = "red")
    panel.text((3700*0.8), (Y-0.15), labels = paste("N=", tg))
  })

# Simulated data 1 #
ex <- data.frame(v1 = log(abs(rt(180, 3)) + 1),
  v2 = rep(c(2007, 2006, 2005), 60),
  z = rep(c("a", "b", "c", "d", "e", "f"), e = 30))
ex2 <- data.frame(v1b = log(abs(rt(18, 3)) + 1),
  v2 = rep(c(2007, 2006, 2005), 6),
  z = rep(c("a", "b", "c", "d", "e", "f"), e = 3))
ex3 <- merge(ex, ex2, by = c("v2", "z"))

# graph on simulated data #
bwplot(as.factor(v2) ~ v1 | as.factor(z), data = ex3, layout = c(3, 2),
  X = ex3$v1b, pch = "|", main = "Boxplot.stats", stats = boxplot2.stats,
  par.settings = list(
    plot.symbol = list(alpha = 1, col = "transparent", cex = 1, pch = 20)),
  panel = function(x, y, ..., X, subscripts){
    panel.grid(v = -1, h = 0)
    panel.bwplot(x, y, ..., subscripts = subscripts)
    X <- X[subscripts]
    xmax <- max(x)
    X <- tapply(X, y, unique)
    Y <- tapply(y, y, unique)
    tg <- table(y)
    panel.points(X, Y, cex = 3, pch = "|", col = "red")
    panel.text((xmax - xmax*0.1), (Y-0.15), labels = paste("N=", tg))
  })
##

On Thu, Oct 14, 2010 at 5:22 PM, David Winsemius dwinsem...@comcast.net wrote:

On Oct 14, 2010, at 11:07 AM, Christophe Bouffioux wrote:

Hi, I have tried your proposition, and it works properly on the simulated data but not on my real data, and I do not see any explanation. This is weird; I have no more ideas for exploring the problem.

You should:
a) provide the code that you used to call boxplot
b) offer str(sasdata03_a), since summary will often obscure class-related problems
c) consider running:
   sasdata03_a$Cat_F <- factor(sasdata03_a$Cat_F)  # to get rid of the empty factor level

So here I give some information on my data; nothing special, actually. Christophe

summary(sasdata03_a)
   jaar          Cat_F        montant_oa        nombre_cas       montant_i
 2006:36854  a      :22591  Min.   :  -112.8  Min.   :   -5.0  Min.   :   33.22
 2007:36538  h      :22056  1st Qu.:  1465.5  1st Qu.:  104.0  1st Qu.:   37.80
 2008:36579  i      :21844  Median :  4251.5  Median :  307.0  Median :   50.00
             k      :21485  Mean   : 13400.0  Mean   :  557.5  Mean   : 1172.17
             g      :16217  3rd Qu.: 13648.5  3rd Qu.:  748.0  3rd Qu.:  458.67
             d      : 5778  Max.   :534655.6  Max.   :13492.0  Max.   :17306.73
             (Other):    0
    nombre_i
 Min.   :  1.00
 1st Qu.:  1.00
 Median :  5.00
 Mean   : 44.87
 3rd Qu.: 29.00
 Max.   :689.00

is.data.frame(sasdata03_a)
[1] TRUE

The code of the function:

boxplot2.stats <- function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
{
    if (coef < 0)
        stop("'coef' must not be negative")
    nna <- !is.na(x)
    n <- sum(nna)
    stats <- stats::fivenum(x, na.rm = TRUE)
    stats[c(1, 5)] <- quantile(x, probs = c(0.05, 0.95))
    iqr <- diff(stats[c(2, 4)])
    if (coef == 0)
        do.out <- FALSE
    else {
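The digest truncates the function at this point. For completeness, a plausible reconstruction of the missing tail, patterned on stats::boxplot.stats, from which boxplot2.stats is evidently adapted (the percentile line above being the only intended change); this is a guess at the lost text, not the author's verbatim code:

        # hypothetical completion: flag points beyond the usual coef*IQR fences;
        # the whisker ends were already pinned to the 5th/95th percentiles above
        out <- if (!is.na(iqr)) {
            x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef * iqr)
        } else !is.finite(x)
    }
    conf <- if (do.conf) stats[3L] + c(-1.58, 1.58) * iqr/sqrt(n)
    # same return shape as boxplot.stats(), which panel.bwplot() expects
    list(stats = stats, n = n, conf = conf,
         out = if (do.out) x[out & nna] else numeric())
}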
Re: [R] RJava help
public class my_convolve {
    public static void main(String[] args) {
    }
    public static void convolve() {
        System.out.println("Hello");
    }
}

library(rJava)
.jinit(classpath="C:/Documents and Settings/GV/workspace/Test/bin",
       parameters="-Xmx512m")
.jcall("my_convolve", method="convolve")
Error in .jcall("my_convolve", method = "convolve") :
  method convolve with signature ()V not found
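No resolution appears in this digest. For what it's worth, the default returnSig of .jcall() is already "V", so the signature in the error message is the expected one; a common culprit is that the class is simply not visible to the JVM that is actually running (.jinit() ignores its classpath argument when a JVM already exists in the session). A hedged probe, reusing the poster's path but otherwise stock rJava:

library(rJava)
.jinit()  # note: classpath arguments here are ignored if a JVM is already up
.jaddClassPath("C:/Documents and Settings/GV/workspace/Test/bin")
.jclassPath()                       # confirm the directory actually made it in
cls <- .jfindClass("my_convolve")   # fails with a clearer error if the class can't load
.jcall("my_convolve", returnSig = "V", method = "convolve")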
Re: [R] Time vs Concentration Graphs by ID
I tried that too; it doesn't work because of the way I wrote the code. Listing y as free, or not giving it a limit, makes the scale go from -0.5 to 0.5, which is useless. This is what my code looks like now (it's S-PLUS code, btw) -- I'll try reading up on lattice in R to see if I can come up with something better.

par(mfrow = c(2,2))
unique.id <- unique(d1$ID)
unique.id <- sort(unique.id)
for(j in unique.id){
  temp.id <- d1[d1$ID==j,]
  unique.dose <- unique(temp.id$DOSE)
  plot(0, 0, type="n", xlab="Time (hr)", ylab="Concentration (ug/L)",
    xlim=c(0,32), ylim=c(0,200), main=paste("ID", j))
  for(i in unique.dose){
    temp.subanddose <- temp.id[temp.id$DOSE==5,]
    points(temp.subanddose$TAD, temp.subanddose$DV, col=1, pch=1)
    points(temp.subanddose$TAD, temp.subanddose$IPRE, type="l", col=1)}
  for(i in unique.dose){
    temp.subanddose <- temp.id[temp.id$DOSE==7,]
    points(temp.subanddose$TAD, temp.subanddose$DV, col=2, pch=2)
    points(temp.subanddose$TAD, temp.subanddose$IPRE, type="l", col=2)}
  for(i in unique.dose){
    temp.subanddose <- temp.id[temp.id$DOSE==10,]
    points(temp.subanddose$TAD, temp.subanddose$DV, col=3, pch=3)
    points(temp.subanddose$TAD, temp.subanddose$IPRE, type="l", col=3)}
  key(text=list(c("5 mg", "7 mg", "10 mg")),
    lines=list(type=rep("l",3), col=1:3), border=T, corner=c(1,1))
}

On Fri, Oct 15, 2010 at 1:31 AM, David Winsemius dwinsem...@comcast.net wrote:

On Oct 15, 2010, at 3:46 AM, Anh Nguyen wrote:

Hello Dennis, That's a very good suggestion. I've attached a template here as a .png file; I hope you can view it. This is what I've managed to achieve in S-PLUS (we use S-PLUS at work, but I also use R because there are some very good R packages for PK data that I want to take advantage of that are not available in S-PLUS). The only problem with this is, unfortunately, I cannot figure out how to make the scale non-uniform, and I hope to fix that.

That would be easy if your efforts, which I have not yet seen, were in lattice. If that were the case then adding this would solve your problem:

scales=list(y=list(relation="free"))

-- David

My data looks like this:

ID Dose Time Conc Pred
1  5    0     0    0
1  5    0.5   6    8
1  5    1    16   20
...
1  7    0     0    0
1  7    0.5  10   12
1  7    1    20   19
...
1  10   3    60   55
...
2  5    12    4    2
...
etc.

I don't care if it's ggplot or something else, as long as it looks like how I envisioned.

On Fri, Oct 15, 2010 at 12:22 AM, Dennis Murphy djmu...@gmail.com wrote:

I don't recall that you submitted a reproducible example to use as a template for assistance. Ista was kind enough to offer a potential solution, but it was an abstraction based on the limited information provided in your previous mail. If you need help, please provide an example data set that illustrates the problems you're encountering and what you hope to achieve - your chances of a successful resolution will be much higher when you do. BTW, there's a dedicated newsgroup for ggplot2: look for the mailing list link at http://had.co.nz/ggplot2/

HTH, Dennis

On Thu, Oct 14, 2010 at 10:02 PM, Anh Nguyen eataban...@gmail.com wrote:

I found a few problems with this method:
- There is only one line for predicted dose at 5 mg.
- The different doses are 5, 7, and 10 mg, but somehow there is a legend for 5,6,7,8,9,10.
- Is there a way to make the line smooth?
- The plots are also getting a little crowded, and I was wondering if there is a way to split them into 2 or more pages?
Thanks for your help.
On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn iz...@psych.rochester.edu wrote:

Hi, Assuming the data is in a data.frame named D, something like

library(ggplot2)  # may need install.packages("ggplot2") first
ggplot(D, aes(x = Time, y = Concentration, color = Dose)) +
  geom_point() +
  geom_line(aes(y = PredictedConcentration, group = 1)) +
  facet_wrap(~ ID, scales = "free", ncol = 3)

should do it.

-Ista

On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo eataban...@gmail.com wrote:

Hello-- I have data for a small population who took 1 drug at 3 different doses. I have the actual drug concentrations as well as concentrations predicted by my model. This is what I'm looking for:
- Time vs Concentration by ID (individual plots), with each subject occupying 1 plot -- there are to be 9 plots per page (3x3)
- Observed drug concentration is made up of points, and predicted drug concentration is a curve without points. Points and curve will be the same color for each dose. Different doses will have different colors.
- A legend to specify which color correlates to which dose.
- Axes should be different for each individual (as some individual will
Re: [R] split data with missing data condition
Try this:

split(as.data.frame(DF), is.na(DF$x))

On Fri, Oct 15, 2010 at 9:45 AM, Jumlong Vongprasert jumlong.u...@gmail.com wrote:

Dear all, I have data like this:

              x         y
 [1,] 59.74889 3.1317081
 [2,] 38.77629 1.7102589
 [3,]       NA 2.2312962
 [4,] 32.35268 1.3889621
 [5,] 74.01394 1.5361227
 [6,] 34.82584 1.1665412
 [7,] 42.72262 2.7870875
 [8,] 70.54999 3.3917257
 [9,] 59.37573 2.6763249
[10,] 68.87422 1.9697770
[11,] 19.00898 2.0584415
[12,] 60.27915 2.5365194
[13,] 50.76850 2.3943836
[14,]       NA 2.2862790
[15,] 39.01229 1.7924957

and I want to split the data into two sets: a data set of non-missing rows and a data set of missing rows. How can I do this? Many thanks. Jumlong

--
Jumlong Vongprasert
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani, THAILAND 34000

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
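A self-contained sketch of that suggestion on made-up numbers (since the printed object looks like a matrix, the sketch subsets with DF[, "x"]; $ only works after the as.data.frame() conversion):

set.seed(1)
DF <- cbind(x = replace(runif(15, 19, 75), c(3, 14), NA),  # two NAs, as in the post
            y = runif(15, 1, 3.5))
parts <- split(as.data.frame(DF), is.na(DF[, "x"]))
nonmissing <- parts[["FALSE"]]   # rows where x is observed
missing    <- parts[["TRUE"]]    # rows where x is NA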
Re: [R] using apply function and storing output
Thanks Dennis, I don't think it was a problem of not feeding in a function for rollapply(), because I was using mean() and my co.var() function in the FUN argument. The key part seems to be the transformation that zoo() does to the matrix. If I do the same transformation to my original matrix, the rollapply() function now works fine.

Thanks again,
David
[R] Problem using BRugs
Hi R users, I am trying to call OpenBUGS from R, and I got the following error message:

~
model is syntactically correct
expected the collection operator c error pos 8 (error on line 1)
variable ww is not defined in model or in data set
[1] "C:\\DOCUME~1\\maomao\\LOCALS~1\\Temp\\RtmpqJk9R3/inits1.txt"
Initializing chain 1: model must be compiled before initial values loaded
model must be initialized before updating
model must be initialized before DIC can be monitored
Error in samplesSet(parametersToSave) :
  model must be initialized before monitors used
~~

I did define the variable ww in my data and model (they are listed below). I am not sure if this is due to some errors in my code (please see below) or because OpenBUGS cannot handle the model I am using. In my model, y[i] also depends on all other y[j]s. Could you help me figure out the problem and hopefully get the code to work? Many thanks for your help.

--- Maomao
~~

data <- list(y, cap2, pol2, cap1, pol1, g, wo, wd, ww, mu, tau)
inits <- function() {
  list(beta = beta0, rho_o = rho_o_0, rho_d = rho_d_0, rho_w = rho_w_0)
}
parameters <- c("beta", "rho_o", "rho_d", "rho_w")
probit.sim <- BRugsFit(data, inits, parameters,
  modelFile = "spatial.openbugs.txt", numChains = 1, nIter = 2000)

# my model
model {
  for (i in 1:676) {
    y[i] ~ dbern(p[i])
    wwy[i] <- inprod(ww[i, 1:676], y[])
    woy[i] <- inprod(wo[i, 1:676], y[])
    wdy[i] <- inprod(wd[i, 1:676], y[])
    probit(p[i]) <- rho_o * woy[i] + rho_d * wdy[i] + rho_w * wwy[i] +
      beta[1] + beta[2] * cap2[i] + beta[3] * pol2[i] + beta[4] * cap1[i] +
      beta[5] * pol1[i] + beta[6] * g[i] + e[i]
  }
  # Priors
  beta[1:6] ~ dmnorm(mu[1:6], tau[1:6, 1:6])
  rho_o ~ dunif(-1,1)
  rho_d ~ dunif(-1,1)
  rho_w ~ dunif(-1,1)
  for (i in 1:676) { e[i] ~ dnorm(0, 1) }
}
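No answer to this post is preserved in the digest. One observation, offered as a guess only: "variable ww is not defined" is what OpenBUGS reports when the data never reach it under that name, and the data list above is unnamed. BRugsFit() takes data either as a character vector of object names or as a named list, so the first thing to try might be:

# assumption, not a verified fix: name every element so BUGS can match
# 'ww' and friends to the variables used in the model file
data <- list(y = y, cap2 = cap2, pol2 = pol2, cap1 = cap1, pol1 = pol1,
             g = g, wo = wo, wd = wd, ww = ww, mu = mu, tau = tau)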
Re: [R] Time vs Concentration Graphs by ID
Thank you for the very helpful tips. Just one last question:

- In the lattice method, how can I plot TIME vs OBSconcentration and TIME vs PREDconcentration in one graph (per individual)? You said in lattice you would replace 'smooth' by 'l' in the type = argument of xyplot() -- but that just means the OBSconc is now drawn as a line instead of points, and PREDconc is not plotted, right? I tried to superimpose 2 lattices, but that won't work: I can keep the x-scale the same but not the y-scale. E.g., max pred conc = 60 and max obs conc = 80, so in this case there would be 2 different y-scales superimposed on one another.
- The ggplot2 method works, apart from the weird 5,6,7,8,9,10 mg legend (there are only 5, 7, 10 mg doses), but that's nothing that photoshop can't take care of.

This is very cool; I think I understand it a lot better now. It was a lot easier than what I was doing before, that's for sure. Thanks!

On Fri, Oct 15, 2010 at 2:32 AM, Dennis Murphy djmu...@gmail.com wrote:

Hi: To get the plots precisely as you have given them in your png file, you're most likely going to have to use base graphics, especially if you want a separate legend in each panel. Packages ggplot2 and lattice have more structured ways of constructing such graphs, so you give up a bit of freedom in the details of plot construction to get nicer default configurations. Perhaps you've written a panel function in S-PLUS (?) to produce each graph in your png file - if so, you could simply add a couple of lines to determine the max y-value and from that the limits of the y scale. It shouldn't be at all difficult to make such modifications. Since R base graphics is very similar to base graphics in S-PLUS, you could possibly get away with popping your S-PLUS code directly into R and seeing how far that takes you.

Lattice is based on Trellis graphics, but the syntax in lattice has changed a fair bit vis-a-vis Trellis as the package has developed. ggplot2 is a more recent graphics system in R predicated on the grammar of graphics exposited by Leland Wilkinson.

For my example, I've modified the Theophylline data set in package nlme, described in Pinheiro & Bates (2000), Mixed Effects Models in S and S-PLUS, Springer. The original data set has eleven unique doses - I combined them into three intervals and converted them to factor with cut(). I also created four groups of Subjects and put them into a variable ID. The output data frame is called theo. I didn't fit the nlme models to the subjects - instead, I was lazy and used loess smoothing. The code to generate the data frame is given below; this is what we mean by a 'reproducible example', something that can be copied and pasted into an R session.
# Use modified version of Theophylline data in package nlme
library(nlme)
theo <- Theoph
theo <- subset(theo, Dose > 3.5)
theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6),
                 labels = c('4.25', '4.8', '5.5'))
theo <- as.data.frame(theo)
theo$ID <- with(theo, ifelse(Subject %in% c(6, 7, 12), 1,
                      ifelse(Subject %in% c(2, 8, 10), 2,
                      ifelse(Subject %in% c(4, 11, 5), 3, 4) )))

# ID is used for faceting, dose for color and shape
# lattice version:
xyplot(conc ~ Time | factor(ID), data = theo,
       col.line = 1:3, pch = 1:3, col = 1:3,
       groups = dose, type = c('p', 'smooth'),
       scales = list(y = list(relation = 'free')),
       auto.key = list(corner = c(0.93, 0.4), lines = TRUE, points = TRUE,
                       text = levels(theo$dose)) )

# ggplot2 version:
# scales = 'free_y' allows independent y scales per panel
g <- ggplot(theo, aes(x = Time, y = conc, shape = dose, colour = dose,
                      group = Subject))
g + geom_point() + geom_smooth(method = 'loess', se = FALSE) +
    facet_wrap( ~ ID, ncol = 2, scales = 'free_y') +
    opts(legend.position = c(0.9, 0.4))

This is meant to give you some indication how the two graphics systems work - it's a foundation, not an end product. There are a couple of ways I could have adjusted the y-scales in the lattice graphs (either directly in the scales = ... part or with a prepanel function for loess), but since you're not likely to use loess in your plots, it's not an important consideration. Both ggplot2 and lattice place the legend outside the plot area by default; I've illustrated a couple of ways to pull it into the plot region FYI.

One other thing: if your data set contains fitted values from your PK models for each subject * dose combination, the loess smoothing is unnecessary. In ggplot2, you would use geom_line(aes(y = pred), ...) in place of geom_smooth(), and in lattice you would replace 'smooth' by 'l' in the type = argument of xyplot().

HTH, Dennis

On Fri, Oct 15, 2010 at 12:46 AM, Anh Nguyen eataban...@gmail.com wrote:

Hello Dennis, That's a very good suggestion. I've attached a template here as a .png file, I hope you can view it. This
[R] feed cut() output into goodness-of-fit tests
Hello, My question is: assuming I have cut()'ed my sample and look at the table() of it, how can I compute probabilities for the bins? Do I have to parse the table's names() to fetch the bin endpoints and pass them to the p[distr-name] (CDF) functions? I really don't want to input arguments to the distribution functions by hand (nor in a copy-and-paste way).

x.fr <- table(cut(x, 10))
x.fr
(0.0617,0.549]   (0.549,1.04]    (1.04,1.52]    (1.52,2.01]     (2.01,2.5]
            16             28             26             18              6
    (2.5,2.99]    (2.99,3.48]    (3.48,3.96]    (3.96,4.45]    (4.45,4.94]
             3              2              0              0              1
names(x.fr)
 [1] "(0.0617,0.549]" "(0.549,1.04]"   "(1.04,1.52]"    "(1.52,2.01]"
 [5] "(2.01,2.5]"     "(2.5,2.99]"     "(2.99,3.48]"    "(3.48,3.96]"
 [9] "(3.96,4.45]"    "(4.45,4.94]"

--
Andrei Zorine
Re: [R] Problem with merging two zoo objects
I have compared dat11 and x using the str() function; however, I did not find any drastic difference:

str(dat11)
'zoo' series from 2010-10-15 13:43:54 to 2010-10-15 13:49:51
  Data: num [1:7, 1:4] 73.8 73.8 73.8 73.8 73.8 73.8 73.7 73.8 73.8 73.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:7] "7" "6" "5" "4" ...
  ..$ : chr [1:4] "V2" "V3" "V4" "V5"
  Index: POSIXlt[1:7], format: "2010-10-15 13:43:54" "2010-10-15 13:44:15" "2010-10-15 13:45:51" "2010-10-15 13:46:21" "2010-10-15 13:47:27" "2010-10-15 13:47:54" ...

str(x)
'zoo' series from 2003-02-01 to 2003-02-14
  Data: int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
  Index: Class 'Date' num [1:5] 12084 12086 12090 12092 12097

Thanks,
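One difference does stand out in that str() output, even if it is not drastic: dat11 is indexed by POSIXlt, a list-like class, whereas read.zoo() builds POSIXct indexes. As a guess from the digest rather than a verified diagnosis, converting the index is a cheap thing to try before anything else:

library(zoo)
# POSIXlt stores a time as a list of components, which can trip up
# length-based operations inside zoo; POSIXct is a plain atomic vector
index(dat11) <- as.POSIXct(index(dat11))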
Re: [R] Problem with merging two zoo objects
On Fri, 15 Oct 2010, Megh Dal wrote:

Hi Gabor, please see the attached files which are in text format. I have opened them in Excel and then used the clipboard to load them into R. Still really unclear what to do.

I've read both files using read.zoo():

R> z1 <- read.zoo("dat1.txt", sep = ",", header = TRUE,
+    format = "%m/%d/%Y %H:%M:%S", tz = "")
Warning message:
In zoo(rval3, ix) :
  some methods for "zoo" objects do not work if the index entries in 'order.by' are not unique
R> z2 <- read.zoo("dat2.txt", sep = ",", header = TRUE,
+    format = "%m/%d/%Y %H:%M:%S", tz = "")

Then, merge() does not work because some time indexes are not unique (and it would be unclear how these should be matched):

R> merge(z1, z2)
Error in merge.zoo(z1, z2) :
  series cannot be merged with non-unique index entries in a series

However, you can remove the duplicates, e.g., by computing averages:

R> z1a <- aggregate(z1, time(z1), mean)

Then

R> merge(z1a, z2)

works.

Also, can you please elaborate on this term: index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)) in your previous mail? In the help it is given that: "If FUN is specified then read.zoo calls FUN with the index as the first argument." I really could not connect your syntax with the help.

The way Gabor read the data, the index was in two separate columns (columns 1 and 2). Hence, he specified index = list(1, 2) and then provided a function that returns a POSIXct object when called with two arguments, FUN(column1, column2).

hth, Z
[R] Beginner question on bar plot
I've read a number of examples on doing a multiple bar plot, but I can't seem to grasp how they work or how to get my data into the proper form. I have two variables holding the same factor levels. The variables were created using a cut() command; the following simulates that:

A <- 1:100
B <- 1:100
A[30:60] <- 43
Acut <- cut(A, breaks=c(0,10,45,120), labels=c("low","med","high"))
Bcut <- cut(B, breaks=c(0,10,45,120), labels=c("low","med","high"))

What I want to do is create a barplot with 3 groups of side-by-side bars:
group 1 = "low", where the two bars would be the count for Acut and the count for Bcut;
group 2 = "med", where the two bars again would be the counts for this factor level in Acut and Bcut;
group 3 = "high", like the above two.
Re: [R] Time vs Concentration Graphs by ID
On Fri, Oct 15, 2010 at 7:29 AM, Anh Nguyen eataban...@gmail.com wrote:

[...] the ggplot2 method works, apart from the weird 5,6,7,8,9,10 mg legend (there are only 5, 7, 10 mg doses), but that's nothing that photoshop can't take care of.

No need to resort to photoshop... see the breaks argument of ?scale_x_continuous

-Ista
Re: [R] Problem with merging two zoo objects
On Fri, Oct 15, 2010 at 3:22 PM, Megh Dal megh700...@yahoo.com wrote:

Hi Gabor, please see the attached files which are in text format. [...]

1. It works for me with the files you supplied. Note that you have some duplicate times in your first file, so I aggregated them using only the last of any set of duplicates:

library(zoo)
dat11 <- read.zoo("dal1.csv", aggregate = function(x) tail(x, 1),
  sep = ",", header = TRUE, tz = "", format = "%m/%d/%Y %H:%M:%S")
dat22 <- read.zoo("dal2.csv",
  sep = ",", header = TRUE, tz = "", format = "%m/%d/%Y %H:%M:%S")
m <- merge(dat11, dat22)
str(m)
'zoo' series from 2010-10-15 09:00:24 to 2010-10-15 13:49:51
  Data: num [1:361, 1:8] 74.3 74.3 74.3 74.2 74.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:8] "data.open.dat11" "data.high.dat11" "data.low.dat11" "data.close.dat11" ...
  Index: POSIXct[1:361], format: "2010-10-15 09:00:24" "2010-10-15 09:01:15" ...
packageDescription("zoo")$Version
[1] "1.6-4"

2. You seem to be using an old version of zoo. With the latest version of zoo on CRAN, 1.6-4, the index.column= documentation in help(read.zoo) does document the list construction.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] feed cut() output into goodness-of-fit tests
On Fri, Oct 15, 2010 at 10:22 AM, Andrei Zorine zoav1...@gmail.com wrote:

Hello, My question is: assuming I have cut()'ed my sample and look at the table() of it, how can I compute probabilities for the bins?

I actually don't know what you mean by this (my own ignorance, probably).

Do I have to parse the table's names() to fetch bin endpoints

For equal-width bins you can use

seq(min(x), max(x), by = (max(x) - min(x))/10)

HTH, Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
Re: [R] Problem with merging two zoo objects
Thanks Gabor for pointing to my old version. However, I have one more question: why is the argument tz= sitting there? As you are not passing any explicit value for it, I assumed it was redundant. Without any tz argument, I get the following:

head(read.zoo(file="f:/dat1.txt", header=TRUE, sep=",", format = "%m/%d/%Y %H:%M:%S"))
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) :
  some methods for "zoo" objects do not work if the index entries in 'order.by' are not unique
2: In zoo(rval, x.index[i]) :
  some methods for "zoo" objects do not work if the index entries in 'order.by' are not unique

packageDescription("zoo")
Package: zoo
Version: 1.6-4
Date: 2010-07-09
Title: Z's ordered observations
Author: Achim Zeileis, Gabor Grothendieck, Felix Andrews
Maintainer: Achim Zeileis achim.zeil...@r-project.org
Description: An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo's key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.
Depends: R (>= 2.10.0), stats
Suggests: coda, chron, DAAG, fCalendar, fSeries, fts, its, lattice, strucchange, timeDate, timeSeries, tis, tseries, xts
Imports: stats, utils, graphics, grDevices, lattice (>= 0.18-1)
LazyLoad: yes
License: GPL-2

Clearly the format argument is not working properly; the time component is missing. Why is that so?
Re: [R] Beginner question on bar plot
Tena koe Steven

cutData <- rbind(summary(Acut), summary(Bcut))
barplot(cutData, beside=TRUE)

should get you started. The challenge, as you identify, is to get the data into the appropriate form, and the simple approach I have used may not work for your real data.

HTH

Peter Alspach
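A minor variation, not a correction: table() gives the same counts and makes it easy to label the two bars in each group:

cutData <- rbind(A = table(Acut), B = table(Bcut))   # 2 x 3 matrix of counts
barplot(cutData, beside = TRUE,
        legend.text = rownames(cutData),             # legend shows A and B
        main = "Counts by level, A vs B")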
[R] Recovering x/y coordinates from a scatterplot image
Do I recall correctly that there is an R package that can take an image and help one estimate the x/y coordinates? I can't find the package; I thought it was an R tool, but I would appreciate any leads. Thanks, Rob
Re: [R] creating 'all' sum contrasts
Michael,

Let c_1 and c_2 be vectors representing contrasts. Then c_1 and c_2 are orthogonal if and only if their inner product is 0. In your example, you have vectors (1,0,-1) and (0,1,-1). The inner product is 1, so they are not orthogonal. It's impossible to have more orthogonal contrasts than you have levels in your factor, a result from basic linear algebra. You can get all possible pairwise contrasts, which is different from orthogonal contrasts (in fact, it's only possible to have floor(n/2) orthogonal pairwise contrasts). This is probably not the easiest way, but it works:

n <- 10
M <- matrix(0, nrow=n, ncol=n*(n-1)/2)
comb <- combn(n, 2)
M[cbind(comb[1,], 1:(n*(n-1)/2))] <- 1
M[cbind(comb[2,], 1:(n*(n-1)/2))] <- -1

M is then a matrix containing all pairwise contrasts for n levels of a factor.

Hope that helps, Jonathan

On Fri, Oct 15, 2010 at 10:30 AM, Michael Hopkins hopk...@upstreamsystems.com wrote:

On 15 Oct 2010, at 13:55, Berwin A Turlach wrote:

G'day Michael,

Hi Berwin. Thanks for the reply.

On Fri, 15 Oct 2010 12:09:07 +0100 Michael Hopkins hopk...@upstreamsystems.com wrote:

OK, my last question didn't get any replies, so I am going to try and ask a different way. When I generate contrasts with contr.sum() for a 3-level categorical variable, I get the 2 orthogonal contrasts:

contr.sum( c(1,2,3) )
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

These two contrasts are *not* orthogonal.

I'm surprised. Can you please tell me how you calculated that?

This provides the contrasts 1-3 and 2-3 as expected. But I also want it to create 1-2 (i.e. (1-3) - (2-3)). So in general I want all possible orthogonal contrasts - think of it as the contrasts for all pairwise comparisons between the levels.

You have to decide what you want: the contrasts for all pairwise comparisons between the levels, or all possible orthogonal contrasts?

Well, the pairwise contrasts are the most important, as I am looking for evidence of whether they are zero (i.e. no difference between levels) or not. But see my above comment about orthogonality.

The latter is actually not well defined. For a factor with p levels, there would be p - 1 orthogonal contrasts, which are only identifiable up to rotation, hence infinitely many such sets. But there are p(p-1) pairwise comparisons. So unless p=2, you have to decide what you want.

Well, of course the pairwise comparisons are bi-directional, so in fact only p(p-1)/2 are of interest to me.

Are there any options for contrast() or other functions/libraries that will allow me to do this automatically?

Look at package multcomp, in particular functions glht and mcp; these might help.

Thanks, I will have a look. But I want to be able to do this transparently within lm(), using regsubsets() etc., as I am collecting large quantities of summary stats from all possible models to use with a model choice criterion based upon true Bayesian model probabilities.
Cheers, Berwin

==
Berwin A Turlach
School of Maths and Stats (M019), The University of Western Australia
35 Stirling Highway, Crawley WA 6009, Australia
Tel.: +61 (8) 6488 3338 (secr) / +61 (8) 6488 3383 (self)  FAX: +61 (8) 6488 1028
e-mail: ber...@maths.uwa.edu.au  http://www.maths.uwa.edu.au/~berwin

Michael Hopkins
Algorithm and Statistical Modelling Expert
Upstream, 23 Old Bond Street, London W1S 4PZ
Mob +44 0782 578 7220  DL +44 0207 290 1326  Fax +44 0207 290 1321
hopk...@upstreamsystems.com  www.upstreamsystems.com
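To make Jonathan's construction concrete, here is the same code run with n = 3, matching the contr.sum() example that started the thread; each column is one pairwise contrast:

n <- 3
M <- matrix(0, nrow = n, ncol = n * (n - 1) / 2)
comb <- combn(n, 2)
M[cbind(comb[1, ], seq_len(ncol(M)))] <-  1   # +1 for the first level of each pair
M[cbind(comb[2, ], seq_len(ncol(M)))] <- -1   # -1 for the second
M
#      [,1] [,2] [,3]
# [1,]    1    1    0
# [2,]   -1    0    1
# [3,]    0   -1   -1    # columns: 1-2, 1-3, 2-3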
Re: [R] Problem with merging two zoo objects
On Fri, Oct 15, 2010 at 4:27 PM, Megh megh700...@yahoo.com wrote:

Thanks Gabor for pointing to my old version. However I got one more question: why is the argument tz= sitting there? As you are not passing any explicit

It would otherwise assume "Date" class.

str(read.zoo(file="dal1.csv", header=TRUE, sep=",",
  format = "%m/%d/%Y %H:%M:%S", aggregate = mean))
'zoo' series from 2010-10-15 to 2010-10-15
  Data: num [1, 1:4] 73.7 73.7 73.7 73.7
 - attr(*, "dimnames")=List of 2
  ..$ : chr "2010-10-15"
  ..$ : chr [1:4] "data.open" "data.high" "data.low" "data.close"
  Index: Class 'Date' num 14897

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] feed cut() output into goodness-of-fit tests
Andrei - Looking inside the code for cut(), it looks like you could retrieve the breaks as follows:

getbreaks = function(x, nbreaks){
  nb = nbreaks + 1
  dx = diff(rx <- range(x, na.rm=TRUE))
  seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out=nb)
}

The dx/1000 is what makes cut()'s breaks different from a simple call to seq().

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spec...@stat.berkeley.edu
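For the archive, those breaks can then go straight into a distribution function and a goodness-of-fit test, which was the original question. A sketch assuming, purely for illustration, a standard lognormal null:

set.seed(42)
x   <- rlnorm(100)                    # stand-in sample
br  <- getbreaks(x, 10)               # the endpoints cut(x, 10) would use
p   <- diff(plnorm(br))               # null probability mass of each bin
obs <- table(cut(x, br))              # observed bin counts
chisq.test(obs, p = p, rescale.p = TRUE)  # expect small-count warnings with 10 bins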
[R] using optimize with two unknowns, e.g. to parameterize a distribution with given confidence interval
Hi, I would like to write a function that finds the parameters of a log-normal distribution whose 1-alpha CI is (x_lcl, x_ucl). However, I don't know how to optimize over the two unknown parameters. Here is my unsuccessful attempt to find a lognormal distribution with a 90% CI of (1, 20):

prior <- function(x_lcl, x_ucl, alpha, mean, var) {
  a <- (plnorm(x_lcl, mean, var) - (alpha/2))^2
  b <- (plnorm(x_ucl, mean, var) - (1-alpha/2))^2
  return(a+b)
}

optimize(fn=prior, interval = c(-5, 100), 1, 20)

I understand that this problem has a closed-form solution, but I would like to make this a general function.

Thanks, David

--
David LeBauer, PhD
Energy Biosciences Institute
University of Illinois Urbana-Champaign
1206 W. Gregory Drive
Urbana, IL 61801, U.S.A.
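No reply is preserved in this digest, but the standard route is worth recording: optimize() is strictly one-dimensional, so the two parameters need to be packed into a single vector and handed to optim(). A minimal sketch for the poster's 90% CI of (1, 20); the lower bound on the second parameter just keeps sdlog positive during the search:

prior <- function(par, x_lcl, x_ucl, alpha) {
  # par[1] = meanlog, par[2] = sdlog
  (plnorm(x_lcl, par[1], par[2]) - alpha/2)^2 +
  (plnorm(x_ucl, par[1], par[2]) - (1 - alpha/2))^2
}
fit <- optim(c(1, 1), prior, x_lcl = 1, x_ucl = 20, alpha = 0.1,
             method = "L-BFGS-B", lower = c(-Inf, 1e-6))
fit$par                                    # meanlog, sdlog
plnorm(c(1, 20), fit$par[1], fit$par[2])   # should be close to 0.05 and 0.95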
Re: [R] using apply function and storing output
On Fri, Oct 15, 2010 at 6:46 AM, David A. dasol...@hotmail.com wrote:

Thanks Dennis, I don't think it was a problem of not feeding in a function for rollapply(), because I was using mean() and my co.var() function in the FUN argument. The key part seems to be the transformation that zoo() does to the matrix. If I do the same transformation to my original matrix, the rollapply() function now works fine.

rollapply has ts and zoo methods, so you have to pass it an object of one of those classes:

methods(rollapply)
[1] rollapply.ts*  rollapply.zoo*

   Non-visible functions are asterisked

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
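Spelling that out against the original post, a minimal sketch (mydata below is a toy stand-in for the real 1710x244 matrix):

library(zoo)
mydata <- matrix(rpois(30, 5), nrow = 10)      # toy stand-in
xxx <- rollapply(zoo(mydata), width = 3, by = 3,
                 FUN = function(x) sd(x) / mean(x))  # CV per window, per column
coredata(xxx)                                  # drop the zoo index again if unwanted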
[R] Dealing with Non-Standard Hours
A data set I obtained has the hours running from 01 through 24 rather than the conventional 00 through 23. My favorite, strptime, balks at hour 24. I thought it would be easy to correct, but it must be too late on a Friday for my brain, and caffeine isn't helping.

TIA for a hint,
Clint

--
Clint Bowman            INTERNET: cl...@ecy.wa.gov
Air Quality Modeler     INTERNET: cl...@math.utah.edu
Department of Ecology   VOICE: (360) 407-6815
PO Box 47600            FAX: (360) 407-7534
Olympia, WA 98504-7600
Re: [R] Recovering x/y coordinates from a scatterplot image
On Fri, Oct 15, 2010 at 9:46 PM, Rob James aetiolo...@gmail.com wrote:

Do I recall correctly that there is an R package that can take an image and help one estimate the x/y coordinates? I can't find the package; I thought it was an R tool, but I would appreciate any leads. Thanks,

I don't know an R package for this, but there are various programs to do it, none of which I can ever remember the name of the once-every-six-years I need one. Some searching on the internet has found this one:

http://plotdigitizer.sourceforge.net/

but doubtless there are others!

Barry
[R] compiling and testing R-2.12.0
I downloaded the tarball for R-2.12.0 and ran ./configure and make without problems. Then make test ... which has now been running for more than an hour, and seems to have stalled at:

comparing 'reg-plot-latin1.ps' to './reg-plot-latin1.ps.save' ... OK
make[3]: Leaving directory `/home/kjetil/R/R-2.12.0/tests'
make[2]: Leaving directory `/home/kjetil/R/R-2.12.0/tests'
make[2]: Entering directory `/home/kjetil/R/R-2.12.0/tests'
running tests of Internet and socket functions
expect some differences
make[3]: Entering directory `/home/kjetil/R/R-2.12.0/tests'
running code in 'internet.R' ...

where it has now been stuck for about an hour! Any ideas? I am running in a shell from within emacs23, on ubuntu 10.10 maverick. Kjetil __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data Parameter extract
Awesome! It worked. Thank you both for your help. -joe -- View this message in context: http://r.789695.n4.nabble.com/Data-Parameter-extract-tp2996369p2997761.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dealing with Non-Standard Hours
You could have posted an example of your data. You can use 'sub' to substitute one set of characters for another in your data. There are other ways of doing it, but we would need an example of your data to show them. On Fri, Oct 15, 2010 at 5:55 PM, Clint Bowman cl...@ecy.wa.gov wrote: A data set I obtained has the hours running from 01 through 24 rather than the conventional 00 through 23. My favorite, strptime, balks at hour 24. I thought it would be easy to correct but it must be too late on Friday for my brain and caffeine isn't helping. TIA for a hint, Clint -- Clint Bowman INTERNET: cl...@ecy.wa.gov Air Quality Modeler INTERNET: cl...@math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
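One concrete approach, with hypothetical timestamps, under one common reading of the 01-24 convention (hour 24 denotes midnight at the end of the stamped day): parse the date part alone and add the hour as seconds, so hour 24 rolls over cleanly and strptime never sees it.

x    <- c("10/15/2010 24", "10/16/2010 01")   # made-up sample data
day  <- as.POSIXct(sub(" .*", "", x), format = "%m/%d/%Y", tz = "GMT")
hour <- as.numeric(sub(".* ", "", x))
day + hour * 3600   # "24" becomes 00:00:00 on 10/16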
Re: [R] compiling and testing R-2.12.0
On Fri, 15-Oct-2010 at 07:41PM -0300, Kjetil Halvorsen wrote: | I downloaded the tarball for R-2-12.0, made | ./configure | make | | without problems. | | Then | make test | ...which have now been running for more than an hour, and seems to | have stalled at: | comparing 'reg-plot-latin1.ps' to './reg-plot-latin1.ps.save' ... OK | make[3]: Leaving directory `/home/kjetil/R/R-2.12.0/tests' | make[2]: Leaving directory `/home/kjetil/R/R-2.12.0/tests' | make[2]: Entering directory `/home/kjetil/R/R-2.12.0/tests' | running tests of Internet and socket functions | expect some differences | make[3]: Entering directory `/home/kjetil/R/R-2.12.0/tests' | running code in 'internet.R' ... | | at which point it has been for about an hour! Do you access the internet via a proxy server? In my experience, the HTTP_PROXY variable isn't read by the R process that does those tests. It doesn't matter much since it times out after about 5 minutes. I think the internet tests are the last to run, so you've probably passed all the others. HTH | | ¿Any ideas? | I am running in a shell from within emacs23, on ubuntu 10.10 maverick. | | Kjetil | | __ | R-help@r-project.org mailing list | https://stat.ethz.ch/mailman/listinfo/r-help | PLEASE do read the posting guide http://www.R-project.org/posting-guide.html | and provide commented, minimal, self-contained, reproducible code. -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) . Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] trouble with installing R-2.12.0 from source on Windows
Dear R People: I'm trying to install R-2.12.0 from source on a Netbook with Windows XP. I have installed Rtools.exe (version 2.12). However, when I enter

tar xvfz R-2.12.0.tar.gz

I keep getting the message

Cannot change ownership to uid 501, gid 20: Invalid argument

Has anyone else run across this, please? Thanks, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
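The complaint comes from GNU tar trying to restore the archive's Unix owner and group, which Windows cannot do; telling tar not to attempt it (the --no-same-owner flag, or the TAR_OPTIONS environment variable) usually silences it. A hedged R-side equivalent, untested on the poster's setup and assuming untar()'s extras argument is available in the R version at hand:

untar("R-2.12.0.tar.gz", extras = "--no-same-owner")  # assumes an external tar is found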
Re: [R] Problem with merging two zoo objects
On Fri, Oct 15, 2010 at 9:56 PM, Megh Dal megh700...@yahoo.com wrote: However I have noticed a strange thing. Placing of tz = matters here:

head(read.zoo("f:/dat1.txt", sep = ",", header = TRUE, format = "%m/%d/%Y %H:%M:%S"), tz = "")

Your tz argument has been passed as an argument of head. You want it as an argument of read.zoo.

           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In zoo(rval, x.index[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

You are missing the aggregate= argument to read.zoo, which is needed if you have duplicate times in your input to tell it how to resolve them. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
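Putting both corrections together, a sketch of the intended call (file path and column layout as in the original post; mean is just one possible way to collapse duplicate timestamps):

library(zoo)
z <- read.zoo("f:/dat1.txt", sep = ",", header = TRUE,
              format = "%m/%d/%Y %H:%M:%S", tz = "",
              aggregate = mean)
head(z)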
Re: [R] Time vs Concentration Graphs by ID
Hi: On Fri, Oct 15, 2010 at 4:29 AM, Anh Nguyen eataban...@gmail.com wrote: Thank you for the very helpful tips. Just one last question: - In the lattice method, how can I plot TIME vs OBSconcentration and TIME vs PREDconcentration in one graph (per individual)? You said in lattice you would replace 'smooth' by 'l' in the type = argument of xyplot(); that just means the OBSconc is now drawn as a line instead of points and PREDconc is not plotted, right? My mistake. That requires a little trickery. Try using both observed and predicted as responses in the xyplot, as in

xyplot(conc + pred ~ Time | ID, ..., type = c('p', 'l'), distribute.type = TRUE, ...)

where the ... refers to the other arguments in the xyplot() call. If that works (and that *is* an if, since it's untested), then conc should be plotted as points and pred as lines on the same (x, y) scale. See section 5.3 of the Lattice book by Deepayan Sarkar; the code for the book example is on its web site: http://lmdvr.r-forge.r-project.org/figures/figures.html The figure in question is Figure 5.12, but it uses data from several previous figures, so you might just want to copy and paste everything from just before Figure 5.10 up to it - the data frame in use is SeatacWeather. -- I tried to superimpose 2 lattices but that won't work. No, it won't. You might be able to get away with that in base graphics (add = TRUE or lines/points), but not in Lattice. Actually, ggplot2 is better positioned to handle that case 'easily'. I can keep the x-scale the same but not the y-scale. Ex: max pred conc = 60 and max obs conc = 80, and thus for this case there would be 2 different y-scales superimposed on one another. You may need to plot concentration first if it has the higher y limit. No guarantee that will work, but it's worth a shot. If it fails, you can set the y-limits per panel manually using scales, something like

scales = list(y = list(limits = list(c(0, 60), c(0, 80), c(0, 60), c(0, 60))))

in panel order, but again, that's untested and I'm not certain it would work. Better solutions/corrections welcome. - the ggplot2 method works, just the weird 5,6,7,8,9,10 mg legends (there are only 5, 7, 10 mg doses), but it's nothing that photoshop can't take care of. You must have dose as numeric in your input data frame. If you convert it to factor (e.g., with a new variable) and use the factor version of dose in ggplot2, it should work. This is very cool, I think I understand it a lot better now. It was a lot easier than what I was doing before, that's for sure. Thanks! For complex requests like these, it *really* helps to have a template data frame to work with. It took me about a half hour to rearrange the Theoph data frame from nlme into shape so that I could get a plot that remotely resembled what you were looking for. Alas, I don't have the time today to do the same for this request. HTH, Dennis On Fri, Oct 15, 2010 at 2:32 AM, Dennis Murphy djmu...@gmail.com wrote: Hi: To get the plots precisely as you have given them in your png file, you're most likely going to have to use base graphics, especially if you want a separate legend in each panel. Packages ggplot2 and lattice have more structured ways of constructing such graphs, so you give up a bit of freedom in the details of plot construction to get nicer default configurations. Perhaps you've written a panel function in S-PLUS (?) to produce each graph in your png file - if so, you could simply add a couple of lines to determine the max y-value and from that the limits of the y scale.
It shouldn't be at all difficult to make such modifications. Since R base graphics is very similar to base graphics in S-PLUS, you could possibly get away with popping your S-PLUS code directly into R and see how far that takes you. Lattice is based on Trellis graphics, but the syntax in lattice has changed a fair bit vis-a-vis Trellis as the package has developed. ggplot2 is a more recent graphics system in R predicated on the grammar of graphics exposited by Leland Wilkinson. For my example, I've modified the Theophylline data set in package nlme, described in Pinheiro and Bates (2000), Mixed Effects Models in S and S-PLUS, Springer. The original data set has eleven unique doses - I combined them into three intervals and converted them to factor with cut(). I also created four groups of Subjects and put them into a variable ID. The output data frame is called theo. I didn't fit the nlme models to the subjects - instead, I was lazy and used loess smoothing. The code to generate the data frame is given below; this is what we mean by a 'reproducible example', something that can be copied and pasted into an R session.

# Use modified version of Theophylline data in package nlme
library(nlme)
theo <- Theoph
theo <- subset(theo, Dose > 3.5)
theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6), labels = c('4.25', '4.8', '5.5'))
theo <-
Re: [R] Recovering x/y coordinates from a scatterplot image
Hi Rob: Are you thinking of the digitize package? HTH, Dennis On Fri, Oct 15, 2010 at 1:46 PM, Rob James aetiolo...@gmail.com wrote: Do I recall correctly that there is an R package that can take an image, and help one estimate the x/y coordinates? I can't find the package, thought it was an R-tool, but would appreciate any leads. Thanks, Rob __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with merging two zoo objects
However I have noticed a strange thing. Placing of tz = matters here:

head(read.zoo("f:/dat1.txt", sep = ",", header = TRUE, format = "%m/%d/%Y %H:%M:%S"), tz = "")
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In zoo(rval, x.index[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

head(read.zoo("f:/dat1.txt", sep = ",", header = TRUE, tz = "", format = "%m/%d/%Y %H:%M:%S"))
                    data.open data.high data.low data.close
2010-10-15 09:00:24     74.35     74.35    74.35      74.35
2010-10-15 09:01:15     74.30     74.30    74.30      74.30
2010-10-15 09:01:21     74.35     74.35    74.35      74.35
2010-10-15 09:01:27     74.20     74.20    74.20      74.20
2010-10-15 09:01:30     74.25     74.25    74.25      74.25
2010-10-15 09:01:36     74.25     74.25    74.25      74.25
Warning message:
In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

Is it a bug or a rule that, for any function, the placement of its arguments matters? Thanks, --- On Sat, 10/16/10, Megh Dal megh700...@yahoo.com wrote: From: Megh Dal megh700...@yahoo.com Subject: Re: [R] Problem with merging two zoo objects To: Gabor Grothendieck ggrothendi...@gmail.com Cc: r-help@r-project.org Date: Saturday, October 16, 2010, 7:20 AM I don't know whether I am missing something or not:

head(read.zoo(file="f:/dat1.txt", header=T, sep=",", format = "%m/%d/%Y %H:%M:%S"), tz="GMT")
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In zoo(rval, x.index[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

head(read.zoo(file="f:/dat1.txt", header=T, sep=",", format = "%m/%d/%Y %H:%M:%S"))
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In zoo(rval, x.index[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

In either case, I am missing the time component. Where am I going wrong? Thanks, --- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote: From: Gabor Grothendieck ggrothendi...@gmail.com Subject: Re: [R] Problem with merging two zoo objects To: Megh megh700...@yahoo.com Cc: r-help@r-project.org Date: Saturday, October 16, 2010, 2:33 AM On Fri, Oct 15, 2010 at 4:27 PM, Megh megh700...@yahoo.com wrote: Thanks Gabor for pointing to my old version. However I got one more question: why is the argument tz= sitting there? As you are not passing any explicit It would otherwise assume Date class.

str(read.zoo(file="dal1.csv", header=TRUE, sep=",", format = "%m/%d/%Y %H:%M:%S", aggregate = mean))
‘zoo’ series from 2010-10-15 to 2010-10-15
  Data: num [1, 1:4] 73.7 73.7 73.7 73.7
  - attr(*, "dimnames")=List of 2
   ..$ : chr "2010-10-15"
   ..$ : chr [1:4] "data.open" "data.high" "data.low" "data.close"
  Index: Class 'Date' num 14897

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with merging two zoo objects
I don't know whether I am missing something or not:

head(read.zoo(file="f:/dat1.txt", header=T, sep=",", format = "%m/%d/%Y %H:%M:%S"), tz="GMT")
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In zoo(rval, x.index[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

head(read.zoo(file="f:/dat1.txt", header=T, sep=",", format = "%m/%d/%Y %H:%M:%S"))
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In zoo(rval, x.index[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

In either case, I am missing the time component. Where am I going wrong? Thanks, --- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote: From: Gabor Grothendieck ggrothendi...@gmail.com Subject: Re: [R] Problem with merging two zoo objects To: Megh megh700...@yahoo.com Cc: r-help@r-project.org Date: Saturday, October 16, 2010, 2:33 AM On Fri, Oct 15, 2010 at 4:27 PM, Megh megh700...@yahoo.com wrote: Thanks Gabor for pointing to my old version. However I got one more question: why is the argument tz= sitting there? As you are not passing any explicit It would otherwise assume Date class.

str(read.zoo(file="dal1.csv", header=TRUE, sep=",", format = "%m/%d/%Y %H:%M:%S", aggregate = mean))
‘zoo’ series from 2010-10-15 to 2010-10-15
  Data: num [1, 1:4] 73.7 73.7 73.7 73.7
  - attr(*, "dimnames")=List of 2
   ..$ : chr "2010-10-15"
   ..$ : chr [1:4] "data.open" "data.high" "data.low" "data.close"
  Index: Class 'Date' num 14897

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R MySQL (Databases)
Dear R-helpers, Considering that a substantial part of analysis is data manipulation, I'm wondering whether I should do the basic data handling in a database server (currently I have the data in a .txt file). For this purpose, I am planning to use MySQL. Is MySQL a good way to go about this? Are there any anticipated problems that I need to be aware of? Considering that many users here work with large datasets: do you typically store the data in databases and query relevant portions for your analysis? Does it speed up the entire process? Is it neater to do things in a database? (For example, errors could be caught at the data-import stage by constraints in the table definitions themselves, as opposed to discovering them only when the analysis output in R looks wrong.) This is vis-a-vis using the built-in SQLite, indexing, etc. capabilities in R itself. Does performance work better with a database backend (especially for simple but large datasets)? The financial applications I am thinking of are not exactly real-time, but quick response and fast performance would definitely help. As an aside, I want to take things to a cloud environment at some point, simply because it will be easier and cheaper to deliver. Kind of an open question, but any inputs will help. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
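For reference, access from R usually goes through the DBI interface; a minimal sketch with made-up credentials and a hypothetical table name (RMySQL must be installed and a MySQL server running):

library(RMySQL)
con <- dbConnect(MySQL(), user = "analyst", password = "secret",
                 dbname = "markets", host = "localhost")  # hypothetical connection details
prices <- dbGetQuery(con, "SELECT * FROM ticks WHERE sym = 'ABC'")  # hypothetical table
dbDisconnect(con)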
Re: [R] convert factor data to numeric
cheers mate that worked fine thanks a bunch -- View this message in context: http://r.789695.n4.nabble.com/convert-factor-data-to-numeric-tp1012855p2996495.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] AIC in bestglm, glm, and lm - why do they differ?
I recently found the 'bestglm' package while trolling CRAN for a function to save some keystrokes with simple (k < 10) nested models. I am interested in using the best of several nested linear models, which I have explored using lm(), glm() and now bestglm(). When I compare the AIC values for the 'best' candidate model, I get the same AIC values with lm() and glm() but a different AIC (and likelihood) value from bestglm(). Is this the result of some difference in likelihood calculation that I am missing in reviewing the documentation and help files? I can provide code if there is interest in looking into this; otherwise I will continue to assemble my tables the long way with glm() and lm(), though the options and potential of the bestglm() package have me very interested. Cheers, Darren Gillis Biological Sciences University of Manitoba Winnipeg, MB __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
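One quick diagnostic: lm() and glm() agree because they maximize the same Gaussian likelihood, so a bestglm() discrepancy that is constant across candidate models would leave the ranking unchanged. A sketch with built-in data (not the poster's models):

fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, data = mtcars)
AIC(fit_lm); AIC(fit_glm)          # identical
logLik(fit_lm); logLik(fit_glm)    # identical log-likelihood and df
# If another package reports AIC shifted by a constant (e.g. dropped
# normalizing terms), differences between candidate models still match.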
Re: [R] Time vs Concentration Graphs by ID
I found 2 problems with this method: - There is only one line for predicted dose at 5 mg. - The different doses are 5, 7, and 10 mg but somehow there is a legend for 5,6,7,8,9,10. - Is there a way to make the line smooth? - The plots are also getting a little crowded and I was wondering if there is a way to split it into 2 or more pages? Thanks for your help. On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn iz...@psych.rochester.edu wrote: Hi, Assuming the data is in a data.frame named D, something like

library(ggplot2)  # may need install.packages("ggplot2") first
ggplot(D, aes(x=Time, y=Concentration, color=Dose)) +
  geom_point() +
  geom_line(aes(y = PredictedConcentration, group=1)) +
  facet_wrap(~ID, scales="free", ncol=3)

should do it. -Ista On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo eataban...@gmail.com wrote: Hello-- I have data for a small population who took 1 drug at 3 different doses. I have the actual drug concentrations as well as concentrations predicted by my model. This is what I'm looking for: - Time vs Concentration by ID (individual plots), with each subject occupying 1 plot -- there are to be 9 plots per page (3x3) - Observed drug concentration is made up of points, and predicted drug concentration is a curve without points. Points and curve will be the same color for each dose. Different doses will have different colors. - A legend to specify which color correlates to which dose. - Axes should be different for each individual (as some individuals will have much higher drug concentrations than others) and I want to see in detail how well predicted data fit observed data. Any help would be greatly appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Time-vs-Concentration-Graphs-by-ID-tp2996431p2996431.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
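The stray 6-9 mg legend entries come from mapping a numeric Dose to colour, which produces a continuous scale; converting to a factor gives a discrete legend with only the doses present. A sketch, with the data frame D and column names assumed as earlier in this thread:

library(ggplot2)
D$Dose <- factor(D$Dose)   # discrete legend: 5, 7, 10 only
ggplot(D, aes(x = Time, y = Concentration, colour = Dose)) +
  geom_point() +
  geom_line(aes(y = PredictedConcentration)) +
  facet_wrap(~ ID, scales = "free", ncol = 3)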
Re: [R] RJava help
On 14/10/10 20:28, lord12 wrote: I need help with rJava. So I run this R code:

library(rJava)  # load the rJava library
.jinit(classpath="c:/Documents and Settings/GV/workspace/Test/src", parameters="-Xmx512m")
# the above is to load the Java virtual machine
x = runif(1000)
y = runif(1000)
# the above are two vectors to convolve
# now you can call the Java method as
z = .jcall("Test", "[D", "convolve", x, y)

I get an error: Error in .jcall("Test", "[D", "convolve", x, y) : RcallMethod: cannot determine object class. Why is this? It would be useful if you showed your Test java class, and please repost on the appropriate mailing list for rJava: http://mailman.rz.uni-augsburg.de/mailman/listinfo/stats-rosuda-devel Romain -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/b8wOqW : LondonR Rcpp slides |- http://bit.ly/cCmbgg : Rcpp 0.8.6 `- http://bit.ly/bzoWrs : Rcpp svn revision 2000 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fast rowCumsums wanted for calculating the cdf
Dear all, Maybe the easiest solution: Is there anything that speaks against generalizing cumsum from base to cope with matrices (as is done in Matlab)? E.g.:

cumsum(Matrix, 1)   # equivalent to apply(Matrix, 1, cumsum)

The main advantage could be optimized code if the matrix is extremely non-square (e.g. 100,000 x 10) but the summation is done over the short side (in this case 10). apply would practically yield a loop over 100,000 elements, and vectorization w.r.t. the long side (a loop over 10 elements) provides considerable efficiency gains. Many regards, Gregor On Tue, 12 Oct 2010 10:24:53 +0200 Gregor mailingl...@gmx.at wrote: Dear all, I am struggling with a (currently) cost-intensive problem: calculating the (non-normalized) cumulative distribution function, given the (non-normalized) probabilities. Something like:

probs <- t(matrix(rep(1:100), nrow=10))  # matrix with row-wise probabilities
F <- t(apply(probs, 1, cumsum))  # SLOOOW!

One (already faster, but surely not ideal) solution - thanks to Henrik Bengtsson:

F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
F[,1] <- probs[,1,drop=TRUE]
for (cc in 2:ncol(F)) {
  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
}

In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step around 200,000 times, so speed is crucial. I can currently make sure to have no NAs, but in order to extend matrixStats, this could be a nontrivial issue. Any ideas for speeding up this - probably routine - task? Thanks in advance, Gregor __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
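Another route for the wide-and-short shape Gregor describes: a row-wise cumulative sum is a single matrix product with an upper-triangular matrix of ones, so no loop is needed at all (a sketch; the ncol x ncol helper matrix is tiny here):

probs <- t(matrix(rep(1:100), nrow = 10))        # toy data from the thread
U <- upper.tri(diag(ncol(probs)), diag = TRUE)   # logical; %*% coerces it to 0/1
F <- probs %*% U                                 # F[i, j] = sum(probs[i, 1:j])
all.equal(F, t(apply(probs, 1, cumsum)), check.attributes = FALSE)  # TRUE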
Re: [R] compare histograms
On Fri, Oct 15, 2010 at 2:47 AM, Michael Bedward michael.bedw...@gmail.com wrote: Hi Rainer, Great - many thanks for that. Yes, I'm using OSX. I initially tried to use install.packages to get a pre-built binary of earthmovdist from RForge, but it failed with...

In getDependencies(pkgs, dependencies, available, lib) : package 'earthmovdist' is not available

Yes - we had some problems with getting the package built for OSX, but we (more specifically Dirk) are working on that. When I tried installing with type="source" this was also failing. However, after reading your post I looked at the error messages properly and it turned out to be a trivial problem. The .First function defined in my .Rprofile was printing some text to the console with cat(), which was being incorrectly picked up by the package build as if it were a makefile argument. When I commented out the call to cat, the package installed successfully. I haven't had this problem installing other packages from source, so I think there must be a little problem with your setup (?) Thanks for letting us know - could you send us the offending entry in your .Rprofile (or the whole .Rprofile?), so that we can see if it is an OSX or general problem? Now that it's installed I look forward to trying it out shortly. Great - please give us some feedback on what you think about it. Cheers, Rainer Thanks again. Michael On 15 October 2010 03:17, Rainer M Krug r.m.k...@gmail.com wrote: On Thu, Oct 14, 2010 at 3:15 AM, Michael Bedward michael.bedw...@gmail.com wrote: Hi Juan, Yes, you can use EMD to quantify the difference between any pair of histograms regardless of their shape. The only constraint, at least the way that I've done it previously, is to have compatible bins. The original application of EMD was to compare images based on colour histograms, which could have all sorts of shapes. I looked at the package that Dennis alerted me to on RForge but unfortunately it seems to be inactive No - well, it depends how you define inactive: the functionality we wanted to include is included, therefore no further development was necessary. and the nightly builds are broken. I've downloaded the source code and will have a look at it sometime in the next few days. Thanks for alerting us - we will look into that. But just don't use the nightly builds, as they are no different from the last release. Just download the package for your system (I assume Windows or Mac, as I just installed from source without problems under Linux). Let me know if it doesn't work. Cheers, Rainer Meanwhile, let me know if you want a copy of my own code. It uses the lpSolve package. Michael On 14 October 2010 08:46, Juan Pablo Fededa jpfed...@gmail.com wrote: Hi Michael, I have the same challenge; can you use this earth mover's distance to compare bimodal distributions? Thanks cheers, Juan On Wed, Oct 13, 2010 at 4:39 AM, Michael Bedward michael.bedw...@gmail.com wrote: Just to add to Greg's comments: I've previously used 'Earth Mover's Distance' to compare histograms. Note, this is a distance metric rather than a parametric statistic (i.e. not a test), but it at least provides a consistent way of quantifying similarity. It's relatively easy to implement the metric in R (formulating it as a linear programming problem). Happy to dig out the code if needed. Michael On 13 October 2010 02:44, Greg Snow greg.s...@imail.org wrote: That depends a lot on what you mean by the histograms being equivalent. You could just plot them and compare visually.
It may be easier to compare them if you plot density estimates rather than histograms. Even better would be to do a qqplot comparing the 2 sets of data rather than the histograms. If you want a formal test then the ks.test function can compare 2 datasets. Note that the null hypothesis is that they come from the same distribution, a significant result means that they are likely different (but the difference may not be of practical importance), but a non-significant test could mean they are the same, or that you just do not have enough power to find the difference (or the difference is hard for the ks test to see). You could also use a chi-squared test to compare this way. Another approach would be to use the vis.test function from the TeachingDemos package. Write a small function that will either plot your 2 histograms (density plots), or permute the data between the 2 groups and plot the equivalent histograms. The vis.test function then presents you with an array of plots, one of which is the original data and the rest based on permutations. If there is a clear meaningful difference in the groups you will be
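The formal tests Greg describes are one-liners; a toy sketch with simulated samples (not the poster's data):

set.seed(1)
x <- rnorm(200)
y <- rnorm(200, mean = 0.3)
ks.test(x, y)   # two-sample Kolmogorov-Smirnov test
br <- seq(min(x, y), max(x, y), length.out = 11)   # shared bins for both samples
chisq.test(rbind(table(cut(x, br, include.lowest = TRUE)),
                 table(cut(y, br, include.lowest = TRUE))))  # may warn on sparse bins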
[R] Downloading file with lapply
I'm still getting familiar with lapply. I have this date sequence:

x <- seq(as.Date("01-Jan-2010", format="%d-%b-%Y"), Sys.Date(), by=1)  # to generate a series of dates

I want to apply the function for all values of x, so I use lapply (still a newbie!). I wrote this test function:

pFun <- function(x) { print(paste("This is: ", x, sep="")) }

When I run it:

lapply(x, pFun(x))

it gives the result correctly, but ends with an error:

Error in match.fun(FUN) : 'pFun(x)' is not a function, character or symbol

Question 1: What is the issue here? Now, to the real problem. I wrote a function cmDownFun(sDate), which is working correctly. If I do cmDownFun(x[6]) it works correctly (i.e. my function was OK this time around!). However, if I do

lapply(x, cmDownFun(x))

it fails at 01-Jan-2010, because of: HTTP status was '404 Not Found'. This is OK, because the file does not exist ... but I want to continue for the remaining values of x.

lapply(x, try(cmDownFun(x), silent = TRUE))  # does not work

I also put the try inside the function itself, but that does not work either. -- Thanks R-Helpers. Yes, this is a silly question and it will not be repeated! :-) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
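Both problems have the same shape: lapply() wants a function object, not a call, and the try() has to happen inside that function so each date is attempted independently. A sketch, hedged since cmDownFun() itself isn't shown:

lapply(x, pFun)   # pass the function itself, not the call pFun(x)

res <- lapply(x, function(d) try(cmDownFun(d), silent = TRUE))
ok  <- !sapply(res, inherits, what = "try-error")   # flags the dates that succeeded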
Re: [R] fast rowCumsums wanted for calculating the cdf
Hi, You might look at Reduce(). It seems faster. I converted the matrix to a list in an incredibly sloppy way (which you should not emulate) because I cannot think of the simple way.

probs <- t(matrix(rep(1:10000000), nrow=10))  # matrix with row-wise probabilities
F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
F[,1] <- probs[,1,drop=TRUE]
add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
 36.758   0.416  42.464
system.time(for (cc in 2:ncol(F)) {
+ F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
+ })
   user  system elapsed
  0.980   0.196   1.328
system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539

0.539 seconds for 10 vectors each with 1 million elements does not seem bad. For vectors of 30,000 each, on my system it took .014 seconds, which for 200,000 iterations I estimated as:

(200000*.014)/60/60
[1] 0.778

(and this is on a stone-age system with a single-core processor and only 756MB of RAM). It should not be difficult to get the list back to a matrix. Cheers, Josh On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote: Dear all, Maybe the easiest solution: Is there anything that speaks against generalizing cumsum from base to cope with matrices (as is done in Matlab)? E.g.: cumsum(Matrix, 1) equivalent to apply(Matrix, 1, cumsum) The main advantage could be optimized code if the matrix is extremely non-square (e.g. 100,000 x 10) but the summation is done over the short side (in this case 10). apply would practically yield a loop over 100,000 elements, and vectorization w.r.t. the long side (a loop over 10 elements) provides considerable efficiency gains. Many regards, Gregor On Tue, 12 Oct 2010 10:24:53 +0200 Gregor mailingl...@gmx.at wrote: Dear all, I am struggling with a (currently) cost-intensive problem: calculating the (non-normalized) cumulative distribution function, given the (non-normalized) probabilities. Something like: probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities F <- t(apply(probs, 1, cumsum)) # SLOOOW! One (already faster, but surely not ideal) solution - thanks to Henrik Bengtsson: F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs)); F[,1] <- probs[,1,drop=TRUE]; for (cc in 2:ncol(F)) { F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]; } In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step around 200,000 times, so speed is crucial. I currently can make sure to have no NAs, but in order to extend matrixStats, this could be a nontrivial issue. Any ideas for speeding up this - probably routine - task? Thanks in advance, Gregor __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.