Re: [R] ddply to count frequency of combinations
Brian, I'm a bit confused about how the following line works, specifically, what is happening in freq=length(x)? Is it just taking the length of x after it has been summarized by different combinations x y? I guess that must be the case, because that gives the same result as using freq=length(y) d1-ddply(d, .(x, y), summarize, freq=length(x)) d2-ddply(d, .(x, y), summarize, freq=length(y)) Also, what is the significance of the periods before the second argument in .(x, y) ? Thanks for the help. On Tue, Jun 21, 2011 at 12:54 PM, Brian Diggs dig...@ohsu.edu wrote: On 6/21/2011 11:30 AM, Idris Raja wrote: I have a dataframe df with two columns x and y. I want to count the number of times a unique x, y combination occurs. For example x- c(1,2,3,4,5,1,2,3,4) y- c(1,2,3,4,5,1,2,4,1) df-as.data.frame(cbind(x, y)) #what is the correct way to use ddply for this example? ddply(df, c('x','y', summarize, ??) #desired output -- format and order doesn't matter # (x, y) count # # (1, 1) 2 # (2, 2) 2 # (3, 3) 1 # (4, 4) 1 # (5, 5) 1 # (2, 3) 1 # (3, 4) 1 # (4, 1) 1 [[alternative HTML version deleted]] Jorge and Dennis gave good responses that get you to the result you asked for, but for completeness I thought I'd include some ddply versions: ddply(d, .(x, y), summarize, freq=length(x)) This uses the summarize function you were asking about, however you can also do it with: ddply(d, .(x, y), nrow) or ddply(d, .(x, y), as.data.frame(nrow)) The latter giving a slightly nicer name (value instead of V1). As an aside, I prefer using the summarise spelling of the function when I do use it, because it won't clash with Hmisc::summarize. ddply(d, .(x, y), summarise, freq=length(x)) -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health Science University [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strange date problem - May 3, 1992 is NA
Hi r-help-boun...@r-project.org napsal dne 22.06.2011 20:40:39: On 6/22/2011 1:34 PM, Sarah Goslee wrote: On Wed, Jun 22, 2011 at 2:28 PM, David Winsemius dwinsem...@comcast.net wrote: On Jun 22, 2011, at 2:03 PM, Sarah Goslee wrote: Hi, On Wed, Jun 22, 2011 at 11:40 AM, Alexander Shenkin ashen...@ufl.edu wrote: is.na(strptime(5/2/1992, format=%m/%d/%Y)) [1] FALSE is.na(strptime(5/3/1992, format=%m/%d/%Y)) [1] TRUE I can't reproduce your problem on R 2.13.0 on linux: I also cannot reproduce it on a Mac with 2.13.0 beta Which strongly suggests that you should start by upgrading your R installation if at all possible. I'd also recommend trying it on a default R session, with no extra packages loaded, and no items in your workspace. It's possible that something else is interfering. On linux, that's achieved by typing R --vanilla at the command line. I'm afraid I don't know how to do it for Windows, but should be similarly straightforward. Thanks Sarah. Still getting the problem. I should surely upgrade, but still, not a bad idea to get to the bottom of this, or at least have it documented as a known issue. BTW, I'm on Windows 7 Pro x64. Can not reproduce on Windows 2000 on R2.12.0dev (2.13.0 and 2.14.0dev) is.na(strptime(5/3/1992, format=%m/%d/%Y)) [1] FALSE sessionInfo() R version 2.12.0 Under development (unstable) (2010-05-31 r52164) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Czech_Czech Republic.1250 LC_CTYPE=Czech_Czech Republic.1250 [3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C [5] LC_TIME=Czech_Czech Republic.1250 attached base packages: [1] stats grDevices datasets utils graphics methods base other attached packages: [1] lattice_0.18-8 fun_1.0 loaded via a namespace (and not attached): [1] grid_2.12.0 tools_2.12.0 So either some problem with your installation or your OS. Regards Petr (running Rgui.exe --vanilla): is.na(strptime(5/3/1992, format=%m/%d/%Y)) [1] TRUE is.na(strptime(5/2/1992, format=%m/%d/%Y)) [1] FALSE sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] interaction between categorical variables
On Jun 21, 2011, at 08:39 , taby gathoni wrote: Dear R-users, I need some assistance. I am running some interactive variables for categorical variables. I have dgen(2 levels converted to dummy variables) and dtoe(4-levels also converted to dummy variables). So I have worked with them in two ways: i created a variable X1 = dgen*dtoe and I get an error Error in dgen * dtoe : non-conformable arraysthen i run a glm, binomial using that interaction variable and i get : logit_x = glm(samp2$STATUS ~ dgen*dtoe, data=samp2,family = binomial(logit)) summary(logit_x) Call: glm(formula = samp2$STATUS ~ dgen * dtoe, family = binomial(logit), data = samp2) Deviance Residuals: Min 1Q Median 3Q Max -2.6594 0.2431 0.2563 0.2563 0.2563 Coefficients: (5 not defined because of singularities) Estimate Std. Error z value Pr(|z|) (Intercept) 1.857e+01 4.612e+03 0.0040.997 dgendfemale 1.024e-09 3.766e+03 0.0001.000 dgendmale NA NA NA NA dtoedpermanent1 -1.517e+01 4.612e+03 -0.0030.997 dtoedcontract1 -1.511e+01 4.612e+03 -0.0030.997 dtoedprobation1 2.229e-09 4.982e+03 0.0001.000 dgendfemale:dtoedpermanent1 1.069e-01 3.766e+03 0.0001.000 dgendmale:dtoedpermanent1 NA NA NA NA dgendfemale:dtoedcontract1 1.511e+01 3.962e+03 0.0040.997 dgendmale:dtoedcontract1NA NA NA NA dgendfemale:dtoedprobation1 NA NA NA NA dgendmale:dtoedprobation1 NA NA NA NA (Dispersion parameter for binomial family taken to be 1) Null deviance: 269.48 on 999 degrees of freedom Residual deviance: 266.56 on 993 degrees of freedom AIC: 280.56 Number of Fisher Scoring iterations: 17 The thing is I need the coefficients, the p-values and t-values of all the variables. In other words i do not want an output of NAs. How can I achieve this? Something is odd here. What do you mean converted to dummy variables? Normally you'd use factor variables and let the modelling machinery do the rest. Why do you have two dummies for dgen but only 3 for the four-level dtoe? Notice that you can't have e.g. both a female and a male dummy when there is an intercept in the model (unless you have a 3rd sex in your data) since the two dummies will sum to 1. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Re: Help with winbugs code
Hi folks, I'm forwarding this to the list as my email to nita was about getting her code to the list. Additionally, I'm running Linux and have no experience with WinBUGS. Jim Original Message Subject:Re: [R] Help with winbugs code Date: Thu, 23 Jun 2011 16:49:33 +0700 From: nita yalina taya...@gmail.com To: Jim Lemon j...@bitwrit.com.au Thanks to reply my messagei really appreciate that..here is my code: i also attach a text file. in that code I initial varible y in the initial part, but it make winbugs open a new window undefine real result but when I delete variable y in the initial part it said that there some variable that has to be initialized. what should i do? ...very grateful for your help... model{ for(i in 1:N){ #model persamaan pengukuran for(j in 1:P){ y[i,j]~dnorm(mu[i,j],psi [j]) I(thd [j,z[i,j]],thd[j,z[i,j]+1]) ephat[i,j]-y[i,j] -mu[i,j] } #faktor Budaya Organisasi mu[i,1]-xi[i,1] mu[i,2]-lam[1]*xi[i,1] mu[i,3]-lam[2]*xi[i,1] #faktor Kemampuan Pengguna mu[i,4]-xi[i,2] mu[i,5]-lam[3]*xi[i,2] mu[i,6]-lam[4]*xi[i,2] #faktor Mekanisme Dukungan mu[i,7]-xi[i,3] mu[i,8]-lam[5]*xi[i,3] mu[i,9]-lam[6]*xi[i,3] #faktor Desain Antarmuka mu[i,10]-xi[i,4] mu[i,11]-lam[7]*xi[i,4] mu[i,12]-lam[8]*xi[i,4] #faktor Persepsi Kualitas mu[i,13]-xi[i,5] mu[i,14]-lam[9]*xi[i,5] mu[i,15]-lam[10]*xi[i,5] #faktor Persepsi Kemudahan Kegunaan mu[i,16]-eta[i,1] mu[i,17]-lam[11]*eta[i,1] mu[i,18]-lam[12]*eta[i,1] #faktor Persepsi Kegunaan mu[i,19]-eta[i,2] mu[i,20]-lam[13]*eta[i,2] mu[i,21]-lam[14]*eta[i,2] mu[i,22]-lam[15]*eta[i,2] #faktor Sikap ke arah Penggunaan mu[i,23]-eta[i,3] mu[i,24]-lam[16]*eta[i,3] mu[i,25]-lam[17]*eta[i,3] #faktor Persepsi Niat untuk Menggunakan mu[i,26]-eta[i,4] mu[i,27]-lam[18]*eta[i,4] mu[i,28]-lam[19]*eta[i,4] #faktor Adopsi E-government mu[i,29]-eta[i,5] mu[i,30]-lam[20]*eta[i,5] #model persamaan struktural xi[i,1:5] ~dmnorm(u[1:5],phi[1:5,1:5]) eta[i,1]~dnorm(nu[i,1],pskp) nu[i,1]-gam[1]*xi[i,2]+gam[2]*xi[i,3]+gam[3]*xi[i,4] dthat[i,1]-eta[i,1]-nu[i,1] eta[i,2]~dnorm(nu[i,2],pspk) nu[i,2]-gam[4]*xi[i,1]+beta[1]*eta[i,1] dthat[i,2]-eta[i,2]-nu[i,2] eta[i,3]~dnorm(nu[i,3],pssp) nu[i,3]-beta[2]*eta[i,2]+beta[3]*eta[i,3] dthat[i,3]-eta[i,3]-nu[i,3] eta[i,4]~dnorm(nu[i,4],psnm) nu[i,4]-beta[4]*eta[i,1]+beta[5]*eta[i,2]+gam[5]*xi[i,5] dthat[i,4]-eta[i,4]-nu[i,4] eta[i,5]~dnorm(nu[i,5],psae) nu[i,5]-beta[6]*eta[i,4] dthat[i,5]-eta[i,5]-nu[i,5] }#akhir dari i for (i in 1:5) {u[i]-0.0} #lamda var.lam[1]-8.0*psi[2] var.lam[2]-8.0*psi[3] var.lam[3]-8.0*psi[5] var.lam[4]-8.0*psi[6] var.lam[5]-8.0*psi[8] var.lam[6]-8.0*psi[9] var.lam[7]-8.0*psi[11]var.lam[8]-8.0*psi[12] var.lam[9]-8.0*psi[14] var.lam[10]-8.0*psi[15] var.lam[11]-8.0*psi[17] var.lam[12]-8.0*psi[18] var.lam[13]-8.0*psi[20] var.lam[14]-8.0*psi[21] var.lam[15]-8.0*psi[22] var.lam[16]-8.0*psi[24]var.lam[17]-8.0*psi[25] var.lam[18]-8.0*psi[27] var.lam[19]-8.0*psi[28] var.lam[20]-8.0*psi[30] for (i in 1:20) {lam[i] ~dnorm(1,var.lam[i])} for (j in 1:P) { psi[j] ~dgamma(10,8) sgl[j]-1/psi[j] } #gamma gam[1]~dnorm(0.4,var.pk http://var.pk) gam[2]~dnorm(0.5,var.kp http://var.kp) gam[3]~dnorm(0.4,var.kp http://var.kp) gam[4]~dnorm(0.6,var.kp http://var.kp) gam[5]~dnorm(0.1,var.nm) var.pk http://var.pk -8.0*pspk pspk~dgamma(10,8) sgpk-1/pspk var.kp http://var.kp -8.0*pskp pskp~dgamma(10,8) sgkp-1/pskp var.sp -8.0*pssp pssp~dgamma(10,8) sgsp-1/pssp var.nm -8.0*psnm psnm~dgamma(10,8) sgnm-1/psnm var.ae http://var.ae -8.0*psae psae~dgamma(10,8) sgae-1/psae #beta beta[1] ~dnorm(0.4,var.pk http://var.pk) beta[2] ~dnorm(0.5,var.sp) beta[3] ~dnorm(0.6,var.sp) beta[4] ~dnorm(0.6,var.nm) beta[5] ~dnorm(0.5,var.nm) beta[6] ~dnorm(0.4,var.ae http://var.ae) phi[1:5,1:5] ~dwish(R[1:5,1:5],30) phx[1:5,1:5]-inverse(phi[1:5,1:5]) } #end of model DATA list(N=43, P=30, R=structure( .Data=c(10,0,0,0,0, 0,10,0,0,0, 0,0,10,0,0, 0,0,0,10,0, 0,0,0,0,10 ), .Dim=c(5,5)), thd=structure(
Re: [R] plotmath: unexpected SPECIAL
Folks, the relevant thing you have to remember is: All the stuff must be valid R syntax (with few additional functions as mention in the ?plotmath help file). Knowing that it is obvious where additional operators are required. Best, Uwe Ligges On 23.06.2011 02:56, Bryan Hanson wrote: Thanks to both David and Sarah. I'm glad I asked, as I had tried some of the combos Sarah suggested and observed the same behavior, which puzzled me. David, thanks for reminding me about ~ as that is a different way to get a space into the string. I just don't use plotmath often enough to become decent at it. Thanks again. Bryan On Jun 22, 2011, at 8:49 PM, David Winsemius wrote: On Jun 22, 2011, at 8:10 PM, Bryan Hanson wrote: Hello R Masters and the Rest of Us: The first of these works fine, the 2nd is accepted but too literal (the %-% is shown in the plot label and in the wrong position). The 3rd throws and error due to unexpected SPECIAL. Would someone recommend a way to format this? I want the two phrases connected by a right arrow. TIA, these things always elude me. Bryan *** Bryan Hanson Professor of Chemistry Biochemistry DePauw University xlab1 -expression(paste(Phase Angle , phi, Neat-O)) xlab2 - expression(paste(treatment: low stress, high stress, sep = %-%)) xlab3 - expression(paste(treatment: low stress, %-%, high stress)) plot(1:10, main = xlab1) plot(1:10, main = xlab2) Doesn't seem that %-% works without flanking terms xlab3 - expression(treatment*:~low~stress %-% high~stress) plot(1, main=xlab3) Or: xlab3 - expression(treatment: low stress %-% high~stress) plot(1, main=xlab3) -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with winbugs code
1. This list is about R rather than WinBUGS. 2. The attachment did not come through the list anyway. Best, Uwe Ligges On 22.06.2011 09:29, nita yalina wrote: Good afternoon sir, i'm a student in Indonesia who study technology management, i need to review the software i'v been develop, i was told to use bayesian SEM because i don't have large sample. i don't know much about statistics but i try to make a code via winbugs.. and a i got a problem. here i attach my winbugs code and my model. could you hep me to find what's wrong with my code? thank you very much for your answer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with mclapply -- losing output/data
Janet, 1. you posted to the R-help mailing list rather than to Elizabeth, 2. if posting to a mailing list, please quote the original thread otherwise we do not know what you are talking about if we removed the prior part of the thread. Uwe Ligges On 23.06.2011 04:22, Janet Young wrote: Hi Elizabeth, I just found your thread after experiencing a similar problem (I was also using some IRanges/GenomicRanges functions). You've probably figured it out by this time - I'm actually curious to know what you found? I think I've tracked it down in my case, where a small minority of the elements I was trying to lapply/mclapply over caused a genuine error (perhaps in your case a genuine NULL answer - is that possible?). When I used lapply that error was very obvious - it came up on the screen. But when I used mclapply it looked like the function completed fine - it wasn't until later that I saw the error was captured as a string object in the relevant slot of the list object returned by mclapply. I should build in some checking if I want to persist with using mclapply. I wonder if something similar is going on in your case? Janet __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Saved EPS does not match screen when using bquote(.(i))
On 23.06.2011 07:14, Dennis Murphy wrote: Hi: As Uwe suggested... pdf('testgraph.pdf') layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } dev.off() postscript('testgraph.ps') layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } dev.off() png('testgraph.png') layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } dev.off() The three graphs look the same (although the PS graph is rotated to landscape while the other two are portrait). Yes, thanks, and ?postscript will show arguments such as 'horizontal' and 'paper' that will help to make them really similar. Best, Uwe The main point is that mu_1 and mu_2 show up correctly in the two panels in all three graphs (at least on my viewers). The following thread from last January describes some of the problems that certain viewers have with Greek letters, which appear to be viewer and platform dependent: http://r-project.markmail.org/search/?q=pdf%20incorrect#query:pdf%20incorrect+page:2+mid:egmb6utulrxgcznw+state:results I'm guessing that I've seen about a half dozen or so similar posts in this forum over the past year and a half, so you can check the list archives for related problems. HTH, Dennis On Wed, Jun 22, 2011 at 8:07 PM, John Kruschkejohnkrusc...@gmail.com wrote: Here's a fairly minimal-case example in which the saved EPS does not match the screen. The error comes when using bquote(.(i)) instead of bquote(1), as demonstrated by the two minimally different cases below. Very strange. Any clues as to why? # begin --- # Version A. X axis labels have subscripts as constants. EPS is correct. windows() layout( matrix( 1:2 , nrow=2 ) ) plot( 0 , 0 , xlab=bquote(mu[1]) ) plot( 0 , 0 , xlab=bquote(mu[2]) ) savePlot( file=SavePlotTestA.eps , type=eps ) # Axis labels are correct in EPS. # Version B. X axis labels have subscripts as variable index. EPS is wrong! windows() layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } savePlot( file=SavePlotTestB.eps , type=eps ) # X-AXIS OF PLOT 1 IS WRONG IN EPS. #-- end - Thanks! John K. Kruschke, Professor http://www.indiana.edu/%7Ekruschke/DoingBayesianDataAnalysis/ 2011/6/22 Uwe Liggeslig...@statistik.tu-dortmund.de On 22.06.2011 13:50, John Kruschke wrote: The error happens when using the savePlot() command, like this: savePlot( file=TestSavePlot.eps , type=eps ) savePlot( file=TestSavePlot.jpg , type=jpg ) Well, plot directly into a device, for postscript: postscript(estSavePlot.eps, additionalArguments .) plot(1:10) dev.off() Uwe Ligges The images in the two saved files are not the same, with the JPG being correct but the EPS being wrong. When you suggest starting separate devices explicitly, what do you mean? (I've skimmed through the results of ??device, but can't make sense of it.) Thank you! John K. Kruschke, Professor 2011/6/22 Uwe Liggeslig...@statistik.tu-**dortmund.delig...@statistik.tu-dortmund.de I guess you use the menu to save the plots from your Windows device into files rather than starting separate devices explicitly? If so, please use explicit calls to the devices and everything happens as you define it. Uwe Ligges On 22.06.2011 04:31, John Kruschke wrote: When I save a particular plot as JPG or BMP, the saved image is an accurate copy of that plot. But when I save the same plot as EPS or PDF, some text elements are incorrectly altered. An example is attached. Notice in the top middle and top right panels, the x-axis labels have correct subscript 1 in the JPG, but incorrect subscript 2 in the EPS. I'm using R 2.13.0 on Windows 7. Any clues regarding the source of this error and its solution would be appreciated. For example, are there EPS/PDF device drivers that need to be separately updated? Many thanks. John K. Kruschke, Professor http://www.indiana.edu/%7Ekruschke/DoingBayesianDataAnalysis/ htt**p://www.indiana.edu/%**7Ekruschke/**DoingBayesianDataAnalysis/http://www.indiana.edu/%7Ekruschke/DoingBayesianDataAnalysis/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.htmlhttp://www.**R-project.org/posting-guide.**htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] please help for mgcv package
In the 'inst/scripts' directory of package 'gamair' there are files containing the R code for each chapter of the book (questionable layout uncorrected, but typo free, at least!). Simon On 21/06/11 16:27, pigpigmeow wrote: i read a book from WOOD, there's an example which is talking about the pollutant. library(gamair) library(mgcv) y-gam(death~s(time,bs=cr,k=200)+s(pm10median,bs=cr)+s(so2median,bs=cr)+s(o3median,bs=cr)+s(tmpd,bs=cr),data=chicago,family=Possion) lag.sum-function(a,10,11) {n-length(a) b-rep(0,n-11) for(i in 0:(11-10)) b-b+a[(i+1):(n-11+i)] b} death-chicago$death[4:5114] time-chicago$time[4:5114] o3-lag.sum(chicago$o3median,0,3) tmp-lag.sum(chicago$tmpd,0,3) pm10-lag.sum(log(chicago$pm10median+40),0,3) so2-lag.sum(log(chicago$so2median+10),0,) I don't know what is the script (Bold font ) used for.. and it shows Error: unexpected numeric constant in lag.sum-function(a,10 , why? anyone can answer me? -- View this message in context: http://r.789695.n4.nabble.com/please-help-for-mgcv-package-tp3614485p3614485.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Subtracting TimeStamps
Hi All,, I am new to R and having a problem dealing with timestamps. I have 2 columns in my table . Both of these have timestamps value in format 06/22/11 05:34 PM . I want to calculate the difference in these 2 columns ( in minutes) and save it as a 3rd column. Can anyone help ? Thanks Sumit [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Removing rows of zeros from a matrix
On Jun 2, 2011, at 11:35 AM, Petr Savicky wrote: On Thu, Jun 02, 2011 at 11:23:28AM -0400, Jim Silverton wrote: Hi, Can someone tell me how to remove rows of zeros from a matrix? For example if I have the following matrix, 0 0 0 1 2 8 0 0 4 56 I should end up with 0 1 2 8 4 56 Hi. Try the following a - matrix(c(0, 0, 2, 0, 4, 0, 1, 8, 0, 56), ncol=2) a[rowSums(a != 0) != 0, ] To avoid removing rows where non-zero elements do sum to 0 one could use the only slightly longer test that first converts a to logical: a - matrix(c(1, 0, 2, 0, 4, -1, 1, 8, 0, 56), ncol=2) a[ rowSums(a==0) != ncol(a), ] [,1] [,2] [1,]1 -1 [2,]01 [3,]28 [4,]4 56 -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] MST dissimilarity
Dear R-helpers, I need to quantify dissimilarity of two minimum spanning trees, specifically dissimilarity of their topologies. (They connect the same objects but they are calculated from different sets of variables.) Are you aware of any R-function doing this? Best regards Ondrej Mikula __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subtracting TimeStamps
On Jun 23, 2011, at 7:26 AM, sumit gupta wrote: Hi All,, I am new to R and having a problem dealing with timestamps. I have 2 columns in my table . Both of these have timestamps value in format 06/22/11 05:34 PM . If these are Date or DateTime objects you can use: ?difftime # with attention to the units argument ?DateTimeClasses If they are not yet in a correct class you need to first use: ?strptime I want to calculate the difference in these 2 columns ( in minutes) and save it as a 3rd column. When examples are provided -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time-series analysis with treatment effects - statistical approach
From: rvarad...@jhmi.edu To: marchy...@hotmail.com; jmo...@student.canterbury.ac.nz; r-help@r-project.org Subject: RE: [R] Time-series analysis with treatment effects - statistical approach Date: Thu, 23 Jun 2011 02:59:19 + If you have any specific features of the time series of soil moisture, you could either model that or directly estimate it and test for differences in the 4 treatments. If you do not have any such specific considerations, you might want to consider some nonparametric approaches such as functional data analysis, in particular functional principal components analysis (fPCA) might be relevant. You could also consider semiparametric methods. For example, take a look at the SemiPar package. Ravi. I guess just playing with it while waiting for other code to finish, I'd be curious if you had any controlled tests such as impulse response- what did treatment do when you held at constant temp and humidity and illumination in stll air after single burst of rain? If you were pursing the model approach, quick look suggests qualitatitve rather than just quantitative effects - in one case looks like linear or biphasic dry out dynamics, others seem to just fall off of cliff. Objective of course matters too, if you are trying to sell this to farmers, maybe a plot of moisture for each treatment against control would help. I just did that after averaging over sensors and it may be a reasonable analysis for cost effectiveness if you can translate moisture into dollars. Now you would still need to put error bars on comparisons and use words carefully etc but that approach may be more important than getting at dynamics. I dunno. Consider that in fact maybe all you care about is peaks, if too dry for one day kills the crop then that is what you want to focus the analysis on etc etc etc. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Mike Marchywka [marchy...@hotmail.com] Sent: Wednesday, June 22, 2011 9:31 PM To: jmo...@student.canterbury.ac.nz; r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach Date: Wed, 22 Jun 2011 17:21:52 -0700 From: jmo...@student.canterbury.ac.nz To: r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach Hi Mike, here's a sample of my data so that you get an idea what I'm working with. Thanks, data helps make statements easier to test :) I'm quite busy at moment but I will try to look during dead time. http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt SampleDataSet.txt Also, I've uploaded an image showing a sample graph of daily soil moisture by treatment. The legend shows IP, IP+, PP, PP+ which are the 4 treatments. Also, I've included precipitation to show the soil moisture response to precip. Personally I'd try to write a simple physical model or two and see which one(s) fit best. It shouldn't be too hard to find sources and sinks of water and write a differential equation with a few parameters. There are probably online lecture notes that cover this or related examples. You probably suspect a mode of action for the treatments, see if that is consistent with observed dyanmics. You may need to go get temperature and cloud data but it may or may not be worth it. http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg I have used ANOVA previously, but I don't like it for 2 reasons. The first is that I have to average away all of the interesting variation. But mainly, There are a number of assumptions that go into that to make it useful. If you are just drawing samples from populations of identical independent things great but here I would look at things related to non-stationary statistics of time series. it becomes quite cumbersome to do a separate ANOVA for each day (700+ days) or even each week (104 weeks). I discovered a way to do repetitive tasks that can be concisely specified using something called a computer. Writing loops is pretty easy, don't give up due to cumbersomeness. Also, you could try a few simple things like plotting difference charts ( plot treatment minus control for example). If you approach this purely empirically, there are time series packages and maybe the econ/quant financial analysts would have some thoughts that wouldn't be well known in your field. Thanks for your help, -Justin - - - - - - Mike Marchywka | V.P. Technology 415-264-8477 marchy...@phluant.com Online Advertising and Analytics for Mobile http://www.phluant.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read
Re: [R] numerical integration and 'non-finite function value' error
Here is a self-contained example of my problem. set.seed(100) x = rbeta(100, 10.654, 10.439) # So the shape parameters and the exteremes are a = 10.654 b = 10.439 xmax = 1 xmin = 0 # Using the non-standardized form (as in my application and this shouldn't make any difference) of the # Beta density function, I specify the integrand (i.e., xf(x)) as integrand = function(x) {x*((1/beta(a,b))*((x^(a - 1)*(xmax - x)^(b-1)) / xmax^(a + b - 1)))} # Say I want to integrate in the range (0, 0.45) and then in (0.45, Inf) # In (0, 0.45) integrate(integrand, lower = 0, upper = 0.45)$val [1] 0.1176079 # In (0.45, Inf) integrate(integrand, lower = 0.45, upper = Inf)$val Error in integrate(integrand, lower = 0.45, upper = Inf) : non-finite function value The same thing happens when I integrate f(x) only. Thanks again. -- View this message in context: http://r.789695.n4.nabble.com/numerical-integration-and-non-finite-function-value-error-tp3618486p3619761.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC() vs. mle.aic() vs. step()?
The packages is wle. I'll put together some code that shows the behavior I'm talking about, and send it to the list. Alexandra On Thu, 2011-06-23 at 13:51 +0200, Rubén Roa wrote: I don't find the mle.aic function. Thus it does not ship with R and it's in some contributed package. What package is that? If you had asked for help providing minimal, self-contained, reproducible code, you'd have realized that you need to tell people what package you are using. ___ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Mensaje original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] En nombre de Alexandra Thorn Enviado el: miércoles, 22 de junio de 2011 22:38 Para: r-help@r-project.org Asunto: [R] AIC() vs. mle.aic() vs. step()? I know this a newbie question, but I've only just started using AIC for model comparison and after a bunch of different keyword searches I've failed to find a page laying out what the differences are between the AIC scores assigned by AIC() and mle.aic() using default settings. I started by using mle.aic() to find the best submodels, but then I wanted to also be able to make comparisons with a couple of submodels that were nowhere near the top, so I started calculating AIC values using AIC(). What I found was that not only the scores, but also the ranking of the models was different. I'm not sure if this has to do with the fact that mle.aic() scores are based on the full model, or some sort of difference in penalties, or something else. Could anybody enlighten me as to the differences between these functions, or how I can use the same scoring system to find the best models and also compare to far inferior models? Failing that, could someone point me to an appropriate resource that might help me understand? Thanks in advance, Alexandra __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Rms package - problems with fit.mult.impute
Hi! Does anyone know how to do the test for goodness of fit of a logistic model (in rms package) after running fit.mult.impute? I am using the rms and Hmisc packages to do a multiple imputation followed by a logistic regression model using lrm. Everything works fine until I try to run the test for goodness of fit: residuals(type=c(gof)) One needs to specify y=T and x=T in the fit. But I get a warning message when I do that with fit.multiple.impute. a-aregImpute(~med.hist.err+ med.discr+newLiving+No.drugs+Days.categ+Los+Age+Ward+Sex, n.impute=20, nk=0,data=med.err) ddist-datadist(Age,No.drugs,Days.categ, Sex, Living, Ward) options(datadist=ddist) fmi-fit.mult.impute(med.hist.err~Age+No.drugs+Days.categ+Sex+Living+Ward, fitter=lrm, x=T, y=T,a,data=med.err) Error in 1:n.impute : NA/NaN argument In addition: Warning message: In 1:n.impute : numerical expression has 18 elements: only the first used It works to do the fit.mult.impute without x and y=T but then I get the following warning message when running residuals gof-residuals(fmi, type=c(gof)) Error in residuals.lrm(fmi, type = c(gof)) : you did not specify y=T in the fit It was no problem to do the goodness of fit test when I ran the lrm on my complete data set without multiple imputation and fit.mult.impute. model.lrm-lrm(med.hist.err~Age+No.drugs+Days +Sex+Living+Ward, x=TRUE, y=TRUE) gof-residuals(model.lrm, type=c(gof)) Thanks Lina _ PhD student Linnaeus University Sweden [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] The R Journal Vol. 3/1 now published
Dear All, The first issue of the third volume of The R Journal is now available at http://journal.r-project.org/current.html. Thanks to everyone involved. Heather -- Editor in chief heather.tur...@r-project.org ___ r-annou...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-announce __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] numerical integration and 'non-finite function value' error
On Jun 23, 2011, at 8:55 AM, Adan_Seb wrote: Here is a self-contained example of my problem. set.seed(100) x = rbeta(100, 10.654, 10.439) # So the shape parameters and the exteremes are a = 10.654 b = 10.439 xmax = 1 xmin = 0 # Using the non-standardized form (as in my application and this shouldn't make any difference) of the # Beta density function, I specify the integrand (i.e., xf(x)) as integrand = function(x) {x*((1/beta(a,b))*((x^(a - 1)*(xmax - x)^(b-1)) / xmax^(a + b - 1)))} # Say I want to integrate in the range (0, 0.45) and then in (0.45, Inf) # In (0, 0.45) integrate(integrand, lower = 0, upper = 0.45)$val [1] 0.1176079 # In (0.45, Inf) Right. integrand() is NaN outside [0,1] integrand(1:100) [1] 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [22] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [43] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [64] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [85] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN integrate(integrand, lower = 0.45, upper = Inf)$val Error in integrate(integrand, lower = 0.45, upper = Inf) : non-finite function value The same thing happens when I integrate f(x) only. Thanks again. -- View this message in context: http://r.789695.n4.nabble.com/numerical-integration-and-non-finite-function-value-error-tp3618486p3619761.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [example code] RE: AIC() vs. mle.aic() vs. step()?
Ok, here's some example code showing how I get different output for AIC vs. mle.aic(). Now that I've taken another look at the independent variables, I'm wondering whether missing values in one of the variables might be what is messing me up. I'm going to see if the behavior changes when I remove those... Alexandra #Code with outputs R require(wle) R xA # 1st independent variable (categorical) [1] Diffuse Diffuse Diffuse Diffuse Diffuse Diffuse Diffuse Diffuse Diffuse [10] Diffuse Diffuse RingDiffuse Diffuse RingDiffuse Diffuse Diffuse [19] Diffuse Diffuse RingDiffuse Diffuse Diffuse Diffuse Diffuse Diffuse [28] Diffuse Diffuse RingRingDiffuse Diffuse RingRing Ring [37] Diffuse Diffuse RingRingRingRingRingRing Diffuse [46] Other RingRingRingRingRingOther Ring Ring [55] RingRingRingRingRingRingRingDiffuse Diffuse [64] Diffuse RingRingRingDiffuse Diffuse Diffuse Diffuse Diffuse [73] Diffuse Diffuse Diffuse Other RingRingRingRing Diffuse [82] Diffuse Diffuse Diffuse RingRingRingRingRing Diffuse [91] Other Other RingRingRingOther RingOther Diffuse [100] Diffuse Diffuse RingRing Levels: Diffuse Other Ring R x5 # 2nd independent variable [1] 35.1890163 22.8565556 15.2969944 9.6002241 25.0393843 21.1797882 [7] 9.2677660 14.5228280 6.6982274 5.7889657 21.4854297 20.5942436 [13] 20.2180106 0.4442017 5.0414991 26.9849474 14.7613970 10.3045834 [19] 13.4192478 13.9074085 6.7219989 13.2569404 18.1492698 8.9814628 [25] 14.2575003 21.8982503 8.5661574 15.3434996 7.4060632 10.2824613 [31] 23.4777018 35.3389594 51.5448186 6.9571801 23.3166747 35.2280399 [37] 53.3812646 44.7933630 25.5658796 9.6980968 2.9003139 4.8073814 [43] 6.9274067 8.6178642 43.9578503 0.000 44.1995269 14.6878355 [49] 5.6385462 0.000 21.1687124 20.5669418 0.000 0.000 [55] 28.4924849 8.7184163 18.8744437 20.9748315 21.3849539 163.1436925 [61] 10.8565582 9.9297861 0.000 0.000 41.9369100 121.7625948 [67] 13.5709398 20.1040412 14.1449650 8.2172524 10.1649988 19.5981176 [73] 20.3028117 17.0104638 12.6129991 8.2051932 6.4293587 22.1598564 [79] 13.9703385 23.0206302 15.2590230 14.4778824 2.4819054 21.8293460 [85] 25.1515167 32.1050850 12.5154914 11.6927538 9.4048632 38.4559899 [91] 53.1959167 14.4917170 10.2548528 8.8227194 12.8573515 10.0589965 [97] 12.8868929 9.6626724 5.9826061 3.2581190 13.4467376 8.8065840 [103] 17.7734493 x15 # 3rd independent variable [1] 1.69924629 -1.63414400 0.71415169 4.17480342 1.52512663 1.73541068 [7] -5.47498002 0.95681283 -1.48092555 1.51101949 -2.25838766 2.12958863 [13] 1.43795703 -4.48003373 -3.65963009 -0.76346388 -2.44019863 1.32552847 [19] 1.89863804 1.80655970 -0.74175682 1.30112633 -1.06424643 -1.47852202 [25] 0.09035915 NA NA 1.82385292 -0.15308708 1.04685322 [31] 2.45599032 1.36474093 -2.39863477 -0.21220447 -2.50255033 -1.92296430 [37] -0.24577578 -1.96756216 0.43349997 0.88459859 -0.12755905 2.31771322 [43] -1.21846731 1.75082992 -3.02346893 -4.15582445 1.09946460 4.30008522 [49] 4.37542383 NA -1.93641862 -0.01919492 -2.39609318 -3.12228102 [55] 0.48804606 -1.42886437 -3.52078266 3.22115286 0.87942540 -0.29385365 [61] 0.40030867 0.84382607 -0.14445408 -0.61903527 NA 1.53158894 [67] -1.01595045 0.18857375 -1.24703875 -0.53766035 -0.43305094 1.30035414 [73] 0.08256647 -0.01008154 -1.89151834 0.60161181 1.38339048 1.70782208 [79] 0.48995599 NA 0.71774340 NA 0.35578308 -1.30038021 [85] 0.18170942 -0.76999772 -0.52860127 -0.58713905 2.45770818 -3.79345760 [91] -0.73700348 1.85916858 0.48523489 -2.24404921 -3.71691741 -0.80525820 [97] 0.20768561 -0.05588210 NA -0.50332833 0.70407465 -0.57391160 [103] -1.11740646 y1 #response variable [1] 0.11736407 0.12793015 0.06627390 0.03385292 0.05111586 0.12896867 [7] 0.21030113 0.10661115 0.02321079 0.06035170 0.17966075 0.22120809 [13] 0.16367033 0.07062699 0.11563063 0.62809888 0.13571557 0.14366535 [19] 0.16453117 0.04030618 0.29904079 0.13865458 0.25814464 0.09636693 [25] 0.14262893 0.12619897 0.15919200 0.10713175 0.18137740 0.37961763 [31] 0.16831734 0.02425770 0.12793015 0.23174790 0.16384251 0.41976893 [37] 0.12498691 0.18960957 0.33873792 0.19594614 0.44510411 0.45554491 [43] 0.70821663 0.20739951 0.07828510 0.07393444 0.12290867 0.22614130 [49] 0.49742825 0.04013179 0.58127117 0.05216166 0.27597288 0.14090123 [55] 0.22120809 0.49090375 0.33216113 0.03437621 0.12031011 0.10261893 [61] 0.58141318 0.06244214 0.03594604 0.17966075 0.09345085 0.43887815 [67] 0.51929244 0.20501885 0.04663966 0.33104604 0.28841287 0.26924687 [73] 0.29495726 0.23675230 0.33385065 0.02814909 0.25281753 0.21240608 [79] 0.15204576 0.18288671 0.32867804 0.19813360 0.17379109
[R] Confidence interval from resampling
Dear R gurus, I have the following code, but I still not know how to estimate and extract confidence intervals (95%CI) from resampling. Thanks! ~Adriana #data penta-c(770,729,640,486,450,410,400,340,306,283,278,260,253,242,240,229,201,198,190,186,180,170,168,151,150,148,147,125,117,110,107,104,85,83,80,74,70,66,54,46,45,43,40,38,10) x-log(penta+1) plot(ecdf(x), ylab=Probability, xlab=Concentration (Ln+1)) x.wei-fitdistr(x,weibull) x.wei shapescale 6.7291685 5.3769965 (0.7807718) (0.1254696) xwei.shape - x.wei$estimate[[1]] xwei.scale - x.wei$estimate[[2]] x.wei-fitdistr(x,weibull) x.wei xwei.shape - x.wei$estimate[[1]] xwei.scale - x.wei$estimate[[2]] curve(pweibull(x, shape=xwei.shape, scale = xwei.scale,lower.tail=TRUE, log.p=FALSE), add=TRUE,col='green',lwd=3) #draw random numbers from a weibull distribution 100 times with shape=xwei.shape, scale = xwei.scale draw - lapply(1:100, function(.x){ out-rweibull(x, shape=xwei.shape, scale = xwei.scale) }) newx- data.frame(draw) colnames(newx)-paste(x, 1:100, sep = ) newmat-data.matrix(newx) # matrix of coefficients rownum=2 colnum=100 ResultMat-matrix(NA, ncol=colnum, nrow=rownum) rownum2=45 colnum2=100 ResultMat2-matrix(NA, ncol=colnum2, nrow=rownum2) #loop through each column in the source matrix for (i in 1:100) { sel_col-newmat[col(newmat)==i] {ResultMat[,i]-coef(fitdistr(sel_col,weibull))} xwei.shape- ResultMat[1,i] xwei.scale- ResultMat[2,i] curve(pweibull(x, shape=xwei.shape, scale=xwei.scale, lower.tail=TRUE, log.p = FALSE), add=TRUE, col='grey',lwd=0.5) ResultMat2[,i]-pweibull(x, shape=xwei.shape, scale = xwei.scale,lower.tail=TRUE, log.p=FALSE) } ## convert dataframe to numeric MatOut- as.matrix(ResultMat2) mode(MatOut) - numeric # initiate variables mm-ml-mu-rep(0,length(MatOut[,1])) # mean and upper/lower quantiles for(i in 1:length(MatOut[,1])){ mm[i]- mean(MatOut[i,]) ml[i]- quantile(MatOut[i,], 0.025, na.rm=TRUE) mu[i]- quantile(MatOut[i,], 0.975, na.rm=TRUE) } #lines(x, mm, col=black) lines(x, ml, col=blue, lwd=2) lines(x, mu, col=blue, lwd=2) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [cleaned code] RE: AIC() vs. mle.aic() vs. step()?
On Thu, 2011-06-23 at 09:29 -0400, Alexandra Thorn wrote: Ok, here's some example code showing how I get different output for AIC vs. mle.aic(). Now that I've taken another look at the independent variables, I'm wondering whether missing values in one of the variables might be what is messing me up. I'm going to see if the behavior changes when I remove those... Okay, here's the code with dput() used to present the various objects. Thanks to the list for being so patient with me... it's been quite educational. Thanks in advance, Alexandra Code: R require(wle) R dput(xA) structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 3L, 2L, 3L, 2L, 1L, 1L, 1L, 3L, 3L), .Label = c(Diffuse, Other, Ring), class = factor) R dput(x5) c(35.1890162754395, 22.8565556431035, 15.296994438186, 9.60022407772812, 25.0393843292946, 21.179788208, 9.26776603463063, 14.5228279661883, 6.6982273554755, 5.7889657235105, 21.4854296564891, 20.5942435860781, 20.2180106449331, 0.44420165807875, 5.041499147412, 26.984947359545, 14.7613969969327, 10.304583446995, 13.4192477726851, 13.90740846636, 6.721998863216, 13.25694036483, 18.1492698335532, 8.9814627576195, 14.2575003028425, 21.8982502817969, 8.5661573887, 15.343499557995, 7.4060631990625, 10.2824613451941, 23.4777018427811, 35.3389594363836, 51.5448185920973, 6.9571800684925, 23.3166747093435, 35.2280399322705, 53.3812645912466, 44.7933630466069, 25.5658796310335, 9.6980968165235, 2.90031387090862, 4.80738140821225, 6.927406749722, 8.61786424398488, 43.957850260725, 0, 44.1995269203482, 14.68783550262, 5.63854620095413, 0, 21.1687123966326, 20.566941833529, 0, 0, 28.4924849319605, 8.7184162712155, 18.8744437360889, 20.9748315239075, 21.3849539280062, 163.143692522173, 10.85655822755, 9.92978608605625, 0, 0, 41.9369100379775, 121.762594814280, 13.570939755438, 20.1040411710892, 14.1449650049318, 8.2172523975435, 10.16499876975, 19.598117628078, 20.3028116584013, 17.0104638219038, 12.612999143628, 8.20519315482388, 6.42935872078125, 22.1598563909594, 13.9703385210014, 23.0206302023242, 15.25902295115, 14.4778823661717, 2.4819054257875, 21.8293459510672, 25.151516683063, 32.105084991422, 12.5154914474453, 11.6927538156488, 9.40486317871687, 38.4559898615062, 53.195916748074, 14.4917169976215, 10.2548528385015, 8.82271943808338, 12.8573514676201, 10.0589964580665, 12.886892914765, 9.6626724052155, 5.98260608673, 3.25811900139, 13.446737566015, 8.80658397675, 17.773449287436) R dput(x15) c(1.69924629406401, -1.63414400065288, 0.714151689343318, 4.17480342154949, 1.52512663197893, 1.73541067946363, -5.47498002151169, 0.956812825760668, -1.48092554972038, 1.51101949018443, -2.25838766176389, 2.12958862888441, 1.43795702627435, -4.48003372542488, -3.65963008576897, -0.763463882139697, -2.44019862561235, 1.32552846648453, 1.89863804289907, 1.80655970149808, -0.741756823200407, 1.30112633095768, -1.06424642846912, -1.47852202054490, 0.090359152072348, NA, NA, 1.82385291704612, -0.153087078076393, 1.0468532207338, 2.45599032439301, 1.36474092834838, -2.39863477181754, -0.212204468662908, -2.50255033079852, -1.92296430369566, -0.245775784395867, -1.96756216156693, 0.433499968438238, 0.884598593578297, -0.127559050278120, 2.31771322353091, -1.21846730709075, 1.7508299240518, -3.02346893141966, -4.15582444612729, 1.09946459784029, 4.30008521664531, 4.37542383384967, NA, -1.93641861765076, -0.019194921394532, -2.39609317657158, -3.12228102462318, 0.488046064498046, -1.42886436846636, -3.52078266098328, 3.22115286286252, 0.879425403143162, -0.293853650273392, 0.400308672754849, 0.843826073923569, -0.144454076182464, -0.619035270434771, NA, 1.53158893613932, -1.01595045420127, 0.188573746980020, -1.24703875463314, -0.53766035430668, -0.433050941330375, 1.30035413662748, 0.0825664730349873, -0.0100815443036547, -1.89151834308193, 0.601611806130933, 1.38339048228375, 1.70782208107344, 0.489955991643127, NA, 0.717743402714073, NA, 0.355783083720979, -1.30038021268004, 0.181709422709264, -0.769997723552683, -0.528601269320360, -0.587139047162164, 2.45770817832288, -3.79345760049497, -0.737003476707607, 1.85916858045961, 0.485234889001515, -2.24404921428853, -3.71691740913278, -0.805258199659559, 0.207685613867357, -0.0558821002122282, NA, -0.503328331764907, 0.704074652205563, -0.573911596976014, -1.11740646296423) R dput(y1) c(0.117364072525805, 0.127930151301644, 0.066273900401, 0.0338529181312498, 0.0511158613502366, 0.128968673883822, 0.210301133239691, 0.10661115427526, 0.0232107944450872, 0.0603516951698553, 0.179660748593193, 0.221208092790247, 0.163670330934813, 0.0706269859311667,
Re: [R] Confidence interval from resampling
On Jun 23, 2011, at 9:44 AM, Adriana Bejarano wrote: Dear R gurus, I have the following code, but I still not know how to estimate and extract confidence intervals (95%CI) from resampling. If you have a distribution of values, say resamp.stat, of a statistic from a properly performed resampling operation you can extract and display easily the 5th and 95th percentiles. CI.stat - quantile(resamp.stat, c(0.05, 0.95) ) CI.stat Note: I do not think that 100 replications would generally be sufficient for final work, although its probably acceptable for testing. Your code as posted initially threw a bunch of errors since you did not include library(MASS), but fixing that fairly obvious problem shows that you can draw a nice plot. However, it remains unclear what statistic of what distribution you desire to assess. Mean, median, ... of what? I do not think the error that arose on my machine from the wrapped text here: #draw random numbers from a weibull distribution 100 times with ... shape=xwei.shape, scale = xwei.scale - error . was causing any problem, but there were a bunch of warnings that ought to be investigated: Right after the loop I see ten of these: Warning messages: 1: In dweibull(x, shape, scale, log) : NaNs produced -- David Thanks! ~Adriana #data penta- c (770,729,640,486,450,410,400,340,306,283,278,260,253,242,240,229,201,198,190,186,180,170,168,151,150,148,147,125,117,110,107,104,85,83,80,74,70,66,54,46,45,43,40,38,10 ) x-log(penta+1) plot(ecdf(x), ylab=Probability, xlab=Concentration (Ln+1)) x.wei-fitdistr(x,weibull) x.wei shapescale 6.7291685 5.3769965 (0.7807718) (0.1254696) xwei.shape - x.wei$estimate[[1]] xwei.scale - x.wei$estimate[[2]] x.wei-fitdistr(x,weibull) x.wei xwei.shape - x.wei$estimate[[1]] xwei.scale - x.wei$estimate[[2]] curve(pweibull(x, shape=xwei.shape, scale = xwei.scale,lower.tail=TRUE, log.p=FALSE), add=TRUE,col='green',lwd=3) #draw random numbers from a weibull distribution 100 times with shape=xwei.shape, scale = xwei.scale draw - lapply(1:100, function(.x){ out-rweibull(x, shape=xwei.shape, scale = xwei.scale) }) newx- data.frame(draw) colnames(newx)-paste(x, 1:100, sep = ) newmat-data.matrix(newx) # matrix of coefficients rownum=2 colnum=100 ResultMat-matrix(NA, ncol=colnum, nrow=rownum) rownum2=45 colnum2=100 ResultMat2-matrix(NA, ncol=colnum2, nrow=rownum2) #loop through each column in the source matrix for (i in 1:100) { sel_col-newmat[col(newmat)==i] {ResultMat[,i]-coef(fitdistr(sel_col,weibull))} xwei.shape- ResultMat[1,i] xwei.scale- ResultMat[2,i] curve(pweibull(x, shape=xwei.shape, scale=xwei.scale, lower.tail=TRUE, log.p = FALSE), add=TRUE, col='grey',lwd=0.5) ResultMat2[,i]-pweibull(x, shape=xwei.shape, scale = xwei.scale,lower.tail=TRUE, log.p=FALSE) } ## convert dataframe to numeric MatOut- as.matrix(ResultMat2) mode(MatOut) - numeric # initiate variables mm-ml-mu-rep(0,length(MatOut[,1])) # mean and upper/lower quantiles for(i in 1:length(MatOut[,1])){ mm[i]- mean(MatOut[i,]) ml[i]- quantile(MatOut[i,], 0.025, na.rm=TRUE) mu[i]- quantile(MatOut[i,], 0.975, na.rm=TRUE) } #lines(x, mm, col=black) lines(x, ml, col=blue, lwd=2) lines(x, mu, col=blue, lwd=2) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to delete file - http://CannotDeleteFile.net
Have you ever run into a situation where you wanted to delete a file, but Windows simply wouldn’t allow you to do it? Personally, these things happen to me all the time, especially when I’m at a client’s house trying to get their machine clean of malware. Have you ever tried deleting a locked file using common windows commands? If so, then you’ll know that this is just not possible. if yes, you can try special tool Long Path Tool http://HowToDeleteFile.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RBloomberg data account for dividend
Solved :) Just use RBloomberg with RJava and use option_names more in http://www.carfield.com.hk/document/Finance/rbloomberg-manual-0-4-144.pdf Thanks mike you did a good job :) -- View this message in context: http://r.789695.n4.nabble.com/RBloomberg-data-account-for-dividend-tp3614213p3619816.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-squared values for multiple linear regression with a matrix of multiple response variables
Dear list, I have a matrix Y of multiple response variables and a matrix X of predictor variables and I would like to fit a multivariate multiple regression model and compute the R2-value to determine the overall proportion of variance of the response matrix Y that is explained by the predictor matrix X. I have been using manova(Y ~ X) to assess the significance of the linear model. I am also using lm(Y ~ X) or lm(cbind(y1, y2, ...) ~ x1 + x2 + x3 +) but these seem to fit separate multiple linear models to each response variable, i.e., summary(lm_object) would return a list of regression summaries for each response variable. I would actually like to fit a model on the two matrices with one as the response and the other as the predictor, and compute an R2 value of the correlation between the two matrices. Is there a built-in function in R that does this? If not, how can I compute an R2 value of a correlation between two matrices? best, Manabu -- Manabu Sakamoto, PhD School of Earth Sciences University of Bristol manabu.sakam...@googlemail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem (and solution) to rle on vector with NA values
Hello there R-help, I'm not sure if this should be posted here - so apologies if this is the case. I've found a problem while using rle and am proposing a solution to the issue. Description: I ran into a niggle with rle today when working with vectors with NA values (using R 2.31.0 on Windows 7 x64). It transpires that a run of NA values is not encoded in the same way as a run of other values. See the following example as an illustration: Example: The example rv-c(1,1,NA,NA,3,3,3);rle(rv) Returns Run Length Encoding lengths: int [1:4] 2 1 1 3 values : num [1:4] 1 NA NA 3 not Run Length Encoding lengths: int [1:3] 2 2 3 values : num [1:3] 1 NA 3 as I expected. This caused my code to fail later (unsurprising). Analysis: The problem stems from the test y - x[-1L] != x[-n] in line 7 of the rle function body. In this test, NA values return logical NA values, not TRUE/FALSE (again, unsurprising). Resolution: I modified the rle function code as included below. As far as I tested, this modification appears safe. The convoluted construction of naMaskVal should guarantee that the NA masking value is always different from any value in the vector and should be safe regardless of the input vector form (a raw vector is not handled since the NA values do not apply here). rle-function (x) { if (!is.vector(x) !is.list(x)) stop('x' must be an atomic vector) n - length(x) if (n == 0L) return(structure(list(lengths = integer(), values = x), class = rle)) BEGIN NEW SECTION PART 1 naRepFlag-F if(any(is.na(x))){ naRepFlag-T IS_LOGIC-ifelse(typeof(x)==logical,T,F) if(typeof(x)==logical){ x-as.integer(x) naMaskVal-2 }else if(typeof(x)==character){ naMaskVal-paste(sample(c(letters,LETTERS,0:9),32,replace=T),collapse=) }else{ naMaskVal-max(0,abs(x[!is.infinite(x)]),na.rm=T)+1 } x[which(is.na(x))]-naMaskVal } END NEW SECTION PART 1 y - x[-1L] != x[-n] i - c(which(y), n) BEGIN NEW SECTION PART 2 if(naRepFlag) x[which(x==naMaskVal)]-NA if(IS_LOGIC) x-as.logical(x) END NEW SECTION PART 2 structure(list(lengths = diff(c(0L, i)), values = x[i]), class = rle) } Conclusion: I think that the proposed code modification is an improvement on the existing implementation of rle. Is it impertinent to suggest this R-modification to the gurus at R? Best wishes (in flame-war trepidation), Dr. Cormac Long. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] agrep: How to match with more than 1 substitution?
Hi all I'm trying to match a numeric code to a vector of numeric codes: a - c(12345, 12346, 12347) agrep(12349, a, max.distance=list(substitutions=1)) # [1] 1 2 3 agrep(12399, a, max.distance=list(substitutions=2)) # integer(0) I didn't expect the latter result as substituting two characters from the pattern makes the pattern identical to the vector elements. What do I need to change to match with more than 1 substitution allowed? Thanks in advance, Nik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Stepwise Manova
Hello all, I have a question on manova in R: I'm using the function manova() from the stats package. Is there anything like a stepwise (backward or forward) manova in R (like there is for regression and anova). When I enter: step(Model1, data=Mydata) R returns the message: Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k, : no 'drop1' method for mlm models Which tells me that there is no step, drop1 or add1 function for multilinear models (mlm). Correct? My model looks like this: Model1 - lm(cbind(Var1, Var2, Var3, Var4, Var5) ~ X, data=Mydata) But is there another way to do a stepwise manova? Thanks a lot for your help, Veronika ** This message is intended only for the use of the address...{{dropped:14}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem (and solution) to rle on vector with NA values
On 2011-06-23 06:44, Cormac Long wrote: Hello there R-help, I'm not sure if this should be posted here - so apologies if this is the case. I've found a problem while using rle and am proposing a solution to the issue. Description: I ran into a niggle with rle today when working with vectors with NA values (using R 2.31.0 on Windows 7 x64). It transpires that a run of NA values is not encoded in the same way as a run of other values. See the following example as an illustration: Example: The example rv-c(1,1,NA,NA,3,3,3);rle(rv) Returns Run Length Encoding lengths: int [1:4] 2 1 1 3 values : num [1:4] 1 NA NA 3 not Run Length Encoding lengths: int [1:3] 2 2 3 values : num [1:3] 1 NA 3 as I expected. This caused my code to fail later (unsurprising). Analysis: The problem stems from the test y- x[-1L] != x[-n] in line 7 of the rle function body. In this test, NA values return logical NA values, not TRUE/FALSE (again, unsurprising). Resolution: I modified the rle function code as included below. As far as I tested, this modification appears safe. The convoluted construction of naMaskVal should guarantee that the NA masking value is always different from any value in the vector and should be safe regardless of the input vector form (a raw vector is not handled since the NA values do not apply here). rle-function (x) { if (!is.vector(x) !is.list(x)) stop('x' must be an atomic vector) n- length(x) if (n == 0L) return(structure(list(lengths = integer(), values = x), class = rle)) BEGIN NEW SECTION PART 1 naRepFlag-F if(any(is.na(x))){ naRepFlag-T IS_LOGIC-ifelse(typeof(x)==logical,T,F) if(typeof(x)==logical){ x-as.integer(x) naMaskVal-2 }else if(typeof(x)==character){ naMaskVal-paste(sample(c(letters,LETTERS,0:9),32,replace=T),collapse=) }else{ naMaskVal-max(0,abs(x[!is.infinite(x)]),na.rm=T)+1 } x[which(is.na(x))]-naMaskVal } END NEW SECTION PART 1 y- x[-1L] != x[-n] i- c(which(y), n) BEGIN NEW SECTION PART 2 if(naRepFlag) x[which(x==naMaskVal)]-NA if(IS_LOGIC) x-as.logical(x) END NEW SECTION PART 2 structure(list(lengths = diff(c(0L, i)), values = x[i]), class = rle) } Conclusion: I think that the proposed code modification is an improvement on the existing implementation of rle. Is it impertinent to suggest this R-modification to the gurus at R? Best wishes (in flame-war trepidation), Well, it's not worth a flame, but ... from the help page (see 'Details'): Missing values are regarded as unequal to the previous value, even if that is also missing. Peter Ehlers Dr. Cormac Long. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to delete file -
On Jun 23, 2011, at 9:48 AM, Mark Finger wrote: Have you ever run into a situation where you wanted to delete a file, but Windows simply wouldn’t allow you to do it? Personally, these things happen to me all the time, especially when I’m at a client’s house trying to get their machine clean of malware. Have you ever tried deleting a locked file using common windows commands? If so, then you’ll know that this is just not possible. My apologies to the list. This looked like an off-topic question ( but we are fairly permissive in our review policy ) when cursorily viewed on the moderation panel. It looks like spam when seen in full display. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to delete file -
On 23.06.2011 16:50, David Winsemius wrote: On Jun 23, 2011, at 9:48 AM, Mark Finger wrote: Have you ever run into a situation where you wanted to delete a file, but Windows simply wouldn’t allow you to do it? Personally, these things happen to me all the time, especially when I’m at a client’s house trying to get their machine clean of malware. Have you ever tried deleting a locked file using common windows commands? If so, then you’ll know that this is just not possible. My apologies to the list. This looked like an off-topic question ( but we are fairly permissive in our review policy ) when cursorily viewed on the moderation panel. It looks like spam when seen in full display. No need to apologize, but time to thank you and the other volunteers for keeping the list free of spam! Best, Uwe __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem (and solution) to rle on vector with NA values
Hello Cormac. Not having thoroughly checked whether your code actually works, the behavior of rle you describe is the one documented (check the details of ?rle) and makes sense as the missingness could have different reasons. As such, changing this type of behavior would probably break a lot of existing code that is built on top of rle. There are other peculiarities and disputabilities about some base R functions (the order of the arguments for sample trips me every time), but unless the argument is really strong or a downright bug, I doubt people will be willing to change this. Perhaps making the new behavior optional (through a new parameter na.action or similar, with the default the original behavior) is an option? Feel free to run your own version of rle in any case. I suggest you rename it, though, as it may cause problems for some packages. Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Cormac Long Sent: donderdag 23 juni 2011 15:44 To: r-help@r-project.org Subject: [R] problem (and solution) to rle on vector with NA values Hello there R-help, I'm not sure if this should be posted here - so apologies if this is the case. I've found a problem while using rle and am proposing a solution to the issue. Description: I ran into a niggle with rle today when working with vectors with NA values (using R 2.31.0 on Windows 7 x64). It transpires that a run of NA values is not encoded in the same way as a run of other values. See the following example as an illustration: Example: The example rv-c(1,1,NA,NA,3,3,3);rle(rv) Returns Run Length Encoding lengths: int [1:4] 2 1 1 3 values : num [1:4] 1 NA NA 3 not Run Length Encoding lengths: int [1:3] 2 2 3 values : num [1:3] 1 NA 3 as I expected. This caused my code to fail later (unsurprising). Analysis: The problem stems from the test y - x[-1L] != x[-n] in line 7 of the rle function body. In this test, NA values return logical NA values, not TRUE/FALSE (again, unsurprising). Resolution: I modified the rle function code as included below. As far as I tested, this modification appears safe. The convoluted construction of naMaskVal should guarantee that the NA masking value is always different from any value in the vector and should be safe regardless of the input vector form (a raw vector is not handled since the NA values do not apply here). rle-function (x) { if (!is.vector(x) !is.list(x)) stop('x' must be an atomic vector) n - length(x) if (n == 0L) return(structure(list(lengths = integer(), values = x), class = rle)) BEGIN NEW SECTION PART 1 naRepFlag-F if(any(is.na(x))){ naRepFlag-T IS_LOGIC-ifelse(typeof(x)==logical,T,F) if(typeof(x)==logical){ x-as.integer(x) naMaskVal-2 }else if(typeof(x)==character){ naMaskVal- paste(sample(c(letters,LETTERS,0:9),32,replace=T),collapse=) }else{ naMaskVal-max(0,abs(x[!is.infinite(x)]),na.rm=T)+1 } x[which(is.na(x))]-naMaskVal } END NEW SECTION PART 1 y - x[-1L] != x[-n] i - c(which(y), n) BEGIN NEW SECTION PART 2 if(naRepFlag) x[which(x==naMaskVal)]-NA if(IS_LOGIC) x-as.logical(x) END NEW SECTION PART 2 structure(list(lengths = diff(c(0L, i)), values = x[i]), class = rle) } Conclusion: I think that the proposed code modification is an improvement on the existing implementation of rle. Is it impertinent to suggest this R-modification to the gurus at R? Best wishes (in flame-war trepidation), Dr. Cormac Long. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] else problem
Dear R users, I have run into a problem using if...else and I hope you can shed some light on it. I am using R version 2.2.0.1. I have the following data frame: head(dat2f) year tot_km3y [1,] 1964 0.1876854 [2,] 1965 0.1835116 [3,] 1966 0.1915012 [4,] 1967 0.1869758 [5,] 1968 0.2249865 [6,] 1969 0.1916011 I need to pick out the median year, and since there are an even number of data, I cannot use 'median' directly since it gives me a non-existent year/discharge. I found a way to get around that with the following: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] (I really only need the year, not the actual discharge with that year, which is why I left med_TotQ as the true median) However, I have some data sets that have an odd number of data for which the following works perfectly: md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] Thus, I would like to apply the above calculations depended on the condition of: length(dat2f$year)%%2==0 I put it all together as below: if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Each individual piece works perfectly on its own, but together I get the following error: if (length(dat2f$year)%%2==0) { + md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { Error: unexpected 'else' in else md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Error: unexpected '}' in } When I tried to look up else I got this error: ?else Error: unexpected 'else' in ?else I have used exactly the same set up with if...else in other code and it worked fine then. I tried to run it again, and I got the same error as above. What is the problem? I hope it isn't something simple and silly! I realize that I can use the first line: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] for all data sets and it will give me the median for both odd and even-length data sets, but it is now about the principle; why won't the if...else work? Thank you very much for your time! Kara __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Loading List data into R with scan()
Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] else problem
On Jun 23, 2011, at 10:59 AM, Kara Przeczek wrote: Dear R users, I have run into a problem using if...else and I hope you can shed some light on it. I am using R version 2.2.0.1. I have the following data frame: head(dat2f) year tot_km3y [1,] 1964 0.1876854 [2,] 1965 0.1835116 [3,] 1966 0.1915012 [4,] 1967 0.1869758 [5,] 1968 0.2249865 [6,] 1969 0.1916011 I need to pick out the median year, and since there are an even number of data, I cannot use 'median' directly since it gives me a non-existent year/discharge. I found a way to get around that with the following: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] (I really only need the year, not the actual discharge with that year, which is why I left med_TotQ as the true median) However, I have some data sets that have an odd number of data for which the following works perfectly: md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] Thus, I would like to apply the above calculations depended on the condition of: length(dat2f$year)%%2==0 I put it all together as below: if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { If this line is executed at a console session it will fail because the interpreter does not keep a copy of the last condition. If you moved the closing curley-brace to just befor the 'else', you should get the behavior you expect (if your other code is correct): Try instead: if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Each individual piece works perfectly on its own, but together I get the following error: if (length(dat2f$year)%%2==0) { + md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { Error: unexpected 'else' in else md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Error: unexpected '}' in } When I tried to look up else I got this error: ?else Error: unexpected 'else' in ?else Try instead: ?else -- David I have used exactly the same set up with if...else in other code and it worked fine then. I tried to run it again, and I got the same error as above. What is the problem? I hope it isn't something simple and silly! I realize that I can use the first line: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] for all data sets and it will give me the median for both odd and even-length data sets, but it is now about the principle; why won't the if...else work? Thank you very much for your time! Kara David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotmath: unexpected SPECIAL
On Thu, Jun 23, 2011 at 6:30 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: Folks, the relevant thing you have to remember is: All the stuff must be valid R syntax (with few additional functions as mention in the ?plotmath help file). Knowing that it is obvious where additional operators are required. Best, Uwe Ligges Actually, I think that may be the source of the confusion. I was thinking about it in terms of *labels*, and there's no intrinsic reason that an axis label or other text caption has to have x and y arguments, even if it otherwise might be an operator. A right arrow, as the original querent asked about, could have other uses in a label than connecting two items. It simply didn't occur to me that it could *only* be used that way, though in other contexts that would make perfect sense. It is only possible to use a binary operator through plotmath, even though the result is a label, and not to talk about a binary operator. That's the missing conceptual bit, at least for me. Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading List data into R with scan()
On 23.06.2011 16:39, Michael Pearmain wrote: Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 I don't understand why you want 40 empty list elements, but here is what you asked for (not optimized, just hacked in few seconds): temp - strsplit(d, :) num - as.numeric(sapply(temp, [[, 1)) L - vector(mode = list, length = max(num)) for(i in seq_along(temp)){ L[[num[i]]] - as.numeric(unlist(strsplit(temp[[i]][2], ,))) } L Uwe Ligges but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] else problem
Perhaps some additional clarification... (???) if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { If this line is executed at a console session it will fail because the interpreter does not keep a copy of the last condition. If you moved the closing curley-brace to just befor the 'else', you should get the behavior you expect (if your other code is correct): Well, um.. not sure if this is what you meant, but what is happening at the console is that when you type return, the interpreter checks for a syntactically complete statement. If it finds what has been given to it **thus far** is, it tries to execute it (if not, it gives a continuation character and waits for more input) and, as you said, then starts anew to interpret the next line(s) entered, forgetting all previous. The problem above is that the if() statement up to the close bracket, } is syntactically complete, and so the else{ that follows makes no sense as the beginnig of a new line to be interpreted. The simplest and universal solution to this is to simply enclose the whole conditional in { }: {if(length ... ... else {...} } This forces the interpreter to wait for the last } before it will interpret and execute. Hoping this clarifies rather than obfuscates. -- Bert Try instead: if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Each individual piece works perfectly on its own, but together I get the following error: if (length(dat2f$year)%%2==0) { + md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { Error: unexpected 'else' in else md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Error: unexpected '}' in } When I tried to look up else I got this error: ?else Error: unexpected 'else' in ?else Try instead: ?else -- David I have used exactly the same set up with if...else in other code and it worked fine then. I tried to run it again, and I got the same error as above. What is the problem? I hope it isn't something simple and silly! I realize that I can use the first line: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] for all data sets and it will give me the median for both odd and even-length data sets, but it is now about the principle; why won't the if...else work? Thank you very much for your time! Kara David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading List data into R with scan()
l - readLines(textConnection(1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5)) On Thu, Jun 23, 2011 at 12:28 PM, Henrique Dallazuanna www...@gmail.com wrote: Try this: sapply(lapply(strsplit(l, :), strsplit, ,), function(x)structure(lapply(x[2], as.numeric), .Names = x[1])) On Thu, Jun 23, 2011 at 11:39 AM, Michael Pearmain michael.pearm...@gmail.com wrote: Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading List data into R with scan()
Thanks Uwe, The list elements was a mistake on my part, i just wanted everything before the : to be the name of the element. Thanks for the help, i can play around with this to get what i want. M 2011/6/23 Uwe Ligges lig...@statistik.tu-dortmund.de On 23.06.2011 16:39, Michael Pearmain wrote: Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 I don't understand why you want 40 empty list elements, but here is what you asked for (not optimized, just hacked in few seconds): temp - strsplit(d, :) num - as.numeric(sapply(temp, [[, 1)) L - vector(mode = list, length = max(num)) for(i in seq_along(temp)){ L[[num[i]]] - as.numeric(unlist(strsplit(**temp[[i]][2], ,))) } L Uwe Ligges but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading List data into R with scan()
Try this: sapply(lapply(strsplit(l, :), strsplit, ,), function(x)structure(lapply(x[2], as.numeric), .Names = x[1])) On Thu, Jun 23, 2011 at 11:39 AM, Michael Pearmain michael.pearm...@gmail.com wrote: Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loading List data into R with scan()
On Jun 23, 2011, at 11:19 AM, Uwe Ligges wrote: On 23.06.2011 16:39, Michael Pearmain wrote: Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 I don't understand why you want 40 empty list elements, but here is what you asked for (not optimized, just hacked in few seconds): temp - strsplit(d, :) num - as.numeric(sapply(temp, [[, 1)) L - vector(mode = list, length = max(num)) for(i in seq_along(temp)){ L[[num[i]]] - as.numeric(unlist(strsplit(temp[[i]][2], ,))) } L I wondered about that too. Perhaps he would be satisfied with alpha indexing: d - c( 1: 3,4,5,6, 2:1,2,3, 43: 5,7,8,9,5) temp - strsplit(d, :) num - sapply(temp, [[, 1) L - vector(mode = list) for(i in seq_along(temp)){ L[[num[i]]] - as.numeric(unlist(strsplit(temp[[i]][2], ,))) } L $`1` [1] 3 4 5 6 $`2` [1] 1 2 3 $`43` [1] 5 7 8 9 5 Uwe Ligges but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ddply to count frequency of combinations
On 6/22/2011 11:02 PM, Idris Raja wrote: Brian, I'm a bit confused about how the following line works, specifically, what is happening in freq=length(x)? Is it just taking the length of x after it has been summarized by different combinations x y? I guess that must be the case, because that gives the same result as using freq=length(y) d1-ddply(d, .(x, y), summarize, freq=length(x)) d2-ddply(d, .(x, y), summarize, freq=length(y)) Effectively, ddply takes the dataframe (d), splits it up into multiple dataframes based on unique combinations of the variables (x and y), and calls the function (summarize) with each of the sub-dataframes in turn. ddply also has the option to pass additional parameters to the function that is called. In this case, that is what happens with freq=length(x). Each sub-dataframe is the first argument to a call to summarize([sub-dataframe], freq=length(x)). summarize, in turn, takes a dataframe and other arguments in the form of var=value. It evaluates each of the values in the context of the dataframe (that is, column names can be used directly as variables) and assigns the result to the variable var. These var's then become the columns of a new dataframe. summarize(df, freq=length(x)) freq 19 You are right that length(y) would work just as well; since they are both columns in the same dataframe, they must have the same length. (The last thing ddply does is take all the dataframes that are returned from the function calls and put them back together into a single dataframe which also includes information on which subset each corresponds to.) Also, what is the significance of the periods before the second argument in .(x, y) ? The variables to split on can be given as quoted variables, a formula or character vector. The . is a function in plyr that quotes variables (the first option). The following three are identical: ddply(df, .(x, y), summarise, freq=length(x)) ddply(df, ~x+y, summarise, freq=length(x)) ddply(df, c(x, y), summarise, freq=length(x)) Thanks for the help. You may also benefit from reading Hadley's paper on the topic: Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/. On Tue, Jun 21, 2011 at 12:54 PM, Brian Diggsdig...@ohsu.edu wrote: On 6/21/2011 11:30 AM, Idris Raja wrote: I have a dataframe df with two columns x and y. I want to count the number of times a unique x, y combination occurs. For example x- c(1,2,3,4,5,1,2,3,4) y- c(1,2,3,4,5,1,2,4,1) df-as.data.frame(cbind(x, y)) #what is the correct way to use ddply for this example? ddply(df, c('x','y', summarize, ??) #desired output -- format and order doesn't matter # (x, y) count # # (1, 1) 2 # (2, 2) 2 # (3, 3) 1 # (4, 4) 1 # (5, 5) 1 # (2, 3) 1 # (3, 4) 1 # (4, 1) 1 [[alternative HTML version deleted]] Jorge and Dennis gave good responses that get you to the result you asked for, but for completeness I thought I'd include some ddply versions: ddply(d, .(x, y), summarize, freq=length(x)) This uses the summarize function you were asking about, however you can also do it with: ddply(d, .(x, y), nrow) or ddply(d, .(x, y), as.data.frame(nrow)) The latter giving a slightly nicer name (value instead of V1). As an aside, I prefer using the summarise spelling of the function when I do use it, because it won't clash with Hmisc::summarize. ddply(d, .(x, y), summarise, freq=length(x)) -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health Science University [[alternative HTML version deleted]] -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health Science University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory(RAM) issues
Hi Lui, Anupam and my other R-user friends, Thanks for your previous suggestions. As for my issue, it is clearly RAM problem as my code is running perfectly as long as my input data size is small and code has been refined a number of times to increase efficiency [ by using matrix more in the context of manipulations and reducing order of time complexity wherever possible]. Yet it has gotta run 2000 iterations and in each of the iterations it repeatedly modifies few [0.5 Mn * 15] tables After few basic investigation it seems that the problem MAY be resolved if 64 bit version of R is installed in 64 bit OS as it claims to be able use potentially a 8TB RAM ( for my case I think 4 GB RAM may serve the purpose but 2 GB is not enough) if available.I don't have any domain knowledge in IT since I hail from a statistics background. I work in a company as a statistical analyst. The IT folks of my company are ready to deploy any free solution but I need to tell them what software needs to be installed and what all OS configurations would be required. They normally run all of the applications on unix servors. So I need to know if any free 64 bit of R version can be installed in unix servor or if not unix, may be on other servor. So hereby I request my R-user friends to please let me know if any free 64 bit version of R is available that can be installed on any unix servor. If that is not available is there any other FREE solution and if available, how to get that and what all configuration is required. Awaiting your replies, Regards, Abhisek P.S. Please forward this mail to any other R-mailing list if you deem it fit for any of them. On Sat, Jun 11, 2011 at 4:38 PM, Lui ## lui.r.proj...@googlemail.com wrote: Hello Abhisek, maybe you wanna try it on just a bigger machine (I guess you are working at a university, so I hope you do have access to them). In case getting computing time or the like is a major issue, you may wanna try Amazon AWS: For a few rupees (about 50-100 per hour) you can rent pretty fast computers (20 Ghz, 8BG of RAM). You may want to try out the Windows version (little bit more expensive) which is easily accessible via remote desktop. Installing Revolution (which is free for academic purposes) (64 Bit Version) might give you a good start. Maybe its not a viable option in a long term view (pricing), but it might help to get a clue whether the problem can be solved on a bigger machine and just trying it out without a bigger hastle... Good luck! Lui On Sat, Jun 11, 2011 at 7:47 AM, Abhisek Saha tad...@gmail.com wrote: Thanks Anupam for your inputs. I believe there are two ways to circumvent the issue...1 making the code more efficient 1 Increasing the memory in some way.The reasons why I did not paste the code are 1 It is long and using quite a number of functions that I created 2 Secondly my intention is not to make the code more efficient if that's possible. Here landing into a memory problem with 2 GB RAM is natural as my analysis entails 1500 simulations from huge multivariate distributions that change after every simulation and tomorrow I may have to do similar analysis with 10 million observations * 20 columns. In view of above I shall be needing more memory sometime later and my IT friends are ready to support me for that(probably with a sandbox) but I am not sure whether I can install probably a servor version of R that can be capable of working with 8GB or so RAM. So it is more of technical help I need and I have no clue regarding the plausibility of the solution mentioned( i.e. a servor version of R that is capable of more memory). Regards, Abhisek On Sat, Jun 11, 2011 at 10:10 AM, Anupam anupa...@gmail.com wrote: It will be helpful on this forum to use metric measures: 12 Lakh is 1.2 million, thus your data is 1.2 million observations x 15 variables. I do not know the intricacies of R. You may have to wait for someone with that knowledge to respond. Including some relevant portions of error messages and code in your query can also help someone to respond to your message. Anupam. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Abhisek Saha Sent: Saturday, June 11, 2011 6:25 AM To: r-help@r-project.org Subject: [R] Memory(RAM) issues Dear All, I have been working with R(desktop version) in VISTA. I have the latest version R 2.13.0. I have been working with few data-sets of size 12Lakh * 15 and my code is quite computing intensive ( applying MCMC gibbs sampler on a posterior of 144 variables) that often runs into a memory issue such as memory can not allocate the vector ,full size(shows to have reached something like 1.5 GB) reached or something to this effect. I have a RAM of 2GB. I checked with the option like memory.size and it says a 64 bit R if sat on 64 bit windows then max memory capability is 8TB. Now I don't have background to understand the
Re: [R] Loading List data into R with scan()
Thanks All, Henrique, gave me the solution is was looking for, the indexing was a mistake on my part. Thanks again On 23 June 2011 16:37, David Winsemius dwinsem...@comcast.net wrote: On Jun 23, 2011, at 11:19 AM, Uwe Ligges wrote: On 23.06.2011 16:39, Michael Pearmain wrote: Hi All, I've been given a data file of the form: 1: 3,4,5,6 2:1,2,3 43: 5,7,8,9,5 and i want to read this data in as a list to create the form: (guessing final look) my.list [[1]] [1] 3 4 5 6 [[2]] [1] 1 2 3 [[43]] [1] 5 7 8 9 5 I can get to a stage using scan: scan(my.data, what = character(0), quiet = TRUE) to load [1] 1: 3,4,5,6 [2] 2:1,2,3 [3] 43: 5,7,8,9,5 I don't understand why you want 40 empty list elements, but here is what you asked for (not optimized, just hacked in few seconds): temp - strsplit(d, :) num - as.numeric(sapply(temp, [[, 1)) L - vector(mode = list, length = max(num)) for(i in seq_along(temp)){ L[[num[i]]] - as.numeric(unlist(strsplit(**temp[[i]][2], ,))) } L I wondered about that too. Perhaps he would be satisfied with alpha indexing: d - c( 1: 3,4,5,6, 2:1,2,3, 43: 5,7,8,9,5) temp - strsplit(d, :) num - sapply(temp, [[, 1) L - vector(mode = list) for(i in seq_along(temp)){ L[[num[i]]] - as.numeric(unlist(strsplit(**temp[[i]][2], ,))) } L $`1` [1] 3 4 5 6 $`2` [1] 1 2 3 $`43` [1] 5 7 8 9 5 Uwe Ligges but im not sure on how next to proceed to arrange this into a list form, can anyone offer some advise? Thanks in advance Mike David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem (and solution) to rle on vector with NA values
D'oh! Completely missed that. Definately a case or RTFMS (RTFM, Stupid). My apologies for the spam. Sincerely (with additional grovelling) Cormac. On 23 June 2011 15:59, Nick Sabbe nick.sa...@ugent.be wrote: Hello Cormac. Not having thoroughly checked whether your code actually works, the behavior of rle you describe is the one documented (check the details of ?rle) and makes sense as the missingness could have different reasons. As such, changing this type of behavior would probably break a lot of existing code that is built on top of rle. There are other peculiarities and disputabilities about some base R functions (the order of the arguments for sample trips me every time), but unless the argument is really strong or a downright bug, I doubt people will be willing to change this. Perhaps making the new behavior optional (through a new parameter na.action or similar, with the default the original behavior) is an option? Feel free to run your own version of rle in any case. I suggest you rename it, though, as it may cause problems for some packages. Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Cormac Long Sent: donderdag 23 juni 2011 15:44 To: r-help@r-project.org Subject: [R] problem (and solution) to rle on vector with NA values Hello there R-help, I'm not sure if this should be posted here - so apologies if this is the case. I've found a problem while using rle and am proposing a solution to the issue. Description: I ran into a niggle with rle today when working with vectors with NA values (using R 2.31.0 on Windows 7 x64). It transpires that a run of NA values is not encoded in the same way as a run of other values. See the following example as an illustration: Example: The example rv-c(1,1,NA,NA,3,3,3);rle(rv) Returns Run Length Encoding lengths: int [1:4] 2 1 1 3 values : num [1:4] 1 NA NA 3 not Run Length Encoding lengths: int [1:3] 2 2 3 values : num [1:3] 1 NA 3 as I expected. This caused my code to fail later (unsurprising). Analysis: The problem stems from the test y - x[-1L] != x[-n] in line 7 of the rle function body. In this test, NA values return logical NA values, not TRUE/FALSE (again, unsurprising). Resolution: I modified the rle function code as included below. As far as I tested, this modification appears safe. The convoluted construction of naMaskVal should guarantee that the NA masking value is always different from any value in the vector and should be safe regardless of the input vector form (a raw vector is not handled since the NA values do not apply here). rle-function (x) { if (!is.vector(x) !is.list(x)) stop('x' must be an atomic vector) n - length(x) if (n == 0L) return(structure(list(lengths = integer(), values = x), class = rle)) BEGIN NEW SECTION PART 1 naRepFlag-F if(any(is.na(x))){ naRepFlag-T IS_LOGIC-ifelse(typeof(x)==logical,T,F) if(typeof(x)==logical){ x-as.integer(x) naMaskVal-2 }else if(typeof(x)==character){ naMaskVal- paste(sample(c(letters,LETTERS,0:9),32,replace=T),collapse=) }else{ naMaskVal-max(0,abs(x[!is.infinite(x)]),na.rm=T)+1 } x[which(is.na(x))]-naMaskVal } END NEW SECTION PART 1 y - x[-1L] != x[-n] i - c(which(y), n) BEGIN NEW SECTION PART 2 if(naRepFlag) x[which(x==naMaskVal)]-NA if(IS_LOGIC) x-as.logical(x) END NEW SECTION PART 2 structure(list(lengths = diff(c(0L, i)), values = x[i]), class = rle) } Conclusion: I think that the proposed code modification is an improvement on the existing implementation of rle. Is it impertinent to suggest this R-modification to the gurus at R? Best wishes (in flame-war trepidation), Dr. Cormac Long. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory(RAM) issues
On Jun 23, 2011, at 11:57 AM, Abhisek Saha wrote: Hi Lui, Anupam and my other R-user friends, Thanks for your previous suggestions. As for my issue, it is clearly RAM problem as my code is running perfectly as long as my input data size is small and code has been refined a number of times to increase efficiency [ by using matrix more in the context of manipulations and reducing order of time complexity wherever possible]. Yet it has gotta run 2000 iterations and in each of the iterations it repeatedly modifies few [0.5 Mn * 15] tables After few basic investigation it seems that the problem MAY be resolved if 64 bit version of R is installed in 64 bit OS as it claims to be able use potentially a 8TB RAM ( for my case I think 4 GB RAM may serve the purpose but 2 GB is not enough) if available.I don't have any domain knowledge in IT since I hail from a statistics background. I work in a company as a statistical analyst. The IT folks of my company are ready to deploy any free solution but I need to tell them what software needs to be installed and what all OS configurations would be required. They normally run all of the applications on unix servors. So I need to know if any free 64 bit of R version can be installed in unix servor or if not unix, may be on other servor. So hereby I request my R-user friends to please let me know if any free 64 bit version of R is available Yes. that can be installed on any unix servor. If that is not available is there any other FREE solution and if available, how to get that and what all configuration is required. There are a bunch of UNIX-alike platforms. Read the FAQ: http://cran.r-project.org/doc/FAQ/R-FAQ.html#What-machines-does-R-run-on_003f -- David. Awaiting your replies, Regards, Abhisek P.S. Please forward this mail to any other R-mailing list if you deem it fit for any of them. On Sat, Jun 11, 2011 at 4:38 PM, Lui ## lui.r.proj...@googlemail.com wrote: Hello Abhisek, maybe you wanna try it on just a bigger machine (I guess you are working at a university, so I hope you do have access to them). In case getting computing time or the like is a major issue, you may wanna try Amazon AWS: For a few rupees (about 50-100 per hour) you can rent pretty fast computers (20 Ghz, 8BG of RAM). You may want to try out the Windows version (little bit more expensive) which is easily accessible via remote desktop. Installing Revolution (which is free for academic purposes) (64 Bit Version) might give you a good start. Maybe its not a viable option in a long term view (pricing), but it might help to get a clue whether the problem can be solved on a bigger machine and just trying it out without a bigger hastle... Good luck! Lui On Sat, Jun 11, 2011 at 7:47 AM, Abhisek Saha tad...@gmail.com wrote: Thanks Anupam for your inputs. I believe there are two ways to circumvent the issue...1 making the code more efficient 1 Increasing the memory in some way.The reasons why I did not paste the code are 1 It is long and using quite a number of functions that I created 2 Secondly my intention is not to make the code more efficient if that's possible. Here landing into a memory problem with 2 GB RAM is natural as my analysis entails 1500 simulations from huge multivariate distributions that change after every simulation and tomorrow I may have to do similar analysis with 10 million observations * 20 columns. In view of above I shall be needing more memory sometime later and my IT friends are ready to support me for that(probably with a sandbox) but I am not sure whether I can install probably a servor version of R that can be capable of working with 8GB or so RAM. So it is more of technical help I need and I have no clue regarding the plausibility of the solution mentioned( i.e. a servor version of R that is capable of more memory). Regards, Abhisek On Sat, Jun 11, 2011 at 10:10 AM, Anupam anupa...@gmail.com wrote: It will be helpful on this forum to use metric measures: 12 Lakh is 1.2 million, thus your data is 1.2 million observations x 15 variables. I do not know the intricacies of R. You may have to wait for someone with that knowledge to respond. Including some relevant portions of error messages and code in your query can also help someone to respond to your message. Anupam. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org ] On Behalf Of Abhisek Saha Sent: Saturday, June 11, 2011 6:25 AM To: r-help@r-project.org Subject: [R] Memory(RAM) issues Dear All, I have been working with R(desktop version) in VISTA. I have the latest version R 2.13.0. I have been working with few data-sets of size 12Lakh * 15 and my code is quite computing intensive ( applying MCMC gibbs sampler on a posterior of 144 variables) that often runs into a memory issue such as memory can not allocate the vector ,full size(shows to have reached something like 1.5 GB) reached or
Re: [R] else problem
Thank you for all your help! I did not know to use when searching for help, as ?mean, etc, had always worked for me in the past. It makes perfect sense why 'else' was causing me the trouble the way I was using it. I think it was working in my other code, despite the same format, because it was part of a function and thus would have been executed completely within the function? Cheers, Kara From: Bert Gunter [gunter.ber...@gene.com] Sent: June 23, 2011 8:27 AM To: David Winsemius Cc: Kara Przeczek; r-help@r-project.org Subject: Re: [R] else problem Perhaps some additional clarification... (???) if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { If this line is executed at a console session it will fail because the interpreter does not keep a copy of the last condition. If you moved the closing curley-brace to just befor the 'else', you should get the behavior you expect (if your other code is correct): Well, um.. not sure if this is what you meant, but what is happening at the console is that when you type return, the interpreter checks for a syntactically complete statement. If it finds what has been given to it **thus far** is, it tries to execute it (if not, it gives a continuation character and waits for more input) and, as you said, then starts anew to interpret the next line(s) entered, forgetting all previous. The problem above is that the if() statement up to the close bracket, } is syntactically complete, and so the else{ that follows makes no sense as the beginnig of a new line to be interpreted. The simplest and universal solution to this is to simply enclose the whole conditional in { }: {if(length ... ... else {...} } This forces the interpreter to wait for the last } before it will interpret and execute. Hoping this clarifies rather than obfuscates. -- Bert Try instead: if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Each individual piece works perfectly on its own, but together I get the following error: if (length(dat2f$year)%%2==0) { + md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { Error: unexpected 'else' in else md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Error: unexpected '}' in } When I tried to look up else I got this error: ?else Error: unexpected 'else' in ?else Try instead: ?else -- David I have used exactly the same set up with if...else in other code and it worked fine then. I tried to run it again, and I got the same error as above. What is the problem? I hope it isn't something simple and silly! I realize that I can use the first line: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] for all data sets and it will give me the median for both odd and even-length data sets, but it is now about the principle; why won't the if...else work? Thank you very much for your time! Kara David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confidence interval from resampling
Depending on how critical the problem is, you might also want to look at the literature on bootstrap CI's, perhaps starting from the references in boot.ci in the boot package. The simple quantiles are not necessarily the most appropriate. For example I seem to recall that BCa intervals were the intervals recommended for 'general use' by Efron and Tibshirani (Introduction to the Bootstrap (1993) Chapman and Hall) with ABC intervals also getting an honourable mention. 1993 is a long time ago, though... S Ellison -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius Sent: 23 June 2011 15:25 To: Adriana Bejarano Cc: r-help@r-project.org Subject: Re: [R] Confidence interval from resampling On Jun 23, 2011, at 9:44 AM, Adriana Bejarano wrote: Dear R gurus, I have the following code, but I still not know how to estimate and extract confidence intervals (95%CI) from resampling. If you have a distribution of values, say resamp.stat, of a statistic from a properly performed resampling operation you can extract and display easily the 5th and 95th percentiles. CI.stat - quantile(resamp.stat, c(0.05, 0.95) ) CI.stat Note: I do not think that 100 replications would generally be sufficient for final work, although its probably acceptable for testing. Your code as posted initially threw a bunch of errors since you did not include library(MASS), but fixing that fairly obvious problem shows that you can draw a nice plot. However, it remains unclear what statistic of what distribution you desire to assess. Mean, median, ... of what? I do not think the error that arose on my machine from the wrapped text here: #draw random numbers from a weibull distribution 100 times with ... shape=xwei.shape, scale = xwei.scale - error . was causing any problem, but there were a bunch of warnings that ought to be investigated: Right after the loop I see ten of these: Warning messages: 1: In dweibull(x, shape, scale, log) : NaNs produced -- David Thanks! ~Adriana #data penta- c (770,729,640,486,450,410,400,340,306,283,278,260,253,242,240,229,201,1 98,190,186,180,170,168,151,150,148,147,125,117,110,107,104,85,83,80,74 ,70,66,54,46,45,43,40,38,10 ) x-log(penta+1) plot(ecdf(x), ylab=Probability, xlab=Concentration (Ln+1)) x.wei-fitdistr(x,weibull) x.wei shapescale 6.7291685 5.3769965 (0.7807718) (0.1254696) xwei.shape - x.wei$estimate[[1]] xwei.scale - x.wei$estimate[[2]] x.wei-fitdistr(x,weibull) x.wei xwei.shape - x.wei$estimate[[1]] xwei.scale - x.wei$estimate[[2]] curve(pweibull(x, shape=xwei.shape, scale = xwei.scale,lower.tail=TRUE, log.p=FALSE), add=TRUE,col='green',lwd=3) #draw random numbers from a weibull distribution 100 times with shape=xwei.shape, scale = xwei.scale draw - lapply(1:100, function(.x){ out-rweibull(x, shape=xwei.shape, scale = xwei.scale) }) newx- data.frame(draw) colnames(newx)-paste(x, 1:100, sep = ) newmat-data.matrix(newx) # matrix of coefficients rownum=2 colnum=100 ResultMat-matrix(NA, ncol=colnum, nrow=rownum) rownum2=45 colnum2=100 ResultMat2-matrix(NA, ncol=colnum2, nrow=rownum2) #loop through each column in the source matrix for (i in 1:100) { sel_col-newmat[col(newmat)==i] {ResultMat[,i]-coef(fitdistr(sel_col,weibull))} xwei.shape- ResultMat[1,i] xwei.scale- ResultMat[2,i] curve(pweibull(x, shape=xwei.shape, scale=xwei.scale, lower.tail=TRUE, log.p = FALSE), add=TRUE, col='grey',lwd=0.5) ResultMat2[,i]-pweibull(x, shape=xwei.shape, scale = xwei.scale,lower.tail=TRUE, log.p=FALSE) } ## convert dataframe to numeric MatOut- as.matrix(ResultMat2) mode(MatOut) - numeric # initiate variables mm-ml-mu-rep(0,length(MatOut[,1])) # mean and upper/lower quantiles for(i in 1:length(MatOut[,1])){ mm[i]- mean(MatOut[i,]) ml[i]- quantile(MatOut[i,], 0.025, na.rm=TRUE) mu[i]- quantile(MatOut[i,], 0.975, na.rm=TRUE) } #lines(x, mm, col=black) lines(x, ml, col=blue, lwd=2) lines(x, mu, col=blue, lwd=2) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. *** This email and any
[R] split dataframe by sample()
Hi, I seemingly have a simple problem, but I've spend hours reading guides posts on this forum and I can't seem to piece together what I need. I have a dataframe where I want to divide it into two subsets: a sample, and the remainder of the dataframe in a new frame. I've tried this: split(df, sample(nrow(df), size=100, replace=FALSE)) another way would be to make a new dataframe of my sample and (something I can do in SQL but not R) then select rows that are NOT in the sample dataframe. Thanks for any help! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] else problem
On Jun 23, 2011, at 12:05 PM, Kara Przeczek wrote: Thank you for all your help! I did not know to use when searching for help, as ?mean, etc, had always worked for me in the past. It makes perfect sense why 'else' was causing me the trouble the way I was using it. I think it was working in my other code, despite the same format, because it was part of a function and thus would have been executed completely within the function? Right. When done at the console or sourced, the problem will arise, but not when inside a function. Bert's strategy of enclosing the whole call in {} can be used to demonstrate at the console: {fntest - function() if (FALSE) print(T) +else print(F) } fntest() [1] F Or you can enclose just the body: fntest - function() { if (FALSE) print(T) + else { print(F) } } fntest() [1] F -- David Cheers, Kara From: Bert Gunter [gunter.ber...@gene.com] Sent: June 23, 2011 8:27 AM To: David Winsemius Cc: Kara Przeczek; r-help@r-project.org Subject: Re: [R] else problem Perhaps some additional clarification... (???) if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { If this line is executed at a console session it will fail because the interpreter does not keep a copy of the last condition. If you moved the closing curley-brace to just befor the 'else', you should get the behavior you expect (if your other code is correct): Well, um.. not sure if this is what you meant, but what is happening at the console is that when you type return, the interpreter checks for a syntactically complete statement. If it finds what has been given to it **thus far** is, it tries to execute it (if not, it gives a continuation character and waits for more input) and, as you said, then starts anew to interpret the next line(s) entered, forgetting all previous. The problem above is that the if() statement up to the close bracket, } is syntactically complete, and so the else{ that follows makes no sense as the beginnig of a new line to be interpreted. The simplest and universal solution to this is to simply enclose the whole conditional in { }: {if(length ... ... else {...} } This forces the interpreter to wait for the last } before it will interpret and execute. Hoping this clarifies rather than obfuscates. -- Bert Try instead: if (length(dat2f$year)%%2==0) { md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Each individual piece works perfectly on its own, but together I get the following error: if (length(dat2f$year)%%2==0) { + md -dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] } else { Error: unexpected 'else' in else md -dat2f[, list(med_year = year[which(tot_km3y == median(tot_km3y))], med_TotQ = median(tot_km3y))] } Error: unexpected '}' in } When I tried to look up else I got this error: ?else Error: unexpected 'else' in ?else Try instead: ?else -- David I have used exactly the same set up with if...else in other code and it worked fine then. I tried to run it again, and I got the same error as above. What is the problem? I hope it isn't something simple and silly! I realize that I can use the first line: md - dat2f[, list(med_year = max(year[which(abs(tot_km3y - median(tot_km3y)) == min(abs(tot_km3y - median(tot_km3y ]), med_TotQ = median(tot_km3y))] for all data sets and it will give me the median for both odd and even-length data sets, but it is now about the principle; why won't the if...else work? Thank you very much for your time! Kara David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list
Re: [R] split dataframe by sample()
It's very simple to do this in steps: # to make separate dataframes df - data.frame(A=1:5, B=11:15) df.sample - c(1,3,4) df[df.sample, ] A B 1 1 11 3 3 13 4 4 14 df[-df.sample, ] A B 2 2 12 5 5 15 # or a list with two components split(df, 1:nrow(df) %in% df.sample) $`FALSE` A B 2 2 12 5 5 15 $`TRUE` A B 1 1 11 3 3 13 4 4 14 Sarah On Thu, Jun 23, 2011 at 12:12 PM, Paul Tanger paul.tan...@colostate.edu wrote: Hi, I seemingly have a simple problem, but I've spend hours reading guides posts on this forum and I can't seem to piece together what I need. I have a dataframe where I want to divide it into two subsets: a sample, and the remainder of the dataframe in a new frame. I've tried this: split(df, sample(nrow(df), size=100, replace=FALSE)) another way would be to make a new dataframe of my sample and (something I can do in SQL but not R) then select rows that are NOT in the sample dataframe. Thanks for any help! [[alternative HTML version deleted]] -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Saved EPS does not match screen when using bquote(.(i))
I think that there may be a problem with the way bquote(), for() and savePlot() play together in the OP's example (multiple plots on a windows device; bquote using the loop index). Here's a version using replayPlot(): ## show mu with subscripts 4 and 9: x11() par(mfrow = c(2,1)) for (i in c(4, 9)) { plot(0, 0, main = bquote(mu[.(i)])) } ## now record and replay: z - recordPlot() replayPlot(z) ## both subscripts are now 9 Simple workarounds are: 1. As Uwe and Dennis show, for saving to file, use the appropriate device. 2. For recalling plots (e.g. with recording turned on), a) use substitute() instead of bquote() or b) insert something like i - i before the plot() call. sessionInfo() R version 2.13.0 Patched (2011-05-24 r55981) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C [5] LC_TIME=English_Canada.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base Peter Ehlers On 2011-06-22 22:14, Dennis Murphy wrote: Hi: As Uwe suggested... pdf('testgraph.pdf') layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } dev.off() postscript('testgraph.ps') layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } dev.off() png('testgraph.png') layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } dev.off() The three graphs look the same (although the PS graph is rotated to landscape while the other two are portrait). The main point is that mu_1 and mu_2 show up correctly in the two panels in all three graphs (at least on my viewers). The following thread from last January describes some of the problems that certain viewers have with Greek letters, which appear to be viewer and platform dependent: http://r-project.markmail.org/search/?q=pdf%20incorrect#query:pdf%20incorrect+page:2+mid:egmb6utulrxgcznw+state:results I'm guessing that I've seen about a half dozen or so similar posts in this forum over the past year and a half, so you can check the list archives for related problems. HTH, Dennis On Wed, Jun 22, 2011 at 8:07 PM, John Kruschkejohnkrusc...@gmail.com wrote: Here's a fairly minimal-case example in which the saved EPS does not match the screen. The error comes when using bquote(.(i)) instead of bquote(1), as demonstrated by the two minimally different cases below. Very strange. Any clues as to why? # begin --- # Version A. X axis labels have subscripts as constants. EPS is correct. windows() layout( matrix( 1:2 , nrow=2 ) ) plot( 0 , 0 , xlab=bquote(mu[1]) ) plot( 0 , 0 , xlab=bquote(mu[2]) ) savePlot( file=SavePlotTestA.eps , type=eps ) # Axis labels are correct in EPS. # Version B. X axis labels have subscripts as variable index. EPS is wrong! windows() layout( matrix( 1:2 , nrow=2 ) ) for ( i in 1:2 ) { plot( 0 , 0 , xlab=bquote(mu[.(i)]) ) } savePlot( file=SavePlotTestB.eps , type=eps ) # X-AXIS OF PLOT 1 IS WRONG IN EPS. #-- end - Thanks! John K. Kruschke, Professor http://www.indiana.edu/%7Ekruschke/DoingBayesianDataAnalysis/ 2011/6/22 Uwe Liggeslig...@statistik.tu-dortmund.de On 22.06.2011 13:50, John Kruschke wrote: The error happens when using the savePlot() command, like this: savePlot( file=TestSavePlot.eps , type=eps ) savePlot( file=TestSavePlot.jpg , type=jpg ) Well, plot directly into a device, for postscript: postscript(estSavePlot.eps, additionalArguments .) plot(1:10) dev.off() Uwe Ligges The images in the two saved files are not the same, with the JPG being correct but the EPS being wrong. When you suggest starting separate devices explicitly, what do you mean? (I've skimmed through the results of ??device, but can't make sense of it.) Thank you! John K. Kruschke, Professor 2011/6/22 Uwe Liggeslig...@statistik.tu-**dortmund.delig...@statistik.tu-dortmund.de I guess you use the menu to save the plots from your Windows device into files rather than starting separate devices explicitly? If so, please use explicit calls to the devices and everything happens as you define it. Uwe Ligges On 22.06.2011 04:31, John Kruschke wrote: When I save a particular plot as JPG or BMP, the saved image is an accurate copy of that plot. But when I save the same plot as EPS or PDF, some text elements are incorrectly altered. An example is attached. Notice in the top middle and top right panels, the x-axis labels have correct subscript 1 in the JPG, but incorrect subscript 2 in the EPS. I'm using R 2.13.0 on Windows 7. Any clues regarding the source of this error and its solution would be appreciated. For example, are there EPS/PDF device drivers that need to be separately updated? Many thanks. John K. Kruschke, Professor
[R] Lattice xyplot to group by two parameters
My thanks to this mailing list and its members for their great help in the past. I have yet another question per the following code and comments: # I need individual graphs grouped by PARLABEL AND Event, with PARLABEL # controlling pct and lty, and Event controlling col (where Event==1 is green # and Event==2 is red). This is attempted in the xyplot parameter # groups = c(PARLABEL, Event). Current behavior appears to be that the # colors are controlled by PARLABEL along with the shape and line type. Some # experimentation with those two parameters (reverse order, one or the other only) # suggests that the second parameter is ignored. # # I have also experimented with panel.superpose, subsetting, and Extended formula # interface, as appears to be suggested by the man pages, but to no avail. # # Note that I can get the graphics layout I need with ggplot2, but that has its # own issues with secondary axis and subtitle text placement, formatting, and # pagination. # # Example script below. Datafile included. depthlimit=c(7,-1) xlimit=c(0.1,125000) TransectOrder = c(LLR19, LLR18, LLR17) OffsetOrder = c(T, U, V, Y, Z, A, B, C, D, E, F, G, H) # Parse the data for these plots - df - read.csv(file = 5.04-r02_LTC-SE-SO-Compared.csv) df$Transect - factor(df$Transect, levels = TransectOrder) levels(df$Transect) df$Offset - factor(df$Offset, levels = OffsetOrder) levels(df$Offset) xyplot((sbd+sed)/2 ~ Result | Offset+Transect, groups = c(PARLABEL, Event), as.table = TRUE, drop.unused.levels = FALSE, data = df, layout = c(13,3), type = b, xlim = c(0.01, 122000), ylim = depthlimit, pch =c(16, 18), cex = c(1, 1), # to be controlled by PARLABEL lty =c(dashed, solid), # to be controlled by PARLABEL col = c(green, red),# to be controlled by Event scales = list(alternating = 1, x = list(rot = 45, log = TRUE))) Yours, Guy Jett, R.G. Project Geologist ITSI, A Gilbane Company gj...@itsi.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Glmnet Variable Questions
Hi all, I have two questions about variables in glmnet: 1. We are doing a logistic regression with binary outcome variable using a set of predictors that include continuous and binary predictors(coded 0 and 1). If the latter are centered and standardized, they will be transformed into negative and positive numbers; when multiplied by a single beta, I believe they will have undesirable effects. Is there a way to standardize only specified variables? Alternatively, should glmnet be run with manually centered and standardized continuous variables, binary variables coded 0 and 1, and with standardize = FALSE. 2. We have predictors with missing values. We have been handling these by creating a dummy variable for the predictor with a value of 0 if a value is present and 1 if a value is absent. If the model is forced to include both the predictor and the dummy variable, the model-assigned coefficient will effectively interpolate for the missing value. How can I force the dummy variable to be included in glmnet whenever the predictor variable is included? -- View this message in context: http://r.789695.n4.nabble.com/Glmnet-Variable-Questions-tp3620379p3620379.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding the levels in a data frame column
Try levels(df$city) or unique(df$city) depending on if it is a factor (default) or character string. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Abraham Mathew Sent: Thursday, June 23, 2011 10:01 AM To: r-help@r-project.org Subject: [R] Finding the levels in a data frame column I have a data frame that looks as follows. df - data.frame(city=c(Houston, Houston, El Paso, Waco, Houston, Plano, Plano) What I want to do is get a list of the city values. Currently, when I run df$city, I get all the values. I just want to know the four cities that appear. So instead of: Houston, Houston, El Paso, Waco, Houston, Plano, Plano I want: Houston, El Paso, Waco, Plano I'm running R 2.13 on Ubuntu 10.10 Abraham [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Finding the levels in a data frame column
I have a data frame that looks as follows. df - data.frame(city=c(Houston, Houston, El Paso, Waco, Houston, Plano, Plano) What I want to do is get a list of the city values. Currently, when I run df$city, I get all the values. I just want to know the four cities that appear. So instead of: Houston, Houston, El Paso, Waco, Houston, Plano, Plano I want: Houston, El Paso, Waco, Plano I'm running R 2.13 on Ubuntu 10.10 Abraham [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merging multiple data sets
Hi, How can I best do this in R? I've looked into merge but it excludes ids that aren't in all 3 data sets. You need to look a bit harder at merge(), specifically the all.x and all.y options. Sarah On Thu, Jun 23, 2011 at 12:53 PM, cddesjar cddesjard...@gmail.com wrote: Hi, I am trying to merge data similar to the example data below dat0 id var1 var2 var3 2 1 0 1 3 1 0 1 4 0 1 1 5 0 1 1 dat1 id var4 var5 var6 2 1 0 1 3 1 0 1 6 0 1 1 7 0 1 1 dat2 id var7 var8 var9 2 1 0 1 5 1 0 1 6 0 1 1 8 0 1 1 Basically what I'd like to do is combine these variables on id and create one large data frame that looks like the following. dat3 id var1 var2 var3 var4 var5 var6 var7 var8 var9 2 1 0 1 1 0 1 1 0 1 3 1 0 1 1 0 1 NA NA NA 4 0 1 1 NA NA NA NA NA NA 5 0 1 1 NA NA NA NA NA NA 6 NA NA NA 0 1 1 0 1 1 7 NA NA NA 0 1 1 NA NA NA 8 NA NA NA NA NA NA 0 1 1 How can I best do this in R? I've looked into merge but it excludes ids that aren't in all 3 data sets. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merging multiple data sets
Hi: Try this: merge(merge(dat0, dat1, by = 'id', all.x = TRUE, all.y = TRUE), dat2, by = 'id', all.x = TRUE, all.y = TRUE) Dennis On Thu, Jun 23, 2011 at 9:53 AM, cddesjar cddesjard...@gmail.com wrote: Hi, I am trying to merge data similar to the example data below dat0 id var1 var2 var3 2 1 0 1 3 1 0 1 4 0 1 1 5 0 1 1 dat1 id var4 var5 var6 2 1 0 1 3 1 0 1 6 0 1 1 7 0 1 1 dat2 id var7 var8 var9 2 1 0 1 5 1 0 1 6 0 1 1 8 0 1 1 Basically what I'd like to do is combine these variables on id and create one large data frame that looks like the following. dat3 id var1 var2 var3 var4 var5 var6 var7 var8 var9 2 1 0 1 1 0 1 1 0 1 3 1 0 1 1 0 1 NA NA NA 4 0 1 1 NA NA NA NA NA NA 5 0 1 1 NA NA NA NA NA NA 6 NA NA NA 0 1 1 0 1 1 7 NA NA NA 0 1 1 NA NA NA 8 NA NA NA NA NA NA 0 1 1 How can I best do this in R? I've looked into merge but it excludes ids that aren't in all 3 data sets. -- View this message in context: http://r.789695.n4.nabble.com/Merging-multiple-data-sets-tp3620431p3620431.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Ranking submodels by AIC (more general question)
Here's a more general question following up on the specific question I asked earlier: Can anybody recommend an R command other than mle.aic() (from the wle package) that will give back a ranked list of submodels? It seems like a pretty basic piece of functionality, but the closest I've been able to find is stepAIC(), which as far as I can tell only gives back the best submodel, not a ranking of all submodels. Thanks in advance, Alexandra __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] two panel figure using barplot2
I am trying to make a two panel figure using barplot2 in gplots. When I use par(mfrow=c(1,2)), two panels are created but both barplots are overlaid on top of one another in the first panel and the second panel is left blank. Am I running into problems because I of some complexities of gplots? Is there a simple way to make a two panel figure with two barplots side by side using barplot2? Thank you, Adrienne Keller Graduate Student, College of Forestry University of Montana adrienne.kel...@umontana.edu (Phone) 651-485-5822 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help using cutreeHybrid
I am using the function cutreeHybrid from the package dynamic Tree Cut and I need a list of the resulting clusters but I do not know how to get it. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] subset
Dear all, How can I do the subset fucntion from the table? I want to do the subset for the less than 50. I tried b8a-subset(b8, (table(g$book)50)==TRUE) but it didn't work. Thanks. table(g$ book ) 119 121 134 160 161 170 175 179 190 193 225 226 256 260 130 89 50 8774 23 8547 93 64 1669 48 38 -- View this message in context: http://r.789695.n4.nabble.com/subset-tp3620557p3620557.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help- subtitles for multiple charts
Hello, I have a problem with my script. I don'y know how to apply subtitles. I have 9 charts per page (for combination of 3 blocks and 3 treatments). I want to have subtitles for this interaction (e.g. Block A Trt 1, Block A Trt 2, ...) MBE$bt- interaction(MBE$Block,MBE$trt) par(mfrow=c(3,3)) for(i in unique(MBE$bt)){ ss - MBE$bt==i plot(MBE$Year[ss], MBE$DBH[ss]) sm-loess(DBH~Year, data=pMBE[ss,]) x=seq(2004, 2010, by=1) points(x, predict(sm, data.frame(Year=x)), type=l) } It was possible for the command: MBE$trt.name- factor(MBE$trt, label=c(Treatment 1, Treatment 2, Treatment 3)) xyplot(MBE$DBH~MBE$Year|MBE$trt.name*MBE$Block, xlab=Year, ylab=DBH)- but I need it for the upper script, cuz the diagram is more appropriate. I would be grateful for any advise! -- View this message in context: http://r.789695.n4.nabble.com/help-subtitles-for-multiple-charts-tp3620453p3620453.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Merging multiple data sets
Hi, I am trying to merge data similar to the example data below dat0 idvar1var2var3 2 1 0 1 3 1 0 1 4 0 1 1 5 0 1 1 dat1 idvar4var5var6 2 1 0 1 3 1 0 1 6 0 1 1 7 0 1 1 dat2 idvar7var8var9 2 1 0 1 5 1 0 1 6 0 1 1 8 0 1 1 Basically what I'd like to do is combine these variables on id and create one large data frame that looks like the following. dat3 idvar1var2var3 var4var5 var6 var7 var8 var9 2 1 0 1 1 01 10 1 3 1 0 1 1 01 NA NA NA 4 0 1 1 NA NA NANA NA NA 5 0 1 1 NA NA NANA NA NA 6 NA NA NA 0 11 0 11 7 NA NA NA 0 11 NA NA NA 8 NA NA NA NA NA NA 011 How can I best do this in R? I've looked into merge but it excludes ids that aren't in all 3 data sets. -- View this message in context: http://r.789695.n4.nabble.com/Merging-multiple-data-sets-tp3620431p3620431.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] caret's Kappa for categorical resampling
Yes, that's true. On a test set, the highest probability of being in the smaller class is about 40%. (Incidentally, accuracy on the test set is much higher when I use the best-according-to-Kappa model instead of the best-according-to-Accuracy model.) It looks like the ctree() method supports weights, but all it does is multiply the class likelihoods, which isn't what I want. (That is, if I assign a weight of 2 to all of the small-class instances, it generates the same model, but says that the likelihood for the most-confident instances is about 80% instead of 40%!) I'm still not really understanding why Kappa is not acting like a positive monotonic function of Accuracy, though. Thanks! On Wed, Jun 22, 2011 at 8:12 PM, kuhnA03 max.k...@pfizer.com wrote: Harlan, It looks like your model is predicting (almost) everything to be the majority class (accuracy is almost the same as the largest class percentage). Try setting a test set aside and use confusionMatrix to look at how the model is predicting in more detail. You can try other models that will let you weight the minority class higher to get a more balanced prediction. Max On 6/22/11 3:37 PM, Harlan Harris har...@harris.name wrote: Hello, When evaluating different learning methods for a categorization problem with the (really useful!) caret package, I'm getting confusing results from the Kappa computation. The data is about 20,000 rows and a few dozen columns, and the categories are quite asymmetrical, 4.1% in one category and 95.9% in the other. When I train a ctree model as: model - train(dat.dts, dat.dts.class, method='ctree', tuneLength=8, trControl=trainControl(number = 5, workers=1), metric='Kappa') I get the following puzzling numbers: mincriterion Accuracy Kappa Accuracy SD Kappa SD 0.01 0.961 0.0609 0.00151 0.0264 0.15 0.962 0.049 0.00116 0.0248 0.29 0.963 0.0405 0.00227 0.035 0.43 0.964 0.0349 0.00257 0.0247 0.57 0.964 0.0382 0.0022 0.0199 0.71 0.964 0.0354 0.00255 0.0257 0.85 0.964 0.036 0.00224 0.024 0.99 0.965 0.0091 0.00173 0.0203 (mincriterion determines the likelihood of accepting a split into the tree.) The Accuracy numbers look sorta reasonable, if not great; the model overfits and barely beats the base rate if it builds a complicated tree. But the Kappa numbers go the opposite direction, and here's where I'm not sure what's going on. The examples in the vingette show Accuracy and Kappa being positively correlated. I thought Kappa was just (Accuracy - baserate)/(1 - baserate), but the reported Kappa is definitely not that. Suggestions? Aside from looking for a better model, which would be good advice here, what metric would you recommend? Thank you! -Harlan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help- subtitles for multiple charts
On Jun 23, 2011, at 1:00 PM, jalen wrote: Hello, I have a problem with my script. I don'y know how to apply subtitles. I have 9 charts per page (for combination of 3 blocks and 3 treatments). I want to have subtitles for this interaction (e.g. Block A Trt 1, Block A Trt 2, ...) MBE$bt- interaction(MBE$Block,MBE$trt) par(mfrow=c(3,3)) for(i in unique(MBE$bt)){ ss - MBE$bt==i plot(MBE$Year[ss], MBE$DBH[ss]) If this is working for you properly except for the subtitles ... Perhaps: plot(MBE$Year[ss], MBE$DBH[ss] sub=levels(MBE$bt)[i]) But I'm not sure you have properly separated the values of Year and DBH into categories of 'bt'. Perhaps that logical ss vector will work. Cannot tell in absence of data. sm-loess(DBH~Year, data=pMBE[ss,]) x=seq(2004, 2010, by=1) points(x, predict(sm, data.frame(Year=x)), type=l) } It was possible for the command: MBE$trt.name- factor(MBE$trt, label=c(Treatment 1, Treatment 2, Treatment 3)) xyplot(MBE$DBH~MBE$Year|MBE$trt.name*MBE$Block, xlab=Year, ylab=DBH)- but I need it for the upper script, cuz the diagram is more appropriate. I would be grateful for any advise! My advice: Post reproducible example. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two panel figure using barplot2
On Jun 23, 2011, at 1:19 PM, Adrienne Keller wrote: I am trying to make a two panel figure using barplot2 in gplots. When I use par(mfrow=c(1,2)), two panels are created but both barplots are overlaid on top of one another in the first panel and the second panel is left blank. Am I running into problems because I of some complexities of gplots? Is there a simple way to make a two panel figure with two barplots side by side using barplot2? Thank you, Adrienne Keller Hi, barplot2() from the gplots CRAN package is built upon the standard barplot() R function. I can run the following code here just fine: require(gplots) par(mfrow = c(1, 2)) barplot2(1:5) barplot2(6:10) and end up with side by side plots. If you post a minimal example of the code that you are using that reproduces the error, we can help you better in trying to determine what may be happening on your end. Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] subset
On Jun 23, 2011, at 1:47 PM, yf wrote: Dear all, How can I do the subset fucntion from the table? What table? I want to do the subset for the less than 50. I tried b8a-subset(b8, (table(g$book)50)==TRUE) but it didn't work. And you neither shown us what these structures are nor posted an error message. Thanks. table(g$ book ) The subset function is for dataframes. Tables are essentially matrices. Use logical indexing with [. 119 121 134 160 161 170 175 179 190 193 225 226 256 260 130 89 50 8774 23 8547 93 64 1669 48 38 The top row are names and the bottom row are values Perhaps (but how b8 enters into this remains unclear) : tbl - table(g$ book ) tbl[ tbl 50] -- View this message in context: http://r.789695.n4.nabble.com/subset-tp3620557p3620557.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] concordance
Is there any package for computing concordance coefficient of incomplete ranking? If there is not, please help me to write it. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ranking submodels by AIC (more general question)
Alexandra, Have a look at add1 and drop1. Regards, Jan On 06/23/2011 07:32 PM, Alexandra Thorn wrote: Here's a more general question following up on the specific question I asked earlier: Can anybody recommend an R command other than mle.aic() (from the wle package) that will give back a ranked list of submodels? It seems like a pretty basic piece of functionality, but the closest I've been able to find is stepAIC(), which as far as I can tell only gives back the best submodel, not a ranking of all submodels. Thanks in advance, Alexandra __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help using cutreeHybrid
On Thu, Jun 23, 2011 at 11:19 AM, Estefania Ruiz Vargas eruiz...@uwo.ca wrote: I am using the function cutreeHybrid from the package dynamic Tree Cut and I need a list of the resulting clusters but I do not know how to get it. Hi, I'm the author of the package. The function returns a list whose component 'labels' contains the cluster assignment (label) of all clustered objects. So you can call the function, say as cth = cutreeHybrid(myTree, ) and the labels are labels = cth$labels You can also use the function cutreeDynamic which (with default arguments) is a wrapper for cutreeHybrid and returns just the cluster labels. HTH, Peter [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Sent from my Linux computer. Way better than iPad :) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ranking submodels by AIC (more general question)
Thanks for the suggestion. Those functions only provide part of the functionality I want. After a great deal more of drawing the internet, I've concluded that the correct answer to my question is dredge() from the package MuMIn. It seems to use the same AIC algorithm as AIC, which is perfect for making comparisons! Thanks again to everybody who's tried to help me out on this! Alexandra On Thu, 2011-06-23 at 21:29 +0200, Jan van der Laan wrote: Alexandra, Have a look at add1 and drop1. Regards, Jan On 06/23/2011 07:32 PM, Alexandra Thorn wrote: Here's a more general question following up on the specific question I asked earlier: Can anybody recommend an R command other than mle.aic() (from the wle package) that will give back a ranked list of submodels? It seems like a pretty basic piece of functionality, but the closest I've been able to find is stepAIC(), which as far as I can tell only gives back the best submodel, not a ranking of all submodels. Thanks in advance, Alexandra __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Loops, Paste, Apply? What is the best way to set up a list of many equations?
Is there a way to apply paste to list(form1 = EQ1, form2 = EQ2, form3 = EQ3, form4 = EQ4) such that I don't have to write form1=EQ1 for all my models (I might have a list of 20 or more)? I also need the EQs to read the formulas associated with them. For example, below, I was able to automate the name assignment but I could not figure out how to to set up the list using paste or other functions: V2system-c(formula(paste(form1,=,EQ1)), formula(paste(form2,=,EQ2)), formula(paste(form3,=,EQ3)), formula(paste(form4,=,EQ4)))names(V2system)-paste(form,1:4,sep=) Any ideas?Thank you so much in advance for any help provided. Rita HERE'S MY SAMPLE PROGRAM YX-as.data.frame(matrix(rnorm(280),ncol=14, nrow=20)) ## generate variablesnames(YX) -c(paste(Y, 1:4, sep=), ## assign Y variables' names paste(X, 1:10, sep=)) ## assign X variables' names EQ1 - Y1 ~ X1 + X2 + X4 + X7 + X10 ## equation 1 formulaEQ2 - Y2 ~ X2 + X3 + X5 + X8 + X10 ## equation 2 formulaEQ3 - Y3 ~ X5 + X6 + X7 + X9 ## equation 3 formulaEQ4 - Y4 ~ X1 + X3 + X4 + X6 + X9 ## equation 4 formula ## VERSION 1 FOR SETTING UP THE LIST:V1system -list(form1 = EQ1, form2 = EQ2, form3 = EQ3, form4 = EQ4)V1system ## VERSION 2 FOR SETTING UP THE LIST:V2system-c(formula(paste(form1,=,EQ1)), formula(paste(form2,=,EQ2)), formula(paste(form3,=,EQ3)), formula(paste(form4,=,EQ4)))names(V2system)-paste(form,1:4,sep=)V2system HERE ARE MY RESULTS ## RESULTS OF VERSIONS 1 AND 2 ARE EXACTLY THE SAME: # $form1# Y1 ~ X1 + X2 + X4 + X7 + X10 # $form2# Y2 ~ X2 + X3 + X5 + X8 + X10 # $form3# Y3 ~ X5 + X6 + X7 + X9 # $form4# Y4 ~ X1 + X3 + X4 + X6 + X9 ## STRUCTURE OF THE LISTS IS SAME FOR BOTH VERSIONS AS WELL: # structure(list(form1 = quote(Y1 ~ X1 + X2 + X4 + X7 + X10), # form2 = quote(Y2 ~ X2 + X3 + X5 + X8 + X10), # form3 = quote(Y3 ~ X5 + X6 + X7 + X9), # form4 = quote(Y4 ~ X1 + X3 + X4 + X6 + X9)), # .Names = c(form1, form2, form3, form4)) Rita = If you think education is expensive, try ignorance.--Derek Bok __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rms package - problems with fit.mult.impute
There is a problem passing x in the ... arguments to fit.mult.impute when the function has a formal argument starting with x (xtrans). To get around this specify xtrans=a to fit.mult.impute instead of just listing a. Frank Lina Hellström wrote: Hi! Does anyone know how to do the test for goodness of fit of a logistic model (in rms package) after running fit.mult.impute? I am using the rms and Hmisc packages to do a multiple imputation followed by a logistic regression model using lrm. Everything works fine until I try to run the test for goodness of fit: residuals(type=c(gof)) One needs to specify y=T and x=T in the fit. But I get a warning message when I do that with fit.multiple.impute. a-aregImpute(~med.hist.err+ med.discr+newLiving+No.drugs+Days.categ+Los+Age+Ward+Sex, n.impute=20, nk=0,data=med.err) ddist-datadist(Age,No.drugs,Days.categ, Sex, Living, Ward) options(datadist=ddist) fmi-fit.mult.impute(med.hist.err~Age+No.drugs+Days.categ+Sex+Living+Ward, fitter=lrm, x=T, y=T,a,data=med.err) Error in 1:n.impute : NA/NaN argument In addition: Warning message: In 1:n.impute : numerical expression has 18 elements: only the first used It works to do the fit.mult.impute without x and y=T but then I get the following warning message when running residuals gof-residuals(fmi, type=c(gof)) Error in residuals.lrm(fmi, type = c(gof)) : you did not specify y=T in the fit It was no problem to do the goodness of fit test when I ran the lrm on my complete data set without multiple imputation and fit.mult.impute. model.lrm-lrm(med.hist.err~Age+No.drugs+Days +Sex+Living+Ward, x=TRUE, y=TRUE) gof-residuals(model.lrm, type=c(gof)) Thanks Lina _ PhD student Linnaeus University Sweden [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Rms-package-problems-with-fit-mult-impute-tp3619814p3621045.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] reading the results of a within subject test
Hi, I ran the following in R (on item means): aov(VAR ~(a*b)+Error(item/(a*b)), data = item) I got this result: Error: item Df Sum SqMean Sq F value Pr(F) a 1 7.7249e+13 7.7249e+13 11.329 0.003934 ** Residuals 16 1.0910e+14 6.8187e+12 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: item:a Df Sum SqMean Sq F value Pr(F) a 1 1.1218e+14 1.1218e+14 8.6909 0.009449 ** b 1 2.7264e+10 2.7264e+10 0.0021 0.963912 Residuals 16 2.0653e+14 1.2908e+13 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: item:b Df Sum SqMean Sq F value Pr(F) b 1 6.0538e+13 6.0538e+13 2.7157 0.11886 a:b 1 1.7894e+14 1.7894e+14 8.0272 0.01199 * Residuals 16 3.5666e+14 2.2292e+13 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: item:a:b Df Sum SqMean Sq F value Pr(F) a:b 1 4.9726e+12 4.9726e+12 0.629 0.4461 Residuals 10 7.9061e+13 7.9061e+12 and I'm not sure what is what in there. Is it reporting a between analysis as well?Why does the interaction between a and b appear several times? When I ran the anova I did get a warning message (In aov(VAR ~ (a * b) + Error(item/(a * b)), : Error() model is singular) I'm confused... Thanks! -- View this message in context: http://r.789695.n4.nabble.com/reading-the-results-of-a-within-subject-test-tp3621044p3621044.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using 'foreach' for parallel computing
Hi all, I'm currently working on updating an R package to run for loops in parallel to speed up computation time. I'm using the 'foreach' package with a foreach loop. When I run my code inside the loop, I get an error message that a number of the functions aren't recognized (even though the functions have been defined); if I call my R package inside the loop, everything runs properly. As you can see, this is a problem since I can't call my own package in the source code for the package itself! Any advice would be appreciated. Thanks, Stacey [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gcc-4.5.2 and install.packages(glmnet)?
Hi, is there any chance to install glmnet with gcc-4.5.2? For me it fails on all systems with: trying URL 'http://mirrors.softliste.de/cran/src/contrib/glmnet_1.7.tar.gz' Content type 'application/x-gzip' length 522888 bytes (510 Kb) opened URL == downloaded 510 Kb * installing *source* package ‘glmnet’ ... This package has only been tested with gfortran. So some checks are needed. R_HOME is /usr/lib64/R Attempting to determine R_ARCH... R_ARCH is Attempting to detect how R was configured for Fortran 90 Unsupported Fortran 90 compiler or Fortran 90 compilers unavailable! Stop! ERROR: configuration failed for package ‘glmnet’ How can I get more information, what a problem has happened? Otherwise I have no problem to compile a small Fortran 90 program: root@orca:/home/rose/Txt/src/Test/Fortran(57)# cat hello.f90 program Hello print *,'Hello world' end program Hello root@orca:/home/rose/Txt/src/Test/Fortran(58)# gfortran -cpp -dM hello.f90 root@orca:/home/rose/Txt/src/Test/Fortran(59)# ./a.out Hello world __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time-series analysis with treatment effects - statistical approach
Mike Marchywka wrote: I discovered a way to do repetitive tasks that can be concisely specified using something called a computer. Now that's funny :) There were not controlled tests. It was a field experiment testing the effects that various pavement designs have on underlying soil moisture. Two designs incorporated a porous pavement surface course, while two others were based on standard impervious concrete pavement...the control was just bare, exposed soil. As you can see from the graph, the control responds quickly to rainfall events, but dries out quickly as well due to evaporation. The porous pavement allows for quick infiltration of precipitation, while the impervious pavement eventually allows infiltration of rainfall, but it's delayed. My objective is to be able to differentiate between the pavement treatments, such that I can state with statistical confidence that porous pavements affects underlying soil moisture differently than impervious pavements. I think this is obvious just looking at it, but I wanted to be able to back it up with stats. What I'd done previously is to average by week. But as I mentioned, I thought that an anova table with 104 rows relating to each week was a poor way of analyzing the data. But that being said, it effectively allows me to check for treatment-related differences. Thanks for the suggestions to date. Maybe the more I explain what I'm trying to achieve, the more focussed the suggestions will be. The vaguer the question, the broader the response, right? Thanks again, Justin -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3621179.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] packages which call functions which run C code...
I am looking at the function ks.test in the stats package and trying to figure out why it gives a different result for a p-value than does the corresponding function in MATLAB. I am hoping for one of two responses: 1. You know about ks.tests and have a familiarity with both the R and MATLAB algorithms and can tell me why this should be happening. 2. You know where I can find the C code so that I can decipher it. Thanks, -Steve Wolf [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Generate the next column from previous column
Hi, I'm quite new to R and are stuck with the following problem. Lets say I have a column consisting of a 1 and the rest zero's, called G0. G0 - c(1,rep(0,5)) Now what I would like to do is to generate G1 from G0, and G2 from G1 etc... Just for the simplicity, let's say I need the first entry of the column to be increased by 5 each time. How could I do this? Thanks already! Kind regards, Michael -- View this message in context: http://r.789695.n4.nabble.com/Generate-the-next-column-from-previous-column-tp3621051p3621051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] trying to import xls or xlsx files
library(xlsReadWrite) mydata-read.xls(file path, header=TRUE) however if I change xls to csv it works just fine. Any ideas what I'm doing wrong? I have have also using the package gdata with the exact same error. Below is the error that pops up. Error in findPerl(verbose = verbose) : perl executable not found. Use perl= argument to specify the correct path. Error in file.exists(tfn) : invalid 'file' argument -- View this message in context: http://r.789695.n4.nabble.com/trying-to-import-xls-or-xlsx-files-tp3620580p3620580.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] change plot area in barplot2
I would like to change the plot area for a figure I made with barplot2 so that it is rectangular (for example, 5 x 3 inches) rather than square, which is the default. What is the best way to do this? Thanks, Adrienne __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] maximizing the LDV model
Hi Please find attached document to see my problem that I wish you would assist me to solve. I want to maximise the LDV model to get alpha estimates. I am looking forward to hearing from you soon. Edward Actuarial Science student __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help- subtitles for multiple charts
http://r.789695.n4.nabble.com/file/n3620982/mbeFORUM.csv mbeFORUM.csv I uploaded my data and one more time the script (*adjusted version*): pMBE- MBE[MBE$left!=0,] pMBE$bt- interaction(pMBE$Block,pMBE$trt) par(mfrow=c(3,3), oma=c(2,0,2,0)) for(i in unique(pMBE$bt)){ ss - pMBE$bt==i plot(pMBE$Year[ss], pMBE$DBH[ss], xlab=Year, ylab=DBH [in], ylim=range(0:12), col=28) sm-loess(DBH~Year, data=pMBE[ss,]) x=seq(2004, 2010, by=1) points(x, predict(sm, data.frame(Year=x)), type=l, col=650, title(Diameter, adj=0.5, outer=T)) } -- View this message in context: http://r.789695.n4.nabble.com/help-subtitles-for-multiple-charts-tp3620453p3620982.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Estimating choice models at the individual level
The bayesm package does HB models for discrete choice. In practice, a hurdle is likely to be translating your data (if it's fielded in Sawtooth Software) to the correct format for bayesm (which uses a list format instead of a matrix, and difference coding instead of dummy coding). (BTW, note another R list: R-SIG-DCM, which you might consider joining if you're not already a member. https://stat.ethz.ch/mailman/listinfo/r-sig-dcm ) Good luck, -- Chris -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Data Analytics Corp. Sent: Tuesday, June 21, 2011 7:45 AM To: r-help@r-project.org Subject: [R] Estimating choice models at the individual level Hi, I have a discrete choice model to estimate for a client that I originally planned to estimate as an aggregate model using a clogit routine. Now the client is asking for results for many segments of the respondents which would mean, if I stayed with my original plan, I would have to estimate a large number of models. I could certainly do this, but I'm thinking that it would be better to estimate a Hierarchical Bayes model to get coefficients for each individual respondent. This way, I could pull out the people I need for a particular segment and use just those coefficients. Sawtooth's program for MaxDiff can do this. Is there any R package to do an HB estimation for a discrete choice (logit) model? Thanks, Walt Walter R. Paczkowski, Ph.D. Data Analytics Corp. 44 Hamilton Lane Plainsboro, NJ 08536 (V) 609-936-8999 (F) 609-936-3733 w...@dataanalyticscorp.com www.dataanalyticscorp.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about Software for Data analysis book
the answer to the missing mars data can be found on pages 176/177 of the referenced book. -- View this message in context: http://r.789695.n4.nabble.com/question-about-Software-for-Data-analysis-book-tp1043713p3621277.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] new to R need urgent help!
hi all- I am doing some research, have never used R before until today and need to understand the following program for a project. if some one could PLEASE help me understand this program ASAP i would GREATLY appreciate it (any syntax/ statistic comments would be great) PLEASE PLEASE HELP!! THANKYOU!!! -on a side note, it seems to me that R doesnt include the pv, and it was calculated seperatly, is this true? fit=gee(foci~as.factor(time)*cond,id=exper,data=drt,family=poisson(link = log)) Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27 running glm to get initial regression estimate (Intercept) as.factor(time)24 3.051177 -2.705675 condHypoxia as.factor(time)24:condHypoxia -0.402259 1.429034 pv=2*(1-pnorm(abs(summary(fit)$coef[,5]))) data.frame(summary(fit)$coef,pv) Estimate Naive.S.E. Naive.z Robust.S.E. Robust.z (Intercept)3.051177 0.02221052 137.37527 0.04897055 62.306363 as.factor(time)24 -2.705675 0.10890056 -24.84537 0.19987174 -13.537057 condHypoxia -0.402259 0.03907961 -10.29332 0.10661248 -3.773095 as.factor(time)24:condHypoxia 1.429034 0.12549576 11.38711 0.17867421 7.997988 pv (Intercept) 0.00e+00 as.factor(time)24 0.00e+00 condHypoxia 1.612350e-04 as.factor(time)24:condHypoxia 1.332268e-15 ftable(table(drt$cond,drt$time,predict(fit))) 0.345501643340608 1.37227675004058 2.64891772174934 3.05117673373261 Oxia0.5 00 0 485 24 3150 00 Hypoxia 0.5 00 3460 24 0 449 00 ## 3-th term gives the difference between the Hypoxia/Oxia at time=0.5 ## the difference between Hypoxia/Oxia at time=24 L=matrix(c(0,0,1,1),nrow=1) fit$coef[L==1] condHypoxia as.factor(time)24:condHypoxia -0.402259 1.429034 L%*%fit$coef [,1] [1,] 1.026775 wald.test(fit$robust.variance,fit$coef,L=L) Wald test: -- Chi-squared test: X2 = 23.8, df = 1, P( X2) = 1.1e-06 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trying to import xls or xlsx files
On Thu, Jun 23, 2011 at 2:00 PM, wwreith reith_will...@bah.com wrote: library(xlsReadWrite) mydata-read.xls(file path, header=TRUE) however if I change xls to csv it works just fine. Any ideas what I'm doing wrong? I have have also using the package gdata with the exact same error. Below is the error that pops up. Error in findPerl(verbose = verbose) : perl executable not found. Use perl= argument to specify the correct path. Error in file.exists(tfn) : invalid 'file' argument Regarding the read.xls in gdata the error message means that either perl is not installed or else its not on your path. If its in ~/bin/perl, say, then try: library(gdata) DF - read.xls(myfile, ...whatever..., perl = ~/bin/perl) or if you are on Windows and its in C:\Perl\bin\perl.exe, say, then try: library(gdata) DF - read.xls(myfile, ...whatever..., perl = C:\\Perl\\bin\\perl.exe) Alternately put perl on your path and then you can omit the perl= argument. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] packages which call functions which run C code...
On Thu, Jun 23, 2011 at 4:46 PM, Steven Wolf wolfs...@msu.edu wrote: I am looking at the function ks.test in the stats package and trying to figure out why it gives a different result for a p-value than does the corresponding function in MATLAB. I am hoping for one of two responses: 1. You know about ks.tests and have a familiarity with both the R and MATLAB algorithms and can tell me why this should be happening. 2. You know where I can find the C code so that I can decipher it. You can download the source R bundle (a .tar.gz or tar.bz, for example http://cran.stat.ucla.edu/src/base/R-2/R-2.13.0.tar.gz), unpack it, then navigate to the R source directory, go to subdirectory src/library/stats/src and look at the file ks.c. The R function ks.test is defined in src/library/stats/R/ks.test.R. HTH, Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] packages which call functions which run C code...
On 2011-06-23 16:46, Steven Wolf wrote: I am looking at the function ks.test in the stats package and trying to figure out why it gives a different result for a p-value than does the corresponding function in MATLAB. I am hoping for one of two responses: 1. You know about ks.tests and have a familiarity with both the R and MATLAB algorithms and can tell me why this should be happening. 2. You know where I can find the C code so that I can decipher it. Go to https://svn.r-project.org/R/trunk/src/library/stats/R/ks.test.R for the code of ks.test() and to https://svn.r-project.org/R/trunk/src/library/stats/src/ks.c for the relevant C code. Or download the source code from your favourite CRAN mirror. Peter Ehlers Thanks, -Steve Wolf [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] change plot area in barplot2
On Jun 23, 2011, at 4:11 PM, Adrienne Keller wrote: I would like to change the plot area for a figure I made with barplot2 so that it is rectangular (for example, 5 x 3 inches) rather than square, which is the default. What is the best way to do this? Thanks, Adrienne Hi, The size of the plot is going to be largely dependent upon the output device being used and the dimension settings for the device. If you are going to use a bitmapped device (eg. png()) as opposed to a vector device (eg. pdf()), then the physical size of the image will also be dependent upon the resolution of the display (in pixels per inch) upon which the file is being viewed. See ?Devices for more information on the available output devices on your system. Then select the particular device you wish to use for details on how to adjust the dimensions. Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] new to R need urgent help!
On Jun 23, 2011, at 4:42 PM, elisheva corn elishevac...@gmail.com wrote: hi all- I am doing some research, have never used R before until today and need to understand the following program for a project. if some one could PLEASE help me understand this program ASAP i would GREATLY appreciate it (any syntax/ statistic comments would be great) PLEASE PLEASE HELP!! THANKYOU!!! -on a side note, it seems to me that R doesnt include the pv, and it was calculated seperatly, is this true? fit=gee(foci~as.factor(time)*cond,id=exper,data=drt,family=poisson(link = log)) You apparently have count data (foci) which is measured repeatedly within exper, and you're interested in how foci changes with time and condition including their interaction. The code fits a generalized estimating equation (GEE) model, which can be an appropriate model for repeated measures data. See, for example, Diggle, Liang, Zeger Heagerty for background. Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27 running glm to get initial regression estimate (Intercept) as.factor(time)24 3.051177 -2.705675 condHypoxia as.factor(time)24:condHypoxia -0.402259 1.429034 pv=2*(1-pnorm(abs(summary(fit)$coef[,5]))) data.frame(summary(fit)$coef,pv) The gee package doesn't compute the value directly, though other functions like lm, glm and others do. What the code does is use the robust z statistic, which is the estimate/robust se, and relate it to the standard normal distribution. Estimate Naive.S.E. Naive.z Robust.S.E. Robust.z (Intercept)3.051177 0.02221052 137.37527 0.04897055 62.306363 as.factor(time)24 -2.705675 0.10890056 -24.84537 0.19987174 -13.537057 condHypoxia -0.402259 0.03907961 -10.29332 0.10661248 -3.773095 as.factor(time)24:condHypoxia 1.429034 0.12549576 11.38711 0.17867421 7.997988 pv (Intercept) 0.00e+00 as.factor(time)24 0.00e+00 condHypoxia 1.612350e-04 as.factor(time)24:condHypoxia 1.332268e-15 ftable(table(drt$cond,drt$time,predict(fit))) 0.345501643340608 1.37227675004058 2.64891772174934 3.05117673373261 Oxia0.5 00 0 485 24 3150 00 Hypoxia 0.5 00 3460 24 0 449 00 ## 3-th term gives the difference between the Hypoxia/Oxia at time=0.5 ## the difference between Hypoxia/Oxia at time=24 L=matrix(c(0,0,1,1),nrow=1) fit$coef[L==1] condHypoxia as.factor(time)24:condHypoxia -0.402259 1.429034 L%*%fit$coef [,1] [1,] 1.026775 wald.test(fit$robust.variance,fit$coef,L=L) Wald test: -- Chi-squared test: X2 = 23.8, df = 1, P( X2) = 1.1e-06 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trying to import xls or xlsx files
Gabor's answer explains the error perfectly. You might want to look at the xlsx package as well as the RODBC package if you're on Windows. RODBC is really fast, if you can use it. Abhijit On Jun 23, 2011, at 2:00 PM, wwreith reith_will...@bah.com wrote: library(xlsReadWrite) mydata-read.xls(file path, header=TRUE) however if I change xls to csv it works just fine. Any ideas what I'm doing wrong? I have have also using the package gdata with the exact same error. Below is the error that pops up. Error in findPerl(verbose = verbose) : perl executable not found. Use perl= argument to specify the correct path. Error in file.exists(tfn) : invalid 'file' argument -- View this message in context: http://r.789695.n4.nabble.com/trying-to-import-xls-or-xlsx-files-tp3620580p3620580.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generate the next column from previous column
Basically you need to set up a recursive relationship. I'd do this with a 2D array: G = numeric(6*N) dim(G) = c(6,N) G[,1] = c(1,rep(0,5)) for (i in 2:N){G[,i]=G[,i-1]+5* c(1,rep(0,5))} HTH, -Steve -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mkkl Sent: Thursday, June 23, 2011 5:17 PM To: r-help@r-project.org Subject: [R] Generate the next column from previous column Hi, I'm quite new to R and are stuck with the following problem. Lets say I have a column consisting of a 1 and the rest zero's, called G0. G0 - c(1,rep(0,5)) Now what I would like to do is to generate G1 from G0, and G2 from G1 etc... Just for the simplicity, let's say I need the first entry of the column to be increased by 5 each time. How could I do this? Thanks already! Kind regards, Michael -- View this message in context: http://r.789695.n4.nabble.com/Generate-the-next-column-from-previous-column- tp3621051p3621051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 'Rscript -e' and stdout() puzzle
Hello, I am curious to know why the output of Rscript -e cat(R.version.string,stdout()) includes a trailing 1, whereas Rscript -e cat(R.version.string) does not. I have tried various mechanisms to subvert this behavior, such as Rscript -e invisible(con-stdout()); cat(R.version.string, con); rm(con); q() but the trailing 1 remains. Thanks, Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading the results of a within subject test
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Ann [trivialesc...@gmail.com] Subject: [R] reading the results of a within subject test Hi, I ran the following in R (on item means): aov(VAR ~(a*b)+Error(item/(a*b)), data = item) I'm confused... Could you say more about what 'item' in 'data=item' is? The data= argument is usually a data frame which would normally (for your model specification) contain a continuous variable 'VAR', and probably categorical variables (factors) a, b and item. It is possible to have a data fram called item containing a variable called item, but I'm not quite sure that's what you have. Assuming that you have - or at least hoping so, 'cos I can't see how the model would work properly otherwise - you seem to have specified a two-way anova with interactions against fixed effect factors a and b, (with a*b expanding to a, b and a:b) and an Error term which effectively says you want to test your a, b and a:b fixed effects against random effects a, b and a:b (from the a*b) nested within Item, so expanding to item:a, item:b and item:a:b. That is why you got all those extra tables. But that sounds a bit unlikely. Most experimenters either nest items in a:b subgroups (ie observing several different items for each a:b group) or, for blocked designs, use all combinations of a:b treatments on every one of a number of different items (obviously hard to do unless the treatments are reversible - an example might be experiments like reversible exterior temperature/pressure effects on, say, flow meter readings). I'd guess the more likely scenario is that you have multiple different items per a:b group. That would require only one error term - item, uniquely identified perhaps as a:b:item or simply as item if the items are already individually labelled. But you also say 'on the item means'. Asking for item as an error term implies that you have residual error within item. You clearly can't see that if you only supply item means (which may be one reason aov warned you that you had a singular model - that last table would have been seeking to calculate the residual variance, and after the item means were taken out, there wan't any residual variance). If you had a nested model (multiple different items per a:b subgroup), _either_ you should have provided the raw observations (and not the means) and specified Error(item) or Error(a:b:item) for which aov() would give you a table of tests against the residual MS and another table with tests against the item MS (the latter being the more sensible of the two if items are random and appreciably different), _or_ you should supply the item means, in which case you need not include item in the model because the variation between item means within each group is simply the residual error! . Admittedly I'm speculating quite a bit on what experiment you've actually done, but with a bit of luck some of the above will give you a clue... *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 'Rscript -e' and stdout() puzzle
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Benjamin Tyner Sent: Thursday, June 23, 2011 5:31 PM To: r-help@r-project.org Subject: [R] 'Rscript -e' and stdout() puzzle Hello, I am curious to know why the output of Rscript -e cat(R.version.string,stdout()) includes a trailing 1, whereas Rscript -e cat(R.version.string) Use file=stdout(). Otherwise it prints the value of as.character(stdout()), which is 1. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com does not. I have tried various mechanisms to subvert this behavior, such as Rscript -e invisible(con-stdout()); cat(R.version.string, con); rm(con); q() but the trailing 1 remains. Thanks, Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trying to import xls or xlsx files
On Jun 23, 2011, at 2:00 PM, wwreith wrote: library(xlsReadWrite) mydata-read.xls(file path, header=TRUE) however if I change xls to csv it works just fine. Any ideas what I'm doing wrong? I have have also using the package gdata with the exact same error. Below is the error that pops up. Error in findPerl(verbose = verbose) : perl executable not found. Use perl= argument to specify the correct path. Error in file.exists(tfn) : invalid 'file' argument Your error message suggests you do not have a suitable version of Perl installed. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.