[R] How to mean, min lists and numbers
I would like to sum/mean/min a list of lists and numbers to return the related lists. -1+2*c(1,1,0)+2+c(-1,10,-1) returns c(2,13,0), but sum(1,2*c(1,1,0),2,c(-1,10,-1)) returns 15, not a list. Using the suggestion of Gabor Grothendieck, Reduce('+', list(-1, 2*c(1,1,0), 2, c(-1,10,-1))) returns what we want, c(2,13,0). However, this approach does not seem to work for mean/min. So, how to mean/min a list of lists and numbers to return a list? Thanks, -james __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] make an model object (e.g. nlme) available in a user defined function (xyplot related)
Dear Deepayan,

Thank you for taking the time to look into this issue. I have a data object called Data; please find it at the end of the message. Then I can run the code below separately in the console.

#Construct the nlme object
mod.nlme <- nlme(RESP ~ E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma),
                 data=Data, method='REML',
                 fixed=E0+Emax+gamma+EC50~1, random=EC50~1, groups=~ID,
                 start=list(fixed=c(E0=1,Emax=100,gamma=1,EC50=50)))

#Plotting
xyplot(RESP~CP, Data, groups=ID, panel=panel.superpose,
       panel.groups=function(x,y,subscripts,...){
         panel.xyplot(x,y,...)
         subjectData <- Data[subscripts,]
         ind.pred <- predict(mod.nlme, newdata=subjectData)
         panel.xyplot(x, ind.pred, type='l', lty=2)
       })

## Then I constructed a test function to put the two tasks together and it seems OK.
## Strangely I don't even need to print() the xyplot; it is just automatically shown on the screen.
test.function <- function(Data=Data){
  mod.nlme <- nlme(RESP ~ E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma),
                   data=Data, method='REML',
                   fixed=E0+Emax+gamma+EC50~1, random=EC50~1, groups=~ID,
                   start=list(fixed=c(E0=1,Emax=100,gamma=1,EC50=50)))
  xyplot(RESP~CP, data=Data, groups=ID, panel=panel.superpose,
         panel.groups=function(x,y,subscripts,...){
           panel.xyplot(x,y,...)
           subjectData <- Data[subscripts,]
           ind.pred <- predict(mod.nlme, newdata=subjectData)
           panel.xyplot(x, ind.pred, type='l', lty=2)
         })
}

Then I have my real function as follows. If I run the code as compare.curves(Data=Data), the analytical part is working but not the plotting part (Error using packet 1, object 'model' not found):

==
compare.curves <- function(curve='ascending', Data=stop('A data object must be specified'),
                           parameter='EC50', random.pdDiag=FALSE,
                           start.values=c(Emax=100,E0=1,EC50=50,gamma=2), ...){
  if (curve=='ascending')
    model <- as.formula('RESP ~ E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma)')
  if (curve=='descending')
    model <- as.formula('RESP ~ E0+(Emax-E0)*(1-CP**gamma/(EC50**gamma+CP**gamma))')
  mod.nlme <- nlme(model=model, data=Data, method='REML',
                   fixed=Emax+E0+EC50+gamma~1,
                   random= if (length(parameter)==1)
                       eval(substitute(variable~1, list(variable=as.name(parameter))))
                     else {
                       variable <- as.name(parameter[1])
                       for (i in 2:length(parameter))
                         variable <- paste(variable, '+', as.name(parameter[i]))
                       formula <- as.formula(paste(variable, '~1'))
                       if (random.pdDiag) list(pdDiag(formula)) else formula
                     },
                   groups=~ID, start=list(fixed=start.values))
  mod.nlme.RSS <- sum(resid(mod.nlme)^2)
  df.mod.nlme <- dim(Data)[1]-(4+length(parameter)) # 4 fixed effects plus the number of random effects
  constrained.fit.parameters <- coef(mod.nlme)
  mod.nls.ind <- lapply(split(Data, Data$ID), function(x){
    nls(formula=model, data=x, start=start.values)
  })
  mod.nls.ind.RSS <- do.call(sum, lapply(mod.nls.ind, function(x) resid(x)^2))
  df.mod.nls.ind <- dim(Data)[1]-4*length(unique(Data$ID))
  ind.fit.parameters <- do.call(rbind, lapply(mod.nls.ind, coef))
  F.statistic <- mod.nlme.RSS/mod.nls.ind.RSS
  F.test.p.value <- pf(F.statistic, df.mod.nlme, df.mod.nls.ind, lower.tail=FALSE)
  print(
    xyplot(RESP~CP, data=Data, groups=ID, panel=panel.superpose,
           panel.groups=function(x,y,subscripts,...){
             panel.xyplot(x,y,...)
             subjectData <- Data[subscripts,]
             ind.pred <- predict(mod.nlme, newdata=subjectData)
             panel.xyplot(x, ind.pred, type='l', lty=2)
           })
  )
  return(list(F_test_statistic=F.statistic, F_test_p_value=F.test.p.value,
              Individual_fit=ind.fit.parameters, Constrained_fit=constrained.fit.parameters))
}
=

The data object Data (truncated in this message):

structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), CP = c(1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30,
Re: [R] How to mean, min lists and numbers
On 12/07/2010 11:10 AM, g...@ucalgary.ca wrote: I would like to sum/mean/min a list of lists and numbers to return the related lists. -1+2*c(1,1,0)+2+c(-1,10,-1) returns c(2,13,0) but sum(1,2*c(1,1,0),2,c(-1,10,-1)) returns 15, not a list. Using the suggestions of Gabor Grothendieck, Reduce('+', list(-1, 2*c(1,1,0), 2, c(-1,10,-1))) returns what we want, c(2,13,0). However, it seems that this way does not work for mean/min. So, how to mean/min a list of lists and numbers to return a list? Thanks,

You need to be careful of terminology: c(1,1,0) is not a list, it's a vector. What you want is to apply functions componentwise to lists of vectors. One way to do that is to bind them into a matrix and use apply. For example:

M <- cbind(-1, c(1,1,0), c(-1,10,-1))
apply(M, 1, mean)

Duncan Murdoch
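[Editor's note: the matrix-plus-apply approach above extends directly to the original four-term expression and to min; a minimal sketch (not part of the original message — cbind recycles the scalar arguments to the common length):

```r
# Bind the four terms column-wise; scalars -1 and 2 are recycled to length 3
M <- cbind(-1, 2*c(1,1,0), 2, c(-1,10,-1))
apply(M, 1, sum)   # componentwise sum:  2 13 0
apply(M, 1, mean)  # componentwise mean: 0.5 3.25 0
apply(M, 1, min)   # componentwise min:  -1 -1 -1
```

Any componentwise summary function can be substituted for the third argument of apply.]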
Re: [R] How to mean, min lists and numbers
On Jul 12, 2010, at 11:10 AM, g...@ucalgary.ca wrote: I would like to sum/mean/min a list of lists and numbers to return the related lists.

You will advance in your understanding faster if you adopt the correct terminology:

-1+2*c(1,1,0)+2+c(-1,10,-1) returns c(2,13,0) but ...

... which is NOT a list, it is a vector.

sum(1,2*c(1,1,0),2,c(-1,10,-1)) returns 15, not a list. Using the suggestions of Gabor Grothendieck, Reduce('+', list(-1, 2*c(1,1,0), 2, c(-1,10,-1))) returns what we want, c(2,13,0). However, it seems that this way does not work for mean/min.

If you want a running cumulative mean of a vector, i.e., c( mean(vec[1]), mean(vec[1:2]), ..., mean(vec) ):

vec <- sample(1:20)
sapply(1:length(vec), function(x) mean(vec[1:x]))

So, how to mean/min a list of lists and numbers to return a list?

Not a list, and not working on a list of lists. A vector.

-- David Winsemius, MD West Hartford, CT
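[Editor's note: for long vectors, the running mean above can also be computed without sapply, using cumsum — a vectorized equivalent, not from the original post:

```r
# Running mean via cumulative sum divided by position
vec <- c(4, 2, 6)
cummean <- cumsum(vec) / seq_along(vec)
cummean  # 4 3 4
```

This avoids recomputing each prefix mean from scratch, so it is O(n) rather than O(n^2).]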
[R] Calculating Gwet's AC1 statistic
Hi, After searching the archives and Google and not turning up anything, I thought I'd ask here. Has anyone done an R package for calculating Gwet's AC1 statistic and variance? K.L. Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008 May;61(Pt 1):29-48. http://www.ncbi.nlm.nih.gov/pubmed/18482474 Thanks, --pete
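[Editor's note: for reference, a hand-rolled sketch of the AC1 point estimate (no variance) for two raters, following the chance-agreement formula in Gwet (2008). Treat this as an illustration, not a validated implementation:

```r
# Gwet's AC1 point estimate for two raters, from a KxK agreement table.
# pe uses Gwet's chance agreement: sum of pi_k*(1-pi_k) / (K-1),
# where pi_k averages the two raters' marginal proportions.
gwet_ac1 <- function(tab) {
  tab <- tab / sum(tab)                          # cell proportions
  pa  <- sum(diag(tab))                          # observed agreement
  pik <- (rowSums(tab) + colSums(tab)) / 2       # average marginal proportions
  pe  <- sum(pik * (1 - pik)) / (nrow(tab) - 1)  # chance agreement
  (pa - pe) / (1 - pe)
}
# Example: 2x2 cross-tabulation of rater A vs rater B
gwet_ac1(matrix(c(40, 3, 2, 5), nrow = 2))
```

The variance estimator in the paper is considerably more involved and is not attempted here.]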
[R] Quantmod Error Message
I am trying to create a model using the quantmod package in R. I am using the following string of commands:

ema <- read.csv(file="ESU0 Jul 7 1 sec data.csv")
Bid <- ema$Bid
twentysell <- EMA(Bid, n=1200)
fortysell <- EMA(Bid, n=2400)
sigup <- ifelse(twentysell > fortysell, 1, 0)
sigdn <- ifelse(twentysell < fortysell, -1, 0)
specifyModel(Next(sigup)~lag(sigup,1) + Next(sigdn)~lag(sigdn,1), 1:31624)

After this last command, I get this error message:

Error in as.Date.default(x, origin = "1970-01-01") :
  do not know how to convert 'x' to class "Date"

I've thought it was a time series issue, but I have tried converting sigup and sigdn to time series using

sigup_ts <- ts(sigup)
sigdn_ts <- ts(sigdn)

But the error still comes up. Any help on this issue would be greatly appreciated. Thanks, Tyler Campbell tyler.campb...@tradeforecaster.com
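[Editor's note: the error is consistent with specifyModel finding no date index it can convert — plain vectors and ts objects carry no dates. A hedged sketch of supplying one via the xts package; the Time column name and its format are assumptions, not from the original post:

```r
library(quantmod)  # loads xts/zoo as well

# Hypothetical: the CSV is assumed to have a timestamp column named Time
times   <- as.POSIXct(ema$Time, format = "%Y-%m-%d %H:%M:%S")
signals <- xts(cbind(sigup = sigup, sigdn = sigdn), order.by = times)

# Columns referenced in the formula now carry a convertible time index
m <- specifyModel(Next(signals$sigup) ~ Lag(signals$sigdn, 1))
```

Note also that specifyModel takes a single formula; an expression with two `~` operators joined by `+`, as in the original post, is not a valid model formula.]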
Re: [R] print.trellis draw.in - plaintext (gmail mishap)
require(grid)
require(lattice)
fred <- data.frame(x=1:5, y=runif(5))
vplayout <- function(x,y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
p <- xyplot(y~x, fred)
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)

On Mon, Jul 12, 2010 at 8:58 AM, Felix Andrews fe...@nfrac.org wrote: Yes, please, reproducible code. On 10 July 2010 00:49, Mark Connolly wmcon...@ncsu.edu wrote: I am attempting to plot a trellis object on a grid.

vplayout <- function(x,y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
g1 <- ggplot() ...
g2 <- ggplot() ...
g3 <- ggplot() ...
p <- xyplot() ...
# works as expected
print(g1, vp=vplayout(1,1))
print(g2, vp=vplayout(1,2))
print(g3, vp=vplayout(2,1))
# does not work
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)
Error in grid.Call.graphics(L_downviewport, name$name, strict) :
  Viewport 'GRID.VP.112' was not found

What am I doing wrong? Thanks!

-- Felix Andrews / 安福立 http://www.neurofractal.org/felix/
Re: [R] How to mean, min lists and numbers
On Jul 12, 2010, at 11:19 AM, Duncan Murdoch wrote: On 12/07/2010 11:10 AM, g...@ucalgary.ca wrote: I would like to sum/mean/min a list of lists and numbers to return the related lists. [...] You need to be careful of terminology: c(1,1,0) is not a list, it's a vector. What you want is to apply functions componentwise to lists of vectors. One way to do that is to bind them into a matrix and use apply. For example: M <- cbind(-1, c(1,1,0), c(-1,10,-1)); apply(M, 1, mean)

As usual Duncan's understanding is better than mine. Just so you know, there are also row-oriented utility functions which are considerably faster when they are the correct solution:

?rowSums
?rowMeans

rowMeans(cbind(-1, c(1,1,0), c(-1,10,-1)))
[1] -0.333  3.333 -0.667
apply(cbind(-1, c(1,1,0), c(-1,10,-1)), 1, mean)
[1] -0.333  3.333 -0.667

... and there is a parallel version of the minimum function, pmin, that would have given you results for arguments that are just the vectors of varying length you were working with:

pmin(2, c(2,2,0), -1, c(-1,10,-1))
#[1] -1 -1 -1
#(Done with argument recycling.)

David Winsemius, MD West Hartford, CT
Re: [R] long to wide on larger data set
Juliet, I've been corrected off list. I did not read properly that you are on 64bit. The calculation should be: 53860858 * 4 * 8 / 1024^3 = 1.6GB, since pointers are 8 bytes on 64bit. Also, data.table is an add-on package so I should have included:

install.packages("data.table")
require(data.table)

data.table is available on all platforms, both 32bit and 64bit. Please forgive mistakes: 'someoone' should be 'someone', 'percieved' should be 'perceived' and 'testDate' should be 'testData' at the end. The rest still applies, and you might have a much easier time than I thought since you are on 64bit. I was working on the basis of squeezing into 32bit. Matthew

Matthew Dowle mdo...@mdowle.plus.com wrote in message news:i1faj2$lv...@dough.gmane.org...

Hi Juliet, Thanks for the info. It is very slow because of the == in

testData[testData$V2==one_ind,]

Why? Imagine someone looks for 10 people in the phone directory. Would they search the entire phone directory for the first person's phone number, starting on page 1, looking at every single name, even continuing to the end of the book after they had found them? Then would they start again from page 1 for the 2nd person, and then the 3rd, searching the entire phone directory from start to finish for each and every person? That code using == does that. Some of us call that a 'vector scan', and it is a common reason for R being perceived as slow. To do that more efficiently try this:

testData <- as.data.table(testData)
setkey(testData, V2)   # sorts data by V2
for (one_ind in mysamples) {
  one_sample <- testData[one_ind,]
  reshape(one_sample)
}

or just this:

testData <- as.data.table(testData)
setkey(testData, V2)
testData[, reshape(.SD, ...), by=V2]

That should solve the vector scanning problem, and get you on to the memory problems which will need to be tackled. Since the 4 columns are character, the object size should be roughly: 53860858 * 4 * 4 / 1024^3 = 0.8GB. That is more promising to work with in 32bit, so there is hope.
[ That 0.8GB ignores the (likely small) size of the unique strings in the global string hash (depending on your data). ] It's likely that the as.data.table() fails with out of memory. That is not data.table but unique. There is a change in unique.c in R 2.12 which makes unique more efficient, and since factor calls unique, it may be necessary to use R 2.12. If that still doesn't work, then there are several more tricks (and we will need further information), and there may be some tweaks needed to that code as I didn't test it, but I think it should be possible in 32bit using R 2.12. Is it an option to just keep it in long format and use a data.table?

testData[, somecomplexrfunction(onecolumn, anothercolumn), by=list(V2)]

Why do you need to reshape from long to wide? HTH, Matthew

Juliet Hannah juliet.han...@gmail.com wrote in message news:aanlktinyvgmrvdp0svc-fylgogn2ro0omnugqbxx_...@mail.gmail.com...

Hi Jim, Thanks for responding. Here is the info I should have included before. I should be able to access 4 GB.

str(myData)
'data.frame': 53860857 obs. of 4 variables:
 $ V1: chr "23" "26" "200047" "200050" ...
 $ V2: chr "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr "A" "A" "A" "B" ...
 $ V4: chr "B" "B" "A" "B" ...

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

On Mon, Jul 12, 2010 at 7:54 AM, jim holtman jholt...@gmail.com wrote: What is the configuration you are running on (OS, memory, etc.)? What does your object consist of? Is it numeric, factors, etc.? Provide a 'str' of it. If it is numeric, then the size of the object is probably about 1.8GB.
Doing the long to wide you will probably need at least that much additional memory to hold the copy, if not more. This would be impossible on a 32-bit version of R. On Mon, Jul 12, 2010 at 1:25 AM, Juliet Hannah juliet.han...@gmail.com wrote: I have a data set that has 4 columns and 53860858 rows. I was able to read this into R with:

cc <- rep("character", 4)
myData <- read.table("myData.csv", header=FALSE, skip=1, colClasses=cc, nrow=53860858, sep=",")

I need to reshape this data from long to wide. On a small data set the following lines work. But on the real data set, it didn't finish even when I took a sample of two (rows in new data). I didn't receive an error; I just stopped it because it was taking too long. Any suggestions for improvements? Thanks.

# start example
# i have commented out the write.table statement below
testData <-
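[Editor's note: for the long-to-wide step itself, a minimal sketch on a toy version of this data, using data.table's dcast (available in current data.table releases, not in the 2010 version discussed in the thread; column roles are assumptions):

```r
library(data.table)
# Toy long data: V1 = marker id, V2 = sample id, V3 = value (assumed roles)
long <- data.table(V1 = c("23", "26", "23", "26"),
                   V2 = c("cv0001", "cv0001", "cv0002", "cv0002"),
                   V3 = c("A", "A", "B", "A"))
# One row per sample (V2), one column per marker (V1)
wide <- dcast(long, V2 ~ V1, value.var = "V3")
wide
```

dcast works by reference-friendly grouping internally, so it scales far better than row-at-a-time reshape on data of this size.]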
Re: [R] [R-pkgs] New package list for analyzing list survey experiments
I encourage all authors and maintainers of packages to use findFn in the sos package to search for other uses of a name they want to use. The findFn function searches for matches in the help pages of contributed packages, including all of CRAN plus some elsewhere. The grepFn function can identify help pages whose name contains a particular term. The R Journal from last December contains an article describing this: http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Graves~et~al.pdf. Hope this helps. Spencer Graves

On 7/12/2010 7:08 AM, Jeffrey J. Hallman wrote: I know nothing about your package, but "list" is a terrible name for it, as list is also the name of a data type in R.

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567
[R] Need help on date calculation
If you want to use the mondate package you will need to specify the day of the month. Dates March-2010 and May-2010 are ambiguous. I recommend you choose the last day of both months as representative days. Then those days will be an integer number of months apart.

a <- mondate("March 31, 2010", displayFormat="%B %d, %Y")
b <- mondate("May 31, 2010", displayFormat="%B %d, %Y")
print(a)
[1] March 31, 2010
print(b)
[1] May 31, 2010
b-a
[1] 2
attr(,"timeunits")
[1] "months"
c(b-a)  ## strip away the attribute
[1] 2

Technically speaking, since mondates are fundamentally numeric, the result of b-a is *numeric*, not *integer*, but it is as close to the integer 2 as an R *numeric* can be:

is.integer(c(b-a))
[1] FALSE
is.integer(2)
[1] FALSE
identical(c(b-a), 2)
[1] TRUE

Even easier, use the mondate.ymd function, which assumes the last day of the month if not provided, and you won't have to worry about the number of days in a month or leap years. You can also retain your Month-Year format when printed if that is a requirement:

a <- mondate.ymd(2010, 3, displayFormat="%B-%Y")
b <- mondate.ymd(2010, 5, displayFormat="%B-%Y")
print(a)
[1] March-2010
print(b)
[1] May-2010
b-a
[1] 2
attr(,"timeunits")
[1] "months"
identical(c(b-a), 2)
[1] TRUE

This works for any last-days-of-the-month because mondate represents them as numerics with zero fractional part. Hope that helps, Dan Murphy

=
Message: 26 Date: Sat, 10 Jul 2010 15:17:07 -0400 From: Gabor Grothendieck ggrothendi...@gmail.com To: Bogaso Christofer bogaso.christo...@gmail.com Cc: r-help@r-project.org Subject: Re: [R] Need help on date calculation

On Sat, Jul 10, 2010 at 3:34 PM, Bogaso Christofer bogaso.christo...@gmail.com wrote: Thanks Gabor for your input. However my question is, is your solution general for any value of a and b? #1 and #2 are general.
For #3 I think there is a possibility you might in general have to do the same tricks as #1 and #2, but I am not sure. I suggest you discuss it with the author of the mondate package.
[R] Extract Clusters from Biclust Object
Dear all, I share the problem Linda Garcia and Ram Kumar Basnet described; I have a biclust object containing several clusters. For drawing a heatmap, it is possible to specify the cluster to be plotted. However, I'd like to extract the clusters in this manner:

        Cond.1   Cond.2
Gene  - value  - value

just like drawHeatmap specifies each cluster. Is there a way to extract single clusters? E.g. like saying obj...@object3, meaning cluster no. 3 of my biclust object? Unfortunately, the answers I found in older posts couldn't help me out... Any help is strongly appreciated! Best regards, Christine
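[Editor's note: one route is via the logical membership slots of the Biclust class — a sketch, assuming res is the result of biclust() and data_matrix is the matrix it was run on (both names hypothetical):

```r
library(biclust)
# res@RowxNumber: rows x clusters logical matrix; res@NumberxCol: clusters x cols.
# Extract the members and values of cluster no. 3:
rows3 <- which(res@RowxNumber[, 3])      # genes in cluster 3
cols3 <- which(res@NumberxCol[3, ])      # conditions in cluster 3
cluster3 <- data_matrix[rows3, cols3]    # the same submatrix drawHeatmap plots
```

This yields a plain genes-by-conditions matrix of values for the chosen cluster.]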
[R] in continuation with the earlier R puzzle
When I just run a for loop it works. But if I am going to run a for loop every time for large vectors, I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated. Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- 'Raghu'
[R] a small puzzle?
I know the following may sound too basic, but I thought the mailing list is for the benefit of all levels of people. I ran a simple if statement on two numeric vectors (news1o and s2o) which are of equal length; I have done an str on both of them for your kind perusal below. I am trying to compare the numbers in both and initiate a new vector s as 1 or 0 depending on whether the elements in the vectors are greater or lesser than each other. When I do a simple s=(news1o>s2o) I get the values of s as a string of TRUEs and FALSEs, but when I try to override using the if statement this cribs. I get only one element in s, and that is a puzzle. Any ideas on this please? Many thanks.

> if(news1o>s2o)(s<-1) else
+ (s<- -1)
[1] -1
Warning message:
In if (news1o > s2o) (s <- 1) else (s <- -1) :
  the condition has length > 1 and only the first element will be used
> s
[1] -1
> length(s)
[1] 1
> str(news1o)
 num [1:3588] 891 890 890 888 886 ...
> str(s2o)
 num [1:3588] 895 892 890 888 885 ...

-- 'Raghu'
[R] How do I move axis labels closer to plot box?
Hi there, I place a vector of strings as labels at the tick points by using

axis(1, at=seq(0.1,0.7,by=0.1), labels=paste(seq(10,70,by=10), "%", sep=""), tick=FALSE)

However, there is a large space between those labels and the boundary of the plot box. I want to reduce this space so that the labels appear just next to the boundary of the plot box. How do I do that? Thanks. Best, Jia

-- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/
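[Editor's note: one way, not from a reply in this thread — axis() accepts the mgp graphical parameter, whose second element sets the distance of the labels from the axis (default c(3, 1, 0)):

```r
# Pull tick labels closer to the plot box by shrinking mgp[2]
plot(seq(0.1, 0.7, by = 0.1), 1:7, xaxt = "n")
axis(1, at = seq(0.1, 0.7, by = 0.1),
     labels = paste(seq(10, 70, by = 10), "%", sep = ""),
     tick = FALSE, mgp = c(3, 0.2, 0))
```

The line argument of axis() (negative values move labels inward) is an alternative knob for the same adjustment.]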
[R] How do I convert a XML file to a data.frame?
Hi, I have a problem converting an XML file, via the XML package, to a data.frame. The XML file looks like this:

<Transaction>
  <ID value="0044"/>
  <Var1 value="XYZ159"/>
  <Var2 value="_"/>
  <Var3 value="AMR1.0-INT-1005"/>
  <Var4 value="2010-05-25 10:44:16:673"/>
  <Var5 value="1"/>
  <Var6 value="0"/>
</Transaction>
<Transaction>
  <ID value="0046"/>
  <Var1 value="XBC254"/>
  <Var2 value="GLOBAL"/>
  <Var3 value="AMR2.0-INT-9997"/>
  <Var4 value="2010-05-25 11:22:50:803"/>
  <Var5 value="2"/>
  <Var6 value="0"/>
</Transaction>
<Transaction>
  <ID value="unknown"/>
  <Var1 value="HGF358"/>
  <Var2 value="REGION_A"/>
  <Var3 value="AMR2.5-INT-1154"/>
  <Var4 value="2010-05-24 10:08:26:711"/>
  <Var5 value="3"/>
  <Var6 value="0"/>
</Transaction>

I don't usually use XML files, but I have searched for an answer for quite a while. I have tried xmlToDataFrame but it demands a structure similar to this:

<top>
  <obs>
    <var1>value</var1>
    <var2>value</var2>
    <var3>value</var3>
  </obs>
  <obs>
    <var1>value</var1>
    <var2>value</var2>
    <var3>value</var3>
  </obs>
</top>

The top node <top> could in my case maybe be added to the XML file directly (or via an R command?), but the main issue is to use the children structure in my file (which is different from the one that can be used with xmlToDataFrame), <var1 value=""/>, to convert the XML file to a meaningful data.frame with both categorical and quantitative data. Any tips or tricks? They are highly appreciated. Thanks, Magnus
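[Editor's note: since the values live in attributes rather than text nodes, one approach is XPath plus xmlGetAttr from the XML package — a sketch; the file name is hypothetical, and the file is assumed to be wrapped in a single root element so it parses:

```r
library(XML)
doc   <- xmlParse("transactions.xml")  # hypothetical file, with a root element added
attrs <- c("ID", "Var1", "Var2", "Var3", "Var4", "Var5", "Var6")
# For each child tag, collect its value= attribute across all Transaction nodes
cols <- lapply(attrs, function(a)
  xpathSApply(doc, paste("//Transaction/", a, sep = ""), xmlGetAttr, "value"))
df <- data.frame(setNames(cols, attrs), stringsAsFactors = FALSE)
# Quantitative columns can then be converted, e.g. df$Var5 <- as.numeric(df$Var5)
```

Each xpathSApply call returns one character vector per variable, which data.frame binds column-wise into one row per Transaction.]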
Re: [R] in continuation with the earlier R puzzle
I don't know what is wrong with your code, but I believe you should use ifelse instead of a for loop:

s <- ifelse(news1o > s2o, 1, -1)

Alain

On 12-Jul-10 16:09, Raghu wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated. Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50
Re: [R] a small puzzle?
You probably want to use ifelse:

s <- ifelse(news1o > s2o, 1, -1)

'if' only handles a single logical value.

On Mon, Jul 12, 2010 at 10:02 AM, Raghu r.raghura...@gmail.com wrote: I know the following may sound too basic but I thought the mailing list is for the benefit of all levels of people. I ran a simple if statement on two numeric vectors (news1o and s2o) which are of equal length. I have done an str on both of them for your kind perusal below. I am trying to compare the numbers in both and initiate a new vector s as 1 or 0 depending on if the elements in the arrays are greater or lesser than each other. When I do a simple s=(news1o>s2o) I get the values of s as a string of TRUEs and FALSEs, but when I try to override using the if statements this cribs. I get only one element in s and that is a puzzle. Any ideas on this please? Many thanks.

if(news1o>s2o)(s<-1) else (s<- -1)
[1] -1
Warning message:
In if (news1o > s2o) (s <- 1) else (s <- -1) :
  the condition has length > 1 and only the first element will be used
s
[1] -1
length(s)
[1] 1
str(news1o)
 num [1:3588] 891 890 890 888 886 ...
str(s2o)
 num [1:3588] 895 892 890 888 885 ...

-- 'Raghu'

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] in continuation with the earlier R puzzle
The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

You might give ifelse() a shot here.

s <- ifelse(news1o > s2o, 1, -1)

Learning to think in vectors is important in R, just like thinking in sets is important for SQL, or thinking in rows and steps is important in SAS.

cur
-- Curt Seeliger, Data Ranger Raytheon Information Services - Contractor to ORD seeliger.c...@epa.gov 541/754-4638
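[Editor's note: a sign()-based variant (my addition, not from the thread) is equivalent except on ties, where sign() returns 0 rather than -1:

```r
news1o <- c(891, 890, 890)
s2o    <- c(895, 892, 890)
ifelse(news1o > s2o, 1, -1)  # -1 -1 -1
sign(news1o - s2o)           # -1 -1  0   (note the tie in the last element)
```

Both are single vectorized expressions, so either avoids the element-by-element loop.]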
Re: [R] a small puzzle?
In an if statement, the condition must be a single logical value. In your example, news1o and s2o are vectors, so there is a warning saying the condition has length greater than one. If you don't send two messages about the same problem within two minutes, you can see what people answer you... For example, I advised you to use ifelse, which works on vectors.

Alain

On 12-Jul-10 16:02, Raghu wrote: I know the following may sound too basic but I thought the mailing list is for the benefit of all levels of people. I ran a simple if statement on two numeric vectors (news1o and s2o) which are of equal length. I have done an str on both of them for your kind perusal below. I am trying to compare the numbers in both and initiate a new vector s as 1 or 0 depending on if the elements in the arrays are greater or lesser than each other. When I do a simple s=(news1o>s2o) I get the values of s as a string of TRUEs and FALSEs, but when I try to override using the if statements this cribs. I get only one element in s and that is a puzzle. Any ideas on this please? Many thanks.

if(news1o>s2o)(s<-1) else (s<- -1)
[1] -1
Warning message:
In if (news1o > s2o) (s <- 1) else (s <- -1) :
  the condition has length > 1 and only the first element will be used
s
[1] -1
length(s)
[1] 1
str(news1o)
 num [1:3588] 891 890 890 888 886 ...
str(s2o)
 num [1:3588] 895 892 890 888 885 ...

-- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50
Re: [R] Can anybody help me understand AIC and BIC and devise a new metric?
Hi, one comment: Claeskens and Hjort define AIC as 2*log L - 2*p for a model with likelihood L and p parameters; consequently, they look for models with *maximum* AIC in model selection and averaging. This differs from the vast majority of authors (and R), who define AIC as -2*log L + 2*p and search for the model with *minimum* AIC. Their definition of BIC is similarly the negative of the usual BIC. I would compare this to defining \pi as the base of the natural logarithm and e as the ratio of a circle's circumference to its diameter: of course, you can do perfectly valid mathematics with your own definitions, but it is a recipe for confusion. Anyone who only reads Claeskens and Hjort, fires up R and selects the model with the maximum AIC from the candidate models is in for some *nasty* surprises. Worse, as far as I can see, Claeskens and Hjort nowhere mention that they are using a definition that is diametrically opposed to what is (overwhelmingly) common, and they do not comment on this. However, Claeskens and Hjort managed to publish a book, which I have yet to do, so it is quite possible that there is a major flaw in my thinking. If so, I haven't found it yet, and I would be very grateful if somebody pointed out what I misunderstand. Otherwise, I would be *very* careful indeed about basing my analysis strategy on their book, although the rest of the content is very helpful indeed - you only need to remember where to switch signs and change maximize to minimize etc. For AIC and BIC novices, I would recommend going with Burnham & Anderson, which Kjetil cited below. Best, Stephan

Kjetil Halvorsen schrieb: You should have a look at: Model Selection and Model Averaging, Gerda Claeskens (K.U. Leuven) and Nils Lid Hjort (University of Oslo). Among other things this will explain that AIC and BIC really aim at different goals.
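[Editor's note: the sign convention is easy to check in R itself, which follows the minimize form. A quick verification of the definitions using only standard functions:

```r
# R's AIC is -2*logLik + 2*df; the preferred model has the *smaller* value
fit <- lm(dist ~ speed, data = cars)
ll  <- logLik(fit)
manual_aic <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(manual_aic, AIC(fit))  # TRUE

# BIC replaces the 2*df penalty with log(n)*df
manual_bic <- -2 * as.numeric(ll) + log(nobs(fit)) * attr(ll, "df")
all.equal(manual_bic, BIC(fit))  # TRUE
```

Note that attr(ll, "df") for an lm fit counts the error variance as a parameter, so it is one more than the number of regression coefficients.]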
On Mon, Jul 5, 2010 at 4:20 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: On Mon, Jul 5, 2010 at 7:35 AM, LosemindL comtech@gmail.com wrote: Hi all, Could anybody please help me understand AIC and BIC, and especially why they make sense? Any good text that discusses model selection in detail will have some discussion of AIC and BIC. Frank Harrell's book 'Regression Modeling Strategies' comes immediately to mind, along with Hastie, Tibshirani and Friedman (Elements of Statistical Learning) and Burnham and Anderson's book (Model Selection and Multi-Model Inference), but there are many other worthy texts that cover the topic. The gist is that AIC and BIC penalize the log likelihood of a model by subtracting different functions of its number of parameters. David's suggestion of Wikipedia is also on target. Furthermore, I am trying to devise a new metric related to model selection in the financial asset management industry. As you know, the industry uses the Sharpe Ratio as the main performance benchmark, which is the annualized mean of returns divided by the annualized standard deviation of returns. I didn't know, but thank you for the information. Isn't this simply a signal-to-noise ratio quantified on an annual basis? In model selection, we would like to choose a model that yields the highest Sharpe Ratio. However, the more parameters you use, the higher the Sharpe Ratio you might potentially get, and the higher the risk that your model is overfitted. I am trying to think of an AIC or BIC version of the Sharpe Ratio that facilitates the model selection... You might be able to make some progress if you can express the (penalized) log likelihood as a function of the Sharpe ratio. But if you have several years of data in your model and the ratio is computed annually, then isn't it a random variable rather than a parameter? If so, it changes the nature of the problem, no?
(Being unfamiliar with the Sharpe ratio, I fully recognize that I may be completely off-base in this suggestion, but I'll put it out there anyway :) BTW, you might find the R-sig-finance list to be a more productive resource for this problem than R-help, due to the specialized nature of the question. HTH, Dennis Anybody could you please give me some pointers? Thanks a lot!
Re: [R] in continuation with the earlier R puzzle
On Jul 12, 2010, at 10:09 AM, Raghu wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

Perhaps:

s <- 2*( news1o > s2o[1:length(news1o)] ) - 1

...which I think will throw errors under pretty much the same conditions that would cause errors in that loop. -- David Winsemius, MD, West Hartford, CT
Re: [R] in continuation with the earlier R puzzle
On 12-Jul-10 14:09:30, Raghu wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- 'Raghu'

Many operations over the whole length of vectors can be done in vectorised form, in which an entire vector is changed in one operation based on the values of the separate elements of other vectors, likewise all taken into account in a single operation. What happens behind the scenes is that the element-by-element operations are performed by a function in a precompiled (usually C) library. Hence R already does what you are suggesting as a "might as well" alternative! Below is an example, using long vectors. The first case is a copy of your R loop above (with some additional initialisation of the vectors). The second achieves the same result in vectorised form.

news1o <- runif(10^6)
s2o    <- runif(10^6)
s <- numeric(length(news1o))
proc.time()
#   user  system elapsed
#  1.728   0.680 450.257

for(i in 1:length(news1o)){       ### Using a loop
  if(news1o[i] > s2o[i]) s[i] <- 1 else s[i] <- (-1)
}
proc.time()
#   user  system elapsed
# 11.184   0.756 460.340

s2 <- 2*(news1o > s2o) - 1        ### Vectorised
proc.time()
#   user  system elapsed
# 11.348   0.852 460.663

sum(s2 != s)
# [1] 0                           ### Results identical

Result: the loop took (11.184 - 1.728) = 9.456 seconds; vectorised, it took (11.348 - 11.184) = 0.164 seconds. Loop/Vector = 9.456/0.164 = 57.66, i.e. nearly 60 times as long. Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 12-Jul-10 Time: 17:36:07 -- XFMail --
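One more equivalent idiom, noting an edge case the thread does not discuss: sign() produces the same +1/-1 coding, except that ties map to 0 rather than -1 (the toy vectors below are made up):

```r
a <- c(1, 2, 3)
b <- c(3, 2, 1)

s_cmp  <- 2 * (a > b) - 1   # ties coded as -1, matching the loop
s_sign <- sign(a - b)       # ties coded as 0

s_cmp
# [1] -1 -1  1
s_sign
# [1] -1  0  1
```

Which coding is right depends on what the tied case should mean in the application.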
Re: [R] Compress string memCompress/Decompress
On Mon, Jul 12, 2010 at 9:17 AM, Erik Wright eswri...@wisc.edu wrote: Hi Seth, Can you recreate the example below using dbWriteTable? Not sure if that is possible with the current dbWriteTable code (I don't have time to explore that right now). You are welcome to poke around. You could wrap the example in a helper function to provide your own BLOB-respecting write-table function if you can't get dbWriteTable to work for your case. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/
Re: [R] Multiple Plotting Colors
Richardson, Patrick <Patrick.Richardson at vai.org> writes: I'm trying to use multiple plotting colors in my code. My first ifelse statement successfully does what I want. However, now I want anything less than -4.5 to be green and the rest black. I want another col argument but can only use one. How could I go about getting separate colors for anything above 4.5 and less than -4.5?

plot(three, type="h", col=ifelse(three > 4.5, "red", "black"), xlim=c(0,500),
     ylim=range(three), lwd=2, xlab="Chromosome", ylab="Z-Score",
     font.lab=2, font=2, main="Upregulated Genes in Patient Sample")

Thanks in advance, Patrick
[R] How to use mpi.allreduce() in Rmpi?
Hi everybody! I have the next code, which makes a reduction of the *a* variable on two slaves, using the Rmpi package.

library(Rmpi)
mpi.spawn.Rslaves(nslaves=2)
reduc <- function(){
  a <- mpi.comm.rank() + 2
  mpi.reduce(a, type=2, op="prod")
  return(paste("a =", a))
}
mpi.bcast.Robj2slave(reduc)
mpi.remote.exec(reduc())
cat("Product: ")
mpi.reduce(1, op="prod")
mpi.close.Rslaves()

I want to use the function mpi.allreduce() instead of mpi.reduce(), so that the slaves also receive the value of the reduction. I don't know how to do it; the available documentation is very small and I'm just starting with Rmpi. I also tried the next two changes, but nothing:

library(Rmpi)
mpi.spawn.Rslaves(nslaves=2)
reduc <- function(){
  a <- mpi.comm.rank() + 2
  mpi.reduce(a, type=2, op="prod")
  return(paste("a =", a))
}
mpi.bcast.Robj2slave(reduc)
mpi.remote.exec(reduc())
cat("Product: ")
mpi.allreduce(1, op="prod")
mpi.close.Rslaves()

and

library(Rmpi)
mpi.spawn.Rslaves(nslaves=2)
reduc <- function(){
  a <- mpi.comm.rank() + 2
  mpi.allreduce(a, type=2, op="prod")
  return(paste("a =", a))
}
mpi.bcast.Robj2slave(reduc)
mpi.remote.exec(reduc())
cat("Product: ")
mpi.allreduce(1, op="prod")
mpi.close.Rslaves()

Could somebody help me? Thanks a lot in advance for your help!
Re: [R] Multiple Plotting Colors
Hi, On Mon, Jul 12, 2010 at 12:02 PM, Richardson, Patrick patrick.richard...@vai.org wrote: I'm trying to use multiple plotting colors in my code. My first ifelse statement successfully does what I want. However, now I want anything less than -4.5 to be green and the rest black. I want another col argument but can only use one. How could I go about getting separate colors for anything above 4.5 and less than -4.5?

plot(three, type="h", col=ifelse(three > 4.5, "red", "black"), xlim=c(0,500),
     ylim=range(three), lwd=2, xlab="Chromosome", ylab="Z-Score",
     font.lab=2, font=2, main="Upregulated Genes in Patient Sample")

How about:

my.colors <- ifelse(three > 4.5, "red", "black")
my.colors[three < -4.5] <- 'green'
plot(three, type='h', col=my.colors, ...)

-- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Multiple Plotting Colors
One more thing: On Mon, Jul 12, 2010 at 12:55 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: [...] How about:

my.colors <- ifelse(three > 4.5, "red", "black")
my.colors[three < -4.5] <- 'green'
plot(three, type='h', col=my.colors, ...)

Depending on what you want the plot for, perhaps you might consider changing your color palette from green - black - red to something like blue - black - yellow, since many folks who are color can not differentiate green from red all that well. -- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] densities greater than 1 for values within a (0,1) interval
Hello, I used the command kdensity in order to calculate the density of fractions/ratios (e.g. number of long-term unemployed over total unemployment). Thus I try to calculate the density of values less than 1. However, the values of the kernel density R provided (y-scale) are all greater than 1. Where is the problem and how may I solve it? Does R have problems calculating distributions of variables within an interval of 0 and 1? Best, Katja
Re: [R] How do I move axis labels closer to plot box?
chen jia <chen_1002 at fisher.osu.edu> writes: Check out ?par, specifically the mgp parameter. HTH, Ken
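For reference, a minimal sketch of what mgp controls in base graphics: its three values set the margin lines for the axis title, the tick labels, and the axis line (default c(3, 1, 0)), so lowering the first two pulls titles and labels toward the plot box:

```r
# Default spacing: title on margin line 3, labels on line 1
plot(1:10, xlab = "Index", ylab = "Value")

# Tighter: title on line 2, tick labels on line 0.6
plot(1:10, xlab = "Index", ylab = "Value", mgp = c(2, 0.6, 0))
```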
Re: [R] Multiple Plotting Colors
Ugh: Depending on what you want the plot for, perhaps you might consider changing your color palette from green - black - red to something like blue - black - yellow, since many folks who are color can not differentiate green from red all that well. ... folks who are color *blind* can not differentiate green from red ... -- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Multiple Plotting Colors
Steve, That worked perfectly. Thank you! Best regards, Patrick

-----Original Message----- From: Steve Lianoglou [mailto:mailinglist.honey...@gmail.com] Sent: Monday, July 12, 2010 12:55 PM To: Richardson, Patrick Cc: r-help@r-project.org Subject: Re: [R] Multiple Plotting Colors

[...] How about:

my.colors <- ifelse(three > 4.5, "red", "black")
my.colors[three < -4.5] <- 'green'
plot(three, type='h', col=my.colors, ...)

-- Steve Lianoglou, Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University, Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] densities greater than 1 for values within a (0, 1) interval
There is no constraint on the magnitude of probability density values, though the area under the curve must equal one. You may be thinking of cumulative probability distributions? If so, take a look at smoothed.df() in library(cwhmisc). Katja Hillmann katja.hillm...@wiso.uni-hamburg.de wrote: [...] --- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us, Research Engineer (Solar/Batteries with /Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity.
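A small demonstration that densities above 1 are perfectly normal for data confined to (0,1), as long as the area under the curve still integrates to 1 (the Beta sample below is an arbitrary illustration, not the poster's unemployment data):

```r
set.seed(42)
x <- rbeta(10000, 2, 20)   # values in (0,1), concentrated near 0.1
d <- density(x)

max(d$y)                   # well above 1: not an error

# The area under the estimated curve is still approximately 1
area <- sum(d$y) * diff(d$x[1:2])
area
```

Since the data are squeezed into a narrow sub-interval, the density must exceed 1 somewhere for the total area to reach 1.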
[R] Calculating the Weibull distribution's mean, standard deviation, and variance
Dear R community: Sorry if this question has a simple answer, but I am a new user of R. Do you know a command or package that can estimate the Weibull distribution's mean, standard deviation and variance, or can you direct me to where to find it? Thanks in advance, Oscar Rodriguez Gonzalez, Mobile: 519.823.3409, PhD Student, Canadian Research Institute for Food Safety
Re: [R] Compress string memCompress/Decompress
Hi Seth, Can you recreate the example below using dbWriteTable? Thanks!, Erik On Jul 11, 2010, at 6:13 PM, Seth Falcon wrote: On Sun, Jul 11, 2010 at 11:31 AM, Matt Shotwell shotw...@musc.edu wrote: On Fri, 2010-07-09 at 20:02 -0400, Erik Wright wrote: Hi Matt, This works great, thanks! At first I got an error message saying BLOB is not implemented in RSQLite. When I updated to the latest version it worked. SQLite began to support BLOBs from version 3.0, and RSQLite began supporting BLOBs only just recently :-) See the NEWS file for details. Below is a minimal example of how you might use BLOBs:

db <- dbConnect(SQLite(), dbname = ":memory:")
dbGetQuery(db, "CREATE TABLE t1 (name TEXT, data BLOB)")
z <- paste("hello", 1:10)
df <- data.frame(a = letters[1:10], z = I(lapply(z, charToRaw)))
dbGetPreparedQuery(db, "insert into t1 values (:a, :z)", df)
a <- dbGetQuery(db, "select name from t1")
checkEquals(10, nrow(a))
a <- dbGetQuery(db, "select data from t1")
checkEquals(10, nrow(a))
a <- dbGetQuery(db, "select * from t1")
checkEquals(10, nrow(a))
checkEquals(2, ncol(a))
checkEquals(z, sapply(a$data, rawToChar))
dbDisconnect(db)

-- Seth Falcon | @sfalcon | http://userprimary.net/
Re: [R] in continuation with the earlier R puzzle
Using Ted Harding's example:

news1o <- runif(10^6)
s2o    <- runif(10^6)
pt1 <- proc.time()
s <- numeric(length(news1o)) - 1   # Set all of s to -1
s[news1o > s2o] <- 1               # Change to 1 only those values of s
                                   # for which news1o > s2o
pt2 <- proc.time()
pt2 - pt1
# Takes even less time...
#   user  system elapsed
#   0.04    0.00    0.05

Please note: I will be out of the office and out of email contact from 7/11-7/25/2010. Manuela Huso, Consulting Statistician, 201H Richardson Hall, Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR 97331, ph: 541-737-6232, fx: 541-737-1393

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ted Harding Sent: Monday, July 12, 2010 9:36 AM To: r-help@r-project.org Cc: Raghu Subject: Re: [R] in continuation with the earlier R puzzle [...]
Re: [R] in continuation with the earlier R puzzle
Thanks to you all. I stand corrected, Ted and Manuela :) I am just an end user and trying to pick up from such forums. Many thanks, sirs. On Mon, Jul 12, 2010 at 5:45 PM, Huso, Manuela manuela.h...@oregonstate.edu wrote: Using Ted Harding's example: [...] -- 'Raghu'
[R] Calculate confidence interval of the mean based on ANOVA
I am trying to recreate an analysis that has been done by another group (in SAS, I believe). I'm stuck on one part, I think because my stats knowledge is lacking, and while it's OT, I'm hoping someone here can help. Given this dataframe:

foo <- structure(list(OBS = structure(1:18, .Label = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54"), class = "factor"), NOM =
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("0.05", "0.1", "1"), class = "factor"),
RUN = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 5L, 6L, 6L, 6L), .Label = c("1", "2", "3", "4", "5", "6"),
class = "factor"), CALC = c(0.04989, 0.04872, 0.04544, 0.05645,
0.06516, 0.0622, 0.04868, 0.05006, 0.04746, 0.05574, 0.04442,
0.04742, 0.05508, 0.0593, 0.04898, 0.06373, 0.05537, 0.04674)),
.Names = c("OBS", "NOM", "RUN", "CALC"), row.names = c(NA, 18L),
class = "data.frame")

I want to perform an ANOVA on CALC~RUN and, based on that, calculate the 95% confidence interval. However, the interval produced by the earlier analysis is [0.04741, 0.05824]. Is there some way to calculate a confidence interval based on an ANOVA that I'm completely missing?

> nrow(foo)
[1] 18
> mean(foo$CALC)
[1] 0.05282444
> fooaov <- aov(CALC~RUN, data=foo)
> print(fooaov)
Call: aov(formula = CALC ~ RUN, data = foo)
Terms:
                         RUN    Residuals
Sum of Squares  0.0003991420 0.0003202277
Deg. of Freedom            5           12
Residual standard error: 0.005165814
Estimated effects may be unbalanced
> print(summary(fooaov))
            Df     Sum Sq    Mean Sq F value  Pr(>F)
RUN          5 0.00039914 7.9828e-05  2.9914 0.05565 .
Residuals   12 0.00032023 2.6686e-05
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.tables(fooaov, type="means", se=TRUE)
Tables of means
Grand mean 0.05282444
RUN
      1       2       3       4       5       6
0.04802 0.06127 0.04873 0.04919 0.05445 0.05528
Standard errors for differences of means
         RUN
    0.004218
replic.    3
> t.test(foo$CALC, conf.level=0.95)
One Sample t-test
data: foo$CALC
t = 34.4524, df = 17, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.04958955 0.05605934
sample estimates:
mean of x 0.05282444

Thanks, Paul.
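One interval that does reproduce the reported [0.04741, 0.05824] treats RUN as a random effect: in a balanced one-way design, the standard error of the grand mean is estimated by sqrt(MS(RUN)/N), with the between-run degrees of freedom for the t quantile. A sketch (hedged: this is a reconstruction from the numbers above, not a documented description of the other group's SAS code):

```r
# Values taken from the aov() output above
grand.mean <- 0.05282444
ms.run     <- 7.9828e-05   # Mean Sq for RUN
N          <- 18           # total observations
df.run     <- 5            # RUN degrees of freedom

se <- sqrt(ms.run / N)
ci <- grand.mean + c(-1, 1) * qt(0.975, df.run) * se
round(ci, 5)
# [1] 0.04741 0.05824
```

Note this is wider per unit of standard error than the t.test() interval because it uses only 5 degrees of freedom and the between-run mean square, reflecting run-to-run variability.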
[R] How to create sequence in month
Hi all, can anyone please guide me on how to create a sequence of months? Here is what I tried, without success:

library(zoo)
seq(as.yearmon("2010-01-01"), as.yearmon("2010-03-01"), by="1 month")
Error in del/by : non-numeric argument to binary operator

What is the correct way to do that? Thanks for your time.
Re: [R] How to create sequence in month
As in this example:

seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by="mon")

On 7/12/10 11:25 AM, Bogaso Christofer bogaso.christo...@gmail.com wrote: [...] -- Don MacQueen, Environmental Protection Department, Lawrence Livermore National Laboratory, 925 423-1062
Re: [R] Calculating the Weibull distribution's mean, standard deviation, and variance
Try fitdistr() in pkg MASS. -Peter Ehlers On 2010-07-12 11:17, Oscar Rodriguez wrote: [...]
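Once shape and scale are estimated (e.g. via fitdistr), the moments follow from closed forms: mean = scale * gamma(1 + 1/shape) and var = scale^2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)^2). A small helper illustrating this (the function name is mine, not from any package):

```r
weibull_moments <- function(shape, scale) {
  m <- scale * gamma(1 + 1/shape)
  v <- scale^2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)^2)
  c(mean = m, var = v, sd = sqrt(v))
}

# Sanity check: shape = 1 reduces to an exponential with
# mean = scale and var = scale^2
weibull_moments(shape = 1, scale = 2)
# mean = 2, var = 4, sd = 2
```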
Re: [R] eliminating constant variables
What was the question and answer here?

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of pdb Sent: Sunday, July 11, 2010 5:23 AM To: r-help@r-project.org Subject: Re: [R] eliminating constant variables

Awesome! It made sense once I realised SD = standard deviation! pdb
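For the archive, the idea the SD remark points at (dropping columns whose standard deviation is zero, i.e. constant variables) can be sketched as follows; the data frame is illustrative, not from the original thread:

```r
df <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c(2, 4, 6))

# Keep only columns whose standard deviation is non-zero
keep <- sapply(df, function(x) sd(x) > 0)
df[, keep, drop = FALSE]
#   a c
# 1 1 2
# 2 2 4
# 3 3 6
```

For non-numeric columns, a check like length(unique(x)) > 1 is the analogous test.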
Re: [R] How to create sequence in month
On 12.07.2010 20:25, Bogaso Christofer wrote: library(zoo) seq(as.yearmon("2010-01-01"), as.yearmon("2010-03-01"), by="1 month") Try instead: seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by="1 month") hth Stefan
[R] Error in storage.mode(test) <- "logical"
Hi There, I get the following error from the code pasted below: Error in storage.mode(test) <- "logical" : object 'HGBmt12_Natl_Ave_or_Facility' not found

library(RODBC)
library(car)
setwd("c://temp//cms")
a07.connect <- odbcConnectAccess2007("DFC.accdb")
sqlTables(a07.connect)  ## provides list of tables ##
dataset <- sqlFetch(a07.connect, 'Analysis File 2007-2009')  # puts dfc data into table mydata
str(dataset)  # this works and gives correct values
HGlt102009 = dataset[,6]
HGBL10_F_2007 = dataset[,11]
HGmt122009 = dataset[,7]
HGBL12_F_2007 = dataset[,16]
URRmt65Perc2009 = dataset[,3]
URRG65_F_2007 = dataset[,22]
yes1 = HGlt102009 - HGBL10_F_2007
no1 = HGlt102009 - 2
yes2 = HGmt122009 - HGBL12_F_2007
no2 = HGmt122009 - 26
yes3 = URRG65_F_2007 - URRmt65Perc2009
no3 = 96 - URRmt65Perc2009
Analysis2009 <- transform(dataset
  , HGBlt10_Natl_Ave_or_Facility = recode(HGBL10_F_2007, "0:2='National'; else='Facility'")
  , HGBmt12_Natl_Ave_or_Facility = recode(HGBL12_F_2007, "0:26='National'; else='Facility'")
  , URRmt65_Natl_Ave_or_Facility = recode(URRG65_F_2007, "96:100='National'; else='Facility'")
  , HGlt10RawPerc = ifelse(HGBlt10_Natl_Ave_or_Facility == "Facility", yes1, no1)
  , HGmt12RawPerc = ifelse(HGBmt12_Natl_Ave_or_Facility == "Facility", yes2, no2)
  , URRmt65Perc = ifelse(URRmt65_Natl_Ave_or_Facility == "Facility", yes3, no3)
  , HGBlt10Points = recode(HGlt10RawPerc, "-1001:0=10; 0:1=8; 1:2=6; 2:3=4; 3:4=2; else=0")
  , HGBlt12Points = recode(HGlt12RawPerc, "-1001:2=6; 2:3=4; 3:4=2; else=0")
  , URRmt65Points = recode(HGlt10RawPerc, "-1001:0=10; 0:1=8; 1:2=6; 2:3=4; 3:4=2; else=0")
)

Any ideas on what it means and why? I'd really appreciate it!! Thank you!! Sincerely, tom
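Two things worth checking, offered as guesses since the data aren't available: car::recode() expects its recode specification as a single quoted string (e.g. recode(x, "0:2='National'; else='Facility'")), and base R's transform() cannot see columns created earlier in the same transform() call, which would explain why 'HGBmt12_Natl_Ave_or_Facility' is not found when a later argument refers to it. For the recoding step itself, a base-R stand-in that avoids car entirely (illustrative values only, not the real DFC data):

```r
# Base-R stand-in for car::recode(x, "0:2='National'; else='Facility'"):
# values in [0, 2] map to "National", everything else to "Facility".
HGBL10_F_2007 <- c(0, 1.5, 2, 3, 10)  # hypothetical sample values
lab <- ifelse(HGBL10_F_2007 >= 0 & HGBL10_F_2007 <= 2, "National", "Facility")
lab
```

Building the recoded columns one at a time (plain assignments rather than one big transform() call) also sidesteps the visibility problem.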
Re: [R] in continuation with the earlier R puzzle
I wanted to point out one thing that Ted said, about initializing the vectors ('s' in your example). This can make a dramatic speed difference if you are using a for loop (the difference is negligible with vectorized computations). Also, a lot of benchmarks have been flying around, each from a different system and using random numbers without identical seeds. So to provide an overall comparison of all the methods I saw here, plus demonstrate the speed difference from initializing a vector (if you know its desired length in advance), I ran these benchmarks. Notes: I did not want to interfere with your objects so I used different names. The equivalencies are: news1o = x; s2o = y; s = z. system.time() automatically calculates the time difference from proc.time() between start and finish.

##R version info
sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-mingw32
#snipped

##Some Sample Data
set.seed(10)
x <- rnorm(10^6)
set.seed(15)
y <- rnorm(10^6)

##Benchmark 1
z.1 <- NULL
system.time(for(i in 1:length(x)) {
  if(x[i] > y[i]) {
    z.1[i] <- 1
  } else {
    z.1[i] <- -1}
  }
)
   user  system elapsed
1303.83  174.24 1483.74

##Benchmark 2
#initialize 'z' at length
z.2 <- vector("numeric", length = 10^6)
system.time(for(i in 1:length(x)) {
  if(x[i] > y[i]) {
    z.2[i] <- 1
  } else {
    z.2[i] <- -1}
  }
)
   user  system elapsed
   3.77    0.00    3.77

##Benchmark 3
z.3 <- NULL
system.time(z.3 <- ifelse(x > y, 1, -1))
   user  system elapsed
   0.38    0.00    0.38

##Benchmark 4
z.4 <- vector("numeric", length = 10^6)
system.time(z.4 <- ifelse(x > y, 1, -1))
   user  system elapsed
   0.31    0.00    0.31

##Benchmark 5
system.time(z.5 <- 2*(x > y) - 1)
   user  system elapsed
   0.01    0.00    0.01

##Benchmark 6
system.time(z.6 <- numeric(length(x)) - 1)
   user  system elapsed
      0       0       0
system.time(z.6[x > y] <- 1)
   user  system elapsed
   0.03    0.00    0.03

##Show that all results are identical
identical(z.1, z.2)
[1] TRUE
identical(z.1, z.3)
[1] TRUE
identical(z.1, z.4)
[1] TRUE
identical(z.1, z.5)
[1] TRUE
identical(z.1, z.6)
[1] TRUE

I have not replicated these on other systems, but tentatively, it appears that loops are significantly slower than ifelse(), which in turn is slower than options 5 and 6. However, when using the same test data and the same system, I did not find an appreciable difference between options 5 and 6 speed-wise. Cheers, Josh

On Mon, Jul 12, 2010 at 7:09 AM, Raghu r.raghura...@gmail.com wrote: When I just run a for loop it works. But if I am going to run a for loop every time for large vectors I might as well use C or any other language. The reason R is powerful is because it can handle large vectors without each element being manipulated? Please let me know where I am wrong.

for(i in 1:length(news1o)){
  if(news1o[i] > s2o[i])
    s[i] <- 1
  else
    s[i] <- -1
}

-- 'Raghu' -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
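To make the coercion trick behind the fastest variants concrete: TRUE/FALSE coerce to 1/0 in arithmetic, so 2*(x > y) - 1 reproduces the loop's +1/-1 result exactly. A small self-contained check (much smaller n than the benchmarks above):

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

# Loop version with a preallocated result vector
z_loop <- numeric(length(x))
for (i in seq_along(x)) z_loop[i] <- if (x[i] > y[i]) 1 else -1

# Vectorized: (x > y) is logical, coerced to 1/0, so 2*(x > y) - 1 is +1/-1
z_vec <- 2 * (x > y) - 1

identical(z_loop, z_vec)
# [1] TRUE
```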
[R] exercise in frustration: applying a function to subsamples
From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply, would allow me to have a really short R script to analyze all my data (I have reduced it to a couple hundred thousand records with about half a dozen fields). I get the same result from ddply and split/lapply:

ddply(moreinfo, c("m_id", "sale_year", "sale_week"),
  function(df) data.frame(res = fitdist(df$elapsed_time, "exp"), est = res$estimate, sd = res$sd))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

and

lapply(split(moreinfo, list(moreinfo$m_id, moreinfo$sale_year, moreinfo$sale_week)),
  function(df) fitdist(df$elapsed_time, "exp"))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

Now, in retrospect, unless I misunderstood the properties of a data.frame, I suppose a data.frame might not have been entirely appropriate as the m_id samples start and end on very different dates, but I would have thought a list data structure should have been able to handle that. It would seem that split is making groups that have the same start and end dates (or that if, for example, I have sale data for precisely the last year, split would insist on both 2009 and 2010 having weeks from 0 through 52 instead of just the weeks in each year that actually have data: 26 through 52 for last year and 1 through 25 for this year). I don't see how else the data passed to fitdist could have a sample size of 0. I'd appreciate understanding how to resolve this. However, it isn't a show stopper as it now seems trivial to just break it out into a loop (followed by a lapply/split combo using only sale year and sale month). While I am asking, is there a better way to split such temporally ordered data into weekly samples that respects the year in which the sample is taken as well as the week in which it is taken?
Thanks Ted
[R] Subsetting Lists
I am looking for a way to create a vector which contains the second element of every vector in the list. However, not every vector has two components, so I need to generate an NA for those missing. For example, I have created the following list:

lst <- list(c("a", "b"), c("c"), c("d", "e"), c("f", "g"))
lst
[[1]]
[1] "a" "b"
[[2]]
[1] "c"
[[3]]
[1] "d" "e"
[[4]]
[1] "f" "g"

I would like the output to be the following:

output
[1] "b" NA  "e" "g"

I know I can accomplish this using a for loop, but I am wondering if there's a simple and neat way of getting this done in one step? Thanks in advance. - Andrew
Re: [R] Subsetting Lists
Hi Andrew, Try sapply(lst, "[", 2) HTH, Jorge On Mon, Jul 12, 2010 at 3:12 PM, Andrew Leeser wrote: I am looking for a way to create a vector which contains the second element of every vector in the list. However, not every vector has two components, so I need to generate an NA for those missing. [...]
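For anyone puzzled by why this one-liner works: "[" is an ordinary function in R, so sapply() can call it on each list element with the extra argument 2, and indexing past the end of a vector returns NA. A quick check with Andrew's list:

```r
lst <- list(c("a", "b"), c("c"), c("d", "e"), c("f", "g"))

# "[" is a function, so sapply calls lst[[i]][2] for each i;
# lst[[2]][2] is out of range and therefore yields NA.
output <- sapply(lst, "[", 2)
output
# [1] "b" NA  "e" "g"
```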
Re: [R] exercise in frustration: applying a function to subsamples
Your code is not reproducible. Can you come up with a small example showing the crux of your data structures/problem, that we can all run in our R sessions? You're likely to get much higher quality responses this way. Ted Byers wrote: From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply, would allow me to have a really short R script to analyze all my data [...]
[R] What is the degrees of freedom in an nlme model
Dear all, I want to do an F test, which involves calculation of the degrees of freedom for the residuals. Now say I have an nlme object, mod.nlme. I have two questions: 1. How do I extract the degrees of freedom? 2. How is this degrees of freedom calculated in an nlme model? Thanks. Jun Shen

Some sample code and data =

mod.nlme <- nlme(RESP~E0+(Emax-E0)*CP**gamma/(EC50**gamma+CP**gamma), data=Data,
  fixed=E0+Emax+gamma+EC50~1,
  random=list(pdDiag(EC50+E0+gamma~1)),
  groups=~ID,
  start=list(fixed=c(E0=1,Emax=100,gamma=1,EC50=50))
)

The Data object

structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), CP = c(1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90), RESP = c(3.19, 2.52, 2.89, 3.28, 3.82, 7.15, 11.2, 16.25, 30.32, 55.25, 73.56, 82.07, 89.08, 95.86, 97.97, 99.03, 3.49, 4.4, 3.54, 4.99, 3.81, 10.12, 21.59, 24.93, 40.18, 61.01, 78.65, 88.81, 93.1, 94.61, 98.83, 97.86, 0.42, 0, 2.58, 5.67, 3.64, 8.01, 12.75, 13.27, 24.65, 46.1, 65.16, 77.74, 87.99, 94.4, 96.05, 100.4, 2.43, 0, 6.32, 5.59, 8.48, 12.32, 26.4, 28.36, 43.38, 69.56, 82.53, 91.36, 95.37,
98.36, 98.66, 98.8, 5.16, 2, 5.65, 3.48, 5.78, 5.5, 11.55, 8.53, 18.02, 38.11, 58.93, 70.93, 85.62, 89.53, 96.19, 96.19, 2.76, 2.99, 3.75, 3.02, 5.44, 3.08, 8.31, 10.85, 13.79, 32.06, 50.22, 63.7, 81.34, 89.59, 93.06, 92.47, 3.32, 1.14, 2.43, 2.75, 3.02, 5.4, 8.49, 7.91, 15.17, 35.01, 53.91, 68.51, 83.12, 86.85, 92.17, 95.72, 3.58, 0.02, 3.69, 4.34, 6.32, 5.15, 9.7, 11.39, 23.38, 42.9, 61.91, 71.82, 87.83)), .Names = c("ID", "CP", "RESP"), class = "data.frame", row.names = c(NA, -125L))
[R] Robust regression error: Too many singular resamples
Hello. I've got a dataset that may have outliers in both x and y. While I am not at all familiar with robust regression, it looked like the function lmrob in package robustbase should handle this situation. When I try to use it, I get: Too many singular resamples. Aborting fast_s_w_mem() Looking into it further, it appears that for an indicator variable in one of my interaction terms, 98% of the data have value 1 and only 2% have value 0. I believe this is the cause of the problem, but am confused as to why the algorithm cannot handle this situation. The probability of actually getting a singular sample ought to be fairly low, unless the sample sizes are fairly tiny. Is there some parameter I can tweak to increase the sample size, or is something else going on? You can easily reproduce this by running the following. Any advice would be appreciated. Thank you.

library(robustbase)
x <- rnorm(10000)
isZ <- c(rep(1,9800), rep(0,200))
y <- rnorm(10000)
model <- lmrob(y ~ x*isZ)
Re: [R] ggplot2: How to change font of labels in geom_text
I have the same problem and I wonder if there is any answer from the community. Thanks.
Re: [R] How to create sequence in month
On Mon, Jul 12, 2010 at 2:25 PM, Bogaso Christofer bogaso.christo...@gmail.com wrote: Hi all, can anyone please guide me how to create a sequence of months? Here I have tried the following but couldn't get it to work: library(zoo) seq(as.yearmon("2010-01-01"), as.yearmon("2010-03-01"), by="1 month")

There currently is no seq method (we will make a note to add one) for yearmon but you can add the appropriate sequence to the starting yearmon object, use zooreg, or convert to numeric or Date, perform the seq and then convert back:

# yearmon + seq
as.yearmon("2010-01-01") + 0:2/12
as.yearmon("2010-01") + 0:2/12

# zooreg
time(zooreg(1:3, as.yearmon("2010-01"), freq = 12))

# seq.default
as.yearmon(seq(as.numeric(as.yearmon("2010-01")), as.numeric(as.yearmon("2010-03")), 1/12))

# seq.Date
as.yearmon(seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by = "month"))

Also note that if the reason you are doing this is to create a zooreg object then it's not necessary to explicitly form the sequence in the first place, since zooreg only requires the starting time and a frequency, as shown above.
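For completeness, the missing quotes are what tripped up the original attempt; base R alone can already build a month sequence via the Date method of seq(), no zoo required:

```r
# First-of-month sequence in base R; note the quoted dates and by string
months <- seq(as.Date("2010-01-01"), as.Date("2010-03-01"), by = "1 month")
format(months, "%Y-%m")
# [1] "2010-01" "2010-02" "2010-03"
```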
Re: [R] exercise in frustration: applying a function to subsamples
try 'drop=TRUE' on the split function call. This will prevent the NULL set from being sent to the function. On Mon, Jul 12, 2010 at 3:10 PM, Ted Byers r.ted.by...@gmail.com wrote: From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply, would allow me to have a really short R script to analyze all my data [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[R] ed50
I am using a semiparametric model:

library(mgcv)
sm1 <- gam(y ~ x1 + s(x2), family=binomial, data=f)

How should I find the standard error for ED50 for the above model?

ED50 = (-sm1$coef[1] - f(x2)) / sm1$coef[2]

f(x2) is the estimated value for the nonparametric term. Thanks
Re: [R] exercise in frustration: applying a function to subsamples
OK, here is a stripped down variant of my code. I can run it here unchanged (apart from the credentials for connecting to my DB).

Sys.setenv(MYSQL_HOME='C:/Program Files/MySQL/MySQL Server 5.0')
library(TSMySQL)
library(plyr)
library(fitdistrplus)
con <- dbConnect(MySQL(), user="rejbyers", password="jesakos", dbname="merchants2")
x <- sprintf("SELECT m_id, sale_date, YEAR(sale_date) AS sale_year, WEEK(sale_date) AS sale_week, return_type, 0.0001 + DATEDIFF(return_date,sale_date) AS elapsed_time FROM `risk_input` WHERE DATEDIFF(return_date,sale_date) IS NOT NULL")
x
moreinfo <- dbGetQuery(con, x)
str(moreinfo)
#moreinfo
#print(moreinfo)
dbDisconnect(con)
f1 <- fitdist(moreinfo$elapsed_time, "exp")
summary(f1)
lapply(split(moreinfo, list(moreinfo$m_id, moreinfo$sale_year, moreinfo$sale_week), drop = TRUE),
  function(df) fitdist(df$elapsed_time, "exp"))

I guess that for others to run this script, it is just necessary to create some sample data, consisting of two or more m_id values (I have several hundred), and temporally ordered data for each. I am not familiar enough with R to know how to do that using R. Usually, if I need dummy data, I make it with my favourite rng using either C++ or Perl. I am still trying to get used to R. Each record in my data has one random variate and a MySQL TIMESTAMP (nn-nn- nn:nn:nn), anywhere from hundreds to thousands each week for anywhere from a few months to several years. My SQL actually produces the random variate by taking the difference between the sale date and return date, and is structured as it is because I know how to group by year and week from a timestamp field using SQL but didn't know how to accomplish the same thing in R. The statement 'x' by itself always shows me the correct SQL statement to get the data (I can execute it unchanged in the mysql commandline client). 'str(moreinfo)' always gives me the data structure I expect. E.g.:

str(moreinfo)
'data.frame': 177837 obs. of 6 variables:
 $ m_id        : num 171 206 206 206 206 206 206 218 224 224 ...
 $ sale_date   : chr "2008-04-25 07:41:09" "2008-05-09 20:58:12" "2008-09-06 19:51:52" "2008-05-01 21:26:40" ...
 $ sale_year   : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
 $ sale_week   : int 16 18 35 17 31 21 19 52 44 35 ...
 $ return_type : num 1 1 1 1 1 1 1 1 1 1 ...
 $ elapsed_time: num 0.0001 0.0001 3.0001 4.0001 21.0001 ...

'summary(f1)' shows me the results I expect from the aggregate data. E.g.:

summary(f1)
FITTING OF THE DISTRIBUTION 'exp' BY MAXIMUM LIKELIHOOD
PARAMETERS
     estimate  Std. Error
rate 0.0652917 0.0001547907
Loglikelihood: -663134.7  AIC: 1326271  BIC: 1326281
-- GOODNESS-OF-FIT STATISTICS
_ Chi-squared _
Chi-squared statistic: 400277239
Degree of freedom of the Chi-squared distribution: 56
Chi-squared p-value: 0
!!! the p-value may be wrong with some theoretical counts < 5 !!!
!!! For continuous distributions, Kolmogorov-Smirnov and Anderson-Darling statistics should be preferred !!!
_ Kolmogorov-Smirnov _
Kolmogorov-Smirnov statistic: 0.1660987
Kolmogorov-Smirnov test: rejected
!!! The result of this test may be too conservative as it assumes that the distribution parameters are known !!!
_ Anderson-Darling _
Anderson-Darling statistic: Inf
Anderson-Darling test: rejected

And at the end, I get the error I mentioned. NB: In this variant, I added drop = TRUE as Jim suggested.

lapply(split(all_samples, list(all_samples$m_id, all_samples$sale_year, all_samples$sale_week), drop = TRUE),
  function(df) fitdist(df$elapsed_time, "exp"))
Error in fitdist(df$elapsed_time, "exp") :
  data must be a numeric vector of length greater than 1

If, then, drop = TRUE results in all empty combinations of m_id, year and week being excluded, then (noticing the requirement is actually that the sample size be greater than 1), I can only conclude that at least one of the samples has only 1 record. But that is too small. Is there a way to allow the above code to apply fitdist only if the sample size of a given subsample is greater than, say, 100? Even better, is there a way to make the split more dynamic, so that it groups a given m_id's data by month if the average weekly subsample size is less than 100, or by day if the average weekly subsample is greater than 1000?

Thanks Ted

On Mon, Jul 12, 2010 at 3:20 PM, Erik Iverson er...@ccbr.umn.edu wrote: Your code is not reproducible. Can you come up with a small example showing the crux of your data structures/problem, that we can all run in our R sessions? You're likely to get much higher quality responses this way. [...]
Re: [R] exercise in frustration: applying a function to subsamples
Thanks Jim, I acted on your suggestion and found the result unchanged. :-( Then I noticed that fitdist doesn't like a sample size of 1 either. If, then, drop = TRUE results in all empty combinations of m_id, year and week being excluded, then (noticing the requirement is actually that the sample size be greater than 1), I can only conclude that at least one of the samples has only 1 record. I hadn't realized that some of the subsamples were that small. In my reply to Erik, I wrote: But that is too small. Is there a way to allow the above code to apply fitdist only if the sample size of a given subsample is greater than, say, 100? Even better, is there a way to make the split more dynamic, so that it groups a given m_id's data by month if the average weekly subsample size is less than 100, or by day if the average weekly subsample is greater than 1000?

Thanks Ted

On Mon, Jul 12, 2010 at 4:02 PM, jim holtman jholt...@gmail.com wrote: try 'drop=TRUE' on the split function call. This will prevent the NULL set from being sent to the function. [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? -- R.E.(Ted) Byers, Ph.D., Ed.D. t...@merchantservicecorp.com CTO Merchant Services Corp. 350 Harry Walker Parkway North, Suite 8 Newmarket, Ontario L3Y 8L3
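Regarding Ted's question about skipping undersized subsamples: one simple approach is to filter the list returned by split() before calling lapply(). A sketch with toy data, using mean() as a stand-in for fitdist() and an arbitrary illustrative threshold (the 100 suggested in the thread would work the same way):

```r
# Toy grouped data: group "b" has only a single record
d <- data.frame(g = c("a", "a", "a", "b"), elapsed = c(1.2, 0.7, 2.5, 3.1))

pieces <- split(d, d$g, drop = TRUE)

# Keep only subsamples with more than n_min rows
n_min <- 2
big_enough <- Filter(function(df) nrow(df) > n_min, pieces)

# fitdist(df$elapsed, "exp") would go here; mean() stands in for brevity
fits <- lapply(big_enough, function(df) mean(df$elapsed))
names(fits)
# [1] "a"
```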
[R] Statistical Learning and Datamining Course October 2010 Washington DC
Short course: Statistical Learning and Data Mining III: Ten Hot Ideas for Learning from Data Trevor Hastie and Robert Tibshirani, Stanford University Georgetown University Conference Center Washington DC, October 11-12, 2010. This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly more on data analysis and statistical models to exploit the vast amounts of data at our fingertips. In this course we emphasize the tools useful for tackling modern-day data analysis problems. From the vast array of tools available, we have selected what we consider are the most relevant and exciting. Our top-ten list of topics are: * Regression and Logistic Regression (two golden oldies), * Lasso and Related Methods, * Support Vector and Kernel Methodology, * Principal Components (SVD) and Variations: sparse SVD, supervised PCA, Nonnegative Matrix Factorization, * Boosting, Random Forests and Ensemble Methods, * Rule based methods (PRIM), * Graphical Models, * Cross-Validation, * Bootstrap, * Feature Selection, False Discovery Rates and Permutation Tests. Our earlier courses are not a prerequisite for this new course. Although there is some overlap with past courses, our new course contains many topics not covered by us before. The material is based on recent papers by the authors and other researchers, as well as the new second edition of our best selling book: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, Tibshirani and Friedman, Springer-Verlag, 2009 http://www-stat.stanford.edu/ElemStatLearn/ A copy of this book will be given to all attendees. The lectures will consist of video-projected presentations and discussion. Go to the site http://www-stat.stanford.edu/~hastie/sldm.html for more information and online registration.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What is the degrees of freedom in an nlme model
Jun: Short answer: There is no such thing as df for a nonlinear model (whether or not mixed effects). Longer answer: df is the dimension of the null space when the data are projected on the linear subspace of the model matrix of a **linear model**. So, strictly speaking, no linear model, no df. HOWEVER... nonlinear models are usually (always??) fit by successive linear approximations, and approximate df are obtained from these approximating subspaces. However, the problem with this is that there is no guarantee that the relevant residual distributions are sufficiently chisq with the approximate df to give reasonable answers. In fact, lots of people much smarter than I have spent lots of time trying to figure out what sorts of approximations one should use to get trustworthy results. The thing is, in nonlinear models, it can DEPEND on the exact form of the model -- indeed, that's what distinguishes nonlinear models from linear ones! So this turns out to be really hard, and afaik these smart people don't agree on what should be done. To see what one of the smartest people has to say about this, search the archives for Doug Bates's comments on this w.r.t. lmer (he won't compute such distributions nor provide P values because he doesn't know how to do it reliably. Doug -- please correct me if I have it wrong). A stock way to extricate oneself from this dilemma is: bootstrap! Unfortunately, this is also probably too facile: for one thing, with a nondiagonal covariance matrix (as in mixed effects models), how do you resample to preserve the covariance structure? I believe this is an area of active research in the time series literature, for example. For another, this may be too computationally demanding to be practicable, due to convergence issues. Bottom line: there may be no good way to do what you want. Note to experts: Please view this post as an invitation to correct my errors and provide authoritative info.
Cheers to all, Bert
Bert Gunter Genentech Nonclinical Biostatistics
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jun Shen Sent: Monday, July 12, 2010 12:34 PM To: R-help Subject: [R] What is the degrees of freedom in an nlme model
Dear all, I want to do an F test, which involves calculation of the degrees of freedom for the residuals. Now say I have an nlme object, mod.nlme. I have two questions: 1. How do I extract the degrees of freedom? 2. How is this degrees of freedom calculated in an nlme model? Thanks. Jun Shen
Some sample code and data =========
mod.nlme <- nlme(RESP ~ E0 + (Emax - E0)*CP**gamma/(EC50**gamma + CP**gamma),
                 data = Data,
                 fixed = E0 + Emax + gamma + EC50 ~ 1,
                 random = list(pdDiag(EC50 + E0 + gamma ~ 1)),
                 groups = ~ID,
                 start = list(fixed = c(E0 = 1, Emax = 100, gamma = 1, EC50 = 50)))
The Data object
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), CP = c(1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90, 120, 150, 200, 1, 2, 3, 4.5, 5, 7.5, 11.25, 12, 18, 30, 45, 60, 90), RESP = c(3.19, 2.52, 2.89, 3.28, 3.82, 7.15, 11.2, 16.25, 30.32, 55.25, 73.56, 82.07, 89.08, 95.86, 97.97,
99.03, 3.49, 4.4, 3.54, 4.99, 3.81, 10.12, 21.59, 24.93, 40.18, 61.01, 78.65, 88.81, 93.1, 94.61, 98.83, 97.86, 0.42, 0, 2.58, 5.67, 3.64, 8.01, 12.75, 13.27, 24.65, 46.1, 65.16, 77.74, 87.99, 94.4, 96.05, 100.4, 2.43, 0, 6.32, 5.59, 8.48, 12.32, 26.4, 28.36, 43.38, 69.56, 82.53, 91.36, 95.37, 98.36, 98.66, 98.8, 5.16, 2, 5.65, 3.48, 5.78, 5.5, 11.55, 8.53, 18.02, 38.11, 58.93, 70.93, 85.62, 89.53, 96.19, 96.19, 2.76, 2.99, 3.75, 3.02, 5.44, 3.08, 8.31, 10.85, 13.79, 32.06, 50.22, 63.7, 81.34, 89.59, 93.06, 92.47, 3.32, 1.14, 2.43, 2.75, 3.02, 5.4, 8.49, 7.91, 15.17, 35.01, 53.91, 68.51, 83.12, 86.85, 92.17, 95.72, 3.58, 0.02, 3.69, 4.34, 6.32, 5.15, 9.7, 11.39, 23.38, 42.9, 61.91, 71.82, 87.83)), .Names = c("ID", "CP", "RESP"), class = "data.frame", row.names = c(NA, -125L))
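On Jun's first question (how to extract the df), the approximate df that nlme reports can be pulled straight out of the fitted object. The sketch below uses the built-in Loblolly data and the self-starting model from the ?nlme example rather than the Emax fit above, purely so it runs standalone; the same extraction applies to mod.nlme. As Bert explains, these df come from a linear approximation, so treat them as approximate.

```r
library(nlme)

# Toy fit on a data set shipped with nlme (this is the documented ?nlme example).
fm <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
           data = Loblolly,
           fixed = Asym + R0 + lrc ~ 1,
           random = Asym ~ 1,
           start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fm$fixDF$X   # per-coefficient denominator df used for the t-tests
anova(fm)    # conditional F-tests with numDF and denDF columns
```

summary(fm) shows the same df alongside the t-values; for mod.nlme, replace fm accordingly.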
[R] SAS to R
Hi everyone, I don't know how to code in SAS but I do know how to code in R. Can someone please be kind enough to translate this into R code for me:
proc mixed data = small method = reml;
  class id day;
  model weight = day / solution ddfm = bw;
  repeated day / subject = id type = unstructured;
run;
=== so far I think it is gls(weight ~ day, corr = corSymm(???), method = "REML", data = small); my main problem is I don't know how to get the unstructured covariance matrix to work. Thank you
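A hedged sketch of one common translation: PROC MIXED's repeated/type=unstructured can be approximated in nlme::gls by combining corSymm (general correlations between days) with varIdent (a separate variance per day). The data set small below is invented stand-in data, and corSymm additionally needs an integer time index within each subject, so this is a template rather than a drop-in answer.

```r
library(nlme)
set.seed(1)

# Toy stand-in for the poster's 'small' data set: 10 subjects, 3 days each.
small <- data.frame(id     = factor(rep(1:10, each = 3)),
                    day    = factor(rep(1:3, times = 10)),
                    weight = rnorm(30, mean = 70, sd = 5))
small$day.int <- as.integer(small$day)   # corSymm wants an integer position index

fit <- gls(weight ~ day,
           data        = small,
           correlation = corSymm(form = ~ day.int | id),  # unstructured correlations
           weights     = varIdent(form = ~ 1 | day),      # separate variance per day
           method      = "REML")
summary(fit)
```

corSymm plus varIdent together give the full unstructured within-subject covariance; ddfm=bw (between-within df) has no exact gls counterpart, so the reported df will differ from SAS.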
[R] How to select the column header with \Sexpr{}
Hi: Since I work with a few different fish runs, my column headers change every time I start a new year. I have been using \Sexpr{} for my rows and columns and now I am trying to use it with my report column headers. \Sexpr{1,1} is row 1 column 1; what can I use for headers? I tried \Sexpr{0,1} but Sweave didn't like it. Thanks in advance for any hints. Felipe D. Carrillo, Supervisory Fishery Biologist, Department of the Interior, US Fish & Wildlife Service, California, USA
Re: [R] How to select the column header with \Sexpr{}
On 12/07/2010 5:10 PM, Felipe Carrillo wrote: ... \Sexpr{1,1} is row 1 column 1, what can I use for headers? I tried \Sexpr{0,1} but Sweave didn't like it ...
\Sexpr takes an R expression, and inserts the first element of the result into your text. Using just "0,1" (not including the quotes) is not a valid R expression. You need to use paste() or some other function to construct the label you want to put in place, e.g. \Sexpr{paste("0", "1", sep = ",")} will give you "0,1". Duncan Murdoch
Re: [R] findInterval and data resolution
On 12/07/2010 5:25 PM, Bryan Hanson wrote: Hello Wise Ones... I need a clever way around a problem with findInterval. Consider:
vec1 <- 1:10
vec2 <- seq(1, 10, by = 0.1)
x1 <- c(2:3)
a1 <- findInterval(x1, vec1); a1 # example 1
a2 <- findInterval(x1, vec2); a2 # example 2
In the problem I'm working on, vec* may be either integer or numeric, like vec1 and vec2. I need to remove one or more sections of this vector; for instance if I ask to remove values 2:3 I want to remove all values between 2 and 3 regardless of the resolution of the data (in my thinking, vec2 is more dense or has better resolution than vec1). So example 1 above works fine because the values 2 and 3 are the end points of a range that includes no values in-between (a1). But, in example 2 the answer is, correctly, also the end points, but now there are values in between these end points. Hence a2 doesn't include the indices of the values in-between the end points. I have looked at cut, but it doesn't quite behave the way I want, since if I set x1 <- c(2:4) I get more intervals than I really want and cleaning it up will be laborious. I think I can construct the full set of indices I want with a2[1]:a2[2], but is there a more clever way to do this? I'm thinking there might be a function out there that I am not aware of.
I'm not sure I understand what you want. If you know x1 will always be an increasing vector, you could use something like a2[1]:a2[length(a2)] to select the full range of indices that it covers. If x1 is not necessarily in increasing order, you'll have to do min(a2):max(a2) (which might be clearer in any case). If you're more interested in the range of values in vec*, maybe range(vec2[min(a2):max(a2)]) will give you what you want. Duncan Murdoch
[R] Question about food sampling analysis
Greetings to all, and my apologies for a question that is mostly about statistics and secondarily about R. I have just started a new job that (this week, apparently) requires statistical knowledge beyond my training (as an epidemiologist). The problem: - We have 57 food production facilities in three categories - Samples of 4-6 different foods were tested for listeria at each facility - I need to describe the presence of listeria in food (1) overall and (2) by facility category. I know that samples within each facility cannot be treated as independent, so I need an approach that accounts for (1) clustering within facilities and (2) the different number of samples taken at each facility. If someone could kindly point me towards the right type of analysis for this and/or its associated R functions/packages, I would greatly appreciate it. Many thanks, Sarah
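For what it is worth, one standard way to respect the clustering is a logistic mixed model with a random intercept per facility (e.g. lme4::glmer); the unequal number of samples per facility is then handled automatically. Everything below, column names included, is simulated stand-in data, not Sarah's, so read it as a template.

```r
library(lme4)
set.seed(42)

# Simulate 57 facilities, 4-6 samples each, in one of three categories.
n_i   <- sample(4:6, 57, replace = TRUE)                # samples per facility
cat_i <- sample(c("A", "B", "C"), 57, replace = TRUE)   # facility category
dat <- data.frame(facility = factor(rep(seq_len(57), n_i)),
                  category = factor(rep(cat_i, n_i)),
                  positive = rbinom(sum(n_i), 1, 0.15)) # listeria detected?

# Random intercept per facility absorbs the within-facility correlation;
# the fixed effect compares categories.
fit <- glmer(positive ~ category + (1 | facility),
             data = dat, family = binomial)
summary(fit)
```

Alternatives with the same flavor: GEE (geepack::geeglm with id = facility) for population-averaged estimates, or the survey package (svydesign with facilities as clusters) for design-based overall prevalence.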
Re: [R] findInterval and data resolution
How about this:
these <- which(vec2 < x1[1] | vec2 > x1[2])
vec2[these]
# Or using logical indexing directly:
vec2[vec2 < x1[1] | vec2 > x1[2]]
From: Bryan Hanson han...@depauw.edu To: R Help r-h...@stat.math.ethz.ch Date: 13/Jul/2010 9:28a Subject: [R] findInterval and data resolution [original message snipped]
* Bryan Hanson, Acting Chair, Professor of Chemistry & Biochemistry, DePauw University, Greencastle IN USA
Re: [R] How to select the column header with \Sexpr{}
Thanks for the quick reply Duncan. I don't think I have explained myself well. I have a dataset named report and my column headers are run1, run2, run3, run4 and so on. I know how to access the data below those columns with \Sexpr{report[1,1]}, \Sexpr{report[1,2]} and so on, but I can't access my column headers with \Sexpr{} because I can't find the way to reference run1, run2, run3 and run4. Sorry if I am not explaining myself really well.
----- Original Message ----- From: Duncan Murdoch murdoch.dun...@gmail.com To: Felipe Carrillo mazatlanmex...@yahoo.com Sent: Mon, July 12, 2010 2:18:15 PM Subject: Re: [R] How to select the column header with \Sexpr{} [quoted reply snipped]
Re: [R] findInterval and data resolution
Thanks Duncan... More appended at the bottom...
On 7/12/10 5:38 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: [exchange snipped] ... If x1 is not necessarily in increasing order, you'll have to do min(a2):max(a2) (which might be clearer in any case). If you're more interested in the range of values in vec*, maybe range(vec2[min(a2):max(a2)]) ...
min(a2):max(a2) is very helpful, as it fixes another problem that I did not post about.
More generally, I want to pass a vector of pairs of values to be removed, like this:
x1 <- c(2:3, 8:9)
a3 <- findInterval(x1, vec2)
a3 # which turns out to be 11 21 71 81
where I want my function to remove all values between 2 and 3, and between 8 and 9, regardless of how many values are between these indices. So in the example of a3, I want to remove everything between 11 and 21, and everything between 71 and 81, keeping everything else. I think I can put together a function pretty quickly that takes x1 in sequential pairs and returns all the intervening indices, which can then be used to clean up the original vector. Thanks again, and if anyone has another idea, do tell! Bryan
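The helper Bryan describes is short enough to sketch directly. removeRanges below (the name is invented) treats x1 as successive lo/hi pairs and drops every element of the vector that falls inside any of those closed intervals, without needing findInterval at all:

```r
# Drop all elements of vec lying inside any closed interval given by
# successive pairs of x1, e.g. x1 = c(2, 3, 8, 9) removes [2,3] and [8,9].
removeRanges <- function(vec, x1) {
  stopifnot(length(x1) %% 2 == 0)
  lo <- x1[seq(1, length(x1), by = 2)]
  hi <- x1[seq(2, length(x1), by = 2)]
  # OR together one logical mask per interval, then keep the complement.
  drop <- Reduce(`|`, Map(function(l, h) vec >= l & vec <= h, lo, hi))
  vec[!drop]
}

vec2 <- seq(1, 10, by = 0.1)
out <- removeRanges(vec2, c(2, 3, 8, 9))
any(out >= 2 & out <= 3)   # FALSE: the whole [2,3] section is gone
any(out >= 8 & out <= 9)   # FALSE
```

This works the same for integer and fine-grained numeric vectors, since it compares values rather than indices.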
[R] Comparison of two very large strings
Hi, I have a function in R that compares two very large strings for about 1 million records. The strings are very large URLs like:- http://query.nytimes.com/gst/sitesearch_selector.html?query=US+Visa+Laws&type=nyt&x=25&y=8... or of larger lengths. The data-frame looks like:-
id url
1 http://query.nytimes.com/gst/sitesearch_selector.html?query=US+Visa+Laws&type=nyt&x=25&y=8...
2 http://query.nytimes.com/search/sitesearch?query=US+Visa+Laws&srchst=cse
3 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N...
4 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N
5 http://www.google.com/url?sa=U&start=11&q=http://app1.chinadaily.com.cn/star/2004/0610/fo4-1.html&ei=uUKwSe7XN9CCt
and so on for about 1 million records. Here is the function that I am using to compare the two strings:-
stringCompare <- function(currentURL, currentId){
  j <- currentId - 1
  while(j >= 1){
    previousURL <- urlDataFrame[j, "url"]
    previousURLLength <- nchar(previousURL)
    # Compare smaller with bigger
    if(nchar(currentURL) <= previousURLLength){
      matchPhrase <- substr(previousURL, 1, nchar(currentURL))
      if(matchPhrase == currentURL){
        return(TRUE)
      }
    }else{
      matchPhrase <- substr(currentURL, 1, previousURLLength)
      if(matchPhrase == previousURL){
        return(TRUE)
      }
    }
    j <- j - 1
  }
  return(FALSE)
}
Here, I compare the URL at a given row with all the previous URLs in the data-frame. I compare the smaller of the two given URLs with the larger one (up to the length of the smaller). When I run the above function for about 1 million records, the execution becomes really slow, which otherwise is fast if I remove the string comparison step. Any ideas how it can be implemented in a fast and efficient way.
Thanks and Regards, Harsh Yadav
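A hedged alternative to the quadratic loop: if the goal is to flag URLs that merely extend a shorter URL already in the data, a single byte-wise sort places every such URL after one of its prefixes, so one linear pass suffices. Note this answers a slightly different question from the original row-order-dependent version (it checks against the whole column, not just earlier ids); the urls vector below is invented toy data.

```r
urls <- c("http://a.com/x", "http://a.com/x?q=1", "http://b.com/",
          "http://a.com/x?q=1&r=2", "http://c.com/y")

ord <- order(urls, method = "radix")  # byte-wise order, so a prefix sorts first
s   <- urls[ord]

# One pass: 'cand' holds the shortest current prefix candidate; anything that
# starts with it is an extension, anything else becomes the new candidate.
flagged <- logical(length(s))
cand <- s[1]
for (i in seq_along(s)[-1]) {
  if (startsWith(s[i], cand)) {
    flagged[i] <- TRUE        # s[i] extends a shorter URL in the set
  } else {
    cand <- s[i]
  }
}

hasPrefix <- logical(length(urls))
hasPrefix[ord] <- flagged     # map flags back to the original row order
hasPrefix                     # rows 2 and 4 extend row 1 here
```

Sorting is O(n log n) and the pass is O(n), versus the O(n^2) pairwise loop, which matters at a million rows; method = "radix" is used because locale-collated sorting does not guarantee that a prefix sorts immediately before its extensions.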
[R] Matrix Column Names
Hi, Is there a way to create a matrix in which the column names are not checked to see if they are valid variable names? I'm looking for something similar to the check.names argument to data.frame. If so, would such an approach work for the sparse matrix classes in the Matrix package? Many thanks! Cheers, Dave
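For what it is worth, base matrix dimnames are never checked for syntactic validity the way data.frame names are, so arbitrary strings already work with no check.names equivalent needed; a quick demonstration (the Matrix part is left commented since that package may not be installed):

```r
# Column names with spaces and punctuation, invalid as variable names, are fine:
m <- matrix(1:4, nrow = 2,
            dimnames = list(NULL, c("a b", "2nd-col!")))
colnames(m)   # "a b" "2nd-col!"

# Contrast with data.frame, where suppression must be explicit:
d <- data.frame(`a b` = 1, check.names = FALSE)

# Matrix's sparse classes also accept dimnames directly (untested sketch):
# library(Matrix)
# M <- Matrix(0, 2, 2, sparse = TRUE, dimnames = list(NULL, c("a b", "2nd-col!")))
```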
Re: [R] How to select the column header with \Sexpr{}
On Jul 12, 2010, at 5:45 PM, Felipe Carrillo wrote: ... I can't access my column headers with \Sexpr{} because I can't find the way to reference run1, run2, run3 and run4 ...
Wouldn't this just be: \Sexpr{names(report)} # ? or perhaps you want specific items in that vector? \Sexpr{names(report)[1]}, \Sexpr{names(report)[2]}, etc. -- David.
David Winsemius, MD West Hartford, CT
Re: [R] Comparison of two very large strings
On Jul 12, 2010, at 6:03 PM, harsh yadav wrote: [message quoted in full above snipped]
Couldn't you just store the url vector after running through nchar and then do the comparison in a vectorized manner?
test <- rd.txt('id url
1 http://query.nytimes.com/gst/sitesearch_selector.html?query=US+Visa+Laws&type=nyt&x=25&y=8
2 http://query.nytimes.com/search/sitesearch?query=US+Visa+Laws&srchst=cse
3 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N
4 http://www.google.com/search?hl=en&q=us+student+visa+changes+9/11+washington+post&start=10&sa=N
5 http://www.google.com/url?sa=U&start=11&q=http://app1.chinadaily.com.cn/star/2004/0610/fo4-1.html&ei=uUKwSe7XN9CCt
', stringsAsFactors = FALSE)
copyUrls <- test[, "url"]
sizeUrls <- nchar(copyUrls)
lengU <- length(sizeUrls)
sizidx <- pmax(sizeUrls[1:(lengU-1)], sizeUrls[2:lengU])
substr(copyUrls[2:lengU], 1, sizidx) == substr(copyUrls[1:(lengU-1)], 1, sizidx)
#[1] FALSE FALSE TRUE FALSE
[remainder of quoted message snipped]
David Winsemius, MD West Hartford, CT
[R] Xyplot or Tin-R problem?
I ran the following script from xyplot Examples using Tin-R on Windows and saw no plot produced.
EE <- equal.count(ethanol$E, number = 9, overlap = 1/4)
xyplot(NOx ~ C | EE, data = ethanol,
       prepanel = function(x, y) prepanel.loess(x, y, span = 1),
       xlab = "Compression Ratio", ylab = "NOx (micrograms/J)",
       panel = function(x, y) {
         panel.grid(h = -1, v = 2)
         panel.xyplot(x, y)
         panel.loess(x, y, span = 1)
       },
       aspect = "xy")
The Rgui showed
source(.trPaths[5])
without any error msg. Did I miss anything? Please enlighten me. Richard
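A likely culprit (hedged, since I cannot test Tin-R here): source() does not auto-print lattice objects the way the interactive prompt does, so the script runs but draws nothing. Explicitly print()ing the trellis object fixes it:

```r
library(lattice)

# Minimal version of the script: assign the trellis object, then print it.
EE <- equal.count(ethanol$E, number = 9, overlap = 1/4)
p <- xyplot(NOx ~ C | EE, data = ethanol,
            xlab = "Compression Ratio", ylab = "NOx (micrograms/J)")
print(p)   # required inside source(), functions, and loops
```

source(file, echo = TRUE) also triggers auto-printing; this is the classic lattice gotcha covered in R FAQ 7.22.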
[R] Continuing on with a loop when there's a failure
Hi R sages, Here is my latest problem. Consider the following toy example:
x <- read.table(textConnection("y1 y2 y3 x1 x2
indv.1 bagels donuts bagels 4 6
indv.2 donuts donuts donuts 5 1
indv.3 donuts donuts donuts 1 10
indv.4 donuts donuts donuts 10 9
indv.5 bagels donuts bagels 0 2
indv.6 bagels donuts bagels 2 9
indv.7 bagels donuts bagels 8 5
indv.8 bagels donuts bagels 4 1
indv.9 donuts donuts donuts 3 3
indv.10 bagels donuts bagels 5 9
indv.11 bagels donuts bagels 9 10
indv.12 bagels donuts bagels 3 1
indv.13 donuts donuts donuts 7 10
indv.14 bagels donuts bagels 2 10
indv.15 bagels donuts bagels 9 6"), header = TRUE)
I want to fit a logistic regression of y1 on x1 and x2. Then I want to run a logistic regression of y2 on x1 and x2. Then I want to run a logistic regression of y3 on x1 and x2. In reality I have many more Y columns than simply y1, y2, and y3, so I must design a loop. Notice that y2 is invariant and thus it will fail. In reality, some y columns will fail for much more subtle reasons. Simply screening my data to eliminate invariant columns will not eliminate the problem. What I want to do is output a piece of the results from each run of the loop to a matrix. I want it to try each of my y columns, and not give up and stop running simply because a particular y column is bad. I want it to give me NA or something similar in my results matrix for the bad y columns, but I want it to keep going and give me good data for the good y columns. For instance:
library(rms)  # provides lrm() and pol()
results <- matrix(nrow = 1, ncol = 3)
colnames(results) <- c("y1", "y2", "y3")
for (i in 1:3) {
  mod.poly3 <- lrm(x[,i] ~ pol(x1, 3) + pol(x2, 3), data = x)
  results[1,i] <- anova(mod.poly3)[1,3]
}
If I run this code, it gives up when fitting y2 because the y2 is bad. It doesn't even try to fit y3. Here's what my console shows:
results
          y1 y2 y3
[1,] 0.6976063 NA NA
As you can see, it gave up before fitting y3, which would have worked.
How do I force my code to keep going through the loop, despite the rotten apples it encounters along the way? Exact code that gets the job done is what I am interested in. I am a post-doc -- I am not taking any classes. I promise this is not a homework assignment! Thanks in advance, --- Josh Banta, Ph.D Center for Genomics and Systems Biology New York University 100 Washington Square East New York, NY 10003 Tel: (212) 998-8465 http://plantevolutionaryecology.org
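The standard remedy is to wrap the model fit in tryCatch(), so a failing column yields NA instead of aborting the loop. To keep the sketch below self-contained, base glm() stands in for rms::lrm(), and an explicit invariance check (commented) forces the failure that lrm would raise on y2; the tryCatch pattern is the part that transfers directly to the lrm code above.

```r
# Numeric toy version of Josh's data: y2 is invariant and should come back NA.
x <- data.frame(y1 = c(1,0,0,0,1,1,1,1,0,1,1,1,0,1,1),
                y2 = rep(1, 15),
                y3 = c(1,0,0,0,1,1,1,1,0,1,1,1,0,1,1),
                x1 = c(4,5,1,10,0,2,8,4,3,5,9,3,7,2,9),
                x2 = c(6,1,10,9,2,9,5,1,3,9,10,1,10,10,6))

results <- setNames(rep(NA_real_, 3), c("y1", "y2", "y3"))
for (i in 1:3) {
  results[i] <- tryCatch({
    y <- x[[i]]
    if (length(unique(y)) < 2) stop("invariant response")  # mimics lrm's failure
    mod <- glm(y ~ x1 + x2, data = x, family = binomial)
    coef(summary(mod))["x1", "Pr(>|z|)"]                   # keep one statistic
  }, error = function(e) NA_real_)                          # any error -> NA, loop continues
}
results   # y2 is NA; y1 and y3 hold values; the loop never aborts
```

The error handler catches whatever stop() or the fitting routine throws, subtle failures included, which is exactly what column screening cannot anticipate; try() with inherits(fit, "try-error") is an older spelling of the same idea.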
Re: [R] Comparison of two very large strings
On Jul 12, 2010, at 6:46 PM, David Winsemius wrote: [earlier messages quoted in full above snipped]
[quoted code snipped] Let me hasten to admit that when I tried to fix what I thought was an error in that program, I got the same result. It seemed as though I should have been getting errors by choosing the maximum string length. Changing the pmax to pmin did not alter the results ... to my puzzlement ... until I further noticed that urls #3 and #4 were of the same length. When I extend the lengths, then only the version using pmin works properly. -- David.
David Winsemius, MD West Hartford, CT
Re: [R] print.trellis draw.in - plaintext (gmail mishap)
The problem is that you have not pushed your viewport, so it doesn't exist in the plot. (You only pushed the layout viewport.)

grid.ls(viewports = TRUE)
#ROOT
#  GRID.VP.82

Try this:

vp <- vplayout(2,2)
pushViewport(vp)
upViewport()
grid.ls(viewports = TRUE)
#ROOT
#  GRID.VP.82
#  GRID.VP.86
print(p, newpage = FALSE, draw.in = vp$name)

-Felix

On 13 July 2010 01:22, Mark Connolly wmcon...@ncsu.edu wrote:

require(grid)
require(lattice)
fred <- data.frame(x=1:5, y=runif(5))
vplayout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
p <- xyplot(y~x, fred)
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)

On Mon, Jul 12, 2010 at 8:58 AM, Felix Andrews fe...@nfrac.org wrote:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Yes, please, reproducible code.

On 10 July 2010 00:49, Mark Connolly wmcon...@ncsu.edu wrote:

I am attempting to plot a trellis object on a grid.

vplayout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
g1 <- ggplot() ...
g2 <- ggplot() ...
g3 <- ggplot() ...
p <- xyplot() ...

# works as expected
print(g1, vp=vplayout(1,1))
print(g2, vp=vplayout(1,2))
print(g3, vp=vplayout(2,1))

# does not work
print(p, newpage=FALSE, draw.in=vplayout(2,2)$name)
Error in grid.Call.graphics(L_downviewport, name$name, strict) :
  Viewport 'GRID.VP.112' was not found

What am I doing wrong? Thanks!
-- Felix Andrews / 安福立 http://www.neurofractal.org/felix/
Re: [R] Continuing on with a loop when there's a failure
On Jul 12, 2010, at 6:18 PM, Josh B wrote:

Hi R sages,

Here is my latest problem. Consider the following toy example:

x <- read.table(textConnection("y1 y2 y3 x1 x2
indv.1 bagels donuts bagels 4 6
indv.2 donuts donuts donuts 5 1
indv.3 donuts donuts donuts 1 10
indv.4 donuts donuts donuts 10 9
indv.5 bagels donuts bagels 0 2
indv.6 bagels donuts bagels 2 9
indv.7 bagels donuts bagels 8 5
indv.8 bagels donuts bagels 4 1
indv.9 donuts donuts donuts 3 3
indv.10 bagels donuts bagels 5 9
indv.11 bagels donuts bagels 9 10
indv.12 bagels donuts bagels 3 1
indv.13 donuts donuts donuts 7 10
indv.14 bagels donuts bagels 2 10
indv.15 bagels donuts bagels 9 6"), header = TRUE)

I want to fit a logistic regression of y1 on x1 and x2. Then I want to run a logistic regression of y2 on x1 and x2. Then I want to run a logistic regression of y3 on x1 and x2. In reality I have many more Y columns than simply y1, y2, and y3, so I must design a loop. Notice that y2 is invariant and thus it will fail. In reality, some y columns will fail for much more subtle reasons. Simply screening my data to eliminate invariant columns will not eliminate the problem.

What I want to do is output a piece of the results from each run of the loop to a matrix. I want the loop to try each of my y columns, and not give up and stop running simply because a particular y column is bad. I want it to give me NA or something similar in my results matrix for the bad y columns, but I want it to keep going and give me good data for the good y columns. For instance:

results <- matrix(nrow = 1, ncol = 3)
colnames(results) <- c("y1", "y2", "y3")
for (i in 1:3) {
  mod.poly3 <- lrm(x[,i] ~ pol(x1, 3) + pol(x2, 3), data=x)
  results[1,i] <- anova(mod.poly3)[1,3]
}

If I run this code, it gives up when fitting y2 because the y2 is bad. It doesn't even try to fit y3. Here's what my console shows:

results
            y1 y2 y3
[1,] 0.6976063 NA NA

As you can see, it gave up before fitting y3, which would have worked.
How do I force my code to keep going through the loop, despite the rotten apples it encounters along the way?

?try

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f

(Doesn't only apply to simulations.)

Exact code that gets the job done is what I am interested in. I am a post-doc -- I am not taking any classes. I promise this is not a homework assignment!

-- David Winsemius, MD West Hartford, CT
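A minimal sketch of the pattern ?try points at, with tryCatch() wrapped around each fit. This is not the poster's code: glm() stands in for rms::lrm() so the example is self-contained, and a column of out-of-range values stands in for a "bad" y column:

```r
# Loop over toy response columns; a failure in one fit yields NA and the
# loop keeps going instead of aborting.
ys <- list(
  y1 = c(1, 0, 1, 1, 0, 1),
  y2 = rep(2, 6),               # deliberately bad: binomial glm() errors on y > 1
  y3 = c(1, 0, 0, 1, 0, 1)
)
x1 <- c(4, 5, 1, 0, 3, 2)

results <- sapply(ys, function(y) {
  tryCatch(coef(glm(y ~ x1, family = binomial))[["x1"]],
           error = function(e) NA_real_)   # record NA and continue
})
results
```

The same tryCatch() wrapper can go around the lrm()/anova() pair inside the original for loop, assigning NA into the results matrix in the error handler.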
Re: [R] Xyplot or Tin-R problem?
On Jul 12, 2010, at 6:26 PM, YANG, Richard ChunHSi wrote:

I ran the following script from the xyplot Examples using Tin-R on Windows and saw no plot produced.

EE <- equal.count(ethanol$E, number=9, overlap=1/4)
xyplot(NOx ~ C | EE, data=ethanol,
       prepanel = function(x,y) prepanel.loess(x, y, span=1),
       xlab="Compression Ratio", ylab="NOx (micrograms/J)",
       panel = function(x,y) {
         panel.grid(h = -1, v = 2)
         panel.xyplot(x,y)
         panel.loess(x,y, span=1)
       },
       aspect = "xy")

The Rgui showed

source(.trPaths[5])

without any error msg. Did I miss anything? Please enlighten me.

I got the example to work fine but had no plotting with your version and cannot see the difference in the code. I assigned them to t1 and t2 and ...

all.equal(t1, t2)
[1] "Component 5: target, current do not match when deparsed"
[2] "Component 29: target, current do not match when deparsed"

Looking at str() applied to both does not illuminate me. I have seen problems on my Mac with examples copied from the help page, and I suspect there is some invisible character sitting in a copy-pasted version that our mail clients are not displaying. What happens if you try:

example(xyplot) # ???

-- David Winsemius, MD West Hartford, CT
[R] Correct function name for text display of S4 object
Hello,

I am working on an R package for storing order book data. I currently have a display method that has the following output (ob is an S4 object):

display(ob)
Current time is 09:35:02

           Price   Ask Size
           ----------------
           11.42        900
           11.41      1,400
           11.40      1,205
           11.39      1,600
           11.38        400
           ----------------
    2,700  11.36
    1,100  11.35
    1,100  11.34
    1,600  11.33
      700  11.32
           ----------------
 Bid Size  Price

The package already has show, summary, and plot methods. Is there a more conventional name than display for the above output, or is display as good as any other name?

Thanks, Andrew
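One common S4 convention the question gestures at is that the default text rendering belongs in the show() method, so that simply typing the object's name prints it. A hedged sketch (class, slots, and formatting are invented here, not from Andrew's package):

```r
library(methods)

# Toy stand-in for an order book object
setClass("OrderBookDemo",
         representation(time = "character", ask = "numeric", bid = "numeric"))

# show() is what R calls when the object is auto-printed at the prompt
setMethod("show", "OrderBookDemo", function(object) {
  cat("Current time is", object@time, "\n")
  cat("Best ask:", object@ask, "  Best bid:", object@bid, "\n")
})

ob <- new("OrderBookDemo", time = "09:35:02", ask = 11.38, bid = 11.36)
ob   # auto-printing now dispatches to show()
```

With this arrangement a separate display() generic is unnecessary, though keeping display() as a verbose alternative alongside a terse show() is also a defensible design.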
[R] do the standard R analysis functions handle spatial grid data?
Hi everyone,

I'm doing a resource function analysis with radio-collared dingos and GIS info. The ecologist I'm working with wants to send me the data in a 'grid format'... straight out of ARCVIEW GIS.

I want to model the data using a GLM and maybe a LOGISTIC model as well, and I was planning on using the glm and logistic functions in R. Now I'm pretty sure that these functions require the data to be in a 2-D spreadsheet format, and for me to call the responses and predictors as columns from a data.frame (or 2-D matrix). However, I'm being told they can handle the data in a 'grid' format. So I'm pretty sure this would mean I would be calling the responses and predictors as 2-D matrices... and I don't think these functions can do that?

Can anyone enlighten me? Am I right in thinking these functions cannot handle data in a 3-D 'grid' format and require data to be entered as a 2-D data.frame or matrix? Are there other special functions out there that can handle this type of data, and should I be using these instead?

Thanks for your help

Chris Howden Founding Partner Tricky Solutions Tricky Solutions 4 Tricky Problems Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training (mobile) 0410 689 945 (fax / office) (+618) 8952 7878 ch...@trickysolutions.com.au
Re: [R] do the standard R analysis functions handle spatial grid data?
Have a look at the Task View for spatial data... http://cran.ms.unimelb.edu.au/web/views/Spatial.html

From: chris howden tall.chr...@yahoo.com.au
To: r-help@r-project.org, r-sig-ecology-requ...@r-project.org
Date: 13/Jul/2010 2:01p
Subject: [R] do the standard R analysis functions handle spatial grid data?
[R] Accessing files on password-protected FTP sites
Hello everyone,

Is it possible to download data from password-protected ftp sites? I saw another thread with instructions for uploading files using RCurl, but I could not find information for downloading them in the RCurl documentation. I am using R 2.11 on a Windows XP 32-bit machine.

Thanks in advance, Cliff
[R] SAS Proc summary/means as a R function
Hi,

I am new to R. I am trying to create an R function to do a SAS proc means/summary:

proc means data=bsebal;
  class team year;
  var ab h;
  output out=BseBalAvg mean=;
run;

I have a solution if I quote the argument. The working code to produce BseBalAvg is very elegant:

normalize <- melt(bsebal, id=c("team", "year"))              # normalize data
transpose <- cast(normalize, team + year ~ variable, mean)   # team year h ab (means)

Here is the problem. In SAS we have the option parmbuff, which puts all the 'macro arguments' text into one string, i.e.

%macro procmeans(text)/parmbuff;
  %put text;
%mend procmeans;

%procmeans(This is a sentence);

result: This is a sentence

Here is my R code:

# This works
proc.means <- function(...) { sapply(match.call()[-1], deparse) }
proc.means(thisisasentence)

Result: thisisasentence

Note: sapply allows for multiple arguments and is not needed, but is more robust.

# However this does not work
proc.means(this is a sentence)
# unexpected symbol in "proc.means(this is"

It appears that the second space causes the error. I have had some luck using formulas:

# This works in spite of the spaces
proc.means <- function(formula) {
  parmbuff <- deparse(substitute(formula))
  print(parmbuff)
}
proc.means(team + year + variable)

# This does not work - same issue as above
proc.means(team year variable)
# unexpected symbol in "proc.means(team year"
Re: [R] SAS Proc summary/means as a R function
On 07/12/2010 07:16 PM, Roger Deangelis wrote:

Hi, I am new to R. I am trying to create an R function to do a SAS proc means/summary [...]

So you're actually trying to have R generate the SAS code? R is not, in general, a macro language, and attempts to use it as such are fighting against the current. Since you're writing your own function, you can make it accept as many arguments as you want, even an arbitrary number. For instance:

test <- data.frame(a = rnorm(100), b = rnorm(100, 10), c = rnorm(100, 20))
summarize <- function(...) {
  dots <- list(...)
  lapply(dots, summary)
}
summarize(test$a, test$b, test$c)

Is that what you'd like?
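For the specific proc means step (class team year; var ab h; mean), base R's aggregate() gets the same grouped means without reshape's melt/cast. A sketch with made-up data (the real bsebal columns are only assumed from the thread):

```r
# Toy stand-in for the bsebal data set: team/year classes, ab/h measures
bsebal <- data.frame(
  team = rep(c("NYY", "BOS"), each = 4),
  year = rep(c(2008, 2009), times = 4),
  ab   = c(500, 520, 480, 510, 450, 470, 460, 440),
  h    = c(150, 160, 140, 155, 120, 130, 125, 118)
)

# Equivalent of: proc means; class team year; var ab h; output mean=;
BseBalAvg <- aggregate(cbind(ab, h) ~ team + year, data = bsebal, FUN = mean)
BseBalAvg
```

Each row of BseBalAvg is one team/year cell with the mean ab and h, which matches what the cast() call in the question produces.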
Re: [R] Accessing files on password-protected FTP sites
There is a standard notation for passwords in urls ... see for example http://www.devx.com/tips/Tip/5604

Cliff Clive cliffcl...@gmail.com wrote:

Hello everyone, Is it possible to download data from password-protected ftp sites? [...]

---
Jeff Newmiller jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.
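A hedged sketch of that notation in R (host, user, and password are hypothetical; the download.file() and getURL() calls are left commented out since they need a live server):

```r
# user:password@host notation, built up so the pieces are visible
creds <- "myuser:mypass"
url   <- paste0("ftp://", creds, "@ftp.example.com/pub/data.csv")
url

# Base R understands the embedded credentials directly:
# download.file(url, destfile = "data.csv")

# RCurl alternative that keeps the password out of the URL string:
# library(RCurl)
# txt <- getURL("ftp://ftp.example.com/pub/data.csv", userpwd = creds)
```

The RCurl userpwd option is generally preferable when the URL might end up in logs or error messages.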
Re: [R] How to select the column header with \Sexpr{}
I had tried that earlier and it didn't work either; I probably have \Sexpr in the wrong place. See example: the column-one header gets blank:

\documentclass[11pt]{article}
\usepackage{longtable,verbatim,ctable}
\usepackage{longtable,pdflscape}
\usepackage{fmtcount,hyperref}
\usepackage{fullpage}
\title{United States}
\begin{document}
\setkeys{Gin}{width=1\textwidth}
\maketitle

<<echo=F, results=hide>>=
report <- structure(list(
  Date = c("3/12/2010", "3/13/2010", "3/14/2010", "3/15/2010"),
  Run1 = c("33 (119 ? 119)", "n (0 ? 0)", "893 (110 ? 146)", "140 (111 ? 150)"),
  Run2 = c("33 (71 ? 71)", "n (0 ? 0)", "337 (67 ? 74)", "140 (68 ? 84)"),
  Run3 = c("890 (32 ? 47)", "n (0 ? 0)", "10,602 (32 ? 52)", "2,635 (34 ? 66)"),
  Run4 = c("0 ( ? )", "n (0 ? 0)", "0 ( ? )", "0 ( ? )"),
  Run4 = c("0 ( ? )", "n (0 ? 0)", "0 ( ? )", "0 ( ? )")),
  .Names = c("ID_Date", "Run1", "Run2", "Run3", "Run4", "Run5"),
  row.names = c(NA, 4L), class = "data.frame")
require(stringr)
report <- t(apply(report, 1, function(x) {str_replace(x, "\\?", "-")}))
#report
#latex(report, file="")
@

\begin{landscape}
\begin{table}[!tbp]
\begin{center}
\begin{tabular}{lllllll}\hline\hline
\multicolumn{1}{c}{\Sexpr{names(report)[1]}} & % using \Sexpr here
\multicolumn{1}{c}{Run1} &
\multicolumn{1}{c}{Run2} &
\multicolumn{1}{c}{Run3} &
\multicolumn{1}{c}{Run4} &
\multicolumn{1}{c}{Run5}\tabularnewline
\hline
1 & 3/12/2010 & 33 (119 ? 119) & 33 (71 ? 71) & 890 (32 ? 47) & 0 ( ? ) & 0 ( ? )\tabularnewline
2 & 3/13/2010 & n (0 ? 0) & n (0 ? 0) & n (0 ? 0) & n (0 ? 0) & n (0 ? 0)\tabularnewline
3 & 3/14/2010 & 893 (110 ? 146) & 337 (67 ? 74) & 10,602 (32 ? 52) & 0 ( ? ) & 0 ( ? )\tabularnewline
4 & 3/15/2010 & 140 (111 ? 150) & 140 (68 ? 84) & 2,635 (34 ? 66) & 0 ( ? ) & 0 ( ? )\tabularnewline
\hline
\end{tabular}
\end{center}
\end{table}
\end{landscape}
\end{document}

Felipe D.
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish & Wildlife Service California, USA

----- Original Message ----- From: David Winsemius dwinsem...@comcast.net To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: Duncan Murdoch murdoch.dun...@gmail.com; r-h...@stat.math.ethz.ch Sent: Mon, July 12, 2010 3:14:49 PM Subject: Re: [R] How to select the column header with \Sexpr{}

On Jul 12, 2010, at 5:45 PM, Felipe Carrillo wrote:

Thanks for the quick reply Duncan. I don't think I have explained myself well. I have a dataset named report and my column headers are run1, run2, run3, run4 and so on. I know how to access the data below those columns with \Sexpr{report[1,1]}, \Sexpr{report[1,2]} and so on, but I can't access my column headers with \Sexpr{} because I can't find the way to reference run1, run2, run3 and run4. Sorry if I am not explaining myself really well.

Wouldn't this just be:

\Sexpr{names(report)} # ?

or perhaps you want specific items in that vector? \Sexpr{names(report)[1]}, \Sexpr{names(report)[2]}, etc.

--David.

----- Original Message ----- From: Duncan Murdoch murdoch.dun...@gmail.com To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: r-h...@stat.math.ethz.ch Sent: Mon, July 12, 2010 2:18:15 PM Subject: Re: [R] How to select the column header with \Sexpr{}

On 12/07/2010 5:10 PM, Felipe Carrillo wrote:

Hi: Since I work with a few different fish runs, my column headers change every time I start a new year. I have been using \Sexpr{} for my rows and columns, and now I am trying to use it with my report column headers. \Sexpr{1,1} is row 1 column 1; what can I use for headers? I tried \Sexpr{0,1} but Sweave didn't like it. Thanks in advance for any hints.

\Sexpr takes an R expression, and inserts the first element of the result into your text. Using just "0,1" (not including the quotes) is not a valid R expression. You need to use paste() or some other function to construct the label you want to put in place, e.g. \Sexpr{paste("0", "1", sep=",")} will give you 0,1.
Duncan Murdoch

David Winsemius, MD West Hartford, CT
Re: [R] SAS Proc summary/means as a R function
Please get a copy of "R for SAS and SPSS Users" by Robert A. Muenchen: http://www.springer.com/statistics/computanional+statistics/book/978-0-387-09417-5
Re: [R] Xyplot or Tin-R problem?
You missed FAQ 7.22:

7.22 Why do lattice/trellis graphics not work?

The most likely reason is that you forgot to tell R to display the graph. Lattice functions such as xyplot() create a graph object, but do not display it (the same is true of ggplot2 graphics, and Trellis graphics in S-Plus). The print() method for the graph object produces the actual display. When you use these functions interactively at the command line, the result is automatically printed, but in source() or inside your own functions you will need an explicit print() statement.

The FAQ on R and the separate FAQ on R for Windows are both accessible from the Help menu item on the R Console on the R GUI.
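A minimal illustration of FAQ 7.22 (a sketch using built-in data, not the poster's ethanol script):

```r
library(lattice)

# Inside a function (or source()), the trellis object is created but the
# plot never appears, because nothing print()s it:
silently_lost <- function() {
  xyplot(mpg ~ wt, data = mtcars)
}

# Wrapping the call in an explicit print() makes the plot appear:
displayed <- function() {
  print(xyplot(mpg ~ wt, data = mtcars))
}
```

This is exactly why source(.trPaths[5]) in Tin-R produced no plot: source() suppresses the auto-printing that happens at the interactive prompt.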
Re: [R] Fast string comparison
I am asking this question because string comparison in R seems to be awfully slow (based on profiling results) and I wonder if perhaps '==' alone is not the best one can do. I did not ask for anything particular and I don't think I need to provide a self-contained source example for the question. So, to re-phrase my question: are there more (runtime-)effective ways to find out if two strings (about 100-150 characters long) are equal?

Ralf

On Sun, Jul 11, 2010 at 2:37 PM, Sharpie ch...@sharpsteen.net wrote:

Ralf B wrote: What is the fastest way to compare two strings in R? Ralf

Which way is not fast enough? In other words, are you asking this question because profiling showed one of R's string comparison operations is causing a massive bottleneck in your code? If so, which one and how are you using it?

-Charlie

-----
Charlie Sharpsteen Undergraduate -- Environmental Resources Engineering Humboldt State University
Re: [R] Fast string comparison
strings <- replicate(1e5, paste(sample(letters, 100, rep = TRUE), collapse = ""))
system.time(strings[-1] == strings[-1e5])
#   user  system elapsed
#  0.016   0.000   0.017

So it takes ~1/100 of a second to do ~100,000 string comparisons. You need to provide a reproducible example that illustrates why you think string comparisons are slow.

Hadley

On Tue, Jul 13, 2010 at 6:52 AM, Ralf B ralf.bie...@gmail.com wrote:

I am asking this question because string comparison in R seems to be awfully slow (based on profiling results) [...]
-- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
[R] cbind in for loops
I have 30 files in the current directory; I would like to perform cbind(file1, file2, file3, file4, ..., file30). How could I do this in a for loop? Something like:

file2 <- list.files(pattern = ".out3$")
for (j in file2) {
  cbind(j)   # ...how to implement cbind here?
}

Thanks.
Re: [R] make an model object (e.g. nlme) available in a user defined function (xyplot related)
On Mon, Jul 12, 2010 at 2:51 AM, Jun Shen jun.shen...@gmail.com wrote:

Dear all,

When I construct an nlme model object by calling nlme(...) -> mod.nlme, this object can be used in xyplot(). Something like

xyplot(x, y, ..
  ..
  ind.predict <- predict(mod.nlme)
  ..
)

is working fine in the console environment. But the same structure is not working in a user-defined function. It seems the mod.nlme created in a user-defined function cannot be called in xyplot(). Why is that? Appreciate any comment. (The error message says: Error in using packet 1, object "model" not found)

Quoting from the footer: PLEASE [...] provide commented, minimal, self-contained, reproducible code.

-Deepayan

Thanks. Jun Shen
[R] boxplot on all the columns
How do I use boxplot on all the columns from the data frame, instead of manually entering the columns like below?

bhtest1 <- read.table("bhtest1.txt", header=TRUE)
boxplot(bhtest1[,2], bhtest1[,3], bhtest1[,4], bhtest1[,5], bhtest1[,6], bhtest1[,7])

Please help. Thanks,
Re: [R] problem with comparisons for vectors
This is also mentioned in FAQ 7.31: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Also, if you search the R-help archives for 'precision' you can find a lot of threads discussing the issue in further depth.

On Sun, Jul 11, 2010 at 9:02 PM, Wu Gong w...@mtmail.mtsu.edu wrote:

I don't know the real reason, but help("==") gives some clues:

For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable. See the examples.

x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1 == x2                            # FALSE on most machines
identical(all.equal(x1, x2), TRUE)  # TRUE everywhere

- A R learner.

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
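A hedged expansion of the FAQ advice (the helper name `near` is my own, not from the thread): the portable idioms are a tolerance comparison or isTRUE(all.equal(...)).

```r
# Tolerance-based equality for floating-point numbers
near <- function(x, y, tol = sqrt(.Machine$double.eps)) abs(x - y) < tol

x1 <- 0.5 - 0.3
x2 <- 0.3 - 0.1
x1 == x2                      # FALSE on most machines
near(x1, x2)                  # tolerance comparison succeeds
isTRUE(all.equal(x1, x2))     # the idiom the FAQ recommends
```

Note that identical(all.equal(x1, x2), TRUE) and isTRUE(all.equal(x1, x2)) are equivalent here; the wrapper matters because all.equal() returns a character description, not FALSE, when the values differ.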
Re: [R] boxplot on all the columns
Actually,

boxplot(bhtest1)

should do what you want...

Tal

Contact details: tal.gal...@gmail.com | 972-52-7275845 | www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
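Tal's one-liner works when every column is numeric. Since the question indexes columns 2-7, the first column may be a non-numeric ID; a sketch with made-up data showing how to drop it first:

```r
# Toy stand-in for bhtest1: an id column plus numeric measurement columns
bhtest1 <- data.frame(id = letters[1:10],
                      a = rnorm(10), b = rnorm(10, 2), c = rnorm(10, 4))

boxplot(bhtest1[ , -1])    # one box per numeric column, no manual listing

# or, without counting columns:
# boxplot(bhtest1[sapply(bhtest1, is.numeric)])
```

boxplot() on a data frame (or list) draws one box per column, which is exactly what the six separate arguments in the question were doing by hand.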
Re: [R] cbind in for loops
Hi,

Assuming that you have read the files into R, and that their names (in R) are held in some object (e.g., 'file2'), then this works:

do.call(what = cbind, args = mget(x = file2, envir = .GlobalEnv))

Here is a reproducible example:

x1 <- data.frame(x = 1:10)
x2 <- data.frame(y = 1:10)
file.names <- c("x1", "x2")
do.call(cbind, mget(file.names, envir = .GlobalEnv))

Best regards, Josh

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
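Josh's answer assumes the files are already read in. A hedged sketch of that missing step (toy ".out3" files are written to a temp directory so the example is self-contained; the read.table() arguments are assumptions about the real files):

```r
# Create two small stand-in files matching the thread's pattern
old <- setwd(tempdir())
write.table(data.frame(a = 1:3), "f1.out3", row.names = FALSE)
write.table(data.frame(b = 4:6), "f2.out3", row.names = FALSE)

# Read every matching file, then bind the pieces in one call -- no for loop
files    <- list.files(pattern = "\\.out3$")
tabs     <- lapply(files, read.table, header = TRUE)
combined <- do.call(cbind, tabs)

setwd(old)
```

This keeps the data in a list instead of 30 separate global variables, which avoids the mget()/.GlobalEnv step entirely.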
Re: [R] Interrupt R?
On Sun, 11 Jul 2010, Spencer Graves wrote:

Hi, Richard and Duncan: Thank you both very much. You provided different but workable solutions.

1. With Rgui 2.11.1 on Vista x64, the escape worked, but neither ctrl-c nor ctrl-C worked for me.

Why did you expect them to? Ctrl-C is documented to implement 'Copy' (the standard Windows shortcut). (Did you mean Ctrl-Shift-C by 'Ctrl-C' as distinct from 'Ctrl-c'? I don't think that works anywhere.) As Duncan said, Ctrl-C works in Rterm, and in almost all other R implementations (the Mac R.app GUI is the only other exception I know: it also uses Escape). This is documented in the README (called something like README.R-2.11.1 in the binary distribution) and in the rw-FAQ Q5.1. Maybe it would be a good idea to refresh your memory of the basic documentation?

2. The TCLTK version works but seems to require either more skill from the programmer or more user training than using escape under Rgui or ctrl-g/c under Emacs.

Best Wishes, Spencer

On 7/11/2010 12:02 PM, Duncan Murdoch wrote: On 11/07/2010 2:29 PM, Spencer Graves wrote:

How can one interrupt the following gracefully?

while(TRUE){ Sys.sleep(1) }

In R 2.11.1 under Emacs+ESS, some sequence of ctrl-g, ctrl-c eventually worked for me. Under Rgui 2.11.1, the only way I've found was to kill R. Suggestions on something more graceful?

This is an Emacs+ESS bug. In the Windows GUI or using Rterm, the standard methods (ESC or Ctrl-C resp.) work fine. Duncan Murdoch

Beyond this, what would you suggest to update a real-time report when new data arrives in a certain directory? A generalization of the above works, but I'd like something more graceful.
Thanks, Spencer Graves

sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SIM_0.5-0 RCurl_1.4-2 bitops_1.0-4.1 R2HTML_2.1 oce_0.1-80

--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] apply is slower than for loop?
On Fri, Jul 9, 2010 at 9:11 PM, Gene Leynes gleyne...@gmail.com wrote: I thought the apply functions are faster than for loops, but my most recent test shows that apply actually takes significantly longer than a for loop. Am I missing something?

Check Rnews for an article discussing proper usage of apply and for. Liviu
Re: [R] apply is slower than for loop?
On 12/07/10 08:16, Liviu Andronic wrote: On Fri, Jul 9, 2010 at 9:11 PM, Gene Leynes gleyne...@gmail.com wrote: [...] Check Rnews for an article discussing proper usage of apply and for. Liviu

I am guessing you are referencing (http://cran.r-project.org/doc/Rnews/bib/Rnews.html#Rnews:Ligges+Fox:2008):

@article{Rnews:Ligges+Fox:2008,
  author  = {Uwe Ligges and John Fox},
  title   = {{{R} {H}elp {D}esk}: {H}ow Can {I} Avoid This Loop or Make It Faster?},
  journal = {R News},
  year    = 2008,
  volume  = 8,
  number  = 1,
  pages   = {46--50},
  month   = {May},
  url     = {http://CRAN.R-project.org/doc/Rnews/},
  pdf     = {http://CRAN.R-project.org/doc/Rnews/Rnews_2008-1.pdf}
}

Allan
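The point of this thread (and of the Ligges & Fox article) can be seen with a small timing sketch: apply() still iterates at the R level, so it is not inherently faster than a for loop; the real gains come from genuinely vectorized functions such as rowSums(). The matrix size here is arbitrary.

```r
m <- matrix(rnorm(1e6), nrow = 1e4)

# for loop over rows
t.for <- system.time({
  out.for <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) out.for[i] <- sum(m[i, ])
})

# apply() over rows: also an R-level loop internally
t.apply <- system.time(out.apply <- apply(m, 1, sum))

# truly vectorized alternative
t.vec <- system.time(out.vec <- rowSums(m))

all.equal(out.for, out.apply)  # results agree; timings typically favour rowSums
```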
Re: [R] interpretation of svm models with the e1071 package
Thanks a lot for the reply; some comments below.

On 07/10/2010 04:11 AM, Steve Lianoglou wrote: Hi, On Fri, Jul 9, 2010 at 12:15 PM, manuel.martin manuel.mar...@orleans.inra.fr wrote: Dear all, after having calibrated a svm model through the svm() command of the e1071 package, is there a way to i) represent the modeled relationships between the y and X variables (response variable vs. predictors)?

Can you explain a bit more ... how do you want them represented?

I was thinking of a simple ŷ = fi(Xi) plot, fi resulting from the fitted svm model, where Xi is the predictor, among the whole set of predictors X, whose relationship with the response one wishes to see. For boosted regression trees, which I am more familiar with, this fi function is estimated by averaging out the effects of all predictors but Xi, and plotting how ŷ varies as Xi does. Hope this is a bit clearer, Manuel

ii) rank the influence of the predictors used in the model?

One technique that's often/sometimes used is to calculate the SVM's W vector by using the support vectors along with their learned weights/alphas. This comes up every now and again. Here's an older post explaining how you might do that with the svm model from e1071: http://article.gmane.org/gmane.comp.lang.r.general/158272/match=w+b+vector+svr Hope that helps.

--
INRA - InfoSol
Centre de recherche d'Orléans
2163 Avenue de la Pomme de Pin CS 40001 ARDON
45075 ORLEANS Cedex 2
tel : (33) (0)2 38 41 48 21
fax : (33) (0)2 38 41 78 69
http://www.gissol.fr http://bdat.orleans.inra.fr
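Following the approach in the linked post, the primal weight vector of a linear-kernel svm() fit can be recovered as w = t(coefs) %*% SV, and the absolute weights used as a rough influence ranking. A sketch on built-in data (iris here is only an illustration, not the poster's data, and this interpretation only makes sense for a linear kernel):

```r
library(e1071)

# Two-class subset of iris, linear kernel
X   <- as.matrix(iris[1:100, 1:4])
y   <- factor(iris$Species[1:100])
fit <- svm(X, y, kernel = "linear", scale = TRUE)

# Primal weights from the support vectors and their learned coefficients
w <- drop(t(fit$coefs) %*% fit$SV)

# Rank predictors by absolute weight
sort(abs(w), decreasing = TRUE)
```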