Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-31 Thread Thomas Lumley
On Sat, 29 Jul 2006, Kevin B. Hendricks wrote: Hi Bill, sum : igroupSums Okay, after thinking about this ... # assumes i is the small integer factor with n levels # v is some long vector # no sorting required igroupSums <- function(v,i) { sums <- rep(0,max(i)) for (j in

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-31 Thread Kevin B. Hendricks
Hi Thomas, Here is a comparison of performance times from my own igroupSums versus using split and rowsum: x <- rnorm(2e6) i <- rep(1:1e6,2) unix.time(suma <- unlist(lapply(split(x,i),sum))) [1] 8.188 0.076 8.263 0.000 0.000 names(suma) <- NULL unix.time(sumb <- igroupSums(x,i)) [1]
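
The split-based and rowsum-based approaches named in this message can be checked against each other on a smaller input. The following is only a sketch: the vector sizes are scaled down from the 2e6 used in the message, set.seed is added for reproducibility, and system.time stands in for unix.time, which was an older name for the same function. No timings from the original message are reproduced here.

```r
# Scaled-down version of the comparison set-up (sizes are an assumption).
set.seed(1)
x <- rnorm(2e4)
i <- rep(1:1e4, 2)

# Approach 1: split the data by group, then sum each piece.
t_split <- system.time(suma <- unlist(lapply(split(x, i), sum)))
names(suma) <- NULL

# Approach 2: base R's rowsum(), which returns a one-column matrix
# with one row per group, ordered by group value.
t_rowsum <- system.time(sumc <- as.vector(rowsum(x, i)))

# Both approaches should give the same per-group sums.
stopifnot(isTRUE(all.equal(suma, sumc)))
```

Printing t_split and t_rowsum shows the elapsed times on the local machine; the relative speeds will depend on the R version and hardware.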

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-30 Thread Kevin B. Hendricks
Hi Bill, After playing with this some more and adding an implementation to handle NAs in the data vector, I have run into the problem of what to return when the only data values for a particular bin (or level) in the data vector were NAs and the user selected na.rm=T 1. Should it return 0

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-29 Thread Kevin B. Hendricks
Hi Bill, So you wrote one routine that can calculate any single of a variety of stats and allows weights, is that right? Can it return a data frame of any subset of requested stats as well (that is what I was thinking of doing anyway). I think someone can easily calculate all of those

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi, I was using my installed R which is 2.3.1 for the first tests. I moved to the r-devel tree (I svn up and rebuild every day) for my by tests to see if it would work any better. I neglected to retest merge with the devel version. So it appears merge is already fixed and I just need to

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-28 Thread Gabor Grothendieck
There was a performance comparison of several moving average approaches here: http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html The author of that message ultimately wrote the caTools R package which contains some optimized versions. Not sure if these results suggest anything of interest

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-28 Thread Martin Maechler
"Kevin" == Kevin B Hendricks [EMAIL PROTECTED] on Fri, 28 Jul 2006 14:53:57 -0400 writes: [.] Kevin> The idea is to somehow make functions that work well Kevin> over small sub-sequences of a much longer vector Kevin> without resorting to splitting the vector into many

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi Bill, Splus8.0 has something like what you are talking about that provides a fast way to compute sapply(split(xVector, integerGroupCode), summaryFunction) for some common summary functions. The 'integerGroupCode' is typically the codes from a factor, but you could compute it in

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-28 Thread Bill Dunlap
On Fri, 28 Jul 2006, Kevin B. Hendricks wrote: Hi Bill, Splus8.0 has something like what you are talking about that provides a fast way to compute sapply(split(xVector, integerGroupCode), summaryFunction) for some common summary functions. The 'integerGroupCode' is typically the

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-28 Thread Kevin B. Hendricks
Hi Bill, sum : igroupSums Okay, after thinking about this ... # assumes i is the small integer factor with n levels # v is some long vector # no sorting required igroupSums <- function(v,i) { sums <- rep(0,max(i)) for (j in 1:length(v)) { sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
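
The snippet above is cut off by the archive, so here is a minimal runnable sketch of the same single-pass idea. The closing braces and the final line returning sums are assumptions, since the original message is truncated at that point; seq_along replaces 1:length(v) so that an empty vector is handled, and the example data are made up for illustration.

```r
# Single-pass grouped sum, following the loop shown in the message.
# Assumes i holds integer group codes in 1..n and v is numeric,
# with length(i) == length(v). No sorting is required.
igroupSums <- function(v, i) {
  sums <- rep(0, max(i))
  for (j in seq_along(v)) {
    sums[[i[[j]]]] <- sums[[i[[j]]]] + v[[j]]
  }
  sums  # assumed return value; the archived snippet ends before this
}

# Hypothetical example data, not from the thread.
v <- c(1.5, 2.5, 4, 10)
i <- c(1L, 2L, 1L, 3L)
res <- igroupSums(v, i)

# Cross-check against base R's rowsum().
stopifnot(isTRUE(all.equal(res, as.vector(rowsum(v, i)))))
```

An explicit R-level loop like this is simple but slow for long vectors, which is presumably why the thread goes on to compare it with split and rowsum.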

[Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-27 Thread Kevin B. Hendricks
Hi Developers, I am looking for another new project to help me get more up to speed on R and to learn something outside of R internals. One recent R issue I have run into is finding a fast implementation of the equivalent of the following SAS code: /* MDPC is an integer sort key made

Re: [Rd] Any interest in merge and by implementations specifically for sorted data?

2006-07-27 Thread Seth Falcon
Kevin B. Hendricks [EMAIL PROTECTED] writes: My first R attempt was a simple # sort the data.frame gd and the sort key sorder <- order(MDPC) gd <- gd[sorder,] MDPC <- MDPC[sorder] attach(gd) # find the length and sum for each unique sort key XN <- by(MVE, MDPC, length) XSUM <- by(MVE, MDPC,