Jaroslaw's article was great. In fact, it was used as the basis for rapply and some optimized special cases that will be included in the R 2.1.0 version of zoo (which have been coded but not yet released).
Regarding numerically stable summation, check out the idea behind the following which I coincidentally am also considering for the zoo implementation: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090 On Apr 1, 2005 8:07 PM, Vadim Ogranovich <[EMAIL PROTECTED]> wrote: > Hi, > > First, let me thank Jaroslaw for making this survey. I find it quite > illuminating. > > Now the questions: > > * the #1 solution below (based on cumsum) is numerically unstable. > Specifically if you do the runmean on a positive vector you can easily > get negative numbers due to rounding errors. Does anyone see a > modification which is free of this deficiency? > * is it possible to optimize the algorithm of the filter function, > solution #2 below, for the case of the rep(1/k,k) kernel? > > Thanks, > Vadim > > [R] Survey of "moving window" statistical functions - still looking f or > fast mad function > > * This message: [ Message body > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] [ More > options > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#options2> ] > * Related messages: [ Next message > <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> ] [ Previous > message <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> ] [ > Next in thread <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > ] [ Replies > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#replies> ] > > From: Tuszynski, Jaroslaw W. <JAROSLAW.W.TUSZYNSKI_at_saic.com > <mailto:JAROSLAW.W.TUSZYNSKI_at_saic.com?Subject=Re:%20%5BR%5D%20Survey% > 20of%20"moving%20window"%20statistical%20functions%20-%20still > %20lookingf%20or%20fast%20mad%20function> > > Date: Sat 09 Oct 2004 - 06:30:32 EST > > Hi, > > Lately I run into a problem that my code R code is spending hours > performing simple moving window statistical operations. As a result I > did searched archives for alternative (faster) ways of performing: mean, > max, median and mad operation over moving window (size 81) on a vector > with about 30K points. And performed some timing for several ways that > were suggested, and few ways I come up with. The purpose of this email > is to share some of my findings and ask for more suggestions (especially > about moving mad function). > > Sum over moving window can be done using many different ways. Here are > some sorted from the fastest to the slowest: > > 1. runmean = function(x, k) { n = length(x) y = x[ k:n ] - x[ > c(1,1:(n-k)) ] # this is a difference from the previous cell y[1] = > sum(x[1:k]); # find the first sum y = cumsum(y) # apply precomputed > differences return(y/k) # return mean not sum } > 2. filter(x, rep(1/k,k), sides=2, circular=T) - (stats package) > 3. kernapply(x, kernel("daniell", m), circular=T) > 4. apply(embed(x,k), 1, mean) > 5. mywinfun <- function(x, k, FUN=mean, ...) { # suggested in news > group n <- length(x) A <- rep(x, length=k*(n+1)) dim(A) <- c(n+1, k) > sapply(split(A, row(A)), FUN, ...)[1:(n-k+1)] } > 6. rollFun(x, k, FUN=mean) - (fSeries package) > 7. rollMean(x, k) - (fSeries package) > 8. SimpleMeanLoop = function(x, k) { n = length(x) # simple-minded > loop used as a baseline y = rep(0, n) k = k%/%2; for (i in (1+k):(n-k)) > y[i] = mean(x[(i-k):(i+k)]) } > 9. running(x, fun=mean, width=k) - (gtools package) > > Some of above functions return results that are the same length as x and > some return arrays with length n-k+1. The relative speeds (on Windows > machine) were as follow: 0.01, 0.09, 1.2, 8.1, 11.2, 13.4, 27.3, 63, > 345. As one can see there are about 5 orders of magnitude between the > fastest and the slowest. > > Maximum over moving window can be done as follow, in order of speed > > 1. runmax = function(x, k) { n = length(x) y = rep(0, n) m = k%/%2; > a = 0; for (i in (1+m):(n-m)) { if (a==y[i-1]) y[i] = > max(x[(i-m):(i+m)]) # calculate max of the window else y[i] = > max(y[i-1], x[i+m]); # max of the window is =y[i-1] a = x[i-m] # point > that will be removed from the window } return(y) } > 2. apply(embed(x,k), 1, max) > 3. SimpleMaxLoop(x, k) - similar to SimpleMeanLoop above > 4. mywinfun(x, k, FUN=max) - see above > 5. rollFun(x, k, FUN=max) - fSeries package > 6. rollMax(x, k) - fSeries package > 7. running(x, fun=max, width=k) - gtools package The relative > speeds were: <0.01, 3, 3.4, 5.3, 7.5, 7.7, 15.3 > > Median over moving window can be done as follows: > > 1. runmed(x, k) - from stats package > 2. SimpleMedLoop(x, k) - similar to SimpleMeanLoop above > 3. apply(embed(x,k), 1, median) > 4. mywinfun(x, k, FUN=median) - see above > 5. rollFun (x, k, FUN=median) - fSeries package > 6. running(x, fun=max, width=k) - gtools package Speeds: <0.01, > 3.4, 9, 15, 29, 165 > > Mad over moving window can be done as follows: > > 1. runmad = function(x, k) { n = length(x) A = embed(x,k) A = abs(A > - rep(apply(A, 1, median), k)) dim(A) = c(n-k+1, k) apply(A, 1, median) > } > 2. apply(embed(x,k), 1, mad) > 3. mywinfun(x, k, FUN=mad) - see above > 4. SimpleMadLoop(x, k) - similar to SimpleMeanLoop above > 5. rollFun(x, k, FUN=mad) - fSeries package > 6. running(x, fun=mad, width=k) - gtools package Speeds: 11, 18, > 25, 50, 50, 400 > > Some thoughts about those results: > > * All functions from Stats package (runmed, filter, kernapply) are > fast and hard to improve on. > * In case of Mean and Max a simple un-optimized R codes are much > faster than specialized functions build for the same purpose. > * apply(embed(x,k), 1, fun) - seem to be always faster than any > functions from fSeries package or "mywinfun" > * "running" function from gtools package is horribly slow compared > to anything else > * "mywinfun" proposed as a faster version of "apply(embed(x,k), 1, > fun)" seems to be always slower > > Finally a question: I still need to get moving windows mad > function faster my "runmad" function is not that much faster than > apply/embed combo, and that I used before, and this is where my code > spends most of its time. I need something like "runmed" but for a mad > function. Any suggestions? > > Jarek > > =====================================\==== > Jarek Tuszynski, PhD. o / \ > Science Applications International Corporation <\__,| > (703) 676-4192 "> \ > [EMAIL PROTECTED] ` \ > > [[alternative HTML version deleted]] > > ______________________________________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting > guide! http://www.R-project.org/posting-guide.html Received on Sat Oct > 09 06:43:05 2004 > > * This message: [ Message body > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] > * Next message: Emili Tortosa-Ausina: "Re: [R] RWinEdt" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> > * Previous message: Brian S Cade: "Re: [R] reading Systat into R" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> > * Next in thread: Prof Brian Ripley: "Re: [R] Survey of "moving > window" statistical functions - still looking f or fast mad function" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window" > statistical functions - still looking f or fast mad function" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window" > statistical functions - still looking f or fast mad function" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > > * Contemporary messages sorted: [ By Date > <http://tolstoy.newcastle.edu.au/R/help/04/10/date.html#5161> ] [ By > Thread <http://tolstoy.newcastle.edu.au/R/help/04/10/index.html#5161> ] > [ By Subject > <http://tolstoy.newcastle.edu.au/R/help/04/10/subject.html#5161> ] [ By > Author <http://tolstoy.newcastle.edu.au/R/help/04/10/author.html#5161> > ] [ By messages with attachments > <http://tolstoy.newcastle.edu.au/R/help/04/10/attachment.html> ] > > This archive was generated by hypermail 2.1.8 > <http://www.hypermail.org/> : Fri 18 Mar 2005 - 03:03:35 EST > > [[alternative HTML version deleted]] > > ______________________________________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
