I think this is an interesting discussion -- I've learned from both Steve's and Brian's comments, and I'm broadening it to R-help since I think others will be interested as well.
The problem up for comment is:
result <- apply(array.3D, 1:2, sum)
Where array.3D is 3000 by 300 by 3.
The original poster already had a perfectly good replacement for this problem that was virtually instantaneous. A solution for this particular problem is not the issue; it is merely the starting point for cases where there would be no trivial workaround.
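The poster's actual replacement is not reproduced here. As one possibility, assuming a reasonably current R (the `dims` argument of rowSums may be missing from older versions and from S-PLUS), the sum over the third dimension can be done without any per-cell function call. The small array below is only a stand-in for array.3D:

```r
# Small stand-in for array.3D (the real one is 3000 x 300 x 3).
set.seed(1)
array.3D <- array(rnorm(30 * 20 * 3), c(30, 20, 3))

# Reference result: one call to sum() per (i, j) cell.
by.apply <- apply(array.3D, 1:2, sum)

# Vectorized alternative 1: treat the first two dimensions as "rows"
# and sum over whatever remains.
by.rowSums <- rowSums(array.3D, dims = 2)

# Vectorized alternative 2: add the three slices directly.
by.slices <- array.3D[, , 1] + array.3D[, , 2] + array.3D[, , 3]

# Both alternatives should agree with apply() up to rounding.
all.equal(by.apply, by.rowSums)
all.equal(by.apply, by.slices)
```

Either form makes a fixed number of calls regardless of the array size, which is why such a replacement can be close to instantaneous.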
Steve Karmesin wrote:
SK> As others have said, what apply has to do in this case is loop over the 900,000
SK> cases and do a 'sum' over three elements each time. In this case the overhead
SK> of calling an S+ function totally swamps the numeric operations.
SK>
SK> Doing this on smaller datasets (300x30x3) on my machine (2CPU, 3GHz Xeon
SK> running Windows 2000 and S-Plus 6.1) shows an overhead of about 100
SK> microseconds per call to sum, so I would expect it to take 100*1e-6*9e5=90 seconds.
SK>
SK> The thing is, it is worse than this. If I do a case with 900x90x3 it takes 300 usec per 'sum'.
SK>
SK> R is fairly stable at just under 15usec per 'sum' on my machine.
SK>
SK> A little more investigation (together with office mate Tony Plate) provides some insight.
SK>
SK> Using mem.tally.reset() and mem.tally.report() shows that for this case it is allocating a
SK> whopping 1280 bytes for each call to 'sum'.
SK>
SK> Just touching that much memory is going to be slow. So why would it do that? Looking
SK> at the definition of the apply function shows that it is allocating a general list for the result,
SK> not a vector-based array or matrix.
SK>
SK> Why? It has a shortcut that lets it use efficient matrices if the input is a 2D matrix, but this
SK> one is 3D, so it uses the general code, which is much, much slower and uses a lot more memory.
SK>
SK> If you collapse the first two dimensions of the array the times are stable at <80usec per
SK> call to sum and it allocates 8 bytes per call, which is just the amount of space needed.
SK>
SK> Still, the R code seems to always build a list, and it is about 15usec per call. Somehow
SK> the underlying function call and perhaps list storage mechanisms are more efficient there.
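The collapsing step Steve describes can be sketched as follows. This is R rather than S-PLUS, so it illustrates only the reshaping and the per-call arithmetic, not the S-PLUS memory behaviour; the array size and the names `a`, `a2`, and `usec.per.call` are made up for the example.

```r
# Small stand-in array; all names here are illustrative only.
set.seed(1)
a <- array(rnorm(300 * 30 * 3), c(300, 30, 3))

# Collapse the first two dimensions: 300 x 30 x 3 becomes 9000 x 3.
# Arrays are stored column-major, so only the dim attribute changes.
a2 <- a
dim(a2) <- c(prod(dim(a)[1:2]), dim(a)[3])

# apply() now sees an ordinary 2D matrix and calls sum() once per row.
elapsed <- system.time(s <- apply(a2, 1, sum))[["elapsed"]]

# Restore the 300 x 30 layout of the result.
dim(s) <- dim(a)[1:2]

# Rough per-call cost in microseconds (very noisy for a case this small).
usec.per.call <- 1e6 * elapsed / length(s)

# The estimate in the quoted message is the same arithmetic in reverse:
# 100 usec per call times 9e5 calls, in seconds.
100 * 1e-6 * 9e5
```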
Prof Brian Ripley wrote:
BR> There are almost always pros and cons with these issues. S's sum() is an
BR> S4 generic whereas R's is internal *unless* you define an S4 method for
BR> it (which S-PLUS has already done). S needs to create several frames for
BR> what is a nested set of function calls -- 1280b looks modest for that.
BR>
BR> Also, S has an ability to back out calculations that R does not, and that
BR> costs memory (and can have benefits).
BR>
BR> We know there are overheads in making functions generic, especially
BR> S4-generic, but then there are benefits too. I am not sure designers who
BR> add features take enough account of the costs.
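On the R side, the distinction Brian draws can be inspected directly. A minimal sketch, assuming a current R with the methods package available; the class `Wrapped` is hypothetical and exists only to give sum() an S4 method:

```r
library(methods)

# In R, sum() is a primitive: calls go straight to compiled code.
is.primitive(sum)

# A hypothetical S4 class, used only to register an S4 method for sum().
setClass("Wrapped", representation(values = "numeric"))
setMethod("sum", "Wrapped",
          function(x, ..., na.rm = FALSE) sum(x@values, na.rm = na.rm))

# Objects of the new class dispatch to the S4 method.
w <- new("Wrapped", values = c(1, 2, 3))
sum(w)

# Plain numeric vectors still give the usual result.
sum(c(1, 2, 3))
```

This says nothing about how S-PLUS implements its own sum(); it only shows the R behaviour that the comparison refers to.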
Using R 1.8.1 (precompiled) on SuSE Linux with a 2.4GHz Xeon and 1GB of memory:
set.seed(2)
jja <- array(rnorm(3000*300*3), c(3000, 300, 3))
gc()
system.time(jjsa <- apply(jja, 1:2, sum)) # takes 30 seconds
sumS3 <- function(x, ...) UseMethod("sumS3")
sumS3.default <- function(x, ...) sum(x, ...)
gc()
system.time(jjsa3 <- apply(jja, 1:2, sumS3)) # takes 65 seconds
setGeneric("sumS4", function(x, ...) standardGeneric("sumS4"))
setMethod("sumS4", signature(x="numeric"), function(x, ...) sum(x, ...))
gc()
system.time(jjsa4 <- apply(jja, 1:2, sumS4)) # takes 58 seconds

Questions:
It looks to me like the penalty for making the functions generic is similar to one extra function call. Does the penalty grow as there are more methods? Are there other types of penalties for making a function generic?
Is the test with sumS4 still an unfair comparison with S-PLUS?
Are things better with S-PLUS 6.2?
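The first question could be probed empirically along these lines. This is a rough sketch, not a careful benchmark: `time.per.call` and the dummy classes are invented for the illustration, only the S3 case is covered, and the numbers will vary with machine and R version.

```r
# S3 generic with a default method, as in the timing test above.
sumS3 <- function(x, ...) UseMethod("sumS3")
sumS3.default <- function(x, ...) sum(x, ...)

# Hypothetical helper: average cost of f(x) in microseconds over many calls.
time.per.call <- function(f, x, reps = 20000) {
  elapsed <- system.time(for (i in seq_len(reps)) f(x))[["elapsed"]]
  1e6 * elapsed / reps
}

x <- c(1, 2, 3)

# Cost with only the default method defined.
before <- time.per.call(sumS3, x)

# Add 50 methods for made-up classes that are never dispatched to.
for (i in 1:50) {
  assign(paste0("sumS3.dummy", i), function(x, ...) stop("never called"))
}

# Cost with the extra methods present.
after <- time.per.call(sumS3, x)

# Compare against calling sum() directly.
c(plain = time.per.call(sum, x), before = before, after = after)
```

Comparing `before` and `after` gives a first indication of whether unused methods add to the dispatch cost; an analogous loop with setMethod would be needed for the S4 case.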
Patrick Burns
Burns Statistics [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and "A Guide for the Unwilling S User")
