On Apr 16, 2012, at 4:32 PM, Bert Gunter wrote:

David:

Here is a comparison of the gains to be made by vectorization (again,
assuming I have interpreted your query correctly):

## create a list of arrays
z <- lapply(seq_len(10000),function(i)array(runif(24),dim=2:4))
## Using an apply type approach
system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
  user  system elapsed
  0.62    0.00    0.62
## vectorizing via rowSums and cbind
system.time(ans2 <-array(rowSums(do.call(cbind,z)),dim=2:4))
  user  system elapsed
  0.02    0.00    0.02
identical(ans1,ans2)
[1] TRUE
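
(An aside on why the second version is so much faster: cbind() treats each
3-d array as a plain 24-element vector, so do.call(cbind, z) builds a single
24 x 10000 matrix, and rowSums() then does all the elementwise additions in
one vectorized pass. A quick check, using the z and ans2 from above:)

dim(do.call(cbind, z))           # 24 10000 -- one flattened array per column
all.equal(ans2, Reduce("+", z))  # should be TRUE: same elementwise sums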


It's also an example of how different machines and OSes may perform differently. My Mac (an early 2008 model) is nowhere near as efficient with the second solution, despite being in the same ballpark on the first:

> system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
   user  system elapsed
  0.841   0.007   0.851
> system.time(ans2 <-array(rowSums(do.call(cbind,z)),dim=2:4))
   user  system elapsed
  0.132   0.003   0.145

And on my system ... the Reduce() strategy is fastest:

> system.time(ans3 <- Reduce("+", z) )
   user  system elapsed
  0.129   0.001   0.134

And ... the Reduce() strategy also preserves other object attributes, something I'm quite sure the re-dimensioning of rowSums(cbind(.)) cannot do:

 ## (assuming a small vector such as  a <- letters[1:3]  defined earlier in the session)
 L <- list( table(a, sample(a)),
            table(a, sample(a)),
            table(a, sample(a)),
            table(a, sample(a)),
            table(a, sample(a)) )

 str(Reduce("+", L) )
 'table' int [1:3, 1:3] 1 1 3 4 0 1 0 4 1
 - attr(*, "dimnames")=List of 2
  ..$ a: chr [1:3] "a" "b" "c"
  ..$  : chr [1:3] "a" "b" "c"

 str( array(rowSums(do.call(cbind,L)),dim=c(3,3))  )
 num [1:3, 1:3] 5 5 5 5 5 5 5 5 5


-- David.


Cheers,
Bert



On Mon, Apr 16, 2012 at 1:19 PM, David A Vavra <dava...@verizon.net> wrote:
Thanks Bill,

For reasons that aren't important here, I must start from a list. Computing
the sum while generating the tables may be a solution, but it means doing
something in one piece of code that is unrelated to the surrounding code,
which is bad practice where I'm from. If it's needed, it's needed, but if I
can avoid doing so, I will.

I haven't done any timing, but because of the extra get() and assign()
operations, the non-loop implementation will likely suffer. It seems you
have shown this to be true.

DAV





-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Monday, April 16, 2012 3:26 PM
To: David A Vavra; 'Bert Gunter'
Cc: r-help@r-project.org
Subject: RE: [R] Effeciently sum 3d table



Example in partial code:

Env <- CreatEnv() # my own function
Assign('final',T1-T1,envir=env)
L<-listOfTables

lapply(L,function(t) {
    final <- get('final',envir=env) + t
    assign('final',final,envir=env)
    NULL
})



First, finish writing that code so it runs and you can make sure its
output is ok:

L <- lapply(1:50000, function(i) array(i:(i+3), c(2,2))) # list of 50,000 2x2 matrices
env <- new.env()
assign('final', L[[1]] - L[[1]], envir=env)
junk <- lapply(L, function(t) {
    final <- get('final', envir=env) + t
    assign('final', final, envir=env)
    NULL
})
get('final', envir=env)
#            [,1]       [,2]
# [1,] 1250025000 1250125000
# [2,] 1250075000 1250175000
sum( (2:50001) ) # should be final[2,1]
# [1] 1250075000



You asked for something less "clunky".
You are fighting the system by using get() and assign(); just use
ordinary expression syntax to get and set variables:

final <- L[[1]]
for(i in seq_along(L)[-1]) final <- final + L[[i]]
final
#           [,1]       [,2]
# [1,] 1250025000 1250125000
# [2,] 1250075000 1250175000

The former took 0.22 seconds on my machine, the latter 0.06.



You don't have to compute the whole list of matrices before
doing the sum; just add to the current sum when you have
computed one matrix, and then forget about it.
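
A minimal sketch of that running-sum pattern (make_table() here is just a
hypothetical stand-in for whatever code produces each table):

make_table <- function(i) array(i:(i+3), c(2,2)) # placeholder table generator
final <- make_table(1) * 0                       # zero matrix of the right shape
for (i in 1:50000) {
    final <- final + make_table(i)               # add each table, then let it go
}
final[2,1] # 1250075000, same as the list-based versions above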



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com





-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of David A Vavra
Sent: Monday, April 16, 2012 11:35 AM
To: 'Bert Gunter'
Cc: r-help@r-project.org
Subject: Re: [R] Effeciently sum 3d table



Thanks Gunter,

I mean what I think is the normal definition of 'sum', as in:

   T1 + T2 + T3 + ...

It never occurred to me that there would be a question.

I have gotten the impression that a for loop is very inefficient. Whenever
I change them to lapply calls there is a noticeable improvement in run time,
for whatever reason. The problem with lapply here is that I effectively need
a global table to hold the final sum. lapply also wants to return a value.

You may be correct that in the long run the loop is best. There's a lot of
extraneous memory waste in holding all of the tables in a list, as well as
the return 'values'.

As an alternative, given a pre-existing list of tables, I was thinking of
creating a temporary environment to hold the final result so it could be
passed globally to each lapply execution level, but that seems clunky and
wasteful as well.



Example in partial code:

Env <- CreatEnv() # my own function
Assign('final',T1-T1,envir=env)
L<-listOfTables

lapply(L,function(t) {
    final <- get('final',envir=env) + t
    assign('final',final,envir=env)
    NULL
})

But I was hoping for a more elegant and hopefully more efficient solution.
Greg's suggestion to use Reduce() seems in order, but as yet I'm unfamiliar
with the function.
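
(For reference, Reduce() just folds a binary function across a list, i.e.
Reduce("+", L) computes ((L[[1]] + L[[2]]) + L[[3]]) + ..., so the result
keeps the shape of the tables. A toy sketch with made-up 3-d arrays:)

tabs <- lapply(1:5, function(i) array(i, dim = c(2, 2, 2))) # five small 3-d arrays
total <- Reduce("+", tabs)  # elementwise sum of all five
total[1, 1, 1]              # 1 + 2 + 3 + 4 + 5 = 15
dim(total)                  # 2 2 2 -- still a 3-d array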



DAV







-----Original Message-----
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Monday, April 16, 2012 12:42 PM
To: Greg Snow
Cc: David A Vavra; r-help@r-project.org
Subject: Re: [R] Effeciently sum 3d table



Define "sum" . Do you mean you want to get a single sum for each

array? -- get marginal sums for each array? -- get a single array in

which each value is the sum of all the individual values at the

position?



Due thought and consideration for those trying to help by formulating

your query carefully and concisely vastly increases the chance of

getting a useful answer. See the posting guide -- this is a skill that

needs to be learned and the guide is quite helpful. And I must

acknowledge that it is a skill that I also have not yet mastered.



Concerning your query, I would only note that the two responses from

Greg and Petr that you received are unlikely to be significantly

faster than just using loops, since both are still essentially looping

at the interpreted level. Whether either give you what you want, I do

not know.



-- Bert



On Mon, Apr 16, 2012 at 8:53 AM, Greg Snow <538...@gmail.com> wrote:

Look at the Reduce function.

On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <dava...@verizon.net> wrote:

I have a large number of 3d tables that I wish to sum.
Is there an efficient way to do this? Or perhaps a function I can call?

I tried using do.call("sum", listoftables), but that returns a single value.

So far, it seems only a loop will do the job.
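
(The reason do.call("sum", listoftables) collapses to one number is that
sum() adds up every element of all its arguments into a single total,
whereas "+" works elementwise. A tiny illustration with two made-up 2 x 2
tables:)

t1 <- matrix(1:4, 2); t2 <- matrix(5:8, 2)
do.call("sum", list(t1, t2)) # 36 -- one grand total
t1 + t2                      # 2 x 2 matrix of elementwise sums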





TIA,

DAV












--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
