Re: [R] Effeciently sum 3d table
On Mon, Apr 16, 2012 at 10:28:43AM -0400, David A Vavra wrote:
> I have a large number of 3d tables that I wish to sum.
> Is there an efficient way to do this? Or perhaps a function I can call?
>
> I tried using do.call(sum,listoftables) but that returns a single value.
>
> So far, it seems only a loop will do the job.

Hi.

Use lapply(), for example

  listoftables <- list(array(1:8, dim=c(2, 2, 2)), array(2:9, dim=c(2, 2, 2)))
  lapply(listoftables, sum)

  [[1]]
  [1] 36

  [[2]]
  [1] 44

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
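A minimal sketch of why both calls fall short of a cell-by-cell sum, assuming two small 2x2x2 arrays with made-up values: do.call() passes the list elements as separate arguments to sum(), which then adds every cell of every array, while lapply() applies sum() to each array on its own.

```r
listoftables <- list(array(1:8, dim = c(2, 2, 2)),
                     array(2:9, dim = c(2, 2, 2)))

# do.call() spreads the list into arguments: sum(T1, T2) adds every cell
# of every array into one grand total
grand_total <- do.call(sum, listoftables)

# lapply() calls sum() once per array and returns one total per array
per_array <- lapply(listoftables, sum)
```

Neither result keeps the 2x2x2 shape, so neither is the table T1 + T2.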
Re: [R] Effeciently sum 3d table
Look at the Reduce function.

On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <dava...@verizon.net> wrote:
> I have a large number of 3d tables that I wish to sum.
> Is there an efficient way to do this? Or perhaps a function I can call?
>
> I tried using do.call(sum,listoftables) but that returns a single value.
>
> So far, it seems only a loop will do the job.
>
> TIA,
> DAV

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
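A minimal sketch of what Reduce() does with a list of equally shaped arrays, assuming three small 2x2x2 arrays with made-up values. It folds a two-argument function over the list, here `+`, which is the same computation as an explicit accumulation loop.

```r
listoftables <- list(array(1:8,  dim = c(2, 2, 2)),
                     array(2:9,  dim = c(2, 2, 2)),
                     array(3:10, dim = c(2, 2, 2)))

# Reduce() folds a binary function over the list: ((T1 + T2) + T3) + ...
total <- Reduce(`+`, listoftables)

# The explicit loop that Reduce() replaces
total_loop <- listoftables[[1]]
for (tab in listoftables[-1]) total_loop <- total_loop + tab
```

Both results are 2x2x2 arrays whose cells hold the sums of the corresponding cells of the inputs.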
Re: [R] Effeciently sum 3d table
Define "sum". Do you mean you want to
-- get a single sum for each array?
-- get marginal sums for each array?
-- get a single array in which each value is the sum of all the individual values at that position?

Due thought and consideration for those trying to help, by formulating your query carefully and concisely, vastly increases the chance of getting a useful answer. See the posting guide -- this is a skill that needs to be learned and the guide is quite helpful. And I must acknowledge that it is a skill that I also have not yet mastered.

Concerning your query, I would only note that the two responses from Greg and Petr that you received are unlikely to be significantly faster than just using loops, since both are still essentially looping at the interpreted level. Whether either gives you what you want, I do not know.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
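The three readings of "sum" can be sketched side by side, assuming two small 2x2x2 arrays with made-up values. The marginal sums are taken over the first dimension only, as one arbitrary choice of margin.

```r
tabs <- list(array(1:8, dim = c(2, 2, 2)),
             array(2:9, dim = c(2, 2, 2)))

# 1. A single sum for each array
sums_per_array <- sapply(tabs, sum)

# 2. Marginal sums for each array (here: over the first dimension)
margins_per_array <- lapply(tabs, function(a) apply(a, 1, sum))

# 3. One array whose cells are the sums of the corresponding cells
cellwise_sum <- Reduce(`+`, tabs)
```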
Re: [R] Effeciently sum 3d table
Thanks Gunter,

I mean what I think is the normal definition of 'sum' as in:
T1 + T2 + T3 + ...
It never occurred to me that there would be a question.

I have gotten the impression that a for loop is very inefficient. Whenever I change them to lapply calls there is a noticeable improvement in run time for whatever reason. The problem with lapply here is that I effectively need a global table to hold the final sum. lapply also wants to return a value.

You may be correct that in the long run, the loop is the best. There's a lot of extraneous memory wastage holding all of the tables in a list as well as the return 'values'.

As an alternative, and given a pre-existing list of tables, I was thinking of creating a temporary environment to hold the final result so it could be passed globally to each lapply execution level, but that seems clunky and wasteful as well.

Example in partial code:

  env <- CreatEnv() # my own function
  assign('final',T1-T1,envir=env)
  L <- listOfTables
  lapply(L,function(t) {
    final <- get('final',envir=env) + t
    assign('final',final,envir=env)
    NULL
  })

But I was hoping for a more elegant and hopefully more efficient solution. Greg's suggestion of using Reduce seems in order but as yet I'm unfamiliar with the function.

DAV
Re: [R] Effeciently sum 3d table
Thanks Petr,

I'm after T1 + T2 + T3 + ... and your solution is giving a list of n items each containing sum(T[i]). I guess I should have been clearer in stating what I need.

Cheers,
DAV
Re: [R] Effeciently sum 3d table
Thanks Greg,

I think this may be what I'm after but the documentation for it isn't particularly clear. I hate it when someone documents a piece of code saying it works kinda like some other code (running elsewhere, of course), making the tacit assumption that everybody will immediately know what that means and implies.

I'm sure I'll understand it once I know what it is trying to say. :) There's an item in the examples which may be exactly what I'm after.

DAV
Re: [R] Effeciently sum 3d table
David:

1. My first name is Bert.

2.
> It never occurred to me that there would be a question.

Indeed. But in fact you got solutions for two different interpretations (Greg's is what you wanted). That is what I meant when I said that clarity in asking the question is important.

3.
> I have gotten the impression that a for loop is very inefficient. Whenever I
> change them to lapply calls there is a noticeable improvement in run time
> for whatever reason.

I'd like to see your data on this. My experience is that they are typically comparable. Chambers in his "Software for Data Analysis" book says (p. 213), of apply type functions rather than explicit loops: "The computation should run faster... However, none of the apply mechanisms changes the number of times the supplied function is called, so serious improvements will be limited to iterating simple calculations many times."

4. You can get serious improvements by vectorizing; and you can do that here, if I understand correctly, because all your arrays have identical dim = d. Here's how:

  ## assume your list of arrays is in listoftables
  alldat <- do.call(cbind, listoftables) ## this might be the slow part
  ans <- array(rowSums(alldat), dim = d)

See ?rowSums for explanations and caveats, especially with NA's.

Cheers,
Bert
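The vectorized approach in point 4 can be sketched as follows, assuming three small 2x2x2 arrays with made-up values. This variant flattens each array with as.vector() inside vapply() instead of calling cbind() on the arrays directly; that is a substitution made here so the same code also works when the inputs are plain matrices, which cbind() would join side by side rather than flatten.

```r
listoftables <- list(array(1:8,  dim = c(2, 2, 2)),
                     array(2:9,  dim = c(2, 2, 2)),
                     array(3:10, dim = c(2, 2, 2)))
d <- dim(listoftables[[1]])   # all arrays are assumed to share this dim

# One column per array, one row per cell position
alldat <- vapply(listoftables, as.vector, numeric(prod(d)))

# rowSums() does the adding in compiled code; array() restores the shape
ans <- array(rowSums(alldat), dim = d)
```

If the tables can contain NA, rowSums(alldat, na.rm = TRUE) treats missing cells as zero, which may or may not be what is wanted.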
Re: [R] Effeciently sum 3d table
On Apr 16, 2012, at 2:43 PM, David A Vavra wrote:

> Thanks Petr,
>
> I'm after T1 + T2 + T3 + ...

Which would be one number ... i.e. the result you originally said you did not want.

> and your solution is giving a list of n items each containing sum(T[i]).
> I guess I should have been clearer in stating what I need.

Or even now you _could_ be clearer. Do you want successive partial sums? That would be:

  Reduce("+", listoftables, accumulate=TRUE)

> Cheers,
> DAV

David Winsemius, MD
West Hartford, CT
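A minimal sketch of the accumulate option, assuming three small 2x2x2 arrays with made-up values. With accumulate = TRUE, Reduce() returns a list of the successive partial sums rather than only the last one.

```r
listoftables <- list(array(1:8,  dim = c(2, 2, 2)),
                     array(2:9,  dim = c(2, 2, 2)),
                     array(3:10, dim = c(2, 2, 2)))

# partial[[1]] is T1, partial[[2]] is T1 + T2, partial[[3]] is T1 + T2 + T3
partial <- Reduce(`+`, listoftables, accumulate = TRUE)

# The last element is the same table that plain Reduce() returns
final <- partial[[length(partial)]]
```

Keeping every partial sum costs as much memory as the input list, so it is only worth it when the intermediate tables are needed.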
Re: [R] Effeciently sum 3d table
> Example in partial code:
>
> env <- CreatEnv() # my own function
> assign('final',T1-T1,envir=env)
> L <- listOfTables
> lapply(L,function(t) {
>   final <- get('final',envir=env) + t
>   assign('final',final,envir=env)
>   NULL
> })

First, finish writing that code so it runs and you can make sure its output is ok:

  L <- lapply(1:50000, function(i) array(i:(i+3), c(2,2))) # list of 50,000 2x2 matrices
  env <- new.env()
  assign('final', L[[1]] - L[[1]], envir=env)
  junk <- lapply(L, function(t) {
      final <- get('final', envir=env) + t
      assign('final', final, envir=env)
      NULL
  })
  get('final', envir=env)
  #            [,1]       [,2]
  # [1,] 1250025000 1250125000
  # [2,] 1250075000 1250175000
  sum( (2:50001) ) # should be final[2,1]
  # [1] 1250075000

You asked for something less clunky. You are fighting the system by using get() and assign(); just use ordinary expression syntax to get and set variables:

  final <- L[[1]]
  for(i in seq_along(L)[-1]) final <- final + L[[i]]
  final
  #            [,1]       [,2]
  # [1,] 1250025000 1250125000
  # [2,] 1250075000 1250175000

The former took 0.22 seconds on my machine, the latter 0.06.

You don't have to compute the whole list of matrices before doing the sum; just add to the current sum when you have computed one matrix and then forget about it.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
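The last point, adding each table to a running sum as soon as it is computed, can be sketched like this. make_table() is a hypothetical stand-in for whatever code actually produces each table; it is not from the thread.

```r
# Hypothetical generator: returns the i-th 2x2x2 table
make_table <- function(i) array(i:(i + 7), dim = c(2, 2, 2))

n <- 1000
running <- make_table(1)
for (i in seq_len(n)[-1]) {
  # Each new table is added and then dropped; no list of n tables is built
  running <- running + make_table(i)
}
```

Only two tables are alive at any time, so memory use does not grow with n.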
Re: [R] Effeciently sum 3d table
Bert,

My apologies on the name.

I haven't kept any data on loop times. I don't know why lapply seems faster but the difference is quite noticeable. It has struck me as odd. I would have thought lapply would be slower. It has taken an effort to change my thinking to force fit solutions to it but I've gotten used to it. As of now I reserve loops to times when there are only a few iterations (as in 10) and to solutions that require passing large amounts of information among iterations. lapply is particularly handy when constructing lists.

As for vectorizing, see the code below. Note that it uses mapply but that simply may have made implementation easier. However, if vectorizing gives an improvement over looping, the mapply may be the reason.

  f <- function(x,y,z) catn("do something")
  Vectorize(f,c('x','y'))
  function (x, y, z)
  {
      args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
      names <- if (is.null(names(args)))
          character(length(args))
      else names(args)
      dovec <- names %in% vectorize.args
      do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]),
          SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
  }
  <environment: 0x7fb3442553c8>

DAV
Re: [R] Effeciently sum 3d table
On Apr 16, 2012, at 3:26 PM, David Winsemius wrote:

> Or even now you _could_ be clearer. Do you want successive partial sums?
> That would yield to:
>
>   Reduce("+", listoftables, accumulate=TRUE)

If Dunlap's interpretation is correct then consider this:

  L <- lapply(1:50000, function(i) array(i:(i+7), c(2,2,2)))
  system.time({final <- L[[1]]
      for(i in seq_along(L)[-1]) final <- final + L[[i]]
      final} )
  #    user  system elapsed
  #   0.179   0.002   0.187
  system.time(Reduce("+", L))
  #    user  system elapsed
  #   0.150   0.002   0.157
  identical(Reduce("+", L), final)
  [1] TRUE

David Winsemius, MD
West Hartford, CT
Re: [R] Effeciently sum 3d table
> even now you _could_ be clearer

I fail to see why it's unclear.

>> I'm after T1 + T2 + T3 + ...
>
> Which would be one number ... i.e. the result you originally said you did not want.

I think it's precisely what I want.

If I have two 3d tables, T1 and T2, then say either

  1) T1 + T2
  2) T1 - T2

(1) yields a third table equal to the sum of the individual cells and (2) yields a table of the cell-by-cell differences (full of zeroes when the two tables are equal). At least it does for matrices.

Are you saying the T1+T2+T3+... above is equivalent to sum(T1)+sum(T2)+sum(T3)+... when the table has more than 2 dimensions? I tried it out by hand and I get the result I'm after.

What I want is a general solution. Reduce may be the answer but I find the documentation for it a bit daunting. Not to mention that it is far from obvious that I should have originally thought of using it.

DAV
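The point under dispute can be checked directly. A minimal sketch, assuming two 2x2x2 arrays with made-up values: the arithmetic operators work cell by cell on arrays of any number of dimensions, while sum() collapses an array to one number.

```r
T1 <- array(1:8, dim = c(2, 2, 2))
T2 <- array(2:9, dim = c(2, 2, 2))

cell_sum  <- T1 + T2            # 2x2x2 array of cell-by-cell sums
cell_diff <- T1 - T1            # 2x2x2 array of zeros
scalar    <- sum(T1) + sum(T2)  # a single number, not a table
```

So T1 + T2 + T3 + ... is a table of the same shape as its inputs, and it is not the same thing as sum(T1) + sum(T2) + sum(T3) + ...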
Re: [R] Effeciently sum 3d table
For purposes of clarity only...

On Mon, Apr 16, 2012 at 12:40 PM, David A Vavra dava...@verizon.net wrote:

Bert, My apologies on the name. I haven't kept any data on loop times. I don't know why lapply seems faster but the difference is quite noticeable. It has struck me as odd. I would have thought lapply would be slower. It has taken an effort to change my thinking to force fit solutions to it but I've gotten used to it. As of now I reserve loops to times when there are only a few iterations (as in 10) and to solutions that require passing large amounts of information among iterations. lapply is particularly handy when constructing lists. As for vectorizing, see the code below.

No. Despite the name, this is **not** what I mean by vectorization. What I mean is pushing the loops down to the C level rather than doing them at the interpreted level, which is where your code below still leaves you. -- Bert

Note that it uses mapply but that simply may have made implementation easier. However, if vectorizing gives an improvement over looping, the mapply may be the reason.

f <- function(x,y,z) catn("do something")
Vectorize(f,c('x','y'))
function (x, y, z)
{
    args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
    names <- if (is.null(names(args))) character(length(args)) else names(args)
    dovec <- names %in% vectorize.args
    do.call(mapply, c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]), SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
}
<environment: 0x7fb3442553c8>

DAV

-Original Message- From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Monday, April 16, 2012 3:07 PM To: David A Vavra Cc: r-help@r-project.org Subject: Re: [R] Effeciently sum 3d table

David:

1. My first name is Bert.

2. "It never occurred to me that there would be a question." Indeed. But in fact you got solutions for two different interpretations (Greg's is what you wanted). That is what I meant when I said that clarity in asking the question is important.

3. "I have gotten the impression that a for loop is very inefficient. Whenever I change them to lapply calls there is a noticeable improvement in run time for whatever reason." I'd like to see your data on this. My experience is that they are typically comparable. Chambers in his Software for Data Analysis book says (pp 213): (with apply type functions rather than explicit loops), The computation should run faster... However, none of the apply mechanisms changes the number of times the supplied functions is called, so serious improvements will be limited to iterating simple calculations many times.

4. You can get serious improvements by vectorizing; and you can do that here, if I understand correctly, because all your arrays have identical dim = d. Here's how:

## assume your list of arrays is in listoftables
alldat <- do.call(cbind,listoftables) ## this might be the slow part
ans <- array(rowSums(alldat), dim = d)

See ?rowSums for explanations and caveats, especially with NA's.

Cheers, Bert

On Mon, Apr 16, 2012 at 11:35 AM, David A Vavra dava...@verizon.net wrote:

Thanks Gunter, I mean what I think is the normal definition of 'sum' as in: T1 + T2 + T3 + ... It never occurred to me that there would be a question. I have gotten the impression that a for loop is very inefficient. Whenever I change them to lapply calls there is a noticeable improvement in run time for whatever reason. The problem with lapply here is that I effectively need a global table to hold the final sum. lapply also wants to return a value. You may be correct that in the long run, the loop is the best. There's a lot of extraneous memory wastage holding all of the tables in a list as well as the return 'values'. As an alternate and given a pre-existing list of tables, I was thinking of creating a temporary environment to hold the final result so it could be passed globally to each lapply execution level but that seems clunky and wasteful as well.

Example in partial code:

Env <- CreatEnv() # my own function
Assign('final',T1-T1,envir=env)
L <- listOfTables
lapply(L,function(t) {
  final <- get('final',envir=env) + t
  assign('final',final,envir=env)
  NULL
})

But I was hoping for a more elegant and hopefully more efficient solution. Greg's suggestion for using Reduce seems in order but as yet I'm unfamiliar with the function.

DAV

-Original Message- From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Monday, April 16, 2012 12:42 PM To: Greg Snow Cc: David A Vavra; r-help@r-project.org Subject: Re: [R] Effeciently sum 3d table

Define sum . Do you mean you want to get a single sum for each array? -- get marginal sums for each array? -- get a single array in which each value is the sum of all the individual values at the position? Due thought and consideration for those trying to help by formulating your query carefully
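The three readings of "sum" that Bert lists can be put side by side. This is only a sketch on two small made-up arrays (T1 and T2 are illustrative stand-ins, not the poster's data); reading (c) is the one the thread settles on.

```r
# Two small stand-in arrays; the real tables are assumed to share one dim
T1 <- array(1:8, dim = c(2, 2, 2))
T2 <- array(2:9, dim = c(2, 2, 2))
listoftables <- list(T1, T2)

# (a) a single total for each array
totals <- vapply(listoftables, sum, integer(1))

# (b) marginal sums for each array, here over the third dimension
margins <- lapply(listoftables, function(a) apply(a, 3, sum))

# (c) one array in which each cell is the sum of the cells at that position
cellwise <- Reduce(`+`, listoftables)
```

Only (c) returns an object with the same dimensions as the inputs, which is what T1 + T2 + T3 + ... means for arrays.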
Re: [R] Effeciently sum 3d table
Thanks Bill,

For reasons that aren't important here, I must start from a list. Computing the sum while generating the tables may be a solution but it means doing something in one piece of code that is unrelated to the surrounding code. Bad practice where I'm from. If it's needed it's needed but if I can avoid doing so, I will. I haven't done any timing but because of the extra operations of get and assign, the non-loop implementation will likely suffer. It seems you have shown this to be true.

DAV

-Original Message- From: William Dunlap [mailto:wdun...@tibco.com] Sent: Monday, April 16, 2012 3:26 PM To: David A Vavra; 'Bert Gunter' Cc: r-help@r-project.org Subject: RE: [R] Effeciently sum 3d table

Example in partial code:

Env <- CreatEnv() # my own function
Assign('final',T1-T1,envir=env)
L <- listOfTables
lapply(L,function(t) {
  final <- get('final',envir=env) + t
  assign('final',final,envir=env)
  NULL
})

First, finish writing that code so it runs and you can make sure its output is ok:

L <- lapply(1:50000, function(i) array(i:(i+3), c(2,2))) # list of 50,000 2x2 matrices
env <- new.env()
assign('final', L[[1]] - L[[1]], envir=env)
junk <- lapply(L, function(t) {
  final <- get('final', envir=env) + t
  assign('final', final, envir=env)
  NULL
})
get('final', envir=env)
#            [,1]       [,2]
# [1,] 1250025000 1250125000
# [2,] 1250075000 1250175000
sum( (2:50001) ) # should be final[2,1]
# [1] 1250075000

You asked for something less clunky. You are fighting the system by using get() and assign(); just use ordinary expression syntax to get and set variables:

final <- L[[1]]
for(i in seq_along(L)[-1]) final <- final + L[[i]]
final
#            [,1]       [,2]
# [1,] 1250025000 1250125000
# [2,] 1250075000 1250175000

The former took 0.22 seconds on my machine, the latter 0.06. You don't have to compute the whole list of matrices before doing the sum, just add to the current sum when you have computed one matrix and then forget about it.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
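Bill's closing remark, adding each table to a running total as it is produced instead of first collecting a list, might look roughly like this. make_table() is a hypothetical stand-in for whatever code generates each table; it is not something from the thread.

```r
# Hypothetical generator: returns the i-th 2x2x2 table
make_table <- function(i) array(i:(i + 7), dim = c(2, 2, 2))

n <- 100
final <- make_table(1)              # start from the first table
for (i in seq_len(n)[-1]) {
  final <- final + make_table(i)    # add the new table; it is not kept afterwards
}
```

Only the running total and one table exist at a time, which addresses the memory concern raised earlier. It does not apply when, as in David's case, the list already exists.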
Re: [R] Effeciently sum 3d table
David: Here is a comparison of the gains to be made by vectorization (again, assuming I have interpreted your query correctly)

## create a list of arrays
z <- lapply(seq_len(1),function(i)array(runif(24),dim=2:4))

## Using an apply type approach
system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
   user  system elapsed
   0.62    0.00    0.62

## vectorizing via rowSums and cbind
system.time(ans2 <- array(rowSums(do.call(cbind,z)),dim=2:4))
   user  system elapsed
   0.02    0.00    0.02

identical(ans1,ans2)
[1] TRUE

Cheers, Bert

On Mon, Apr 16, 2012 at 1:19 PM, David A Vavra dava...@verizon.net wrote:

Thanks Bill, For reasons that aren't important here, I must start from a list. Computing the sum while generating the tables may be a solution but it means doing something in one piece of code that is unrelated to the surrounding code. Bad practice where I'm from. If it's needed it's needed but if I can avoid doing so, I will. I haven't done any timing but because of the extra operations of get and assign, the non-loop implementation will likely suffer. It seems you have shown this to be true.
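The cbind/rowSums approach works because cbind() treats each 3d array as a plain vector, so every array becomes one column and rowSums() adds across columns in compiled code. The NA caveat mentioned alongside ?rowSums can be seen in a small sketch (the arrays here are made up for illustration):

```r
d <- 2:4
z <- list(array(1:24, dim = d), array(24:1, dim = d))
z[[2]][1, 1, 1] <- NA     # one missing cell in the second table

# Each 3d array becomes one column of a 24 x 2 matrix
m <- do.call(cbind, z)

strict  <- array(rowSums(m), dim = d)                # the NA propagates to the sum
lenient <- array(rowSums(m, na.rm = TRUE), dim = d)  # the NA is skipped
```

Whether a missing cell should poison the total or be skipped depends on the data, so na.rm is a choice to make deliberately rather than a default to rely on.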
Re: [R] Effeciently sum 3d table
I generally prefer the list approach too. I only mentioned that you didn't need to have a list of inputs before starting the summation because you said "There's a lot of extraneous memory wastage holding all of the tables in a list as well as the return 'values'." I guess I misinterpreted that sentence.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
Re: [R] Effeciently sum 3d table
OK. I'll take your word for it. The mapply function calls do_mapply so I would have thought it is passing the operation down to the C code. I haven't tracked it any further than below.

> mapply
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
{
    FUN <- match.fun(FUN)
    dots <- list(...)
    answer <- .Call("do_mapply", FUN, dots, MoreArgs, environment(), PACKAGE = "base")
    ... etc.
Re: [R] Effeciently sum 3d table
On Mon, Apr 16, 2012 at 1:39 PM, David A Vavra dava...@verizon.net wrote:

OK. I'll take your word for it. The mapply function calls do_mapply so I would have thought it is passing the operation down to the C code. I haven't tracked it any further than below.

No, they can't. Function evaluation must take place at the interpreted level. However, don't take my word -- take Chambers's. -- Bert
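One way to see Bert's point is to count how often the supplied function is actually invoked. In this sketch counting_sum() and the calls counter are illustrative names, not part of the thread; the loop itself runs in C, but each cell still costs one evaluation of an R function.

```r
z <- lapply(1:3, function(i) array(i, dim = 2:4))

calls <- 0
counting_sum <- function(...) {
  calls <<- calls + 1    # record every R-level invocation
  sum(...)
}

# mapply() walks the cells in C, but evaluates counting_sum() once per cell
ans1 <- array(do.call(mapply, c(counting_sum, z)), dim = 2:4)

# A cell-wise `+` handles the same 24 cells with two vectorized additions
ans2 <- Reduce(`+`, z)
```

After this runs, calls holds the number of cells (2 * 3 * 4), which is the overhead that the rowSums and `+` approaches avoid.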
Re: [R] Effeciently sum 3d table
On Apr 16, 2012, at 4:04 PM, David A Vavra wrote:

even now you _could_ be clearer

I fail to see why it's unclear. I'm after T1 + T2 + T3 + ...

Which would be one number ... i.e. the result you originally said you did not want.

I think it's precisely what I want. If I have two 3d tables, T1 and T2, then say either
1) T1 + T2
2) T1 - T2
(1) yields a third table equal to the sum of the individual cells and (2) yields a table full of zeroes. At least it does for matrices. Are you saying the T1+T2+T3+... above is equivalent to: sum(T1)+sum(T2)+sum(T3)+ when the table has more than 2d? I tried it out by hand and I get the result I'm after.

For me (with my slightly constricted mindset) it would have been clearer to have started out talking about matrices and arrays. An example would have saved a bunch of time.

What I want is a general solution. Reduce may be the answer but I find the documentation for it a bit daunting. Not to mention that it is far from obvious that I should have originally thought of using it.

It is a function designed to do exactly what you requested: "Reduce uses a binary function to successively combine the elements of a given vector." As it turns out the term 'vector' in this case includes lists of classed and/or dimensioned objects rather than being restricted to atomic vectors. -- David.

DAV

-Original Message- From: David Winsemius [mailto:dwinsem...@comcast.net] Sent: Monday, April 16, 2012 3:26 PM To: David A Vavra Cc: 'Petr Savicky'; r-help@r-project.org Subject: Re: [R] Effeciently sum 3d table

On Apr 16, 2012, at 2:43 PM, David A Vavra wrote:

Thanks Petr, I'm after T1 + T2 + T3 + ...

Which would be one number ... i.e. the result you originally said you did not want.

and your solution is giving a list of n items each containing sum(T[i]). I guess I should have been clearer in stating what I need.

Or even now you _could_ be clearer. Do you want successive partial sums? That would yield to: Reduce(`+`, listoftables, accumulate=TRUE)

Cheers, DAV

David Winsemius, MD West Hartford, CT
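For the "successive partial sums" reading, accumulate = TRUE makes Reduce() return every intermediate result instead of only the last one. A small sketch on three made-up arrays:

```r
listoftables <- list(array(1:8,  dim = c(2, 2, 2)),
                     array(2:9,  dim = c(2, 2, 2)),
                     array(3:10, dim = c(2, 2, 2)))

# A list of running totals: T1, T1 + T2, T1 + T2 + T3
partial <- Reduce(`+`, listoftables, accumulate = TRUE)

# The last running total is the plain cell-wise sum
total <- partial[[length(partial)]]
```

Note the backticks around `+`: without them the call is a syntax error, because a bare + is an operator rather than a function name.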
Re: [R] Effeciently sum 3d table
Here is a simple example:

mylist <- replicate(4, matrix(rnorm(12), ncol=3), simplify=FALSE)
A <- Reduce( `+`, mylist )
B <- mylist[[1]] + mylist[[2]] + mylist[[3]] + mylist[[4]]
all.equal(A,B)
[1] TRUE

Basically what Reduce does is it first applies the function (`+` in this case) to the 1st 2 elements of mylist, then applies it to that result and the 3rd element, then that result and the 4th element (and would continue on if mylist had more than 4 elements). It is basically a way to create functions like sum from functions like `+` which only work on 2 objects at a time.

Another way to see what it is doing is to run something like:

Reduce( function(a,b){ cat("I am adding",a,"and",b,"\n"); a+b }, 1:10 )

The Reduce function will probably not be any faster than a really well written loop, but will probably be faster (both to write the command and to run) than a poorly designed naive loop application.

On Mon, Apr 16, 2012 at 12:52 PM, David A Vavra dava...@verizon.net wrote:

Thanks Greg, I think this may be what I'm after but the documentation for it isn't particularly clear. I hate it when someone documents a piece of code saying it works kinda like some other code (running elsewhere, of course) making the tacit assumption that everybody will immediately know what that means and implies. I'm sure I'll understand it once I know what it is trying to say. :) There's an item in the examples which may be exactly what I'm after.

DAV

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
Re: [R] Effeciently sum 3d table
On Apr 16, 2012, at 5:41 PM, Greg Snow wrote:
> The Reduce function will probably not be any faster than a really well written loop, but will probably be faster (both to write the command and to run) than a poorly designed naive loop application.

It's faster on my machine (but only fractionally), and it has the as yet unremarked-upon advantage that it will preserve attributes of the tables such as dimnames.

    system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
       user  system elapsed
      0.841   0.007   0.851
    system.time(ans2 <- array(rowSums(do.call(cbind,z)),dim=2:4))
       user  system elapsed
      0.132   0.003   0.145

And on my system the Reduce strategy wins by a hair:

    system.time(ans3 <- Reduce("+", z) )
       user  system elapsed
      0.129   0.001   0.134

-- (the other) David.

David Winsemius, MD
West Hartford, CT
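To illustrate the point about attributes, here is a hedged sketch using made-up factor data (the names make_table, g, h and k are hypothetical). It compares Reduce() with rebuilding the result from flattened cell counts.

```r
set.seed(1)

# Hypothetical generator: a 2x3x2 contingency table from random factors.
# Explicit levels guarantee that every table has the same dimensions.
make_table <- function(n = 50) {
  table(
    g = factor(sample(c("a", "b"), n, replace = TRUE), levels = c("a", "b")),
    h = factor(sample(c("x", "y", "z"), n, replace = TRUE), levels = c("x", "y", "z")),
    k = factor(sample(c("lo", "hi"), n, replace = TRUE), levels = c("lo", "hi"))
  )
}

tabs <- replicate(5, make_table(), simplify = FALSE)

# Reduce() keeps the class and the dimnames of the tables.
total <- Reduce(`+`, tabs)

# Flattening each table and re-dimensioning keeps the counts
# but loses the class and the dimnames.
rebuilt <- array(rowSums(sapply(tabs, as.vector)), dim = dim(total))

str(total)
str(rebuilt)
```

The counts agree cell for cell; only the Reduce() result still knows what its rows, columns and layers are called.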
Re: [R] Effeciently sum 3d table
On Apr 16, 2012, at 4:32 PM, Bert Gunter wrote:
> David:
>
> Here is a comparison of the gains to be made by vectorization (again, assuming I have interpreted your query correctly)
>
>   ## create a list of arrays
>   z <- lapply(seq_len(10000),function(i)array(runif(24),dim=2:4))
>
>   ## Using an apply type approach
>   system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
>      user  system elapsed
>      0.62    0.00    0.62
>
>   ## vectorizing via rowSums and cbind
>   system.time(ans2 <- array(rowSums(do.call(cbind,z)),dim=2:4))
>      user  system elapsed
>      0.02    0.00    0.02
>
>   identical(ans1,ans2)
>   [1] TRUE

It's also an example of the possibility that different OSes may perform differently. My Mac (an early 2008 model) is nowhere nearly as efficient with the second solution, despite being in the same ballpark with the first:

    system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
       user  system elapsed
      0.841   0.007   0.851
    system.time(ans2 <- array(rowSums(do.call(cbind,z)),dim=2:4))
       user  system elapsed
      0.132   0.003   0.145

And on my system the Reduce strategy is fastest:

    system.time(ans3 <- Reduce("+", z) )
       user  system elapsed
      0.129   0.001   0.134

And ... the Reduce() strategy would preserve other object attributes, something I'm quite sure the re-dimensioning of rowSums(cbind(.)) could not preserve.

    L <- list( table(a, sample(a)), table(a, sample(a)), table(a, sample(a)), table(a, sample(a)), table(a, sample(a)) )
    str(Reduce("+", L) )
     'table' int [1:3, 1:3] 1 1 3 4 0 1 0 4 1
     - attr(*, "dimnames")=List of 2
      ..$ a: chr [1:3] "a" "b" "c"
      ..$  : chr [1:3] "a" "b" "c"
    str( array(rowSums(do.call(cbind,L)),dim=c(3,3)) )
     num [1:3, 1:3] 5 5 5 5 5 5 5 5 5

-- David.

> Cheers,
> Bert
>
> On Mon, Apr 16, 2012 at 1:19 PM, David A Vavra dava...@verizon.net wrote:
>> Thanks Bill,
>>
>> For reasons that aren't important here, I must start from a list. Computing the sum while generating the tables may be a solution but it means doing something in one piece of code that is unrelated to the surrounding code. Bad practice where I'm from. If it's needed it's needed but if I can avoid doing so, I will.
>>
>> I haven't done any timing but because of the extra operations of get and assign, the non-loop implementation will likely suffer. It seems you have shown this to be true.
>>
>> DAV
>>
>> -----Original Message-----
>> From: William Dunlap [mailto:wdun...@tibco.com]
>> Sent: Monday, April 16, 2012 3:26 PM
>> To: David A Vavra; 'Bert Gunter'
>> Cc: r-help@r-project.org
>> Subject: RE: [R] Effeciently sum 3d table
>>
>>> Example in partial code:
>>>
>>>   Env <- CreatEnv() # my own function
>>>   Assign('final',T1-T1,envir=env)
>>>   L<-listOfTables
>>>   lapply(L,function(t) {
>>>     final <- get('final',envir=env) + t
>>>     assign('final',final,envir=env)
>>>     NULL
>>>   })
>>
>> First, finish writing that code so it runs and you can make sure its output is ok:
>>
>>   L <- lapply(1:50000, function(i) array(i:(i+3), c(2,2))) # list of 50,000 2x2 matrices
>>   env <- new.env()
>>   assign('final', L[[1]] - L[[1]], envir=env)
>>   junk <- lapply(L, function(t) {
>>     final <- get('final', envir=env) + t
>>     assign('final', final, envir=env)
>>     NULL
>>   })
>>   get('final', envir=env)
>>   #            [,1]       [,2]
>>   # [1,] 1250025000 1250125000
>>   # [2,] 1250075000 1250175000
>>   sum( (2:50001) ) # should be final[2,1]
>>   # [1] 1250075000
>>
>> You asked for something less clunky. You are fighting the system by using get() and assign(); just use ordinary expression syntax to get and set variables:
>>
>>   final <- L[[1]]
>>   for(i in seq_along(L)[-1]) final <- final + L[[i]]
>>   final
>>   #            [,1]       [,2]
>>   # [1,] 1250025000 1250125000
>>   # [2,] 1250075000 1250175000
>>
>> The former took 0.22 seconds on my machine, the latter 0.06.
>>
>> You don't have to compute the whole list of matrices before doing the sum, just add to the current sum when you have computed one matrix and then forget about it.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David A Vavra
>> Sent: Monday, April 16, 2012 11:35 AM
>> To: 'Bert Gunter'
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Effeciently sum 3d table
>>
>> Thanks Gunter,
>>
>> I mean what I think is the normal definition of 'sum' as in: T1 + T2 + T3 + ... It never occurred to me that there would be a question.
>>
>> I have gotten the impression that a for loop is very inefficient. Whenever I change them to lapply calls there is a noticeable improvement in run time for whatever reason. The problem with lapply here is that I effectively need a global table to hold the final sum. lapply also wants to return a value.
>>
>> You may be correct that in the long run, the loop is the best. There's a lot of extraneous memory wastage holding all of the tables in a list as well as the return 'values'.
>>
>> As an alternate
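Bill Dunlap's suggestion of adding each table to a running sum as it is produced, rather than storing the whole list first, might look like the sketch below. The generator make_array and the count n are assumptions standing in for whatever code actually produces the tables.

```r
# Hypothetical generator standing in for the code that builds each table.
make_array <- function(i) array(i:(i + 23), dim = 2:4)

n <- 1000

# Running sum: only the accumulator and the current table are held in memory.
running <- make_array(1)
for (i in seq_len(n)[-1]) {
  running <- running + make_array(i)
}

# For comparison: build the full list first, then fold it with Reduce().
from_list <- Reduce(`+`, lapply(seq_len(n), make_array))

identical(running, from_list)
```

The running-sum version gives the same array but never holds more than two tables at once, which matters when the tables are large or numerous.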
Re: [R] Effeciently sum 3d table
That _is_ interesting. Reduce() calls the `+` function at the interpreted level, so I would not expect this. Can you check whether most of the time for my vectorized version is spent on the do.call(cbind ...) part, which is what I would guess? Otherwise, this sounds strange, since .rowSums is specifically built for speed -- so it says. I also assume z is as I constructed.

-- Bert

On Mon, Apr 16, 2012 at 3:01 PM, David Winsemius dwinsem...@comcast.net wrote:
> My Mac (an early 2008 model) is nowhere nearly as efficient with the second solution, despite being in the same ballpark with the first:
>
>   system.time(ans1 <- array(do.call(mapply,c(sum,z)),dim=2:4))
>      user  system elapsed
>     0.841   0.007   0.851
>   system.time(ans2 <- array(rowSums(do.call(cbind,z)),dim=2:4))
>      user  system elapsed
>     0.132   0.003   0.145
>
> And on my system the Reduce strategy is fastest:
>
>   system.time(ans3 <- Reduce("+", z) )
>      user  system elapsed
>     0.129   0.001   0.134
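One way to carry out the check Bert asks for is to time the two stages of the vectorized version separately. The list size below is an arbitrary assumption, and timings vary by machine and R version, so no expected figures are shown.

```r
set.seed(42)

# Assumed test data in the same shape as Bert's example (size is arbitrary).
z <- lapply(seq_len(2000), function(i) array(runif(24), dim = 2:4))

# Stage 1: bind the arrays into a 24 x length(z) matrix.
# cbind() treats each 3-d array as a plain vector of its 24 cells.
t_cbind <- system.time(m <- do.call(cbind, z))

# Stage 2: sum across the columns, one total per cell.
t_rowsums <- system.time(s <- rowSums(m))
ans2 <- array(s, dim = 2:4)

# The Reduce() alternative, timed as a whole.
t_reduce <- system.time(ans3 <- Reduce(`+`, z))

cat("cbind:  ", t_cbind[["elapsed"]], "s\n")
cat("rowSums:", t_rowsums[["elapsed"]], "s\n")
cat("Reduce: ", t_reduce[["elapsed"]], "s\n")
```

Comparing with all.equal() rather than identical() is deliberate here, since the two approaches add the floating-point values in a different order.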
Re: [R] Effeciently sum 3d table
OK, then. Thanks. I've read the docs more carefully and Reduce does indeed look like the ticket. For whatever reason, the first time I looked at the documentation my initial reaction was: huh?

DAV

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Monday, April 16, 2012 4:55 PM
To: David A Vavra
Cc: r-help@r-project.org
Subject: Re: [R] Effeciently sum 3d table

On Apr 16, 2012, at 4:04 PM, David A Vavra wrote:
>> even now you _could_ be clearer
>
> I fail to see why it's unclear.
>
>>> I'm after T1 + T2 + T3 + ...
>>
>> Which would be one number ... i.e. the result you originally said you did not want.
>
> I think it's precisely what I want. If I have two 3d tables, T1 and T2, then say either
>
>   1) T1 + T2
>   2) T1 - T2
>
> (1) yields a third table equal to the sum of the individual cells and (2) yields a table full of zeroes. At least it does for matrices. Are you saying the T1+T2+T3+... above is equivalent to: sum(T1)+sum(T2)+sum(T3)+ ... when the table has more than 2d? I tried it out by hand and I get the result I'm after.

For me (with my slightly constricted mindset) it would have been clearer to have started out talking about matrices and arrays. An example would have saved a bunch of time.

> What I want is a general solution. Reduce may be the answer but I find the documentation for it a bit daunting. Not to mention that it is far from obvious that I should have originally thought of using it.

It is a function designed to do exactly what you requested: Reduce uses a binary function to successively combine the elements of a given vector. As it turns out the term 'vector' in this case includes lists of classed and/or dimensioned objects rather than being restricted to atomic vectors.

-- David.

> DAV
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsem...@comcast.net]
> Sent: Monday, April 16, 2012 3:26 PM
> To: David A Vavra
> Cc: 'Petr Savicky'; r-help@r-project.org
> Subject: Re: [R] Effeciently sum 3d table
>
> On Apr 16, 2012, at 2:43 PM, David A Vavra wrote:
>> Thanks Petr,
>>
>> I'm after T1 + T2 + T3 + ...
>
> Which would be one number ... i.e. the result you originally said you did not want.
>
>> and your solution is giving a list of n items each containing sum(T[i]). I guess I should have been clearer in stating what I need.
>
> Or even now you _could_ be clearer. Do you want successive partial sums? That would yield to:
>
>   Reduce("+", listoftables, accumulate=TRUE)
>
>> Cheers,
>> DAV

David Winsemius, MD
West Hartford, CT
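The different readings of "sum" that caused the confusion in this exchange can be put side by side. The sketch below reuses the shape of Petr Savicky's example data and adds a third table as an assumption, to make the partial sums visible.

```r
# Two tables as in Petr's example, plus a hypothetical third one.
listoftables <- list(array(1:8,  dim = c(2, 2, 2)),
                     array(2:9,  dim = c(2, 2, 2)),
                     array(3:10, dim = c(2, 2, 2)))

# One number per table: sum(T1), sum(T2), sum(T3).
per_table <- lapply(listoftables, sum)

# One number overall: sum(T1) + sum(T2) + sum(T3).
grand <- do.call(sum, listoftables)

# One table: T1 + T2 + T3, added cell by cell.
cellwise <- Reduce(`+`, listoftables)

# A list of tables: T1, T1 + T2, T1 + T2 + T3.
partial <- Reduce(`+`, listoftables, accumulate = TRUE)
```

Only cellwise has the same dimensions as the inputs, which is the result the original question was after; the last element of partial is the same array.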
Re: [R] Effeciently sum 3d table
Thanks again, Greg. I must have gotten up on the wrong side of the keyboard this morning and been having a spate of dim insight. What you've said here makes things clearer.

DAV