Re: [R] slow computation of functions over large datasets

2011-08-04 Thread Paul Hiemstra
Hi all, After reading this interesting discussion I delved a bit deeper into the subject matter. The following snippet of code (see the end of my mail) compares three ways of performing this task, using ddply, ave, and one as-yet-unmentioned option: data.table (a package). The piece of code
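The benchmark script Paul mentions is truncated in the archive; a rough sketch of such a three-way comparison, on the thread's simulated order data, might look like the following. The value of n and the exact timing calls are assumptions here; relative timings, not absolute ones, are the point.

```r
# Hypothetical sketch of the ddply / ave / data.table comparison
# described above; n is kept small so ddply finishes quickly.
library(plyr)
library(data.table)

set.seed(1)
n <- 10000
d  <- data.frame(orderID   = sample(floor(n / 5), n, replace = TRUE),
                 itemPrice = rpois(n, 10))
dt <- data.table(d)

# Grouped cumulative sums of itemPrice per orderID, three ways
t.ave   <- system.time(ave(d$itemPrice, d$orderID, FUN = cumsum))["elapsed"]
t.ddply <- system.time(ddply(d, .(orderID), function(x)
             data.frame(orderTotal = cumsum(x$itemPrice))))["elapsed"]
t.dt    <- system.time(dt[, cumsum(itemPrice), by = orderID])["elapsed"]

c(ave = t.ave, ddply = t.ddply, data.table = t.dt)
```

On data of this shape, data.table typically comes out fastest and ddply slowest, which is consistent with the rest of the thread.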

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread ONKELINX, Thierry
Dear Caroline, Here is a faster and more elegant solution. n <- 1 exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) library(plyr) system.time({ ddply(exampledata, .(orderID), function(x){ data.frame(itemPrice =
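Thierry's ddply call is cut off in the archive preview; a self-contained sketch of the approach, assuming the per-order running total that the rest of the thread computes (the `orderTotal` name and the value of n are illustrative, since both are truncated above):

```r
# Sketch of the plyr/ddply approach: split by orderID, compute the
# running item total within each order, recombine into one data.frame.
library(plyr)

set.seed(1)
n <- 10000
exampledata <- data.frame(orderID   = sample(floor(n / 5), n, replace = TRUE),
                          itemPrice = rpois(n, 10))

result <- ddply(exampledata, .(orderID), function(x) {
  data.frame(itemPrice = x$itemPrice, orderTotal = cumsum(x$itemPrice))
})
head(result)
```

Note that ddply returns the rows regrouped by orderID rather than in the original row order, which matters for the later discussion of ave().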

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread David Winsemius
On Aug 3, 2011, at 9:25 AM, Caroline Faisst wrote: Hello there, I’m computing the total value of an order from the price of the order items using a “for” loop and the “ifelse” function. Ouch. Schools really should stop teaching SAS and BASIC as a first language. I do this on a

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread David Winsemius
On Aug 3, 2011, at 9:59 AM, ONKELINX, Thierry wrote: Dear Caroline, Here is a faster and more elegant solution. n <- 1 exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) library(plyr) system.time({ ddply(exampledata,

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread jim holtman
This takes about 2 secs for 1M rows: n <- 1000000 exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) require(data.table) # convert to data.table ed.dt <- data.table(exampledata) system.time(result <- ed.dt[ ,
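The grouped expression inside `ed.dt[...]` is truncated above; a runnable sketch, assuming it computes the same per-order running total as the other answers (the `orderTotal` name is illustrative):

```r
# Sketch of the data.table version: `by` does the split-apply-combine
# internally, which is why it handles 1M rows in a couple of seconds.
library(data.table)

set.seed(1)
n <- 1000000  # "1M rows", as the message states
exampledata <- data.frame(orderID   = sample(floor(n / 5), n, replace = TRUE),
                          itemPrice = rpois(n, 10))
ed.dt <- data.table(exampledata)

system.time(
  result <- ed.dt[, list(itemPrice  = itemPrice,
                         orderTotal = cumsum(itemPrice)), by = orderID]
)
```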

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread David Winsemius
On Aug 3, 2011, at 12:20 PM, jim holtman wrote: This takes about 2 secs for 1M rows: n <- 1000000 exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) require(data.table) # convert to data.table ed.dt <- data.table(exampledata)

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread David Winsemius
On Aug 3, 2011, at 2:01 PM, Ken wrote: Hello, Perhaps transpose the table, attach(as.data.frame(t(data))), and use the colSums() function with orderID as the header. -Ken Hutchison Got any code? The OP offered a reproducible example, after all. -- David. On Aug 3, 2554 BE, at 1:12

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread Ken
Sorry about the lack of code, but using David's example, would tapply(itemPrice, INDEX = orderID, FUN = sum) work? -Ken Hutchison On Aug 3, 2554 BE, at 2:09 PM, David Winsemius dwinsem...@comcast.net wrote: On Aug 3, 2011, at 2:01 PM, Ken wrote: Hello, Perhaps transpose the table
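Ken's tapply() one-liner, made self-contained with the thread's simulated data; note it yields one grand total per order, not the running total the original poster wanted, which is the objection David raises in the next message:

```r
# tapply() splits itemPrice by orderID and sums each group.
set.seed(1)
n <- 10000
exampledata <- data.frame(orderID   = sample(floor(n / 5), n, replace = TRUE),
                          itemPrice = rpois(n, 10))

totals <- with(exampledata, tapply(itemPrice, INDEX = orderID, FUN = sum))
head(totals)  # a named vector, one element per orderID
```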

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread Ken
Hello, Perhaps transpose the table, attach(as.data.frame(t(data))), and use the colSums() function with orderID as the header. -Ken Hutchison On Aug 3, 2554 BE, at 1:12 PM, David Winsemius dwinsem...@comcast.net wrote: On Aug 3, 2011, at 12:20 PM, jim holtman wrote: This takes

Re: [R] slow computation of functions over large datasets

2011-08-03 Thread David Winsemius
On Aug 3, 2011, at 3:05 PM, Ken wrote: Sorry about the lack of code, but using David's example, would tapply(itemPrice, INDEX = orderID, FUN = sum) work? Doesn't do the cumulative sums or the assignment into a column of the same data.frame. That's why I used ave, because it keeps the sequence
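The ave() approach David describes can be sketched as follows: ave() returns a vector of the same length and in the original row order, so the grouped cumulative sum can be assigned straight back into a new column of the same data.frame (the `orderTotal` name is illustrative):

```r
# ave() applies FUN within each group but preserves row order and
# length, unlike tapply (one value per group) or ddply (regrouped rows).
set.seed(1)
n <- 10000
exampledata <- data.frame(orderID   = sample(floor(n / 5), n, replace = TRUE),
                          itemPrice = rpois(n, 10))

exampledata$orderTotal <- ave(exampledata$itemPrice,
                              exampledata$orderID, FUN = cumsum)
head(exampledata)
```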