On Aug 3, 2011, at 2:01 PM, Ken wrote:

Hello,
Perhaps transpose the table with attach(as.data.frame(t(data))) and use the colSums() function with order id as header.
            -Ken Hutchison

 Got any code? The OP offered a reproducible example, after all.
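
For reference, a minimal base-R sketch of a per-order total. It is not necessarily what Ken had in mind: it swaps the transpose/colSums() idea for rowsum(), which sums a vector within groups. The data are the small example from the original post; the names totals and result are illustrative only.

```r
# Small example with the same shape as the original poster's data
exampledata <- data.frame(
  orderID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7)
)

# rowsum() adds up itemPrice within each orderID; the result is a
# one-column matrix whose row names are the group values
totals <- rowsum(exampledata$itemPrice, group = exampledata$orderID)

# Tidy the matrix into a data frame with one row per order
result <- data.frame(
  orderID   = as.numeric(rownames(totals)),
  total     = unname(totals[, 1]),
  row.names = NULL
)
print(result)
```

This gives one row per order rather than the running total the original loop builds.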

--
David.

On Aug 3, 2554 BE, at 1:12 PM, David Winsemius <dwinsem...@comcast.net> wrote:


On Aug 3, 2011, at 12:20 PM, jim holtman wrote:

This takes about 2 secs for 1M rows:

n <- 1000000
exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10))
require(data.table)
# convert to data.table
ed.dt <- data.table(exampledata)
system.time(result <- ed.dt[
                        , list(total = sum(itemPrice))
                        , by = orderID
                        ]
           )
user  system elapsed
1.30    0.05    1.34

Interesting. Impressive. And I noted that the OP wanted what cumsum would provide and for some reason creating that longer result is even faster on my machine than the shorter result using sum.
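
A minimal sketch of the cumsum variant mentioned above, assuming the data.table package is installed. The exact call that was timed is not shown in the thread, so this is only one plausible form; it uses the small example from the original post rather than the 1M-row data.

```r
library(data.table)

# Small stand-in for the original poster's data (values from the thread)
ed.dt <- data.table(
  orderID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7)
)

# Add the running total within each order as a new column;
# := modifies ed.dt in place instead of building a second table
ed.dt[, orderAmount := cumsum(itemPrice), by = orderID]
print(ed.dt)
```

The result keeps one row per item, which matches the orderAmount column the original loop fills in.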

--
David.

str(result)
Classes ‘data.table’ and 'data.frame':  198708 obs. of  2 variables:
$ orderID: int  1 2 3 4 5 6 8 9 10 11 ...
$ total  : num  49 37 72 92 50 76 34 22 65 39 ...
head(result)
  orderID total
[1,]       1    49
[2,]       2    37
[3,]       3    72
[4,]       4    92
[5,]       5    50
[6,]       6    76



On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst
<caroline.fai...@gmail.com> wrote:
Hello there,


I’m computing the total value of an order from the prices of the order items using a “for” loop and the “ifelse” function. I do this on a large data frame (close to 1 million rows). The computation is painfully slow: in
1 min only about 90 rows are calculated.


The computation time taken for a given number of rows increases with the
size of the dataset, see the example with my function below:


# small dataset: function performs well

exampledata <- data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7))

exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"]

system.time(for (i in 2:length(exampledata[,1]))
{exampledata[i,"orderAmount"] <- ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])})


# large dataset: the very same computational task takes much longer

exampledata2 <- data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020))

exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"]

system.time(for (i in 2:9)
{exampledata2[i,"orderAmount"] <- ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])})



Does someone know a way to increase the speed?


Thank you very much!

Caroline
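
A vectorized base-R sketch of the same running total, for comparison with the loop above. It assumes rows of the same order are adjacent, as in the example; under that assumption ave() with cumsum should reproduce the loop's orderAmount column. The loop is likely slow because each row-wise assignment into a data frame can trigger a copy, which gets more expensive as the data frame grows.

```r
# Small example from the post
exampledata <- data.frame(
  orderID   = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7)
)

# ave() applies cumsum to itemPrice separately within each orderID
# and returns a vector aligned with the original rows
exampledata$orderAmount <- ave(exampledata$itemPrice,
                               exampledata$orderID,
                               FUN = cumsum)
print(exampledata)
```

One call replaces the whole loop, so no per-row assignment into the data frame is needed.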



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


David Winsemius, MD
West Hartford, CT
