On Aug 3, 2011, at 9:25 AM, Caroline Faisst wrote:
Hello there,
I'm computing the total value of an order from the prices of the
order items, using a for loop and the ifelse function.
Ouch. Schools really should stop teaching SAS and BASIC as a first
language.
I do this on a large data frame (close to 1 million rows). The
computation is painfully slow: in 1 min only about 90 rows are
calculated.
The time taken for a given number of rows increases with the size of
the dataset; see the example with my function below:
# small dataset: function performs well
exampledata <- data.frame(orderID = c(1,1,1,2,2,3,3,3,4),
                          itemPrice = c(10,17,9,12,25,10,1,9,7))
exampledata[1, "orderAmount"] <- exampledata[1, "itemPrice"]
system.time(for (i in 2:length(exampledata[, 1])) {
  exampledata[i, "orderAmount"] <- ifelse(
    exampledata[i, "orderID"] == exampledata[i-1, "orderID"],
    exampledata[i-1, "orderAmount"] + exampledata[i, "itemPrice"],
    exampledata[i, "itemPrice"])
})
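A minimal sketch of the same loop run on plain vectors, assuming the slowdown comes from calling `[<-.data.frame` once per row: that method does work that grows with the size of the data frame on every call, so the whole loop grows roughly quadratically with the number of rows. Here the running total is built in a preallocated numeric vector, a scalar `if`/`else` replaces the vectorised `ifelse()`, and the column is written back once. The helper name `order_amount_loop` is hypothetical.

```r
# Hypothetical helper: running total of itemPrice within consecutive runs of
# the same orderID, computed on plain vectors instead of data-frame cells.
order_amount_loop <- function(orderID, itemPrice) {
  n <- length(itemPrice)
  amount <- numeric(n)            # preallocate the result once
  if (n == 0) return(amount)
  amount[1] <- itemPrice[1]
  for (i in seq_len(n)[-1]) {     # 2..n, and empty when n == 1
    # scalar if/else instead of the vectorised ifelse()
    if (orderID[i] == orderID[i - 1]) {
      amount[i] <- amount[i - 1] + itemPrice[i]
    } else {
      amount[i] <- itemPrice[i]
    }
  }
  amount
}

exampledata <- data.frame(orderID = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
                          itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7))
# a single write back into the data frame
exampledata$orderAmount <- with(exampledata, order_amount_loop(orderID, itemPrice))
exampledata$orderAmount
# [1] 10 27 36 12 37 10 11 20  7
```

The loop is still interpreted R, but each iteration now costs the same regardless of how many rows the data frame has.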
Try instead using 'ave' to calculate a cumulative 'sum' within
"orderID":

exampledata$orderAmt <- with(exampledata, ave(itemPrice, orderID, FUN=cumsum))
I assure you this will be more reproducible, faster, and easier to
understand.
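What `ave(itemPrice, orderID, FUN = cumsum)` does can be sketched with `split()` and its replacement form `split<-`: the prices are split into one vector per orderID, `cumsum` is applied to each piece, and the results are written back into their original row positions. This is an illustration of the split-apply-recombine idea, not a copy of `ave`'s source; the name `group_cumsum` is hypothetical. Note that rows are grouped by orderID value wherever they occur, so an order's rows need not be adjacent.

```r
# Sketch of the split-apply-recombine idea behind ave(x, g, FUN = cumsum).
group_cumsum <- function(x, g) {
  pieces <- split(x, g)                  # one vector per group, row order kept
  split(x, g) <- lapply(pieces, cumsum)  # write results back in place
  x
}

exampledata <- data.frame(orderID = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
                          itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7))
with(exampledata, group_cumsum(itemPrice, orderID))
# [1] 10 27 36 12 37 10 11 20  7
with(exampledata, ave(itemPrice, orderID, FUN = cumsum))
# [1] 10 27 36 12 37 10 11 20  7
```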
# large dataset:
A "medium" dataset, really. Barely nudges the RAM dial on my machine.
# the very same computational task takes much longer
exampledata2 <- data.frame(orderID = c(1,1,1,2,2,3,3,3,4,5:2000000),
                           itemPrice = c(10,17,9,12,25,10,1,9,7,25:2000020))
exampledata2[1, "orderAmount"] <- exampledata2[1, "itemPrice"]
system.time(for (i in 2:9) {
  exampledata2[i, "orderAmount"] <- ifelse(
    exampledata2[i, "orderID"] == exampledata2[i-1, "orderID"],
    exampledata2[i-1, "orderAmount"] + exampledata2[i, "itemPrice"],
    exampledata2[i, "itemPrice"])
})
> system.time(exampledata2$orderAmt <- with(exampledata2,
+     ave(itemPrice, orderID, FUN = cumsum)))
   user  system elapsed
 35.106   0.811  35.822
On a three-year-old machine. Not as fast as I expected, but not long
enough to require refilling the coffee cup either.
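If, as the original loop assumes, all rows of an order are adjacent, the per-order running total can also be computed with one global `cumsum` and a per-run offset, which avoids splitting the data into about two million tiny groups. A minimal sketch under that assumption (the name `run_cumsum` is hypothetical, and no timing is claimed here). It reproduces the loop's semantics rather than `ave`'s: an orderID that reappears after a different one starts a new total. With non-integer prices, subtracting large cumulative sums can introduce small rounding errors.

```r
# Running total of itemPrice within consecutive runs of equal orderID,
# using one global cumsum and subtracting the total accumulated before
# each run starts. Assumes rows of the same order are adjacent.
run_cumsum <- function(orderID, itemPrice) {
  n <- length(itemPrice)
  if (n == 0) return(numeric(0))
  cs <- cumsum(itemPrice)
  # TRUE where a new order starts (first row, or ID differs from the previous row)
  start <- c(TRUE, orderID[-1] != orderID[-n])
  run <- cumsum(start)               # run index for every row
  before <- (cs - itemPrice)[start]  # total accumulated before each run
  cs - before[run]
}

exampledata2 <- data.frame(orderID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5:2000000),
                           itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7, 25:2000020))
system.time(exampledata2$orderAmt <- with(exampledata2, run_cumsum(orderID, itemPrice)))
head(exampledata2$orderAmt, 9)
# [1] 10 27 36 12 37 10 11 20  7
```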
--
David.
Does someone know a way to increase the speed?
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.