Your two examples differ in two ways (indexing by row name versus by integer position, and dat[key, ] versus sapply()), so their effects are confounded.
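To separate the two differences between the examples, one could time all four combinations of lookup style (row name vs. integer position) and extraction method (dat[ , ] vs. sapply()). The sketch below assumes a smaller data frame built the same way as in the original post; the helper names make_dat and time_variants and the sizes are hypothetical, and absolute timings will vary with machine and R version.

```r
# Build a data frame the same way as the original post; n_rows is an
# assumption (the post used 300000 rows).
make_dat <- function(n_rows) {
  mat <- matrix(rep(paste(letters, collapse = ""), 5 * n_rows), ncol = 5)
  as.data.frame(mat)
}

# Hypothetical helper: elapsed seconds for extracting the first n_iter rows
# under each combination of {row name, integer position} x {dat[ , ], sapply()}.
time_variants <- function(dat, n_iter = 100) {
  keys <- row.names(dat)[seq_len(n_iter)]
  # system.time() forces the lazily passed loop expression while timing it
  elapsed <- function(expr) system.time(expr)[["elapsed"]]
  c(
    name_bracket = elapsed(for (key in keys) row <- dat[key, ]),
    int_bracket  = elapsed(for (i in seq_len(n_iter)) row <- dat[i, ]),
    name_sapply  = elapsed(for (key in keys) {
      i <- match(key, row.names(dat))  # sapply() needs a position, so look it up
      row <- sapply(dat, function(col) col[i])
    }),
    int_sapply   = elapsed(for (i in seq_len(n_iter))
      row <- sapply(dat, function(col) col[i]))
  )
}

dat <- make_dat(30000)
print(time_variants(dat))
```

Comparing name_bracket with int_bracket isolates the cost of the row-name lookup; comparing int_bracket with int_sapply isolates the overhead of dat[i, ] itself.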
What are your results for:

system.time(for (i in 1:100) { row <- dat[i, ] })

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Herve Pages
> Sent: Friday, March 02, 2007 11:40 AM
> To: r-devel@r-project.org
> Subject: [Rd] extracting rows from a data frame by looping
> over the row names: performance issues
>
> Hi,
>
> I have a big data frame:
>
> > mat <- matrix(rep(paste(letters, collapse=""), 5*300000), ncol=5)
> > dat <- as.data.frame(mat)
>
> and I need to do some computation on each row. Currently I'm
> doing this:
>
> > for (key in row.names(dat)) { row <- dat[key, ]; ... do some computation on row... }
>
> which could probably be considered a very natural (and R'ish)
> way of doing it (but maybe I'm wrong and the real idiom for
> doing this is something different).
>
> The problem with this "idiomatic form" is that it is _very_
> slow. The loop itself plus the simple extraction of the rows (no
> computation on the rows) takes 10 hours on a powerful server
> (quad-core Linux with 8 GB of RAM)!
>
> Looping over the first 100 rows takes 12 seconds:
>
> > system.time(for (key in row.names(dat)[1:100]) { row <- dat[key, ] })
>    user  system elapsed
>  12.637   0.120  12.756
>
> But if, instead of the above, I do this:
>
> > for (i in 1:nrow(dat)) { row <- sapply(dat, function(col) col[i]) }
>
> then it's 20 times faster!!
>
> > system.time(for (i in 1:100) { row <- sapply(dat, function(col) col[i]) })
>    user  system elapsed
>   0.576   0.096   0.673
>
> I hope you will agree that this second form is much less natural.
>
> So I was wondering: why is the "idiomatic form" so slow?
> Shouldn't the idiomatic form be not only elegant and easy to
> read, but also efficient?
>
> Thanks,
> H.
>
> > sessionInfo()
> R version 2.5.0 Under development (unstable) (2007-01-05 r40386)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
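If the loop really has to be driven by row names, one possible workaround is to translate all the names into integer positions once, with a single vectorized match() call, and then index by position inside the loop. This is a sketch under the assumption that the per-iteration name lookup accounts for much of the cost, which is exactly what timing dat[i, ] would confirm or rule out; for_each_row_by_key is a hypothetical helper, not an existing R function.

```r
# Hypothetical helper (not part of base R): apply `fun` to the rows of `dat`
# named in `keys`, resolving all names to integer positions with a single
# vectorized match() call so the loop itself only indexes by position.
for_each_row_by_key <- function(dat, keys, fun) {
  pos <- match(keys, row.names(dat))
  if (any(is.na(pos))) stop("some keys are not row names of 'dat'")
  results <- vector("list", length(pos))
  for (j in seq_along(pos)) {
    # integer row index, no name lookup; drop = FALSE keeps a data frame
    # even when `dat` has a single column; list() keeps NULL results
    results[j] <- list(fun(dat[pos[j], , drop = FALSE]))
  }
  results
}

# Example on a smaller copy of the data from the original post.
mat <- matrix(rep(paste(letters, collapse = ""), 5 * 1000), ncol = 5)
dat <- as.data.frame(mat)
out <- for_each_row_by_key(dat, row.names(dat)[1:100],
                           function(row) nchar(as.character(row[[1]])))
```

The design choice is simply to move the name-to-position translation out of the loop; whether that is enough depends on how expensive dat[i, ] turns out to be on its own.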