Hi Seth,

Quoting Seth Falcon <[EMAIL PROTECTED]>:
> Herve Pages <[EMAIL PROTECTED]> writes:
> > So apparently here extracting with dat[i, ] is 300 times faster than
> > extracting with dat[key, ] !
> >
> >> system.time(for (i in 1:100) dat["1", ])
> >    user  system elapsed
> >  12.680   0.396  13.075
> >
> >> system.time(for (i in 1:100) dat[1, ])
> >    user  system elapsed
> >   0.060   0.076   0.137
> >
> > Good to know!
>
> I think what you are seeing here has to do with the space efficient
> storage of row.names of a data.frame.  The example data you are
> working with has no specified row names and so they get stored in a
> compact fashion:
>
>     mat <- matrix(rep(paste(letters, collapse=""), 5*300000), ncol=5)
>     dat <- as.data.frame(mat)
>
>     > typeof(attr(dat, "row.names"))
>     [1] "integer"
>
> In the call to [.data.frame when i is character, the appropriate index
> is found using pmatch and this requires that the row names be
> converted to character.  So in a loop, you get to convert the integer
> vector to character vector at each iteration.

Maybe this could be avoided. Why do you need to call pmatch when the
row names are integer?

In [.data.frame if you replace this:

    ...
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    ...

by this:

    ...
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        if (typeof(rows) == "integer")
            i <- as.integer(i)
        else
            i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    ...

then you get a huge boost:

  - with current [.data.frame

    > system.time(for (i in 1:100) dat["1", ])
       user  system elapsed
     34.994   1.084  37.915

  - with "patched" [.data.frame

    > system.time(for (i in 1:100) dat["1", ])
       user  system elapsed
      0.264   0.068   0.364

but maybe I'm missing something...

Cheers,
H.
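
For anyone who wants to poke at this without building the 300000-row example, here is a minimal sketch of the two steps discussed above: checking how the row names are stored, and doing the character lookup by hand. The small 4 x 5 data frame and the variable names rows and idx are assumptions made for illustration; this only approximates what [.data.frame does internally and is not a copy of its code.

```r
# Small stand-in for the large example (the size is an assumption).
dat <- as.data.frame(matrix(seq_len(20), ncol = 5))

# Automatic row names are stored as integers, not as strings.
storage <- typeof(attr(dat, "row.names"))
print(storage)

# .row_names_info() returns a negative count when row names are automatic.
is_automatic <- .row_names_info(dat) < 0
print(is_automatic)

# A character index has to be matched against the row names as strings,
# so the integer row names are converted to character first.
rows <- as.character(attr(dat, "row.names"))
idx <- pmatch("2", rows, duplicates.ok = TRUE)

# idx is the integer position that dat["2", ] ends up selecting.
print(dat[idx, ])
```

On the full-size data frame the as.character() conversion is the step that gets repeated on every pass through the loop.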
>
> If you assign character row names, things will be a bit faster:
>
>     # before
>     system.time(for (i in 1:25) dat["2", ])
>        user  system elapsed
>       9.337   0.404  10.731
>
>     # this looks funny, but has the desired result
>     rownames(dat) <- rownames(dat)
>     typeof(attr(dat, "row.names"))
>
>     # after
>     system.time(for (i in 1:25) dat["2", ])
>        user  system elapsed
>       0.343   0.226   0.608
>
> And you probably would have seen this if you had looked at the
> profiling data:
>
>     Rprof()
>     for (i in 1:25) dat["2", ]
>     Rprof(NULL)
>     summaryRprof()
>
>
> + seth
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel