>>>>> "TH" == Tim Hesterberg <[EMAIL PROTECTED]>
>>>>>     on Tue, 1 Jul 2008 15:23:53 -0700 writes:

    TH> There is a bug in the standard version of [.data.frame;
    TH> it mixes up handling duplicates and NAs when subscripting rows.

    TH> x <- data.frame(x=1:3, y=2:4, row.names=c("a","b","NA"))
    TH> y <- x[c(2:3, NA),]
    TH> y

    TH> It creates a data frame with duplicate rows, but won't print.

and that's a bug, indeed ("introduced" in R version 2.5.0, when the
[.data.frame code was much optimized for speed, with quite some care),
and I have committed a fix (and a regression test) to both
R-devel and R-patched.

Thanks a lot for the bug report, Tim!

Now about your newly proposed code: I'm sorry to say that it
looks so different from the source code in
  https://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R
that I don't think we would easily accept it as a substitute.

Could you try to provide a minimal patch against the source code,
and also a self-contained example that exhibits the speed gain you
are aiming for?

Best regards,
Martin Maechler, ETH Zurich

[.........................]

    TH> On Tue, Jul 1, 2008 at 11:20 AM, Tim Hesterberg <[EMAIL PROTECTED]>
    TH> wrote:

    >> Below is a version of [.data.frame that is faster
    >> for subscripting rows of large data frames; it avoids calling
    >> duplicated(rows)
    >> if there is no need to check for duplicate row names, when:
    >>     i is logical
    >>     attr(x, "dup.row.names") is not NULL (S+ compatibility)
    >>     i is numeric and negative
    >>     i is strictly increasing
    >>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel