It should be possible to run unique()/duplicated() column by column, incrementally updating the set of unique/duplicated rows as each column is processed. This would avoid any coercion to character. The benefit should be even greater for data frames.
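A minimal sketch of the column-by-column idea for a numeric matrix, offered as an illustration rather than a proposed patch. The function name duplicated_by_column() is hypothetical and not part of base R. It assumes that match() and unique() on a plain numeric vector compare values exactly, as the unique(x) output quoted below suggests, and that the number of rows is small enough for the combined key to be represented exactly as a double.

```r
# Hypothetical helper (not in base R): flag duplicated rows of a matrix
# by refining row groups one column at a time, with no as.character() step.
duplicated_by_column <- function(a) {
  n <- nrow(a)
  grp <- rep.int(1L, n)            # every row starts in the same group
  for (j in seq_len(ncol(a))) {
    v <- a[, j]
    u <- unique(v)                 # distinct values of this column
    col_id <- match(v, u)          # per-row index of its value in this column
    # Combine the current group id with the column id into a single key.
    # Computed in double precision; assumes n^2 stays below 2^53.
    key <- (grp - 1) * length(u) + col_id
    grp <- match(key, unique(key)) # renumber the groups as 1, 2, ...
  }
  duplicated(grp)                  # later rows of each group are duplicates
}

# The example matrix from the quoted message
x <- 1 + c(1.1, 1.3, 1.7, 1.9) * 1e-14
a <- cbind(rep(x, each = 2), 2)
rownames(a) <- seq_len(nrow(a))
a[!duplicated_by_column(a), , drop = FALSE]
```

On the example matrix from the quoted message this should keep rows 1, 3, 5 and 7, the same rows as unique(a - 1). Renumbering the groups after every column keeps the keys small, so the cost is one match() per column instead of one paste() per row.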
My $.02

/Henrik

On Thu, Mar 10, 2011 at 12:29 AM, Petr Savicky <savi...@cs.cas.cz> wrote:
> On Wed, Mar 09, 2011 at 02:11:49PM -0500, Simon Urbanek wrote:
>> match() is a red herring here -- it is really a very specific thing that has
>> to do with the fact that you're running unique() on a matrix. Also it's much
>> easier to reproduce:
>>
>> > x=c(1,1+0.2e-15)
>> > x
>> [1] 1 1
>> > sprintf("%a",x)
>> [1] "0x1p+0"               "0x1.0000000000001p+0"
>> > unique(x)
>> [1] 1 1
>> > sprintf("%a",unique(x))
>> [1] "0x1p+0"               "0x1.0000000000001p+0"
>> > unique(matrix(x,2))
>>      [,1]
>> [1,]    1
>>
>> and this comes from the fact that unique.matrix uses string representation
>> since it has to take into account all values of a row/column so it pastes
>> all values into one string, but for the two numbers that is the same:
>> > as.character(x)
>> [1] "1" "1"
>
> I understand the use of match() in the original message by Terry Therneau
> as an example of a situation, where the behavior of unique.matrix() becomes
> visible even without looking at the last bits of the numbers.
>
> Let me suggest to consider the following example.
>
>  x <- 1 + c(1.1, 1.3, 1.7, 1.9)*1e-14
>  a <- cbind(rep(x, each=2), 2)
>  rownames(a) <- 1:nrow(a)
>
> The correct set of rows may be obtained using
>
>  unique(a - 1)
>
>            [,1] [,2]
>  1 1.110223e-14    1
>  3 1.310063e-14    1
>  5 1.709743e-14    1
>  7 1.909584e-14    1
>
> However, due to the use of paste(), which uses as.character(), in
> unique.matrix(), we also have
>
>  unique(a)
>
>    [,1] [,2]
>  1    1    2
>  5    1    2
>
> Let me suggest to consider a transformation of the numeric columns
> by rank() before the use of paste(). For example
>
>  unique.mat <- function(a)
>  {
>      temp <- apply(a, 2, rank, ties.method="max")
>      temp <- apply(temp, 1, function(x) paste(x, collapse = "\r"))
>      a[!duplicated(temp), , drop=FALSE]
>  }
>
>  unique.mat(a)
>
>    [,1] [,2]
>  1    1    2
>  3    1    2
>  5    1    2
>  7    1    2
>
> Petr Savicky.
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel