Still, from a user's perspective this behavior is somewhat irritating. Wouldn't it be better to rewrite unique.matrix to use formatC or sprintf instead of as.character, on which paste in line 9 implicitly relies, at least in R version 2.12.2 (2011-02-25)?
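To make the as.character point concrete, here is a minimal sketch. It assumes row keys are built with sprintf("%a") (the hexadecimal format Simon uses in his message quoted below) rather than through paste's implicit as.character. The helper name exact_keys and the test values are made up for illustration; this is not how unique.matrix is actually written.

```r
# Two doubles that differ only in the last bit (illustrative values).
x <- c(1, 1 + .Machine$double.eps)
m <- matrix(x, nrow = 2)

# Hypothetical helper, not part of base R: builds one string key per row
# (MARGIN = 1) or column (MARGIN = 2) from the exact hexadecimal
# representation of each value.
exact_keys <- function(m, MARGIN = 1) {
  apply(m, MARGIN, function(v) paste(sprintf("%a", v), collapse = "\r"))
}

# Keys built the way the email describes: paste() converts via as.character(),
# which keeps only about 15 significant digits, so the rows can collide.
keys_paste <- apply(m, 1, function(v) paste(v, collapse = "\r"))

# Keys built from the exact representation keep the two rows apart.
keys_exact <- exact_keys(m, 1)

duplicated(keys_paste)
duplicated(keys_exact)  # FALSE FALSE
```

Whether the longer keys cost anything noticeable on large matrices would need to be measured.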
For example, use

  temp <- apply(x, MARGIN, formatC, digits=324, format="f")

instead of

  temp <- apply(x, MARGIN, function(x) paste(x, collapse = "\r"))

Don't know whether this affects performance, though.

Sorry to chime in late.

Cheers,
Jochen

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

On Mar 9, 2011, at 20:11, Simon Urbanek wrote:

> match() is a red herring here -- it is really a very specific thing that has
> to do with the fact that you're running unique() on a matrix. Also it's much
> easier to reproduce:
>
>> x=c(1,1+0.2e-15)
>> x
> [1] 1 1
>> sprintf("%a",x)
> [1] "0x1p+0" "0x1.0000000000001p+0"
>> unique(x)
> [1] 1 1
>> sprintf("%a",unique(x))
> [1] "0x1p+0" "0x1.0000000000001p+0"
>> unique(matrix(x,2))
>      [,1]
> [1,]    1
>
> and this comes from the fact that unique.matrix uses string representation
> since it has to take into account all values of a row/column so it pastes all
> values into one string, but for the two numbers that is the same:
>> as.character(x)
> [1] "1" "1"
>
> Cheers,
> Simon
>
>
> On Mar 9, 2011, at 9:48 AM, Terry Therneau wrote:
>
>> I stumbled onto this working on an update to coxph. The last 6 lines
>> below are the question, the rest create a test data set.
>>
>> tmt585% R
>> R version 2.12.2 (2011-02-25)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> # Lines of code from survival/tests/singtest.R
>>> library(survival)
>> Loading required package: splines
>>> test1 <- data.frame(time=  c(4, 3,1,1,2,2,3),
>> +                     status=c(1,NA,1,0,1,1,0),
>> +                     x=     c(0, 2,1,1,1,0,0))
>>>
>>> temp <- rep(0:3, rep(7,4))
>>>
>>> stest <- data.frame(start  = 10*temp,
>> +                     stop   = 10*temp + test1$time,
>> +                     status = rep(test1$status,4),
>> +                     x      = c(test1$x+ 1:7, rep(test1$x,3)),
>> +                     epoch  = rep(1:4, rep(7,4)))
>>>
>>> fit1 <- coxph(Surv(start, stop, status) ~ x * factor(epoch), stest)
>>
>> ## New lines
>>> temp1 <- fit1$linear.predictor
>>> temp2 <- as.matrix(temp1)
>>> match(temp1, unique(temp1))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 8 8 8 6 6 6 9 9 9 6 6
>>> match(temp2, unique(temp2))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 NA NA NA 6 6 6 8 8 8
>> 6 6
>>
>> -----------------------
>>
>> I've solved it for my code by not calling match on a 1 column vector.
>> In general, however, should I be using some other paradym for this "map
>> to unique" operation? For example match(as.character(x),
>> unique(as.character(x)) ?
>>
>> Terry T
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel