>>>>> "PS" == Petr Savicky <savi...@cs.cas.cz> >>>>> on Sat, 23 May 2009 09:44:54 +0200 writes:
PS> Function factor() in the current development version PS> (2009-05-22) guarantees that levels are different PS> character strings. However, they may represent the same PS> decimal number. The following example is derived from a PS> posting by Stavros Macrakis in thread "Match .3 in a PS> sequence" in March PS> nums <- 0.3 + 2e-16 * c(-2,-1,1,2) f <- factor(nums) PS> levels(f) # [1] "0.300000000000000" "0.3" PS> The levels differ in trailing zeros, but represent the PS> same decimal number. Besides that this is not really PS> meaningful, it may cause a problem, when using PS> as.numeric(levels(f)). PS> In the above case, as.numeric() works fine and maps the PS> two levels to the same number. However, there are cases, PS> where the difference in trailing zeros implies different PS> values in as.numeric(levels(f)) and these values may PS> even form a decreasing sequence although levels were PS> constructed from an increasing sequence of numbers. PS> Examples are platform dependent, but may be found by the PS> following code. Tested on Intel under Linux (both with PS> and without SSE) and also under Windows with an older PS> version of R. PS> for (i in 1:100000) { PS> x <- 10^(floor(runif(1, 61, 63)) + runif(1)/2) PS> x <- as.numeric(sprintf("%.14g", x)) PS> eps <- 2^(floor(log2(x)) - 52) PS> k <- round(x * c(5e-16, 1e-15) / eps) PS> if (x > 1e62) { k <- rev( - k) } PS> y <- x + k[1]:k[2] * eps PS> ind <- which(diff(as.numeric(as.character(y))) < 0) PS> for (j in ind) { PS> u1 <- y[c(j, j+1)] PS> u2 <- factor(u1) PS> print(levels(u2)) PS> print(diff(as.numeric(levels(u2)))) PS> aux <- readline("next") PS> } PS> } PS> An example of the output is PS> [1] "1.2296427920313e+61" "1.22964279203130e+61" PS> [1] -1.427248e+45 PS> next PS> [1] "1.82328862326830e+62" "1.8232886232683e+62" PS> [1] -2.283596e+46 PS> next PS> The negative number in diff(as.numeric(levels(u2))) demonstrates cases, PS> when as.numeric(levels(u2)) is decreasing. We can also see that the reason PS> is that the two strings in levels(u2) differ in the trailing zeros. PS> I did quite intensive search for such examples for all possible exponents PS> (not only 61 and 62 and a week of CPU on three processors) and all the obtained PS> examples were caused by a difference in trailing zeros. So, i believe that PS> removing trailing zeros from the output of as.character(x) solves the problem PS> with the reversed order in as.numeric(levels(factor(x))) entirely. PS> A patch against R-devel_2009-05-22, which eliminates trailing zeros PS> from as.character(x), but makes no other changes to as.character(x), PS> is in an attachment. Using the patch, we obtain a better result also PS> in the following. PS> nums <- 0.3 + 2e-16 * c(-2,-1,1,2) PS> factor(nums) PS> # [1] 0.3 0.3 0.3 0.3 PS> # Levels: 0.3 yes, of course. PS> Petr. [.....] I have very slightly modified the changes (to get rid of -Wall warnings) and also exported the function as Rf_dropTrailing0(), and tested the result with 'make check-all' . As the change seems reasonable and consequent, and as it seems not to produce any problems in our tests, I'm hereby proposing to commit it (my version of it), [to R-devel only] within a few days, unless someone speaks up. Martin Maechler, ETH Zurich PS> --- R-devel/src/main/coerce.c 2009-04-17 17:53:35.000000000 +0200 PS> +++ R-devel-elim-trailing/src/main/coerce.c 2009-05-23 08:39:03.914774176 +0200 PS> @@ -294,12 +294,33 @@ PS> else return mkChar(EncodeInteger(x, w)); PS> } PS> +const char *elim_trailing(const char *s, char cdec) PS> +{ PS> + const char *p; PS> + char *replace; PS> + for (p = s; *p; p++) { PS> + if (*p == cdec) { PS> + replace = (char *) p++; PS> + while ('0' <= *p & *p <= '9') { PS> + if (*(p++) != '0') { PS> + replace = (char *) p; PS> + } PS> + } PS> + while (*(replace++) = *(p++)) { PS> + ; PS> + } PS> + break; PS> + } PS> + } PS> + return s; PS> +} PS> + PS> SEXP attribute_hidden StringFromReal(double x, int *warn) PS> { PS> int w, d, e; PS> formatReal(&x, 1, &w, &d, &e, 0); PS> if (ISNA(x)) return NA_STRING; PS> - else return mkChar(EncodeReal(x, w, d, e, OutDec)); PS> + else return mkChar(elim_trailing(EncodeReal(x, w, d, e, OutDec), OutDec)); PS> } PS> SEXP attribute_hidden StringFromComplex(Rcomplex x, int *warn) ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel