Hi, folks,

Underscores are, in fact, ignored in some collation orders, including (if I recall correctly) en_CA.UTF-8. It's caused me a bit of confusion now and then. No idea about "English_United States.1252", but since Joris' example does not agree with Hervé's, it seems most likely that it does not ignore them.
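
For what it's worth, here is a rough sketch of that hypothesis. It assumes the collation simply drops '_' and compares what is left byte by byte; real locale collation is more involved than that, so treat it as an illustration only. The helper name rank_ignoring_underscores is made up for this example, and sort(..., method = "radix") needs a newer R than the 2.14.0 used in the thread (3.3.0 or later).

```r
# Hypothetical helper (not part of base R): rank strings as if '_' were
# ignored, comparing the remaining characters in C-locale (byte) order.
rank_ignoring_underscores <- function(x) {
  key <- gsub("_", "", x, fixed = TRUE)     # drop every underscore
  match(key, sort(key, method = "radix"))   # radix sort uses C-locale order
}

x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

rank_ignoring_underscores(x)   # keys are "1",  "19",  "29"
rank_ignoring_underscores(xa)  # keys are "1a", "19a", "29a"
```

Under that assumption the first call gives 1 2 3 and the second gives 2 1 3, because "19a" sorts before "1a" once the underscores are gone ('9' comes before 'a'). That is the same pattern Hervé saw.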

Cheers,
 - Gord Brown

On 2011/12/07 14:48, "Joris Meys" <jorism...@gmail.com> wrote:

> @Barry : regardless of whether '_' comes before or after '1' , it
> should be consistent. Adding an 'a' shouldn't shift '_' from before
> '1' to between '1' and '2', that's clearly an error. The help files
> are not stating anything about that. The only thing I can imagine, is
> that '_' gets ignored (in that case 19a would rank before 1a).
>
> This said, I can't reproduce.
>
>> x <- c("_1_", "1_9", "2_9")
>> xa <- paste(x,'a',sep='')
>> rank(x)
> [1] 1 2 3
>> rank(xa)
> [1] 1 2 3
>
>> sessionInfo()
> R version 2.14.0 Patched (2006-00-00 r00000)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils
> methods   base
>
> other attached packages:
> [1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3
> survival_2.36-9
>
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1  grid_2.14.0     lattice_0.19-33 svMisc_0.9-63
> tools_2.14.0
>
>
> 2011/12/7 Hervé Pagès <hpa...@fhcrc.org>:
>> Hi,
>>
>> This looks OK:
>>
>>> x <- c("_1_", "1_9", "2_9")
>>> rank(x)
>> [1] 1 2 3
>>
>> But this does not:
>>
>>> xa <- paste(x, "a", sep="")
>>> xa
>> [1] "_1_a" "1_9a" "2_9a"
>>> rank(xa)
>> [1] 2 1 3
>>
>> Cheers,
>> H.
>>
>>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel