On Dec 7, 2011, at 15:48, Joris Meys wrote:

> @Barry: regardless of whether '_' comes before or after '1', it
> should be consistent. Adding an 'a' shouldn't shift '_' from before
> '1' to between '1' and '2', that's clearly an error. The help files
> are not stating anything about that. The only thing I can imagine is
> that '_' gets ignored (in that case 19a would rank before 1a).

As far as I remember, that is exactly the case. In some locales, and not even consistently across different OS versions of the "same" locale, there are characters that are ignored for collation. With that in mind, what we see is really not any stranger than "a" < "ab" but "ac" > "abc". R just uses what the OS supplies, so if you want to use words like "inconsistent" or "error", please direct them at those who define the locales. (And be prepared to realize that you may have kicked a hornet's nest...)

> This said, I can't reproduce.
>
> > x <- c("_1_", "1_9", "2_9")
> > xa <- paste(x,'a',sep='')
> > rank(x)
> [1] 1 2 3
> > rank(xa)
> [1] 1 2 3
>
> > sessionInfo()
> R version 2.14.0 Patched (2006-00-00 r00000)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
>
> other attached packages:
> [1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.36-9
>
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1  grid_2.14.0     lattice_0.19-33 svMisc_0.9-63   tools_2.14.0
>
>
> 2011/12/7 Hervé Pagès <hpa...@fhcrc.org>:
>> Hi,
>>
>> This looks OK:
>>
>> > x <- c("_1_", "1_9", "2_9")
>> > rank(x)
>> [1] 1 2 3
>>
>> But this does not:
>>
>> > xa <- paste(x, "a", sep="")
>> > xa
>> [1] "_1_a" "1_9a" "2_9a"
>> > rank(xa)
>> [1] 2 1 3
>>
>> Cheers,
>> H.
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
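
The "ignored for collation" explanation in the reply above can be illustrated without depending on any particular OS locale. The following is only a sketch: rank_ignoring() is a hypothetical helper, not part of R, and treating '_' as carrying no weight at all is an assumption that merely approximates what a locale such as en_CA.UTF-8 may do. It strips '_' and then compares the remaining keys in the C locale via method = "radix".

```r
x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

# Hypothetical helper (not part of R): rank strings as if the character
# in `ignore` had no collation weight. method = "radix" orders character
# vectors in the C locale, so the result does not depend on LC_COLLATE.
# order(order(.)) yields ranks here because the keys contain no ties.
rank_ignoring <- function(s, ignore = "_") {
  key <- gsub(ignore, "", s, fixed = TRUE)
  order(order(key, method = "radix"))
}

rank_ignoring(x)    # keys "1",  "19",  "29"   -> 1 2 3
rank_ignoring(xa)   # keys "1a", "19a", "29a"  -> 2 1 3
```

With '_' removed, "19a" sorts before "1a" because '9' precedes 'a', which reproduces the 2 1 3 reported on Linux. If locale-independent ordering is what is wanted, sorting with method = "radix" or setting LC_COLLATE to "C" with Sys.setlocale() should avoid the OS collation rules altogether.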