On Dec 7, 2011, at 15:48 , Joris Meys wrote:

> @Barry : regardless of whether '_' comes before or after '1' , it
> should be consistent. Adding an 'a' shouldn't shift '_' from before
> '1' to between '1' and '2', that's clearly an error. The help files
> are not stating anything about that. The only thing I can imagine, is
> that '_' gets ignored (in that case 19a would rank before 1a).

As far as I remember, that is exactly the case. In some locales, and not even 
consistently across different OS versions of the "same" locale, there are 
characters that are ignored for collation. With that in mind, what we see is 
really not any stranger than "a" < "ab" but "ac" > "abc". 

R just uses what the OS supplies, so if you want to use words like 
"inconsistent" or "error", please direct them at those who define the locales. 
(And be prepared to realize that you may have kicked a hornet's nest...)

> 
> This said, I can't reproduce.
> 
>> x <- c("_1_", "1_9", "2_9")
>> xa <- paste(x,'a',sep='')
>> rank(x)
> [1] 1 2 3
>> rank(xa)
> [1] 1 2 3
> 
>> sessionInfo()
> R version 2.14.0 Patched (2006-00-00 r00000)
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United
> States.1252
> 
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils
>   methods   base
> 
> other attached packages:
> [1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3
> survival_2.36-9
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1  grid_2.14.0     lattice_0.19-33 svMisc_0.9-63
> tools_2.14.0
> 
> 
> 2011/12/7 Hervé Pagès <hpa...@fhcrc.org>:
>> Hi,
>> 
>> This looks OK:
>> 
>>> x <- c("_1_", "1_9", "2_9")
>>> rank(x)
>> [1] 1 2 3
>> 
>> But this does not:
>> 
>>> xa <- paste(x, "a", sep="")
>>> xa
>> [1] "_1_a" "1_9a" "2_9a"
>>> rank(xa)
>> [1] 2 1 3
>> 
>> Cheers,
>> H.
>> 
>>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> 
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0
>> 
>> 
>> --
>> Hervé Pagès
>> 
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>> 
>> E-mail: hpa...@fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>> 
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> 
> -- 
> Joris Meys
> Statistical consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
> 
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to