Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Hervé Pagès
Hi Paul, On 11-12-07 10:29 AM, Roebuck,Paul L wrote: Do this first and try again. R> Sys.setlocale("LC_COLLATE", "C") OK I see it now (in ?Sys.setlocale): Sys.setlocale("LC_COLLATE", "C") # turn off locale-specific sorting, # usually Thanks all for the

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Gordon Brown
Hi, folks, Underscores are, in fact, ignored in some collation orders, including (if I recall correctly) en_CA.UTF-8. It's caused me a bit of confusion now and then. No idea about English_United States.1252, but from the fact that Joris' example does not agree with Hervé's, it seems most likely

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Hadley Wickham
Actually this is the situation I was facing when I did my first post: I have a function that downloads a list of sequences from the Ensembl FTP server, sorts them by name, and returns them to the user. I have a test for that function and the test was working for me when I was doing  
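A test like the one described above can be made independent of the machine's locale by sorting under the C collation. The following is only a sketch of that idea: `sort_c` is a hypothetical helper name, not part of R or of the function discussed in the thread, and the sequence names are invented for illustration.

```r
# Hypothetical helper: sort a character vector under the C collation,
# restoring the caller's LC_COLLATE setting afterwards.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old))  # always restore on exit
  Sys.setlocale("LC_COLLATE", "C")           # compare strings byte by byte
  sort(x)
}

# Invented example names; under C collation upper case sorts before
# lower case, and digits sort before "_".
seq_names <- c("chr_2", "chr10", "Chr1")
sorted <- sort_c(seq_names)
print(sorted)
```

Because the helper restores the previous setting, calling it inside a test does not change how the rest of the session sorts.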

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Roebuck,Paul L
On 12/8/11 3:57 AM, Hervé Pagès hpa...@fhcrc.org wrote: On 11-12-07 10:29 AM, Roebuck,Paul L wrote: Do this first and try again. R> Sys.setlocale("LC_COLLATE", "C") OK I see it now (in ?Sys.setlocale): Sys.setlocale("LC_COLLATE", "C") # turn off locale-specific sorting,

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Hervé Pagès
Hi Barry, Hope you don't mind if I put this back on the list. On 11-12-08 05:50 AM, Barry Rowlingson wrote: 2011/12/8 Hervé Pagès hpa...@fhcrc.org: A naive question: wouldn't everything be simpler if LC_COLLATE=C was the default for everybody? Yet when we Brits suggest everything would be

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Barry Rowlingson
2011/12/7 Hervé Pagès hpa...@fhcrc.org: rank(xa) See help(Comparison), specifically: Beware of making _any_ assumptions about the collation order followed by Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic. Barry

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Joris Meys
@Barry : regardless of whether '_' comes before or after '1' , it should be consistent. Adding an 'a' shouldn't shift '_' from before '1' to between '1' and '2', that's clearly an error. The help files are not stating anything about that. The only thing I can imagine, is that '_' gets ignored (in
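The guess that '_' gets ignored can be checked with a rough emulation: strip the underscores, then rank what is left with a plain byte-wise comparison. This is only a sketch of the hypothesis, not a description of how any locale's collation is actually implemented; the variable names are illustrative.

```r
x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

# Emulate "underscores are ignored" by deleting them before comparing.
strip <- function(s) gsub("_", "", s, fixed = TRUE)

old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))  # byte-wise comparison: digits before letters

emulated_x  <- rank(strip(x))    # ranks of "1",  "19",  "29"
emulated_xa <- rank(strip(xa))   # ranks of "1a", "19a", "29a"

invisible(Sys.setlocale("LC_COLLATE", old))  # restore the previous setting

print(emulated_x)   # 1 2 3
print(emulated_xa)  # 2 1 3
```

With the underscores removed, "19a" sorts before "1a" because '9' precedes 'a', so the emulation gives the same ranks as the ones reported in the original post.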

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Rainer M Krug
On 07/12/11 15:48, Joris Meys wrote: @Barry : regardless of whether '_' comes before or after '1' , it should be consistent. Adding an 'a' shouldn't shift '_' from before '1' to between '1' and '2', that's clearly an error. The help files are not

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Gabriel Becker
I'm not an expert on Locales but those that are getting this behavior and those that aren't appear to be different. (in fact, all three sets are slightly different). Isn't sorting order based on Locale rather than any internal R code anyway? ~G On Wed, Dec 7, 2011 at 7:06 AM, Rainer M Krug

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Roebuck,Paul L
Do this first and try again. R> Sys.setlocale("LC_COLLATE", "C") On 12/7/11 3:41 AM, Hervé Pagès hpa...@fhcrc.org wrote: Hi, This looks OK: x <- c("_1_", "1_9", "2_9") rank(x) [1] 1 2 3 But this does not: xa <- paste(x, "a", sep="") xa [1] "_1_a" "1_9a" "2_9a" rank(xa) [1] 2 1 3 Cheers, H.
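The suggestion above can be tried on the example from the original post. The sketch below switches the collation to C, ranks both vectors, and then restores the previous setting; the variable names `rank_x` and `rank_xa` are only illustrative.

```r
x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

old <- Sys.getlocale("LC_COLLATE")           # remember the current collation
invisible(Sys.setlocale("LC_COLLATE", "C"))  # turn off locale-specific sorting

rank_x  <- rank(x)
rank_xa <- rank(xa)

invisible(Sys.setlocale("LC_COLLATE", old))  # restore the previous setting

print(rank_x)   # 3 1 2
print(rank_xa)  # 3 1 2
```

Under C collation the comparison is byte by byte, and '_' has a higher byte value than the digits, so the string starting with '_' ranks last in both vectors. Appending "a" no longer changes the ranks.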

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread peter dalgaard
On Dec 7, 2011, at 15:48 , Joris Meys wrote: @Barry : regardless of whether '_' comes before or after '1' , it should be consistent. Adding an 'a' shouldn't shift '_' from before '1' to between '1' and '2', that's clearly an error. The help files are not stating anything about that. The only

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Barry Rowlingson
2011/12/7 Joris Meys jorism...@gmail.com: @Barry : regardless of whether '_' comes before or after '1' , it should be consistent. Adding an 'a' shouldn't shift '_' from before '1' to between '1' and '2', that's clearly an error. The help files are not stating anything about that. That's an

Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-07 Thread Joris Meys
2011/12/7 Barry Rowlingson b.rowling...@lancaster.ac.uk: 2011/12/7 Joris Meys jorism...@gmail.com: @Barry : regardless of whether '_' comes before or after '1' , it should be consistent. Adding an 'a' shouldn't shift '_' from before '1' to between '1' and '2', that's clearly an error. The help