Re: [SQL] order by different on mac vs linux

Scott Marlowe Wed, 16 May 2012 19:24:37 -0700

On Wed, May 16, 2012 at 7:58 PM, Wes James <[email protected]> wrote:
>
>
> On Wed, May 16, 2012 at 5:00 PM, Samuel Gendler <[email protected]>
> wrote:
>>
>>
>>
>> On Wed, May 16, 2012 at 3:46 PM, Wes James <[email protected]> wrote:
>>>
>>>
>>>
>>> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <[email protected]> wrote:
>>>>
>>>> Wes James <[email protected]> writes:
>>>> > Why is there a different order on the different platforms.
>>>>
>>>> This is not exactly unusual.  You should first check to see if
>>>> lc_collate is set differently in the two installations --- but even if
>>>> it's the same, there are often platform-specific interpretations of
>>>> the sorting rules.  (Not to mention that OS X is flat out broken when
>>>> it comes to sorting UTF8 data ...)
>>>>
>>>
>>> I just ran these:
>>>
>>> linux:
>>>
>>> on linux
>>>
>>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
>>> pg_user;
>>>  case
>>> -------
>>>  FALSE
>>> (1 row)
>>>
>>> # show lc_collate;
>>>
>>>  lc_collate
>>> -------------
>>>  en_US.UTF-8
>>> (1 row)
>>>
>>> ------------------------
>>>
>>> on mac os x:
>>>
>>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
>>> pg_user;
>>>  case
>>> ------
>>>  TRUE
>>> (1 row)
>>>
>>> # show lc_collate;
>>>  lc_collate
>>> -------------
>>>  en_US.UTF-8
>>> (1 row)
>>>
>>>
>>> -----------------------
>>>
>>> Why is the linux postgres saying false with the lc_collage set the way it
>>> is?
>>
>>
>> That's the point - UTF-8 collation is just completely broken under OS X.
>>  There's much previous discussion of the topic on this list and elsewhere.
>>  If you're developing on OS X but running linux and you are mostly using an
>> ascii character set in your test dataset, set your development OS X boxes to
>> use C collation, which will basically do what you expect it do do until you
>> start throwing multibyte characters at it.  If you can't constrain your
>> testing/development dataset in such a manner and collation order really
>> matters during development, then you probably shouldn't develop on OS X.  I
>> spent a fair amount of time investigating how to define a new charset in
>> what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to
>> behave just like ti does on Linux.  I just gave it up after wasting a few
>> too many hours on it. It may be possible to do it, but the return on
>> invested time was non-existent for me so I abandoned my effort.
>>
>
> Why are people saying os x is broken in my case?  Looking
> at  http://www.utf8-chartable.de/  and  http://www.asciitable.com/  for sort
> order, ! should come before apache.  On os x it is correct, on ubuntu linux
> it is not.  In the order by output per my previous emails, it is correct on
> os x, but no on linux.  Why do people keep saying os x is broken, when it is
> doing the correct thing from what I've seen?


You're confusing encoding with locale.  UTF-8 is an encoding.  If
there were no Locale (i.e. it was set to C, simple byte ordering) then
you'd be correct.  HOWEVER, a locale is a different animal altogether.
 For instance, most locales ignore many characters when it comes to
sort ordering.  Such as spaces:

smarlowe=# create table test (i text);
CREATE TABLE
smarlowe=# insert into test values ('abc'),(' abc'),('def'),(' def');
INSERT 0 4
smarlowe=# select * from test order by i;
  i
------
 abc
  abc
 def
  def
(4 rows)

Note the spaces are ignored for sorting purposes.

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] order by different on mac vs linux

Reply via email to