Re: [SQL] a strange order by behavior

Peter Eisentraut Wed, 22 Jun 2011 13:10:40 -0700

On ons, 2011-06-22 at 01:43 -0700, Samuel Gendler wrote:
> I seem to recall a thread here about it ignoring spaces entirely in that
> collation (and maybe ignoring capitalization, too?).


The way it works is that every collating element (letter or other
character or character group that you sort as a unit) is assigned four
weights (primary, secondary, tertiary, and quaternary), and the sorting
then first compares the primary weights, then the secondary weights,
etc.  The primary weight typically indicates the overall sort order,
like A before B, the secondary weight has to do with diacritic marks,
the tertiary with letter case, and the fourth level is only used in
special cases.  So that's why it looks as though the capitalization is
"ignored" unless both the primary and secondary weights are the same.

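To make the level-by-level comparison concrete, here is a toy sketch in
Python.  The weights are invented for illustration (they are not the
real ISO 14651 or glibc tables), and the helper names weights() and
sort_key() are hypothetical.  It also assumes that spaces are ignorable
at these levels, which is what the observed behavior suggests.

```python
import unicodedata

def weights(ch):
    """Return toy (primary, secondary, tertiary) weights, or None if ignored."""
    if ch.isspace():
        return None  # assumption: spaces carry no weight at these levels
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    primary = ord(base.lower())                   # overall order: a before b
    secondary = 1 if len(decomposed) > 1 else 0   # has a diacritic mark
    tertiary = 1 if base.isupper() else 0         # upper case after lower case
    return (primary, secondary, tertiary)

def sort_key(s):
    """Compare all primary weights first, then all secondary, then tertiary."""
    ws = [w for w in map(weights, s) if w is not None]
    return (
        [w[0] for w in ws],
        [w[1] for w in ws],
        [w[2] for w in ws],
    )

words = ["b", "A", "a", "\u00e1", "ab", "a b", "B"]
print(sorted(words, key=sort_key))
```

With these made-up weights, "Ab" sorts between "aa" and "AC" because
the primary weights decide first; case only breaks the tie between "ab"
and "Ab".  "a b" and "ab" get identical keys, so the space looks
"ignored" as well.
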
> This worked:
> 
> createdb  -E UTF-8 --lc-collate=C some_db
> 
> A quick google search
> reveals that there is some kind of standard for unicode collation (
> http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what is
> represented by the en_US.UTF-8 collation or not.

At least the collate category of the en_US.UTF-8 locale on glibc is
unaltered from the ISO 14651 default ordering, which is equivalent to
the Unicode default ordering.  There are several other locales for which
that is also the case.  Unfortunately, the default ordering itself is not
exposed outside of the glibc source code, so you can't just select "give
me a neutral default ordering".



