On Wed, 2011-06-22 at 01:43 -0700, Samuel Gendler wrote:
> I seem to recall a thread here about it ignoring spaces entirely in that
> collation (and maybe ignoring capitalization, too?).

The way it works is that every collating element (a letter, other character, or character group that is sorted as a unit) is assigned four weights (primary, secondary, tertiary, and quaternary). Sorting first compares the primary weights, then the secondary weights, and so on. The primary weight typically indicates the overall sort order, like A before B; the secondary weight has to do with diacritic marks; the tertiary with letter case; and the fourth level is only used in special cases. So that's why it looks as though capitalization is "ignored" unless both the primary and secondary weights are the same.

> This worked:
>
> createdb -E UTF-8 --lc-collate=C some_db
>
> A quick google search
> reveals that there is some kind of standard for unicode collation (
> http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what is
> represented by the en_US.UTF-8 collation or not.

At least the collate category of the en_US.UTF-8 locale on glibc is unaltered from the ISO 14651 default ordering, which is equivalent to the Unicode default ordering. There are several other locales for which that is also the case. Unfortunately, this is not exposed outside of the glibc source code. So you can't just select "give me a neutral default ordering".
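
To make the level-by-level comparison described above a bit more concrete, here is a minimal sketch in Python. The weight table is invented purely for illustration and is not real glibc or Unicode data. The quaternary level is left out, and treating the space as ignored at the first three levels is an assumption made to mirror the behaviour asked about.

```python
# Toy weights, invented for illustration only (not real glibc or UCA data).
# Each character maps to (primary, secondary, tertiary).
# A weight of 0 means "ignored at this level".
TOY_WEIGHTS = {
    "a": (1, 1, 1),
    "A": (1, 1, 2),  # same letter as "a", differs only in case (tertiary)
    "á": (1, 2, 1),  # same letter as "a", differs only in accent (secondary)
    "b": (2, 1, 1),
    "B": (2, 1, 2),
    " ": (0, 0, 0),  # assumption: space is ignored at all three levels
}


def sort_key(s):
    """Return a key that compares all primary weights of the string first,
    then all secondary weights, then all tertiary weights."""
    weights = [TOY_WEIGHTS[ch] for ch in s]
    return tuple(
        tuple(w[level] for w in weights if w[level] != 0)
        for level in range(3)
    )


words = ["B", "ab", "Ab", "a b", "áb", "b"]
result = sorted(words, key=sort_key)
# result == ["ab", "a b", "Ab", "áb", "b", "B"]
```

In this toy model "ab" and "a b" get identical keys, so the space appears to be ignored, and "Ab" only sorts after "ab" because the primary and secondary weights of the two strings are equal. A real implementation would use the fourth level (or some other tie-breaker) to separate strings such as "ab" and "a b".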