Interesting. The original thread to which I was referring has a subject of "Sorting Issue" and the original request showed a list of vehicle model names which were sorting as though there were no spaces. The user had collation set to en_US.UTF-8. However, my database (on OS X) sorts both his example and the example that started this thread correctly, despite my lc_collate being set to the same value.
Then I just ran the exact same test on a Centos 5 linux host (postgresql 8.4 as well) and I am seeing the 'erroneous' sort order that prompted both this thread and the other. So you can't even assume the same behaviour for the same collation on different platforms. On OS X: # \l Name | Owner | Encoding | Collation | Ctype | Access privileges -------------------+------------+----------+-------------+-------------+----------------------- col_test | u1 | UTF8 | en_US.UTF-8 | en_US.UTF-8 | # show lc_collate; lc_collate ------------- en_US.UTF-8 # select * from t1 order by f1; id | f1 ----+------------------- 1 | CX Hatchback 2 | CX Minivan 2 | CX Plus Minivan 2 | CX Sedan 2 | CX Sport Utility 2 | CXL Minivan 2 | CXL Premium Sedan 2 | CXL Sedan 2 | CXL Sport Utility 2 | CXL Turbo Sedan 2 | CXS Sedan On CentOS 5: # \l Name | Owner | Encoding | Collation | Ctype | Access privileges -------------------+------------+----------+-------------+-------------+----------------------- col_test | u1 | UTF8 | en_US.UTF-8 | en_US.UTF-8 | # show lc_collate ecorithm-# ; lc_collate ------------- en_US.UTF-8 # select * from t1 order by f1; id | f1 ----+------------------- 1 | CX Hatchback 2 | CXL Minivan 2 | CXL Premium Sedan 2 | CXL Sedan 2 | CXL Sport Utility 2 | CXL Turbo Sedan 2 | CX Minivan 2 | CX Plus Minivan 2 | CX Sedan 2 | CX Sport Utility 2 | CXS Sedan Further testing would seem to reveal that OS X is using something resembling C collation order, despite the fact that it says en_US.UTF-8. I say this because it is also case sensitive. CentOS, on the other hand, is ignoring spaces, but is also case-insensitive. # select * from t1 order by f1 asc; id | f1 ----+------------------- 1 | CX Hatchback 2 | CXL Minivan 2 | CXL Premium Sedan 2 | CXL Sedan 2 | CXL Sport Utility 2 | CXL Turbo Sedan 2 | CX Minivan 2 | CX Plus Minivan 2 | cx Sedan 2 | CX Sedan 2 | CX Sport Utility 2 | CXS Sedan Pavel suggested using a collation of ucs_basic, but I get an error when I try that on linux: $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test createdb: database creation failed: ERROR: invalid locale name ucs_basic I was able to create the db with --lc_collate=C and get case-sensitive sorting that treats spaces 'correctly,' but I have no idea how reliable that is with multibyte characters and it almost certainly doesn't handle accented characters correctly in languages that have a mix of ascii and non-ascii characters, since the non-ascii chars will all sort as greater than the ascii chars, which is probably not how things are alphabetized in those languages. I happen to have my computer set so I can type cyrillic and it appears to work correctly with non-ascii chars when the entire charset is non-ascii, but it'd be a lot of effort to construct a test that utilizes a mix of 1,2,3, and 4 byte characters, since I'm not familiar with enough alphabets. test=# select * from t1 order by f1; id | f1 ----+------------------- 1 | CX Hatchback 2 | CX Minivan 2 | CX Plus Minivan 2 | CX Sedan 2 | CX Sport Utility 2 | CXL Minivan 2 | CXL Premium Sedan 2 | CXL Sedan 2 | CXL Sport Utility 2 | CXL Turbo Sedan 2 | CXS Sedan 2 | cx Sedan 2 | АИВЕТ 2 | П АИВЕТ 2 | П РИВЕТ 2 | П СИВЕТ 2 | ПРИВЕТ 2 | РИВЕТ 2 | СИВЕТ 2 | привет