On Tue, Aug 10, 2010 at 13:49, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Rod Taylor <rod.tay...@gmail.com> writes:
>> Does anybody have experience on the cost, if any, of making this change?
>
>> Pg 8.3:
>> Encoding: SQL_ASCII
>> LC_COLLATE: en_US
>> LC_CTYPE: en_US
>
>> Pg 8.4:
>> Encoding: SQL_ASCII
>> Collation: en_US.UTF-8
>> Ctype: en_US.UTF-8
>
> Well, *both* of those settings collections are fundamentally
> wrong/bogus; any collation/ctype setting other than "C" is unsafe if
> you've got encoding set to SQL_ASCII. But without knowing what your
> platform thinks "en_US" means, it's difficult to speculate about what
> the difference between them is. I suppose that your libc's default
> assumption about encoding is not UTF-8, else these would be equivalent.
> If it had been assuming a single-byte encoding, then telling it UTF8
> instead could lead to a significant slowdown in strcoll() speed ...
> but I would think that would mainly be a problem if you had a lot of
> non-ASCII data, and if you did, you'd be having a lot of problems other
> than just performance. Have you noticed any change in sorting behavior?

Agreed, it is an interesting choice of settings. Nearly all of the data
is 7-bit ASCII, and what isn't seems to be a mix of UTF8, LATIN1, and
LATIN15. I'm pretty sure the platform interpreted en_US to be LATIN1.

There haven't been any noticeable changes in sorting order that I know of.
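
For anyone who wants to measure how much non-ASCII data a SQL_ASCII database really holds, here is a rough sketch in Python. It assumes the column values have already been fetched as raw bytes; the names classify and summarize are made up for illustration. It is only a heuristic: some LATIN1 byte sequences also happen to be valid UTF-8, so the "utf8" count is an upper bound.

```python
from collections import Counter


def classify(raw: bytes) -> str:
    """Label one byte string as 'ascii', 'utf8', or 'other'."""
    # Pure 7-bit data sorts the same way under any of the encodings discussed.
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: probably LATIN1 or another single-byte encoding.
        return "other"
    return "utf8"


def summarize(values):
    """Count how many values fall into each category."""
    return Counter(classify(v) for v in values)


# Hypothetical sample values, standing in for rows read from a table.
samples = [
    b"plain",
    "caf\u00e9".encode("utf-8"),
    "caf\u00e9".encode("latin-1"),
]
print(summarize(samples))
```

If the "other" and "utf8" counts are both tiny, the strcoll() slowdown from the locale change is probably not where the time is going.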