On Wed, Aug 10, 2005 at 11:17:40AM -0400, Alvaro Herrera wrote:
> On Wed, Aug 10, 2005 at 10:04:23AM +0200, Martijn van Oosterhout wrote:
>
> > Comments welcome. I can write more, if people can suggest things to
> > write about. I was thinking something about collation and locales but
> > I'm not sure I understand them myself.
>
> I'd really love to see a Q&A for encodings, recoding, and "I see strange
> characters." Not sure how to phrase the question though.

I think you could write a whole section just on them and all the issues on
various platforms. But having never dealt with a system with multiple
languages / encodings I'm not sure I really understand the issues. You
know, like:

Encoding / character sets gotchas / recommendations:

Languages:
  Asian
  European
Programming:
  Perl
  Python
  Java
  ODBC
  Regular expressions
  Full text indexing
  etc...
Platforms:
  Windows
  UNIX
  etc...

The main thing I wonder about is does UTF-8 handle all characters anybody
would want to use. I've been told it doesn't for Asian languages, in which
case I don't see how this is a solvable problem anyway.

I've collected quite a few comments from other people, so I'll post a
slightly revised patch later.

-- 
Martijn van Oosterhout <email@example.com> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.