On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote:
> [EMAIL PROTECTED] wrote:
> > The authors of the library in question? Java? Anybody whose primary
> > alphabet isn't LATIN1 based? :-)
> Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than
> UTF-8. That covers a lot of the world. Forcing those people to change
> to UTF-16 does not strike me as a very good idea.
Ah. Thought you were talking UTF-8 vs UTF-16.

> But Martijn already clarified that ICU does not actually force you to
> switch everything to UTF-16, so this is not an issue anyway.

If my memory is correct, it does this by converting the text to UTF-16
first. This is a performance disadvantage (although it may not be worse
than PostgreSQL's current implementation :-) ).

> > Only ASCII values store more space efficiently in UTF-8. All values
> > over 127 store more space efficiently using UTF-16. UTF-16 is easier
> > to process. UTF-8 requires too many bit checks with single character
> > offsets. I'm not an expert - I had this question before a year or two
> > ago, and read up on the ideas of experts.
> Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs.
> UTF-16". I don't much like UTF-8 myself, but that's not a very informed
> opinion, just like a feeling of "fly-killing-cannon" (when it's used to
> store Latin-9-fitting text).

*nod*

Cheers,
mark

--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Neighbourhood Coder | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
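For anyone who wants to check the space claims traded back and forth in this thread, here is a small Python sketch (not from the original discussion) that counts how many bytes a single character takes in Latin-9 (ISO-8859-15), UTF-8, and UTF-16. The helper name `encoded_sizes` and the sample characters are illustrative choices only; UTF-16 is measured little-endian so that no byte-order mark is counted.

```python
# Hypothetical helper (not from the thread): count how many bytes a single
# character needs in Latin-9 (ISO-8859-15), UTF-8 and UTF-16.
# UTF-16 is measured as little-endian so no byte-order mark is included.

ENCODINGS = (
    ("latin-9", "iso8859_15"),
    ("utf-8", "utf-8"),
    ("utf-16", "utf-16-le"),
)

# Sample characters, written as escapes to keep the source ASCII-only:
# 'A', e-acute, euro sign, Cyrillic Zhe, a CJK ideograph, an emoji
# outside the Basic Multilingual Plane.
SAMPLES = ["A", "\u00e9", "\u20ac", "\u0416", "\u4e2d", "\U0001F600"]


def encoded_sizes(ch: str) -> dict:
    """Return {encoding label: byte length} for one character.

    Latin-9 only covers a small Western European repertoire, so characters
    it cannot represent are reported as None.
    """
    sizes = {}
    for label, codec in ENCODINGS:
        try:
            sizes[label] = len(ch.encode(codec))
        except UnicodeEncodeError:
            sizes[label] = None
    return sizes


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}: {encoded_sizes(ch)}")
```

Running it shows the picture is more mixed than "everything over 127 is smaller in UTF-16": code points U+0080 to U+07FF (accented Latin, Greek, Cyrillic, Hebrew, Arabic) take two bytes in both UTF-8 and UTF-16, code points U+0800 to U+FFFF (the euro sign, most CJK) take three bytes in UTF-8 versus two in UTF-16, and characters outside the BMP take four bytes in both. Latin-9 stays at one byte for everything it can represent.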