On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> >> AFAICT, most of the useful operations work on UChar, which is uint16:
> >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac
>
> > Oh, you're confusing UCS-2 with UTF-16,
>
> Ah, you're right, I did misunderstand that.  However, it's still
> apparently the case that ICU works mostly with UTF16 and handles other
> encodings only via conversion to UTF16.  That's a pretty serious
> mismatch with our needs --- we'll end up converting to UTF16 all the
> time.  We're certainly not going to change to using UTF16 as the actual
> native string representation inside the backend, both because of the
> space penalty and incompatibility with tools like bison.
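To put rough numbers on the space penalty and the UCS-2 vs UTF-16 point above, here is a quick sketch. It is plain Python, not ICU or backend code; the sample strings and the `encoded_sizes` helper are made up purely for illustration.

```python
# Illustrative only: compare the encoded size of the same text in UTF-8
# and UTF-16. The sample strings and helper name are assumptions, not
# anything taken from ICU or PostgreSQL.
samples = {
    "ascii": "explain analyze",
    "latin": "naïve café",
    "cjk": "日本語のテキスト",
    "non_bmp": "\U0001D11E",  # MUSICAL SYMBOL G CLEF, outside the BMP
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    # A UTF-16 code unit is 2 bytes, so utf16_bytes // 2 is the number of
    # uint16 units a UChar-style buffer would need for this text.
    print(f"{name:8} chars={len(text):2d} utf-8={utf8_bytes:2d}B "
          f"utf-16={utf16_bytes:2d}B units={utf16_bytes // 2}")
```

ASCII text doubles in size under UTF-16, while the CJK sample shrinks from three bytes per character to two. The last sample is a single character that needs two 16-bit units (a surrogate pair), which is the difference between UTF-16 and UCS-2.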
I think I've been involved in a discussion like this in the past. Was it
mentioned on this list before?

Yes, the UTF-8 vs UTF-16 encoding mismatch means that UTF-8 applications
are at a disadvantage when using the library. UTF-16 is considered more
efficient to work with for everybody except ASCII users. :-)

No opinion on the matter though. Changing PostgreSQL to UTF-16 would be
an undertaking... :-)

Cheers,
mark

--
Neighbourhood Coder, Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them
all and in the darkness bind them...
http://mark.mielke.cc/