Re: [HACKERS] Unicode Normalization

2009-09-24 Thread David E. Wheeler
On Sep 24, 2009, at 6:24 AM, p...@thetdh.com wrote: In a context using normalization, wouldn't you typically want to store a normalized-text type that could perhaps (depending on locale) take advantage of simpler, more-efficient comparison functions? That might be nice, but I'd be wary

Re: [HACKERS] Unicode Normalization

2009-09-24 Thread Andrew Dunstan
David E. Wheeler wrote: On Sep 24, 2009, at 6:24 AM, p...@thetdh.com wrote: In a context using normalization, wouldn't you typically want to store a normalized-text type that could perhaps (depending on locale) take advantage of simpler, more-efficient comparison functions? That might be

Re: [HACKERS] Unicode Normalization

2009-09-24 Thread David E. Wheeler
On Sep 24, 2009, at 8:59 AM, Andrew Dunstan wrote: That might be nice, but I'd be wary of a geometric multiplication of text types. We already have TEXT and CITEXT; what if we had your NTEXT (normalized text) but I wanted it to also be case-insensitive? Actually, I don't think it's
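
The concern about multiplying text types (TEXT, CITEXT, a hypothetical NTEXT, and then a case-insensitive NTEXT) can be illustrated outside the database: normalization and case-insensitivity are separate transformations that compose. The following is a minimal Python sketch, not PostgreSQL code. The function names `normalized_key` and `caseless_key` are made up for illustration, and using `str.casefold()` is an assumption that does not necessarily match how CITEXT compares values.

```python
import unicodedata

def normalized_key(s: str) -> str:
    """Comparison key that ignores composed vs. decomposed encoding only."""
    return unicodedata.normalize("NFC", s)

def caseless_key(s: str) -> str:
    """Comparison key that ignores both encoding form and case.

    Follows the usual canonical caseless matching recipe:
    NFD, then case folding, then NFD again.
    """
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize("NFD", folded)

mixed = "Re\u0301sume\u0301"   # "Résumé" written with combining accents
upper = "R\u00c9SUM\u00c9"     # "RÉSUMÉ" written with precomposed characters

# Normalization alone does not make these equal, because the case differs.
print(normalized_key(mixed) == normalized_key(upper))
# Normalization plus case folding does.
print(caseless_key(mixed) == caseless_key(upper))
```

In this sketch the two behaviours are layered as functions over one string type, which is one way to think about avoiding a separate type for every combination.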

Re: [HACKERS] Unicode Normalization

2009-09-24 Thread pg
In a context using normalization, wouldn't you typically want to store a normalized-text type that could perhaps (depending on locale) take advantage of simpler, more-efficient comparison functions? Whether you're doing INSERT/UPDATE, or importing a flat text file, if you canonicalize

[HACKERS] Unicode Normalization

2009-09-23 Thread David E. Wheeler
Hackers, I just had a discussion on IRC about Unicode normalization in PostgreSQL. Apparently there is currently no support for it. Andrew Gierth points out that it's part of the SQL spec to support it, though: RhodiumToad: e.g. NORMALIZE(foo,NFC,len) justatheory: Oh, just a function
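
For readers unfamiliar with what a call such as `NORMALIZE(foo,NFC,len)` is meant to achieve, here is a small sketch of NFC normalization using Python's standard `unicodedata` module. It only illustrates the concept; it is not the SQL function mentioned on IRC, and the sample strings are chosen purely for illustration.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings render identically but compare as different code point sequences.
print(composed == decomposed)

# NFC composes the pair into the single precomposed code point.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == composed)

# NFD goes the other way and decomposes the precomposed form.
nfd = unicodedata.normalize("NFD", composed)
print(nfd == decomposed)
```

Without normalization, a byte-wise or code-point-wise comparison treats the two spellings as distinct values, which is the gap the thread is discussing.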

Re: [HACKERS] Unicode Normalization

2009-09-23 Thread David E. Wheeler
On Sep 23, 2009, at 11:08 AM, David E. Wheeler wrote: I just had a discussion on IRC about Unicode normalization in PostgreSQL. Apparently there is currently no support for it. BTW, the only reference I found on the [to do list](http://wiki.postgresql.org/wiki/Todo) was: More sensible

Re: [HACKERS] Unicode Normalization

2009-09-23 Thread David E. Wheeler
On Sep 23, 2009, at 11:08 AM, David E. Wheeler wrote: I looked around and found the Public Software Group's utf8proc project, which even includes some PostgreSQL support (not, alas, for normalization). It has an MIT-licensed C library that offers these functions: Sorry, forgot the link: