On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:
So, the upshot is that the = and <> operators are not locale-aware,
yes? They just do byte comparisons. Is that really the way it
should be? I mean, could there not be strings that are equivalent
but have different bytes?
Correct. The problem is complex. It works fine only for normalized
strings. But postgres now assumes that all UTF-8 strings are normalized.
I see. So binary equivalence is okay, in that case.
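To make the normalization point concrete for myself, here is a minimal sketch in Python (not PostgreSQL code; the helper names bytes_equal and nfc_equal are made up for illustration). It assumes the usual example of "é" written either as the precomposed U+00E9 or as "e" followed by the combining acute accent U+0301:

```python
import unicodedata

# Two spellings of the same visible character "é".
composed = "\u00e9"      # precomposed: UTF-8 bytes C3 A9
decomposed = "e\u0301"   # "e" + combining acute: UTF-8 bytes 65 CC 81


def bytes_equal(a: str, b: str) -> bool:
    """Hypothetical helper: byte-wise equality of the UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: byte-wise equality after NFC normalization."""
    return bytes_equal(
        unicodedata.normalize("NFC", a),
        unicodedata.normalize("NFC", b),
    )


if __name__ == "__main__":
    print(bytes_equal(composed, decomposed))  # raw bytes differ
    print(nfc_equal(composed, decomposed))    # same once both are normalized
```

So a byte comparison only gives the expected answer if both inputs are already in the same normalization form, which is the assumption described above.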
If you need to implement the < <= >= > operators you need to use strcoll,
which takes care of locale collation.
Which varstr_cmp() does, I guess. It's what textlt uses, for example.
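As a rough illustration of the difference between the two kinds of comparison, here is a sketch using Python's locale.strcoll, which wraps the C library's collation function. This is not how varstr_cmp() is written; byte_cmp and collate_cmp are hypothetical names, and the default locale of "C" is only chosen because it is always available:

```python
import locale


def byte_cmp(a: str, b: str) -> int:
    """Hypothetical helper: compare UTF-8 bytes, returning -1, 0, or 1."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


def collate_cmp(a: str, b: str, locale_name: str = "C") -> int:
    """Hypothetical helper: compare with strcoll under the given LC_COLLATE.

    Raises locale.Error if locale_name is not installed on this system.
    """
    old = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        result = locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)  # always restore
    return (result > 0) - (result < 0)


if __name__ == "__main__":
    # In the "C" locale, collation order matches plain byte order.
    print(byte_cmp("B", "a"), collate_cmp("B", "a", "C"))
```

Under the "C" locale the two agree. Under a language-specific locale (if one is installed) the ordering result can differ from byte order, which is why the ordering operators need the collation-aware path while = and <> can get away with comparing bytes.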
See the Unicode Collation Algorithm: http://www.unicode.org/reports/tr10/
Wow, that looks like a fun read.
Best,
David