David E. Wheeler wrote:
On Jul 7, 2008, at 12:21, David E. Wheeler wrote:

My question is: why? Shouldn't they all use the same function for comparison? I'm happy to dupe this implementation for citext, but I don't understand it. Should not all comparisons be executed consistently?

Let me try to answer my own question by citing this comment:

    /*
     * Since we only care about equality or not-equality, we can avoid all the
     * expense of strcoll() here, and just do bitwise comparison.
     */

So, the upshot is that the = and <> operators are not locale-aware, yes? They just do byte comparisons. Is that really the way it should be? I mean, could there not be strings that are equivalent but have different bytes?

Correct. The problem is complex. Bytewise comparison gives the right answer only for normalized strings, and Postgres currently assumes that all UTF-8 strings are normalized.
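To illustrate the normalization point, here is a minimal sketch (not the Postgres code). The helper bytes_equal and the two constants are hypothetical names made up for this example. It compares two UTF-8 encodings of the same visible character, "é": the precomposed form U+00E9 and the decomposed form "e" followed by U+0301.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* "é" as one precomposed code point, U+00E9 (2 bytes in UTF-8). */
static const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";

/* "é" as "e" plus combining acute accent, U+0301 (3 bytes in UTF-8). */
static const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";

/*
 * Hypothetical helper: equality by raw bytes only, in the spirit of the
 * quoted comment. No locale and no Unicode normalization are involved.
 */
static bool
bytes_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

With this helper, the two encodings of "é" compare as not equal even though they are canonically equivalent under Unicode. That is why bytewise = and <> are only safe if every stored string is in the same normalization form.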

If you need to implement the <, <=, >= and > operators, you need to use strcoll(), which takes care of locale collation.
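As a rough sketch of the difference for ordering operators (again, not the Postgres implementation), the two hypothetical helpers below wrap strcmp() and strcoll() and reduce the result to -1, 0 or 1. strcoll() uses whatever LC_COLLATE locale the process has set with setlocale().

```c
#include <locale.h>
#include <string.h>

/* Reduce any comparison result to -1, 0 or 1. */
static int
sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical helper: order by raw byte values, ignoring the locale. */
static int
compare_bytewise(const char *a, const char *b)
{
    return sign_of(strcmp(a, b));
}

/* Hypothetical helper: order by the rules of the current LC_COLLATE locale. */
static int
compare_collated(const char *a, const char *b)
{
    return sign_of(strcoll(a, b));
}
```

In the "C" locale both helpers agree, so "B" sorts before "a" because of the byte values. Under a linguistic locale (for example en_US.UTF-8, if it is installed on the system), strcoll() will typically put "a" before "B", which is the behavior the ordering operators need.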

See the Unicode Collation Algorithm: http://www.unicode.org/reports/tr10/

                Zdenek




--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

