On 2 June 2017 at 23:52, Peter Geoghegan <p...@bowt.ie> wrote:
> On Fri, Jun 2, 2017 at 10:34 AM, Amit Khandekar <amitdkhan...@gmail.com>
> wrote:
>> Ok. I was thinking we are doing the tie-breaker because specifically
>> strcoll_l() was unexpectedly returning 0 for some cases. Now I get it,
>> that we do that to be compatible with texteq().
>
> Both of these explanations are correct, in a way. See commit 656beff.
>
>> Secondly, I was also considering if ICU especially has a way to
>> customize an ICU locale by setting some attributes which dictate
>> comparison or sorting rules for a set of characters. I mean, if there
>> is such customized ICU locale defined in the system, and we use that
>> to create PG collation, I thought we might have to strictly follow
>> those rules without a tie-breaker, so as to be 100% conformant to ICU.
>> I can't come up with an example, or maybe there isn't one, but, say,
>> there is a locale which is supposed to sort only by lowest comparison
>> strength (de@strength=1 ?? ). In that case, there might be many
>> characters considered equal, but PG < operator or > operator would
>> still return true for those chars.
>
> In the terminology of the Unicode collation algorithm, PostgreSQL
> "forces deterministic comparisons" [1]. There is a lot of information
> on the details of that within the UCA spec.
>
> If we ever wanted to offer a case insensitive collation feature, then
> we wouldn't necessarily have to do the equivalent of a full strxfrm()
> when hashing, at least with collations controlled by ICU. Perhaps we
> could instead use a collator whose UCOL_STRENGTH is only UCOL_PRIMARY
> to build binary sort keys, and leave the rest to a ucol_equal() call
> (within texteq()) that has the usual UCOL_STRENGTH for the underlying
> PostgreSQL collation.
>
> I don't think it would be possible to implement case insensitive
> collations by using some pre-existing ICU collation that is case
> insensitive. Instead, an implementation might directly vary collation
> strength of any given collation to achieve case insensitivity.
> PostgreSQL would know that this collation was case insensitive, so
> regular collations wouldn't need to change their
> behavior/implementation (to use ucol_equal() within texteq(), and so
> on).
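As a rough illustration of the two ideas discussed above (a sketch, not PostgreSQL's actual varstr_cmp() or text hashing code): deterministic_cmp() orders strings with strcoll() and breaks ties with strcmp(), so only byte-identical strings compare equal, and coarse_hash() hashes a deliberately coarser key than equality uses. The names deterministic_cmp, coarse_hash and hash_match are hypothetical, and ASCII case folding is an assumed stand-in for a sort key built by an ICU collator at UCOL_PRIMARY strength; no ICU calls are made here.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Deterministic comparison: let the locale order the strings via strcoll(),
 * and only when it reports them equal fall back to a byte-wise strcmp().
 * The result is 0 only for byte-identical strings, matching a byte-wise
 * equality check in the spirit of texteq().
 */
static int deterministic_cmp(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result != 0)
        return result;
    return strcmp(a, b);            /* tie-breaker */
}

/*
 * Hypothetical coarse key: ASCII case folding stands in for a sort key
 * built by a collator at UCOL_PRIMARY strength (this is not ICU code).
 * The folded bytes are hashed with 32-bit FNV-1a.
 */
static uint32_t coarse_hash(const char *s)
{
    uint32_t h = 2166136261u;       /* FNV-1a offset basis */

    for (; *s != '\0'; s++)
    {
        h ^= (uint32_t) tolower((unsigned char) *s);
        h *= 16777619u;             /* FNV-1a prime */
    }
    return h;
}

/*
 * Hash-table style match: compare the cheap coarse hashes first, then let
 * the finer equality decide. This is correct as long as strings that are
 * equal always share a coarse key, so they never land in different buckets.
 */
static int hash_match(const char *a, const char *b)
{
    if (coarse_hash(a) != coarse_hash(b))
        return 0;
    return deterministic_cmp(a, b) == 0;
}
```

The invariant the hashing idea relies on is that equality implies an equal coarse key: hashing on a primary-strength key while deciding equality at the collation's own (finer) strength preserves it, at the cost of some extra hash collisions such as "ABC" and "abc" here.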
Ah ok. Understood, thanks.

Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company