On Fri, Jun 9, 2017 at 11:46 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>> On 6/9/17 11:12, Tom Lane wrote:
>>> https://www.postgresql.org/message-id/27064.1134753...@sss.pgh.pa.us
>
>> Good to know.  That just says that if we were to go with the strcoll()
>> result only, things would work correctly.
>
> There's still the hashing problem.

Tom, that mailing list discussion is very illuminating.  Thanks for
digging it up.

Regarding the question of hashing, one way to support that would be if
we had some sort of canonicalization function.  IOW, suppose there were
a collation API call distill() which had the property that
strcmp(distill(X), distill(Y)) == 0 iff X and Y are considered equal
under that collation.  Then, you could define your hash function as
hash_any(distill(X)).  Alternatively, if the collation library provided
its own hashing function, that would be fine too, and probably faster.

On the other hand, is there any rule that says we have to support
hashing?  Certainly, if we defined a new datatype collated_text, it
could have a btree opfamily and no hash opfamily.  It's trickier with
only one datatype, but possibly we could come up with a way for an
opfamily to be consulted about whether it is available for a given
choice of collation.

I'm not exactly sure what is possible or desirable, but I would not be
too surprised to hear complaints that the observed behavior differs
from the "pure" ICU behavior because of the tiebreak, and at least some
users might even find it worth giving up hashing in order to get the
exact sort order they need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
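
To make the distill() idea above a bit more concrete, here is a toy
sketch in Python.  Everything in it is an assumption for illustration:
distill(), collation_equal() and collation_hash() are hypothetical
names, Unicode NFC normalization stands in for "equal under the
collation", and zlib.crc32 stands in for hash_any().  It is not how ICU
or PostgreSQL does this; it only shows that hashing the distilled form
gives equal hashes for strings the collation treats as equal.

```python
import unicodedata
import zlib


def distill(s: str) -> bytes:
    """Hypothetical canonicalization function.

    Assumption: the "collation" here treats two strings as equal exactly
    when they are canonically equivalent Unicode, so NFC normalization
    yields a byte string that is identical for equal inputs.
    """
    return unicodedata.normalize("NFC", s).encode("utf-8")


def collation_equal(x: str, y: str) -> bool:
    # Mirrors strcmp(distill(X), distill(Y)) == 0 from the text.
    return distill(x) == distill(y)


def collation_hash(s: str) -> int:
    # Mirrors hash_any(distill(X)); crc32 is only a stand-in hash.
    return zlib.crc32(distill(s))


# Two spellings of the same word: precomposed U+00E9 versus
# "e" followed by combining acute accent U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

if __name__ == "__main__":
    print(composed == decomposed)                   # bytewise comparison
    print(collation_equal(composed, decomposed))    # comparison via distill()
    print(collation_hash(composed) == collation_hash(decomposed))
```

The two spellings differ bytewise but distill to the same bytes, so
they land in the same hash bucket.  A real collation would need
distill() to fold everything that collation considers equal, not just
canonical equivalence.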