On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The trick for hashing such datatypes is to be able to guarantee that
> "equal" values hash to the same hash code, which is typically possible
> as long as you know the equality rules well enough. We could possibly
> do that for text with pure-strcoll equality if we knew all the details
> of what strcoll would consider "equal", but we do not.
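One way to meet the quoted requirement is to hash the strxfrm() image of a string instead of its raw bytes. Below is a minimal sketch in Python rather than PostgreSQL's C. It assumes that Python's locale.strcoll() and locale.strxfrm() reflect the platform C library's collation for the current LC_COLLATE. The names collate_equal, collate_hash and collate_sort_key are hypothetical and not part of PostgreSQL. The C standard specifies that comparing two strxfrm() results with strcmp() gives the same answer as strcoll() on the original strings. So two strings that strcoll() calls equal have identical transforms, and therefore hash identically.

```python
import locale


def collate_equal(a: str, b: str) -> bool:
    # Hypothetical helper: equality under pure strcoll() rules,
    # with no strcmp()-style tie-break.
    return locale.strcoll(a, b) == 0


def collate_hash(s: str) -> int:
    # Hypothetical helper: hash the strxfrm() image instead of the raw
    # string. Transforms compare the way strcoll() compares the originals,
    # so strcoll-equal inputs give identical transforms and equal hashes.
    return hash(locale.strxfrm(s))


def collate_sort_key(s: str):
    # Hypothetical helper, for contrast: order by collation first, then
    # break strcoll() ties with plain code-point order, roughly the
    # strcmp() fallback discussed in this thread.
    return (locale.strxfrm(s), s)


if __name__ == "__main__":
    # "C" is the only collation locale guaranteed to exist; substituting
    # another installed locale name is an assumption about your system.
    locale.setlocale(locale.LC_COLLATE, "C")
    words = ["resume", "Resume", "resume", "b"]
    for a in words:
        for b in words:
            if collate_equal(a, b):
                # The property the quoted text asks for.
                assert collate_hash(a) == collate_hash(b)
    print(sorted(words, key=collate_sort_key))
```

The trade-off is an extra strxfrm() call for every value hashed, and the transform can be considerably longer than its input.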
It occurs to me that strxfrm would answer this question. If we made the hash function hash the result of strxfrm, then we could make equality use strcoll and not fall back to strcmp.

I suspect that in a green field that's what we would do, though the CPU cost might be enough to make us think hard about it. I'm not sure it's worth considering switching, though.

Incidentally, the case where this matters to users is when you have a multi-column sort order and values that are supposed to sort equal in the first column but print differently. There seems to be some controversy in the locale definitions: most locales seem to use "insignificant" factors like accents or ligatures as tie-breakers, and avoid claiming that different sequences are equal even when the language usually treats them as equivalent. Given that, it doesn't seem super important to maintain the property for the few locales that fall the other way. That is, unless my impression is wrong and there's a good principled reason why some locales treat nearly equivalent strings one way and some treat them the other.

--
greg

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers