Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-07-10 Thread Peter Geoghegan
On Fri, Jun 9, 2017 at 11:09 AM, Robert Haas wrote:
>> Isn't that what strxfrm() is?
>
> Yeah, just with bugs. If ICU has a non-buggy equivalent, then we can
> make this work.

I agree that it probably isn't worth using strxfrm() again, simply because the glibc implementation is buggy, and glibc
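
For readers unfamiliar with strxfrm(), the following is a minimal sketch of the property being discussed: comparing the transformed keys with strcmp() is supposed to give the same sign as strcoll() on the original strings. The helper names xfrm_dup() and cmp_via_xfrm() are made up for illustration and do not come from PostgreSQL. Whether a given libc actually honors the property is exactly what the thread is questioning.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd strxfrm() key for s under the
 * current LC_COLLATE setting, or NULL if allocation fails.
 */
char *xfrm_dup(const char *s)
{
    /* With n == 0 the destination may be NULL; the result is the key length. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(need);

    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/*
 * Hypothetical helper: compare two strings through their transformed keys.
 * Falls back to strcoll() if a key could not be allocated.
 */
int cmp_via_xfrm(const char *a, const char *b)
{
    char *ka = xfrm_dup(a);
    char *kb = xfrm_dup(b);
    int r = (ka != NULL && kb != NULL) ? strcmp(ka, kb) : strcoll(a, b);

    free(ka);
    free(kb);
    return r;
}
```

In the default "C" locale this degenerates to a plain byte comparison, so the sketch only shows the calling pattern, not any locale-specific behavior.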

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Robert Haas
On Fri, Jun 9, 2017 at 1:45 PM, Peter Eisentraut wrote:
> On 6/9/17 12:17, Robert Haas wrote:
>> IOW, suppose there
>> were a collation API call distill() which had the property that
>> strcmp(distill(X), distill(Y)) == 0 iff X and Y are considered equal
>> under that collation. Then, you could d

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2017 at 10:45 AM, Robert Haas wrote:
>> But they are getting the sort order they need. They just don't get the
>> equality semantics they expect.
>
> You're right.

If we happened to ever guarantee the user a stable sort, then I'd be wrong. We don't, though.

--
Peter Geoghegan

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Peter Eisentraut
On 6/9/17 12:17, Robert Haas wrote:
> IOW, suppose there
> were a collation API call distill() which had the property that
> strcmp(distill(X), distill(Y)) == 0 iff X and Y are considered equal
> under that collation. Then, you could define your hash function as
> hash_any(distill(X)). Alternativ
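
The distill() idea quoted above can be sketched as follows. This is only an illustration under loud assumptions: distill() here is a toy stand-in that folds ASCII case (so the "collation" treats "Abc" and "aBC" as equal), and hash_bytes() is a plain FNV-1a hash standing in for hash_any(). Neither is a real collation API or the PostgreSQL function.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical distill(): maps every string to a canonical form such that
 * strcmp(distill(X), distill(Y)) == 0 iff X and Y are equal under the toy
 * collation (ASCII case-insensitive). Returns malloc'd memory or NULL.
 */
char *distill(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[n] = '\0';
    return out;
}

/* Stand-in for hash_any(): 32-bit FNV-1a over a NUL-terminated string. */
uint32_t hash_bytes(const char *p)
{
    uint32_t h = 2166136261u;

    for (; *p != '\0'; p++)
    {
        h ^= (unsigned char) *p;
        h *= 16777619u;
    }
    return h;
}

/* Hash of the distilled form: equal-under-collation strings hash alike. */
uint32_t hash_distilled(const char *s)
{
    char *d = distill(s);
    uint32_t h = (d != NULL) ? hash_bytes(d) : 0;

    free(d);
    return h;
}
```

With this shape, strings the collation considers equal always produce the same hash value, which is the property a hash function needs if equality is defined by the collation rather than by bytes.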

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Robert Haas
On Fri, Jun 9, 2017 at 12:18 PM, Peter Geoghegan wrote:
> On Fri, Jun 9, 2017 at 9:17 AM, Robert Haas wrote:
>> I'm not exactly sure what is possible or
>> desirable, but I would not be too surprised to hear complaints about
>> the observed behavior different from the "pure" ICU behavior because

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2017 at 9:17 AM, Robert Haas wrote:
> I'm not exactly sure what is possible or
> desirable, but I would not be too surprised to hear complaints about
> the observed behavior different from the "pure" ICU behavior because
> of the tiebreak, and at least some users might even find it

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Robert Haas
On Fri, Jun 9, 2017 at 11:46 AM, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 6/9/17 11:12, Tom Lane wrote:
>>> https://www.postgresql.org/message-id/27064.1134753...@sss.pgh.pa.us
>
>> Good to know. That just says that if we were to go with the strcoll()
>> result only, things would work co

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Tom Lane
Peter Eisentraut writes:
> On 6/9/17 11:12, Tom Lane wrote:
>> https://www.postgresql.org/message-id/27064.1134753...@sss.pgh.pa.us

> Good to know. That just says that if we were to go with the strcoll()
> result only, things would work correctly.

There's still the hashing problem.

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Peter Eisentraut
On 6/9/17 11:12, Tom Lane wrote:
> Robert Haas writes:
>> I have to admit that I'm still a little confused about what's actually
>> going on here. Commit says that it "fixes inconsistent behavior under
>> glibc's hu_HU locale", but it doesn't say what sort of inconsistent
>> behavior it fixes.
>

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Tom Lane
Robert Haas writes:
> I have to admit that I'm still a little confused about what's actually
> going on here. Commit says that it "fixes inconsistent behavior under
> glibc's hu_HU locale", but it doesn't say what sort of inconsistent
> behavior it fixes.

Unfortunately we were not good back then

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Peter Eisentraut
On 6/9/17 10:31, Robert Haas wrote:
> + * In some locales strcoll() can claim that nonidentical strings are
> + * equal. Believing that would be bad news for a number of reasons,
> + * so we follow Perl's lead and sort "equal" strings according to
> + * strcmp().
>

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-09 Thread Robert Haas
On Fri, Jun 2, 2017 at 2:22 PM, Peter Geoghegan wrote:
> On Fri, Jun 2, 2017 at 10:34 AM, Amit Khandekar wrote:
>> Ok. I was thinking we are doing the tie-breaker because specifically
>> strcoll_l() was unexpectedly returning 0 for some cases. Now I get it,
>> that we do that to be compatible

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-08 Thread Amit Khandekar
On 2 June 2017 at 23:52, Peter Geoghegan wrote:
> On Fri, Jun 2, 2017 at 10:34 AM, Amit Khandekar wrote:
>> Ok. I was thinking we are doing the tie-breaker because specifically
>> strcoll_l() was unexpectedly returning 0 for some cases. Now I get it,
>> that we do that to be compatible with te

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-02 Thread Peter Geoghegan
On Fri, Jun 2, 2017 at 10:34 AM, Amit Khandekar wrote:
> Ok. I was thinking we are doing the tie-breaker because specifically
> strcoll_l() was unexpectedly returning 0 for some cases. Now I get it,
> that we do that to be compatible with texteq().

Both of these explanations are correct, in a way

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-02 Thread Amit Khandekar
On 2 June 2017 at 03:18, Thomas Munro wrote:
> On Fri, Jun 2, 2017 at 9:27 AM, Peter Geoghegan wrote:
>> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro wrote:
>>> Why should ICU be any different than the system provider in this
>>> respect? In both cases, we have a two-level comparison: first

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Thomas Munro
On Fri, Jun 2, 2017 at 9:27 AM, Peter Geoghegan wrote:
> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro wrote:
>> Why should ICU be any different than the system provider in this
>> respect? In both cases, we have a two-level comparison: first we use
>> the collation-aware comparison, and then a

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Tom Lane
Peter Geoghegan writes:
> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro wrote:
>> Why should ICU be any different than the system provider in this
>> respect? In both cases, we have a two-level comparison: first we use
>> the collation-aware comparison, and then as a tie breaker, we use a
>> bi

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Peter Geoghegan
On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro wrote:
> Why should ICU be any different than the system provider in this
> respect? In both cases, we have a two-level comparison: first we use
> the collation-aware comparison, and then as a tie breaker, we use a
> binary comparison. If we didn't do

Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Thomas Munro
On Fri, Jun 2, 2017 at 6:58 AM, Amit Khandekar wrote:
> While comparing two text strings using varstr_cmp(), if *strcoll*()
> call returns 0, we do strcmp() tie-breaker to do binary comparison,
> because strcoll() can return 0 for non-identical strings :
>
> varstr_cmp()
> {
> ...
> /*
> * In some
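
The two-level comparison described in this thread (collation-aware comparison first, strcmp() only when that reports a tie) might look roughly like the sketch below. It is not the varstr_cmp() source. The collate_fn type, ci_collate(), and two_level_cmp() are hypothetical names, and ci_collate() is a toy ASCII case-insensitive comparator standing in for a collation that can return 0 for non-identical strings.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical signature for a collation-aware comparator. */
typedef int (*collate_fn)(const char *, const char *);

/*
 * Toy stand-in for strcoll(): ASCII case-insensitive comparison, so that
 * non-identical strings such as "abc" and "ABC" compare as equal.
 */
int ci_collate(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0;           /* both strings ended together */
    }
}

/*
 * Two-level comparison: trust the collation unless it reports a tie, in
 * which case fall back to a binary comparison so that only byte-identical
 * strings ever compare as equal.
 */
int two_level_cmp(collate_fn coll, const char *a, const char *b)
{
    int r = coll(a, b);

    if (r == 0)
        r = strcmp(a, b);
    return r;
}
```

The tie-breaker keeps the ordering consistent with a bytewise equality operator: the result is 0 only when the strings are byte-for-byte identical, at the cost of never treating collation-equal strings as equal.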