Re: Collation version tracking for macOS

2022-06-08 Thread Jeremy Schneider
> On Jun 8, 2022, at 03:19, Thomas Munro wrote: > > On Wed, Jun 8, 2022 at 12:23 PM Peter Geoghegan wrote: >> ISTM that there are two mostly-distinct questions here: >> >> 1. How do we link to multiple versions of ICU at the same time, in a >> way that is going to work smoothly on

Re: Collation version tracking for macOS

2022-06-08 Thread Jeremy Schneider
New emoji are getting added with some frequency, it’s a thing lately… New Unicode chars use existing but unassigned code points. All code points are able to be encoded, claimed or unclaimed. Someone on old glibc or ICU can still store the new characters. As long as there’s an input field. You

Re: Collation version tracking for macOS

2022-06-08 Thread Thomas Munro
On Thu, Jun 9, 2022 at 5:42 AM Tom Lane wrote: > I'm sure that Apple are indeed updating the UTF8 data behind > their proprietary i18n APIs, but the libc APIs are mostly getting benign > neglect. As for how exactly they might be doing that, I don't know, but a bit of light googling tells me that

Re: Collation version tracking for macOS

2022-06-08 Thread Mark Dilger
> On Jun 7, 2022, at 1:10 PM, Tom Lane wrote: > > This is not the concern that I have. I agree that if we tell a user > that collation X changed behavior and he'd better reindex his indexes > that use collation X, but none of them actually contain any cases that > changed behavior, that's

Re: Collation version tracking for macOS

2022-06-08 Thread Robert Haas
On Wed, Jun 8, 2022 at 4:02 PM Tom Lane wrote: > I'm very skeptical of this process as being a reason to push users > to reindex everything in sight. If U+ was not a thing last year, > there's no reason to expect that it appears in anyone's existing data, > and therefore the fact that it

Re: Collation version tracking for macOS

2022-06-08 Thread Tom Lane
"Daniel Verite" writes: > Independently of these rules, all Unicode collations change frequently > because each release of Unicode adds new characters. Any string > that contains a code point that was previously unassigned is going > to be sorted differently by all collations when that code point

Re: Collation version tracking for macOS

2022-06-08 Thread Peter Geoghegan
On Wed, Jun 8, 2022 at 10:51 AM Tom Lane wrote: > Their POSIX collations seem to be legacy code that's entirely unrelated to > any modern collation support; in particular the "UTF8" ones are that in > name only. I'm sure that Apple are indeed updating the UTF8 data behind > their proprietary

Re: Collation version tracking for macOS

2022-06-08 Thread Daniel Verite
Tom Lane wrote: > Yeah, and it's exactly at the level of quirks that things are likely > to change. Nobody's going to suddenly start sorting B before A. > They might, say, change their minds about where the digram "cz" > sorts relative to single letters, in languages where special rules

Re: Collation version tracking for macOS

2022-06-08 Thread Tom Lane
Robert Haas writes: > On Tue, Jun 7, 2022 at 4:10 PM Tom Lane wrote: >> I mean by "false positive" is telling every macOS user that they'd better >> reindex everything every year, when in point of fact Apple changes those >> collations almost never. > Do we actually know that to be true? Given

Re: Collation version tracking for macOS

2022-06-08 Thread Tom Lane
Robert Haas writes: > On Tue, Jun 7, 2022 at 3:53 PM Tom Lane wrote: >> No, I quite agree that we have a problem. What I don't agree is that >> issuing a lot of false-positive warnings is a solution. > I mean, how many false-positive warnings do you think we'll get? The proposed patch would

Re: Collation version tracking for macOS

2022-06-08 Thread Robert Haas
On Tue, Jun 7, 2022 at 4:10 PM Tom Lane wrote: > I mean by "false positive" is telling every macOS user that they'd better > reindex everything every year, when in point of fact Apple changes those > collations almost never. Do we actually know that to be true? Given how fast things seem to be

Re: Collation version tracking for macOS

2022-06-08 Thread Robert Haas
On Tue, Jun 7, 2022 at 3:53 PM Tom Lane wrote: > No, I quite agree that we have a problem. What I don't agree is that > issuing a lot of false-positive warnings is a solution. That will > just condition people to ignore the warnings, and then when their > platform really does change behavior,

Re: Collation version tracking for macOS

2022-06-08 Thread Thomas Munro
On Wed, Jun 8, 2022 at 12:23 PM Peter Geoghegan wrote: > ISTM that there are two mostly-distinct questions here: > > 1. How do we link to multiple versions of ICU at the same time, in a > way that is going to work smoothly on mainstream platforms? > > 2. What semantics around collations do we

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Tue, Jun 7, 2022 at 4:29 PM Thomas Munro wrote: > The difference is that Debian has libllvm-{11,12,13,14}-dev packages, > but it does *not* have multiple -dev packages for libicu, just a > single libicu-dev which can be used to compile and link against their > chosen current library version.

Re: Collation version tracking for macOS

2022-06-07 Thread Thomas Munro
On Wed, Jun 8, 2022 at 10:59 AM Peter Geoghegan wrote: > On Tue, Jun 7, 2022 at 3:27 PM Thomas Munro wrote: > > Yeah, it's possible to link against multiple versions in theory and > > that might be a way to do it if we were shipping our own N copies of > > ICU like DB2 does, but that's hard in

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Tue, Jun 7, 2022 at 3:27 PM Thomas Munro wrote: > Yeah, it's possible to link against multiple versions in theory and > that might be a way to do it if we were shipping our own N copies of > ICU like DB2 does, but that's hard in practice for shared libraries on > common distros (and vendoring

Re: Collation version tracking for macOS

2022-06-07 Thread Thomas Munro
On Wed, Jun 8, 2022 at 8:16 AM Peter Geoghegan wrote: > On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro wrote: > > Earlier I mentioned distinct "providers" but I take that back, that's > > too complicated. Reprising an old idea that comes up each time we > > talk about this, this time with some

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Tue, Jun 7, 2022 at 2:13 PM Jeremy Schneider wrote: > For my part, gut feeling is that MacOS major releases will be > similar to any other OS major release, which may contain updates to > collation algorithms and locales. ISTM like the same thing PG is looking > for on other OS's to

Re: Collation version tracking for macOS

2022-06-07 Thread Bruce Momjian
On Tue, Jun 7, 2022 at 03:43:32PM -0400, Tom Lane wrote: > Thomas Munro writes: > > On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor wrote: > >> Is this more involved than creating a list of all valid Unicode characters > >> (~144 thousand), sorting them, then running crc32 over the sorted order to

Re: Collation version tracking for macOS

2022-06-07 Thread Jeremy Schneider
On 6/7/22 1:51 PM, Peter Geoghegan wrote: > On Tue, Jun 7, 2022 at 1:24 PM Jeremy Schneider > wrote: >> This idea does seem to persist. It's not as frequent as timezones, but >> collation rules reflect local dialects and customs, and there are >> changes quite regularly for a variety of reasons.

Re: Collation version tracking for macOS

2022-06-07 Thread Robert Haas
On Tue, Jun 7, 2022 at 4:24 PM Jeremy Schneider wrote: > I haven't yet found a Red Hat minor release that changed > glibc collation. I feel like this is a thing that happens regularly enough that it's known to be a gotcha by many of my colleagues here at EDB. Perhaps that's all pure fiction,

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Tue, Jun 7, 2022 at 1:24 PM Jeremy Schneider wrote: > This idea does seem to persist. It's not as frequent as timezones, but > collation rules reflect local dialects and customs, and there are > changes quite regularly for a variety of reasons. A brief perusal of > CLDR changelogs and CLDR

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Tue, Jun 7, 2022 at 1:16 PM Tom Lane wrote: > This is not the concern that I have. I agree that if we tell a user > that collation X changed behavior and he'd better reindex his indexes > that use collation X, but none of them actually contain any cases that > changed behavior, that's not a

Re: Collation version tracking for macOS

2022-06-07 Thread Jeremy Schneider
On 6/7/22 12:53 PM, Peter Geoghegan wrote: > > Collations by their very nature are unlikely to change all that much. > Obviously they can and do change, but the details are presumably > pretty insignificant to a native speaker. This idea does seem to persist. It's not as frequent as timezones,

Re: Collation version tracking for macOS

2022-06-07 Thread Thomas Munro
On Wed, Jun 8, 2022 at 7:43 AM Tom Lane wrote: > The idea of fingerprinting a collation's behavior is interesting, > but I've got doubts about whether we can make a sufficiently thorough > fingerprint. On one of the many threads about this I recall posting a thought experiment patch that added

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro wrote: > Earlier I mentioned distinct "providers" but I take that back, that's > too complicated. Reprising an old idea that comes up each time we > talk about this, this time with some more straw-man detail: what about > teaching our ICU support to

Re: Collation version tracking for macOS

2022-06-07 Thread Tom Lane
Peter Geoghegan writes: > I agree that "false positive" is not a valid way of describing a > breaking change in a Postgres collation that happens to not affect one > index in particular, due to the current phase of the moon. It's > probably very likely that most individual indexes that we warn

Re: Collation version tracking for macOS

2022-06-07 Thread Peter Geoghegan
On Tue, Jun 7, 2022 at 12:37 PM Robert Haas wrote: > It's true that we don't have any false positives right now, but we > also have no true positives. Even a stopped clock is right twice a > day, but not in a useful way. People want to be notified when a > problem might exist, even if sometimes

Re: Collation version tracking for macOS

2022-06-07 Thread Tom Lane
Robert Haas writes: > In fact, I'd go so far as to argue that you're basically sticking your > head in the sand here. You wrote: No, I quite agree that we have a problem. What I don't agree is that issuing a lot of false-positive warnings is a solution. That will just condition people to

Re: Collation version tracking for macOS

2022-06-07 Thread Tom Lane
Thomas Munro writes: > On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor wrote: >> Is this more involved than creating a list of all valid Unicode characters >> (~144 thousand), sorting them, then running crc32 over the sorted order to >> create the "version" for the library/collation pair? Far from

Re: Collation version tracking for macOS

2022-06-07 Thread Robert Haas
On Fri, Jun 3, 2022 at 4:58 PM Tom Lane wrote: > I think the real problem here is that the underlying software mostly > doesn't take this issue seriously. Unfortunately, that leads one to > the conclusion that we need to maintain our own collation code and > data (e.g., our own fork of ICU), and

Re: Collation version tracking for macOS

2022-06-07 Thread Thomas Munro
On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor wrote: > Is this more involved than creating a list of all valid Unicode characters > (~144 thousand), sorting them, then running crc32 over the sorted order to > create the "version" for the library/collation pair? Far from free but few > databases
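The fingerprinting idea quoted above (sort the valid code points, then run crc32 over the sorted order) could be sketched roughly as follows. This is only an illustration, not anything proposed as a patch in the thread: the function names, the `sort_key` parameter, and the use of Python's `unicodedata` tables to decide which code points count as "valid" are all assumptions. It also compares single code points only, so it would not notice changes to multi-character rules such as the "cz" digram mentioned elsewhere in the thread, which is one reason such a fingerprint might not be thorough enough.

```python
import sys
import unicodedata
import zlib


def assigned_code_points():
    """Yield each character that Python's Unicode tables treat as assigned."""
    for cp in range(sys.maxunicode + 1):
        # Skip unassigned (Cn), surrogate (Cs) and private-use (Co) code points.
        if unicodedata.category(chr(cp)) not in ("Cn", "Cs", "Co"):
            yield chr(cp)


def collation_fingerprint(sort_key, chars=None):
    """Return a CRC-32 over the given characters in collation order.

    sort_key is a hypothetical stand-in for the collation under test:
    any callable that maps a one-character string to a comparable key.
    """
    chars = list(assigned_code_points()) if chars is None else list(chars)
    # Break ties by code point so equal keys do not make the result
    # depend on the input order.
    ordered = sorted(chars, key=lambda ch: (sort_key(ch), ord(ch)))
    crc = 0
    for ch in ordered:
        # Feed each character into the running checksum in sorted order.
        crc = zlib.crc32(ch.encode("utf-8"), crc)
    return crc
```

Calling `collation_fingerprint(ord)` fingerprints plain code point order. A real collation would be plugged in through `sort_key`, for example a libc or ICU sort-key function; some of those may reject control characters such as U+0000, which would then need to be filtered out first. Two library versions that order any pair of single code points differently would be expected to produce different values, subject to the usual CRC collision caveats.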

Re: Collation version tracking for macOS

2022-06-07 Thread Rod Taylor
On Mon, Jun 6, 2022 at 8:25 PM Tom Lane wrote: > Jim Nasby writes: > >> I think the real problem here is that the underlying software mostly > >> doesn't take this issue seriously. > > > The first step to a solution is admitting that the problem exists. > > Ignoring broken backups, segfaults

Re: Collation version tracking for macOS

2022-06-06 Thread Thomas Munro
On Tue, Jun 7, 2022 at 12:10 PM Jim Nasby wrote: > On 6/3/22 3:58 PM, Tom Lane wrote > > Thomas Munro writes: > >> On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider > >> wrote: > >>> It feels to me like we're still not really thinking clearly about this > >>> within the PG community, and that the

Re: Collation version tracking for macOS

2022-06-06 Thread Jeremy Schneider
> On Jun 6, 2022, at 17:10, Jim Nasby wrote: > Ignoring broken backups, segfaults and data corruption as a "rant" implies > that we simply throw in the towel and tell users to suck it up or switch > engines. Well now, let’s be clear, I was the one who called my email a “rant”.  And I do

Re: Collation version tracking for macOS

2022-06-06 Thread Tom Lane
Jim Nasby writes: >> I think the real problem here is that the underlying software mostly >> doesn't take this issue seriously. > The first step to a solution is admitting that the problem exists. > Ignoring broken backups, segfaults and data corruption as a "rant" > implies that we simply

Re: Collation version tracking for macOS

2022-06-03 Thread Tom Lane
Thomas Munro writes: > On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider > wrote: >> It feels to me like we're still not really thinking clearly about this >> within the PG community, and that the seriousness of this issue is not >> fully understood. > FWIW A couple of us tried quite hard to make

Re: Collation version tracking for macOS

2022-06-03 Thread Thomas Munro
On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider wrote: > No other piece of software that calls itself a database would do what > PostgreSQL is doing: just give users a "warning" after suddenly changing > the sort order algorithm (most users won't even read warnings in their > logs). Oracle, DB2,

Re: Collation version tracking for macOS

2022-06-03 Thread Thomas Munro
On Sat, Jun 4, 2022 at 12:17 AM Peter Eisentraut wrote: > On 07.05.22 02:31, Thomas Munro wrote: > > Last time I looked into this it seemed like macOS's strcoll() gave > > sensible answers in the traditional single-byte encodings, but didn't > > understand UTF-8 at all so you get C/strcmp()

Re: Collation version tracking for macOS

2022-06-03 Thread Jeremy Schneider
On 6/3/22 9:21 AM, Tom Lane wrote: > > According to that document, they changed it in macOS 11, which came out > a year and a half ago. Given the lack of complaints, it doesn't seem > like this is urgent enough to mandate a post-beta change that would > have lots of downside (namely,

Re: Collation version tracking for macOS

2022-06-03 Thread Tom Lane
Peter Eisentraut writes: > On 07.05.22 02:31, Thomas Munro wrote: >> Last time I looked into this it seemed like macOS's strcoll() gave >> sensible answers in the traditional single-byte encodings, but didn't >> understand UTF-8 at all so you get C/strcmp() order. In other words >> there was

Re: Collation version tracking for macOS

2022-06-03 Thread Peter Eisentraut
On 07.05.22 02:31, Thomas Munro wrote: During development, I have been using the attached patch to simulate libc collation versions on macOS. It just uses the internal major OS version number. I don't know to what extent the libc locales on macOS are maintained or updated at all, so I

Re: Collation version tracking for macOS

2022-05-06 Thread Thomas Munro
On Mon, Feb 14, 2022 at 10:00 PM Peter Eisentraut wrote: > During development, I have been using the attached patch to simulate > libc collation versions on macOS. It just uses the internal major OS > version number. I don't know to what the extend the libc locales on > macOS are maintained or

Collation version tracking for macOS

2022-02-14 Thread Peter Eisentraut
Eisentraut Date: Tue, 1 Feb 2022 16:07:29 +0100 Subject: [PATCH] Collation version tracking for macOS --- src/backend/utils/adt/pg_locale.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index
