> On Jun 8, 2022, at 03:19, Thomas Munro wrote:
>
> On Wed, Jun 8, 2022 at 12:23 PM Peter Geoghegan wrote:
>> ISTM that there are two mostly-distinct questions here:
>>
>> 1. How do we link to multiple versions of ICU at the same time, in a
>> way that is going to work smoothly on
New emoji are getting added with some frequency, it’s a thing lately…
New Unicode chars use existing but unassigned code points. All code points are
able to be encoded, claimed or unclaimed.
Someone on old glibc or ICU can still store the new characters. As long as
there’s an input field. You
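The point that unassigned code points can still be encoded and stored can be checked with a short sketch. The helper names `is_assigned` and `utf8_round_trip` are hypothetical, and "assigned" here only reflects the Unicode data bundled with the running Python build, which stands in for an older glibc or ICU:

```python
import unicodedata


def is_assigned(ch):
    """True if this Python build's Unicode data has the code point assigned.

    "Cn" is the general category reported for unassigned code points.
    """
    return unicodedata.category(ch) != "Cn"


def utf8_round_trip(ch):
    """True if the character survives UTF-8 encoding and decoding unchanged."""
    return ch.encode("utf-8").decode("utf-8") == ch


if __name__ == "__main__":
    # U+0378 is used here as an example of a code point with no assignment;
    # whether it is reported as assigned depends on the Unicode data in use.
    for ch in ("A", chr(0x0378), "\U0001FAE0"):
        print(f"U+{ord(ch):04X}",
              "assigned" if is_assigned(ch) else "unassigned",
              "round-trips" if utf8_round_trip(ch) else "does not round-trip")
```

The encoding step does not consult the assignment tables at all, which is why data written today can contain characters that the installed collation library does not yet know how to sort.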
On Thu, Jun 9, 2022 at 5:42 AM Tom Lane wrote:
> I'm sure that Apple are indeed updating the UTF8 data behind
> their proprietary i18n APIs, but the libc APIs are mostly getting benign
> neglect.
As for how exactly they might be doing that, I don't know, but a bit
of light googling tells me that
> On Jun 7, 2022, at 1:10 PM, Tom Lane wrote:
>
> This is not the concern that I have. I agree that if we tell a user
> that collation X changed behavior and he'd better reindex his indexes
> that use collation X, but none of them actually contain any cases that
> changed behavior, that's
On Wed, Jun 8, 2022 at 4:02 PM Tom Lane wrote:
> I'm very skeptical of this process as being a reason to push users
> to reindex everything in sight. If U+ was not a thing last year,
> there's no reason to expect that it appears in anyone's existing data,
> and therefore the fact that it
"Daniel Verite" writes:
> Independently of these rules, all Unicode collations change frequently
> because each release of Unicode adds new characters. Any string
> that contains a code point that was previously unassigned is going
> to be sorted differently by all collations when that code point
On Wed, Jun 8, 2022 at 10:51 AM Tom Lane wrote:
> Their POSIX collations seem to be legacy code that's entirely unrelated to
> any modern collation support; in particular the "UTF8" ones are that in
> name only. I'm sure that Apple are indeed updating the UTF8 data behind
> their proprietary
Tom Lane wrote:
> Yeah, and it's exactly at the level of quirks that things are likely
> to change. Nobody's going to suddenly start sorting B before A.
> They might, say, change their minds about where the digram "cz"
> sorts relative to single letters, in languages where special rules
Robert Haas writes:
> On Tue, Jun 7, 2022 at 4:10 PM Tom Lane wrote:
>> What I mean by "false positive" is telling every macOS user that they'd better
>> reindex everything every year, when in point of fact Apple changes those
>> collations almost never.
> Do we actually know that to be true? Given
Robert Haas writes:
> On Tue, Jun 7, 2022 at 3:53 PM Tom Lane wrote:
>> No, I quite agree that we have a problem. What I don't agree is that
>> issuing a lot of false-positive warnings is a solution.
> I mean, how many false-positive warnings do you think we'll get?
The proposed patch would
On Tue, Jun 7, 2022 at 4:10 PM Tom Lane wrote:
> What I mean by "false positive" is telling every macOS user that they'd better
> reindex everything every year, when in point of fact Apple changes those
> collations almost never.
Do we actually know that to be true? Given how fast things seem to be
On Tue, Jun 7, 2022 at 3:53 PM Tom Lane wrote:
> No, I quite agree that we have a problem. What I don't agree is that
> issuing a lot of false-positive warnings is a solution. That will
> just condition people to ignore the warnings, and then when their
> platform really does change behavior,
On Wed, Jun 8, 2022 at 12:23 PM Peter Geoghegan wrote:
> ISTM that there are two mostly-distinct questions here:
>
> 1. How do we link to multiple versions of ICU at the same time, in a
> way that is going to work smoothly on mainstream platforms?
>
> 2. What semantics around collations do we
On Tue, Jun 7, 2022 at 4:29 PM Thomas Munro wrote:
> The difference is that Debian has libllvm-{11,12,13,14}-dev packages,
> but it does *not* have multiple -dev packages for libicu, just a
> single libicu-dev which can be used to compile and link against their
> chosen current library version.
On Wed, Jun 8, 2022 at 10:59 AM Peter Geoghegan wrote:
> On Tue, Jun 7, 2022 at 3:27 PM Thomas Munro wrote:
> > Yeah, it's possible to link against multiple versions in theory and
> > that might be a way to do it if we were shipping our own N copies of
> > ICU like DB2 does, but that's hard in
On Tue, Jun 7, 2022 at 3:27 PM Thomas Munro wrote:
> Yeah, it's possible to link against multiple versions in theory and
> that might be a way to do it if we were shipping our own N copies of
> ICU like DB2 does, but that's hard in practice for shared libraries on
> common distros (and vendoring
On Wed, Jun 8, 2022 at 8:16 AM Peter Geoghegan wrote:
> On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro wrote:
> > Earlier I mentioned distinct "providers" but I take that back, that's
> > too complicated. Reprising an old idea that comes up each time we
> > talk about this, this time with some
On Tue, Jun 7, 2022 at 2:13 PM Jeremy Schneider
wrote:
> For my part, gut feeling is that MacOS major releases will be
> similar to any other OS major release, which may contain updates to
> collation algorithms and locales. ISTM like the same thing PG is looking
> for on other OS's to
On Tue, Jun 7, 2022 at 03:43:32PM -0400, Tom Lane wrote:
> Thomas Munro writes:
> > On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor wrote:
> >> Is this more involved than creating a list of all valid Unicode characters
> >> (~144 thousand), sorting them, then running crc32 over the sorted order to
On 6/7/22 1:51 PM, Peter Geoghegan wrote:
> On Tue, Jun 7, 2022 at 1:24 PM Jeremy Schneider
> wrote:
>> This idea does seem to persist. It's not as frequent as timezones, but
>> collation rules reflect local dialects and customs, and there are
>> changes quite regularly for a variety of reasons.
On Tue, Jun 7, 2022 at 4:24 PM Jeremy Schneider
wrote:
> I haven't yet found a Red Hat minor release that changed
> glibc collation.
I feel like this is a thing that happens regularly enough that it's
known to be a gotcha by many of my colleagues here at EDB.
Perhaps that's all pure fiction,
On Tue, Jun 7, 2022 at 1:24 PM Jeremy Schneider
wrote:
> This idea does seem to persist. It's not as frequent as timezones, but
> collation rules reflect local dialects and customs, and there are
> changes quite regularly for a variety of reasons. A brief perusal of
> CLDR changelogs and CLDR
On Tue, Jun 7, 2022 at 1:16 PM Tom Lane wrote:
> This is not the concern that I have. I agree that if we tell a user
> that collation X changed behavior and he'd better reindex his indexes
> that use collation X, but none of them actually contain any cases that
> changed behavior, that's not a
On 6/7/22 12:53 PM, Peter Geoghegan wrote:
>
> Collations by their very nature are unlikely to change all that much.
> Obviously they can and do change, but the details are presumably
> pretty insignificant to a native speaker.
This idea does seem to persist. It's not as frequent as timezones,
On Wed, Jun 8, 2022 at 7:43 AM Tom Lane wrote:
> The idea of fingerprinting a collation's behavior is interesting,
> but I've got doubts about whether we can make a sufficiently thorough
> fingerprint.
On one of the many threads about this I recall posting a thought
experiment patch that added
On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro wrote:
> Earlier I mentioned distinct "providers" but I take that back, that's
> too complicated. Reprising an old idea that comes up each time we
> talk about this, this time with some more straw-man detail: what about
> teaching our ICU support to
Peter Geoghegan writes:
> I agree that "false positive" is not a valid way of describing a
> breaking change in a Postgres collation that happens to not affect one
> index in particular, due to the current phase of the moon. It's
> probably very likely that most individual indexes that we warn
On Tue, Jun 7, 2022 at 12:37 PM Robert Haas wrote:
> It's true that we don't have any false positives right now, but we
> also have no true positives. Even a stopped clock is right twice a
> day, but not in a useful way. People want to be notified when a
> problem might exist, even if sometimes
Robert Haas writes:
> In fact, I'd go so far as to argue that you're basically sticking your
> head in the sand here. You wrote:
No, I quite agree that we have a problem. What I don't agree is that
issuing a lot of false-positive warnings is a solution. That will
just condition people to
Thomas Munro writes:
> On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor wrote:
>> Is this more involved than creating a list of all valid Unicode characters
>> (~144 thousand), sorting them, then running crc32 over the sorted order to
>> create the "version" for the library/collation pair? Far from
On Fri, Jun 3, 2022 at 4:58 PM Tom Lane wrote:
> I think the real problem here is that the underlying software mostly
> doesn't take this issue seriously. Unfortunately, that leads one to
> the conclusion that we need to maintain our own collation code and
> data (e.g., our own fork of ICU), and
On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor wrote:
> Is this more involved than creating a list of all valid Unicode characters
> (~144 thousand), sorting them, then running crc32 over the sorted order to
> create the "version" for the library/collation pair? Far from free but few
> databases
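The fingerprinting idea quoted above (sort the characters, then run crc32 over the sorted order) can be sketched roughly as follows. This is only an illustration under assumptions: `collation_fingerprint` and its `sort_key` parameter are hypothetical names, not anything from the proposed patch, and the caller chooses the sort key (for example `locale.strxfrm` after `locale.setlocale`).

```python
import zlib


def collation_fingerprint(code_points, sort_key):
    """Return a CRC-32 (hex) over the order that sort_key imposes on code_points.

    code_points: iterable of integer code points to include.
    sort_key:    function mapping a one-character string to a comparable key.
    """
    chars = [chr(cp) for cp in code_points]
    # Break ties by code point so characters that compare equal hash stably.
    ordered = sorted(chars, key=lambda ch: (sort_key(ch), ord(ch)))
    # Serialize each code point as 4 big-endian bytes, then checksum the result.
    data = b"".join(ord(ch).to_bytes(4, "big") for ch in ordered)
    return format(zlib.crc32(data), "08x")


if __name__ == "__main__":
    import locale

    locale.setlocale(locale.LC_COLLATE, "")  # use the environment's collation
    letters = range(0x41, 0x7B)              # a small sample, not all of Unicode
    print(collation_fingerprint(letters, locale.strxfrm))
```

A fingerprint built from single characters only captures their relative order. It would not notice a change in how a multi-character sequence such as the digram "cz" sorts, which is the kind of gap behind the doubts raised in this thread about making the fingerprint sufficiently thorough.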
On Mon, Jun 6, 2022 at 8:25 PM Tom Lane wrote:
> Jim Nasby writes:
> >> I think the real problem here is that the underlying software mostly
> >> doesn't take this issue seriously.
>
> > The first step to a solution is admitting that the problem exists.
> > Ignoring broken backups, segfaults
On Tue, Jun 7, 2022 at 12:10 PM Jim Nasby wrote:
> On 6/3/22 3:58 PM, Tom Lane wrote
> > Thomas Munro writes:
> >> On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
> >> wrote:
> >>> It feels to me like we're still not really thinking clearly about this
> >>> within the PG community, and that the
> On Jun 6, 2022, at 17:10, Jim Nasby wrote:
> Ignoring broken backups, segfaults and data corruption as a "rant" implies
> that we simply throw in the towel and tell users to suck it up or switch
> engines.
Well now, let’s be clear, I was the one who called my email a “rant”.
And I do
Jim Nasby writes:
>> I think the real problem here is that the underlying software mostly
>> doesn't take this issue seriously.
> The first step to a solution is admitting that the problem exists.
> Ignoring broken backups, segfaults and data corruption as a "rant"
> implies that we simply
Thomas Munro writes:
> On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
> wrote:
>> It feels to me like we're still not really thinking clearly about this
>> within the PG community, and that the seriousness of this issue is not
>> fully understood.
> FWIW A couple of us tried quite hard to make
On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
wrote:
> No other piece of software that calls itself a database would do what
> PostgreSQL is doing: just give users a "warning" after suddenly changing
> the sort order algorithm (most users won't even read warnings in their
> logs). Oracle, DB2,
On Sat, Jun 4, 2022 at 12:17 AM Peter Eisentraut
wrote:
> On 07.05.22 02:31, Thomas Munro wrote:
> > Last time I looked into this it seemed like macOS's strcoll() gave
> > sensible answers in the traditional single-byte encodings, but didn't
> > understand UTF-8 at all so you get C/strcmp()
On 6/3/22 9:21 AM, Tom Lane wrote:
>
> According to that document, they changed it in macOS 11, which came out
> a year and a half ago. Given the lack of complaints, it doesn't seem
> like this is urgent enough to mandate a post-beta change that would
> have lots of downside (namely,
Peter Eisentraut writes:
> On 07.05.22 02:31, Thomas Munro wrote:
>> Last time I looked into this it seemed like macOS's strcoll() gave
>> sensible answers in the traditional single-byte encodings, but didn't
>> understand UTF-8 at all so you get C/strcmp() order. In other words
>> there was
On 07.05.22 02:31, Thomas Munro wrote:
During development, I have been using the attached patch to simulate
libc collation versions on macOS. It just uses the internal major OS
version number. I don't know to what extent the libc locales on
macOS are maintained or updated at all, so I
On Mon, Feb 14, 2022 at 10:00 PM Peter Eisentraut
wrote:
> During development, I have been using the attached patch to simulate
> libc collation versions on macOS. It just uses the internal major OS
> version number. I don't know to what extent the libc locales on
> macOS are maintained or
Eisentraut
Date: Tue, 1 Feb 2022 16:07:29 +0100
Subject: [PATCH] Collation version tracking for macOS
---
src/backend/utils/adt/pg_locale.c | 26 ++
1 file changed, 26 insertions(+)
diff --git a/src/backend/utils/adt/pg_locale.c
b/src/backend/utils/adt/pg_locale.c
index