> On Jun 8, 2022, at 22:40, Peter Geoghegan <p...@bowt.ie> wrote:
> 
> On Wed, Jun 8, 2022 at 10:24 PM Jeremy Schneider
> <schnei...@ardentperf.com> wrote:
>> Even if PG supports two versions of ICU, how does someone actually go about 
>> removing every dependency on the old version and replacing it with the new?
> 
> They simply REINDEX, without changing anything. The details are still
> fuzzy, but at least that's what I was thinking of.
> 
>> Can it be done without downtime? Can it be done without modifying a running 
>> application?
> 
> Clearly the only way that we can ever transition to a new "physical
> collation" is by reindexing using a newer ICU version. And clearly
> there is going to be a need to fully deprecate any legacy version of
> ICU on a long enough timeline. There is just no getting around that.


I’m probably just going to end up rehashing the old threads I haven’t read yet…

One challenge with this approach is you have things like sort-merge joins that 
require the same collation across multiple objects. So I think you’d need to 
keep all the old indexes around until you have new indexes available for all 
objects in a database, and somehow the planner would need to be smart enough to 
dynamically figure out old vs new versions on a query-by-query basis. May need 
an atomic database-wide cutover; running a DB with internally mixed collation 
versions doesn’t seem like a small challenge. It would require enough disk 
space for two copies of all indexes, and queries would change which indexes 
they use in a way that wouldn’t be immediately obvious to users or app dev. 
Suddenly switching to or from a differently-bloated index could result in 
confusing and sudden performance changes.

Also there would still need to be a plan to address all the other non-index 
objects where collation is used, as has been mentioned before.

And given the current architecture, that final “alter database update default 
collation” command still seems awful risky, bug-prone and difficult to get 
correct. At least it seems that way to me.

At a minimum, this is a very big project and it seems to me like it may be wise 
to get more end-to-end fleshing out of the plans before committing incremental 
pieces in core (which could end up being misguided if the plan doesn’t work as 
well as assumed). Definitely doesn’t seem to me like anything that will happen 
in a year or two.

And my opinion is that the problems caused by depending on OS libraries for 
collation need to be addressed on a shorter timeline than what’s realistic for 
inventing a new way for a relational database to offer transparent or online 
upgrades of linguistic collation versions.

Also I still think folks are overcomplicating this by focusing on linguistic 
collation as the solution. Like 1% of users actually need or care about having 
the latest technically correct local-language-based sorting, at a database 
level. MySQL did the right thing here by doing what every other RDBMS did, and 
just making a simple “good-enough” collation hardcoded in the DB, same across 
all platforms, that never changes.

The 1% of users who need true linguistic collation can probably deal with the 
trade-off of dump-and-load upgrades for their ICU indexes and databases for a 
few more years.

-Jeremy


Sent from my TI-83



Reply via email to