On 11/28/22 14:11, Robert Haas wrote:
On Wed, Nov 23, 2022 at 12:09 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
OK.  Time for a new list of the various models we've discussed so far:

1.  search-by-collversion:  We introduce no new "library version"
concept to COLLATION and DATABASE object and little or no new syntax.

2.  lib-version-in-providers: We introduce a separate provider value
for each ICU version, for example ICU63, plus an unversioned ICU like
today.

3.  lib-version-in-attributes: We introduce daticuversion (alongside
datcollversion) and collicuversion (alongside collversion).  Similar
to the above, but it's a separate property and the provider is always
ICU.  New syntax for CREATE/ALTER COLLATION/DATABASE to set and change
ICU_VERSION.

4.  lib-version-in-locale:  "63:en" from earlier versions.  That was
mostly a strawman proposal to avoid getting bogged down in
syntax/catalogue/model change discussions while trying to prove that
dlopen would even work.  It doesn't sound like anyone really likes
this.

5.  lib-version-in-collversion:  We didn't explicitly discuss this
before, but you hinted at it: we could just use u_getVersion() in
[dat]collversion.

I'd like to vote against #3 at least in the form that's described
here. If we had three more libraries providing collations, it's likely
that they would need versioning, too. So if we add an explicit notion
of provider version, then it ought not to be specific to libicu.

+many

I think it's OK to decide that different library versions are
different providers (your option #2), or that they are the same
provider but give rise to different collations (your option #4), or
that there can be multiple versions of each collation which are
distinguished by some additional provider version field (your #3 made
more generic).

I think provider and collation version are distinct concepts. The provider ('c' versus 'i' for example) determines a unique code path in the backend due to different APIs, whereas collation version is related to a specific ordering given a set of characters.


I don't really understand #1 or #5 well enough to have an educated
opinion, but I do think that #1 seems a bit magical. It hopes that the
combination of a collation name and a datcollversion will be
sufficient to find exactly one matching collation in a list of provided
libraries. The advantage of that, as I understand it, is that if you
do something to your system that causes the number of matches to go
from one to zero, you can just throw another library on the pile and
get the number back up to one. Woohoo! But there's a part of me that
worries: what if the number goes up to two, and they're not all the
same? Probably that's something that shouldn't happen, but if it does
then I think there's kind of no way to fix it. With the other options,
if there's some way to jigger the catalog state to match what you want
to happen, you can always repair the situation somehow, because the
library to be used for each collation is explicitly specified in some
way, and you just have to get it to match what you want to have
happen.
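The lookup that search-by-collversion (#1) implies, and the "zero, one, or two matches" worry above, can be made concrete with a minimal sketch. Everything in it is hypothetical: `CollationLibrary`, `find_library`, the library names and the version strings stand in for whatever a real implementation would load. The sketch can refuse to guess when two libraries match, but it has no catalog-level knob to say which one was meant.

```python
from dataclasses import dataclass


@dataclass
class CollationLibrary:
    """Hypothetical stand-in for one loadable collation library (one ICU build)."""
    name: str
    # collation name -> collation version string this library reports for it
    versions: dict[str, str]


class MissingCollation(LookupError):
    """No loaded library reports the stored version (the 'zero matches' case)."""


class AmbiguousCollation(LookupError):
    """Several loaded libraries report the stored version (the 'two matches' case)."""


def find_library(libraries: list[CollationLibrary], collname: str,
                 collversion: str) -> CollationLibrary:
    """Search-by-collversion (#1): return the one library whose reported version of
    `collname` equals the stored `collversion`, refusing to guess otherwise."""
    matches = [lib for lib in libraries if lib.versions.get(collname) == collversion]
    if not matches:
        raise MissingCollation(
            f"no loaded library provides {collname!r} at version {collversion!r}")
    if len(matches) > 1:
        names = ", ".join(lib.name for lib in matches)
        raise AmbiguousCollation(
            f"{collname!r} at version {collversion!r} is provided by: {names}")
    return matches[0]
```

Adding a library is how the zero-match case gets fixed; the same action can silently turn a one-match case into an `AmbiguousCollation`, which is the situation the text says has no obvious repair.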

My vote is for something like #5. The collversion should indicate a specific immutable ordering behavior.
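One way to read lib-version-in-collversion (#5) is that [dat]collversion would store the library version reported by ICU's u_getVersion(), so choosing a library becomes an exact-key lookup instead of a search. The following is a minimal sketch under that assumption. The dotted-string format, the `build_registry`/`resolve` helpers and the library file names are all hypothetical, and `version_string` is only a simplified stand-in for formatting ICU's four-component `UVersionInfo`.

```python
def version_string(parts: tuple[int, ...]) -> str:
    """Join a version tuple such as (63, 1, 0, 0) into "63.1.0.0". Simplified,
    hypothetical stand-in for formatting the UVersionInfo from u_getVersion()."""
    return ".".join(str(p) for p in parts)


def build_registry(loaded: list[tuple[tuple[int, ...], str]]) -> dict[str, str]:
    """Map each loaded library's version string to a (hypothetical) library handle.
    Two libraries claiming the same version are rejected here, at load time."""
    registry: dict[str, str] = {}
    for parts, handle in loaded:
        key = version_string(parts)
        if key in registry:
            raise ValueError(f"library version {key} loaded twice")
        registry[key] = handle
    return registry


def resolve(registry: dict[str, str], collversion: str) -> str:
    """Lib-version-in-collversion (#5): the stored version is itself the key,
    so the lookup yields exactly one library or fails."""
    try:
        return registry[collversion]
    except KeyError:
        raise LookupError(f"no loaded library has version {collversion}") from None
```

In this sketch a duplicate can only appear when libraries are loaded, where it is rejected outright, so a stored collversion never resolves to two candidates.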


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
