On Sat, 2022-11-26 at 18:27 +1300, Thomas Munro wrote:
> On Thu, Nov 24, 2022 at 5:48 PM Thomas Munro <thomas.mu...@gmail.com>
> wrote:
> > On Thu, Nov 24, 2022 at 3:07 PM Jeff Davis <pg...@j-davis.com>
> > wrote:
> > > I'd vote for 1 on the grounds that it's easier to document and
> > > understand a single collation version, which comes straight from
> > > ucol_getVersion(). This approach makes it a separate problem to
> > > find
> > > the collation version among whatever libraries the admin can
> > > provide;
> > > but adding some observability into the search should mitigate any
> > > confusion.
> > 
> > OK, it sounds like I should code that up next.
> 
> Here's the first iteration.

Thank you.

Proposed changes:

* I attached a first pass of some documentation.

* Should be another GUC to turn WARNING into an ERROR. Useful at least
for testing; perhaps too dangerous for production.

* The libraries should be loaded in a more diliberate order. The "*"
should be expanded in a descending fashion so that later versions are
preferred.

* GUCs should be validated.

* Should validate that loaded library has expected version.

* We need to revise or remove pg_collation_actual_version() and
pg_database_collation_actual_version().

* The GUCs are PGC_SUSET, but don't take effect because
icu_library_list_fully_loaded is never reset.

* The extra collations you're adding at bootstrap time are named based
on the library major version. I suppose it might be more "proper" to
name them based on the collation version, but that would be more
verbose, so I won't advocate for that. Just pointing it out.

* It looks hard (or impossible) to mix multiple ICU libraries with the
same major version and different minor versions. That's because,
e.g., libicui18n.so.63.1 links against libicuuc.63 and libicudata.63,
and when you install ICU 63.2, those dependencies get clobbered with
the 63.2 versions. That fails the sanity check I proposed above about
the library version number matching the requested library version
number. And it also just seems wrong -- why would you have minor-
version precision about an ICU library but then only major-version
precision about the ICU dependencies of that library? Doesn't that
defeat the whole purpose of this naming scheme? (Maybe another ICU
bug?).

Minor comments:

* ICU_I18N is defined in make_icu_library_name() but used outside of
it. One solution might be to have it return both library names to the
caller and rename it as make_icu_library_names().

* get_icu_function() could use a clarifying comment or a better name.
Something that communicates that you are looking for the function in
the given library with the given major version number (which may or may
not be needed depending on how the library was compiled).

* typo in comment over make_icu_collator:
s/u_getVersion/ucol_getVersion/

* The return value of make_icu_collator() seems backwards to me,
stylistically. I typically see the false-is-good pattern with integer
returns.

* weird bracketing style in get_icu_collator for the "else"

>   The version rosetta stone functions look like this:
> 
> postgres=# select * from pg_icu_library_versions();
>  icu_version | unicode_version | cldr_version
> -------------+-----------------+--------------
>  67.1        | 13.0            | 37.0
>  63.1        | 11.0            | 34.0
>  57.1        | 8.0             | 29.0
> (3 rows)
> 
> postgres=# select * from pg_icu_collation_versions('zh');
>  icu_version | uca_version | collator_version
> -------------+-------------+------------------
>  67.1        | 13.0        | 153.14.37
>  63.1        | 11.0        | 153.88.34
>  57.1        | 8.0         | 153.64.29
> (3 rows)

I like these functions.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 445fd175d8..b9dba8ac67 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1047,6 +1047,50 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
     </tip>
    </sect3>
   </sect2>
+  <sect2 id="collation-versions">
+   <title>Collation Versions</title>
+
+   <para>
+    Collations are sensitive to the specific collation version, which is
+    obtained from the collation provider library at the time the collation is
+    created (and only updated with <xref linkend="sql-altercollation"/>). If
+    the collation provider library is updated on the system (e.g. due to an
+    operating system upgrade), it may provide a different collation version;
+    but the version recorded in <productname>PostgreSQL</productname> will
+    remain unchanged.
+   </para>
+   <para>
+    New collation versions are generally desirable, as they reflect changes in
+    natural language over time. But these ordering changes can also cause
+    problems, such as the inconsistency of an indexes, which often depend on a
+    stable ordering. If <productname>PostgreSQL</productname> is unable to
+    find a collation in the collation provider that matches the recorded
+    version exactly, it will emit a <literal>WARNING</literal> (configurable
+    with <xref linkend="guc-collation-version-mismatch-error"/>).
+   </para>
+   <sect3 id="multiple-icu-libraries">
+    <title>Multiple ICU collation provider libraries</title>
+    <para>
+     When using the <literal>icu</literal> collation provider,
+     <productname>PostgreSQL</productname> can load multiple ICU collation
+     provider libraries, making it possible to find an exact match for the
+     collation version even if the operating system's ICU library has been
+     upgraded and provides a new collation version.
+    </para>
+    <para>
+     To use additional ICU collation provider libraries, set <xref
+     linkend="guc-icu-library-path"/> to the directory where the ICU libraries
+     are installed, and use <xref linkend="guc-icu-library-versions"/> to
+     control how those libraries are searched.
+    </para>
+    <note>
+     <para>
+      The <literal>libc</literal> collation provider does not allow specifying
+      multiple libraries.
+     </para>
+    </note>
+   </sect3>
+  </sect2>
  </sect1>
 
  <sect1 id="multibyte">
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9fd2075b1e..3809c26b31 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10288,6 +10288,107 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
      </variablelist>
     </sect2>
 
+    <sect2 id="runtime-config-compatible-collation">
+     <title>Collation Version Compatibility</title>
+     <variablelist>
+     <varlistentry id="guc-collation-version-mismatch-error">
+      <term><varname>collation_version_mismatch_error</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>collation_version_mismatch_error</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        If <productname>PostgreSQL</productname> detects mismatched collation
+        versions, and this variable is set to <literal>true</literal>, an
+        error will be raised. If this variable is set to
+        <literal>false</literal>, a warning will be raised instead. The
+        default is <literal>false</literal>.
+       </para>
+       <para>
+        See <xref linkend="collation-versions"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-icu-library-path">
+      <term><varname>icu_library_path</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>icu_library_path</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Set to the directory where additional ICU libraries are installed, to
+        be searched for matching collation versions.
+       </para>
+       <para>
+        See <xref linkend="multiple-icu-libraries"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-icu-library-versions">
+      <term><varname>icu_library_versions</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>icu_library_versions</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        When searching for a matching collation version, search the ICU
+        libraries with the version numbers specified in this setting,
+        separated by commas. By default, only the built-in ICU library is
+        searched.
+       </para>
+       <para>
+        Library version numbers can be specified as either
+        <literal>major_version</literal> or
+        <literal>major_version.minor_version</literal>. By default, the
+        built-in ICU library is used.
+       </para>
+       <para>
+        If this variable is set to <literal>*</literal>,
+        <productname>PostgreSQL</productname> will attempt to load any ICU
+        library in <literal>icu_library_path</literal> with a compatible major
+        version.
+       </para>
+       <para>
+        See <xref linkend="multiple-icu-libraries"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-default-icu-library-version">
+      <term><varname>default_icu_library_version</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_icu_library_version</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        If <productname>PostgreSQL</productname> detects mismatched collation
+        versions, use the collation provided by the ICU library with this
+        version number.
+       </para>
+       <para>
+        Library version numbers can be specified as either
+        <literal>major_version</literal> or
+        <literal>major_version.minor_version</literal>. By default, the
+        built-in ICU library is used.
+       </para>
+       <para>
+        Has no effect if <literal>collation_version_mismatch_error</literal>
+        is set to <literal>true</literal>.
+       </para>
+       <para>
+        See <xref linkend="multiple-icu-libraries"/> for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+    </sect2>
     <sect2 id="runtime-config-compatible-clients">
      <title>Platform and Client Compatibility</title>
      <variablelist>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 68cd4297d2..a9f6258e77 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27180,6 +27180,39 @@ postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);
         Use of this function is restricted to superusers.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_icu_library_versions</primary>
+        </indexterm>
+        <function>pg_icu_library_versions</function> ()
+        <returnvalue>setof record</returnvalue>
+        ( <parameter>icu_version</parameter> <type>text</type>,
+        <parameter>unicode_version</parameter> <type>text</type>,
+        <parameter>cldr_version</parameter> <type>text</type>) )
+       </para>
+       <para>
+        Returns information from each available ICU library.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_icu_collation_versions</primary>
+        </indexterm>
+        <function>pg_icu_collation_versions</function> ( <parameter>locale</parameter> <type>text</type> )
+        <returnvalue>setof record</returnvalue>
+        (<parameter>icu_version</parameter> <type>text</type>,
+        <parameter>uca_version</parameter> <type>text</type>,
+        <parameter>collator_version</parameter> <type>text</type> )
+       </para>
+       <para>
+        Returns the collation version of the given locale from each available
+        ICU library.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>

Reply via email to