Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

Peter Eisentraut Tue, 15 Aug 2017 11:35:19 -0700

On 8/9/17 18:49, Peter Geoghegan wrote:
> I'd like to give a demo on what is already possible, but not currently
> documented. I didn't see anyone else comment on this, including Peter
> E (maybe I missed that?). We should improve the documentation in this
> area, to get this into the hands of users.


Here is a small piece of documentation.  Thoughts?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From a9d5926b68eb6e0e726b7c9838f6ea8b3b22a157 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Tue, 15 Aug 2017 14:31:39 -0400
Subject: [PATCH] doc: Document TR 35 collation options for ICU

---
 doc/src/sgml/charset.sgml | 52 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 48ecfc5f48..7bb645a39f 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -709,6 +709,58 @@ <title>ICU collations</title>
     will draw an error along the lines of <quote>collation "de-x-icu" for
     encoding "WIN874" does not exist</>.
    </para>
+
+   <para>
+    ICU allows collations to be customized beyond the basic
+    language/country/type set that is preloaded by <command>initdb</command>.
+    Users are encouraged to define their own collation objects that make use
+    of these facilities to suit the sorting behavior to their requirements.
+    Here are some examples:
+    <variablelist>
+     <varlistentry>
+      <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit')</literal></term>
+      <listitem>
+       <para>
+        Sort digits after letters.  (The default is digits before letters.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper')</literal></term>
+      <listitem>
+       <para>
+        Sort upper-case letters before lower-case letters.  (The default is
+        lower-case letters first.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit')</literal></term>
+      <listitem>
+       <para>
+        Combines both of the above options.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    </variablelist>
+
+    See <ulink url="http://unicode.org/reports/tr35/tr35-collation.html">Unicode
+    Technical Standard #35</ulink>
+    and <ulink url="https://tools.ietf.org/html/bcp47";>BCP 47</ulink> for
+    details.
+   </para>
+
+   <para>
+    Note that while this system allows creating collations that <quote>ignore
+    case</quote> or <quote>ignore accents</quote> or similar (using
+    the <literal>ks</literal> key), PostgreSQL does not at the moment allow
+    such collations to act in a truly case- or accent-insensitive manner.  Any
+    strings that compare equal according to the collation but are not
+    byte-wise equal will be sorted according to their byte values.
+   </para>
    </sect4>
    </sect3>
 
-- 
2.14.1
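
For anyone trying out the examples in the patch, here is a minimal sketch of
how such collations might be exercised once created.  It assumes a
PostgreSQL 10 server built with ICU support and an ICU version that accepts
BCP 47 style locale strings.  The sample values and the collation name
"ignorecase" are hypothetical and not part of the patch.

```sql
-- Collation taken from the patch: digits sort after Latin letters.
CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');

-- Hypothetical sample data; apply the collation per query with COLLATE.
SELECT name
FROM (VALUES ('a1'), ('1a'), ('b2'), ('2b')) AS t(name)
ORDER BY name COLLATE digitslast;

-- Hypothetical collation using the "ks" key mentioned in the patch's note.
CREATE COLLATION ignorecase (provider = icu, locale = 'en-u-ks-level2');

-- Per the note in the patch, this is not treated as truly case-insensitive:
-- strings that are not byte-wise equal are still distinguished.
SELECT 'a' = 'A' COLLATE ignorecase AS treated_as_equal;
```

A collation can also be attached to a column definition instead of being
named in each query, which may be more convenient when one sort order is
wanted throughout.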
