On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov <aekorot...@gmail.com> wrote:
>
> On 25 Sep 2024, at 18:13, Oleg Tselebrovskiy <o.tselebrovs...@postgrespro.ru> wrote:
>
> Greetings, everyone!
>
> One of our clients has found a difference in the behaviour of the initcap
> function when using different locale providers, shown below:
>
> postgres=# create database test_db_1 locale_provider=icu locale="ru_RU.UTF-8" template=template0;
> NOTICE: using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
> CREATE DATABASE
> postgres=# \c test_db_1;
> You are now connected to database "test_db_1" as user "postgres".
> test_db_1=# select initcap('ЧиЮ А.Ю.');
>  initcap
> ----------
>  Чию А.ю.
> (1 row)
>
> test_db_1=# select initcap('joHn d.e.');
>   initcap
> -----------
>  John D.e.
> (1 row)
>
> postgres=# create database test_db_2 locale_provider=libc locale="ru_RU.UTF-8" template=template0;
> CREATE DATABASE
> postgres=# \c test_db_2
> You are now connected to database "test_db_2" as user "postgres".
> test_db_2=# select initcap('ЧиЮ А.Ю.');
>  initcap
> ----------
>  Чию А.Ю.
> (1 row)
>
> test_db_2=# select initcap('joHn d.e.');
>   initcap
> -----------
>  John D.E.
> (1 row)
>
> And an easier reproduction (should work for REL_12_STABLE and up):
>
> postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
>    initcap
> --------------
>  First.second
> (1 row)
>
> postgres=# SELECT initcap('first.second' COLLATE "en_US");
>    initcap
> --------------
>  First.Second
> (1 row)
>
> This behaviour is reproducible on REL_12_STABLE and up to master.
>
> I don't believe that this is erroneous behaviour, just differing behaviour,
> hence only a documentation change is proposed.
>
> I suggest adding a clarification that this function works differently with
> the libc and ICU providers, because they differ in what counts as a "word":
>
> In libc, a word is a sequence of alphanumeric characters separated by
> non-alphanumeric characters (as the documentation currently states).
> In ICU, words are divided according to Unicode® Standard Annex #29 [1].
>
> A similar issue was briefly discussed in [2].
>
> The suggested documentation patch is attached (one version for
> REL_13_STABLE+ and one for REL_12_STABLE only).
>
> [1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
> [2]: https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com
>
> Oleg Tselebrovskiy, Postgres Professional
> <v1-0001-string-functions.patch>
> <v1-0002-string-functions-REL_12.patch>
>
> I can confirm that initcap works with libc and libicu as you stated. The
> documentation patch looks good to me. I've written a commit message. The
> REL_12_STABLE branch is no longer relevant, as it is out of support. I'm
> going to push this if there are no objections.
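For illustration only, here is a minimal C sketch of the two notions of a "word" described in the quoted report. The function names are hypothetical. The first function follows the documented libc rule (a word is a run of alphanumeric characters). The second is a deliberately simplified, ASCII-only stand-in for UAX #29 that keeps adjacent letters/digits together and implements only rules WB6/WB7 (a '.' or '\'' between two letters does not end a word). Neither is how PostgreSQL or ICU actually implement initcap.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Sketch of the documented libc rule: a word is a run of alphanumeric
 * characters; the first character of each run is upper-cased and the
 * rest are lower-cased. Uses <ctype.h> in the default "C" locale, so only
 * ASCII letters are affected. */
static void initcap_libc_style(char *str)
{
    int prev_alnum = 0;

    assert(str != NULL);
    for (char *p = str; *p; p++)
    {
        unsigned char c = (unsigned char) *p;

        *p = (char) (prev_alnum ? tolower(c) : toupper(c));
        prev_alnum = isalnum(c) != 0;
    }
}

/* Hypothetical, heavily simplified stand-in for UAX #29 word boundaries:
 * ASCII only. Adjacent letters/digits stay in one word, and '.' or '\''
 * between two letters does not end the word (rules WB6/WB7). All other
 * UAX #29 rules are ignored. */
static int is_mid_letter(unsigned char c)
{
    return c == '.' || c == '\'';
}

static int continues_word(const unsigned char *s, size_t i)
{
    if (i == 0)
        return 0;
    if (isalnum(s[i - 1]))
        return 1;
    /* Letter (MidNumLet) x Letter: "first.second" stays one word */
    return i >= 2 && is_mid_letter(s[i - 1]) && isalpha(s[i - 2]) && isalpha(s[i]);
}

static void initcap_uax29_sketch(char *str)
{
    unsigned char *s = (unsigned char *) str;

    assert(str != NULL);
    for (size_t i = 0; s[i]; i++)
    {
        if (!isalnum(s[i]))
            continue;           /* spaces and punctuation are left as-is */
        s[i] = (unsigned char) (continues_word(s, i) ? tolower(s[i]) : toupper(s[i]));
    }
}
```

Applied to "first.second", the libc-style function yields "First.Second" and the UAX #29 sketch yields "First.second". Applied to "joHn d.e.", they yield "John D.E." and "John D.e.", matching the psql output quoted above. The Cyrillic example follows the same pattern under real ICU, but this ASCII-only sketch cannot demonstrate it.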
I'm sorry for so many messages. My email client just went crazy. It should be fixed now.

------
Regards,
Alexander Korotkov
Supabase