Re: postgresql: libc collation issue, linking with ICU
Jeremie Courreges-Anglas - Thu, 12 December 2019 at 11:35:51 > Can you actually use ICU as the default collation algorithm used by > a database? it's not totally straightforward but yes, on the schema level it's possible to override collation: macos=# CREATE TABLE t (n text COLLATE "fr-FR-x-icu"); CREATE TABLE macos=# INSERT INTO t (values ('bernard'),('bérénice'),('béatrice'),('boris')); INSERT 0 4 macos=# SELECT * FROM t ORDER BY n; n -- béatrice bérénice bernard boris (4 rows) macos=# show lc_collate; lc_collate C (1 row) macos=# \d t Table "public.t" Column | Type | Collation | Nullable | Default +--+-+--+- n | text | fr-FR-x-icu | | however `CREATE DATABASE` and `initdb` does not support this. it's WIP: https://www.postgresql-archive.org/ICU-for-global-collation-td6099973.html it is far from ideal but at least having the option to override the collation on both schema and/or individual query level means a working sorting. this is a good article when ICU was introduced: https://www.2ndquadrant.com/en/blog/icu-support-postgresql-10/ (...i wonder what's the actual situation with mariadb...) -f -- tower: "say position." pilot: "position."
Re: postgresql: libc collation issue, linking with ICU
On Thu, Dec 12 2019, "f.holop" wrote: > Landry Breuil - Thu, 12 December 2019 at 08:51:49 >> On Thu, Dec 12, 2019 at 01:47:25AM +0100, Jeremie Courreges-Anglas wrote: >> > >> > +cc maintainer >> > >> > This has bugged me for some time, I think enabling ICU makes sense. >> > Here's a wip diff. I fear it might cause issues with existing >> > databases. Real world tests would probably help. >> >> I doubt it can have side effects on existing databases as they all have >> a locale (and potential collations) configured during initdb, and adding >> the possibility to use more locales/collations shouldnt affect existing >> ones. Of course, to be tested in the real world :) > > initdb just sets "default" collation/ctype for `template1`. when > collation/ctype needs to be different, `CREATE DATABASE` must use the > template `template0`. > > Collation cannot be changed on a database after it has been created as > it affects the index creation as well. Can you actually use ICU as the default collation algorithm used by a database? Sadly using ICU doesn't seem to be the silver bullet I was hoping for. postgres on Linux distros still defaults to the libc collation backend. You can't just push a button so that ICU is used by default[0]. [0] https://www.postgresql.org/message-id/flat/3366.1498183854%40sss.pgh.pa.us#3366.1498183...@sss.pgh.pa.us So right now it seems that adding ICU might help a few users who can tweak their schemas for their specific needs. That's much less interesting than what I expected. -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: postgresql: libc collation issue, linking with ICU
Landry Breuil - Thu, 12 December 2019 at 08:51:49 > On Thu, Dec 12, 2019 at 01:47:25AM +0100, Jeremie Courreges-Anglas wrote: > > > > +cc maintainer > > > > This has bugged me for some time, I think enabling ICU makes sense. > > Here's a wip diff. I fear it might cause issues with existing > > databases. Real world tests would probably help. > > I doubt it can have side effects on existing databases as they all have > a locale (and potential collations) configured during initdb, and adding > the possibility to use more locales/collations shouldnt affect existing > ones. Of course, to be tested in the real world :) initdb just sets "default" collation/ctype for `template1`. when collation/ctype needs to be different, `CREATE DATABASE` must use the template `template0`. Collation cannot be changed on a database after it has been created as it affects the index creation as well. -f -- why is the alphabet in that order? is it because of that song?
Re: postgresql: libc collation issue, linking with ICU
On Thu, Dec 12, 2019 at 01:47:25AM +0100, Jeremie Courreges-Anglas wrote: > > +cc maintainer > > This has bugged me for some time, I think enabling ICU makes sense. > Here's a wip diff. I fear it might cause issues with existing > databases. Real world tests would probably help. I doubt it can have side effects on existing databases as they all have a locale (and potential collations) configured during initdb, and adding the possibility to use more locales/collations shouldnt affect existing ones. Of course, to be tested in the real world :)
Re: postgresql: libc collation issue, linking with ICU
On Wed, Dec 11, 2019 at 4:50 PM Jeremie Courreges-Anglas wrote: > > +cc maintainer > > This has bugged me for some time, I think enabling ICU makes sense. > Here's a wip diff. I fear it might cause issues with existing > databases. Real world tests would probably help. > I would prefer that we do not make any non-security related changes to PostgreSQL until after the PostgreSQL 12 update is committed. After that, we can do some testing with ICU and see what the effect is. Thanks, Jeremy
Re: postgresql: libc collation issue, linking with ICU
+cc maintainer This has bugged me for some time, I think enabling ICU makes sense. Here's a wip diff. I fear it might cause issues with existing databases. Real world tests would probably help. FWIW I would also like to enable DEBUG_PACKAGES and investigate whether --disable-thread-safety really is appropriate. Index: Makefile === RCS file: /cvs/ports/databases/postgresql/Makefile,v retrieving revision 1.257 diff -u -p -r1.257 Makefile --- Makefile15 Nov 2019 13:19:10 - 1.257 +++ Makefile12 Dec 2019 00:40:41 - @@ -8,6 +8,7 @@ COMMENT-plpython=Python procedural langu COMMENT-pg_upgrade=Support for upgrading PostgreSQL data from previous version VERSION= 11.6 +REVISION= 0 PREV_MAJOR=10 DISTNAME= postgresql-${VERSION} PKGNAME-main= postgresql-client-${VERSION} @@ -24,7 +25,7 @@ SHARED_LIBS= ecpg7.9 \ pgtypes 4.8 \ pq 6.10 -HOMEPAGE= http://www.postgresql.org/ +HOMEPAGE= https://www.postgresql.org/ MAINTAINER=Pierre-Emmanuel Andre @@ -68,6 +69,7 @@ CONFIGURE_ARGS= --disable-rpath --with-o --with-system-tzdata="/usr/share/zoneinfo" \ --with-openssl \ --with-libxml \ + --with-icu \ --disable-thread-safety # There is no spinlock support for alpha or hppa yet. Until we have access to @@ -100,8 +102,9 @@ WANTLIB-main = ${WANTLIB} xml2 LIB_DEPENDS-server= databases/postgresql=${VERSION} \ - ${LIB_DEPENDS-main} -WANTLIB-server=${WANTLIB-main} perl pq + ${LIB_DEPENDS-main} \ + textproc/icu4c +WANTLIB-server=${WANTLIB-main} icudata icui18n icuuc perl pq RUN_DEPENDS-contrib= databases/postgresql,-server=${VERSION} LIB_DEPENDS-contrib= databases/postgresql=${VERSION} \ @@ -117,7 +120,7 @@ WANTLIB-pg_upgrade =${WANTLIB-main} pq LIB_DEPENDS-plpython= ${MODPY_LIB_DEPENDS} WANTLIB-plpython = c m pthread util \ - ${MODPY_WANTLIB} + intl ${MODPY_WANTLIB} RUN_DEPENDS-plpython= databases/postgresql,-server=${VERSION} WANTLIB-docs= Index: pkg/PLIST-contrib === RCS file: /cvs/ports/databases/postgresql/pkg/PLIST-contrib,v retrieving revision 1.20 diff -u -p -r1.20 PLIST-contrib --- pkg/PLIST-contrib 19 Feb 2019 05:09:18 - 1.20 +++ pkg/PLIST-contrib 12 Dec 2019 00:40:41 - @@ -9,51 +9,51 @@ @bin bin/pg_test_timing @bin bin/pgbench @bin bin/vacuumlo -lib/postgresql/_int.so -lib/postgresql/adminpack.so -lib/postgresql/amcheck.so -lib/postgresql/auth_delay.so -lib/postgresql/auto_explain.so -lib/postgresql/autoinc.so -lib/postgresql/bloom.so -lib/postgresql/btree_gin.so -lib/postgresql/btree_gist.so -lib/postgresql/citext.so -lib/postgresql/cube.so -lib/postgresql/dblink.so -lib/postgresql/earthdistance.so -lib/postgresql/file_fdw.so -lib/postgresql/fuzzystrmatch.so -lib/postgresql/hstore.so -lib/postgresql/hstore_plperl.so -lib/postgresql/insert_username.so -lib/postgresql/isn.so -lib/postgresql/lo.so -lib/postgresql/ltree.so -lib/postgresql/moddatetime.so -lib/postgresql/pageinspect.so -lib/postgresql/passwordcheck.so -lib/postgresql/pg_buffercache.so -lib/postgresql/pg_freespacemap.so -lib/postgresql/pg_prewarm.so -lib/postgresql/pg_stat_statements.so -lib/postgresql/pg_trgm.so -lib/postgresql/pg_visibility.so -lib/postgresql/pgcrypto.so -lib/postgresql/pgrowlocks.so -lib/postgresql/pgstattuple.so -lib/postgresql/pgxml.so -lib/postgresql/postgres_fdw.so -lib/postgresql/refint.so -lib/postgresql/seg.so -lib/postgresql/sslinfo.so -lib/postgresql/tablefunc.so -lib/postgresql/tcn.so -lib/postgresql/timetravel.so -lib/postgresql/tsm_system_rows.so -lib/postgresql/tsm_system_time.so -lib/postgresql/unaccent.so -lib/postgresql/uuid-ossp.so +@so lib/postgresql/_int.so +@so lib/postgresql/adminpack.so +@so lib/postgresql/amcheck.so +@so lib/postgresql/auth_delay.so +@so lib/postgresql/auto_explain.so +@so lib/postgresql/autoinc.so +@so lib/postgresql/bloom.so +@so lib/postgresql/btree_gin.so +@so lib/postgresql/btree_gist.so +@so lib/postgresql/citext.so +@so lib/postgresql/cube.so +@so lib/postgresql/dblink.so +@so lib/postgresql/earthdistance.so +@so lib/postgresql/file_fdw.so +@so lib/postgresql/fuzzystrmatch.so +@so lib/postgresql/hstore.so +@so lib/postgresql/hstore_plperl.so +@so lib/postgresql/insert_username.so +@so lib/postgresql/isn.so +@so lib/postgresql/lo.so +@so lib/postgresql/ltree.so +@so lib/postgresql/moddatetime.so +@so lib/postgresql/pageinspect.so +@so lib/postgresql/passwordcheck.so +@so lib/postgresql/pg_buffercache.so +@so lib/postgresql/pg_freespacemap.so +@so lib/postgresql/pg_prewarm.so +@so lib/postgresql/pg_stat_statements.so +@so lib/postgresql/pg_trgm.so +@so lib/postgresql/pg_visibility.so +@so
Re: postgresql: libc collation issue, linking with ICU
Ingo Schwarze - Wed, 11 December 2019 at 20:46:05 > Hi, > > Stuart Henderson wrote on Wed, Dec 11, 2019 at 07:19:16PM +: > > On 2019/12/11 19:57, f.holop wrote: > >> Ingo Schwarze - Wed, 11 December 2019 at 18:42:35 > > i have noticed that libc collation on OpenBSD is broken (also on macos) > :( > > >>> It is intentional that OpenBSD does not support collation for locales > >>> other than "C" in libc, and i'm not aware of any developer who might > >>> have plans to add it in the future, not even in the long term. Even > > >> sounds like all the more reason to build postgresql with ICU. > > > Nothing in Ingo's mail contradicts that. > > Right. it seems that we all agree.. as long as there is a strong case not to put locale aware collation into libc, using ICU with postgresql becomes more or less mandatory as there's not much use of a database server that cannot sort it's content... -f -- all computers wait at the same speed.
Re: postgresql: libc collation issue, linking with ICU
Hi, Stuart Henderson wrote on Wed, Dec 11, 2019 at 07:19:16PM +: > On 2019/12/11 19:57, f.holop wrote: >> Ingo Schwarze - Wed, 11 December 2019 at 18:42:35 i have noticed that libc collation on OpenBSD is broken (also on macos) :( >>> It is intentional that OpenBSD does not support collation for locales >>> other than "C" in libc, and i'm not aware of any developer who might >>> have plans to add it in the future, not even in the long term. Even >> sounds like all the more reason to build postgresql with ICU. > Nothing in Ingo's mail contradicts that. Right. The reason i didn't comment on the suggestion to make postgresql depend on ICU is that i'm using postgres quite rarely and cannot contribute meaningful input to the tradeoff between another dependency of medium weight vs. the added functionality. *If* locale-dependent collation is considered important for the postgres database, then depending on ICU would agree well with my above reasoning, unless it causes other woes - the port is not a very easy and straightforward one and i have no idea which technical consequences adding that dependency might entail. Yours, Ingo
Re: postgresql: libc collation issue, linking with ICU
On 2019/12/11 19:57, f.holop wrote: > Ingo Schwarze - Wed, 11 December 2019 at 18:42:35 > > > i have noticed that libc collation on OpenBSD is broken (also on macos) :( > > > > It is intentional that OpenBSD does not support collation for locales > > other than "C" in libc, and i'm not aware of any developer who might > > have plans to add it in the future, not even in the long term. Even > > sounds like all the more reason to build postgresql with ICU. Nothing in Ingo's mail contradicts that.
Re: postgresql: libc collation issue, linking with ICU
Ingo Schwarze - Wed, 11 December 2019 at 18:42:35 > > i have noticed that libc collation on OpenBSD is broken (also on macos) :( > > It is intentional that OpenBSD does not support collation for locales > other than "C" in libc, and i'm not aware of any developer who might > have plans to add it in the future, not even in the long term. Even sounds like all the more reason to build postgresql with ICU. -f -- my desk is a final proof of the chaos theory
Re: postgresql: libc collation issue, linking with ICU
Hi, f.holop wrote on Tue, Dec 10, 2019 at 05:07:07PM +0100: > i have noticed that libc collation on OpenBSD is broken (also on macos) :( It is intentional that OpenBSD does not support collation for locales other than "C" in libc, and i'm not aware of any developer who might have plans to add it in the future, not even in the long term. Even the people who added collation support to the FreeBSD libc some years ago have been cursing loudly about it, even though FreeBSD is often quite ready to add bloat to their system. My personal opinion is that libc is a general-purpose programming and operating system support library, whereas locale-specific collation is a highly specialiced niche task that requires extremely complicated code, much too large and ugly for a library like libc. There is a lot of merit in keeping libc lean and simple, not least to help correctness, robustness, and security. For the intended behaviour of OpenBSD, please read: * locale(1) * setlocale(3), including the CAVEATS in that page * strcoll(3) * wcscoll(3) Yours, Ingo
postgresql: libc collation issue, linking with ICU
hello, i have noticed that libc collation on OpenBSD is broken (also on macos) :( openbsd=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate "fr_FR" n -- bernard boris béatrice bérénice (4 rows) macos=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate "fr_FR"; n -- bernard boris béatrice bérénice (4 rows) linux=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate "fr_FR" n béatrice bérénice bernard boris (4 rows) postgres supports ICU and this guarantees the same results on every platform. both macos and linux link against it, i think it would be a good addition to the openbsd port as well... macos=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate "fr-FR-x-icu"; n -- béatrice bérénice bernard boris (4 rows) macos=# SELECT * FROM pg_collation; ... (861 rows) linux=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate "fr-FR-x-icu"; n béatrice bérénice bernard boris (4 rows) linux=# SELECT * FROM pg_collation; ... (1142 rows) openbsd=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate "fr-FR-x-icu"; ERROR: collation "fr-FR-x-icu" for encoding "UTF8" does not exist LINE 1: ...nice'),('béatrice'),('boris')) AS l(n) order by n collate "f... ^ openbsd=# SELECT * FROM pg_collation; ... (134 rows) not sure if related but sort(1) is similarly confused (also on macos): $ echo $LC_CTYPE en_US.UTF-8 $ cat n bernard boris béatrice bérénice $ sort n bernard boris béatrice bérénice -f -- history doesn't repeat itself. historians do.