Re: postgresql: libc collation issue, linking with ICU

2019-12-12 Thread f.holop
Jeremie Courreges-Anglas - Thu, 12 December 2019 at 11:35:51
> Can you actually use ICU as the default collation algorithm used by
> a database?

it's not totally straightforward but yes, on the schema level it's
possible to override collation:

macos=# CREATE TABLE t (n text COLLATE "fr-FR-x-icu");
CREATE TABLE

macos=# INSERT INTO t (values ('bernard'),('bérénice'),('béatrice'),('boris'));
INSERT 0 4

macos=# SELECT * FROM t ORDER BY n;
n
--
 béatrice
 bérénice
 bernard
 boris
(4 rows)

macos=# show lc_collate;
 lc_collate

 C
(1 row)

macos=# \d t
 Table "public.t"
 Column | Type |  Collation  | Nullable | Default
+--+-+--+-
 n  | text | fr-FR-x-icu |  |


however `CREATE DATABASE` and `initdb` does not support this.  it's WIP:
https://www.postgresql-archive.org/ICU-for-global-collation-td6099973.html


it is far from ideal but at least having the option to override the
collation on both schema and/or individual query level means a working
sorting.

this is a good article when ICU was introduced:
https://www.2ndquadrant.com/en/blog/icu-support-postgresql-10/

(...i wonder what's the actual situation with mariadb...)

-f
-- 
tower: "say position."  pilot: "position."



Re: postgresql: libc collation issue, linking with ICU

2019-12-12 Thread Jeremie Courreges-Anglas
On Thu, Dec 12 2019, "f.holop"  wrote:
> Landry Breuil - Thu, 12 December 2019 at 08:51:49
>> On Thu, Dec 12, 2019 at 01:47:25AM +0100, Jeremie Courreges-Anglas wrote:
>> > 
>> > +cc maintainer
>> > 
>> > This has bugged me for some time, I think enabling ICU makes sense.
>> > Here's a wip diff.  I fear it might cause issues with existing
>> > databases.  Real world tests would probably help.
>> 
>> I doubt it can have side effects on existing databases as they all have
>> a locale (and potential collations) configured during initdb, and adding
>> the possibility to use more locales/collations shouldnt affect existing
>> ones. Of course, to be tested in the real world :)
>
> initdb just sets "default" collation/ctype for `template1`.  when
> collation/ctype needs to be different, `CREATE DATABASE` must use the
> template `template0`.
>
> Collation cannot be changed on a database after it has been created as
> it affects the index creation as well.

Can you actually use ICU as the default collation algorithm used by
a database?

Sadly using ICU doesn't seem to be the silver bullet I was hoping for.
postgres on Linux distros still defaults to the libc collation backend.
You can't just push a button so that ICU is used by default[0].

[0] 
https://www.postgresql.org/message-id/flat/3366.1498183854%40sss.pgh.pa.us#3366.1498183...@sss.pgh.pa.us

So right now it seems that adding ICU might help a few users who can
tweak their schemas for their specific needs.  That's much less
interesting than what I expected.

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: postgresql: libc collation issue, linking with ICU

2019-12-12 Thread f.holop
Landry Breuil - Thu, 12 December 2019 at 08:51:49
> On Thu, Dec 12, 2019 at 01:47:25AM +0100, Jeremie Courreges-Anglas wrote:
> > 
> > +cc maintainer
> > 
> > This has bugged me for some time, I think enabling ICU makes sense.
> > Here's a wip diff.  I fear it might cause issues with existing
> > databases.  Real world tests would probably help.
> 
> I doubt it can have side effects on existing databases as they all have
> a locale (and potential collations) configured during initdb, and adding
> the possibility to use more locales/collations shouldnt affect existing
> ones. Of course, to be tested in the real world :)

initdb just sets "default" collation/ctype for `template1`.  when
collation/ctype needs to be different, `CREATE DATABASE` must use the
template `template0`.

Collation cannot be changed on a database after it has been created as
it affects the index creation as well.

-f
-- 
why is the alphabet in that order?  is it because of that song?



Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread Landry Breuil
On Thu, Dec 12, 2019 at 01:47:25AM +0100, Jeremie Courreges-Anglas wrote:
> 
> +cc maintainer
> 
> This has bugged me for some time, I think enabling ICU makes sense.
> Here's a wip diff.  I fear it might cause issues with existing
> databases.  Real world tests would probably help.

I doubt it can have side effects on existing databases as they all have
a locale (and potential collations) configured during initdb, and adding
the possibility to use more locales/collations shouldnt affect existing
ones. Of course, to be tested in the real world :)



Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread Jeremy Evans
On Wed, Dec 11, 2019 at 4:50 PM Jeremie Courreges-Anglas 
wrote:

>
> +cc maintainer
>
> This has bugged me for some time, I think enabling ICU makes sense.
> Here's a wip diff.  I fear it might cause issues with existing
> databases.  Real world tests would probably help.
>

I would prefer that we do not make any non-security related changes to
PostgreSQL until after the PostgreSQL 12 update is committed.  After that,
we can do some testing with ICU and see what the effect is.

Thanks,
Jeremy


Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread Jeremie Courreges-Anglas


+cc maintainer

This has bugged me for some time, I think enabling ICU makes sense.
Here's a wip diff.  I fear it might cause issues with existing
databases.  Real world tests would probably help.

FWIW I would also like to enable DEBUG_PACKAGES and investigate
whether --disable-thread-safety really is appropriate.


Index: Makefile
===
RCS file: /cvs/ports/databases/postgresql/Makefile,v
retrieving revision 1.257
diff -u -p -r1.257 Makefile
--- Makefile15 Nov 2019 13:19:10 -  1.257
+++ Makefile12 Dec 2019 00:40:41 -
@@ -8,6 +8,7 @@ COMMENT-plpython=Python procedural langu
 COMMENT-pg_upgrade=Support for upgrading PostgreSQL data from previous version
 
 VERSION=   11.6
+REVISION=  0
 PREV_MAJOR=10
 DISTNAME=  postgresql-${VERSION}
 PKGNAME-main=  postgresql-client-${VERSION}
@@ -24,7 +25,7 @@ SHARED_LIBS=  ecpg7.9 \
pgtypes 4.8 \
pq  6.10
 
-HOMEPAGE=  http://www.postgresql.org/
+HOMEPAGE=  https://www.postgresql.org/
 
 MAINTAINER=Pierre-Emmanuel Andre 
 
@@ -68,6 +69,7 @@ CONFIGURE_ARGS=   --disable-rpath --with-o
--with-system-tzdata="/usr/share/zoneinfo" \
--with-openssl \
--with-libxml \
+   --with-icu \
--disable-thread-safety
 
 # There is no spinlock support for alpha or hppa yet. Until we have access to
@@ -100,8 +102,9 @@ WANTLIB-main =  ${WANTLIB} xml2
 
 
 LIB_DEPENDS-server= databases/postgresql=${VERSION} \
-   ${LIB_DEPENDS-main}
-WANTLIB-server=${WANTLIB-main} perl pq
+   ${LIB_DEPENDS-main} \
+   textproc/icu4c
+WANTLIB-server=${WANTLIB-main} icudata icui18n icuuc perl pq
 
 RUN_DEPENDS-contrib= databases/postgresql,-server=${VERSION}
 LIB_DEPENDS-contrib= databases/postgresql=${VERSION} \
@@ -117,7 +120,7 @@ WANTLIB-pg_upgrade =${WANTLIB-main} pq
 
 LIB_DEPENDS-plpython=  ${MODPY_LIB_DEPENDS}
 WANTLIB-plpython = c m pthread util \
-   ${MODPY_WANTLIB}
+   intl ${MODPY_WANTLIB}
 RUN_DEPENDS-plpython=  databases/postgresql,-server=${VERSION}
 
 WANTLIB-docs=
Index: pkg/PLIST-contrib
===
RCS file: /cvs/ports/databases/postgresql/pkg/PLIST-contrib,v
retrieving revision 1.20
diff -u -p -r1.20 PLIST-contrib
--- pkg/PLIST-contrib   19 Feb 2019 05:09:18 -  1.20
+++ pkg/PLIST-contrib   12 Dec 2019 00:40:41 -
@@ -9,51 +9,51 @@
 @bin bin/pg_test_timing
 @bin bin/pgbench
 @bin bin/vacuumlo
-lib/postgresql/_int.so
-lib/postgresql/adminpack.so
-lib/postgresql/amcheck.so
-lib/postgresql/auth_delay.so
-lib/postgresql/auto_explain.so
-lib/postgresql/autoinc.so
-lib/postgresql/bloom.so
-lib/postgresql/btree_gin.so
-lib/postgresql/btree_gist.so
-lib/postgresql/citext.so
-lib/postgresql/cube.so
-lib/postgresql/dblink.so
-lib/postgresql/earthdistance.so
-lib/postgresql/file_fdw.so
-lib/postgresql/fuzzystrmatch.so
-lib/postgresql/hstore.so
-lib/postgresql/hstore_plperl.so
-lib/postgresql/insert_username.so
-lib/postgresql/isn.so
-lib/postgresql/lo.so
-lib/postgresql/ltree.so
-lib/postgresql/moddatetime.so
-lib/postgresql/pageinspect.so
-lib/postgresql/passwordcheck.so
-lib/postgresql/pg_buffercache.so
-lib/postgresql/pg_freespacemap.so
-lib/postgresql/pg_prewarm.so
-lib/postgresql/pg_stat_statements.so
-lib/postgresql/pg_trgm.so
-lib/postgresql/pg_visibility.so
-lib/postgresql/pgcrypto.so
-lib/postgresql/pgrowlocks.so
-lib/postgresql/pgstattuple.so
-lib/postgresql/pgxml.so
-lib/postgresql/postgres_fdw.so
-lib/postgresql/refint.so
-lib/postgresql/seg.so
-lib/postgresql/sslinfo.so
-lib/postgresql/tablefunc.so
-lib/postgresql/tcn.so
-lib/postgresql/timetravel.so
-lib/postgresql/tsm_system_rows.so
-lib/postgresql/tsm_system_time.so
-lib/postgresql/unaccent.so
-lib/postgresql/uuid-ossp.so
+@so lib/postgresql/_int.so
+@so lib/postgresql/adminpack.so
+@so lib/postgresql/amcheck.so
+@so lib/postgresql/auth_delay.so
+@so lib/postgresql/auto_explain.so
+@so lib/postgresql/autoinc.so
+@so lib/postgresql/bloom.so
+@so lib/postgresql/btree_gin.so
+@so lib/postgresql/btree_gist.so
+@so lib/postgresql/citext.so
+@so lib/postgresql/cube.so
+@so lib/postgresql/dblink.so
+@so lib/postgresql/earthdistance.so
+@so lib/postgresql/file_fdw.so
+@so lib/postgresql/fuzzystrmatch.so
+@so lib/postgresql/hstore.so
+@so lib/postgresql/hstore_plperl.so
+@so lib/postgresql/insert_username.so
+@so lib/postgresql/isn.so
+@so lib/postgresql/lo.so
+@so lib/postgresql/ltree.so
+@so lib/postgresql/moddatetime.so
+@so lib/postgresql/pageinspect.so
+@so lib/postgresql/passwordcheck.so
+@so lib/postgresql/pg_buffercache.so
+@so lib/postgresql/pg_freespacemap.so
+@so lib/postgresql/pg_prewarm.so
+@so lib/postgresql/pg_stat_statements.so
+@so lib/postgresql/pg_trgm.so
+@so lib/postgresql/pg_visibility.so
+@so 

Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread f.holop
Ingo Schwarze - Wed, 11 December 2019 at 20:46:05
> Hi,
> 
> Stuart Henderson wrote on Wed, Dec 11, 2019 at 07:19:16PM +:
> > On 2019/12/11 19:57, f.holop wrote:
> >> Ingo Schwarze - Wed, 11 December 2019 at 18:42:35
> 
>  i have noticed that libc collation on OpenBSD is broken (also on macos) 
>  :(
> 
> >>> It is intentional that OpenBSD does not support collation for locales
> >>> other than "C" in libc, and i'm not aware of any developer who might
> >>> have plans to add it in the future, not even in the long term.  Even
> 
> >> sounds like all the more reason to build postgresql with ICU.
> 
> > Nothing in Ingo's mail contradicts that.
> 
> Right.

it seems that we all agree..  as long as there is a strong case not to
put locale aware collation into libc, using ICU with postgresql becomes
more or less mandatory as there's not much use of a database server that
cannot sort it's content...

-f
-- 
all computers wait at the same speed.



Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread Ingo Schwarze
Hi,

Stuart Henderson wrote on Wed, Dec 11, 2019 at 07:19:16PM +:
> On 2019/12/11 19:57, f.holop wrote:
>> Ingo Schwarze - Wed, 11 December 2019 at 18:42:35

 i have noticed that libc collation on OpenBSD is broken (also on macos) :(

>>> It is intentional that OpenBSD does not support collation for locales
>>> other than "C" in libc, and i'm not aware of any developer who might
>>> have plans to add it in the future, not even in the long term.  Even

>> sounds like all the more reason to build postgresql with ICU.

> Nothing in Ingo's mail contradicts that.

Right.

The reason i didn't comment on the suggestion to make postgresql
depend on ICU is that i'm using postgres quite rarely and cannot
contribute meaningful input to the tradeoff between another
dependency of medium weight vs. the added functionality.

*If* locale-dependent collation is considered important for
the postgres database, then depending on ICU would agree well
with my above reasoning, unless it causes other woes - the port
is not a very easy and straightforward one and i have no idea
which technical consequences adding that dependency might entail.

Yours,
  Ingo



Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread Stuart Henderson
On 2019/12/11 19:57, f.holop wrote:
> Ingo Schwarze - Wed, 11 December 2019 at 18:42:35
> > > i have noticed that libc collation on OpenBSD is broken (also on macos) :(
> > 
> > It is intentional that OpenBSD does not support collation for locales
> > other than "C" in libc, and i'm not aware of any developer who might
> > have plans to add it in the future, not even in the long term.  Even
> 
> sounds like all the more reason to build postgresql with ICU.

Nothing in Ingo's mail contradicts that.



Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread f.holop
Ingo Schwarze - Wed, 11 December 2019 at 18:42:35
> > i have noticed that libc collation on OpenBSD is broken (also on macos) :(
> 
> It is intentional that OpenBSD does not support collation for locales
> other than "C" in libc, and i'm not aware of any developer who might
> have plans to add it in the future, not even in the long term.  Even

sounds like all the more reason to build postgresql with ICU.

-f
-- 
my desk is a final proof of the chaos theory



Re: postgresql: libc collation issue, linking with ICU

2019-12-11 Thread Ingo Schwarze
Hi,

f.holop wrote on Tue, Dec 10, 2019 at 05:07:07PM +0100:

> i have noticed that libc collation on OpenBSD is broken (also on macos) :(

It is intentional that OpenBSD does not support collation for locales
other than "C" in libc, and i'm not aware of any developer who might
have plans to add it in the future, not even in the long term.  Even
the people who added collation support to the FreeBSD libc some
years ago have been cursing loudly about it, even though FreeBSD
is often quite ready to add bloat to their system.

My personal opinion is that libc is a general-purpose programming
and operating system support library, whereas locale-specific
collation is a highly specialiced niche task that requires
extremely complicated code, much too large and ugly for a library
like libc.  There is a lot of merit in keeping libc lean and simple,
not least to help correctness, robustness, and security.

For the intended behaviour of OpenBSD, please read:

 * locale(1)
 * setlocale(3), including the CAVEATS in that page
 * strcoll(3)
 * wcscoll(3)

Yours,
  Ingo



postgresql: libc collation issue, linking with ICU

2019-12-10 Thread f.holop
hello,

i have noticed that libc collation on OpenBSD is broken (also on macos) :(

openbsd=# select n from (values 
('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate 
"fr_FR"
n
--
 bernard
 boris
 béatrice
 bérénice
(4 rows)


macos=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) 
AS l(n) order by n collate "fr_FR";
n
--
 bernard
 boris
 béatrice
 bérénice
(4 rows)


linux=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) 
AS l(n) order by n collate "fr_FR"
 n

 béatrice
 bérénice
 bernard
 boris
(4 rows)


postgres supports ICU and this guarantees the same results on every
platform.  both macos and linux link against it, i think it would be a
good addition to the openbsd port as well...

macos=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) 
AS l(n) order by n collate "fr-FR-x-icu";
n
--
 béatrice
 bérénice
 bernard
 boris
(4 rows)

macos=# SELECT * FROM pg_collation;
...
(861 rows)


linux=# select n from (values ('bernard'),('bérénice'),('béatrice'),('boris')) 
AS l(n) order by n collate "fr-FR-x-icu";
 n

 béatrice
 bérénice
 bernard
 boris
(4 rows)

linux=# SELECT * FROM pg_collation;
...
(1142 rows)

openbsd=# select n from (values 
('bernard'),('bérénice'),('béatrice'),('boris')) AS l(n) order by n collate 
"fr-FR-x-icu";
ERROR:  collation "fr-FR-x-icu" for encoding "UTF8" does not exist
LINE 1: ...nice'),('béatrice'),('boris')) AS l(n) order by n collate "f...
 ^
openbsd=# SELECT * FROM pg_collation;
...
(134 rows)


not sure if related but sort(1) is similarly confused (also on macos):

$ echo $LC_CTYPE
en_US.UTF-8
$ cat n
bernard
boris
béatrice
bérénice
$ sort n
bernard
boris
béatrice
bérénice

-f
-- 
history doesn't repeat itself.  historians do.