Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Aug 9, 2017 at 6:19 PM, Tsunakawa, Takayuki
>  wrote:
>> That's sad news, as I expected ICU to bring its various collation features 
>> to PostgreSQL.  I hope it will be easy to add them.

> The reason it is not easy is that text equality is based on strict
> binary equality.

Yeah.  You can change sort order all you want, but you can't easily
change the system's notion of equality.  But when people ask for "case
insensitive" collation, they generally want that too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Peter Geoghegan
On Wed, Aug 9, 2017 at 6:19 PM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut
>> There are no case-insensitive collations in PostgreSQL (yet).
>
> That's sad news, as I expected ICU to bring its various collation features to 
> PostgreSQL.  I hope it will be easy to add them.

The reason it is not easy is that text equality is based on strict
binary equality. We would have to teach hash operator classes about
collations to fix this, and make text_eq() hash strxfrm() or
something. That requires special work. You can ask ICU for case
insensitivity with Postgres 10, but the strcmp() tie breaker within
varstr_cmp() will prevent it from being truly case insensitive.

>> The best explanation of the difference that I can understand is here, under
>> "Why do CJK strings sort incorrectly in Unicode?":
>>
>> https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html
>
> Thanks a lot.  MysQL seems to have many collations, doesn't it?

Well, it depends on how you define collation, which actually gets
quite complicated with ICU. You can combine certain options together
with great flexibility, it seems (see my e-mail from earlier today --
"What users can do with custom ICU collations in Postgres 10"). Let's
assume that there are 10 distinct options for each locale that each
independently affect sort order for the base collation of the locale.
I believe that there are at least 10 such options that change things,
and maybe a lot more. Theoretically, this means that there are an
absolutely enormous number of possible collations with ICU.

Now, I wouldn't *actually* say that we have many thousands of
collations with ICU, because that doesn't seem like a sensible way to
explain ICU Postgres collations. The way that we think about this from
using libc doesn't work well for ICU. Instead, I would say that we
have hundreds of locales (not collations), each of which support some
variant options (e.g., traditional Spanish sort order, alternative
Japanese sort order, pictographic emoji sorting), with some further
generic options for varying how case it handled, how numbers are
handled, and other things like that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut
> There are no case-insensitive collations in PostgreSQL (yet).

That's sad news, as I expected ICU to bring its various collation features to 
PostgreSQL.  I hope it will be easy to add them.


From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Geoghegan
> This is not an answer to the question you asked, but it may interest you
> to know that ICU uses JIS X 4061 for Japanese, unlike Glibc. This will give
> more useful results when sorting Japanese.
> 
> The best explanation of the difference that I can understand is here, under
> "Why do CJK strings sort incorrectly in Unicode?":
> 
> https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html

Thanks a lot.  MysQL seems to have many collations, doesn't it?

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Peter Geoghegan
On Wed, Aug 9, 2017 at 5:38 AM, MauMau  wrote:
> I tried to find a particular collation name in pg_collation, but I
> cannot understand the naming convention after reading the following
> article.  Specifically, I want to find out whether there is some
> collation equivalent to Japanese_CI_AS in SQL Server, which means
> Japanese, case-insensitive, and accent sensitive.  Could you tell me
> how to do this?  Is there any room to improve the PG manual?

This is not an answer to the question you asked, but it may interest
you to know that ICU uses JIS X 4061 for Japanese, unlike Glibc. This
will give more useful results when sorting Japanese.

The best explanation of the difference that I can understand is here,
under "Why do CJK strings sort incorrectly in Unicode?":

https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Peter Eisentraut
On 8/9/17 08:38, MauMau wrote:
> I tried to find a particular collation name in pg_collation, but I
> cannot understand the naming convention after reading the following
> article.  Specifically, I want to find out whether there is some
> collation equivalent to Japanese_CI_AS in SQL Server, which means
> Japanese, case-insensitive, and accent sensitive.

There are no case-insensitive collations in PostgreSQL (yet).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers