Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-24 Thread Peter Geoghegan
On Sun, Sep 24, 2017 at 5:04 AM, Peter Eisentraut
 wrote:
> On 9/22/17 12:25, Peter Geoghegan wrote:
>> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
>>  wrote:
>>> I agree.  The attached patch should do it.
>>
>> I see one small issue here: You'll now need to set ssup->comparator
>> back to NULL before you return early in the Windows' libc case. That
>> way, a shim comparator (that goes through bttextcmp(), in the case of
>> text) will be installed within FinishSortSupportFunction(). Other than
>> that, looks good to me.
>
> committed accordingly

Thanks.

I am currently working on a patch for the issues with ICU colcollate
stability as I see them. I should be able to post something later
today or tomorrow. I would appreciate it if you held off on committing
anything there until you've considered what I'll propose.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-24 Thread Peter Eisentraut
On 9/22/17 12:25, Peter Geoghegan wrote:
> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
>  wrote:
>> I agree.  The attached patch should do it.
> 
> I see one small issue here: You'll now need to set ssup->comparator
> back to NULL before you return early in the Windows' libc case. That
> way, a shim comparator (that goes through bttextcmp(), in the case of
> text) will be installed within FinishSortSupportFunction(). Other than
> that, looks good to me.

committed accordingly

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
 wrote:
> I agree.  The attached patch should do it.

I see one small issue here: You'll now need to set ssup->comparator
back to NULL before you return early in the Windows' libc case. That
way, a shim comparator (that goes through bttextcmp(), in the case of
text) will be installed within FinishSortSupportFunction(). Other than
that, looks good to me.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-22 Thread Peter Eisentraut
On 9/21/17 15:21, Peter Geoghegan wrote:
> On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
>  wrote:
>>> Attached patch shows what I'm getting at. This is untested, since I
>>> don't use Windows. Proceed with caution.
>>
>> Your analysis makes sense, but the patch doesn't work, because "locale"
>> is never set before reading it.
> 
> It was just for illustrative purposes. Seems fine to actually move the
> WIN32 block down to just before the existing TRUST_STRXFRM test, since
> the locale is going to be cached and then immediately reused where we
> return immediately (Windows libc) anyway. This would also be a win for
> clarity, IMV.

I agree.  The attached patch should do it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c2c875fe447eaf65f2ba5b28e77dcc2e42016455 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 22 Sep 2017 10:23:35 -0400
Subject: [PATCH v2] Allow ICU to use SortSupport on Windows with UTF-8

There is no reason to ever prevent the use of SortSupport on Windows
when ICU locales are used.  We previously avoided SortSupport on Windows
with UTF-8 server encoding and a non C-locale due to restrictions in
Windows' libc functionality.

This is now considered to be a restriction in one platform's libc
collation provider, and not a more general platform restriction.

Reported-by: Peter Geoghegan 
---
 src/backend/utils/adt/varlena.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ebfb823fb8..071bc83378 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1823,12 +1823,6 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool 
bpchar)
 * requirements of BpChar callers.  However, if LC_COLLATE = C, we can
 * make things quite a bit faster with varstrfastcmp_c or 
bpcharfastcmp_c,
 * both of which use memcmp() rather than strcoll().
-*
-* There is a further exception on Windows.  When the database encoding 
is
-* UTF-8 and we are not using the C collation, complex hacks are 
required.
-* We don't currently have a comparator that handles that case, so we 
fall
-* back on the slow method of having the sort code invoke bttextcmp() 
(in
-* the case of text) via the fmgr trampoline.
 */
if (lc_collate_is_c(collid))
{
@@ -1839,10 +1833,6 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool 
bpchar)
 
collate_c = true;
}
-#ifdef WIN32
-   else if (GetDatabaseEncoding() == PG_UTF8)
-   return;
-#endif
else
{
ssup->comparator = varstrfastcmp_locale;
@@ -1869,6 +1859,21 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool 
bpchar)
}
}
 
+   /*
+* There is a further exception on Windows.  When the database encoding 
is
+* UTF-8 and we are not using the C collation, complex hacks are 
required.
+* We don't currently have a comparator that handles that case, so we 
fall
+* back on the slow method of having the sort code invoke bttextcmp() 
(in
+* the case of text) via the fmgr trampoline.  ICU locales work just the
+* same on Windows, however.
+*/
+#ifdef WIN32
+   if (!collate_c &&
+   GetDatabaseEncoding() == PG_UTF8 &&
+   !(locale && locale->provider == COLLPROVIDER_ICU))
+   return;
+#endif
+
/*
 * Unfortunately, it seems that abbreviation for non-C collations is
 * broken on many common platforms; testing of multiple versions of 
glibc
-- 
2.14.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch  wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

I have withdrawn this as an open item. I'm still hopeful that it can
be worked through for v10 at Peter's discretion.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
 wrote:
>> Attached patch shows what I'm getting at. This is untested, since I
>> don't use Windows. Proceed with caution.
>
> Your analysis makes sense, but the patch doesn't work, because "locale"
> is never set before reading it.

It was just for illustrative purposes. Seems fine to actually move the
WIN32 block down to just before the existing TRUST_STRXFRM test, since
the locale is going to be cached and then immediately reused where we
return immediately (Windows libc) anyway. This would also be a win for
clarity, IMV.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Eisentraut
On 9/16/17 18:33, Peter Geoghegan wrote:
> In summary, we're currently attaching the use of SortSupport to the
> wrong thing. We're treating this UTF-16 business as something that
> implies a broad OS/platform restriction, when in fact it should be
> treated as implying a restriction for one particular collation
> provider only (a collation provider that happens to be built into
> Windows, but isn't really special to us).
> 
> Attached patch shows what I'm getting at. This is untested, since I
> don't use Windows. Proceed with caution.

Your analysis makes sense, but the patch doesn't work, because "locale"
is never set before reading it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch  wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

Fair enough.

This is clearly an omission that was made in 41839b7ab, the commit
that added/fixed Windows support for ICU. Hopefully the omission can
be corrected for v10. I see this as a surprising behavior for users,
since ICU locales on all other platforms *will* have much faster
sorting than libc locales (often more than 3x faster). This is mostly
because ICU brings back abbreviated keys. 3x - 5x faster is more of a
qualitative difference than a quantitative one, IMHO.

That having been said, I'm certainly not going to insist on a v10 fix
for this issue.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Noah Misch
On Sat, Sep 16, 2017 at 03:33:53PM -0700, Peter Geoghegan wrote:
> In summary, we're currently attaching the use of SortSupport to the
> wrong thing. We're treating this UTF-16 business as something that
> implies a broad OS/platform restriction, when in fact it should be
> treated as implying a restriction for one particular collation
> provider only (a collation provider that happens to be built into
> Windows, but isn't really special to us).
> 
> Attached patch shows what I'm getting at. This is untested, since I
> don't use Windows. Proceed with caution.

This is currently a v10 open item, but I think it doesn't qualify for that
treatment.  It's merely an opportunity for optimization, albeit an
attractively-simple one.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-16 Thread Peter Geoghegan
varstr_sortsupport() only allows Windows to use SortSupport with a
non-C-locale (when the server encoding happens to be UTF-8, which I
assume is the common case). This is because we (quite reasonably)
don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion
hack from varstr_cmp() for the SortSupport authoritative comparator
(varstrfastcmp_locale() shouldn't get its own copy of this kludge,
because it's supposed to be "fast"). This broad restriction made sense
when Windows + UTF-8 + non-C-locale necessarily required the
aforementioned UTF-16 conversion kludge. However, iff an ICU locale is
in use on Windows (or any other platform), then we can always use
SortSupport, regardless of anything else (we should not have the core
code install a fmgr comparison shim that just calls varstr_cmp(),
though we still do). We don't actually need the UTF-16 kludge at all,
so we can use SortSupport without any special care.

The current state of affairs doesn't make any sense, AFAICT, and so
the restriction should be removed on general principle: we *already*
expect ICU to have no restrictions that are peculiar to Windows, as we
see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on
to this restriction. This restriction also seems worth fixing because
Windows users are generally more likely to want to use ICU locales;
most of them would otherwise end up actually paying the overhead for
the UTF-16 kludge. (Presumably the UTF-16 conversion makes text
sorting *even slower* than it would be if we merely didn't do
SortSupport, which is to say: very slow indeed.)

In summary, we're currently attaching the use of SortSupport to the
wrong thing. We're treating this UTF-16 business as something that
implies a broad OS/platform restriction, when in fact it should be
treated as implying a restriction for one particular collation
provider only (a collation provider that happens to be built into
Windows, but isn't really special to us).

Attached patch shows what I'm getting at. This is untested, since I
don't use Windows. Proceed with caution.

On a related note, am I the only one that finds it questionable that
str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself
contains an "#ifdef USE_ICU" block? It seems like those two things
might get conflated on some platforms. We don't want lower() to ever
not use the ICU infrastructure when an ICU collation is used, and yet
it's not obvious that that's impossible. I understand that the code in
regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER
facilities, and that that was always accepted as legacy that ICU had
to live with. Maybe a static assertion is all that we need here (ICU
builds must also be USE_WIDE_UPPER_LOWER builds).

-- 
Peter Geoghegan
From fca07aca51fb7979f19180712707233ff0e6f4b4 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sat, 16 Sep 2017 13:36:25 -0700
Subject: [PATCH] Allow ICU to use SortSupport on Windows with UTF-8.

There is no reason to ever prevent the use of SortSupport on Windows
when ICU locales are used.  We previously avoided SortSupport on Windows
with UTF-8 server encoding and a non C-locale due to restrictions in
Windows' libc functionality.

This is now considered to be a restriction in one platform's libc
collation provider, and not a more general platform restriction.
---
 src/backend/utils/adt/varlena.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ebfb823..ad8ebed 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1828,7 +1828,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 	 * UTF-8 and we are not using the C collation, complex hacks are required.
 	 * We don't currently have a comparator that handles that case, so we fall
 	 * back on the slow method of having the sort code invoke bttextcmp() (in
-	 * the case of text) via the fmgr trampoline.
+	 * the case of text) via the fmgr trampoline.  (ICU locales work just the
+	 * same on Windows, however.)
 	 */
 	if (lc_collate_is_c(collid))
 	{
@@ -1840,7 +1841,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 		collate_c = true;
 	}
 #ifdef WIN32
-	else if (GetDatabaseEncoding() == PG_UTF8)
+	else if (GetDatabaseEncoding() == PG_UTF8 &&
+			 !(locale && locale->provider == COLLPROVIDER_ICU))
 		return;
 #endif
 	else
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers