varstr_sortsupport() only allows Windows to use SortSupport with a non-C-locale (when the server encoding happens to be UTF-8, which I assume is the common case). This is because we (quite reasonably) don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion hack from varstr_cmp() for the SortSupport authoritative comparator (varstrfastcmp_locale() shouldn't get its own copy of this kludge, because it's supposed to be "fast"). This broad restriction made sense when Windows + UTF-8 + non-C-locale necessarily required the aforementioned UTF-16 conversion kludge. However, iff an ICU locale is in use on Windows (or any other platform), then we can always use SortSupport, regardless of anything else (we should not have the core code install a fmgr comparison shim that just calls varstr_cmp(), though we still do). We don't actually need the UTF-16 kludge at all, so we can use SortSupport without any special care.
The current state of affairs doesn't make any sense, AFAICT, and so the restriction should be removed on general principle: we *already* expect ICU to have no restrictions that are peculiar to Windows, as we see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on to this restriction. This restriction also seems worth fixing because Windows users are generally more likely to want to use ICU locales; most of them would otherwise end up actually paying the overhead for the UTF-16 kludge. (Presumably the UTF-16 conversion makes text sorting *even slower* than it would be if we merely didn't do SortSupport, which is to say: very slow indeed.) In summary, we're currently attaching the use of SortSupport to the wrong thing. We're treating this UTF-16 business as something that implies a broad OS/platform restriction, when in fact it should be treated as implying a restriction for one particular collation provider only (a collation provider that happens to be built into Windows, but isn't really special to us). Attached patch shows what I'm getting at. This is untested, since I don't use Windows. Proceed with caution. On a related note, am I the only one that finds it questionable that str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself contains an "#ifdef USE_ICU" block? It seems like those two things might get conflated on some platforms. We don't want lower() to ever not use the ICU infrastructure when an ICU collation is used, and yet it's not obvious that that's impossible. I understand that the code in regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER facilities, and that that was always accepted as legacy that ICU had to live with. Maybe a static assertion is all that we need here (ICU builds must also be USE_WIDE_UPPER_LOWER builds). -- Peter Geoghegan
From fca07aca51fb7979f19180712707233ff0e6f4b4 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <p...@bowt.ie> Date: Sat, 16 Sep 2017 13:36:25 -0700 Subject: [PATCH] Allow ICU to use SortSupport on Windows with UTF-8. There is no reason to ever prevent the use of SortSupport on Windows when ICU locales are used. We previously avoided SortSupport on Windows with UTF-8 server encoding and a non C-locale due to restrictions in Windows' libc functionality. This is now considered to be a restriction in one platform's libc collation provider, and not a more general platform restriction. --- src/backend/utils/adt/varlena.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index ebfb823..ad8ebed 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -1828,7 +1828,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar) * UTF-8 and we are not using the C collation, complex hacks are required. * We don't currently have a comparator that handles that case, so we fall * back on the slow method of having the sort code invoke bttextcmp() (in - * the case of text) via the fmgr trampoline. + * the case of text) via the fmgr trampoline. (ICU locales work just the + * same on Windows, however.) */ if (lc_collate_is_c(collid)) { @@ -1840,7 +1841,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar) collate_c = true; } #ifdef WIN32 - else if (GetDatabaseEncoding() == PG_UTF8) + else if (GetDatabaseEncoding() == PG_UTF8 && + !(locale && locale->provider == COLLPROVIDER_ICU)) return; #endif else -- 2.7.4
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers