Hello hackers,

I've been investigating a performance issue on Windows with recent
gettext versions (0.20.1 and later) that makes exception-heavy
workloads run roughly 5-6x slower than with gettext 0.19.8.

Starting with gettext 0.20.1, the library changed its Windows locale
handling in a way that conflicts with how PostgreSQL sets LC_MESSAGES.
The performance regression manifests when raising many exceptions:

  - gettext 0.19.8: ~32 seconds for 1M exceptions
  - gettext 0.20.1+: ~180 seconds for 1M exceptions
  - gettext 0.2x.y+: ~39 seconds for 1M exceptions

The root cause is a combination of three issues:

1. Locale format mismatch
   gettext 0.20.1+ introduced a get_lcid() function that expects Windows
locale format ("English_United States.1252") rather than POSIX format
("en_US"). This function enumerates all Windows locales (~259) until a
match is found, then uses the resulting LCID to determine the catalog path.

   PostgreSQL, however, has always used IsoLocaleName() to convert
Windows locales to POSIX format before setting LC_MESSAGES. This means
we're passing "en_US" to a function expecting "English_United States.1252".

   The enumeration doesn't find "en_US" among the Windows locale names,
so get_lcid() returns 0 and gettext falls back to its internal locale
resolution (which still works correctly: translations are not broken,
just slow).

2. Missing cache on failure
   The get_lcid() function has a cache, but it only updates the cache
when found_lcid > 0 (successful lookup). Failed lookups don't update the
cache, causing the 259-locale enumeration to repeat on every gettext() call.

   This is the actual performance bug in gettext: even if we passed a
valid Windows locale name, setting lc_messages to 'C' or 'POSIX'
(common in scripts and automation) would trigger the same issue, since
these aren't Windows locale names.  Please see the bug I opened with the
gettext project [1].

3. Empty string bug in early 0.2x.y
   gettext 0.20.1 introduced a setlocale_null() wrapper that returns ""
instead of NULL when setlocale() fails. This causes get_lcid("") to be
called, triggering the enumeration bug even when LC_MESSAGES is unset.
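To make issues 1 to 3 concrete, here is a small, portable C model of the
lookup-and-cache pattern described above.  It is a hypothetical sketch,
not gettext's code: sketch_get_lcid(), enumerate_lookup() and the
three-entry fake_locales table are illustrative names, and the real
function walks the Windows locale list rather than a static array.  The
cache_failures flag contrasts the reported behavior (cache only when
found_lcid > 0) with negative caching, shown here only as one possible
alternative.

```c
/* Hypothetical model of the get_lcid() pattern; NOT gettext's code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny stand-in for the ~259 names Windows would enumerate. */
static const struct { const char *name; unsigned lcid; } fake_locales[] = {
    {"English_United States.1252", 0x0409},
    {"German_Germany.1252", 0x0407},
    {"French_France.1252", 0x040C},
};
#define N_FAKE_LOCALES (sizeof(fake_locales) / sizeof(fake_locales[0]))

static unsigned long enumerations;  /* number of full scans performed */

/* Linear scan; returns 0 when name is not a Windows-style locale name. */
static unsigned enumerate_lookup(const char *name)
{
    enumerations++;
    for (size_t i = 0; i < N_FAKE_LOCALES; i++)
        if (strcmp(fake_locales[i].name, name) == 0)
            return fake_locales[i].lcid;
    return 0;
}

/* Single-entry cache keyed on the last name stored. */
static char cached_name[64];
static unsigned cached_lcid;
static int cache_valid;

/*
 * cache_failures == 0: update the cache only when found_lcid > 0, so a
 * name like "en_US", "C" or "" is rescanned on every call.
 * cache_failures == 1: also remember misses (negative caching).
 */
static unsigned sketch_get_lcid(const char *name, int cache_failures)
{
    /* Issue 3 is about "" being passed, not NULL; NULL is a caller bug. */
    assert(name != NULL);
    if (cache_valid && strcmp(cached_name, name) == 0)
        return cached_lcid;

    unsigned found_lcid = enumerate_lookup(name);
    if (found_lcid > 0 || cache_failures)
    {
        strncpy(cached_name, name, sizeof(cached_name) - 1);
        cached_name[sizeof(cached_name) - 1] = '\0';
        cached_lcid = found_lcid;
        cache_valid = 1;
    }
    return found_lcid;
}

/* Clear the cache and the scan counter between experiments. */
static void sketch_reset(void)
{
    enumerations = 0;
    cache_valid = 0;
    cached_name[0] = '\0';
    cached_lcid = 0;
}
```

In this model, calling sketch_get_lcid("en_US", 0) 1000 times performs
1000 full scans, sketch_get_lcid("en_US", 1) scans once, and
"English_United States.1252" is scanned once and then served from the
cache under either setting.  The empty string behaves like "en_US".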


The attached patch takes a pragmatic approach: for gettext 0.20.1+, we
avoid triggering the bug by using Windows locale format instead of
calling IsoLocaleName(). This works because gettext 0.20.1+ internally
converts the Windows format back to POSIX for catalog lookups, whereas
0.19.8 and earlier need POSIX format directly.

The patch uses LIBINTL_VERSION to detect the gettext version at compile
time and adjusts behavior accordingly. When locale is NULL, empty, or
set to 'C'/'POSIX', we fall back to using the LC_CTYPE value (which is
already in Windows format and always set).
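The selection logic can be modeled as a plain function so it can be
exercised on any platform.  This is a sketch under stated assumptions:
sketch_lc_messages_value() and SKETCH_GETTEXT_VERSION() are hypothetical
names, the real patch uses a compile-time #if on LIBINTL_VERSION inside
pg_perm_setlocale(), and iso_name stands in for IsoLocaleName()'s
result.  The version packing (major << 16 | minor << 8 | subminor)
matches the 0x001401 threshold used in the patch.

```c
/* Hypothetical, portable model of the patch's LC_MESSAGES choice. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack a gettext version the way LIBINTL_VERSION does: 0xMMmmss. */
#define SKETCH_GETTEXT_VERSION(major, minor, sub) \
    (((major) << 16) | ((minor) << 8) | (sub))

static const char *
sketch_lc_messages_value(const char *locale, const char *lc_ctype,
                         int libintl_version, const char *iso_name)
{
    /* The patch relies on LC_CTYPE always being set. */
    assert(lc_ctype != NULL);

    if (libintl_version >= SKETCH_GETTEXT_VERSION(0, 20, 1))
    {
        /* Names get_lcid() cannot match fall back to LC_CTYPE's value. */
        if (locale == NULL || locale[0] == '\0' ||
            strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
            return lc_ctype;    /* already in Windows format */
        return locale;          /* pass the Windows name through */
    }

    /* 0.19.8 and earlier: keep the IsoLocaleName() result if any. */
    return iso_name != NULL ? iso_name : locale;
}
```

With libintl_version = 0x001401 (0.20.1), "C" maps to the LC_CTYPE value
and a Windows name passes through unchanged; with 0x001308 (0.19.8) the
IsoLocaleName() stand-in is used whenever it is non-NULL.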

For gettext 0.19.8 and earlier, the existing IsoLocaleName() path is
retained to maintain compatibility.

I don't have automated tests for this since we'd need to test against
multiple versions of a third-party library. I'm open to suggestions if
folks think we should add something to the buildfarm or CI.

Manual testing can be done with this test case:

-- Create test table
CREATE TABLE sampletest (
    a VARCHAR,
    b VARCHAR
);

-- Insert 1 million rows with random data
INSERT INTO sampletest (a, b)
SELECT
    substr(md5(random()::text), 0, 15),
    (100000000 * random())::integer::varchar
FROM generate_series(1, 1000000);

-- Create function that converts string to float with exception handling
CREATE OR REPLACE FUNCTION toFloat(str VARCHAR, val REAL)
RETURNS REAL AS $$
BEGIN
    RETURN CASE
        WHEN str IS NULL THEN val
        ELSE str::REAL
    END;
EXCEPTION
    WHEN OTHERS THEN
        RETURN val;
END;
$$ LANGUAGE plpgsql
   COST 1
   IMMUTABLE;

-- Test query to trigger roughly 1M exceptions
-- (nearly every conversion fails, since column a holds random MD5 hex substrings)
\timing on
SELECT MAX(toFloat(a, NULL)) FROM sampletest;

The remaining ~8 second gap relative to gettext 0.19.8 is due to the
initial enumeration and other code changes made in gettext.  Keep in
mind that for 1M exceptions we are probably calling gettext 2-3 million
times.
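As a rough, back-of-envelope check on these numbers, assuming (a
simplification) that the whole 180 s vs 32 s gap comes from the repeated
enumeration and that each exception costs 2-3 gettext() calls:

```latex
\begin{aligned}
\Delta t &\approx 180\,\mathrm{s} - 32\,\mathrm{s} = 148\,\mathrm{s} \\
t_{\text{call}} &\approx \frac{148\,\mathrm{s}}{(2\text{ to }3)\times 10^{6}\ \text{calls}} \approx 49\text{ to }74\,\mu\mathrm{s} \\
\frac{t_{\text{call}}}{259\ \text{locales}} &\approx 0.19\text{ to }0.29\,\mu\mathrm{s}\ \text{per locale name compared}
\end{aligned}
```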

-- 
Bryan Green
EDB: https://www.enterprisedb.com

[1] https://savannah.gnu.org/bugs/?67781
From be63384836855ae5356362793f51f0ffdd554537 Mon Sep 17 00:00:00 2001
From: Bryan Green <[email protected]>
Date: Tue, 9 Dec 2025 18:21:45 -0600
Subject: [PATCH v1] Avoid gettext 0.20+ performance bug on Windows.

gettext 0.20.1+ expects Windows locale format ("English_United States")
not POSIX format ("en_US"), and has a cache bug where failed lookups
cause repeated enumeration of all ~259 system locales on every gettext()
call.  This makes exception-heavy workloads 5-6x slower.

PostgreSQL has always converted to POSIX format via IsoLocaleName()
before setting LC_MESSAGES, which triggers this bug.  Setting lc_messages
to 'C' or 'POSIX' triggers it too, since these aren't Windows names.

Fix by using Windows format for gettext 0.20.1+, which handles it
correctly.  Retain POSIX format for 0.19.8 and earlier.  Detect version
via LIBINTL_VERSION macro.

Improves 1M exception test from ~180s to ~40s.
---
 src/backend/utils/adt/pg_locale.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b26257c0a8..eb8025438a 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -230,10 +230,19 @@ pg_perm_setlocale(int category, const char *locale)
                case LC_MESSAGES:
                        envvar = "LC_MESSAGES";
 #ifdef WIN32
+#if defined(LIBINTL_VERSION) && (LIBINTL_VERSION >= 0x001401)
+                       if (locale == NULL || locale[0] == '\0' ||
+                               strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
+                               result = setlocale(LC_CTYPE, NULL);
+                       else
+                               result = (char *) locale;
+#else
+                       /* Convert to ISO locale name */
                        result = IsoLocaleName(locale);
                        if (result == NULL)
                                result = (char *) locale;
-                       elog(DEBUG3, "IsoLocaleName() executed; locale: \"%s\"", result);
+#endif
+                       elog(DEBUG3, "LC_MESSAGES locale: \"%s\"", result);
 #endif                                                 /* WIN32 */
                        break;
 #endif                                                 /* LC_MESSAGES */
-- 
2.52.0.windows.1
