https://github.com/python/cpython/commit/2a2bc82cef9c6ae0b8de833e2b4aee37519de9d7
commit: 2a2bc82cef9c6ae0b8de833e2b4aee37519de9d7
branch: main
author: Serhiy Storchaka <[email protected]>
committer: encukou <[email protected]>
date: 2025-10-16T09:54:41+02:00
summary:

gh-130567: Remove optimistic allocation in locale.strxfrm() (GH-137143)

On modern systems, the result of wcsxfrm() is much larger than the size of
the input string (from 4+2*n on Windows to 4+5*n on Linux for simple
ASCII strings), so optimistic allocation of the buffer of the same size
never works.

The exception is if the locale is "C" (or unset), but in that case the `wcsxfrm`
call should be fast (and calling `locale.strxfrm()` doesn't make too much
sense in the first place).
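
The size-query-then-allocate pattern described above can be sketched outside CPython roughly as follows. This is only an illustration under stated assumptions: `xfrm_dup` is a hypothetical helper name, it uses plain `malloc()` rather than `PyMem_New`, and tolerating `ERANGE` on the first call simply mirrors the diff below (presumably because some platforms report it when the destination is too small).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of CPython: returns a malloc()ed sort key
 * for `s` under the current LC_COLLATE locale, or NULL on failure.
 * The caller owns the result and must free() it. */
wchar_t *
xfrm_dup(const wchar_t *s)
{
    /* First pass: with a zero-sized destination, wcsxfrm() only reports
     * the length of the transformed string, excluding the terminator. */
    errno = 0;
    size_t n = wcsxfrm(NULL, s, 0);
    if (errno && errno != ERANGE) {
        return NULL;
    }
    /* Guard the size computation below against overflow. */
    if (n >= SIZE_MAX / sizeof(wchar_t)) {
        return NULL;
    }

    wchar_t *buf = malloc((n + 1) * sizeof(wchar_t));
    if (buf == NULL) {
        return NULL;
    }

    /* Second pass: the buffer is sized from the first call, so any error
     * (or a result that still does not fit) is treated as a failure. */
    errno = 0;
    size_t m = wcsxfrm(buf, s, n + 1);
    if (errno || m > n) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Two keys produced this way can be compared with `wcscmp()`, which should order them the same way `wcscoll()` orders the original strings in the current locale.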

files:
M Modules/_localemodule.c

diff --git a/Modules/_localemodule.c b/Modules/_localemodule.c
index cb448b14d8cd63..7174eebd0c94de 100644
--- a/Modules/_localemodule.c
+++ b/Modules/_localemodule.c
@@ -455,36 +455,24 @@ _locale_strxfrm_impl(PyObject *module, PyObject *str)
         goto exit;
     }
 
-    /* assume no change in size, first */
-    n1 = n1 + 1;
-    /* Yet another +1 is needed to work around a platform bug in wcsxfrm()
-     * on macOS. See gh-130567. */
-    buf = PyMem_New(wchar_t, n1+1);
+    errno = 0;
+    n2 = wcsxfrm(NULL, s, 0);
+    if (errno && errno != ERANGE) {
+        PyErr_SetFromErrno(PyExc_OSError);
+        goto exit;
+    }
+    buf = PyMem_New(wchar_t, n2+1);
     if (!buf) {
         PyErr_NoMemory();
         goto exit;
     }
+
     errno = 0;
-    n2 = wcsxfrm(buf, s, n1);
-    if (errno && errno != ERANGE) {
+    n2 = wcsxfrm(buf, s, n2+1);
+    if (errno) {
         PyErr_SetFromErrno(PyExc_OSError);
         goto exit;
     }
-    if (n2 >= (size_t)n1) {
-        /* more space needed */
-        wchar_t * new_buf = PyMem_Realloc(buf, (n2+1)*sizeof(wchar_t));
-        if (!new_buf) {
-            PyErr_NoMemory();
-            goto exit;
-        }
-        buf = new_buf;
-        errno = 0;
-        n2 = wcsxfrm(buf, s, n2+1);
-        if (errno) {
-            PyErr_SetFromErrno(PyExc_OSError);
-            goto exit;
-        }
-    }
     /* The result is just a sequence of integers, they are not necessary
        Unicode code points, so PyUnicode_FromWideChar() cannot be used
        here. For example, 0xD83D 0xDC0D should not be larger than 0xFF41.
