Prevent buffer overrun in unicode_normalize().

Some UTF8 characters decompose to more than a dozen codepoints. It is possible for an input string that fits into well under 1GB to produce more than 4G decomposed codepoints, causing unicode_normalize()'s decomp_size variable to wrap around to a small positive value. This results in a small output buffer allocation and subsequent buffer overrun.
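To make the wraparound concrete, here is a minimal sketch of the arithmetic. It is not code from unicode_norm.c. The helper name truncated_total(), the 18x expansion factor, and the character count are all hypothetical figures chosen for illustration, and the sketch assumes a 32-bit counter, modeled with uint32_t truncation so the behavior is well defined in C.

```c
#include <stdint.h>

/* Hypothetical figures, chosen only to illustrate the arithmetic. */
#define SKETCH_NCHARS     UINT64_C(238609295)  /* about 716MB at 3 bytes per character */
#define SKETCH_EXPANSION  UINT64_C(18)         /* assumed codepoints per decomposed character */

/*
 * Return the decomposed codepoint count as a 32-bit counter would see it.
 * The multiplication is done in 64 bits; the cast keeps only the low 32 bits,
 * which is the wraparound the commit message describes.
 */
static uint32_t
truncated_total(uint64_t nchars, uint64_t per_char)
{
    return (uint32_t) (nchars * per_char);
}
```

With these assumed figures the true total is 4,294,967,310 codepoints, just above 2^32, so the truncated counter holds 14. An output buffer sized from that value would be far too small for the data written into it.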

To fix, test after each addition to see if we've overrun MaxAllocSize, and break out of the loop early if so. In frontend code we want to just return NULL for this failure (treating it like OOM). In the backend, we can rely on the following palloc() call to throw error.

I also tightened things up in the calling functions in varlena.c, using size_t rather than int and allocating the input workspace with palloc_array(). These changes are probably unnecessary given the knowledge that the original input and the normalized output_chars array must fit into 1GB, but it's a lot easier to believe the code is safe with these changes.

Reported-by: Xint Code
Reported-by: Bruce Dang <[email protected]>
Author: Tom Lane <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
Backpatch-through: 14
Security: CVE-2026-6473

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/ebcfa7867fb5f3acb906fda77b5cf0282d9ad81a
Author: Tom Lane <[email protected]>

Modified Files
--------------
src/backend/utils/adt/varlena.c | 14 +++++++-------
src/common/unicode_norm.c       | 19 +++++++++++++++++++
2 files changed, 26 insertions(+), 7 deletions(-)
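
The "test after each addition" approach described in the fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch. sum_decomp_sizes() and the SKETCH_ macros are hypothetical names, the limit mirrors PostgreSQL's MaxAllocSize (1GB - 1) redefined locally so the sketch stands alone, and the output elements are assumed to be 4 bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's MaxAllocSize (1GB - 1). */
#define SKETCH_MAX_ALLOC_SIZE  ((size_t) 0x3fffffff)
/* Largest element count whose byte size still fits, assuming 4-byte codepoints. */
#define SKETCH_MAX_ELEMS       (SKETCH_MAX_ALLOC_SIZE / sizeof(uint32_t))

/*
 * Add up per-character decomposition sizes, checking the running total
 * after every addition and bailing out as soon as it exceeds max_elems.
 * Returns 1 and stores the sum in *total on success, 0 on overrun.
 *
 * Each addend is at most 255, so as long as max_elems <= SIZE_MAX - 255
 * the sum cannot wrap before the check catches it.
 */
static int
sum_decomp_sizes(const uint8_t *sizes, size_t n, size_t max_elems, size_t *total)
{
    size_t sum = 0;

    for (size_t i = 0; i < n; i++)
    {
        sum += sizes[i];
        if (sum > max_elems)
            return 0;           /* caller treats this like an allocation failure */
    }
    *total = sum;
    return 1;
}
```

A caller in this sketch would pass SKETCH_MAX_ELEMS as the limit and report failure when 0 comes back, roughly analogous to the frontend path returning NULL. The backend behavior described above differs: there the loop exits early and the subsequent palloc() call raises the error.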
