On Sat, 2023-05-13 at 13:00 +0300, Alexander Lakhin wrote: > On the current master (after 455f948b0, and before f7faa9976, of > course) > I get an ASAN-detected failure with the following query: > CREATE COLLATION col (provider = icu, locale = '123456789012'); >
Thank you for the report! ICU source specifically says: /** * Useful constant for the maximum size of the language part of a locale ID. * (including the terminating NULL). * @stable ICU 2.0 */ #define ULOC_LANG_CAPACITY 12 So the fact that it returning success without nul-terminating the result is an ICU bug in my opinion, and I reported it here: https://unicode-org.atlassian.net/browse/ICU-22394 Unfortunately that means we need to be a bit more paranoid and always check for that warning, even if we have a preallocated buffer of the "correct" size. It also means that both U_STRING_NOT_TERMINATED_WARNING and U_BUFFER_OVERFLOW_ERROR will be user-facing errors (potentially scary), unless we check for those errors each time and report specific errors for them. Another approach is to always check the length and loop using dynamic allocation so that we never overflow (and forget about any static buffers). That seems like overkill given that the problem case is obviously invalid input; I think as long as we catch it and throw an ERROR it's fine. But I can do this if others think it's worthwhile. Patch attached. It just checks for the U_STRING_NOT_TERMINATED_WARNING in a few places and turns it into an ERROR. It also cleans up the loop for uloc_getLanguageTag() to check for U_STRING_NOT_TERMINATED_WARNING rather than (U_SUCCESS(status) && len >= buflen). -- Jeff Davis PostgreSQL Contributor Team - AWS
From 9c8e9272ca807c9f75a7b32fa3190700cc600260 Mon Sep 17 00:00:00 2001 From: Jeff Davis <j...@j-davis.com> Date: Mon, 15 May 2023 13:35:07 -0700 Subject: [PATCH] ICU: check for U_STRING_NOT_TERMINATED_WARNING. In some cases, ICU can fail to NUL-terminate a result string even if using an appropriately-sized static buffer. The caller must either check for len >= buflen or U_STRING_NOT_TERMINATED_WARNING. The specific problem is related to uloc_getLanguage(), but add the check in other cases as well. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf1352...@gmail.com --- src/backend/utils/adt/pg_locale.c | 29 +++++++++++------------------ src/bin/initdb/initdb.c | 15 ++++----------- 2 files changed, 15 insertions(+), 29 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index f0b6567da1..1cf93b2d20 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -2468,7 +2468,7 @@ pg_ucol_open(const char *loc_str) status = U_ZERO_ERROR; uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status); - if (U_FAILURE(status)) + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) { ereport(ERROR, (errmsg("could not get language from locale \"%s\": %s", @@ -2504,7 +2504,7 @@ pg_ucol_open(const char *loc_str) * Pretend the error came from ucol_open(), for consistent error * message across ICU versions. */ - if (U_FAILURE(status)) + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) { ucol_close(collator); ereport(ERROR, @@ -2639,7 +2639,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar) status = U_ZERO_ERROR; len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1, buff_uchar, len_uchar, &status); - if (U_FAILURE(status)) + if (U_FAILURE(status) || + status == U_STRING_NOT_TERMINATED_WARNING) ereport(ERROR, (errmsg("%s failed: %s", "ucnv_fromUChars", u_errorName(status)))); @@ -2681,7 +2682,7 @@ icu_set_collation_attributes(UCollator *collator, const char *loc, icu_locale_id = palloc(len + 1); *status = U_ZERO_ERROR; len = uloc_canonicalize(loc, icu_locale_id, len + 1, status); - if (U_FAILURE(*status)) + if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING) return; lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id)); @@ -2765,7 +2766,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc, pfree(lower_str); } - #endif /* @@ -2789,7 +2789,7 @@ icu_language_tag(const char *loc_str, int elevel) status = U_ZERO_ERROR; uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status); - if (U_FAILURE(status)) + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) { if (elevel > 0) ereport(elevel, @@ -2811,19 +2811,12 @@ icu_language_tag(const char *loc_str, int elevel) langtag = palloc(buflen); while (true) { - int32_t len; - status = U_ZERO_ERROR; - len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status); + uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status); - /* - * If the result fits in the buffer exactly (len == buflen), - * uloc_toLanguageTag() will return success without nul-terminating - * the result. Check for either U_BUFFER_OVERFLOW_ERROR or len >= - * buflen and try again. - */ + /* try again if the buffer is not large enough */ if ((status == U_BUFFER_OVERFLOW_ERROR || - (U_SUCCESS(status) && len >= buflen)) && + status == U_STRING_NOT_TERMINATED_WARNING) && buflen < MaxAllocSize) { buflen = Min(buflen * 2, MaxAllocSize); @@ -2878,7 +2871,7 @@ icu_validate_locale(const char *loc_str) /* validate that we can extract the language */ status = U_ZERO_ERROR; uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status); - if (U_FAILURE(status)) + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) { ereport(elevel, (errmsg("could not get language from ICU locale \"%s\": %s", @@ -2901,7 +2894,7 @@ icu_validate_locale(const char *loc_str) status = U_ZERO_ERROR; uloc_getLanguage(otherloc, otherlang, ULOC_LANG_CAPACITY, &status); - if (U_FAILURE(status)) + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) continue; if (strcmp(lang, otherlang) == 0) diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index e03d498b1e..30b576932f 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -2252,7 +2252,7 @@ icu_language_tag(const char *loc_str) status = U_ZERO_ERROR; uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status); - if (U_FAILURE(status)) + if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) { pg_fatal("could not get language from locale \"%s\": %s", loc_str, u_errorName(status)); @@ -2272,19 +2272,12 @@ icu_language_tag(const char *loc_str) langtag = pg_malloc(buflen); while (true) { - int32_t len; - status = U_ZERO_ERROR; - len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status); + uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status); - /* - * If the result fits in the buffer exactly (len == buflen), - * uloc_toLanguageTag() will return success without nul-terminating - * the result. Check for either U_BUFFER_OVERFLOW_ERROR or len >= - * buflen and try again. - */ + /* try again if the buffer is not large enough */ if (status == U_BUFFER_OVERFLOW_ERROR || - (U_SUCCESS(status) && len >= buflen)) + status == U_STRING_NOT_TERMINATED_WARNING) { buflen = buflen * 2; langtag = pg_realloc(langtag, buflen); -- 2.34.1