I wrote:
> I agree with your point that this is a shouldn't-happen corner case.
> The question boils down to, if it *does* happen, does that constitute
> a meaningful information leak?  Up to now we've taken quite a hard
> line about what leakproofness means, so deciding that varstr_cmp
> is leakproof would constitute moving the goalposts a bit.  They'd
> still be in the same stadium, though, IMO.

For most of us it might be more meaningful to look at the non-Windows
code paths, for which the question reduces to what we think of this:

        UErrorCode  status;

        status = U_ZERO_ERROR;
        result = ucol_strcollUTF8(mylocale->info.icu.ucol,
                                  arg1, len1,
                                  arg2, len2,
                                  &status);
        if (U_FAILURE(status))
            ereport(ERROR,
                    (errmsg("collation failed: %s", u_errorName(status))));

which, as it happens, is also a UTF8-encoding-only code path.
Can this throw an error in practice, and if so does that constitute a
meaningful information leak?  (For bonus points: is this error report
up to project standards?)

Thumbing through the list of UErrorCode values, it seems like the only
ones that are applicable here and aren't internal-error cases are
U_INVALID_CHAR_FOUND and the like, so that this boils down to "one of
the strings contains a character that ICU can't cope with".  Maybe that's
impossible except with a pre-existing encoding violation, or maybe not.

In any case, from a purely theoretical viewpoint, such an error message
*does* constitute a leak of information about the input strings.  Whether
it's a usable leak is very debatable, but that's basically what we've
got to decide.

			regards, tom lane
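
To make the "purely theoretical" leak concrete, here is a toy sketch in C.
Nothing in it is PostgreSQL or ICU code: toy_has_bad_byte, toy_compare and
toy_probe_sees_error are made-up names, and the "bad byte" rule (bytes that
can never occur in well-formed UTF-8) is only an assumed stand-in for
"a character that ICU can't cope with".  A false return plays the role of
ereport(ERROR).  The point is just that a caller who cannot see the hidden
string still learns one bit about it from whether the comparison failed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "ICU can't cope with this string":
 * flag bytes that never appear in well-formed UTF-8. */
static bool
toy_has_bad_byte(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) s[i];

        if (c == 0xC0 || c == 0xC1 || c >= 0xF5)
            return true;
    }
    return false;
}

/* Toy comparison.  Returns false where the real code would raise an
 * error; otherwise stores -1, 0 or 1 in *result and returns true. */
static bool
toy_compare(const char *a, size_t alen,
            const char *b, size_t blen,
            int *result)
{
    size_t      n = (alen < blen) ? alen : blen;
    int         c;

    if (toy_has_bad_byte(a, alen) || toy_has_bad_byte(b, blen))
        return false;           /* stands in for ereport(ERROR, ...) */

    c = memcmp(a, b, n);
    if (c == 0 && alen != blen)
        c = (alen < blen) ? -1 : 1;
    *result = (c > 0) - (c < 0);
    return true;
}

/* What an observer who never sees "hidden" can still find out:
 * whether comparing it against a harmless constant failed. */
static bool
toy_probe_sees_error(const char *hidden, size_t hlen)
{
    int         unused;

    return !toy_compare(hidden, hlen, "x", 1, &unused);
}
```

Calling toy_probe_sees_error on a clean string and on one containing a
0xFF byte gives different answers, which is the whole of the leak: one bit
saying "this value contains something the comparator rejects".  Whether
that bit is usable is the debatable part.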