Hi Paul,

> Shouldn't regex be avoiding strcasecmp entirely?
> That is, couldn't there be a weird locale that considers
> the lower-case equivalent of "U" to be "uu", or something
> weird like that?

In such a locale, strcasecmp would not consider "U" and "uu" as being
equivalent; only mbscasecmp would do this. But you're right: for comparing
results of nl_langinfo (CODESET), one should not use a locale-dependent
comparison. You wouldn't want "ISO-8859-9" and "iso-8859-9" to be
considered different just because the locale is Turkish.

> For this particular case c-strcase seems overkill, so how
> about the following further patch?
>
> diff --git a/lib/regcomp.c b/lib/regcomp.c
> index 7eb003b..6d5525a 100644
> --- a/lib/regcomp.c
> +++ b/lib/regcomp.c
> @@ -899,8 +899,10 @@ init_dfa (re_dfa_t *dfa, size_t pat_len)
>  		       != 0);
>  #else
>    codeset_name = nl_langinfo (CODESET);
> -  if (strcasecmp (codeset_name, "UTF-8") == 0
> -      || strcasecmp (codeset_name, "UTF8") == 0)
> +  if ((codeset_name[0] == 'U' || codeset_name[0] == 'u')
> +      && (codeset_name[1] == 'T' || codeset_name[1] == 't')
> +      && (codeset_name[2] == 'F' || codeset_name[2] == 'f')
> +      && strcmp (codeset_name + 3 + (codeset_name[3] == '-'), "8") == 0)
>      dfa->is_utf8 = 1;
>  
>    /* We check exhaustively in the loop below if this charset is a
> diff --git a/modules/regex b/modules/regex
> index 5371bab..cfc5d07 100644
> --- a/modules/regex
> +++ b/modules/regex
> @@ -26,7 +26,6 @@ mbsinit         [test $ac_use_included_regex = yes]
>  nl_langinfo     [test $ac_use_included_regex = yes]
>  stdbool         [test $ac_use_included_regex = yes]
>  stdint          [test $ac_use_included_regex = yes]
> -strcase         [test $ac_use_included_regex = yes]
>  wchar           [test $ac_use_included_regex = yes]
>  wcrtomb         [test $ac_use_included_regex = yes]
>  wctype-h        [test $ac_use_included_regex = yes]

Looks right to me. Please add to this the removal of <strings.h> from
regex_internal.h, since I had already committed the #include <strings.h>.

Bruno
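
For anyone who wants to try the comparison from the patch outside of
regcomp.c, here is a minimal sketch. The helper name is_utf8_codeset is
hypothetical and does not exist in gnulib; the body just wraps the
expression from the patch. It accepts "UTF-8" and "UTF8" in any letter
case, comparing against ASCII literals only, so the current locale
cannot influence the result.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: returns true if CODESET_NAME
   is "UTF-8" or "UTF8", ignoring the case of the letters.  Only ASCII
   literals are compared, so the result is locale-independent.  */
static bool
is_utf8_codeset (const char *codeset_name)
{
  /* The && chain short-circuits, so no byte past the terminating NUL
     is ever read, even for strings shorter than three characters.  */
  return ((codeset_name[0] == 'U' || codeset_name[0] == 'u')
          && (codeset_name[1] == 'T' || codeset_name[1] == 't')
          && (codeset_name[2] == 'F' || codeset_name[2] == 'f')
          /* Skip one optional '-' after "UTF", then require exactly "8".  */
          && strcmp (codeset_name + 3 + (codeset_name[3] == '-'), "8") == 0);
}
```

In init_dfa the argument would be the result of nl_langinfo (CODESET).
Names such as "UTF-16" or "ISO-8859-9" are rejected because the tail
after the optional hyphen must be exactly "8".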