In perl.git, the branch maint-5.22 has been updated <http://perl5.git.perl.org/perl.git/commitdiff/252ab0bb8fa8a2ec1f266cc4ef62c4afb520b30f?hp=8b0897613193634594a3bc37314e614c6550eb08>
- Log -----------------------------------------------------------------

commit 252ab0bb8fa8a2ec1f266cc4ef62c4afb520b30f
Author: Karl Williamson
Date:   Wed Mar 23 09:17:05 2016 -0600

    PATCH: [perl 127537] /\W/ regression with UTF-8

    This bug is apparently uncommon in the field, as I was the one who
    discovered it.  It requires a UTF-8 pattern containing a complemented
    posix class, like \W or \S, in an inverted character class, like
    [^\Wfoo], in a pattern that also has a synthetic start class generated
    by the regex optimizer for it.  The fix is trivial.

    (modified from commit ac33c516140ee213a8a20ada506f97b3a7776ae4 so that
    it would apply to 5.22.)

-----------------------------------------------------------------------

Summary of changes:
 regcomp.c     | 8 ++++++--
 t/re/re_tests | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/regcomp.c b/regcomp.c
index b8d7d38..e7b82a8 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -1184,8 +1184,12 @@ S_get_ANYOF_cp_list_for_ssc(pTHX_ const RExC_state_t *pRExC_state,
         }
 
         /* If this can match all upper Latin1 code points, have to add them
-         * as well */
-        if (ANYOF_FLAGS(node) & ANYOF_MATCHES_ALL_NON_UTF8_NON_ASCII) {
+         * as well.  But don't add them if inverting, as when that gets done below,
+         * it would exclude all these characters, including the ones it shouldn't
+         * that were added just above */
+        if (ANYOF_FLAGS(node) & (ANYOF_INVERT|ANYOF_MATCHES_ALL_NON_UTF8_NON_ASCII)
+            == ANYOF_MATCHES_ALL_NON_UTF8_NON_ASCII)
+        {
             _invlist_union(invlist, PL_UpperLatin1, &invlist);
         }
 
diff --git a/t/re/re_tests b/t/re/re_tests
index 663307f..85ce7f4 100644
--- a/t/re/re_tests
+++ b/t/re/re_tests
@@ -1613,6 +1613,8 @@ a(.)\4294967298	ab\o{42}94967298	ya	$1	b	\d not converted to native; \o{} is
 ^m?(\d)(.*)\1$	5b5	y	$1	5
 ^m?(\d)(.*)\1$	aba	n	-	-
+^_?[^\W_0-9]\w\z	\xAA\x{100}	y	$&	\xAA\x{100}	[perl #127537]
+
 # 17F is 'Long s'; This makes sure the a's in /aa can be separate
 /s/ai	\x{17F}	y	$&	\x{17F}
 /s/aia	\x{17F}	n	-	-

--
Perl5 Master Repository
