In perl.git, the branch blead has been updated <https://perl5.git.perl.org/perl.git/commitdiff/1222ab91f6663819e940d74716fd36e292cd58fb?hp=26d5d9442c9fe6c919fee623d17c39a51ff863f2>
- Log -----------------------------------------------------------------
commit 1222ab91f6663819e940d74716fd36e292cd58fb
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 28 13:08:02 2017 -0600

    Bump Socket version to 2.020_04

    Commit 0cdc775ef423ad6415e6f80b9244c17a52bf5149 made a small change in
    cpan/Socket/Socket.pm, causing a porting test failure, which is solved
    by a version bump, and changing customized.dat to account for that.

commit 84fefe3fd7fa1f1893200b77ec5bc51014622c0d
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 28 12:51:07 2017 -0600

    perlre: Slight clarification

commit 754dd7544223a129a81fe68103efea600c63a5a6
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 28 12:32:57 2017 -0600

    perldiag: More detail on /i var length lookbehind

    See http://nntp.perl.org/group/perl.perl5.porters/245323

commit 8ec77c3c4754a4a36357fd4775016a63053a3a9e
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 27 12:52:26 2017 -0600

    regcomp.h: Add comment

-----------------------------------------------------------------------

Summary of changes:
 cpan/Socket/Socket.pm    |  2 +-
 pod/perldelta.pod        | 10 +++++++++-
 pod/perldiag.pod         | 44 ++++++++++++++++++++++++++++++++++++--------
 pod/perlre.pod           | 20 ++++++++++----------
 regcomp.h                |  1 +
 t/porting/customized.dat |  2 +-
 6 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/cpan/Socket/Socket.pm b/cpan/Socket/Socket.pm
index 583b9f9d56..833f0fc365 100644
--- a/cpan/Socket/Socket.pm
+++ b/cpan/Socket/Socket.pm
@@ -3,7 +3,7 @@ package Socket;
 use strict;
 { use 5.006001; }
 
-our $VERSION = '2.020_03'; # patched in perl5.git
+our $VERSION = '2.020_04'; # patched in perl5.git
 $VERSION =~ tr/_//d; # make $VERSION numeric
 
 =head1 NAME
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 1ea6c597ec..38a9d323d7 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -164,7 +164,15 @@ section.
 
 Additionally, the following selected changes have been made:
 
-=head3 L<XXX>
+=head3 L<perldiag/Variable length lookbehind not implemented in regex m/%s/>
+
+This now gives more ideas as to workarounds to the issue that was
+introduced in Perl 5.18 (but not documented explicitly in its perldelta)
+for the fact that some Unicode C</i> rules cause a few sequences such as
+
+ (?<!st)
+
+to be considered variable length, and hence disallowed.
 
 =over 4
 
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index d417fb296e..dee59d979b 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -7282,14 +7282,42 @@ known at compile time.
 For positive lookbehind, you can use the C<\K> regex construct as a way to get
 the equivalent functionality. See L<(?<=pattern) and \K in perlre|perlre/\K>.
 
-There are non-obvious Unicode rules under C</i> that can match variably,
-but which you might not think could. For example, the substring C<"ss">
-can match the single character LATIN SMALL LETTER SHARP S. There are
-other sequences of ASCII characters that can match single ligature
-characters, such as LATIN SMALL LIGATURE FFI matching C<qr/ffi/i>.
-Starting in Perl v5.16, if you only care about ASCII matches, adding the
-C</aa> modifier to the regex will exclude all these non-obvious matches,
-thus getting rid of this message. You can also say C<S<use re qw(/aa)>>
+Starting in Perl 5.18, there are non-obvious Unicode rules under C</i>
+that can match variably, but which you might not think could. For
+example, the substring C<"ss"> can match the single character LATIN
+SMALL LETTER SHARP S. Here's a complete list of the current ones
+affecting ASCII characters:
+
+   ASCII
+  sequence      Matches single letter under /i
+    FF          U+FB00 LATIN SMALL LIGATURE FF
+    FFI         U+FB03 LATIN SMALL LIGATURE FFI
+    FFL         U+FB04 LATIN SMALL LIGATURE FFL
+    FI          U+FB01 LATIN SMALL LIGATURE FI
+    FL          U+FB02 LATIN SMALL LIGATURE FL
+    SS          U+00DF LATIN SMALL LETTER SHARP S
+                U+1E9E LATIN CAPITAL LETTER SHARP S
+    ST          U+FB06 LATIN SMALL LIGATURE ST
+                U+FB05 LATIN SMALL LIGATURE LONG S T
+
+This list is subject to change, but is quite unlikely to.
+Each ASCII sequence can be any combination of upper- and lowercase.
+
+You can avoid this by using a bracketed character class in the
+lookbehind assertion, like
+
+ (?<![sS]t)
+ (?<![fF]f[iI])
+
+This fools Perl into not matching the ligatures.
+
+Another option for Perls starting with 5.16, if you only care about
+ASCII matches, is to add the C</aa> modifier to the regex. This will
+exclude all these non-obvious matches, thus getting rid of this message.
+You can also say
+
+ use if $] ge 5.016, re => '/aa';
+
 to apply C</aa> to all regular expressions compiled within its scope.
 See L<re>.
 
diff --git a/pod/perlre.pod b/pod/perlre.pod
index b11d862b40..068b905acd 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -206,16 +206,16 @@ C<"[">, and matches its literal self:
 The list of characters within the character class gives the set of
 characters matched by the class. C<"[abc]"> matches a single "a" or "b"
 or "c". But if the first character after the C<"["> is C<"^">, the
-class matches any character not in the list. Within a list, the C<"-">
-character specifies a range of characters, so that C<a-z> represents all
-characters between "a" and "z", inclusive. If you want either C<"-"> or
-C<"]"> itself to be a member of a class, put it at the start of the list
-(possibly after a C<"^">), or escape it with a backslash. C<"-"> is
-also taken literally when it is at the end of the list, just before the
-closing C<"]">. (The following all specify the same class of three
-characters: C<[-az]>, C<[az-]>, and C<[a\-z]>. All are different from
-C<[a-z]>, which specifies a class containing twenty-six characters, even
-on EBCDIC-based character sets.)
+class instead matches any character not in the list. Within a list, the
+C<"-"> character specifies a range of characters, so that C<a-z>
+represents all characters between "a" and "z", inclusive. If you want
+either C<"-"> or C<"]"> itself to be a member of a class, put it at the
+start of the list (possibly after a C<"^">), or escape it with a
+backslash. C<"-"> is also taken literally when it is at the end of the
+list, just before the closing C<"]">. (The following all specify the
+same class of three characters: C<[-az]>, C<[az-]>, and C<[a\-z]>. All
+are different from C<[a-z]>, which specifies a class containing
+twenty-six characters, even on EBCDIC-based character sets.)
 
 There is lots more to bracketed character classes; full details are in
 L<perlrecharclass/Bracketed Character Classes>.
diff --git a/regcomp.h b/regcomp.h
index 8c42d4e76d..0a013e5a09 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -1073,6 +1073,7 @@ re.pm, especially to the documentation.
             PERL_PV_ESCAPE_RE|PERL_PV_ESCAPE_NONASCII |((isuni) ? PERL_PV_ESCAPE_UNI : 0) ); \
     const int rlen = SvCUR(dsv)
 
+/* This is currently unsed in the core */
 #define RE_SV_ESCAPE(rpv,isuni,dsv,sv,m) \
     const char * const rpv = \
         pv_pretty((dsv), (SvPV_nolen_const(sv)), (SvCUR(sv)), (m), \
diff --git a/t/porting/customized.dat b/t/porting/customized.dat
index e4e3dc22a2..4c49a767bf 100644
--- a/t/porting/customized.dat
+++ b/t/porting/customized.dat
@@ -15,7 +15,7 @@ Pod::Checker cpan/Pod-Checker/t/pod/selfcheck.t 8ce3cfd38e4b9bcf5bc7fe7f2a14195e
 Pod::Checker cpan/Pod-Checker/t/pod/testcmp.pl a0cd5c8eca775c7753f4464eee96fa916e3d8a16
 Pod::Checker cpan/Pod-Checker/t/pod/testpchk.pl b2072c7f4379fd050e15424175d7cac5facf5b3b
 Pod::Perldoc cpan/Pod-Perldoc/lib/Pod/Perldoc.pm 582be34c077c9ff44d99914724a0cc2140bcd48c
-Socket cpan/Socket/Socket.pm 65c0af9d27d30652a5e73f0a05c881a240240dd4
+Socket cpan/Socket/Socket.pm ee83312b6e3e0185af8d41a18635913d84b1b651
 Socket cpan/Socket/Socket.xs edd4fed212785f11c5c2095a75941dad27d586d9
 autodie cpan/autodie/t/mkdir.t 9e70d2282a3cc7d76a78bf8144fccba20fb37dac
 perlfaq cpan/perlfaq/lib/perlfaq5.pod bcc1b6af3b6dff3973643acf8d5e741463374123

-- 
Perl5 Master Repository
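
Illustration (not part of the commit): the two workarounds described in the
perldiag change above can be tried with a small script. This is only a
sketch. The pattern strings and the %compiled hash are made up for the
example, and whether the plain /i form is rejected depends on the Perl
version in use (the new text says the restriction dates from 5.18, and later
releases may relax it), so the script reports what happens rather than
assuming an outcome for that case.

```perl
use strict;
use warnings;

# Hypothetical sample patterns, one per approach named in the perldiag text.
my %source = (
    'plain /i'        => 'qr/(?<!st)x/i',      # may be rejected as variable length
    'bracketed class' => 'qr/(?<![sS]t)x/i',   # workaround 1: bracketed class
    '/aa modifier'    => 'qr/(?<!st)x/aai',    # workaround 2: ASCII-only rules
);

# Compile each pattern inside a string eval so a compile-time failure is
# caught instead of aborting the script.
my %compiled;
for my $name (sort keys %source) {
    my $re = eval $source{$name};
    $compiled{$name} = $re;
    if (defined $re) {
        print "$name: compiles\n";
    }
    else {
        my $err = $@;
        $err =~ s/\s+\z//;
        print "$name: fails ($err)\n";
    }
}

# With either workaround, "x" preceded by ASCII "st" (in any case) is
# still excluded by the negative lookbehind.
for my $name ('bracketed class', '/aa modifier') {
    my $re = $compiled{$name} or next;
    for my $text ('stx', 'STx', 'ax') {
        printf "%-16s %-4s %s\n", $name, $text,
            $text =~ $re ? 'matches' : 'does not match';
    }
}
```

Compiling through a string eval keeps the script running on Perls that
reject the plain /i form, so the three variants can be compared side by side.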
