In perl.git, the branch blead has been updated <https://perl5.git.perl.org/perl.git/commitdiff/59e54936361dbc8a6aa3224d5456d809c079d269?hp=76d3ad4c2443f94d2d636a40a01762c27bbf1c10>
- Log -----------------------------------------------------------------
commit 59e54936361dbc8a6aa3224d5456d809c079d269
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 21 13:18:12 2019 -0600

    regcomp.h: Parenthesize param in macro expansion

    This is always a good idea

commit c5184715c0018eac1440599795d6341d07559dd4
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 21 13:14:25 2019 -0600

    regcomp.h: Remove duplicate macro expansion

    This macro has the same definition as another.

commit b61e55cb1695ff940310c75f08e41cfbfc16d73c
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 26 22:09:51 2019 -0600

    regcomp.c: Clarify some comments

commit a2f213ef6995b39265d4ac5097a63ca063dbb346
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 15:09:05 2019 -0600

    regcomp.sym Update and improve descriptions of some nodes

    EXACTFU nodes always now fold their strings; the information here had
    not been updated to reflect that change.

    And the descriptions of several EXACTish nodes are now changed to be
    slightly shorter and to remove mention of the string length, which is
    problematic, and is covered in the description for EXACT

commit 484678fc0e05755eaaecb74c8b1cf89e1e54984b
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 26 16:30:21 2019 -0600

    regen/regcomp.pl: Rename variable

    The old name was misleading.

commit e21ef6928fa32f8c21414f00ec4a6cae741dec7a
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 26 16:23:33 2019 -0600

    regen/regcomp.pl, regcomp.sym: Comments

    I spent some time in this code trying to understand some things, and as
    a result I'm commenting previously undocumented features.

    The comments about what an entry in regcomp.sym should look like are
    moved to that file, rather than the file that reads it.  The former is
    most often touched, and they had gotten out-of-sync in the latter.

    Things now make more sense to me, and hopefully anyone using this in
    the future.

commit 27c3e5ad94fad01593474ee3038849be74be86a0
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 26 20:49:53 2019 -0600

    Silence verbatim line pod warning in perldebguts

    This generated pod has many lines that really can't be wrapped.  So
    change the podcheck.t db to ignore these errors in this file.

commit cac0218b84f9ae47a2369c80f167b508534e0351
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 26 13:14:14 2019 -0600

    Add note to debugging output if regex already compiled

    Prior to this commit, the debugging output says "Compiling REx foo".
    But there was no indication that it was skipped due to the pattern
    already being compiled; so that was confusing to people, and was a
    Stack Overflow question of what is going on.  Now there's an extra
    message that the recompilation is skipped.

commit 7cb9b5f3b2a5b765e3399f08c283a1156931be4e
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 15:48:51 2019 -0600

    perlrun: Note that -W can't be in PERL5OPT

-----------------------------------------------------------------------

Summary of changes:
 pod/perldebguts.pod            | 32 +++++++++++++++----------------
 pod/perlrun.pod                |  2 ++
 regcomp.c                      | 15 +++++++++++----
 regcomp.h                      |  5 +++--
 regcomp.sym                    | 37 +++++++++++++++++++++++++-----------
 regen/regcomp.pl               | 43 +++++++++++++++++++-----------------------
 regnodes.h                     | 14 +++++++-------
 t/porting/known_pod_issues.dat |  2 +-
 8 files changed, 85 insertions(+), 65 deletions(-)

diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod
index b439380d8a..1e23b84af4 100644
--- a/pod/perldebguts.pod
+++ b/pod/perldebguts.pod
@@ -562,7 +562,7 @@ will be lost.

 =for regcomp.pl begin

- # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
+ # TYPE arg-description [regnode-struct-suffix] [longjump-len] DESCRIPTION

  # Exit points

@@ -663,25 +663,25 @@ will be lost.
  EXACTL           str        Like EXACT, but /l is in effect (used so
                              locale-related warnings can be checked
                              for).
- EXACTF           str        Match this string using /id rules (w/len);
+ EXACTF           str        Like EXACT, but match using /id rules;
                              (string not UTF-8, not guaranteed to be
                              folded).
- EXACTFL          str        Match this string using /il rules (w/len);
-                             (string not guaranteed to be folded).
- EXACTFU          str        Match this string using /iu rules (w/len);
-                             (string folded iff in UTF-8; non-UTF8
-                             folded length <= unfolded).
- EXACTFAA         str        Match this string using /iaa rules (w/len)
-                             (string folded iff in UTF-8; non-UTF8
-                             folded length <= unfolded).
-
- EXACTFUP         str        Match this string using /iu rules (w/len);
+ EXACTFL          str        Like EXACT, but match using /il rules;
+                             (string not likely to be folded).
+ EXACTFU          str        Like EXACT, but match using /iu rules;
+                             (string folded).
+ EXACTFAA         str        Like EXACT, but match using /iaa rules;
+                             (string folded iff pattern is UTF8; folded
+                             length <= unfolded).
+
+ EXACTFUP         str        Like EXACT, but match using /iu rules;
                              (string not UTF-8, not guaranteed to be
-                             folded; and its Problematic).
+                             folded; and it is Problematic).

- EXACTFLU8        str        Like EXACTFU, but use /il, UTF-8, folded,
-                             and everything in it is above 255.
- EXACTFAA_NO_TRIE str        Match this string using /iaa rules (w/len)
+ EXACTFLU8        str        Like EXACTFU, but use /il, UTF-8, (string
+                             is folded, and everything in it is above
+                             255.
+ EXACTFAA_NO_TRIE str        Like EXACT, but match using /iaa rules
                              (string not UTF-8, not guaranteed to be
                              folded, not currently trie-able).

diff --git a/pod/perlrun.pod b/pod/perlrun.pod
index 2a32976c01..b32598424f 100644
--- a/pod/perlrun.pod
+++ b/pod/perlrun.pod
@@ -949,6 +949,8 @@ X<-X>
 Disables all warnings regardless of C<use warnings> or C<$^W>.
 See L<warnings>.

+Forbidden in L</C<PERL5OPT>>.
+
 =item B<-x>
 X<-x>

diff --git a/regcomp.c b/regcomp.c
index b389f9ec7f..e74f4d8fab 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -7584,6 +7584,12 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
         && memEQ(RX_PRECOMP(old_re), exp, plen)
         && !runtime_code /* with runtime code, always recompile */
     ) {
+        DEBUG_COMPILE_r({
+            SV *dsv= sv_newmortal();
+            RE_PV_QUOTED_DECL(s, RExC_utf8, dsv, exp, plen, PL_dump_re_max_len);
+            Perl_re_printf( aTHX_ "%sSkipping recompilation of unchanged REx%s %s\n",
+                          PL_colors[4], PL_colors[5], s);
+        });
         return old_re;
     }

@@ -19601,8 +19607,9 @@ S_nextchar(pTHX_ RExC_state_t *pRExC_state)
 STATIC void
 S_change_engine_size(pTHX_ RExC_state_t *pRExC_state, const Ptrdiff_t size)
 {
-    /* 'size' is the delta to add or subtract from the current memory allocated
-     * to the regex engine being constructed */
+    /* 'size' is the delta number of smallest regnode equivalents to add or
+     * subtract from the current memory allocated to the regex engine being
+     * constructed. */

     PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;

@@ -19634,8 +19641,8 @@ S_change_engine_size(pTHX_ RExC_state_t *pRExC_state, const Ptrdiff_t size)
 STATIC regnode_offset
 S_regnode_guts(pTHX_ RExC_state_t *pRExC_state, const U8 op, const STRLEN extra_size, const char* const name)
 {
-    /* Allocate a regnode for 'op', with 'extra_size' extra space.  It aligns
-     * and increments RExC_size and RExC_emit
+    /* Allocate a regnode for 'op', with 'extra_size' extra (smallest) regnode
+     * equivalents space.  It aligns and increments RExC_size and RExC_emit
      *
      * It returns the regnode's offset into the regex engine program */

diff --git a/regcomp.h b/regcomp.h
index 62f4398ed1..d9f2cbe63e 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -331,11 +331,12 @@ struct regnode_ssc {
 #define FLAGS(p)        ((p)->flags)    /* Caution: Doesn't apply to all      \
                                            regnode types.  For some, it's the \
                                            character set of the regnode */
-#define OPERAND(p)      (((struct regnode_string *)p)->string)
+#define OPERAND(p)      STRING(p)
+
 #define MASK(p)         ((char*)OPERAND(p))
 #define STR_LEN(p)      (((struct regnode_string *)p)->str_len)
 #define STRING(p)       (((struct regnode_string *)p)->string)
-#define STR_SZ(l)       ((l + sizeof(regnode) - 1) / sizeof(regnode))
+#define STR_SZ(l)       (((l) + sizeof(regnode) - 1) / sizeof(regnode))
 #define NODE_SZ_STR(p)  (STR_SZ(STR_LEN(p))+1)

 #undef NODE_ALIGN
diff --git a/regcomp.sym b/regcomp.sym
index c69e4c9452..8a2fb240f1 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -11,14 +11,29 @@
 # Note that the order in this file is important.
 #
 # Format for first section:
-# NAME \s+ TYPE, arg-description [num-args] [flags] [longjump] ; DESCRIPTION
+# NAME \s+ TYPE, arg-description [struct regnode suffix] [flags] [longjump] ; DESCRIPTION
+#   arg-description is currently unused
+#   suffix is appended to 'struct_regnode_' giving which one to use.  If empty,
+#       it means plain 'struct regnode'.  If the regnode is a string one, this
+#       should instead refer to the base regnode, without the char[1] element
+#       of the structure
 #   flag <S> means is REGNODE_SIMPLE; flag <V> means is REGNODE_VARIES; <.> is
-#   a placeholder
-#   longjump is 1 if the (first) argument holds the next offset.
-#
+#       a placeholder
+#   longjump is 1 if the (first) argument holds the next offset (instead of the
+#       usual 'next_offset' field
 #
 # run perl regen.pl after editing this file

+#                                +- suffix of which struct regnode to use e.g.,
+#                                | +- flags (S or V)           struct regnode_1
+#                         un-    | | +- longjmp (0, blank, or 1)  blank means 0
+# Name        Type        used   | | | ; comment
+# --------------------------------------------------------------------------
+# IFMATCH     BRANCHJ,    off    1 . 1 ; Succeeds if the following matches.
+# UNLESSM     BRANCHJ,    off    1 . 1 ; Fails if the following matches.
+# SUSPEND     BRANCHJ,    off    1 V 1 ; "Independent" sub-RE.
+# IFTHEN      BRANCHJ,    off    1 V 1 ; Switch, should be preceded by switcher.
+# GROUPP      GROUPP,     num    1     ; Whether the group matched.


 #* Exit points
@@ -103,22 +118,22 @@ BRANCH BRANCH, node 0 V ; Match this alternative, or the next...

 EXACT            EXACT, str ; Match this string (flags field is the length).
 EXACTL           EXACT, str ; Like EXACT, but /l is in effect (used so locale-related warnings can be checked for).
-EXACTF           EXACT, str ; Match this string using /id rules (w/len); (string not UTF-8, not guaranteed to be folded).
-EXACTFL          EXACT, str ; Match this string using /il rules (w/len); (string not guaranteed to be folded).
-EXACTFU          EXACT, str ; Match this string using /iu rules (w/len); (string folded iff in UTF-8; non-UTF8 folded length <= unfolded).
-EXACTFAA         EXACT, str ; Match this string using /iaa rules (w/len) (string folded iff in UTF-8; non-UTF8 folded length <= unfolded).
+EXACTF           EXACT, str ; Like EXACT, but match using /id rules; (string not UTF-8, not guaranteed to be folded).
+EXACTFL          EXACT, str ; Like EXACT, but match using /il rules; (string not likely to be folded).
+EXACTFU          EXACT, str ; Like EXACT, but match using /iu rules; (string folded).
+EXACTFAA         EXACT, str ; Like EXACT, but match using /iaa rules; (string folded iff pattern is UTF8; folded length <= unfolded).

 # End of important relative ordering.

-EXACTFUP         EXACT, str ; Match this string using /iu rules (w/len); (string not UTF-8, not guaranteed to be folded; and its Problematic).
+EXACTFUP         EXACT, str ; Like EXACT, but match using /iu rules; (string not UTF-8, not guaranteed to be folded; and it is Problematic).
 # In order for a non-UTF-8 EXACTFAA to think the pattern is pre-folded when
 # matching a UTF-8 target string, there would have to be something like an
 # EXACTFAA_MICRO which would not be considered pre-folded for UTF-8 targets,
 # since the fold of the MICRO SIGN would not be done, and would be
 # representable in the UTF-8 target string.
-EXACTFLU8        EXACT, str ; Like EXACTFU, but use /il, UTF-8, folded, and everything in it is above 255.
-EXACTFAA_NO_TRIE EXACT, str ; Match this string using /iaa rules (w/len) (string not UTF-8, not guaranteed to be folded, not currently trie-able).
+EXACTFLU8        EXACT, str ; Like EXACTFU, but use /il, UTF-8, (string is folded, and everything in it is above 255.
+EXACTFAA_NO_TRIE EXACT, str ; Like EXACT, but match using /iaa rules (string not UTF-8, not guaranteed to be folded, not currently trie-able).

 EXACT_ONLY8      EXACT, str ; Like EXACT, but only UTF-8 encoded targets can match
diff --git a/regen/regcomp.pl b/regen/regcomp.pl
index cb9861318d..2eac179684 100644
--- a/regen/regcomp.pl
+++ b/regen/regcomp.pl
@@ -49,14 +49,17 @@ use strict;
 # name          Both    Name of op/state
 # id            Both    integer value for this opcode/state
 # optype        Both    Either 'op' or 'state'
-# line_num      Both    line_num number of the input file for this item.
+# line_num      Both    line_num number of the input file for this item.
 # type          Op      Type of node (aka regkind)
-# code          Op      what code is associated with this node (???)
-# args          Op      what type of args the node has (which regnode struct)
-# flags         Op      (???)
+# code          Op      Apparently not used
+# suffix        Op      which regnode struct this uses, so if this is '1', it
+#                       uses 'struct regnode_1'
+# flags         Op      S for simple; V for varies
 # longj         Op      Boolean as to if this node is a longjump
-# comment       Both    Comment about node, if any
+# comment       Both    Comment about node, if any.  Placed in perlredebguts
+#                       as its description
 # pod_comment   Both    Special comments for pod output (preceding lines in def)
+#                       Such lines begin with '#*'

 # Global State
 my @all;    # all opcodes/state
@@ -97,23 +100,15 @@ sub register_node {
 }

 # Parse and add an opcode definition to the global state.
-# An opcode definition looks like this:
+# What an opcode definition looks like is given in regcomp.sym.
 #
-#                                +- args
-#                                | +- flags
-#                                | | +- longjmp
-# Name        Type        code   | | | ; comment
-# --------------------------------------------------------------------------
-# IFMATCH     BRANCHJ,    off    1 . 2 ; Succeeds if the following matches.
-# UNLESSM     BRANCHJ,    off    1 . 2 ; Fails if the following matches.
-# SUSPEND     BRANCHJ,    off    1 V 1 ; "Independent" sub-RE.
-# IFTHEN      BRANCHJ,    off    1 V 1 ; Switch, should be preceded by switcher.
-# GROUPP      GROUPP,     num    1     ; Whether the group matched.
-#
-# Not every opcode definition has all of these. We should maybe make this
-# nicer/easier to read in the future. Also note that the above is tab
+# Not every opcode definition has all of the components. We should maybe make
+# this nicer/easier to read in the future. Also note that the above is tab
 # sensitive.

+# Special comments for an entry precede it, and begin with '#*' and are placed
+# in the generated pod file just before the entry.
+
 sub parse_opcode_def {
     my ( $text, $line_num, $pod_comment )= @_;
     my $node= {
@@ -129,10 +124,10 @@ sub parse_opcode_def {
       or die "Failed to match $_";

     # the content of the "desc" field from the first step is extracted here:
-    @{$node}{qw(type code args flags longj)}= split /[,\s]\s*/, $node->{desc};
+    @{$node}{qw(type code suffix flags longj)}= split /[,\s]\s*/, $node->{desc};

     defined $node->{$_} or $node->{$_} = ""
-        for qw(type code args flags longj);
+        for qw(type code suffix flags longj);

     register_node($node);    # has to be before the type_alias code below

@@ -368,7 +363,7 @@ EOP

     foreach my $node (@ops) {
         my $size= 0;
-        $size= "EXTRA_SIZE(struct regnode_$node->{args})" if $node->{args};
+        $size= "EXTRA_SIZE(struct regnode_$node->{suffix})" if $node->{suffix};

         printf $out "\t%*s\t/* %*s */\n", -37, "$size,", -$rwidth, $node->{name};
     }
@@ -635,11 +630,11 @@ EOD

     print <<'END_OF_DESCR';

- # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
+ # TYPE arg-description [regnode-struct-suffix] [longjump-len] DESCRIPTION
 END_OF_DESCR
     for my $n (@ops) {
         $node= $n;
-        $code= "$node->{code} " . ( $node->{args} || "" );
+        $code= "$node->{code} " . ( $node->{suffix} || "" );
         $code .= " $node->{longj}" if $node->{longj};
         if ( $node->{pod_comment} ||= "" ) {

diff --git a/regnodes.h b/regnodes.h
index 3b93b85aa2..a1929b823f 100644
--- a/regnodes.h
+++ b/regnodes.h
@@ -50,13 +50,13 @@
 #define BRANCH            36 /* 0x24 Match this alternative, or the next... */
 #define EXACT             37 /* 0x25 Match this string (flags field is the length). */
 #define EXACTL            38 /* 0x26 Like EXACT, but /l is in effect (used so locale-related warnings can be checked for). */
-#define EXACTF            39 /* 0x27 Match this string using /id rules (w/len); (string not UTF-8, not guaranteed to be folded). */
-#define EXACTFL           40 /* 0x28 Match this string using /il rules (w/len); (string not guaranteed to be folded). */
-#define EXACTFU           41 /* 0x29 Match this string using /iu rules (w/len); (string folded iff in UTF-8; non-UTF8 folded length <= unfolded). */
-#define EXACTFAA          42 /* 0x2a Match this string using /iaa rules (w/len) (string folded iff in UTF-8; non-UTF8 folded length <= unfolded). */
-#define EXACTFUP          43 /* 0x2b Match this string using /iu rules (w/len); (string not UTF-8, not guaranteed to be folded; and its Problematic). */
-#define EXACTFLU8         44 /* 0x2c Like EXACTFU, but use /il, UTF-8, folded, and everything in it is above 255. */
-#define EXACTFAA_NO_TRIE  45 /* 0x2d Match this string using /iaa rules (w/len) (string not UTF-8, not guaranteed to be folded, not currently trie-able). */
+#define EXACTF            39 /* 0x27 Like EXACT, but match using /id rules; (string not UTF-8, not guaranteed to be folded). */
+#define EXACTFL           40 /* 0x28 Like EXACT, but match using /il rules; (string not likely to be folded). */
+#define EXACTFU           41 /* 0x29 Like EXACT, but match using /iu rules; (string folded). */
+#define EXACTFAA          42 /* 0x2a Like EXACT, but match using /iaa rules; (string folded iff pattern is UTF8; folded length <= unfolded). */
+#define EXACTFUP          43 /* 0x2b Like EXACT, but match using /iu rules; (string not UTF-8, not guaranteed to be folded; and it is Problematic). */
+#define EXACTFLU8         44 /* 0x2c Like EXACTFU, but use /il, UTF-8, (string is folded, and everything in it is above 255. */
+#define EXACTFAA_NO_TRIE  45 /* 0x2d Like EXACT, but match using /iaa rules (string not UTF-8, not guaranteed to be folded, not currently trie-able). */
 #define EXACT_ONLY8       46 /* 0x2e Like EXACT, but only UTF-8 encoded targets can match */
 #define EXACTFU_ONLY8     47 /* 0x2f Like EXACTFU, but only UTF-8 encoded targets can match */
 #define EXACTFU_S_EDGE    48 /* 0x30 /di rules, but nothing in it precludes /ui, except begins and/or ends with [Ss]; (string not UTF-8; compile-time only). */
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 36ac9e4797..b2dd8df6d8 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -367,7 +367,7 @@ install ? Should you be using F<...> or maybe L<...> instead of 1
 pod/perl.pod Verbatim line length including indents exceeds 79 by 8
 pod/perlandroid.pod Verbatim line length including indents exceeds 79 by 3
 pod/perlbook.pod Verbatim line length including indents exceeds 79 by 1
-pod/perldebguts.pod Verbatim line length including indents exceeds 79 by 27
+pod/perldebguts.pod Verbatim line length including indents exceeds 79 by -1
 pod/perldebtut.pod Verbatim line length including indents exceeds 79 by 3
 pod/perldtrace.pod Verbatim line length including indents exceeds 79 by 7
 pod/perlgit.pod ? Should you be using F<...> or maybe L<...> instead of 1

--
Perl5 Master Repository
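
Note on the STR_SZ change: the commit message only says that parenthesizing the parameter "is always a good idea". A minimal sketch of why follows. Everything in it is hypothetical and for illustration only: demo_regnode is a stand-in assumed to be 4 bytes (it is not perl's real regnode), and the DEMO_* macros merely copy the old and new shapes of STR_SZ from the regcomp.h hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perl's 'regnode'; assumed to be 4 bytes. */
typedef struct { uint32_t word; } demo_regnode;

/* Shape of STR_SZ before the change: 'l' is pasted in unparenthesized. */
#define DEMO_STR_SZ_OLD(l) ((l + sizeof(demo_regnode) - 1) / sizeof(demo_regnode))

/* Shape of STR_SZ after the change: the parameter is parenthesized. */
#define DEMO_STR_SZ_NEW(l) (((l) + sizeof(demo_regnode) - 1) / sizeof(demo_regnode))

/* Checks the size assumption that the numbers in this sketch rely on. */
void demo_check_assumption(void)
{
    assert(sizeof(demo_regnode) == 4);
}

/* Old shape: the argument expands to 'wide ? 9 : 5 + 4 - 1', which parses
 * as 'wide ? 9 : (5 + 4 - 1)'.  The round-up term is added only on the
 * false branch, so a 9-byte string is sized as 9 / 4 = 2 nodes. */
size_t demo_nodes_old(int wide)
{
    return DEMO_STR_SZ_OLD(wide ? 9 : 5);
}

/* New shape: the whole argument is evaluated first and then rounded up,
 * so a 9-byte string is sized as (9 + 3) / 4 = 3 nodes. */
size_t demo_nodes_new(int wide)
{
    return DEMO_STR_SZ_NEW(wide ? 9 : 5);
}
```

With a plain variable as the argument the two shapes agree, which is presumably why the unparenthesized form went unnoticed; they differ only when the argument is an expression whose operators bind more loosely than '+'.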
