In perl.git, the branch maint-5.10 has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/ca9fba4cf982ceea79271493437c3d8928c9d50d?hp=f3dba27080443db3488db835e838dda26b9de392>

- Log -----------------------------------------------------------------
commit ca9fba4cf982ceea79271493437c3d8928c9d50d
Author: [email protected] <[email protected]>
Date:   Thu Jul 2 11:36:08 2009 +0100

    Some bugs in Perl regexp (core Perl issues)

    "Hugo van der Sanden via RT" <[email protected]> wrote:
    :This is caused by a failure of the start_class optimization in the case
    :of lookahead, as per the attached comment.
    :
    :In more detail: at the point study_chunk() attempts to deal with the
    :start_class discovered for the lookahead chunk, we have
    :SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
    :ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
    [...]
    :In other words, we need to stack an alternation of ANDs and ORs to cope
    :with this situation, and we don't have a mechanism to do that except to
    :recurse into study_chunk() some more.
    :
    :A simpler short-term fix is instead to throw up our hands in this
    :situation, and just nullify start_class. I'm not sure exactly how to do
    :that, but it seems the more likely to be achievable for 5.10.1.

    This patch implements the simple fix, and passes all tests including
    Abigail's test cases for the bug.

    Yves: note that I've preserved the 'was' code in this chunk, introduced
    by you in the patch [1], discussed in the thread [2]. As far as I can
    see the 3 lines propagating ANYOF_EOS via 'was' (and the copy of those
    3 lines a little later) are simply doing the wrong thing - they seem
    to be saying "when we combine two start classes using
    SCF_DO_STCLASS_AND, claim that end-of-string is valid if the first
    class says it would be even though the second says it wouldn't be".
    Removing those lines doesn't cause any test failures - can you remember
    why you introduced those lines, and maybe add a test case that fails
    without them?

    Hugo

    [1] http://perl5.git.perl.org/perl.git/commit/b515a41db88584b4fd1c30cf890c92d3f9697760
    [2] http://groups.google.co.uk/group/perl.perl5.porters/browse_thread/thread/436187077ef96918/f11c3268394abf89
    Message-Id: <[email protected]>

    rt.perl.org #56690

    (cherry picked from commit 906cdd2b284d712169319a6934ba68b578748c8f)

M       regcomp.c
M       t/op/re_tests

commit 216af8913be9a5749f533260a046236dab01ca88
Author: Jerry D. Hedden <[email protected]>
Date:   Mon Jun 29 15:13:18 2009 -0400

    Unused 'cv'

    (cherry picked from commit 4ed3fda49b8590b1f2536acfe87ecdec36a6d516)

M       universal.c

commit 1a9552e3eeee512edc9c1b2d018ed2974c5032e6
Author: H.Merijn Brand <[email protected]>
Date:   Thu Jul 2 12:27:54 2009 +0200

    Added docs from Wolfgang Laun to perlpacktut about Intel HEX

    (cherry picked from commit aa51dd4123784e6e747b83403a96885ffb248802)

M       pod/perlpacktut.pod

-----------------------------------------------------------------------

Summary of changes:
 pod/perlpacktut.pod |   45 +++++++++++++++++++++++++++++++++++++++++++++
 regcomp.c           |   21 ++++++++++++++++-----
 t/op/re_tests       |    4 ++--
 universal.c         |    1 +
 4 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/pod/perlpacktut.pod b/pod/perlpacktut.pod
index 73b2f43..7d2126a 100644
--- a/pod/perlpacktut.pod
+++ b/pod/perlpacktut.pod
@@ -853,6 +853,51 @@ template for C<pack> and C<unpack> because C<pack> can't determine
 a repeat count for a C<()>-group.

+=head2 Intel HEX
+
+Intel HEX is a file format for representing binary data, mostly for
+programming various chips, as a text file. (See
+L<http://en.wikipedia.org/wiki/.hex> for a detailed description, and
+L<http://en.wikipedia.org/wiki/SREC_(file_format)> for the Motorola
+S-record format, which can be unravelled using the same technique.)
+Each line begins with a colon (':') and is followed by a sequence of
+hexadecimal characters, specifying a byte count I<n> (8 bit),
+an address (16 bit, big endian), a record type (8 bit), I<n> data bytes
+and a checksum (8 bit) computed as the least significant byte of the two's
+complement sum of the preceding bytes. Example: C<:0300300002337A1E>.
+
+The first step of processing such a line is the conversion, to binary,
+of the hexadecimal data, to obtain the four fields, while checking the
+checksum. No surprise here: we'll start with a simple C<pack> call to
+convert everything to binary:
+
+ my $binrec = pack( 'H*', substr( $hexrec, 1 ) );
+
+The resulting byte sequence is most convenient for checking the checksum.
+Don't slow your program down with a for loop adding the C<ord> values
+of this string's bytes - the C<unpack> code C<%> is the thing to use
+for computing the 8-bit sum of all bytes, which must be equal to zero:
+
+ die unless unpack( "%8C*", $binrec ) == 0;
+
+Finally, let's get those four fields. By now, you shouldn't have any
+problems with the first three fields - but how can we use the byte count
+of the data in the first field as a length for the data field? Here
+the codes C<x> and C<X> come to the rescue, as they permit jumping
+back and forth in the string to unpack.
+
+ my( $addr, $type, $data ) = unpack( "x n C X4 C x3 /a", $bin );
+
+Code C<x> skips a byte, since we don't need the count yet. Code C<n> takes
+care of the 16-bit big-endian integer address, and C<C> unpacks the
+record type. Being at offset 4, where the data begins, we need the count.
+C<X4> brings us back to square one, which is the byte at offset 0.
+Now we pick up the count, and zoom forth to offset 4, where we are
+now fully furnished to extract the exact number of data bytes, leaving
+the trailing checksum byte alone.
+
+
+
 =head1 Packing and Unpacking C Structures

 In previous sections we have seen how to pack numbers and character
diff --git a/regcomp.c b/regcomp.c
index ccbe982..49e69b2 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3725,11 +3725,22 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
                     data->whilem_c = data_fake.whilem_c;
                 }
                 if (f & SCF_DO_STCLASS_AND) {
-                    const int was = (data->start_class->flags & ANYOF_EOS);
-
-                    cl_and(data->start_class, &intrnl);
-                    if (was)
-                        data->start_class->flags |= ANYOF_EOS;
+                    if (flags & SCF_DO_STCLASS_OR) {
+                        /* OR before, AND after: ideally we would recurse with
+                         * data_fake to get the AND applied by study of the
+                         * remainder of the pattern, and then derecurse;
+                         * *** HACK *** for now just treat as "no information".
+                         * See [perl #56690].
+                         */
+                        cl_init(pRExC_state, data->start_class);
+                    } else {
+                        /* AND before and after: combine and continue */
+                        const int was = (data->start_class->flags & ANYOF_EOS);
+
+                        cl_and(data->start_class, &intrnl);
+                        if (was)
+                            data->start_class->flags |= ANYOF_EOS;
+                    }
                 }
             }
 #if PERL_ENABLE_POSITIVE_ASSERTION_STUDY
diff --git a/t/op/re_tests b/t/op/re_tests
index dddce07..4b0e120 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -1351,8 +1351,8 @@ foo(\h)bar foo\tbar y $1 \t
 .*?(?:(\w)|(\w))x abx y $1-$2 b-
 0{50} 000000000000000000000000000000000000000000000000000 y - -
-^a?(?=b)b ab B $& ab # Bug #56690
-^a*(?=b)b ab B $& ab # Bug #56690
+^a?(?=b)b ab y $& ab # Bug #56690
+^a*(?=b)b ab y $& ab # Bug #56690
 />\d+$ \n/ix >10\n y $& >10
 />\d+$ \n/ix >1\n y $& >1
 /\d+$ \n/ix >10\n y $& 10
diff --git a/universal.c b/universal.c
index 7788c61..c4e94e9 100644
--- a/universal.c
+++ b/universal.c
@@ -732,6 +732,7 @@ XS(XS_version_qv)
 {
     dVAR;
     dXSARGS;
+    PERL_UNUSED_ARG(cv);
     SP -= items;
     {
         SV * ver = ST(0);

--
Perl5 Master Repository
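
The three steps that the Intel HEX section added to perlpacktut walks
through (hex text to bytes, 8-bit checksum test, field extraction driven
by the count byte) can also be sketched outside Perl. The C below is a
minimal illustration only: ihex_record, hex_val and parse_ihex_record are
hypothetical names invented for this sketch and exist nowhere in perl.git,
and the check that the count byte agrees with the record length is an
extra assumption that the tutorial's unpack call does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type for this sketch; not part of perl.git. */
typedef struct {
    uint8_t  count;      /* number of data bytes (first field) */
    uint16_t addr;       /* 16-bit big-endian address */
    uint8_t  type;       /* record type */
    uint8_t  data[255];  /* payload; only the first count bytes are set */
} ihex_record;

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse one ":..." record. Returns 0 on success, -1 on malformed
 * input, a failed checksum, or a count that does not fit the record. */
static int parse_ihex_record(const char *line, ihex_record *out)
{
    uint8_t bin[1 + 2 + 1 + 255 + 1];  /* count, addr, type, data, checksum */
    size_t n = 0;
    unsigned sum = 0;

    assert(line != NULL && out != NULL);
    if (line[0] != ':')
        return -1;

    /* Step 1: hex text to bytes (the pack 'H*' step in the tutorial). */
    for (const char *p = line + 1; *p != '\0' && *p != '\r' && *p != '\n'; p += 2) {
        int hi = hex_val(p[0]);
        int lo = (hi < 0) ? -1 : hex_val(p[1]);
        if (hi < 0 || lo < 0 || n == sizeof bin)
            return -1;
        bin[n++] = (uint8_t)(hi * 16 + lo);
    }

    /* Step 2: all bytes, checksum included, must sum to 0 modulo 256
     * (the unpack '%8C*' step). */
    for (size_t i = 0; i < n; i++)
        sum += bin[i];
    if (n < 5 || (sum & 0xFF) != 0)
        return -1;

    /* Step 3: split into fields; the count byte gives the data length.
     * Assumption of this sketch: reject records whose count disagrees
     * with the number of bytes actually present. */
    if ((size_t)bin[0] + 5 != n)
        return -1;
    out->count = bin[0];
    out->addr  = (uint16_t)((bin[1] << 8) | bin[2]);
    out->type  = bin[3];
    memcpy(out->data, bin + 4, out->count);
    return 0;
}
```

Called on the tutorial's example record :0300300002337A1E, the sketch
fills in a count of 3, address 0x0030, record type 0 and the data bytes
02 33 7A; the trailing 1E is consumed only by the checksum test.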
