[perl.git] branch maint-5.10, updated. GitLive-maint-5.10-1473-gca9fba4

David Mitchell Fri, 03 Jul 2009 06:27:12 -0700

In perl.git, the branch maint-5.10 has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/ca9fba4cf982ceea79271493437c3d8928c9d50d?hp=f3dba27080443db3488db835e838dda26b9de392>


- Log -----------------------------------------------------------------
commit ca9fba4cf982ceea79271493437c3d8928c9d50d
Author: [email protected] <[email protected]>
Date:   Thu Jul 2 11:36:08 2009 +0100

    Some bugs in Perl regexp (core Perl issues)
    
    "Hugo van der Sanden via RT" <[email protected]> wrote:
    :This is caused by a failure of the start_class optimization in the case
    :of lookahead, as per the attached comment.
    :
    :In more detail: at the point study_chunk() attempts to deal with the
    :start_class discovered for the lookahead chunk, we have
    :SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
    :ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
    [...]
    :In other words, we need to stack an alternation of ANDs and ORs to cope
    :with this situation, and we don't have a mechanism to do that except to
    :recurse into study_chunk() some more.
    :
    :A simpler short-term fix is instead to throw up our hands in this
    :situation, and just nullify start_class. I'm not sure exactly how to do
    :that, but it seems the more likely to be achievable for 5.10.1.
    
    This patch implements the simple fix, and passes all tests including
    Abigail's test cases for the bug.
    
    Yves: note that I've preserved the 'was' code in this chunk, introduced
    by you in the patch [1], discussed in the thread [2]. As far as I can
    see the 3 lines propagating ANYOF_EOS via 'was' (and the copy of those
    3 lines a little later) are simply doing the wrong thing - they seem
    to be saying "when we combine two start classes using SCF_DO_STCLASS_AND,
    claim that end-of-string is valid if the first class says it would be
    even though the second says it wouldn't be". Removing those lines doesn't
    cause any test failures - can you remember why you introduced those lines,
    and maybe add a test case that fails without them?
    
    Hugo
    
    [1] http://perl5.git.perl.org/perl.git/commit/b515a41db88584b4fd1c30cf890c92d3f9697760
    [2] http://groups.google.co.uk/group/perl.perl5.porters/browse_thread/thread/436187077ef96918/f11c3268394abf89
    
    Message-Id: <[email protected]>
    rt.perl.org #56690
    
    (cherry picked from commit 906cdd2b284d712169319a6934ba68b578748c8f)

M       regcomp.c
M       t/op/re_tests

commit 216af8913be9a5749f533260a046236dab01ca88
Author: Jerry D. Hedden <[email protected]>
Date:   Mon Jun 29 15:13:18 2009 -0400

    Unused 'cv'
    
    (cherry picked from commit 4ed3fda49b8590b1f2536acfe87ecdec36a6d516)

M       universal.c

commit 1a9552e3eeee512edc9c1b2d018ed2974c5032e6
Author: H.Merijn Brand <[email protected]>
Date:   Thu Jul 2 12:27:54 2009 +0200

    Added docs from Wolfgang Laun to perlpacktut about Intel HEX
    
    (cherry picked from commit aa51dd4123784e6e747b83403a96885ffb248802)

M       pod/perlpacktut.pod
-----------------------------------------------------------------------

Summary of changes:
 pod/perlpacktut.pod |   45 +++++++++++++++++++++++++++++++++++++++++++++
 regcomp.c           |   21 ++++++++++++++++-----
 t/op/re_tests       |    4 ++--
 universal.c         |    1 +
 4 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/pod/perlpacktut.pod b/pod/perlpacktut.pod
index 73b2f43..7d2126a 100644
--- a/pod/perlpacktut.pod
+++ b/pod/perlpacktut.pod
@@ -853,6 +853,51 @@ template for C<pack> and C<unpack> because C<pack> can't determine
 a repeat count for a C<()>-group.
 
 
+=head2 Intel HEX
+
+Intel HEX is a file format for representing binary data, mostly for
+programming various chips, as a text file. (See
+L<http://en.wikipedia.org/wiki/.hex> for a detailed description, and
+L<http://en.wikipedia.org/wiki/SREC_(file_format)> for the Motorola
+S-record format, which can be unravelled using the same technique.)
+Each line begins with a colon (':') and is followed by a sequence of
+hexadecimal characters, specifying a byte count I<n> (8 bit),
+an address (16 bit, big endian), a record type (8 bit), I<n> data bytes
+and a checksum (8 bit) computed as the least significant byte of the two's
+complement sum of the preceding bytes. Example: C<:0300300002337A1E>.
+
+The first step of processing such a line is the conversion, to binary,
+of the hexadecimal data, to obtain the four fields, while checking the
+checksum. No surprise here: we'll start with a simple C<pack> call to 
+convert everything to binary:
+
+   my $binrec = pack( 'H*', substr( $hexrec, 1 ) );
+
+The resulting byte sequence is most convenient for checking the checksum.
+Don't slow your program down with a for loop adding the C<ord> values
+of this string's bytes - the C<unpack> code C<%> is the thing to use
+for computing the 8-bit sum of all bytes, which must be equal to zero:
+
+   die unless unpack( "%8C*", $binrec ) == 0;
+
+Finally, let's get those four fields. By now, you shouldn't have any
+problems with the first three fields - but how can we use the byte count
+of the data in the first field as a length for the data field? Here
+the codes C<x> and C<X> come to the rescue, as they permit jumping
+back and forth in the string to unpack.
+
+   my( $addr, $type, $data ) = unpack( "x n C X4 C x3 /a", $binrec );
+
+Code C<x> skips a byte, since we don't need the count yet. Code C<n> takes
+care of the 16-bit big-endian integer address, and C<C> unpacks the
+record type. Being at offset 4, where the data begins, we need the count.
+C<X4> brings us back to square one, which is the byte at offset 0.
+Now we pick up the count, and zoom forth to offset 4, where we are
+now fully furnished to extract the exact number of data bytes, leaving
+the trailing checksum byte alone.
+
+
+
 =head1 Packing and Unpacking C Structures
 
 In previous sections we have seen how to pack numbers and character
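
The three Intel HEX steps described in the perlpacktut addition above (hex-to-binary conversion, checksum test, field extraction) can be sketched in C for readers who want the byte offsets spelled out. This is a minimal sketch, not part of the commit: the names parse_ihex_line, hex_nibble and struct ihex_record are hypothetical, and it assumes the line has already had its trailing newline removed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for one decoded record. */
struct ihex_record {
    unsigned count;          /* byte count n */
    unsigned addr;           /* 16-bit address, big-endian in the record */
    unsigned type;           /* record type */
    unsigned char data[255]; /* the n data bytes */
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode one line such as ":0300300002337A1E" (no trailing newline).
 * Returns 0 on success, -1 if the line is malformed or the checksum fails. */
int parse_ihex_line(const char *line, struct ihex_record *rec) {
    unsigned char bin[260];  /* count + addr(2) + type + 255 data + checksum */
    size_t hexlen, nbytes, i;
    unsigned sum = 0;

    if (line[0] != ':')
        return -1;
    line++;                  /* skip the colon, like substr($hexrec, 1) */
    hexlen = strlen(line);
    if (hexlen % 2 != 0 || hexlen / 2 > sizeof bin)
        return -1;
    nbytes = hexlen / 2;

    /* Step 1: hex digits to bytes, high nibble first, like pack('H*', ...). */
    for (i = 0; i < nbytes; i++) {
        int hi = hex_nibble(line[2 * i]);
        int lo = hex_nibble(line[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        bin[i] = (unsigned char)((hi << 4) | lo);
    }

    /* Step 2: the 8-bit sum of all bytes, checksum included, must be 0,
     * like unpack('%8C*', ...) == 0. */
    for (i = 0; i < nbytes; i++)
        sum += bin[i];
    if ((sum & 0xFF) != 0)
        return -1;

    /* Step 3: fields by offset: count at 0, address at 1-2, type at 3,
     * data from 4; the final byte is the checksum and is left alone. */
    if (nbytes < 5 || nbytes != (size_t)bin[0] + 5)
        return -1;
    rec->count = bin[0];
    rec->addr = ((unsigned)bin[1] << 8) | bin[2];
    rec->type = bin[3];
    memcpy(rec->data, bin + 4, rec->count);
    return 0;
}
```

Called on the example record :0300300002337A1E, it should yield count 3, address 0x0030, type 0 and data bytes 02 33 7A; changing the last byte to 1F makes the checksum test fail. Step 3 reads the count first and uses fixed offsets, which is the same walk the unpack template performs with x, X4 and x3.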
diff --git a/regcomp.c b/regcomp.c
index ccbe982..49e69b2 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3725,11 +3725,22 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode 
**scanp,
                     data->whilem_c = data_fake.whilem_c;
                 }
                 if (f & SCF_DO_STCLASS_AND) {
-                    const int was = (data->start_class->flags & ANYOF_EOS);
-
-                    cl_and(data->start_class, &intrnl);
-                    if (was)
-                        data->start_class->flags |= ANYOF_EOS;
+                   if (flags & SCF_DO_STCLASS_OR) {
+                       /* OR before, AND after: ideally we would recurse with
+                        * data_fake to get the AND applied by study of the
+                        * remainder of the pattern, and then derecurse;
+                        * *** HACK *** for now just treat as "no information".
+                        * See [perl #56690].
+                        */
+                       cl_init(pRExC_state, data->start_class);
+                   }  else {
+                       /* AND before and after: combine and continue */
+                       const int was = (data->start_class->flags & ANYOF_EOS);
+
+                       cl_and(data->start_class, &intrnl);
+                       if (was)
+                           data->start_class->flags |= ANYOF_EOS;
+                   }
                 }
            }
 #if PERL_ENABLE_POSITIVE_ASSERTION_STUDY
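
The set arithmetic behind this hunk can be illustrated with a toy model in C. It is a sketch under stated assumptions, not regcomp.c code: struct toy_class, toy_and, combine_old and combine_patched are hypothetical stand-ins for the start class, cl_and and the two versions of the branch, and EOS handling is simplified. For ^a?(?=b)b the pending start class is [a] plus EOS and the lookahead contributes [b]; intersecting them, as the old branch did, leaves no byte that may start a match, while the patched branch falls back to "anything may start here".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy start class: one bit per byte value plus an EOS flag.
 * Simplified model only; not the real regcomp.c structure. */
struct toy_class {
    unsigned char bits[32];  /* 256 bits, one per byte value */
    bool eos;
};

void toy_clear(struct toy_class *c) { memset(c, 0, sizeof *c); }

void toy_add(struct toy_class *c, unsigned char ch) {
    c->bits[ch >> 3] |= (unsigned char)(1u << (ch & 7));
}

bool toy_has(const struct toy_class *c, unsigned char ch) {
    return (c->bits[ch >> 3] >> (ch & 7)) & 1u;
}

/* "No information": every byte may start a match (the role cl_init plays). */
void toy_init_anything(struct toy_class *c) {
    memset(c->bits, 0xFF, sizeof c->bits);
    c->eos = true;
}

/* Intersection (the role cl_and plays in this model). */
void toy_and(struct toy_class *c, const struct toy_class *other) {
    for (size_t i = 0; i < sizeof c->bits; i++)
        c->bits[i] &= other->bits[i];
    c->eos = c->eos && other->eos;
}

/* Pre-patch behaviour: always intersect, then restore EOS if it was set. */
void combine_old(struct toy_class *start, const struct toy_class *look) {
    bool was = start->eos;
    toy_and(start, look);
    if (was)
        start->eos = true;
}

/* Patched behaviour: if an OR is still pending, give up instead of ANDing. */
void combine_patched(struct toy_class *start, const struct toy_class *look,
                     bool or_pending) {
    if (or_pending)
        toy_init_anything(start);
    else
        combine_old(start, look);
}
```

Resetting to "anything" only costs speed: a start class that is too wide makes the matcher try more start positions, whereas one that is too narrow, as the old branch produced here, skips positions where the pattern would in fact match.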
diff --git a/t/op/re_tests b/t/op/re_tests
index dddce07..4b0e120 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -1351,8 +1351,8 @@ foo(\h)bar        foo\tbar        y       $1      \t
 .*?(?:(\w)|(\w))x      abx     y       $1-$2   b-
 
 0{50}  000000000000000000000000000000000000000000000000000     y       -       
-
-^a?(?=b)b      ab      B       $&      ab      # Bug #56690
-^a*(?=b)b      ab      B       $&      ab      # Bug #56690
+^a?(?=b)b      ab      y       $&      ab      # Bug #56690
+^a*(?=b)b      ab      y       $&      ab      # Bug #56690
 />\d+$ \n/ix   >10\n   y       $&      >10
 />\d+$ \n/ix   >1\n    y       $&      >1
 /\d+$ \n/ix    >10\n   y       $&      10
diff --git a/universal.c b/universal.c
index 7788c61..c4e94e9 100644
--- a/universal.c
+++ b/universal.c
@@ -732,6 +732,7 @@ XS(XS_version_qv)
 {
     dVAR;
     dXSARGS;
+    PERL_UNUSED_ARG(cv);
     SP -= items;
     {
        SV * ver = ST(0);
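
The one-line universal.c change relies on a common C idiom: functions declared with XS() receive a cv parameter whether or not they use it, and evaluating an otherwise unused parameter in a void context tells the compiler it is intentionally ignored, silencing warnings such as gcc's -Wunused-parameter. A minimal sketch of the idiom, assuming a hypothetical macro MY_UNUSED_ARG as a stand-in for PERL_UNUSED_ARG, whose exact definition in perl.h is not shown in this diff:

```c
/* Hypothetical stand-in for PERL_UNUSED_ARG: casting the parameter to
 * void "uses" it without generating any code. */
#define MY_UNUSED_ARG(x) ((void)(x))

/* A callback signature fixed by the caller: every handler receives ctx,
 * even handlers that do not need it (as every XSUB receives cv). */
typedef int (*handler_fn)(void *ctx, int value);

int double_it(void *ctx, int value) {
    MY_UNUSED_ARG(ctx);      /* ctx is part of the signature but unused */
    return value * 2;
}
```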

--
Perl5 Master Repository