[perl.git] branch smoke-me/khw-regex, created. v5.17.2-158-gff4eb3e

Karl Williamson Wed, 01 Aug 2012 17:36:01 -0700

In perl.git, the branch smoke-me/khw-regex has been created

<http://perl5.git.perl.org/perl.git/commitdiff/ff4eb3ec0b27d3e28b7143b80c1952fa0abbdb6c?hp=0000000000000000000000000000000000000000>


        at  ff4eb3ec0b27d3e28b7143b80c1952fa0abbdb6c (commit)

- Log -----------------------------------------------------------------
commit ff4eb3ec0b27d3e28b7143b80c1952fa0abbdb6c
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 1 18:28:59 2012 -0600

    regcomp.c: Remove unnecessary variable
    
    This variable was used because another was declared 'register'.  But
    that declaration was removed, making the temporary variable redundant.

M       regcomp.c

commit 258a252d88eb165a260bad61e908102294248c67
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 1 17:04:13 2012 -0600

    regcomp.c: inline trivial static function

M       embed.fnc
M       proto.h
M       regcomp.c

commit 926425d4dcedb8d65a283799902030b7e2da33ba
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 1 15:12:23 2012 -0600

    regcomp.c: Fix \N{} multi-char fold buffer boundary bug
    
    An earlier commit in this topic branch fixed the bug (for non-\N{}
    cases) in which a multi-character fold could try to span two EXACTFish
    nodes that are split because the first one would otherwise contain
    too long a string.
    
    This commit extends that fix to include characters entered via \N{...}.
    It does this by causing \N handling to be split, so that if the \N
    resolves to a single code point, it goes through the normal processing,
    so that it no longer bypasses the code that was added in the earlier
    commit.

M       regcomp.c
M       t/re/pat_advanced.t
M       t/re/re_tests

commit ebec2cd2b98aa519ebde1cdda31d186906f2b257
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 1 14:49:39 2012 -0600

    regcomp.c: Revise API for static function
    
    This is to allow future changes.   The function now returns success or
    failure, and the created regnode (if any) is set via a parameter
    pointer.
    
    I removed the 'register' declaration to get this to work, because
    such declarations are considered bad form these days, e.g.,
    
    http://stackoverflow.com/questions/314994/whats-a-good-example-of-register-variable-usage-in-c

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 6ae2acd94504fcbaf0bce3c8b229731d75f0b7bf
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 13:09:38 2012 -0600

    regcomp.c: Fix multi-char fold bug
    
    Input text to be matched under /i is placed in EXACTFish nodes.  The
    current limit on such text is 255 bytes per node.  Even if we raised
    that limit, it will always be finite.  If the input text is longer than
    this, it is split across 2 or more nodes.  A problem occurs when that
    split occurs within a potential multi-character fold.  For example, if
    the final character that fits in a node is 'f', and the next character
    is 'i', it should be matchable by LATIN SMALL LIGATURE FI, but because
    Perl isn't structured to find multi-char folds that cross node
    boundaries, we will miss it.
    
    The solution presented here isn't optimal.  What we do is try to prevent
    all EXACTFish nodes from ending in a character that could be at the
    beginning or middle of a multi-char fold.  That prevents the problem.
    But in actuality, the problem only occurs if the input text is actually
    a multi-char fold, which happens much less frequently.  For example,
    we try to not end a full node with an 'f', but the problem doesn't
    actually occur unless the adjacent following node begins with an 'i' (or
    one of the other characters that 'f' forms a multi-char fold with).  That
    is, this patch splits when it doesn't need to.
    
    At the point of execution for this patch, we only know that the final
    character that fits in the node is that 'f'.  The next character remains
    unparsed, and could be in any number of forms, a literal 'i', or a hex,
    octal, or named character constant, or it may need to be decoded (from
    'use encoding').  So look-ahead is not really viable.
    
    So finding if a real multi-character fold is involved would have to be
    done later in the process, when we have full knowledge of the nodes, at
    the places where join_exact() is now called, and would require inserting
    a new node(s) in the middle of existing ones.
    
    This solution seems reasonable instead.
    
    It does not yet address named character constants (\N{}) which currently
    bypass the code added here.

M       embedvar.h
M       handy.h
M       intrpvar.h
M       regcomp.c
M       sv.c
M       t/re/pat_advanced.t
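
The splitting rule described in the commit above can be sketched as follows. This is a minimal Python illustration, not the C code in regcomp.c: the function name, the two-character set, and the character-based (rather than byte-based) length limit are all assumptions made for the example.

```python
# Illustrative subset only (an assumption): characters that can begin or
# continue a multi-character fold such as "fi", "ffl", "ss" or "st".
NON_FINAL_FOLD_CHARS = frozenset("fs")


def split_into_nodes(text, max_len=255):
    """Split text into chunks of at most max_len characters.

    A chunk that is followed by more text is shortened so that it does not
    end in a character that could begin or continue a multi-character fold.
    """
    nodes = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            # Back up over trailing characters that might be part of a fold
            # continuing into the next chunk.
            cut = end
            while cut > start and text[cut - 1] in NON_FINAL_FOLD_CHARS:
                cut -= 1
            # If every character in the chunk is such a character, keep the
            # full chunk so the loop still makes progress (a simplification
            # chosen for this sketch).
            if cut > start:
                end = cut
        nodes.append(text[start:end])
        start = end
    return nodes
```

With `max_len=4`, the input `"abcfidef"` is split as `["abc", "fide", "f"]`: the first chunk gives up its trailing 'f' so that "fi" stays together in the second. As the commit message notes, this splits early even when the following character would not have formed a fold.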

commit 2bc2d2fe9b12d381cbada518ba3ce28355b22cd0
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 12:55:42 2012 -0600

    mktables: Generate tables for chars that aren't in final fold pos
    
    This starts with the existing table that mktables generates that lists
    all the characters in Unicode that occur in multi-character folds in
    positions other than the final one.
    
    It generates data structures with this information to make it quickly
    available to code that wants to use it.  Future commits will use these
    tables.

M       charclass_invlists.h
M       handy.h
M       l1_char_class_tab.h
M       regen/mk_PL_charclass.pl
M       regen/mk_invlists.pl
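
As a rough illustration of what such a table contains, the sketch below derives the set of non-final characters from a list of multi-character fold strings. It assumes that "not in final position" means "appears somewhere other than the last position of some fold". It is Python rather than the Perl of mktables, the function name is hypothetical, and the fold list in the usage note is a small hand-picked sample, not the full Unicode data.

```python
def non_final_fold_chars(multi_char_folds):
    """Return the characters that occur in any position but the last
    in at least one of the given multi-character fold strings."""
    return {ch for fold in multi_char_folds for ch in fold[:-1]}
```

For the sample folds `["ff", "fi", "fl", "ffi", "ffl", "st", "ss"]` this yields the set containing 'f' and 's'; 'i', 'l' and 't' are excluded because they only ever appear last.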

commit 8577e0dac3485c96e2919d41fa86e48092316fc2
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 12:44:55 2012 -0600

    regen/mk_invlists: Add mode to generate above-Latin1 only
    
    This change adds the ability to specify that an output inversion list is
    to contain only those code points that are above Latin-1.  Typically,
    the Latin-1 ones will be accessed from some other means.

M       regen/mk_invlists.pl
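
For readers unfamiliar with inversion lists, here is a small sketch, assuming the usual convention that even-indexed elements start a range that is in the set and odd-indexed elements start a range that is not. It is written in Python with hypothetical function names; the real generator is the Perl script regen/mk_invlists.pl, and this is not a transcription of it.

```python
from bisect import bisect_right

FIRST_ABOVE_LATIN1 = 0x100


def in_invlist(invlist, cp):
    """True if code point cp is in the set described by the inversion list."""
    # An odd count of elements <= cp means cp lies inside an "in" range.
    return bisect_right(invlist, cp) % 2 == 1


def above_latin1_only(invlist):
    """Return an inversion list for the same set restricted to cp >= 0x100."""
    k = bisect_right(invlist, FIRST_ABOVE_LATIN1)
    rest = invlist[k:]
    if k % 2 == 1:
        # 0x100 itself is in the set, so the restricted list starts there.
        return [FIRST_ABOVE_LATIN1] + rest
    return rest
```

For example, `[0x41, 0x5B, 0xE0, 0x17F]` (A-Z plus a range straddling the Latin-1 boundary) becomes `[0x100, 0x17F]`, leaving the code points below 0x100 to be looked up by other means.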

commit 913ff4783f932cf0502bbefcec21ab95cb882aa3
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 12:38:41 2012 -0600

    Unicode::UCD::prop_invlist(): Allow returning internal property
    
    This creates an optional undocumented parameter to this function to
    allow it to return the inversion list of an internal-only Perl property.
    This will be used by other functions in Perl, but should not be
    documented, as we don't want to encourage the use of internal-only
    properties, which are subject to change or removal without notice.

M       lib/Unicode/UCD.pm

commit aac234eaaf5a264d27a4a1811f49a201ea393464
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 12:37:52 2012 -0600

    mktables: Add comment to gen'd data file

M       lib/unicore/mktables

commit 7566eca19b016c381b1e162b3f1aef64bb395a20
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 12:22:41 2012 -0600

    mktables: grammar in comments

M       lib/unicore/mktables

commit 627e4d0d05b595e152987754698fae41192c4321
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 12:20:42 2012 -0600

    regen/mk_PL_charclass.pl: Remove obsolete code
    
    Octals are no longer checked via this mechanism.

M       regen/mk_PL_charclass.pl

commit f6d909b7f5efab877ff6837f9ca0abe5645b8d9b
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 11:51:43 2012 -0600

    regcomp.c: Make invlist_search() usable from re_comp.c
    
    This was a static function which I couldn't get to be callable from the
    debugging version of regcomp.c.  This makes it public, but known only
    in the regcomp.c source file.  It changes the name to begin with an
    underscore so that if someone cheats by adding preprocessor #defines,
    they still have to call it with the name that convention indicates is a
    private function.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit c032738a2075d08b1a49cc7a1a753013cf2ae4e6
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 18 11:41:18 2012 -0600

    perlop: clarify wording

M       pod/perlop.pod

commit 7a47bd3f8663ea72d56d91e5bf497ec44d0fffb3
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 16 20:02:07 2012 -0600

    regcomp.c: Rename static fcn to better reflect its purpose
    
    This function handles \N of any ilk, not just named sequences.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit e30cd0208e00434d7a2f987e050bc18ee0e31d1a
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 16 19:55:15 2012 -0600

    regcomp.c: Make comment more accurate

M       regcomp.c

commit f2d96eb391b4b2b6de8e8def742b266365c62f96
Author: Karl Williamson <[email protected]>
Date:   Sat Jun 16 19:52:12 2012 -0600

    regcomp.c: Can now do /u instead of forcing to utf8
    
    Now that there is a /u modifier, a regex doesn't have to be in UTF-8 in
    order to force Unicode semantics.  Change this relic from the past.

M       regcomp.c

commit 7c349068b656634c22086db96c361253046107bb
Author: Karl Williamson <[email protected]>
Date:   Wed Jun 6 15:02:43 2012 -0600

    regcomp.c: Comments update
    
    This adds some comments and white-space lines, and updates other
    comments to account for the fact that trie handling has changed since
    they were written.

M       regcomp.c

commit c38e58a6a0f111f6935c51fee1bf00063c7558c4
Author: Karl Williamson <[email protected]>
Date:   Mon May 28 10:49:37 2012 -0600

    regcomp.c: Remove variable whose value needed just once
    
    Previous commits have removed all but one instance of using this
    variable, so just use the expression it equates to.

M       regcomp.c

commit 25790fa516b96a1d5585d46e8fbd2bb58278968c
Author: Karl Williamson <[email protected]>
Date:   Mon May 28 10:42:03 2012 -0600

    regcomp.c: White-space only
    
    This indents and outdents to compensate for newly formed and orphan
    blocks, respectively; and reflows comments to fit in 80 columns.

M       regcomp.c

commit fb9c004ac36c5b4d2ef6ecf4b7e0e8eb77599c3d
Author: Karl Williamson <[email protected]>
Date:   Sun May 27 01:08:46 2012 -0600

    regcomp.c: Trade stack space for time
    
    Pass 1 of regular expression compilation merely calculates the size it
    will need. (Note that Yves and I both think this is very suboptimal
    behavior.)  Nothing is written out during this pass, but sizes are
    just incremented.  The code in regcomp.c all knows this, and skips
    writing things in pass 1.  However, when folding, code in other files is
    called which doesn't have this size-only mode, and always writes its
    results out.  Currently, regcomp handles this by passing to that code a
    temporary buffer allocated for the purpose.  In pass1, the result is
    simply ignored; in pass2, the results are copied to the correct final
    destination.
    
    We can avoid that copy by making the temporary buffer large enough to
    hold the whole node, and in pass1, use it instead of the node.  The
    non-regcomp code writes to the same relative spot in the buffer that it
    will use for the real node.  In pass2 the real destination is used, and
    the fold gets written directly to the correct spot.
    
    Note that this increases the size pushed onto the stack, but code is
    ripped out as well.
    
    However, the main reason I'm doing this is not this speed-up; it is
    because it is needed by future commits to fix a bug.

M       regcomp.c
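
The two-pass arrangement described above can be sketched roughly as below. This is a simplified Python model, not regcomp.c: `fold_into`, `emit_folded` and `MAX_NODE_LEN` are hypothetical names, Python's `str.casefold()` stands in for the external folding code, and the text is assumed to fit in a single node.

```python
MAX_NODE_LEN = 255  # hypothetical stand-in for the per-node size limit


def fold_into(dest, offset, ch):
    """Stand-in for folding code that has no size-only mode: it always
    writes the fold of ch into dest at offset and returns its length."""
    folded = ch.casefold()
    dest[offset:offset + len(folded)] = folded
    return len(folded)


def emit_folded(text, node=None):
    """Pass 1 (node is None): write into a node-sized scratch buffer and
    return only the size.  Pass 2: write directly into the real node."""
    target = [None] * MAX_NODE_LEN if node is None else node
    length = 0
    for ch in text:
        # Same relative offset in either pass, so no copy is needed later.
        length += fold_into(target, length, ch)
    return length


size = emit_folded("Straße")   # pass 1: sizing only
node = [None] * size           # allocate exactly what pass 1 measured
emit_folded("Straße", node)    # pass 2: folds land in their final place
```

Because the scratch buffer is as large as a whole node, the folding code can use the same offsets in both passes, which is what removes the copy in pass 2 at the cost of more stack space.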

commit 43312570a1135c05fdc9153dddd1bd9826d6c242
Author: Karl Williamson <[email protected]>
Date:   Sun May 27 01:04:39 2012 -0600

    regcomp.c: Use mnemonic not numeric constant
    
    Future commits will add other uses of this number.

M       regcomp.c

commit 04b71f26db00eb7309b9efd417c722de571f0e2c
Author: Karl Williamson <[email protected]>
Date:   Sat May 26 22:19:22 2012 -0600

    regcomp.c: Resolve EBCDIC inconsistency towards simpler
    
    This code has assumed that to_uni_fold() returns its folds in Unicode
    (i.e., Latin1) rather than native EBCDIC.  Other code in the core
    assumes the opposite.  One has to change.  I'm changing this one, as the
    issues should be dealt with at the lowest level possible, which is in
    to_uni_fold().  Since we don't currently have an EBCDIC platform to test
    on, making sure that it all hangs together will have to be deferred
    until such time as we do.
    
    By doing this we make this code simpler and faster.  The fold has
    already been calculated; we just need to copy it to the final place
    (done in pass2).

M       regcomp.c

commit 60edcac7b15f283ef98d01f7f5f0ffc2efdbbf7f
Author: Karl Williamson <[email protected]>
Date:   Sat May 26 21:39:32 2012 -0600

    regcomp.c: Use function instead of repeating its code
    
    A new flag to to_uni_fold() causes it to do the same work that this code
    does, so just call it.

M       regcomp.c

commit a07ddcd0de968b471beb3bc096b8a7bb79bbfc2c
Author: Karl Williamson <[email protected]>
Date:   Sat May 26 14:19:18 2012 -0600

    regcomp.c: Remove (almost) duplicate code
    
    A previous commit opened the way to refactor this so that the two
    fairly lengthy code blocks that are identical (except for changing the
    variable <len>) can have one of them removed.

M       regcomp.c

commit 4755d0e71f789f07e22459fc3e014e56c3aff9e2
Author: Karl Williamson <[email protected]>
Date:   Thu May 24 22:14:04 2012 -0600

    regcomp.c: Refactor so can remove duplicate code
    
    This commit prepares the way for a later commit to remove a chunk of
    essentially duplicate code.  It does this at the cost of an extra
    test of a boolean each time through the loop.  But, it saves calculating
    the fold unless necessary, a potentially expensive operation.  When the
    next input is a quantifier, that calculated fold is discarded, unused.
    This commit avoids doing that calculation when the next input is a
    quantifier.

M       regcomp.c

commit b7fc285b28ae6eef2a1cb403711f53878e66fccc
Author: Karl Williamson <[email protected]>
Date:   Thu May 24 21:39:58 2012 -0600

    Revert "regcomp.c: Move duplicated code to inline function"
    
    This reverts commit 1ceb3049131abe6184db5a55104a620ffea6958d.

M       regcomp.c

commit 631ee5bced178aaccdace78a8a32113abae49d35
Author: Karl Williamson <[email protected]>
Date:   Sun May 6 08:10:33 2012 -0600

    regcomp.c: Move duplicated code to inline function
    
    This simply extracts the code to one function with only required
    ancillary changes.  Later commits will clean things up.

M       regcomp.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
