In perl.git, the branch khw/tricky has been created

<http://perl5.git.perl.org/perl.git/commitdiff/2373c1b5d1361ab8bbe954fc8234512fd554e7e7?hp=0000000000000000000000000000000000000000>

        at  2373c1b5d1361ab8bbe954fc8234512fd554e7e7 (commit)

- Log -----------------------------------------------------------------
commit 2373c1b5d1361ab8bbe954fc8234512fd554e7e7
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:42:06 2011 -0700

    re/reg_fold.t: Add and revise comments

M       t/re/reg_fold.t

commit 316393f0b9d30be1862abb6ee1eed22bbaa8b55b
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:35:54 2011 -0700

    reg_fold.t: Test bracketed character classes
    
    These were removed when things were very broken, but now they work,
    except for things like
    
        "\N{LATIN SMALL LIGATURE FFI}" =~ /[a-z]{3}/i
    
    where the multi-char fold crosses single bracketed character class
    boundaries.  These will probably never be fixed in Perl in the general
    case (use \F and fc() instead), but I expect that
    
        "\N{LATIN SMALL LIGATURE FFI}" =~ /[f][f][i]/i
    
    will eventually be changed so the brackets are optimized away, and will
    work.  Then these TODOs will start passing.

M       t/re/reg_fold.t

commit 7b7ef1799255a4a4654ab0d681144b1ed9f2d3f6
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:32:56 2011 -0700

    re/reg_fold.t: Test more code points
    
    The statement that all these things are tested in fold_grind.t was
    wrong.  That file tests them all only when run with a particular
    option; otherwise, because of time constraints, it skips many code
    points.  reg_fold.t, on the other hand, does just basic sanity testing,
    and so should always test every code point for that.

M       t/re/reg_fold.t

commit 526c964465a4dc7ee07f4f5566d4137d0062decb
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:30:20 2011 -0700

    re/reg_fold.t: Remove fixed TODOs
    
    Most of these TODOs have not been tested for a while.

M       t/re/reg_fold.t

commit 8d56eeaba1f0cb755e32bc94a96998e1e9b8a1e4
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:24:42 2011 -0700

    re/reg_fold.t: Use /u rules for Unicode tests
    
    These tests are for Unicode, so should have /u (instead of /d).

M       t/re/reg_fold.t

commit 13b03be875f7cff6c86906f8145d1c1e304819be
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:20:42 2011 -0700

    regcomp.c: Refactor join_exact() to eliminate extra passes
    
    The strings in every EXACTFish node are examined for certain problematic
    sequences and code points.  Prior to this patch, this was done in
    several passes, but this refactors the routine to do it in a single
    pass.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 8946931a8659c48464c570af7b3dffb5f68aa043
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 25 14:18:55 2011 -0700

    regcomp.c: Modify some comments

M       regcomp.c

commit d1973fb998d4bef100f19357dabf47dc71bb49f0
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 20:19:27 2011 -0700

    regex: Remove FOLDCHAR regnode type
    
    This node type hasn't been used since 5.14.0.  Instead an ANYOFV node
    was generated where formerly a FOLDCHAR node would have been used.  The
    ANYOFV was used because it already existed and was up-to-date, whereas
    FOLDCHAR would have needed some bug fixes to adapt it, even though it
    would be faster in execution than ANYOFV; so the code for it was
    retained in case it was needed.
    
    However, both these solutions were defective, and a previous commit has
    changed things to a different type of solution entirely.  Thus FOLDCHAR
    is obsolescent and can be removed, though the code in it was used as a
    base for some of the new solutions.

M       regcomp.c
M       regcomp.sym
M       regexec.c
M       regnodes.h

commit 1ad0d7f33588302f27cc3eecf5a4d102762137b1
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 20:11:22 2011 -0700

    regex: Fix some tricky fold problems
    
    As described in the comments, this changes the design of handling the
    Unicode tricky fold characters to not generate a node for each possible
    sequence but to get them to work within EXACTFish nodes.
    
    The previous designs all used a separate node to handle these, which
    has the drawback of precluding legitimate matches that would cross
    the node boundary.
    
    The new design is described in the comments.

M       regcomp.c
M       regcomp.h
M       t/re/re_tests

commit 3197feb2dbc98026252aed7c98a6cf46118a1358
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 19:46:10 2011 -0700

    regcomp.c: Rework join_exact()
    
    This reformats and refactors portions of join_exact() that look for the
    tricky Greek fold sequences.  I renamed various variables, etc, to help
    me understand what was going on.  It turns out that there were two
    off-by-one bugs that prevented this from working properly.
    
    The first bug had the loop quit one iteration too soon.  The boundary
    should be "<=", and not strictly less-than.  This meant that if the
    sequence was the last thing in the string (or the only thing), it was
    not found.  The other bug had the end-needle parameter be 1 too short,
    which meant that the search would succeed on only the first 3 bytes of
    the sequence (now called 'tail'), thus matching many more things than
    it should (provided it got the chance to match at all, given the first
    bug).

M       regcomp.c
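
Editorial sketch of the two off-by-one bugs described in 3197feb above.
This is not the code in join_exact(); find_seq() is a hypothetical helper,
and plain byte strings stand in for the Greek fold sequences.  It only
shows why the loop bound must be "<=" and why the whole needle length
must be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from regcomp.c: return the offset of the
 * first occurrence of needle in hay, or -1 if it does not occur. */
static long find_seq(const unsigned char *hay, size_t hay_len,
                     const unsigned char *needle, size_t needle_len)
{
    size_t i;

    assert(needle_len > 0);
    if (needle_len > hay_len)
        return -1;

    /* The last valid start offset is hay_len - needle_len, so the bound
     * must be "<=".  With "<", a needle at the very end of the string
     * (or one that is the whole string) is never examined. */
    for (i = 0; i <= hay_len - needle_len; i++) {
        /* Compare the whole needle.  Passing needle_len - 1 here would
         * accept anything that merely shares a prefix with it. */
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

With the correct bound, a needle that is the last (or only) thing in the
string is found; with the full length, a string that differs only in the
final byte is rejected.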

commit 20483d45a6c99ac58d8c7e727e3aee01a3149273
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 19:37:36 2011 -0700

    regex: Add new node type EXACTFU_NO_TRIE
    
    This new node is like EXACTFU but is not currently trie'able.  This adds
    handling for it in regexec.c, but it is not currently generated; this
    commit prepares for future commits.

M       regcomp.c
M       regcomp.sym
M       regexec.c
M       regnodes.h

commit 2a9220c17c855b18663a9b5c4c7acdfee98712a6
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 19:30:09 2011 -0700

    regex: Add new node type EXACTFU_SS
    
    This node will be used to distinguish the case, in a non-UTF8 pattern
    and string, where the pattern can match strings of different lengths.
    The only instance where this can happen is that LATIN SMALL LETTER
    SHARP S can match the sequences "ss", "Ss", "sS", or "SS", hence the
    name.
    
    This node is not currently generated; this prepares for future commits.

M       regcomp.c
M       regcomp.sym
M       regexec.c
M       regnodes.h
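
Editorial sketch of the length mismatch that 2a9220c above describes: in
Latin-1, the single byte \xDF folds to the two bytes "ss".  The names
fold_sketch() and fold_eq() are hypothetical, and only ASCII letters and
the sharp s are folded here; this is not how regexec.c implements
EXACTFU_SS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHARP_S 0xDF   /* LATIN SMALL LETTER SHARP S in Latin-1 */

/* Hypothetical helper: write the folded form of in[0..len) to out and
 * return its length.  out must have room for 2 * len bytes.  Only ASCII
 * letters and the sharp s are folded in this sketch. */
static size_t fold_sketch(const unsigned char *in, size_t len,
                          unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (in[i] == SHARP_S) {             /* one byte becomes two */
            out[n++] = 's';
            out[n++] = 's';
        }
        else if (in[i] >= 'A' && in[i] <= 'Z') {
            out[n++] = (unsigned char)(in[i] - 'A' + 'a');
        }
        else {
            out[n++] = in[i];
        }
    }
    return n;
}

/* Return 1 if a and b are equal after folding, else 0.  Inputs are
 * limited to 32 bytes so the sketch needs no allocation. */
static int fold_eq(const unsigned char *a, size_t a_len,
                   const unsigned char *b, size_t b_len)
{
    unsigned char fa[64], fb[64];
    size_t na, nb;

    assert(a_len <= 32 && b_len <= 32);
    na = fold_sketch(a, a_len, fa);
    nb = fold_sketch(b, b_len, fb);
    return na == nb && memcmp(fa, fb, na) == 0;
}
```

Under this sketch a 1-byte string containing only \xDF compares equal to
each of "ss", "Ss", "sS", and "SS", which is the different-length case the
new node type is meant to flag.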

commit f8b1e2fade1b5046a8f1d6493876ac4023c6dc9c
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 19:13:24 2011 -0700

    regcomp.c: Need to account for delta sizes
    
    When a node can match varying sizes, the delta variable in the optimizer
    needs to change to account for that, and the node can no longer be
    treated as matching a fixed-length string.
    
    This code was adapted from the existing code for the FOLDCHAR node that
    has to deal with the same problem.

M       regcomp.c
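
Editorial sketch of the minimum-length/delta bookkeeping that f8b1e2f
above refers to.  It assumes a non-UTF8 pattern in which the only
variable-length fold is sharp s versus "ss"; struct len_bounds and
fold_bounds() are hypothetical names, and the bounds are deliberately
conservative rather than a copy of what the optimizer in regcomp.c
computes.

```c
#include <assert.h>
#include <stddef.h>

#define SHARP_S 0xDF   /* LATIN SMALL LETTER SHARP S in Latin-1 */

struct len_bounds {
    size_t min;    /* fewest target bytes the string could match */
    size_t delta;  /* max - min; 0 means the match length is fixed */
};

/* Hypothetical helper: bounds on the length of a target that could match
 * pat under folding.  Each run of folded s's of length k (a sharp s counts
 * as two) can match between (k + 1) / 2 and k target bytes, because any
 * adjacent pair may appear in the target as a single sharp s. */
static struct len_bounds fold_bounds(const unsigned char *pat, size_t len)
{
    struct len_bounds b;
    size_t min = 0, max = 0, run = 0, i;

    assert(pat != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (pat[i] == SHARP_S) {
            run += 2;
        }
        else if (pat[i] == 's' || pat[i] == 'S') {
            run += 1;
        }
        else {
            /* Flush the pending run, then count this ordinary byte. */
            min += (run + 1) / 2 + 1;
            max += run + 1;
            run = 0;
        }
    }
    min += (run + 1) / 2;   /* flush a run that ends the pattern */
    max += run;

    b.min = min;
    b.delta = max - min;
    return b;
}
```

A pattern with no s or sharp s gets delta 0 and is still fixed-length; as
soon as delta is nonzero, any fixed-length assumption has to be dropped.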

commit a9d15093c69d925c66bb4067fc5eb3fa74a43344
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 18:51:45 2011 -0700

    regcomp.c: Change param to join_exact()
    
    This changes a parameter of this function so that, instead of adjusting
    a running total, it returns the actual value computed by the function;
    the calling code is changed to compensate.

M       embed.fnc
M       proto.h
M       regcomp.c

commit 8b516b3ea741601171eac6e8de45e0340c9e0ca0
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 16:58:31 2011 -0700

    perlunicode: nit

M       pod/perlunicode.pod

commit 5791593ce7a0466e027f8f0be7e9ab80c1fc22e4
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 12:24:09 2011 -0700

    regcomp.c: regex start class for sharp s
    
    Under most folding types, the optimizer start class should include all
    of s, S, and the sharp s (\xdf) if it includes any of them.  The code
    was neglecting the latter.  This is currently not relevant, as there is
    special handling of the sharp s elsewhere in regcomp.c.  But this is a
    step to changing that special handling to fix some bugs.

M       regcomp.c
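
Editorial sketch of the start-class rule stated in 5791593 above: if the
class includes any of s, S, or \xdf, it should include all three.  The
256-bit bitmap and the names start_class, sc_set(), sc_test(), and
sc_close_sharp_s() are hypothetical and do not mirror the structures in
regcomp.c.

```c
#include <assert.h>
#include <string.h>

#define SHARP_S 0xDF   /* LATIN SMALL LETTER SHARP S in Latin-1 */

/* Hypothetical start class: one bit per Latin-1 code point. */
struct start_class {
    unsigned char bits[32];
};

static void sc_set(struct start_class *sc, unsigned c)
{
    assert(c < 256);
    sc->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

static int sc_test(const struct start_class *sc, unsigned c)
{
    assert(c < 256);
    return (sc->bits[c >> 3] >> (c & 7)) & 1;
}

/* If any of s, S, or the sharp s can start a match, then under folding
 * all three can, so add whichever ones are missing. */
static void sc_close_sharp_s(struct start_class *sc)
{
    if (sc_test(sc, 's') || sc_test(sc, 'S') || sc_test(sc, SHARP_S)) {
        sc_set(sc, 's');
        sc_set(sc, 'S');
        sc_set(sc, SHARP_S);
    }
}
```

A class that contains none of the three is left unchanged, so the rule
only ever widens a class that already admits one of them.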

commit 1e4ce0b4ef6f349abb2cc48006c4823839665d23
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 08:48:07 2011 -0700

    regcomp.c: white-space only and comments only

M       regcomp.c

commit 9d0bd308bcede234634a1749efe48171092bee4b
Author: Karl Williamson <[email protected]>
Date:   Fri Dec 23 08:42:17 2011 -0700

    regcomp.c: Save computed value in variable for later use
    
    This will be used in future commits.  Retrieving it via OP() doesn't
    work in pass1 of the regex compiler.

M       regcomp.c

commit 5ac5d9023ba9865ab255d9da42611d9f705eb9b8
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 22 20:09:11 2011 -0700

    regcomp.c: Make sure trie can handle node passed to it

M       regcomp.c

commit 7f0c80d29600ee862e626a53799cdb8745ee9688
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 22 20:03:55 2011 -0700

    regexec.c: white space only

M       regexec.c

commit dbd139cd57ea313ff632ca940d311a211e2895d4
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 22 19:51:37 2011 -0700

    regexec.c: EXACTF nodes can never be UTF
    
    By definition a regex pattern that is in UTF-8 uses Unicode matching
    rules, and EXACTF is non-Unicode (unless the target string is UTF-8).
    Therefore an EXACTF node will never be generated for a UTF-8 pattern,
    and there is no need to test for it being so.

M       regexec.c

commit aa157951fcf655cf0c8bc8cc644e4d01277d36cd
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 22 17:58:20 2011 -0700

    regcomp.c: Silence valgrind warning
    
    This happens only when producing debug output.  Initialize these two
    debugging variables.

M       regcomp.c

commit 78dee5f2d1bd1184e5038416777b3681b8eadfe9
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 22 14:29:12 2011 -0700

    regexp_noamp.t: Add comment

M       t/re/regexp_noamp.t

commit e468997e35db3e4be53adfd0111bd95fea55787c
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 21 09:57:43 2011 -0700

    t/re/re_tests: Add some tests

M       t/re/re_tests

commit d69b77426ba803ae2194eec18b6d0821fa72a751
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 21 09:54:38 2011 -0700

    t/re/re_tests: revise test
    
    This is the wrong test for the cited ticket.  That one is for tests
    occurring in bracketed character classes.

M       t/re/re_tests

commit cc3a798902d3116960c536f616194a668ee32f4e
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 21 09:53:41 2011 -0700

    t/re/re_tests: Update comment
    
    This reflects that now that there is autoloading of \N{}, such tests can
    go in this file.

M       t/re/re_tests

commit 1a071a476b71f6295442e624a5bd6f544a2f8f6b
Author: Karl Williamson <[email protected]>
Date:   Tue Dec 20 09:28:47 2011 -0700

    util.c: Add comment

M       util.c

commit bc417572dc9080f3602a3cf498141ff3e70b44ed
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 18 13:27:06 2011 -0700

    regcomp.c: Don't print incorrect debug info
    
    The break out of the loop should be done before the debug statements
    that indicate the things that happen only if the break isn't done.

M       regcomp.c

commit 39be665482910c9448d2b8abd039492f9193efed
Author: Karl Williamson <[email protected]>
Date:   Sun Dec 18 12:22:11 2011 -0700

    regcomp.sym: Change comments

M       regcomp.sym
M       regnodes.h
-----------------------------------------------------------------------

--
Perl5 Master Repository
