In perl.git, the branch smoke-me/trie3 has been created

<http://perl5.git.perl.org/perl.git/commitdiff/12ab89e997a72ef628e340e8dd2a5a61d9a0e1e5?hp=0000000000000000000000000000000000000000>

        at  12ab89e997a72ef628e340e8dd2a5a61d9a0e1e5 (commit)

- Log -----------------------------------------------------------------
commit 12ab89e997a72ef628e340e8dd2a5a61d9a0e1e5
Author: Yves Orton <[email protected]>
Date:   Sun Feb 19 21:32:05 2012 +0100

    rework how the trie logic handles the newer EXACT nodetypes
    
    This cleans up, simplifies, and extends how the trie logic
    interacts with the new node types. This change ultimately makes
    EXACTFU, EXACTFU_SS, and EXACTFU_NO_TRIE (renamed to
    EXACTFU_TRICKYFOLD) nodes work properly with the trie engine
    regardless of whether the string is utf8 or latin1.
    
    This patch depends on the following:
    
        EXACT              => utf8 or "binary" text
    
        EXACTFU            => either pre-folded utf8, or latin1 that has to be
                              folded as though it was utf8
        EXACTFU_SS         => special case of EXACTFU to handle \xDF/ss
                              (affects latin1 treatment)
        EXACTFU_TRICKYFOLD => special case of EXACTFU to handle tricky
                              non-latin1 fold rules
    
        EXACTF             => "old style fold logic" untriable nodetype
        EXACTFA            => (currently) untriable nodetype
        EXACTFL            => (currently) untriable nodetype
    
    See the comments in regcomp.sym for these fold types.
    
    This patch involves a number of distinct, but related parts. Starting
    from compilation:
    
    * Simplify how we detect a triable sequence given the new nodetypes.
      This also probably fixes some "bugs" in how we detected certain
      sequences, like /||foo|bar/.
    
    * Simplify how we read EXACTFU nodes under utf8 by removing the now
      redundant folding logic (EXACTFU nodes under utf8 are prefolded).
      Also extend this logic to handle latin1 patterns properly (in
      conjunction with other changes).
    
    * Part of the problems associated with EXACTFU_SS and EXACTFU_TRICKYFOLD
      have to do with how the trie logic interacts with the minlen logic.
      This change handles both by pessimising the minlen when encountering
      these nodetypes. One observation is that the minlen logic is basically
      broken, and works only because it conflates bytes and codepoints in
      such a way that we more or less always get a value small enough that
      things work out anyway. Fixing that properly is the job of another
      patch.
    
    * Part of the problem of doing folding under unicode rules is that
      there are a lot of foldings possible, some with strange rules. This
      means that the bitmap logic does not work correctly in all cases,
      as we currently do not have any way to populate it properly.
      So this patch disables the bitmap entirely when folding is involved
      until that is fixed.
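
The minlen concern in the list above can be illustrated outside the regex engine. The following is a minimal sketch in Python, not code from this patch: the helper name `lengths` is hypothetical, and it shows only that U+00DF and its full case fold "ss" differ in length depending on whether one counts codepoints, latin1 bytes, or utf8 bytes.

```python
# Illustration only; nothing here is taken from regcomp.c.
SHARP_S = "\u00df"  # LATIN SMALL LETTER SHARP S, i.e. \xDF in latin1


def lengths(text):
    """Return (codepoints, latin1 bytes or None, utf8 bytes) for text."""
    try:
        latin1 = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None  # not representable in latin1
    return len(text), latin1, len(text.encode("utf-8"))


# Full Unicode case folding maps U+00DF to the two characters "ss".
folded = SHARP_S.casefold()

print(lengths(SHARP_S))  # (1, 1, 2)
print(lengths(folded))   # (2, 2, 2)
```

A literal "ss" matched under full folding would naively suggest a minimum length of 2, yet a one-character subject can satisfy it, which is presumably why a lower (pessimised) minlen is the safe choice for these nodetypes.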
    
    The end result of this is: we can TRIE/AHOCORASICK any sequence of
    EXACT, or EXACTFU (ish) nodes, regardless of utf8 or not, but we disable
    the bitmap when folding.
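
As a rough illustration of the idea of building a trie over pre-folded literal alternatives, here is a toy sketch in Python. It is an assumption-laden model, not the structure used in regcomp.c: `build_trie` and `match_at` are hypothetical names, folding is approximated with `str.casefold`, and the matcher returns the longest alternative rather than reproducing Perl's alternation-order semantics.

```python
def build_trie(words, fold=False):
    """Build a nested-dict trie; the key '' marks an accepting state."""
    root = {}
    for word in words:
        if fold:
            word = word.casefold()  # pre-fold the literal once, at build time
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = word
    return root


def match_at(trie, text, pos=0, fold=False):
    """Return the longest trie word that matches text at pos, or None."""
    if fold:
        # Fold the subject too, so it is compared against pre-folded literals.
        text = text[pos:].casefold()
        pos = 0
    node = trie
    best = node.get("")  # an empty alternative accepts immediately
    for ch in text[pos:]:
        node = node.get(ch)
        if node is None:
            break
        if "" in node:
            best = node[""]
    return best


plain = build_trie(["foo", "bar", "food"])
print(match_at(plain, "foodie"))  # food
print(match_at(plain, "baz"))     # None

folded = build_trie(["STRASSE"], fold=True)
print(match_at(folded, "stra\u00dfe", fold=True))  # strasse
```

Folding both sides up front keeps the trie walk itself a plain character comparison, which mirrors the commit's remark that EXACTFU nodes under utf8 are prefolded.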
    
    A note for follow-up relating to this patch: given the way EXACTFU_XXX
    nodes are currently dealt with, we won't build the "maximal" trie because
    of their presence, instead creating a "jumptrie" consisting of either a
    leading EXACTFU node followed by an EXACTFU_XXX node, or vice versa. We
    should eventually address that.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c
M       regcomp.sym
M       regexec.c
M       regnodes.h
M       t/re/fold_grind.t

commit bef064b88fbce5d166d4bb4f89cc73f0e42bc3c9
Author: Yves Orton <[email protected]>
Date:   Sun Feb 19 21:04:44 2012 +0100

    make test.pl show test number and name in failure diagnostics output
    
    The old output would show only the line number as diagnostics
    but not the test number, nor the test name, which often contains
    very useful information. This patch makes sure this is visible in
    the diagnostics output of test failures.

M       t/test.pl
-----------------------------------------------------------------------

--
Perl5 Master Repository
