In perl.git, the branch smoke-me/khw-optimizer has been created

<http://perl5.git.perl.org/perl.git/commitdiff/596608ffd648c76a783af092d349586436e55010?hp=0000000000000000000000000000000000000000>

        at  596608ffd648c76a783af092d349586436e55010 (commit)

- Log -----------------------------------------------------------------
commit 596608ffd648c76a783af092d349586436e55010
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 5 22:40:54 2013 -0600

    Enlarge dummy regex pass1 compilation node
    
    In pass 1 of compiling regular expressions, the needed size is
    calculated.  There is space allocated for a scratch node that can be
    used for the things that the real one will hold in pass 2.  It is valid
    only while working on the current node, and gets overwritten in the next
    node.
    
    Until this commit, this scratch space was sized only for the smallest
    node type, meaning that larger types could not use it for scratch.  Now
    it is sized to be the largest non-EXACTish node.
    
    We could make it an array of 256 + overhead bytes instead to be able to
    hold the EXACTish nodes, but I don't see a need for that now.

M       regcomp.c
M       regcomp.h
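
The sizing idea in the commit above can be sketched in C with a union whose size is that of its largest member. This is only an illustration: the node layouts and the names node_small, node_charclass, scratch_node and scratch_fits are hypothetical and are not the real regnode structures in regcomp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layouts, invented for illustration only. */
typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
} node_small;

typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
    uint32_t arg;
    uint8_t  bitmap[32];
} node_charclass;

/* A union is at least as large as its largest member, so one scratch
 * object can stand in for any of the listed node types during pass 1. */
typedef union {
    node_small     small;
    node_charclass charclass;
} scratch_node;

/* True if a node needing 'need' bytes can use the scratch space. */
static bool scratch_fits(size_t need)
{
    return need <= sizeof(scratch_node);
}
```

A variable-length node (the EXACTish case mentioned above) would not fit unless the union also carried a byte array large enough for its longest string.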

commit 22841c083d4b5fd8459a36626b90c15527cbb006
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 15:27:08 2013 -0600

    regcomp.c: Use STR_WITH_LEN to avoid bookkeeping
    
    By changing the order of the parameters to the static function
    S_add_data, we can call it with STR_WITH_LEN and avoid a human having to
    count characters.

M       embed.fnc
M       proto.h
M       regcomp.c
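
As a rough sketch of why the parameter order matters: a macro in the style of STR_WITH_LEN expands a string literal into two adjacent arguments, the string and its length, so the callee must take them in that order. DEMO_STR_WITH_LEN and demo_add_data below are stand-ins written for this example; they are not the core macro or S_add_data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in macro: expands to "literal", length-without-NUL.
 * The "" s "" concatenation makes it fail to compile for non-literals. */
#define DEMO_STR_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

/* Hypothetical callee taking (string, length) in that order, which is
 * what lets the macro supply both arguments at once. */
static size_t demo_add_data(const char *s, size_t n)
{
    /* The computed length must agree with the actual string length. */
    assert(strlen(s) == n);
    return n;
}
```

With this in place a call looks like demo_add_data(DEMO_STR_WITH_LEN("abc")), and nobody has to count the characters by hand.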

commit adce5e2e5a87c7531477f7945d45a64e22718494
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 15:07:44 2013 -0600

    Rename regex flag bit for clarity
    
    ANYOF_UNICODE_ALL doesn't mean every Unicode code point.  It means those
    above the Latin1 range.  Rename it, while retaining the old one for back
    compat.

M       regcomp.c
M       regcomp.h
M       regexec.c

commit c647635de8d658a09874eb9d381570e4c0def382
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:55:16 2013 -0600

    regcomp.c: Better DEBUGGING builds error detection
    
    The code had a default: catch-all in the switch statement, but the
    comments indicated that it was uncertain what all was being caught.
    This changes this to panic only in DEBUGGING builds so that we can find
    out if there are indeed other possibilities that we haven't handled, and
    which could use better handling than the default, match everything.
    The two known possibilities are given separate case: statements in
    preparation for handling them differently.

M       regcomp.c
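
The pattern described above, where known cases get their own labels and the catch-all complains only in debugging builds, might look roughly like this. The enum, the DEMO_DEBUGGING symbol and the function name are assumptions made for the sketch; the real code uses the core's DEBUGGING define and a panic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes standing in for regnode types. */
typedef enum { DEMO_OP_KNOWN_A, DEMO_OP_KNOWN_B, DEMO_OP_OTHER } demo_op;

/* Returns true when the node should be treated as "match everything". */
static bool demo_matches_everything(demo_op op)
{
    switch (op) {
    case DEMO_OP_KNOWN_A:   /* known possibility, handled explicitly */
        return true;
    case DEMO_OP_KNOWN_B:   /* known possibility, handled explicitly */
        return true;
    default:
#ifdef DEMO_DEBUGGING
        /* Only debugging builds stop here, to reveal unhandled cases. */
        assert(!"unexpected node type in demo_matches_everything");
#endif
        return true;        /* production fallback: match everything */
    }
}
```

Separate case labels cost nothing now and give each known possibility a place to grow its own handling later.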

commit e12338af9827ba0bb9af49d827699672e1063563
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:49:37 2013 -0600

    regcomp.c: Change some static parameters to const
    
    I found I needed const in a planned future commit.

M       embed.fnc
M       proto.h
M       regcomp.c

commit 8b1e55b7ad70633625975e188b1d72d107ca7e96
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:27:53 2013 -0600

    Retain an inversion list's mortality in its replacement
    
    A couple of inversion list handling functions end up sometimes creating
    a new inversion list, replacing the old one instead of modifying it.
    This commit causes the replacement list to be mortal if and only if
    the old one was.  That is, mortality is now preserved across these
    operations.

M       regcomp.c

commit bd13e52626d5e1a6dd079bfe8a81504b52085558
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:04:43 2013 -0600

    perl.c: Clean up some SV*s at termination
    
    These were omitted from cleanup when PERL_DESTRUCT_LEVEL is non-zero.

M       perl.c

commit aae90a9c3204f65ce0ec49239abb60cfb1b584fb
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 11:19:02 2013 -0600

    regcomp.c: Add parameter to static function
    
    This parameter will be used in future commits.  This commit is really
    only to make the difference listing smaller in those, by committing
    separately just the book-keeping parts.  Adding this parameter also
    requires passing the aTHX_ thread parameter.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 2e56a64eaf0804a17cedc29b5886a92990e20e77
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:59:01 2013 -0600

    Remove PL_ASCII; use existing array slots for it
    
    PL_ASCII contains an inversion list to match the ASCII-range code
    points.  It is unusable outside the core regular expression code because
    all the functions that manipulate inversion lists are defined only
    within a few core files.  Therefore no outside code should be depending
    on it.
    
    It turns out that there are arrays of similar inversion lists, and these
    all have slots which should have this inversion list in them.  This
    commit fills them, instead of using PL_ASCII.

M       embedvar.h
M       intrpvar.h
M       regcomp.c
M       sv.c

commit 1a53a6857217f98bf8777dc20f512531d1df8a97
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:51:24 2013 -0600

    regcomp.c: Typos in comments; Fix another comment
    
    The non-typo fix is the result of allowing a parameter to the function
    to be NULL, and not updating the comments to reflect that.

M       regcomp.c

commit 2f99c5022cafb68ac83a5b074573e10e104235dd
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:39:14 2013 -0600

    regcomp.c: Fix syntax error in #ifdef'd out code
    
    This line is currently not compiled, but would fail if the #ifdef is
    changed.

M       regcomp.c

commit 986bebd9a1b97fc2eef8350630536b299af1594e
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:36:29 2013 -0600

    perl.h: Don't pollute global namespace
    
    These structures are used internally in the regular expression files,
    and are declared here only because of #include ordering issues.  Wrap
    them in an #ifdef so they are visible only to the correct files.

M       perl.h

commit 7bb309bbde214055eb8abe7d5cd2b4998a8cc954
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 21:13:52 2013 -0600

    Make typedef fully typedef
    
    The regcomp.c struct RExC_state_t has not been usable fully as a
    typedef, requiring the 'struct' at times.  This has caused me, and I
    presume others, wasted time when we forget to use it under those
    circumstances when it should be used, but it's never been a big enough
    issue to cause me to spend tuits on it.  But, working on something else,
    I finally came to the realization of what the problem is.  It is because
    proto.h is #included before regcomp.h is, and so functions that are
    declared in proto.h that have something that is a RExC_state_t as a
    parameter don't know that it is a typedef because that is defined in
    regcomp.h.  A way around this is already used for other similar
    structures, and that is to declare them in perl.h which is always read
    in before proto.h, leaving the definitions to regcomp.h.  Thus proto.h
    knows enough to compile.
    
    The structure was already declared in perl.h; just not typedef'd.
    Otherwise proto.h would not know about it at all.  This patch moves two
    regcomp.c related declarations in perl.h to the same section as the
    others, and changes the one for RExC_state_t to be a typedef.  All the
    'struct' uses are removed.

M       embed.fnc
M       embed.h
M       perl.h
M       proto.h
M       regcomp.c
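
The include-order fix described above relies on an ordinary C technique: a typedef of an incomplete struct can be declared early, prototypes can then use it through pointers, and the full definition can come later. A minimal single-file sketch follows, with comments marking which header plays which role; demo_state, demo_state_t and demo_depth are invented names.

```c
#include <assert.h>

/* Role of perl.h: declare the typedef before any prototypes are seen.
 * The struct is still incomplete at this point. */
typedef struct demo_state demo_state_t;

/* Role of proto.h: prototypes may use the typedef without 'struct',
 * because only a pointer to the incomplete type is needed. */
static int demo_depth(const demo_state_t *st);

/* Role of regcomp.h: the full definition arrives later. */
struct demo_state {
    int depth;
};

/* Role of regcomp.c: the function body needs the complete type. */
static int demo_depth(const demo_state_t *st)
{
    return st->depth;
}
```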

commit a5599705c3f13f07e5fb6d4f815a265bb88cbde0
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 11:39:38 2013 -0600

    regcomp.h: Create new typedef synonym for clarity
    
    This commit finishes (at least for now) removing some of the overloading
    of the term class.  A 'regnode_charclass_class' node contains space for
    storing the posix classes it matches that are never defined until the
    moment of matching because they are subject to the current run-time
    locale.  This commit creates a typedef 'regnode_charclass_posixl'
    synonym that doesn't re-use the term 'class' for two different purposes.

M       perl.h
M       regcomp.h

commit 569ebcabc9183845c56e5be9c4761f255c0d8610
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 9 12:21:53 2013 -0600

    regcomp.h: Parenthesize macro formal parameter
    
    Not doing so can cause problems, so it is standard procedure to
    parenthesize all parameters within a macro definition.

M       regcomp.h
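
A small illustration of the kind of problem meant here, using made-up macros rather than the one changed in regcomp.h: without parentheses around the formal parameter, operator precedence can regroup the caller's expression.

```c
#include <assert.h>

/* Unparenthesized parameter: DEMO_DOUBLE_BAD(a + b) expands to
 * (a + b * 2), so only b is doubled. */
#define DEMO_DOUBLE_BAD(x)  (x * 2)

/* Parenthesized parameter: expands to ((a + b) * 2) as intended. */
#define DEMO_DOUBLE_GOOD(x) ((x) * 2)

static int demo_bad_double(int a, int b)
{
    return DEMO_DOUBLE_BAD(a + b);
}

static int demo_good_double(int a, int b)
{
    return DEMO_DOUBLE_GOOD(a + b);
}
```

For a = 1 and b = 2 the first function yields 5 and the second yields 6.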

commit 3703cd2e303cdc4e246ceb9878eff7738a643a54
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 9 11:51:09 2013 -0600

    regcomp.h: Add better named synonyms
    
    This continues the process started two commits ago of removing some of
    the overloading of the term 'class'.
    
    In this case, this commit adds some #defines referring to the portions
    of the regnode associated with bracketed character classes, the ANYOF
    node.  Specifically those portions that deal with the Posix character
    classes, like \w and [:punct:] under /l (locale) matching are renamed
    substituting POSIXL for CLASS.  POSIXL is already used for POSIX-related
    things under /l.  I remember being terribly confused when I started
    reading this code about this.  One had a class within a class.  This
    should clarify things somewhat.
    
    The old names are retained in case files outside the core #include and
    use them (there are a few such on CPAN).

M       regcomp.c
M       regcomp.h
M       regexec.c

commit b9cc34f62b9a7f0ae29fd43277a700f95438b484
Author: Karl Williamson <[email protected]>
Date:   Tue Aug 6 21:41:53 2013 -0600

    regcomp.h: Move #define
    
    This moves it to be adjacent to similar #defines

M       regcomp.h

commit 0959196ae78a56b19da0030c9ca52503e19527df
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 11:19:18 2013 -0600

    regcomp.c: Change names of some static functions
    
    The term 'class' is very overloaded in regex code and documentation.
    perlrecharclass.pod calls the dot (matching any char) a class, and
    calls the [] form "bracketed character classes".  There are other
    meanings as well.  This is the first commit in a short series that
    removes some of those overloadings.
    
    One instance of class is the "synthetic start class", generated by the
    regex optimizer to be a list of all the code points a successful match
    could possibly start with.  This is useful in more quickly finding where
    to start looking in matching against a target string.  Prior to this
    commit, the routines that referred to this began with 'cl_', and the
    formal parameters were 'cl', which could mean any class.  This commit
    changes those instances of 'cl' to 'ssc' to indicate this is the only
    type of class that is being handled.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit fc58215136c278baef8a7147ba97adf68db907eb
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 10:01:53 2013 -0600

    regcomp.c: Rework static function call; comments
    
    The previous commit just extracted out code into a function.  This
    commit renames a parameter for clarity, combines two parameters to make
    the interface cleaner, and adds and moves comments around.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit ddd7bb8b9cbf24c497f3c875e4524029afe672d6
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 11:09:58 2013 -0600

    regcomp.c: Extract code into separate function
    
    A future commit will use this functionality from another place.  For
    now, just cut and paste, and do the minimal ancillary work to get it to
    compile and pass.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 91187f2c3f43f73fe189fec7f8a053021edde6b5
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 2 12:33:07 2013 -0600

    regcomp.c: Use PL_sv_undef instead of NULL in an AV
    
    The NULL gets turned into an SVt_NULL anyway.  This array is read only
    by S_core_regclass_swash() in regexec.c.  That uses an SvROK, so it
    doesn't have to change.
    
    This commit also beefs up the comments around this operation.

M       regcomp.c

commit a8721538ca5df4589e4cec7921f7162ffa6476d9
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 1 14:49:29 2013 -0600

    Add regnode struct for synthetic start class
    
    As part of extending the regular expression optimizer to properly handle
    above Latin1 code points, I need an inversion list to contain which code
    points the synthetic start class (ssc) matches.
    
    The ssc currently is the same as a locale-aware ANYOF node, which uses
    the struct of a regular ANYOF node, plus some extra fields at the end.
    
    This commit creates a new typedef for ssc use, which is the locale-aware
    ANYOF node, plus an extra SV* at the end to hold the inversion list.

M       embed.fnc
M       embed.h
M       perl.h
M       proto.h
M       regcomp.c
M       regcomp.h

commit bbaaa96b1ec81c8c2c5b4d01e71a93f85acc13d7
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 24 19:56:24 2013 -0600

    regcomp.c: Move a #define, add a similar one
    
    Future commits will use this #define (and the new one) earlier in the
    file than currently defined.

M       regcomp.c

commit d163fa5e952b4844e9e35ce6a7bece3aa38a9030
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 23 10:01:29 2013 -0600

    Add inversion list for U+80 - U+FF
    
    This is the upper half of the Latin1 range.  This simplifies some code
    very slightly, but will be of use in future commits.

M       charclass_invlists.h
M       embedvar.h
M       intrpvar.h
M       regcomp.c
M       regen/mk_invlists.pl
M       sv.c
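
For readers unfamiliar with the data structure, here is a generic sketch of an inversion list and a membership test, using the U+80 - U+FF range from the commit above as data. It assumes the usual convention that the list is a sorted array in which each element toggles membership, starting from "not in the set". The function and array names are hypothetical, and the core's actual representation lives inside an SV and differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Upper half of Latin1 as an inversion list: membership turns on at
 * 0x80 and off again at 0x100. */
static const uint32_t demo_upper_latin1[] = { 0x80, 0x100 };

/* Binary search for how many list elements are <= cp.  Each element
 * toggles membership, so cp is in the set when that count is odd. */
static bool demo_invlist_contains(const uint32_t *list, size_t len,
                                  uint32_t cp)
{
    size_t lo = 0;
    size_t hi = len;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) != 0;
}
```

Two elements are enough to describe 128 code points, and a list with an odd number of elements describes a set that extends to the highest possible code point.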

commit 3f9b1a8ecbff94970595af1b586e4f1c41fdccc6
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 21 21:13:38 2013 -0600

    regcomp.c: Extract code into separate function
    
    This is in preparation for it to be called from more than one place, in
    a future commit.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 942d2c65f6279ee529fd2fea9d37ad8dad8cbfad
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 21 10:10:56 2013 -0600

    regcomp.c: Remove redundant matching possibilities
    
    The flag ANYOF_UNICODE_ALL is for performance.  It is set when the
    inversion list for the ANYOF node includes every code point above
    Latin1, and avoids runtime searching through the list.  We don't need
    both, as the flag being set short-circuits even looking at the other
    list.  By removing the code points from the list, we perhaps will get
    rid of the list entirely, thus saving some operations, or will shorten
    it so that later binary searches run faster.

M       regcomp.c

commit ba34c43bc53980b70d31bc1efbb35b6c0a27e9b8
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 21 08:21:34 2013 -0600

    regcomp.c: Centralize assignment
    
    It's better to do something in one common place than two.  This properly
    initializes the regex opcode for the synthetic start class when it is
    created, rather than at the end where the code has to be repeated to get
    all instances.

M       regcomp.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
