In perl.git, the branch smoke-me/khw-optimizer has been created

<http://perl5.git.perl.org/perl.git/commitdiff/b794bcd085f94e06d38f128719fd5fae9194127c?hp=0000000000000000000000000000000000000000>

        at  b794bcd085f94e06d38f128719fd5fae9194127c (commit)

- Log -----------------------------------------------------------------
commit b794bcd085f94e06d38f128719fd5fae9194127c
Author: Karl Williamson <[email protected]>
Date:   Mon Sep 23 10:43:31 2013 -0600

    XXX empty string

M       regcomp.c
M       regcomp.h

commit c21e2680001dfb6c28e5058e0f2eda8afb29adea
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 23:12:20 2013 -0600

    regcomp.c: White-space, comments only
    
    This moves the static functions introduced a few commits ago to more
    logical places in the file, wraps some long lines to 79 columns, and
    fixes a few nits in comments.

M       regcomp.c

commit 5cfad46efd0a4e681171217d32287c2df2d41587
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 22:56:20 2013 -0600

    regcomp.c: Remove unused parameter in static function
    
    This parameter is no longer used, since a few commits ago in this
    series.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 20d83bc240bec70cde1b371e05204319df7c9931
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 22:46:10 2013 -0600

    Add some tests for the regex optimizer
    
    We don't have the infrastructure to test the regex optimizer, and I'm
    not sure how to do it properly, without tying the tests to particular
    optimizations.  What I did, however, was to go through the recently
    changed optimizer code and write tests to exercise every branch, as far
    as I could tell.

M       t/re/pat_advanced.t
M       t/re/re_tests

commit f82b3f203107f8646d04c25a9e701b1b40e64a62
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 22:36:57 2013 -0600

    regcomp.c: Tighten optimizer for /li matches
    
    The synthetic start class (ssc) generated by the regex optimizer
    frequently has case-insensitive matching enabled, even if nowhere in the
    pattern is there a /i.  This commit causes any pattern that doesn't have
    /i to not have its ssc contain a /i.

M       regcomp.c

commit 868b7730b6f72c1dd2c0f72a3c3f95077596fca0
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 21:36:29 2013 -0600

    XXX better msg: Teach regex optimizer to handle above-Latin1
    
    Until this commit, the regular expression optimizer has essentially
    punted on above-Latin1 code points.  Under some circumstances, they
    would be taken into account, more or less.  With the advent of inversion
    lists, it becomes feasible to actually fully handle them.  This
    commit changes the optimizer to use inversion lists.  This required
    rewriting the base level ...

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c
M       regcomp.h

commit 4ccbdbea46c17ab51f9e53c5ad7e02aa1052a6b8
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 22 20:43:02 2013 -0600

    regcomp.c: Add some static functions
    
    This commit adds some functions that are currently unused, but will be
    used in a future commit.  This commit is essentially to make the
    differences smaller in that commit, as 'diff' is getting confused and
    not outputting the logical differences.  The functions are added in a
    block at the beginning of the file to avoid the 'diff' issues.  A later
    white-space only commit will move them to more appropriate positions.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c
M       regcomp.h

commit 85247e8c6ce5e7c3e1daea68b3067fbb6665055e
Author: Karl Williamson <[email protected]>
Date:   Mon Sep 9 20:33:48 2013 -0600

    regcomp.c: Use macro accessor uniformly
    
    These instances were using the structure field directly; everywhere else
    uses a macro that hides the field's location in the structure.  This
    converts to use the macro everywhere.

M       regcomp.c

commit 3d0a2af081a6a9be954c25a8745e50b53b7533d2
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 14 19:03:39 2013 -0600

    regcomp.c: Optimize e.g. /[\w\W]/l into dot
    
    It is unlikely that someone would include a Posix class and its
    complement in the same bracketed character class, but looking for
    this and optimizing it away helps the algorithm coming in a future
    commit to look at the synthetic start class.
    
    This commit only does this for /l matching.  For all other matching, if
    we know at compile time what the posix classes match, this optimization
    is already done.

M       regcomp.c

commit d67d6aba74068b404d9d4b1ec1fd81b8a0d44af4
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 5 22:40:54 2013 -0600

    Enlarge dummy regex pass1 compilation node
    
    In pass 1 of compiling regular expressions, the needed size is
    calculated.  There is space allocated for a scratch node that can be
    used for the things that the real one will hold in pass 2.  It is valid
    only while working on the current node, and gets overwritten in the next
    node.
    
    Until this commit, this scratch space was sized only for the smallest
    node type, meaning that larger types could not use it for scratch.  Now
    it is sized to be the largest non-EXACTish node.
    
    We could make it an array of 256 + overhead bytes instead to be able to
    hold the EXACTish nodes, but I don't see a need for that now.

M       regcomp.c
M       regcomp.h

commit 9bb000f391b58fd5090a4a3799ff48885053dc2f
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 15:27:08 2013 -0600

    regcomp.c: Use STR_WITH_LEN to avoid bookkeeping
    
    By changing the order of the parameters to the static function
    S_add_data, we can call it with STR_WITH_LEN and avoid a human having to
    count characters.
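
    A minimal sketch of the idea, not the actual regcomp.c code: the macro
    MY_STR_WITH_LEN below imitates the shape of perl's STR_WITH_LEN, and
    add_data_sketch() is a hypothetical stand-in for S_add_data.  Because
    the macro expands to "string, length", the callee has to take the
    string first and the length second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local imitation of a STR_WITH_LEN-style macro (an assumption for this
 * sketch): expands a string literal into two arguments, the literal and
 * its length without the trailing NUL. */
#define MY_STR_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

/* Hypothetical stand-in for the real function.  Returns n if it agrees
 * with the actual string length, else 0. */
size_t add_data_sketch(const char *s, size_t n)
{
    return (strlen(s) == n) ? n : 0;
}

/* The caller no longer counts characters by hand. */
size_t add_data_demo(void)
{
    return add_data_sketch(MY_STR_WITH_LEN("sss"));
}
```

    With the length first and the string second, the macro could not be
    used, which is why the parameter order matters here.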

M       embed.fnc
M       proto.h
M       regcomp.c

commit 366cbc7128f8e1f09b4f350285da58772310a7e2
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 15:07:44 2013 -0600

    Rename regex flag bit for clarity
    
    ANYOF_UNICODE_ALL doesn't mean every Unicode code point.  It means those
    above the Latin1 range.  Rename it, while retaining the old one for back
    compat.

M       regcomp.c
M       regcomp.h
M       regexec.c

commit 26a26e949e92f435035962ace1377e690785b61a
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:55:16 2013 -0600

    regcomp.c: Better DEBUGGING builds error detection
    
    The code had a default: catch-all in the switch statement, but the
    comments indicated that it was uncertain what all was being caught.
    This changes this to panic only in DEBUGGING builds so that we can find
    out if there are indeed other possibilities that we haven't handled, and
    which could use better handling than the default of matching everything.
    The two known possibilities are given separate case: statements in
    preparation for handling them differently.
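
    The pattern described above might look roughly like the following.
    This is a sketch only: the enum values, the SKETCH_DEBUGGING symbol and
    sketch_classify() are all hypothetical names, and the real switch in
    regcomp.c handles regnode types not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node kinds and results, for illustration only. */
enum sketch_kind   { SKETCH_KNOWN_A, SKETCH_KNOWN_B, SKETCH_NARROW };
enum sketch_result { SKETCH_MATCH_SUBSET, SKETCH_MATCH_EVERYTHING };

enum sketch_result sketch_classify(int kind)
{
    switch (kind) {
    case SKETCH_NARROW:
        return SKETCH_MATCH_SUBSET;

    /* The two known fall-back cases get their own labels so that they
     * can later be handled differently from each other. */
    case SKETCH_KNOWN_A:
    case SKETCH_KNOWN_B:
        return SKETCH_MATCH_EVERYTHING;

    default:
#ifdef SKETCH_DEBUGGING
        /* Debugging builds: fail loudly so unknown cases are noticed. */
        fprintf(stderr, "panic: unexpected kind %d\n", kind);
        abort();
#else
        /* Production builds: keep the safe, conservative behavior. */
        return SKETCH_MATCH_EVERYTHING;
#endif
    }
}
```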

M       regcomp.c

commit 8b80a07cd4895da4d88491ea86b95e74d899d1c7
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:49:37 2013 -0600

    regcomp.c: Change some static parameters to const
    
    I found I needed const in a planned future commit.

M       embed.fnc
M       proto.h
M       regcomp.c

commit 099022a31579d3730c00f5d42b70ba27b54c1ed1
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:27:53 2013 -0600

    Retain an inversion list's mortality in its replacement
    
    A couple of inversion list handling functions end up sometimes creating
    a new inversion list, replacing the old one instead of modifying it.
    This commit causes the replacement list to be mortal if and only if the
    old one was.  That is, mortality is now preserved across these
    operations.

M       regcomp.c

commit d17c711349542d42d498bc1c87ed0e166b967f87
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 14:04:43 2013 -0600

    perl.c: Clean up some SV*s at termination
    
    These were omitted from cleaning up when PERL_DESTRUCT_LEVEL is non-zero

M       perl.c

commit efa5f7bfd8a5d1c1ce259afd297d067547bf029b
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 11:19:02 2013 -0600

    regcomp.c: Add parameter to static function
    
    This parameter will be used in future commits.  This commit is really
    only to make the difference listing smaller in those, by committing
    separately just the book-keeping parts.  This parameter requires also
    passing the aTHX_ thread parameter

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit ee196112fabc480ad38e0a4af9928ab47401cb9e
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:59:01 2013 -0600

    Remove PL_ASCII; use existing array slots for it
    
    PL_ASCII contains an inversion list to match the ASCII-range code
    points.  It is unusable outside the core regular expression code because
    all the functions that manipulate inversion lists are defined only
    within a few core files.  Therefore no outside code should be depending
    on it.
    
    It turns out that there are arrays of similar inversion lists, and these
    all have slots which should have this inversion list in them.  This
    commit fills them, instead of using PL_ASCII.

M       embedvar.h
M       intrpvar.h
M       regcomp.c
M       sv.c

commit ff8f9a634ff67c21ac012ee2e97dee96755c67c4
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:51:24 2013 -0600

    regcomp.c: Typos in comments; Fix another comment
    
    The non-typo fix is the result of allowing a parameter to the function
    to be NULL, and not updating the comments to reflect that.

M       regcomp.c

commit 3429f4956387d81a34c4d46aa85a80f88d5486d9
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:39:14 2013 -0600

    regcomp.c: Fix syntax error in #ifdef'd out code
    
    This line is currently not compiled, but would fail if the #ifdef is
    changed.

M       regcomp.c

commit d668e9c42669fc86bab20403119576717763ee42
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 15 10:36:29 2013 -0600

    perl.h: Don't pollute global namespace
    
    These structures are used internally in the regular expression files,
    and are declared here only because of #include ordering issues.  Wrap
    them in an #ifdef so they are visible only to the correct files.

M       perl.h

commit 8a1ba0ca0c8c4316df3fa823593513d0b79611f4
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 21:13:52 2013 -0600

    Make typedef fully typedef
    
    The regcomp.c struct RExC_state_t has not been usable fully as a
    typedef, requiring the 'struct' at times.  This has caused me, and I
    presume others, wasted time when we forget to use it under those
    circumstances when it should be used, but it's never been a big enough
    issue to cause me to spend tuits on it.  But, working on something else,
    I finally came to the realization of what the problem is.  It is because
    proto.h is #included before regcomp.h is, and so functions that are
    declared in proto.h that have something that is a RExC_state_t as a
    parameter don't know that it is a typedef because that is defined in
    regcomp.h.  A way around this is already used for other similar
    structures, and that is to declare them in perl.h which is always read
    in before proto.h, leaving the definitions to regcomp.h.  Thus proto.h
    knows enough to compile.
    
    The structure was already declared in perl.h; just not typedef'd.
    Otherwise proto.h would not know about it at all.  This patch moves two
    regcomp.c related declarations in perl.h to the same section as the
    others, and changes the one for RExC_state_t to be a typedef.  All the
    'struct' uses are removed.
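
    The ordering issue can be sketched in a single file as follows.  The
    three commented sections stand in for perl.h, proto.h and regcomp.h;
    all identifiers ending in "sketch" are hypothetical and are not the
    real declarations.

```c
#include <assert.h>

/* --- stand-in for perl.h: read before the prototypes ---
 * Declaring the typedef on an incomplete struct is enough for later
 * prototypes to use the bare typedef name. */
typedef struct rexc_state_sketch RExC_state_sketch_t;

/* --- stand-in for proto.h ---
 * Compiles without the 'struct' keyword because the typedef is
 * already known, even though the struct body is not. */
int sketch_get_depth(const RExC_state_sketch_t *state);

/* --- stand-in for regcomp.h: the full definition comes later --- */
struct rexc_state_sketch {
    int depth;
};

int sketch_get_depth(const RExC_state_sketch_t *state)
{
    return state->depth;
}

int sketch_typedef_demo(void)
{
    RExC_state_sketch_t state = { 7 };
    return sketch_get_depth(&state);
}
```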

M       embed.fnc
M       embed.h
M       perl.h
M       proto.h
M       regcomp.c

commit a4486fbc9820f1a50dd066713bf4aca13db45356
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 11:39:38 2013 -0600

    regcomp.h: Create new typedef synonym for clarity
    
    This commit finishes (at least for now) removing some of the overloading
    of the term class.  A 'regnode_charclass_class' node contains space for
    storing the posix classes it matches that are never defined until the
    moment of matching because they are subject to the current run-time
    locale.  This commit creates a typedef 'regnode_charclass_posixl'
    synonym that doesn't re-use the term 'class' for two different purposes.

M       perl.h
M       pod/perlreguts.pod
M       regcomp.h

commit acac108054ebdd8027835e66421d9411dc615f8a
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 9 12:21:53 2013 -0600

    regcomp.h: Parenthesize macro formal parameter
    
    Not doing so can cause problems, so it is standard procedure to
    parenthesize all parameters within a macro definition.
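
    As an illustration of the kind of problem meant here (the macros below
    are hypothetical and not taken from regcomp.h): without parentheses
    around the formal parameter, an expression argument can be split apart
    by operator precedence after expansion.

```c
#include <assert.h>

/* Hypothetical macros for illustration only. */
#define DOUBLE_UNSAFE(x) (x * 2)    /* parameter not parenthesized */
#define DOUBLE_SAFE(x)   ((x) * 2)  /* parameter parenthesized */

int double_unsafe_of_sum(int a, int b)
{
    /* Expands to (a + b * 2): only b is doubled. */
    return DOUBLE_UNSAFE(a + b);
}

int double_safe_of_sum(int a, int b)
{
    /* Expands to ((a + b) * 2): the whole sum is doubled. */
    return DOUBLE_SAFE(a + b);
}
```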

M       regcomp.h

commit 8afc12f2bb6e40f4e3956fdb8811af17e9e5669f
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 9 11:51:09 2013 -0600

    regcomp.h: Add better named synonyms
    
    This continues the process started two commits ago of removing some of
    the overloading of the term 'class'.
    
    In this case, this commit adds some #defines referring to the portions
    of the regnode associated with bracketed character classes, the ANYOF
    node.  Specifically those portions that deal with the Posix character
    classes, like \w and [:punct:] under /l (locale) matching are renamed
    substituting POSIXL for CLASS.  POSIXL is already used for POSIX-related
    things under /l.  I remember being terribly confused when I started
    reading this code about this.  One had a class within a class.  This
    should clarify things somewhat.
    
    The old names are retained in case files outside the core #include
    this header and use them (there are a few such in cpan).

M       regcomp.c
M       regcomp.h
M       regexec.c

commit ec7136883bc496b2795d2041f2177f70d46b1f81
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 14 18:57:26 2013 -0600

    regcomp.c: Clarify comment
    
    This continues the process of removing some overloading of the word
    'class', by changing this comment to use 'bracketed class', and
    re-wrapping

M       regcomp.c

commit f507db14ee54e7e4dae79a556c3c2c02b3e6e392
Author: Karl Williamson <[email protected]>
Date:   Tue Aug 6 21:41:53 2013 -0600

    regcomp.h: Move #define
    
    This moves it to be adjacent to similar #defines

M       regcomp.h

commit 9ecadaf5392557e9465a75a4a1f975156acba661
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 11:19:18 2013 -0600

    regcomp.c: Change names of some static functions
    
    The term 'class' is very overloaded in regex code and documentation.
    perlrecharclass.pod calls the dot (matching any char) a class, and
    calls the [] form "bracketed character classes".  There are other
    meanings as well.  This is the first commit in a short series that
    removes some of those overloadings.
    
    One instance of class is the "synthetic start class", generated by the
    regex optimizer to be a list of all the code points a successful match
    could possibly start with.  This is useful in more quickly finding where
    to start looking in matching against a target string.  Prior to this
    commit, the routines that referred to this began with 'cl_', and the
    formal parameters were 'cl', which could mean any class.  This commit
    changes those instances of 'cl' to 'ssc' to indicate this is the only
    type of class that is being handled.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit b6d5c7d55054188d9cef69a9027b9b67a9eb214a
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 10:01:53 2013 -0600

    regcomp.c: Rework static function call; comments
    
    The previous commit just extracted out code into a function.  This
    commit renames a parameter for clarity, combines two parameters to make
    the interface cleaner, and adds and moves comments around.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 2f8b639a4b2e7f1522c9988bf847c754f0a9f9bb
Author: Karl Williamson <[email protected]>
Date:   Wed Aug 14 11:09:58 2013 -0600

    regcomp.c: Extract code into separate function
    
    A future commit will use this functionality from another place.  For
    now, just cut and paste, and do the minimal ancillary work to get it to
    compile and pass.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 3aaf80fcbdd646df31d8e354df60457d7985052a
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 2 12:33:07 2013 -0600

    regcomp.c: Use PL_sv_undef instead of NULL in an AV
    
    The NULL gets turned into an SVt_NULL anyway.  This array is read only
    by S_core_regclass_swash() in regexec.c.  That uses an SvROK, so it
    doesn't have to change.
    
    This commit also beefs up the comments around this operation

M       regcomp.c

commit df64dd727f6715b3376f32e458137baa86035890
Author: Karl Williamson <[email protected]>
Date:   Thu Aug 1 14:49:29 2013 -0600

    Add regnode struct for synthetic start class
    
    As part of extending the regular expression optimizer to properly handle
    above Latin1 code points, I need an inversion list to contain which code
    points the synthetic start class (ssc) matches.
    
    The ssc currently is the same as a locale-aware ANYOF node, which uses
    the struct of a regular ANYOF node, plus some extra fields at the end.
    
    This commit creates a new typedef for ssc use, which is the locale-aware
    ANYOF node, plus an extra SV* at the end to hold the inversion list.

M       embed.fnc
M       embed.h
M       perl.h
M       proto.h
M       regcomp.c
M       regcomp.h

commit 050cc0bb8e2aa1e1e733371c075c07e42603cf62
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 24 19:56:24 2013 -0600

    regcomp.c: Move a #define, add a similar one
    
    Future commits will use this #define (and the new one) earlier in the
    file than currently defined.

M       regcomp.c

commit 30bd78f89852ec7a49cfdd205207470876404fd1
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 23 10:01:29 2013 -0600

    Add inversion list for U+80 - U+FF
    
    This is the upper half of the Latin1 range.  This simplifies some code
    very slightly, but will be of use in future commits.

M       charclass_invlists.h
M       embedvar.h
M       intrpvar.h
M       regcomp.c
M       regen/mk_invlists.pl
M       sv.c

commit d88eddc1cc2dfd90b8aeb1e336eeae33b343c3b1
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 21 21:13:38 2013 -0600

    regcomp.c: Extract code into separate function
    
    This is in preparation for it to be called from more than one place, in
    a future commit.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit e1c4754b6a4406eb7c1e1c61adb97655b509c889
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 21 10:10:56 2013 -0600

    regcomp.c: Remove redundant matching possibilities
    
    The flag ANYOF_UNICODE_ALL is for performance.  It is set when the
    inversion list for the ANYOF node includes every code point above
    Latin1, and avoids runtime searching through the list.  We don't need
    both, as the flag being set short-circuits even looking at the other
    list.  By removing the code points from the list, we perhaps will get
    rid of the list entirely, thus saving some operations, or will shorten
    it so that later binary searches run faster.
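
    A simplified sketch of why the flag makes the above-Latin1 list entries
    redundant.  It assumes an inversion list stored as a sorted array in
    which even-indexed elements begin a range that is in the set and
    odd-indexed elements begin a range that is not.  The struct, field and
    function names are hypothetical; the real ANYOF node layout differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node: an inversion list plus a flag. */
typedef struct {
    const unsigned long *list;         /* sorted range boundaries */
    size_t len;                        /* number of boundaries */
    bool matches_all_above_latin1;     /* stand-in for the flag */
} anyof_sketch;

/* Binary search: count the boundaries <= cp.  An odd count means cp
 * falls inside a range that belongs to the set. */
bool invlist_contains_sketch(const unsigned long *list, size_t len,
                             unsigned long cp)
{
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) != 0;
}

bool anyof_matches_sketch(const anyof_sketch *node, unsigned long cp)
{
    /* The flag short-circuits the search entirely, so the list need
     * not describe any code point above 0xFF when the flag is set. */
    if (cp > 0xFF && node->matches_all_above_latin1)
        return true;
    return invlist_contains_sketch(node->list, node->len, cp);
}

/* Demo node: the list holds only A-Z; everything above Latin1 is
 * covered by the flag alone. */
bool anyof_demo(unsigned long cp)
{
    static const unsigned long upper_az[] = { 0x41, 0x5B };
    const anyof_sketch node = { upper_az, 2, true };
    return anyof_matches_sketch(&node, cp);
}
```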

M       regcomp.c

commit 8f7b6077286b6fd0d3453c078ac65f759769ccbe
Author: Karl Williamson <[email protected]>
Date:   Sun Jul 21 08:21:34 2013 -0600

    regcomp.c: Centralize assignment
    
    It's better to do something in one common place than two.  This properly
    initializes the regex opcode for the synthetic start class when it is
    created, rather than at the end where the code has to be repeated to get
    all instances.

M       regcomp.c

commit 7aaf241a8df58474fa6de4baddf3ea54c32391e8
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 12 19:42:51 2013 -0600

    perlreguts: Bring up-to-date
    
    Various changes have been made to regcomp.c that didn't make it into
    perlreguts until now.

M       pod/perlreguts.pod

commit 9af2b4b1a800380bc554580dc45cb062688a6728
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 12 18:03:19 2013 -0600

    perlreguts.pod: Nits

M       pod/perlreguts.pod

commit c7f276d1e28cfc4b4bcbb047f1e4d15a3eb2ef5a
Author: Karl Williamson <[email protected]>
Date:   Sat Sep 14 13:17:21 2013 -0600

    regcomp.c: Convert another I32 to SSize_t
    
    This code is normally #ifdef'd out, and so was missed in the earlier
    conversions, commit ed56dbcb51c55e631d5f4931f88efe008e5349c4.

M       regcomp.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
