In perl.git, the branch smoke-me/khw-clump has been created

<http://perl5.git.perl.org/perl.git/commitdiff/a95df5d20ee28f5958282291355902c727b08064?hp=0000000000000000000000000000000000000000>

        at  a95df5d20ee28f5958282291355902c727b08064 (commit)

- Log -----------------------------------------------------------------
commit a95df5d20ee28f5958282291355902c727b08064
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 20:56:09 2012 -0600

    utf8.h: Use machine generated IS_UTF8_CHAR()
    
    This takes the output of regen/regcharclass.pl for all the 1-4 byte
    UTF8-representations of Unicode code points, and replaces the current
    hand-rolled definition there.  It does this only for ASCII platforms,
    leaving EBCDIC to be machine generated when run on such a platform.
    
    I would rather have both versions regenerated each time they are
    needed, to avoid an EBCDIC dependency, but it takes more than 10 minutes
    on my computer to process the 2 billion code points that have to be
    checked on ASCII platforms, and currently t/porting/regen.t runs
    this program every time; that slowdown would be unacceptable.  If
    this is ever run under EBCDIC, the macro should be machine computed
    (very slowly).  So, even though there is an EBCDIC dependency, it has
    essentially been solved.

M       regen/regcharclass.pl
M       utf8.h
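
A minimal hand-written sketch of the kind of decision a test like IS_UTF8_CHAR() has to make, for illustration only. It assumes standard UTF-8 well-formedness on an ASCII platform; the exact set of sequences the generated Perl macro accepts may differ (for example in how surrogates are treated). The names utf8_char_len(), is_cont() and self_check() are hypothetical and are not part of the Perl source.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx (0x80..0xBF). */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Return the length (1-4) of the well-formed UTF-8 character at s,
 * or 0 if the first 'avail' bytes do not start with one. */
static size_t utf8_char_len(const unsigned char *s, size_t avail)
{
    unsigned char b0;

    if (avail == 0) return 0;
    b0 = s[0];

    if (b0 <= 0x7F) return 1;                       /* ASCII */

    if (b0 >= 0xC2 && b0 <= 0xDF)                   /* 2 bytes */
        return (avail >= 2 && is_cont(s[1])) ? 2 : 0;

    if (b0 >= 0xE0 && b0 <= 0xEF) {                 /* 3 bytes */
        if (avail < 3 || !is_cont(s[1]) || !is_cont(s[2])) return 0;
        if (b0 == 0xE0 && s[1] < 0xA0) return 0;    /* overlong */
        if (b0 == 0xED && s[1] > 0x9F) return 0;    /* surrogates */
        return 3;
    }

    if (b0 >= 0xF0 && b0 <= 0xF4) {                 /* 4 bytes */
        if (avail < 4 || !is_cont(s[1]) || !is_cont(s[2]) || !is_cont(s[3]))
            return 0;
        if (b0 == 0xF0 && s[1] < 0x90) return 0;    /* overlong */
        if (b0 == 0xF4 && s[1] > 0x8F) return 0;    /* above U+10FFFF */
        return 4;
    }

    return 0;   /* 0x80..0xC1 and 0xF5..0xFF cannot start a character */
}

/* Usage example: a few well-formed and malformed inputs. */
static void self_check(void)
{
    static const unsigned char ascii[]     = { 0x41 };
    static const unsigned char e_acute[]   = { 0xC3, 0xA9 };
    static const unsigned char euro[]      = { 0xE2, 0x82, 0xAC };
    static const unsigned char emoji[]     = { 0xF0, 0x9F, 0x98, 0x80 };
    static const unsigned char overlong[]  = { 0xC0, 0x80 };
    static const unsigned char surrogate[] = { 0xED, 0xA0, 0x80 };

    assert(utf8_char_len(ascii, 1) == 1);
    assert(utf8_char_len(e_acute, 2) == 2);
    assert(utf8_char_len(euro, 3) == 3);
    assert(utf8_char_len(emoji, 4) == 4);
    assert(utf8_char_len(overlong, 2) == 0);
    assert(utf8_char_len(surrogate, 3) == 0);
    assert(utf8_char_len(euro, 2) == 0);            /* truncated */
}
```

Writing and reviewing such range tables by hand is error-prone, which is the motivation for generating the macro instead.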

commit 0fe2099a56866e2d01f852e23c94b5bebe22b536
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 20:48:15 2012 -0600

    regen/regcharclass.pl: Add ability to restrict platforms
    
    This adds the capability to skip definitions if they are for other than
    a desired platform.

M       regen/regcharclass.pl

commit fd07688fb0f9a5568799fb908f6578264c1c9118
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 20:32:29 2012 -0600

    utf8.h: Remove some EBCDIC dependencies
    
    regen/regcharclass.pl has been enhanced in previous commits so that it
    generates as good code as these hand-defined macro definitions for
    various UTF-8 constructs.  And, it should be able to generate EBCDIC
    ones as well.  By using its definitions, we can remove the EBCDIC
    dependencies for them.  It is quite possible that the EBCDIC versions
    were wrong, since they have never been tested.  Even if
    regcharclass.pl has bugs under EBCDIC, it is easier to find and fix
    those in one place, than all the sundry definitions.

M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h

commit d5d6537b407a7bfd9e47203410bd8cf5a6e5c71c
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 15:18:09 2012 -0600

    regen/regcharclass.pl: Add optimization
    
    On UTF-8 input known to be valid, continuation bytes must be in the
    range 0x80 .. 0xBF.  Therefore, any tests for being within those bounds
    will always be true, and may be omitted.

M       regcharclass.h
M       regen/regcharclass.pl
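
A small sketch of why the bounds test is redundant, assuming standard UTF-8 on an ASCII platform, where every continuation byte lies in 0x80 through 0xBF. The class used here (U+0080..U+00BF, encoded as C2 80 .. C2 BF) and all function names are hypothetical; they are not taken from regcharclass.h.

```c
#include <assert.h>

/* "Safe" form: the input may be malformed, so the second byte
 * must be range-checked as well. */
static int in_class_safe(const unsigned char *s)
{
    return s[0] == 0xC2 && s[1] >= 0x80 && s[1] <= 0xBF;
}

/* Form for input already known to be valid UTF-8: the second byte
 * is necessarily a continuation byte, so only the lead byte is tested. */
static int in_class_valid_input(const unsigned char *s)
{
    return s[0] == 0xC2;
}

/* Walk every well-formed two-byte sequence and confirm that both
 * forms give the same answer. */
static void check_all_two_byte_sequences(void)
{
    unsigned char s[2];
    int lead, cont;

    for (lead = 0xC2; lead <= 0xDF; lead++) {
        for (cont = 0x80; cont <= 0xBF; cont++) {
            s[0] = (unsigned char)lead;
            s[1] = (unsigned char)cont;
            assert(in_class_safe(s) == in_class_valid_input(s));
        }
    }
}
```

The two forms differ only on malformed input, which is why the shorter one is restricted to input known to be valid.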

commit 277bb7bc224e2745df60a1c993938f181e16af1c
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 15:14:59 2012 -0600

    regen/regcharclass.pl: White-space only
    
    Indent a newly-formed block

M       regen/regcharclass.pl

commit 8567691dd1c94c4acb7e0d6f426d0d0c6ab8ad3f
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 15:00:52 2012 -0600

    regen/regcharclass.pl: Extend previously added optimization
    
    A previous commit added an optimization to save a branch in the
    generated code at the expense of an extra mask when the input class has
    certain characteristics.  This extends that to the case where
    sub-portions of the class have similar characteristics.  The first
    optimization for the entire class is moved to right before the new loop
    that checks each range in it.

M       regcharclass.h
M       regen/regcharclass.pl

commit 242b6b634241288353c94c868e2a1f813f55684c
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 09:30:34 2012 -0600

    regen/regcharclass.pl: Rmv always true components from gen'd macro
    
    This adds a test and returns 1 from a subroutine if the condition will
    always match; and in the caller it adds a check for that, and omits the
    condition from the generated macro.

M       regen/regcharclass.pl

commit 38877c8ac730d93fb5554da543c322d28d6822ae
Author: Karl Williamson <[email protected]>
Date:   Tue Sep 4 14:54:26 2012 -0600

    regen/regcharclass.pl: Add an optimization
    
    Branches can be eliminated from the macros that are generated here
    by using a mask in cases where applicable.  This adds checking to see if
    this optimization is possible, and applies it if so.

M       regcharclass.h
M       regen/regcharclass.pl
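
The mask idea can be sketched as follows. This is an illustration under assumptions, not the code regcharclass.pl emits: it handles only the simplest case, two byte values that differ in exactly one bit, and every name in it is hypothetical.

```c
#include <assert.h>

/* Applicability check: a single masked comparison can replace
 * "c == a || c == b" when a and b differ in exactly one bit. */
static int differ_in_one_bit(unsigned char a, unsigned char b)
{
    unsigned char diff = a ^ b;
    return diff != 0 && (diff & (diff - 1)) == 0;
}

/* Branching form: two comparisons joined by ||. */
static int match_branching(unsigned char c, unsigned char a, unsigned char b)
{
    return c == a || c == b;
}

/* Masked form: clear the one differing bit, then compare once. */
static int match_masked(unsigned char c, unsigned char a, unsigned char b)
{
    unsigned char keep = (unsigned char)~(a ^ b);
    return (c & keep) == (a & keep);
}

/* Confirm that both forms agree for every possible byte value. */
static void check_equivalence(unsigned char a, unsigned char b)
{
    int c;

    assert(differ_in_one_bit(a, b));
    for (c = 0; c < 256; c++)
        assert(match_branching((unsigned char)c, a, b)
               == match_masked((unsigned char)c, a, b));
}
```

For example, check_equivalence(0xC2, 0xC3) passes because 0xC2 and 0xC3 differ only in the low bit, so (c & 0xFE) == 0xC2 matches exactly those two bytes.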

commit cda7fef31fe67f3a27e8be9e3b4aef548ba40c6e
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 5 10:26:22 2012 -0600

    regen/regcharclass.pl: Rename a variable
    
    I find it confusing that the array element name is the same as the
    full array.

M       regen/regcharclass.pl

commit e87b0ad4364f86bf05ae633406a640d50f03e448
Author: Karl Williamson <[email protected]>
Date:   Tue Sep 4 14:12:13 2012 -0600

    regen/regcharclass.pl: Pass options deeper into call stack
    
    This is to prepare for future commits which will act differently at the
    deep level depending on some of the options.

M       regen/regcharclass.pl

commit 4ce32977781f8bff1db2b7d0b2dae7f77b781a36
Author: Karl Williamson <[email protected]>
Date:   Mon Sep 3 16:59:09 2012 -0600

    XXX Benchmark: pp.c: Use macro not swash for utf8 quotemeta
    
    The rules for deciding whether an above-Latin1 code point should be
    quoted are now saved in a macro generated from a trie by
    regen/regcharclass.pl, and pp.c now uses that macro to test these
    cases.  This allows removal of a wrapper subroutine, and there is no
    longer any need to dynamically load a swash at run-time.
    
    This macro is about as big as I'm comfortable compiling in, but the
    savings of a hash and the removed subroutine and interpreter variables
    make it a wash I suspect, without checking.

M       embed.fnc
M       embed.h
M       embedvar.h
M       intrpvar.h
M       pp.c
M       proto.h
M       regcharclass.h
M       regen/regcharclass.pl
M       sv.c
M       utf8.c

commit 589296ec5a664e51b5d1fd0e2dbc039bb5b300fe
Author: Karl Williamson <[email protected]>
Date:   Mon Sep 3 16:54:56 2012 -0600

    regen/regcharclass.pl: Add new output macro type
    
    The new type 'high' is used on only above-Latin1 code points.  It is
    designed for code that already knows the tested code point is not
    Latin1, and avoids unnecessary tests.

M       regen/regcharclass.pl

commit 888fa0d7c5a147d0dbd1e47bacba789b808bb298
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 18:29:42 2012 -0600

    regen/regcharclass.pl: Add documentation

M       regen/regcharclass.pl

commit 3ddd3ed4b7b836ca88beec802575874c862fe355
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 18:28:19 2012 -0600

    regen/regcharclass.pl: Error check input better
    
    This makes sure that the modifiers specified in the input are known to
    the program.

M       regen/regcharclass.pl

commit ae0eedeb35ba617b8e99646315aa7fef378df479
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 16:48:14 2012 -0600

    regen/regcharclass.pl: Allow comments in input
    
    Lines whose first non-blank character is a '#' are now considered to be
    comments, and ignored.  This allows the moving of some lines that have
    been commented out back to after the __DATA__ where they really belong.

M       regen/regcharclass.pl

commit 36c10281e3a3c33dc7ac13129938b1298ba468c3
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 15:58:41 2012 -0600

    regen/unicode_constants.pl: Add name parameter
    
    A future commit will want to use the first surrogate code point's UTF-8
    value.  Add this to the generated macros, and give it a name, since
    there is no official one.  The program has to be modified to cope with
    this.

M       regen/unicode_constants.pl
M       unicode_constants.h

commit 60644b22a741cb42269b33a068369401b5bef1b2
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 15:29:32 2012 -0600

    Move 2 functions from utf8.c to regexec.c
    
    One of these functions is currently commented out.  The other is called
    only in regexec.c in one place, and was recently revised to no longer
    require the static function in utf8.c that it formerly called.  They can
    be made static inline.

M       embed.fnc
M       embed.h
M       proto.h
M       regexec.c
M       utf8.c

commit e967b2c98113fb54d4529a6dc355026c0fe5fcc1
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 14:46:38 2012 -0600

    XXX benchmarks regexec.c: Use new macros instead of swashes
    
    A previous commit has caused macros to be generated that will match
    Unicode code points of interest to the \X algorithm.  XXX

M       embed.fnc
M       embed.h
M       embedvar.h
M       intrpvar.h
M       proto.h
M       regen/unicode_constants.pl
M       regexec.c
M       sv.c
M       unicode_constants.h
M       utf8.c

commit c84b2290e63ac2cfaac978e1a18e9a3faf0b8c08
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 14:31:59 2012 -0600

    regen/regcharclass.pl: Generate macros for \X processing
    
    \X is implemented in regexec.c as a complicated series of property
    look-ups.  It turns out that many of those are for just a few code
    points, and so can be more efficiently implemented with a macro than a
    swash.  This generates those.

M       regcharclass.h
M       regen/regcharclass.pl

commit b503148c4fb55216e628afc42b14deef05b721ab
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 14:26:20 2012 -0600

    regen/regcharclass.pl: Change to work on an empty class
    
    Future commits will add Unicode properties for this to generate macros,
    and some of them may be empty in some Unicode releases.  This just
    causes such a generated macro to evaluate to 0.

M       regen/regcharclass.pl

commit 25377a052ced77332eaa6ffd5e8c9f910d6628c0
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 31 17:04:30 2012 -0600

    regen/regcharclass.pl: Fix bug for character '0'
    
    The character '0' could be omitted from some generated macros because
    the code tested the value of a hash entry (getting 0, which is false)
    instead of testing whether the entry exists.

M       regen/regcharclass.pl

commit 878d1ee42f154b318b11d689991571f3266191ce
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 31 17:00:27 2012 -0600

    regen/regcharclass.pl: Work on EBCDIC platforms
    
    This will now automatically generate macros for non-ASCII platforms,
    by mapping the Unicode input to native output.
    
    Doing this will allow several cases of EBCDIC dependencies in other code
    to be removed, and fixes the bug that this previously had with non-ASCII
    platforms.

M       regen/regcharclass.pl

commit 8d930aead7dfa95ab62704011bf704bfae3095b2
Author: Karl Williamson <[email protected]>
Date:   Mon Sep 3 16:22:32 2012 -0600

    regen/regcharclass.pl: Remove Encode:: dependency
    
    Newer options to unpack alleviate the need for Encode, and run faster.

M       regen/regcharclass.pl

commit 8a6d228b0b8fca1947c70c60691f5f95ae89dd4a
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 31 16:39:31 2012 -0600

    regen/regcharclass.pl: Handle ranges, \p{}
    
    Instead of having to list all code points in a class, you can now use
    \p{} or a range.
    
    This changes some classes to use the \p{}, so that any changes Unicode
    makes to the definitions don't have to manually be done here as well.

M       regcharclass.h
M       regen/regcharclass.pl

commit ce47d3dd885f6ed91eca62b8d62cbbfff95268c6
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 13:09:48 2012 -0600

    utf8.h: Save a branch in a macro
    
    By adding a mask, we can save a branch.  The two expressions match the
    exact same code points.

M       utf8.h

commit 017bf4ad71fcdd2b3cb06f9010bba23e83e07ddc
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 13:08:21 2012 -0600

    utf8.h: White-space only
    
    This reflows some lines to fit into 80 columns

M       utf8.h

commit 58f8427dc8cf4200faf8c1d3dfa99fd6b950f52d
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 13:01:50 2012 -0600

    utf8.h: Correct improper EBCDIC conversion
    
    These macros were incorrect for EBCDIC.  The relationships are based on
    I8, the intermediate-utf8 defined for UTF-EBCDIC, not the final encoding.
    I was the culprit who did this originally; I was confused by the names of
    the conversion macros.  I'm adding names that are clearer to me, which
    have already been defined in utfebcdic.h, but weren't defined for
    non-EBCDIC platforms.

M       utf8.h

commit 0c56998dfe202cd3ab120cab7174cd7a07669511
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 10:37:26 2012 -0600

    ext/B/B.xs: Remove EBCDIC dependency
    
    These are unnecessary EBCDIC dependencies: It uses isPRINT() on EBCDIC,
    and an expression on ASCII, but isPRINT() is defined to be precisely
    that expression on ASCII platforms.

M       ext/B/B.xs

commit a6d19a8d4ca6e4ef9e7d28564d6f44d4d50160a7
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 10:30:32 2012 -0600

    Remove some EBCDIC dependencies
    
    A new regen'd header file has been created that contains the native
    values for certain characters.  By using those macros, we can eliminate
    EBCDIC dependencies.

M       perl.h
M       utf8.h
M       utfebcdic.h
M       x2p/a2py.c

commit 4c4fc299e55b9bbf9a4bba6f4b7ed40926a04175
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 09:58:43 2012 -0600

    Rename regen'd hdr to reflect expanded capabilities
    
    The recently added utf8_strings.h has been expanded to include more than
    just strings.  I'm renaming it to avoid confusion.

M       MANIFEST
M       regcomp.c
A       regen/unicode_constants.pl
D       regen/utf8_strings.pl
M       regexec.c
A       unicode_constants.h
D       utf8_strings.h

commit 6f7dac32304a0dbb7a416c13b03638941acb9625
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 09:44:22 2012 -0600

    regen/utf8_strings.pl: Add ability to get native charset
    
    This adds a new capability to this program: to input a Unicode code
    point and create a macro that expands to the platform's native value
    for it.
    
    This will allow removal of a bunch of EBCDIC dependencies in the core.

M       regen/utf8_strings.pl
M       utf8_strings.h

commit 3efcf089ad9b591b6e4e48609dad192004177a38
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 09:28:55 2012 -0600

    regen/utf8_strings.pl: Allow explicit default on input
    
    An input line without a command is considered to be a request for the
    UTF-8 encoded string of the code point.  This allows an explicit
    'string' to be used.

M       regen/utf8_strings.pl

commit e04f6576857c1bf441098c45e476adddf52dc866
Author: Karl Williamson <[email protected]>
Date:   Sun Sep 2 09:22:16 2012 -0600

    regen/utf8_strings.pl: Copy empty input lines to output
    
    This allows the generated .h to look better.

M       regen/utf8_strings.pl
M       utf8_strings.h

commit c70734392ab78654ad68fd357069ee7ef5fd4dff
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 31 17:41:14 2012 -0600

    regen/regcharclass.pl, utf8_strings.pl: Add guard to .h
    
    Future commits will have other headers #include the headers generated by
    these programs.  It is best to guard against the preprocessor
    processing these twice.

M       regcharclass.h
M       regen/regcharclass.pl
M       regen/utf8_strings.pl
M       utf8_strings.h
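
A minimal sketch of an include guard, with the header text repeated inline to simulate a double #include. The guard name H_EXAMPLE_GENERATED and the contents are hypothetical; the guard names actually written by the regen programs are not shown in this log.

```c
#include <assert.h>

/* First inclusion: the guard macro is not yet defined, so the body is seen. */
#ifndef H_EXAMPLE_GENERATED
#define H_EXAMPLE_GENERATED 1

#define EXAMPLE_ANSWER 42
static int example_answer(void) { return EXAMPLE_ANSWER; }

#endif /* H_EXAMPLE_GENERATED */

/* Simulated second inclusion of the same text: the guard is now defined,
 * so the preprocessor skips the body.  Without the guard, the second
 * definition of example_answer() would be a compile error. */
#ifndef H_EXAMPLE_GENERATED
#define H_EXAMPLE_GENERATED 1

#define EXAMPLE_ANSWER 42
static int example_answer(void) { return EXAMPLE_ANSWER; }

#endif /* H_EXAMPLE_GENERATED */

/* Usage example: the single surviving definition is still usable. */
static void check_guard(void)
{
    assert(example_answer() == EXAMPLE_ANSWER);
}
```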

commit 1a4dac99bc47c484cf139b1b77e1ab3904a81c4f
Author: Karl Williamson <[email protected]>
Date:   Fri Aug 31 17:39:04 2012 -0600

    Unicode/UCD.pm: Clarify pod

M       lib/Unicode/UCD.pm

commit 281acc57cc9deed1351e4044c9aeffa680abf79a
Author: Karl Williamson <[email protected]>
Date:   Tue Aug 28 17:41:41 2012 -0600

    Fix \X handling for Unicode 5.1 - 6.0
    
    Commit 27d4fc33343f0dd4287f0e7b9e6b4ff67c5d8399 neglected to include a
    change required for a few Unicode releases where the \X prepend property
    is not empty.  This does that, and suppresses a mktables warning for
    Unicode releases prior to 6.2.

M       lib/unicore/mktables
M       regexec.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
