In perl.git, the branch smoke-me/khw-mktables has been created

<http://perl5.git.perl.org/perl.git/commitdiff/4540266f2fdca9ef2ed2399057f12ef596094073?hp=0000000000000000000000000000000000000000>

        at  4540266f2fdca9ef2ed2399057f12ef596094073 (commit)

- Log -----------------------------------------------------------------
commit 4540266f2fdca9ef2ed2399057f12ef596094073
Author: Karl Williamson <[email protected]>
Date:   Sat Dec 21 19:08:46 2013 -0700

    XXX Draft patch to get Unicode::Normalize to depend on unicore files

M       cpan/Unicode-Normalize/Makefile.PL

commit defb57d48e15d7620c9b0e2ee1f825f195adbfef
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 26 14:01:49 2013 -0700

    White-space only
    
    This indents various newly-formed blocks (by the previous commit) in
    these three files, and reflows lines to fit into 79 columns

M       lib/Unicode/UCD.pm
M       lib/Unicode/UCD.t
M       utf8.c

commit e583c11bcf73077768e5dc7dfb1f9a607434bde8
Author: Karl Williamson <[email protected]>
Date:   Tue Dec 24 20:11:23 2013 -0700

    Change format of mktables output binary property tables
    
    mktables now outputs the tables for binary properties as inversion
    lists, with a size as the first element.  This means simpler handling of
    these tables in the core, including removal of an entire pass over them
    (just to get the size).  These tables are marked as for internal use by
    the Perl core only, so their format is changeable at will.
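
Editorial sketch (not part of the commit): a minimal illustration of why
a size-prefixed inversion list is easy to consume. The layout assumed
here (element 0 is the entry count, followed by ascending code points at
which membership toggles, starting with "in the set") and the name
invlist_contains are hypothetical and may differ from the format
mktables actually writes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: table[0] is the number of entries that follow;
 * table[1..n] are ascending code points where membership toggles,
 * the first one starting a range that is IN the set. */
static int invlist_contains(const uint32_t *table, uint32_t cp)
{
    const uint32_t *entries = table + 1;
    size_t lo = 0;          /* entries[0..lo) are known to be <= cp */
    size_t hi = table[0];   /* entries[hi..n) are known to be >  cp */

    /* Binary search for the number of entries <= cp.  No separate pass
     * is needed to learn the length: it is read from element 0. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd count of toggles at or below cp means cp is in the set. */
    return (int)(lo & 1);
}
```

With the table { 2, 0x30, 0x3A } (the ASCII digits under the assumed
layout), the function reports 0x35 as a member and 0x3A as a non-member.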

M       embed.fnc
M       embed.h
M       lib/Unicode/UCD.pm
M       lib/Unicode/UCD.t
M       lib/unicore/mktables
M       proto.h
M       regcomp.c
M       utf8.c

commit 2a97132dfdf555081ac20d56b9ac33aacfd8e40e
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 23 20:35:54 2013 -0700

    Change \p{} matching for above-Unicode code points
    
    http://markmail.org/message/eod7ukhbbh5tnll4 is the beginning of the
    thread that led to this commit.
    
    This commit revises the handling of \p{} and \P{} to treat above-Unicode
    code points as typical Unicode unassigned ones, and to output a warning
    during matching only when the answer is arguable under strict Unicode
    rules (that is, "matched" for \p{}, and "didn't match" for \P{}).  The
    exception is when the warning category has been made fatal, in which
    case it tries hard to always output the warning.  The definition of
    \p{All} is changed to be qr/./s, and no warning is issued at all for
    matching it against above-Unicode code points.
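
Editorial sketch (not part of the commit): one way to read the warning
rule described above, written as a small decision function. The names
should_warn_super and EX_MAX_UNICODE are hypothetical, the fatal-warnings
exception and the \p{All} special case are deliberately left out, and
the real logic in regexec.c may be organized quite differently.

```c
#include <stdint.h>

#define EX_MAX_UNICODE 0x10FFFFu   /* highest legal Unicode code point */

/* cp:          the code point being matched
 * in_property: nonzero if the property's table contains cp (above-Unicode
 *              code points being treated like unassigned ones)
 * negated:     0 for \p{...}, nonzero for \P{...}
 * Returns nonzero when a warning would be raised under the rule above. */
static int should_warn_super(uint32_t cp, int in_property, int negated)
{
    int matched;

    if (cp <= EX_MAX_UNICODE)
        return 0;                    /* ordinary code point: never warn */

    /* \P{} matches exactly when \p{} does not. */
    matched = negated ? !in_property : (in_property != 0);

    /* Arguable results: "matched" for \p{}, "didn't match" for \P{}. */
    return negated ? !matched : matched;
}
```

Under this reading, both arguable cases reduce to "the property's table
claims an above-Unicode code point", which is why a non-matching \p{} or
a matching \P{} stays silent.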

M       lib/Unicode/UCD.pm
M       lib/Unicode/UCD.t
M       lib/diagnostics.t
M       lib/unicore/mktables
M       pod/perldelta.pod
M       pod/perldiag.pod
M       pod/perlrecharclass.pod
M       pod/perlunicode.pod
M       regcomp.c
M       regexec.c
M       t/lib/warnings/utf8
M       t/porting/diag.t
M       t/re/pat.t

commit 94980eac548cb6132ec88ffe41e2d0ff3509c0f8
Author: Karl Williamson <[email protected]>
Date:   Mon Dec 23 21:32:17 2013 -0700

    XXX: Need to add tests, get ssc to happen
    
    Test with this patch on top

M       regexec.c

commit 4b6d63313ead279cdc58f6a0714efc3569508308
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 18 22:57:55 2013 -0700

    regcomp.c: comment typo and rewording

M       regcomp.c

commit b4b4be716bded807bee9c8f123894e5c069fb8f6
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 18 22:53:46 2013 -0700

    regcomp.c: Refactor 'if' statement
    
    This refactoring makes it clear that within a (?[]), we don't try to
    optimize the [] part.  This is for clarity for the future only, as
    currently the only changed behavior is if this is being compiled with /l
    rules, and (?[]) generates a syntax error under /l.

M       regcomp.c

commit 675306c2e2f07dc41b41bcc49ce0ae8d513e90a0
Author: Karl Williamson <[email protected]>
Date:   Wed Dec 18 22:41:35 2013 -0700

    Fatalized non-unicode warnings skip regex optimization
    
    This makes sure that fatalized non-unicode warnings actually get output.
    For example \p{Line_Break=CR} would normally get optimized into an EXACT
    node.  But if the user has made non-unicode warnings fatal, indicating
    they want to be sure not to try to even match such code points, the
    optimization is skipped so that the checks are made.
    
    Documentation for this change will be in a future commit.

M       regcomp.c
M       t/lib/warnings/utf8

commit 4d55603c8674e125f35fc817fcbaf26050c08d04
Author: Karl Williamson <[email protected]>
Date:   Wed Nov 27 12:16:25 2013 -0700

    mktables: Split off some functionality
    
    This adds a new function that formats a count of code points.  Currently
    it calls the existing function that formats a generic number.  A future
    commit will change this so that the outputs of the two functions differ.
    The reason for this commit is to make that later commit's difference
    listing smaller and easier to understand.

M       lib/unicore/mktables

commit af3e27d835110d61fd96e0618b3a60a0b04d8a7c
Author: Karl Williamson <[email protected]>
Date:   Wed Nov 27 11:39:48 2013 -0700

    mktables: Add \p{Unicode}
    
    This is a clearer synonym for \p{Any}

M       lib/unicore/mktables
M       pod/perldelta.pod

commit 1cde2d9688aaa3faf4b47e90a05e49c71379e9ba
Author: Karl Williamson <[email protected]>
Date:   Wed Nov 27 10:59:08 2013 -0700

    mktables: Separate out defns of \p{Any} and \p{All}
    
    This is in preparation for making them mean different things in a future
    commit.

M       lib/unicore/mktables

commit d44bd737574eac86b8a22bc8c2ed32d6f36ff6b4
Author: Karl Williamson <[email protected]>
Date:   Mon Nov 25 20:18:31 2013 -0700

    regcomp.h: Reorder some #defines
    
    There are no logic changes.  The previous commit changed the numbers for
    some of the bits.  This commit re-arranges things so that the #defines
    are again in numerical order.

M       regcomp.h

commit 15f25a3c003448b39aa23d4ab34d082e5323dba2
Author: Karl Williamson <[email protected]>
Date:   Mon Nov 25 20:12:33 2013 -0700

    Re-order some flag bits to avoid potential branches
    
    The ANYOF_INVERT flag is used in every single pattern match of
    [bracketed character classes].  With backtracking, this can be a huge
    number.  All the other flags' uses pale by comparison.  I noticed that
    by making it the lowest bit, we don't have to use cBOOL, as the only
    possibilities are 0 and 1.  cBOOL hopefully will be optimized away, but
    not always.  This commit reorders some of the flag bits to make this one
    the lowest, and adds a compile check to make sure it isn't inadvertently
    changed.
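
Editorial sketch (not part of the commit): the idea of keeping the
inversion flag in the lowest bit, shown with stand-in names. The macros
EX_ANYOF_INVERT and EX_ANYOF_OTHER, the typedef used as a compile-time
check, and class_matches are all hypothetical; the actual definitions
live in regcomp.h and regexec.c and are not reproduced here.

```c
/* Hypothetical stand-ins for the real flag definitions. */
#define EX_ANYOF_INVERT 0x01u   /* must remain the lowest bit */
#define EX_ANYOF_OTHER  0x02u   /* any other flag */

/* Compile-time check: the array size is negative, and compilation
 * fails, if the invert flag is ever moved off bit 0. */
typedef char ex_invert_must_be_lowest_bit[(EX_ANYOF_INVERT == 1u) ? 1 : -1];

/* in_class must already be 0 or 1.  Because the flag is bit 0,
 * (flags & EX_ANYOF_INVERT) is itself 0 or 1, so it can be XORed in
 * directly with no boolean normalization step. */
static int class_matches(unsigned flags, int in_class)
{
    return in_class ^ (int)(flags & EX_ANYOF_INVERT);
}
```

Had the flag been, say, 0x04, the masked value would be 0 or 4 and would
need normalizing to 0 or 1 before the XOR.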

M       regcomp.h
M       regexec.c

commit 42bd52051ebcca2c16aaa538ed784c09f4016248
Author: Karl Williamson <[email protected]>
Date:   Mon Nov 25 19:40:12 2013 -0700

    Convert regnode to a flag for [...]
    
    Prior to this commit, there were 3 types of ANYOF nodes; now there are
    two: regular, and one for the synthetic start class (ssc).  This commit
    converts the third type, which dealt with warning about matching \p{}
    against non-Unicode code points, into using the spare flag bit for ANYOF
    nodes.
    
    This allows this bit to apply to ssc ANYOF nodes, whereas previously it
    couldn't.  There is a bug in which the warning isn't raised if the match
    is rejected by the optimizer, because of this inability.  This bug will
    be fixed in a later commit.
    
    Another option would have been to create a new node-type which was an
    ANYOF_SSC_WARN_SUPER node.  But this adds extra complications to things;
    and we have a spare bit that we might as well use.  The comments give
    better possibilities for freeing up 2 bits should they be needed.

M       pod/perldebguts.pod
M       regcomp.c
M       regcomp.h
M       regcomp.sym
M       regexec.c
M       regnodes.h

commit 41d873c99374224fe07276091e4d4f8c74bcd9a4
Author: Karl Williamson <[email protected]>
Date:   Mon Nov 25 19:31:57 2013 -0700

    mktables: Better comment some variables

M       lib/unicore/mktables

commit acf003c0b1582077b8564a9656de800e74b8cefc
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 14 21:12:40 2013 -0700

    mktables: Calculate debugging information placement
    
    When outputting debugging information under the -annotate option, it's
    nice to line up the columns.  This commit does a pass through the tables
    where the final real data column is variable width so that it can figure
    out where to put the debugging info so that almost all of the columns
    can be lined up, and not have to be right-shifted because of overlong
    real data.
    
    Certain tables prior to this commit had been manually eyeballed and
    column information hard-coded in.  This is no longer necessary.  This
    means that one parameter to the write() function is no longer used, and
    is removed here.

M       lib/unicore/mktables

commit c94362306648aa3cf6ee74ec8c9c9e0fd34dc431
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 14 19:30:42 2013 -0700

    mktables: White-space only
    
    Outdent a just-removed block, and better align several other statements

M       lib/unicore/mktables

commit 316aa11b8326689a6fad5520b1b56e16560cdad3
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 14 19:32:44 2013 -0700

    mktables: Convert to use new function
    
    The previous commit added a new function used in newly added code; this
    changes some existing code to use that function

M       lib/unicore/mktables

commit 7f87306e99daf0ce1877e69f42350525ca377c1e
Author: Karl Williamson <[email protected]>
Date:   Wed Nov 13 21:56:31 2013 -0700

    mktables: Don't change table format with debugging info
    
    The -annotate option to mktables causes it to output extra information
    (in the form of comments) to its generated tables to make them human
    readable and useful for debugging.  Prior to this commit, this caused
    the tables' formats to be changed somewhat by causing what are normally
    ranges to have a line output for each element of the range.  This bloats
    the tables, and causes UCD.t to fail, as it is looking for a
    particular syntax for the tables.
    
    This commit causes the debugging information to be placed separately
    but adjacent to the real data.  The ranges remain as they would be
    without -annotate.  This removes the bloat (as the debugging info is
    stripped out as the table is read in) and causes UCD.t to pass.
    
    It also allows for the format of the real data to change in a later
    commit, and the debugging info can remain relevant.
    
    A future commit will improve the indentation of the comment annotations.

M       lib/unicore/mktables

commit 632db99169a9c195074ea46ba770a5f3d04e17de
Author: Karl Williamson <[email protected]>
Date:   Tue Nov 12 12:09:19 2013 -0700

    mktables: Improve display of debugging information
    
    Under the -annotate option, mktables outputs the UTF-8 for the printable
    characters.  This commit adds a non-spacing blank before each such one
    that is supposed to combine with its preceding character (marks).  This
    causes the display of the character to look better.
    
    This necessitated making a local variable more global in scope.

M       lib/unicore/mktables

commit 1f6176b02219a579ef16592ee2146ae5b6649062
Author: Karl Williamson <[email protected]>
Date:   Fri Nov 8 09:34:54 2013 -0700

    lib/Unicode/UCD.t: White-space only
    
    Indent a newly formed block

M       lib/Unicode/UCD.t

commit 915e0c2c42fb44b0e1fbdfd75e6d5eaf251a2954
Author: Karl Williamson <[email protected]>
Date:   Fri Nov 8 09:26:51 2013 -0700

    Add tests for legacy Unicode data files
    
    There are 5 files in lib/unicore/To that may be in direct use by
    applications, and which are not used by Perl itself.  These have been
    changed in an earlier stable release to have comments in them saying
    that their use is deprecated, and that Unicode::UCD gives a stable API for
    access to the data they contain.  However, no warning is given if an
    application reads these files, so the deprecation cycle needs to be
    quite long.  Until we decide to get rid of these files sometime in the
    future, we should make sure they exist and are correct.  Since they
    aren't actually used by Perl, there were no such tests.  This commit
    adds some tests.  It puts them in lib/Unicode/UCD.t, as that required
    the least amount of work, as it already has nearly all the
    infrastructure required for testing these.

M       lib/Unicode/UCD.t

commit dc77158d21f09bdc8d761b87c2798750d2ddd717
Author: Karl Williamson <[email protected]>
Date:   Fri Nov 8 09:21:11 2013 -0700

    lib/Unicode/UCD.t: Anchor a couple of regexes
    
    A future commit will need these to be anchored to avoid false positives.

M       lib/Unicode/UCD.t

commit 6ec2bc73d9fdbec8d32d58bc3f72877024ae4b3e
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 7 12:38:31 2013 -0700

    lib/Unicode/UCD.t: Clarify diagnostic
    
    This diagnostic comes from either of 2 problems, so mention both of
    them.

M       lib/Unicode/UCD.t

commit b88616e569d5f472f561dbb688ca1c919962609d
Author: Karl Williamson <[email protected]>
Date:   Thu Nov 7 11:56:09 2013 -0700

    lib/Unicode/UCD.t: Rename a $variable
    
    This is in preparation for a future commit where the new name makes more
    sense.

M       lib/Unicode/UCD.t

commit 74ba4f39594ea22df7c331362fdc04d04c1cabdd
Author: Karl Williamson <[email protected]>
Date:   Wed Nov 6 10:56:07 2013 -0700

    Unicode/UCD.t: Add missing 'next' statement
    
    When a test fails, it should do a 'next' to stop processing the current
    property.

M       lib/Unicode/UCD.t

commit b847081a1f14255868d00cc1fb8bfa73af169d10
Author: Karl Williamson <[email protected]>
Date:   Tue Nov 5 22:52:10 2013 -0700

    mktables: White-space only
    
    Align a few lines to begin on the same column, which has been outdented
    so that nothing exceeds 79 columns

M       lib/unicore/mktables

commit 2b4e72f42f18b51633276e77826a6bd7138aeee3
Author: Karl Williamson <[email protected]>
Date:   Tue Nov 5 22:33:06 2013 -0700

    Unicode::UCD: Remove access to some legacy-only properties
    
    Five files are currently being kept around only because they existed
    before Unicode::UCD gave access to the properties they define, and some
    application programs may rely on their presence and format.  More
    compact files have supplanted the use of these files by the Perl core.
    
    Mistakenly, Unicode::UCD gave access to these files via the made-up
    property names that they are referred to by in mktables.  This was
    undocumented.  This commit removes this access.

M       lib/Unicode/UCD.t
M       lib/unicore/mktables

commit fd7ae644e0388613de341f20854f7da193cf0fa7
Author: Karl Williamson <[email protected]>
Date:   Mon Nov 4 09:57:29 2013 -0700

    mktables: Clarify overloaded variable name
    
    The term 'full' is overloaded here in this small section of code.  In
    some cases it refers to the full case mapping versus the simple case
    mapping; in other cases it refers to the full name for a property as
    opposed to the abbreviated name.  This commit expands each to indicate
    which is meant.

M       lib/unicore/mktables

commit a246ea715737f3afc5463c9237cb350c3917fb6a
Author: Karl Williamson <[email protected]>
Date:   Sat Nov 2 23:22:48 2013 -0600

    mktables, UCD.t: Fix nits in comments; add comment

M       lib/Unicode/UCD.t
M       lib/unicore/mktables

commit 1b3f5e6cdd531b0868d8f04164d2c643e1d4dede
Author: Karl Williamson <[email protected]>
Date:   Mon Oct 28 19:49:55 2013 -0600

    mktables: Don't output trailing tabs in tables
    
    This makes sure that the tabs aren't output unless there is a following
    non-null value, saving some disk space
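
Editorial sketch (not part of the commit): the trailing-tab rule in
miniature. mktables itself is a Perl program; this C function,
join_fields, is a hypothetical stand-in that joins fields with tabs and
emits a tab only when a non-empty field still follows.

```c
#include <stddef.h>
#include <string.h>

/* Joins fields with tabs into out (capacity outsize, NUL-terminated),
 * dropping trailing empty fields so no line ends in tabs.  Output is
 * truncated at a field boundary rather than overflowing.  Returns the
 * length written, not counting the terminating NUL. */
static size_t join_fields(char *out, size_t outsize,
                          const char *const *fields, size_t nfields)
{
    size_t i, len = 0;

    if (outsize == 0)
        return 0;
    out[0] = '\0';

    /* Ignore empty fields at the end; interior empty fields are kept so
     * that later columns stay in position. */
    while (nfields > 0 && fields[nfields - 1][0] == '\0')
        nfields--;

    for (i = 0; i < nfields; i++) {
        size_t flen = strlen(fields[i]);
        size_t need = flen + (i > 0 ? 1 : 0);

        if (len + need >= outsize)
            break;
        if (i > 0)
            out[len++] = '\t';
        memcpy(out + len, fields[i], flen);
        len += flen;
        out[len] = '\0';
    }
    return len;
}
```

For the fields "41", "5A" and "" this writes "41<TAB>5A" with no tab at
the end, while an empty field in the middle still produces two
consecutive tabs.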

M       lib/unicore/mktables

commit a7112a485f251df5e345845a2bdaddb67de03773
Author: Karl Williamson <[email protected]>
Date:   Mon Oct 28 17:00:25 2013 -0600

    Unicode/UCD.t: white-space, comments
    
    Wrap to 79 columns; add a comment

M       lib/Unicode/UCD.t

commit 64433dc1f94e3f44fe7afdf179d814e706f69930
Author: Karl Williamson <[email protected]>
Date:   Mon Oct 28 16:43:01 2013 -0600

    mktables: Stop generating most leading zeros
    
    Leading zeros were generated to conform with Unicode usage, but these
    are machine-read files so this just takes up some extra space and extra
    parsing cycles at run-time.  It's a small matter, but we should design
    our files to be the most efficient possible.  It is possible to get more
    human-readable files by using the -annotate option to mktables.
    
    Certain files whose existence has been published have their formats
    unchanged, in case some application is reading them.  The files contain
    comments that their use is deprecated, but there is no warning generated
    if they are opened and read, nor is it really feasible to add such a
    warning.  At some time in the future, we may feel it's ok to remove
    these files, as their contents have been available since v5.16 through a
    stable API in Unicode::UCD, but until we remove them, we shouldn't
    change their formats.
    
    Not all other leading zeros are removed; just the ones that were
    convenient to remove.

M       lib/Unicode/UCD.t
M       lib/unicore/mktables

commit 3733dd3b5fc1b7ceec9353126d8a2f90f11d7d6e
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 20 10:57:21 2013 -0600

    mktables: Further explain how things work in a comment

M       lib/unicore/mktables

commit a9690c0b5bc86457404a35b9b87dae93377ffee6
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 20 10:27:42 2013 -0600

    mktables: Add an advisory comment to generated files.

M       lib/unicore/mktables

commit 628c8b6bbee822676ad6b5459ff073137f72c0b5
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 20 10:20:13 2013 -0600

    mktables: Regenerate if called with different cmd line args
    
    mktables acts pretty much like its own Makefile.  This is because the
    rules for regenerating are complicated and too hard to keep in sync in a
    Makefile with new versions of Unicode.  mktables itself already has
    enough intelligence to automatically update the rules when it gets
    modified to account for new files from Unicode.
    
    However, prior to this commit, it didn't keep track of the options it
    was called with, thus it wouldn't necessarily run when those options
    changed to affect the desired outputs.

M       lib/unicore/mktables

commit 56c8f3e8544ebade2e1c42b01626af50738a774b
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 20 10:13:39 2013 -0600

    mktables: Tighten regex match to real data
    
    The actual file has spaces, so use \s instead of the more dangerous dot.
    Also, after processing the line, no need to look to see if it matches
    something else.

M       lib/unicore/mktables

commit 7c51930746933517a3bb524ccba3cfe386174573
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 17 20:05:18 2013 -0600

    mktables: Fixup debugging info
    
    The -annotate parameter generates extra information in the tables
    created by mktables which is useful to me in understanding the Unicode
    standard and debugging.  I doubt that anyone else has ever used it.  It
    has been broken for some tables for some time.  This commit fixes those.

M       lib/unicore/mktables

commit bd9bbb14b045ac359f6a73c7334447faf6d8bc72
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 17 20:03:52 2013 -0600

    mktables: White-space only: wrap to 79 cols

M       lib/unicore/mktables
-----------------------------------------------------------------------

--
Perl5 Master Repository
