In perl.git, the branch khw/ebcdic has been created

<http://perl5.git.perl.org/perl.git/commitdiff/1ff9d4147867c9796a36c90d0ffe5128dd1459c8?hp=0000000000000000000000000000000000000000>

        at  1ff9d4147867c9796a36c90d0ffe5128dd1459c8 (commit)

- Log -----------------------------------------------------------------
commit 1ff9d4147867c9796a36c90d0ffe5128dd1459c8
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:48:22 2013 -0600

    regen/mk_PL_charclass.pl: XXX Make EBCDIC friendly
    
    need more of a commit message

M       regen/mk_PL_charclass.pl

commit d33aabb72849187ba115e12946db60c7541f9dbd
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:44:44 2013 -0600

    XXX make various things more EBCDIC friendly
    
    Adds trailing white space errors
    Need to know what to do about ^A meaning 0x1, and M-foo meaning meta

M       lib/DB.pm
M       lib/dumpvar.pl
M       lib/perl5db.pl
M       lib/sigtrap.pm

commit 068f296772b795858cd6dc8bedc1871f86ef1b1e
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:41:57 2013 -0600

    XXX charnames.t: Make more EBCDIC friendly
    
    Why need utf8::unicode_to_native

M       lib/charnames.t

commit b62565a341c39c856293e877681a0680ea40192e
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:41:15 2013 -0600

    XXX Temp add debug statements

M       lib/_charnames.pm
M       regen/unicode_constants.pl

commit 8d0142a2433965ce808a764d569aab4f62285a6b
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 12:37:13 2013 -0600

    XXX: regen/regcharclass.pl: Temp for testing

M       regen/regcharclass.pl

commit 0e9cba4fe5688d65b7efb707ad07707f560f94dd
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 16:52:45 2013 -0600

    regcomp.c: Fix bug in EBCDIC
    
    The POSIXA and NPOSIXA regnodes need to set the bits on only the ASCII
    code points, but under EBCDIC those code points are not 0-127.

M       regcomp.c
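
As a rough illustration of why the ASCII code points need separate handling
here, the Python sketch below maps the 128 ASCII characters to their native
byte values. It assumes code page 037 (Python's built-in cp037 codec) as a
stand-in for an EBCDIC platform; the code page on a real z/OS build may
differ, and the helper name is hypothetical, not something from regcomp.c.

```python
def native_ordinals_of_ascii(codec="cp037"):
    """Return the native byte value of each ASCII character, in ASCII order."""
    ascii_text = bytes(range(128)).decode("ascii")
    return list(ascii_text.encode(codec))


ordinals = native_ordinals_of_ascii()
# ASCII characters whose native byte value falls outside 0-127 under cp037
outside = [chr(i) for i, o in enumerate(ordinals) if o > 127]
print("native byte for 'A':", hex(ordinals[ord("A")]))
print("ASCII characters above 127 natively:", len(outside))
```

Under an ASCII-compatible codec such as latin-1 the same function returns
0-127 unchanged, which is why the problem shows up only on EBCDIC.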

commit c290e8ac4fb6faaf9cd40c4ffc568750acadcc06
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 12:26:15 2013 -0600

    hints/os390.sh: Suppress bogus compiler message

M       hints/os390.sh

commit 8492b5401ebe3faafcee62bb73ed386ae05a434e
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 11:57:24 2013 -0600

    re/charset.t: Allow to work on EBCDIC
    
    This just converts the hard-coded character numbers to native, so will
    work on any platform.

M       t/re/charset.t

commit 2cdaa52c8cf9ffbef659a753c54693d5b545a5c0
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 11:50:35 2013 -0600

    XS-APItest/t/handy.t: Change output message
    
    On EBCDIC platforms, the output is not in terms of \N{U+}; change text
    to \x{ }

M       ext/XS-APItest/t/handy.t

commit 445b7546ad34abed007dd59c4418d30217bb1cde
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 21:44:16 2013 -0600

    XXX Dumper.xs: Don't know why this stopped compiling

M       dist/Data-Dumper/Dumper.xs

commit c5278ab838d1fdf4519cbc0b40ca7468af909750
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:22:28 2013 -0600

    toke.c: Fix an ASCII-platform dependency

M       toke.c

commit 57b5ac79d7cf28899afc593404db6f9290e8cb79
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:20:23 2013 -0600

    toke.c: Simplify some code
    
    We don't have to test separately for lower vs uppercase here, as
    upper/lower case A-Z and a-z are not intermixed in the gaps in A-Z and
    a-z under EBCDIC.

M       toke.c
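
The claim about the gaps can be checked with a small Python sketch. It
assumes code page 037 (Python's cp037 codec) as a representative EBCDIC
code page, and gap_characters is a hypothetical helper, not anything in
toke.c. It lists the characters whose native byte values fall between the
first and last letter of a range without belonging to the range.

```python
import string


def gap_characters(first, last, codec="cp037"):
    """Characters lying natively between `first` and `last` but outside that range."""
    lo = first.encode(codec)[0]
    hi = last.encode(codec)[0]
    span = bytes(range(lo, hi + 1)).decode(codec)
    wanted = {chr(c) for c in range(ord(first), ord(last) + 1)}
    return [ch for ch in span if ch not in wanted]


upper_gaps = gap_characters("A", "Z")
lower_gaps = gap_characters("a", "z")
# No ASCII letter of the other case hides in either set of gaps.
print(any(ch in string.ascii_letters for ch in upper_gaps + lower_gaps))
print([hex(ord(ch)) for ch in upper_gaps])
```

The gaps do contain non-ASCII characters, some of them accented letters,
so a check for a letter inside a gap still has to be restricted to ASCII.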

commit 2a559f721112014501f3f4c6fe583c7b55b12293
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:18:12 2013 -0600

    genpacksizetables.pl: Correct comment typo

M       genpacksizetables.pl

commit f4b208b59ff5b5e1d980781c0ff5641879630ceb
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:17:39 2013 -0600

    APItest/t/handy.t: Make EBCDIC-friendly

M       ext/XS-APItest/t/handy.t

commit 307eb7a85355774f3e3c35320e9845eea52ecd7a
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:16:14 2013 -0600

    Data-Dumper: Make EBCDIC-friendly

M       dist/Data-Dumper/Dumper.xs

commit f40c5532c3283d24af993ade06edcf03caa33bdd
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:14:31 2013 -0600

    sv.c: Make less ASCII-centric

M       sv.c

commit 67488493a5b402754536df947c6f01e426570b80
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:07:52 2013 -0600

    lib/charnames.t: Make some tests work under EBCDIC

M       lib/charnames.t

commit a23c62f5c38021cf2a9882929a680415cf9940c4
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:05:46 2013 -0600

    dump.c: Make less ASCII-centric
    
    This has the added advantage of being clearer as to what is going on.

M       dump.c

commit 570d8d0ae78d11a1202c76ac933ef248dba39948
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:02:52 2013 -0600

    hv.c: Stop being ASCII-centric
    
    This uses macros which work cross-platform.  This has the added
    advantage that it is much clearer what is going on.

M       hv.c

commit 8eab790e8fa897a8b4cd493a82a1ad738e2e9677
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 12 22:34:17 2013 -0600

    t/TEST: Don't bail if fails in t/base

M       t/TEST

commit 8f28f8890a9636b407d1cb6c1cf06fba020f7d97
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 11 15:11:10 2013 -0600

    Added Porting/reorder_charclass_invlists.pl
    
    This program is used to bootstrap perl onto a non-ASCII platform with
    no pre-existing perl.

M       MANIFEST
A       Porting/reorder_charclass_invlists.pl

commit 283b086629fc3c47f80dc7c32d2a1956f777595c
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 10 22:17:31 2013 -0600

    XXX See if changing \xE2 to \xE1 causes lex.t to work for EBCDIC
    
    \xE2 is 'S' in EBCDIC, and so is going to be legal.  \xE1 is not an
    ASCII equivalent.

M       t/base/lex.t

commit 092d3fe91a1f0ac072be3605e30771391d1a0a1f
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 8 11:01:32 2013 -0700

    XXX EBCDIC header files

M       charclass_invlists.h
M       l1_char_class_tab.h
M       unicode_constants.h

commit db60a78987d029161753559457727e6a3e714639
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 12:31:25 2013 -0700

    XXX Temporary for z/OS long long support

M       Configure
M       hints/os390.sh

commit 1c0174b9e52a0380a411c89271031f8361427d09
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 10 13:11:07 2013 -0600

    XXX Temporary comment out ParseXS check
    
    this is to get things to compile for now

M       dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

commit 7a07d028116d71b31dac92f3282eff41f59e7188
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 10 11:34:10 2013 -0600

    XXX Collate, Normalize: Allow to compile under EBCDIC

M       cpan/Unicode-Collate/Collate.pm
M       cpan/Unicode-Collate/mkheader
M       cpan/Unicode-Normalize/Normalize.pm
M       cpan/Unicode-Normalize/mkheader

commit 3868399c8422e235d132700fee0239c27a07f821
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 9 21:57:38 2013 -0700

    XXX dquote_static.c: Silence wrong warning on EBCDIC
    
    Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
    which should be a separate commit anyway.
    
    This silences an inappropriate warning that doesn't happen on ASCII
    platforms.  CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms.  But
    0x14 is a C1 control on EBCDIC, a C0 on ASCII.  Therefore the test that
    it's a control should include both C0 and C1, which isCNTRL_L1() does.
    
    Also has a white-space change, outdenting a line so it doesn't wrap in
    an 80 column window.

M       dquote_static.c
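
The C0 versus C1 distinction for byte 0x14 can be sketched in Python. This
assumes code page 037 (Python's cp037 codec) as the EBCDIC code page and
uses a hypothetical helper, control_class, which classifies a native byte
by the Unicode code point it corresponds to. It is only an illustration of
the argument above, not the logic in dquote_static.c.

```python
def control_class(byte, codec):
    """Classify a native byte as a C0 control, a C1 control, or neither."""
    code_point = ord(bytes([byte]).decode(codec))
    if code_point <= 0x1F:
        return "C0"
    if 0x80 <= code_point <= 0x9F:
        return "C1"
    return "not a control"


# The same byte value lands in different control sets on the two platforms.
print("ASCII/Latin-1:", control_class(0x14, "latin-1"))
print("EBCDIC cp037: ", control_class(0x14, "cp037"))
```

A test that accepts only C0 controls would therefore reject 0x14 on the
EBCDIC side, which is the situation a combined C0-or-C1 check avoids.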

commit 5185f583c2d45879dead2cc355355c9fad2c714d
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 7 12:08:41 2013 -0700

    utfebcdic.h: Change 'unsigned char' to U8
    
    This is for consistency with the rest of Perl

M       utfebcdic.h

commit 09691f4318feb878277d464141dea0ae882d7613
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 8 08:11:38 2013 -0700

    regen/regcharclass.pl: Make more EBCDIC-friendly
    
    This commit changes the code generated by the macros so that they work
    right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs.  THEY
    ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
    onto the target platform, and regcharclass.pl can be run there,
    generating macros correct for UTF-8.

M       regcharclass.h
M       regen/regcharclass.pl

commit e872909aeefd21212c41c1341e7f59cfb74acfcd
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:30:01 2013 -0700

    utfebcdic.h: Add (UV) cast
    
    The operand of this macro is implicitly a UV.  Make sure that it is.

M       utfebcdic.h

commit 9e04a36a64ea92143a6b0c8c8f010e418f715b39
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 17:04:58 2013 -0700

    handy.h: Allow bootstrapping to non-ASCII platform
    
    This adds a bunch of macros and moves things around to support
    conditional compilation when Configure is called with
    -DBOOTSTRAP_CHARSET.  Doing so causes the usual macros that are
    table-driven to not be used, since the table may not be valid when
    bringing Perl up for the first time on a non-ASCII platform.
    
    This allows it to compile using the platform's native C library ctype
    functions, which should work enough to compile miniperl, and allow the
    table to be changed to be valid.  Then Configure can be re-run to not
    bootstrap, and normal compilation can proceed

M       handy.h
M       inline.h

commit a00751cd7325631529638a6d6a1510dd18702566
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:43:26 2013 -0700

    gv.c: Remove EBCDIC dependency

M       gv.c

commit f46536eaae5608b61828074ac466223f23684f3b
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:00:47 2013 -0700

    toke.c: Remove EBCDIC dependency

M       toke.c

commit 3ebaf981ad2ea4df277d68beefd6fc32cad3e7aa
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:14:25 2013 -0700

    toke.c: Remove character set dependency
    
    Instead of hard-coding the bit patterns that comprise the Byte Order
    Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
    the current platform.
    
    This removes some EBCDIC-only code.

M       toke.c

commit 1aab7ed700bbd15a5e58253d5dca823afac6cae9
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:10:27 2013 -0700

    unicode_constants.h: Add #defines for Byte Order Mark
    
    These will be used in future commits

M       regen/unicode_constants.pl
M       unicode_constants.h

commit d6cea33c3774b44e7f1218388817feb710d7009c
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 15:04:18 2013 -0700

    XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
    
    This macro may not be present, and is currently used exclusively in
    IS_UTF8_CHAR, which itself may be undefined, and code should cope with
    that.  This is a work-around until a better solution is found.

M       utf8.c
M       utf8.h

commit 1d257338db08f776659e6468b8167eb7f570c36b
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 14:09:04 2013 -0700

    Add Porting tool for help with non-ASCII platforms
    
    Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
    non-ASCII platform with no working Perl.

M       MANIFEST
A       Porting/reorder_l1_char_class_tab.pl
M       regen/mk_PL_charclass.pl

commit fe4ebcb55a6608c683fb45e9d061ac55c2d8e0f8
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 13:06:58 2013 -0700

    inline.h: Reorder functions
    
    The comment implied that the functions below it in the file were
    deprecated, but in fact only the next two functions were.  This
    clarifies that and moves them so they are the final ones in the file

M       inline.h

commit d678c473ff3b66b57330e3199de47726c235c8e2
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:33:42 2013 -0700

    utfebcdic.h: Add comment

M       utfebcdic.h

commit 7375aaf9187724c48b895731a47190db0cdf75a1
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:12:11 2013 -0700

    utf8.h: Clean up START_MARK definition and use
    
    The previous definition broke good encapsulation rules.  UTF_START_MARK
    should return something that fits in a byte; it shouldn't be the caller
    that does this.  So the mask is moved into the definition.  This means
    it can apply only to the portion that creates something larger than a
    byte.  Further, the EBCDIC version can be simplified, since 7 is the
    largest possible number of bytes in an EBCDIC UTF8 character.

M       utf8.h
M       utfebcdic.h
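
The idea of keeping the byte mask inside the definition can be sketched as
follows. This is a Python approximation written for illustration, not the
text of the macro in utf8.h or utfebcdic.h; the function name and the
2-to-7 length restriction (taken from the limit mentioned above for the
EBCDIC case) are assumptions.

```python
def utf_start_mark(length):
    """Start-byte marker for a sequence of `length` bytes: the top `length` bits set.

    The mask to a single byte is applied here, so callers never see a
    value wider than 8 bits.
    """
    if not 2 <= length <= 7:
        raise ValueError("sketch covers sequence lengths 2 through 7 only")
    return (0xFE << (7 - length)) & 0xFF


for n in range(2, 8):
    print(n, format(utf_start_mark(n), "08b"))
```

Without the mask, the shift alone yields values wider than a byte for
every length below 7, so each caller would have to truncate the result
itself.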

commit ee58a3a9c340cd5005b67faff85b48e9aebebdc6
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:05:26 2013 -0700

    utf8.h: Move #includes
    
    These two files were only being #included for non-ebcdic compiles; they
    should be included always.

M       utf8.h

commit 555bb042762b2dcbc8d4139b582a0f470e9286d0
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 11:49:14 2013 -0700

    utfebcdic.h: Remove extra parameter expansions
    
    These two macros were improperly expanding the parameters as well as
    defining the operation, leading to compile errors.

M       utfebcdic.h

commit 226ff9f76f745bdc31112dafbf25330c69e9ee8e
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 1 08:28:52 2013 -0700

    utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
    
    These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
    UTF8_TWO_BYTE_LO.  But the EIGHT_BIT versions can use the less general
    and simpler NATIVE_TO_LATIN1 instead of NATIVE_TO_UNI because the input
    domain is restricted in the EIGHT_BIT.  Note that on ASCII platforms,
    these both expand to the same thing, so the difference matters only on
    EBCDIC.

M       utf8.h

commit 5730c64bceb470ad820dfb9f755dab31788aee19
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 28 09:25:27 2013 -0700

    XXX temp:  show makedepend cerr

M       makedepend.SH

commit 701dfeb8b185a6cd9aab8addc7f22c9d4c7b4dee
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too long lines; properly join
    
    I had thought that a continuation introduced a space.  But no,
    a continuation can happen in the middle of a token.
    
    And this splits lines that are getting very long to avoid preprocessor
    limitations.

M       makedepend.SH

commit bea76f647fe063ff6f4f98320ca4c2755564e748
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only
    
    Align continuation backslashes

M       makedepend.SH

commit e3462894604e26af40bd29bbc3abee337f9e0312
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space
    
    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle.  This
    commit removes blanks adjoining comments that get deleted.  This makes
    things somewhat less likely to exceed the limit.
    
    This commit also fixes several [] which were meant to each match a tab
    or a blank, but editors converted the tabs to blanks

M       makedepend.SH

commit e18195561a00905d13bd08c7bfa6fc40482a183d
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments
    
    These comments may actually be necessary.

M       makedepend.SH

commit 86a6647b0e884065b0c74d4d4432f8f2fd52a8d6
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 08:38:19 2013 -0700

    handy.h: Remove extraneous parens

M       handy.h

commit 47a0c940be4be76a79831ebcf292e5539d8203eb
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.
    
    John Goodyear <[email protected]> reports that the z/OS C compiler
    supports the attribute keyword, but not exactly the same as gcc.
    Instead of a "warning", the compiler emits an "INFORMATIONAL" message
    that Configure fails to detect.  Until Configure is fixed, just disable
    the attributes altogether.
    
    John Goodyear

M       hints/os390.sh

commit 4b8724a25c6eed2727ed855bbe666a30cba5d024
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.
    
    Grep appears to be limited to 2048 characters, and truncates
    the output for cppstdin.  Fgrep apparently doesn't have that limit.
    Thanks to John Goodyear <[email protected]> for reporting this.

M       hints/os390.sh

commit d4204fcf2f048595f42a26897290ca5628592d5c
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:45:19 2013 -0700

    utf8.c: Use more clearly named macro
    
    In the case of invariants these two macros should do the same thing,
    but it seems to me that the latter name more clearly indicates what is
    going on.

M       utf8.c

commit 4ebf592305b4c47891e18db1e58a7ce9274a02e7
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:35:12 2013 -0700

    Add macro OFFUNISKIP
    
    This means use official Unicode code point numbering, not native.  Doing
    this converts the existing UNISKIP calls in the code to refer to native
    code points, which is what they meant anyway.  The terminology is
    somewhat ambiguous, but I don't think will cause real confusion.
    NATIVESKIP is also introduced for situations where it is important to be
    precise.

M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 10ee5d70656ee6dcaa796a08a1f8a1cb204608ef
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:22:19 2013 -0700

    toke.c: white space only

M       toke.c

commit 2c0131c9d8fdceb0622293f00cee6480bd5472d5
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 12:08:50 2013 -0700

    utf8.c: Deprecate two functions
    
    This is to force any code that has been using these functions to change.
    Since the Unicode tables are now stored in native order, these functions
    should only rarely be needed.
    
    However, the functionality of these is needed, and in actuality, on
    ASCII platforms, the native functions are #defined to these.  So what
    this commit does is rename the functions to something else, and create
    wrappers with the old names, so that anyone using them will get the
    deprecation.

M       embed.fnc
M       embed.h
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 8c913c2c1df1ad9130e7b7d6435336126f33876c
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:26:09 2013 -0700

    Deprecate uvuni_to_utf8()
    
    Code should almost never be dealing with non-native code points

M       embed.fnc
M       embed.h
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit c45d097381f088d093247f5a066732198fb8b483
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:02:33 2013 -0700

    Deprecate utf8_to_uni_buf()
    
    Now that the tables are stored in native order, there is almost no need
    for code to be dealing in Unicode order.

M       embed.fnc
M       proto.h
M       utf8.c

commit cf019318dd26ff019a50475d075a4e104e4eb06c
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code
    
    This causes problems currently for z/OS.  But, since we don't know why
    it was there, I'm leaving it in as a placeholder.

M       makedepend.SH

commit 1e31679408bd07b5cd38e06d8ea66e39a82a8605
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:26:44 2013 -0700

    Deprecate valid_utf8_to_uvuni()
    
    Now that all the tables are stored in native format, there is very
    little reason to use this function; and those who do need this kind of
    functionality should be using the bottom level routine, so as to make it
    clear they are doing nonstandard stuff.

M       embed.fnc
M       proto.h
M       utf8.c

commit aa99cae41e37d55aff63831f2634939e86d71493
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:14:26 2013 -0700

    utf8.c: Swap which fcn wraps the other
    
    This is in preparation for the current wrapee becoming deprecated

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
M       utf8.h

commit 03e676746084a3725058fccb1f1faf3724f7270b
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:29:34 2013 -0700

    utf8.c: Skip a no-op
    
    Since the value is invariant under both UTF-8 and not, we already have
    it in 'uv'; no need to do anything else to get it

M       utf8.c

commit 0f957bd9af609663e666e8fe4c49e369f07d17fb
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:26:50 2013 -0700

    utf8.c: Move comment to where makes more sense

M       utf8.c

commit 7cc6dfa9f2e3298b3ab42a730381d47cafde3e0a
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:30:10 2013 -0700

    APItest: Test native code points, instead of Unicode

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 4bec9e4bd8b19e749475ab4d02be327c24955a30
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:25:08 2013 -0700

    XXX CPAN Normalize
    
    This converts Unicode::Normalize to use the native tables that are used
    by Perl starting in XXX, while using the Unicode-ordered ones that were
    used before then.
    
    Another alternative would be to have mktables generate just these tables
    in Unicode ordering.

M       cpan/Unicode-Normalize/Normalize.xs

commit f05b1691f6ede1f6884c55db64601e8228c06119
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:22:55 2013 -0700

    XXX CPAN prob wrong Collate
    
    This changes to implicitly use native code points.  This is likely wrong,
    as the module comes with its own data, that are probably in terms of
    Unicode

M       cpan/Unicode-Collate/Collate.xs

commit 11198024a28b4599c68b357ef22348d89b1d4e7a
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:12:53 2013 -0700

    XXX CPAN Encode.xs
    
    Use core function if available.  This will insulate this code from any
    future changes.

M       cpan/Encode/Encode.xs

commit d30fcc8263761b703c754f47bc6c37f83e16d591
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:04:24 2013 -0700

    XXX CPAN and unsure Encode

M       cpan/Encode/Encode.xs
M       cpan/Encode/Unicode/Unicode.xs

commit 23a6ee56dc92727e8baea4e771ee543344cd0d50
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:00:47 2013 -0700

    XXX CPAN Encode.xs: fix indent

M       cpan/Encode/Encode.xs

commit d3cea73fd207d19374fd0337c3bb962710e9de73
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 17:23:15 2013 -0700

    Don't refer to U+XXXX when mean native
    
    These messages say the output number is Unicode, but it is really
    native, so change to saying is 0xXXXX.

M       regen/regcharclass_multi_char_folds.pl
M       regexec.c

commit d20c30cdc20b1b1186bcc784b0a3489c7da7a7b9
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:43:59 2013 -0700

    Convert some uvuni() to uvchr()
    
    All the tables are now based on the native character set, so using
    uvuni() in almost all cases is wrong.

M       cygwin/cygwin.c
M       doop.c
M       op.c
M       pp_pack.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c

commit 3a00f2a69f8df6017c58c1b3b3c155a0c99c65f6
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:25:47 2013 -0700

    handy.h: White space only

M       handy.h

commit 9de115cf1dd6caef1b9aa93c7652826b04a016df
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:19:49 2013 -0700

    t/test.pl: Allow native/latin1 string conversions to work on utf8.
    
    These functions no longer have the hard-coded definitions in them,
    but now end up resolving to internal functions, so that new encodings
    could be added and these would automatically understand them.
    
    Instead of using tr///, these now go character by character and
    converting to/from ord, which is slower, but allows them to operate on
    utf8 strings.
    
    Peephole optimization should make these essentially no-ops on ascii
    platforms.

M       t/test.pl

commit eebcf1e5d8a4347511038b03f66bab0f9fa99b68
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:05:55 2013 -0700

    t/test.pl: Simplify ord to/from native fcns
    
    This commit changes these functions from converting to/from a string to
    calling utf8:: functions which operate on ordinals instead.

M       t/test.pl
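
To show what an ordinal-based conversion looks like, here is a Python
analogue of a Unicode-to-native mapping and its inverse. It assumes code
page 037 (Python's cp037 codec) as the native character set and assumes
that code points above 0xFF are the same in both numberings; the function
names mirror the utf8:: functions only loosely and are not the
implementation used by t/test.pl.

```python
def unicode_to_native(code_point, codec="cp037"):
    """Map a Unicode code point to the native ordinal (identity above 0xFF)."""
    if code_point > 0xFF:
        return code_point
    return chr(code_point).encode(codec)[0]


def native_to_unicode(ordinal, codec="cp037"):
    """Map a native ordinal back to its Unicode code point (identity above 0xFF)."""
    if ordinal > 0xFF:
        return ordinal
    return ord(bytes([ordinal]).decode(codec))


native_a = unicode_to_native(ord("A"))
print(hex(native_a), chr(native_to_unicode(native_a)))
```

Working on ordinals one at a time avoids building and translating a
temporary string, and with an ASCII-compatible codec such as latin-1 both
functions reduce to the identity.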

commit 34f042a2cb59faa3bb27d4046fa3cf0832211ce0
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:35:38 2013 -0700

    Make casing tables native
    
    These are final tables that haven't been converted to native character
    set casing.

M       perl.h
M       utfebcdic.h

commit de873077e91076ed604d726ce8f29283c27a2080
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:32:30 2013 -0700

    utfebcdic.h: Remove trailing spaces

M       utfebcdic.h

commit 33581a4976f86fea1114397d1a8dfff4ea73c8de
Author: Karl Williamson <[email protected]>
Date:   Fri Feb 22 18:55:26 2013 -0700

    EBCDIC has the unicode bug too
    
    We have not had a working modern Perl on EBCDIC for some years.  When I
    started out, comments and code led me to conclude erroneously that
    natively it supported semantics for all 256 characters 0-255.  It turns
    out that I was wrong; it natively (at least on some platforms) has the
    same rules (essentially none) for the characters which don't correspond
    to ASCII ones, as the rules for these on ASCII platforms.
    
    A previous commit for 5.18 changed the docs about this issue.  This
    current commit forces ASCII rules on EBCDIC platforms (even should there
    be one that natively uses all 256).  To get all 256, the same things
    like 'use feature "unicode_strings"' must now be done.

M       handy.h

commit 43fe57ce5a010b9d4fc28e944ae3b11a075917f8
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:47:52 2013 -0700

    handy.h: Solve a failure to compile problem under EBCDIC
    
    handy.h is included in files that don't include perl.h, and hence not
    utf8.h.  We can't rely therefore on the ASCII/EBCDIC conversion
    macros being available to us.  The best way to cope is to use the native
    ctype functions.  Most, but not all, of the macros in this commit
    currently resolve to use those native ones, but a future commit will
    change that.

M       handy.h

commit b2f825edd0374edb380b4a2c4a81e0ad1724a23d
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:35:12 2013 -0700

    handy.h: Simplify some macro definitions
    
    Now, only one of the macros relies on magic numbers (isPRINT), leading
    to clearer definitions.

M       handy.h

commit e9b76fb63faf21e5e5bcec9f5f461889c386b521
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:26:49 2013 -0700

    handy.h: Combine macros that are same in ASCII, EBCDIC
    
    These 4 macros can have the same RHS for their ASCII and EBCDIC
    versions, so no need to duplicate their definitions
    
    This also enables the EBCDIC versions to not have undefined expansions
    when compiling without perl.h

M       handy.h

commit b6f40fc6296b8261c70f3b91673114dc96b30a36
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:39:48 2013 -0700

    Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
    
    These macros are no longer called in the Perl core.  This commit turns
    them into functions so that they can use gcc's deprecation facility.
    
    I believe these were defective right from the beginning, and I have
    struggled to understand what's going on.  From the name, it appears
    NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
    appropriate parameter indicates that.  But that is impossible to do
    correctly from that API, as for variant characters, it needs to return
    two bytes.  It could only work correctly if ch is an I8 byte, which
    isn't native, and hence the name would be wrong.
    
    Similar arguments for ASCII_TO_NEED.
    
    The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
    does what I think NATIVE_TO_NEED intended.

M       embed.fnc
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.h
M       utfebcdic.h

commit 5a6c238d71c9d329ddeec031a7579b9eec3b1f3d
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:26:43 2013 -0700

    Remove remaining calls of NATIVE_TO_NEED
    
    These calls are just copying the input to the output byte by byte.
    There is no need to worry about UTF-8 or not, as the output is just an
    exact copy of the input

M       toke.c

commit 06495349fbbb6793d86df41a58739da6ed10874d
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:12:15 2013 -0700

    toke.c: Remove some NATIVE_TO_NEED calls
    
    I believe NATIVE_TO_NEED is defective, and will remove it in a future
    commit.  But, just in case I'm wrong, I'm doing it in small steps so
    bisects will show the culprit.  This removes the calls to it where the
    parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
    result can't be other than just the parameter.

M       toke.c

commit fab8d0e4fe7c30da8cdf12e6e4e55866e6829f67
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:22:07 2013 -0700

    toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
    
    This code is attempting to deal with the problem of holes in the ranges
    a-z and A-Z in EBCDIC.  Prior to this patch, it accepted things like A
    WITH GRAVE, etc, which shouldn't have the special processing to deal
    with the holes

M       toke.c

commit 9ef7a0f4b4138526ecc4afcc498a4b9a879198ff
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 19 15:13:19 2013 -0700

    Use real illegal UTF-8 byte
    
    The code here was wrong in assuming that \xFF is not legal in UTF-8
    encoded strings.  It currently doesn't work due to a bug, but that may
    eventually be fixed: [perl #116867].  The comments are also wrong that
    all bytes are legal in UTF-EBCDIC.
    
    It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
    (C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
    of an illegal overlong sequence.
    
    This creates a #define for an illegal byte using one of the real illegal
    ones, and changes the code to use that.
    
    No test is included due to #116867.

M       op.c
M       toke.c
M       utf8.h
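
The statement about C0 and C1 can be checked for standard UTF-8 with the
Python sketch below. Note the assumption: Python implements only standard
UTF-8 (code points up to U+10FFFF), so this says nothing about the
extended encodings Perl accepts, nor about UTF-EBCDIC. The helper names
are hypothetical.

```python
def bytes_used_by_utf8():
    """Collect every byte value occurring in the UTF-8 encoding of any scalar value."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        used.update(chr(cp).encode("utf-8"))
    return used


def decodes_as_utf8(data):
    """True if `data` is accepted by a strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


used = bytes_used_by_utf8()
print(0xC0 in used, 0xC1 in used)          # start bytes of overlong forms
print(decodes_as_utf8(b"\xc0\x80"))        # overlong encoding of U+0000
```

A byte that never occurs in well-formed input is a safe sentinel, which is
the property the new #define relies on.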

commit 0f14654d2bcc78cf560f4854f7f6c648f5185c6f
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 14:00:13 2013 -0700

    toke.c: Don't remap \N{} for EBCDIC
    
    Everything is now in native.

M       toke.c

commit 676022641a0f38fe82df757697a349d409f06f74
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:50:45 2013 -0700

    toke.c: Remove remapping for EBCDIC for octal
    
    The code prior to this commit converted something like \04 into its
    EBCDIC equivalent only in double-quoted strings.  This was not done in
    patterns, and so gave inconsistent results.  The correct thing to do
    should be to do the native thing, what someone who works on a platform
    would think \04 does.  Platform-independent characters are available
    through \N{}, either by name or by U+.
    
    The comment changed by this was wrong, as in some cases it was native,
    and in some cases Unicode.

M       toke.c

commit 8d633312a182705b21baf1540cba486adbf6970d
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:47:13 2013 -0700

    Remove EBCDIC remappings
    
    Now that the tables are stored in native format, we shouldn't be doing
    remapping.
    
    Note that this assumes that the Latin1 casing tables are stored in
    native order; not all of this has been done yet.

M       handy.h
M       perly.c
M       pp.c
M       regcomp.c
M       regexec.c
M       utf8.c

commit 52cc960aaf71b41c38783740d1cf523e8fbdcd61
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 12:46:05 2013 -0700

    Add and use macro to return EBCDIC
    
    The conversion from UTF-8 to code point should generally be to the
    native code point.  This adds a macro to do that, and converts the
    core calls to the existing macro to use the new one instead.  The old
    macro is retained for possible backwards compatibility, though it
    probably should be deprecated.

M       handy.h
M       pp.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c
M       utf8.h

commit a438fcd3ee183d5302bb239afba43a270fb184b0
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 09:18:06 2013 -0700

    charnames: fix nit in comment

M       lib/_charnames.pm

commit 2bfacfc7e46455b8978689228c609d6915b19ab9
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 11:05:44 2013 -0700

    charnames: Make work in EBCDIC
    
    Now that mktables generates native tables, the only thing that was
    needed was to make U+ mean Unicode instead of native.

M       lib/_charnames.pm
M       lib/charnames.pm

commit 0144f96948b54002ee665d8864c3b401da8d5034
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 09:35:56 2013 -0700

    Unicode::UCD: Work on non-ASCII platforms
    
    Now that mktables generates native tables, it is a fairly simple matter
    to get Unicode::UCD to work on those platforms.

M       lib/Unicode/UCD.pm

commit 01adc1344760dd54ad2c0ded1d930e89a05b3347
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 22:16:38 2013 -0700

    mktables: Generate native code-point tables
    
    The output tables for mktables are now in the platform's native
    character set.  This means there is no change for ASCII platforms, but
    is a change for EBCDIC ones.
    
    Since we currently don't have any EBCDIC test platforms, I tested this
    by faking it out to generate EBCDIC data, and then eye-balled the
    results.
    
    Code that didn't realize there was a potential difference between EBCDIC
    and non-EBCDIC platforms will now start to work; code that tried to do
    the right thing under these circumstances will no longer work.  Fixing
    that comes in later commits.

M       lib/unicore/mktables

commit 4772b59d3178b78e07cededebb6efc329e470729
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 10:50:00 2013 -0700

    Fix some EBCDIC problems
    
    These spots have native code points, so should be using the macros for
    native code points, instead of Unicode ones.

M       regcomp.c
M       sv.c
M       toke.c

commit 0cdcc173fb29f594d5846ae530d1ea0116e534f3
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:10:19 2013 -0700

    Remove unnecessary temp variable in converting to UTF-8
    
    These areas of code included a temporary that is unnecessary.

M       inline.h
M       regcomp.c
M       sv.c

commit cb8d4e9256cc8a8943ce2959f3b086958ea047ba
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:00:55 2013 -0700

    utf8.h: Correct macros for EBCDIC
    
    These macros were incorrect for EBCDIC.  The 3 step process given in
    utfebcdic.h wasn't being followed.

M       utf8.h

commit fed6fd2f2e8a7302ce0b3328b5faaef9685a4d19
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 9 21:23:30 2013 -0700

    Extract common code to an inline function
    
    This fairly short paradigm is repeated in several places; a later commit
    will improve it.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_pack.c
M       proto.h
M       sv.c
M       toke.c
M       utf8.c

commit e3c099f6f88eaacedf7f64855730ac2518b00bc4
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 21:35:57 2013 -0700

    Don't use EBCDIC macro for a C language escape
    
    C recognizes '\a' (for BEL); just use that instead of a look-up.
    
    regen/unicode_constants.pl could be used to generate the character for
    the ESC (set in surrounding code), but I didn't do that because of
    potential bootstrapping problems when porting to an EBCDIC platform
    without a working perl.  (The other characters generated in that .pl are
    less likely to cause problems when compiling perl.)

M       regcomp.c
M       toke.c

commit 0e9e8cf0a9b69d5837c43edbedee845195b34018
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 19:53:38 2013 -0700

    Use byte domain EBCDIC/LATIN1 macro where appropriate
    
    The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
    whole Unicode range.  In the locations affected by this commit, it is
    known that the domain is limited to a single byte, so the simpler ones
    whose names contain LATIN1 may be used.
    
    On ASCII platforms, all the macros are null, so there is no effective
    change.

M       handy.h
M       regcomp.c
M       utf8.c

commit d3e20d27cf8939ce6f9737cd5133bb777357c73f
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 14:31:09 2013 -0700

    Use new clearer named #defines
    
    This converts several areas of code to use the more clearly named macros
    introduced in a recent commit.

M       op.c
M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 31d1c4eb93e64179cccc6f60f09348a4c0391324
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 13:52:31 2013 -0700

    utf8.h, utfebcdic.h: Create less confusing #defines
    
    This commit creates macros whose names mean something to me, and that I
    don't find confusing.  The older names are retained for backwards
    compatibility.  Future commits will fix bugs I introduced from
    misunderstanding the meaning of the older names.
    
    The older names are now #defined in terms of the newer ones, and moved
    so that they are only defined once, valid for both ASCII and EBCDIC
    platforms.

M       utf8.h
M       utfebcdic.h

commit e323b1d92e1e3357341184c862faee117f4c51a0
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 4 14:22:02 2013 -0700

    pp_ctl.c: Use isCNTRL instead of hard-coded mask
    
    This is clearer and portable to EBCDIC.

M       pp_ctl.c

commit 514c7c92e44ffd90b8db1bdc6a91ca9c2b89d597
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:51:05 2013 -0700

    utf8.c: is_utf8_char_slow() should use native length
    
    What is passed is the actual length of the native utf8 character.  What
    this was calculating was the length it would be if it were a Unicode
    character, and then comparing the two: apples to oranges.

M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
