[perl.git] branch khw/ebcdic, created. v5.17.10-195-g939f924

Karl Williamson Mon, 01 Apr 2013 20:57:44 -0700

In perl.git, the branch khw/ebcdic has been created

<http://perl5.git.perl.org/perl.git/commitdiff/939f9241d723e0292d4d2d084d50d726c1cd6938?hp=0000000000000000000000000000000000000000>


        at  939f9241d723e0292d4d2d084d50d726c1cd6938 (commit)

- Log -----------------------------------------------------------------
commit 939f9241d723e0292d4d2d084d50d726c1cd6938
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 20 22:21:15 2013 -0600

    XXX to compile on Linux

M       Configure
M       charclass_invlists.h
M       l1_char_class_tab.h
M       regcharclass.h
M       unicode_constants.h

commit 20e6de676447f910a0e5e3299661ef391cc2c72d
Author: Karl Williamson <[email protected]>
Date:   Mon Apr 1 21:08:20 2013 -0600

    t/op/split.t: EBCDIC fixes

M       t/op/split.t

commit ff45254961717dba3bc801c1690e28fdbeb1e27c
Author: Karl Williamson <[email protected]>
Date:   Mon Apr 1 20:43:03 2013 -0600

    re/pat_advanced.t: EBCDIC fixes
    
    This includes not skipping some EBCDIC that formerly was, since we now
    have testing infrastructure that makes this easy.

M       t/re/pat_advanced.t

commit 3abf932ebbdb06e92e4671050a2900bb909659e2
Author: Karl Williamson <[email protected]>
Date:   Mon Apr 1 20:01:04 2013 -0600

    t/io/utf8.t: EBCDIC fixes

M       t/io/utf8.t

commit 3259eebb8935fa002a846d8421788ae54893599d
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 30 21:13:38 2013 -0600

    Unicode::UCD.pm: Nits

M       lib/Unicode/UCD.pm

commit 519bb31048d91fa16aa6ffdacfb9d05bc3287bfb
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 30 12:32:09 2013 -0600

    t/uni/fold.t: EBCDIC fixes

M       t/uni/fold.t

commit 06369a4e4c9aa91157f0e5f94bc80774282014a2
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 29 15:22:28 2013 -0600

    XXX t/op/tiehandle.t: skip for now; deep recursion

M       t/op/tiehandle.t

commit 186b6a7b724b47ff5f8741f002146158c7e776f9
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 29 14:56:16 2013 -0600

    XXX better commit msg utf8.c: Avoid unnecessary UTF-8 conversions
    
    This changes the code so that converting to UTF-8 is avoided unless
    necessary.  For such inputs, the conversion back from UTF-8 is also
    avoided.  The cost of doing this is that the first swatches are combined
    into one that contains the values for all characters 0-255, instead of
    having multiple swatches.  That means when first calculating the swatch
    it calculates all 256, instead of 128 (160 on EBCDIC).
    
    This also fixes an EBCDIC bug in which characters in this range were
    being translated twice.

M       utf8.c

commit c46325f342b0a4082abcf1ca3ed356f2c70facc0
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 29 13:34:59 2013 -0600

    utf8.c: No need to check for UTF-8 malformations
    
    This function assumes that the input is well-formed UTF-8, even though
    until this commit, the preferatory comments didn't say so.  The API does
    not pass the buffer length, so there is no way it could check for
    reading off the end of the buffer.  One code path already calls
    valid_utf8_to_uvchr(); this changes the remaining code path to correspond.

M       utf8.c

commit f12f9359b5c9ec108581068c07f70d696b0001c1
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 28 19:56:39 2013 -0600

    utf8.c: Remove redundant assignment.
    
    This variable is always set just below.

M       utf8.c

commit 055633db833b3f640d5092dc1ba1a88c90dade15
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 28 17:19:16 2013 -0600

    XXX enable _invlist_dump;

M       embed.fnc
M       embed.h
M       proto.h

commit ccc4928be13e819a6aed7f766b916648f8a071c5
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 8 11:01:32 2013 -0700

    XXX EBCDIC header files

M       charclass_invlists.h
M       l1_char_class_tab.h
M       regcharclass.h
M       unicode_constants.h

commit 6454f58c847020d4c01ad821e7d5bc73e50e4f55
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 12:26:15 2013 -0600

    hints/os390.sh: Suppress bogus compiler message

M       hints/os390.sh

commit b81f883038869b9c86fabb9bdb45836b35b6facd
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 12:31:25 2013 -0700

    XXX Temporary for z/OS long long support

M       Configure
M       hints/os390.sh

commit 05317c37f92f4501a2e80ef86b871ff0e40af6f1
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 18:17:28 2013 -0600

    Add test that to/from native character set works
    
    For non-ASCII systems, there are character set translation tables.  This
    makes sure the two accessible ones are inverses of each other.  If not,
    nothing can be expected to work right.

M       MANIFEST
A       t/base/translate.t

commit b9749fcc9df467ad1e064ab3edf2a868eb2ebb51
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 16:55:55 2013 -0600

    lib/feature/bundle: Fix some things to pass under EBCDIC

M       t/lib/feature/bundle

commit 82b50b87c9cfd8985f1f6099de07e13580088075
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 16:08:04 2013 -0600

    XS-APItest/t/fetch_pad_names.t: Skip if EBCDIC
    
    This could be ported, but there's a lot of stuff to convert; would need
    a function to convert byte strings that form legal UTF-8 into legal
    UTF-EBCDIC

M       ext/XS-APItest/t/fetch_pad_names.t

commit 4bd5eeb3a879e6f446ec92aa4a50cdb3d779a4e9
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 12:11:59 2013 -0600

    XXX Temporary lib/charnames.t, comment out to see if gets further

M       lib/charnames.t

commit a8c3606dc1f47c48c90711d792e0572206c9fc7c
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 12:05:53 2013 -0600

    XXX ext/XS-APItest/t/utf8.t: Fix so passes EBCDIC
    
    This involves skipping much of the tests.  Reexamine later

M       ext/XS-APItest/t/utf8.t

commit 764a732883f365f3fbabc5ba5a9b684de897c2bf
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 11:27:06 2013 -0600

    ext/re/t/re_funcs_u.t: Fix to work under EBCDIC

M       ext/re/t/re_funcs_u.t

commit aac151eecffdeba8f952694b0d27fee3cbe75c35
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 11:11:22 2013 -0600

    XXX dist/IO/t/io_utf8argv.t: Temporarily skip if EBCDIC

M       dist/IO/t/io_utf8argv.t

commit edfbc3dccbdf1c4283fed295c4800bdffee335f8
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 10:33:44 2013 -0600

    t/op/print.t: Skip an EBCDIC test
    
    This could be written (the values would probably change depending on the
    code page), but the code that would get exercised is unlikely to vary
    depending on character set.

M       t/op/print.t

commit e8c85df443729d98bc891189ffc30ddfede785ab
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 26 19:51:06 2013 -0600

    XXX skip folding tests

M       t/re/fold_grind.t
M       t/re/reg_fold.t
M       t/uni/fold.t

commit 0cc35a16cdc7aa3c49afdb5c00b8b8ed0de7506b
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 26 15:44:59 2013 -0600

    XXX t/TEST: Avoid SIGPIPEs

M       t/TEST

commit bceef461e99ab818a66dc3e642febe407c2fb6f7
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 26 15:49:08 2013 -0600

    XXX Temporarily test normalization

M       cpan/Unicode-Normalize/t/fcdc.t
M       cpan/Unicode-Normalize/t/form.t
M       cpan/Unicode-Normalize/t/func.t
M       cpan/Unicode-Normalize/t/illegal.t
M       cpan/Unicode-Normalize/t/norm.t
M       cpan/Unicode-Normalize/t/null.t
M       cpan/Unicode-Normalize/t/partial1.t
M       cpan/Unicode-Normalize/t/partial2.t
M       cpan/Unicode-Normalize/t/proto.t
M       cpan/Unicode-Normalize/t/split.t
M       cpan/Unicode-Normalize/t/test.t
M       cpan/Unicode-Normalize/t/tie.t

commit 86e1f78da579d88a658bb7348c4d2974342f8f2f
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 26 14:06:50 2013 -0600

    op/index.t: Fix tests for EBCDIC
    
    Commit 8a38a836 erroneously translates literals into the native
    encoding, causing a double translation, which is garbage.

M       t/op/index.t

commit 9ab6be462eb87b18902d3b70167b0b47eb94425a
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 20:43:38 2013 -0600

    op/chop.t: Fix for EBCDIC
    
    One test is skipped because the code point is not representable on
    EBCDIC platforms.  Another test is modified to work on EBCDIC.

M       t/op/chop.t

commit c26a78517686313b4bf5591eaab0a197a9de2bcb
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 19:56:50 2013 -0600

    t/op/lc.t: Fix to work under EBCDIC
    
    This had code that attempted this, but it was wrong.  The conversion to
    EBCDIC must be done before the \U, or similar.

M       t/op/lc.t

commit 5c078372112e38c7cea578d36bf923605cde8c15
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 16:37:04 2013 -0600

    XXX fix this later based on comment

M       dist/IO/t/io_utf8argv.t

commit 0fc9f4300daebc6ba88603389c15161fdd957e1e
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 15:33:55 2013 -0600

    Skip some tests under EBCDIC
    
    EBCDIC won't work on these because of inherent differences from ASCII

M       t/porting/customized.t
M       t/porting/manifest.t

commit b195d48cf798ccb7f927bc1284d4ef969a3961cd
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 15:04:14 2013 -0600

    porting/bincompat.t: Skip under EBCDIC
    
    because the sorting order is different

M       t/porting/bincompat.t

commit 43d9f5355a5e99947d713fa57b245a8e5ef036f2
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 14:59:50 2013 -0600

    t/re/regex_sets.t: So will pass under EBCDIC

M       t/re/regex_sets.t

commit d6eaa5aa542062f5d62f8a657365030ac97c8372
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 14:59:26 2013 -0600

    t/porting/bincompat.t: Typo in comment

M       t/porting/bincompat.t

commit 003826548446b97dfe3365953ca99d76e9e6ab4a
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 25 13:09:09 2013 -0600

    XXX fix \x{too large}

M       dist/IO/IO.xs
M       doop.c
M       inline.h
M       pp.c
M       pp_pack.c
M       regcomp.c
M       sv.c
M       toke.c
M       utf8.c
M       utf8.h

commit 1da50cdf2f8599b3dc4a92b4b74d9724328b77f3
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 24 17:59:59 2013 -0600

    mktables: Fix typo in comment
    
    This used a real CTRL-X, instead of $^X

M       lib/unicore/mktables

commit 073a7d186ef27a1f88b51485a2c40f1cdef2936f
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 24 13:16:08 2013 -0600

    utf8.c: Fix so UTF-16 to UTF-8 conversion works under EBCDIC

M       utf8.c

commit b6f80c62b1d073def8fa1412b4c5d0a678653e20
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 24 13:14:34 2013 -0600

    utf8.h, utfebcdic.h: Add #define

M       utf8.h
M       utfebcdic.h

commit 2421111a52702846d5a5c3178d38826ec23fbe9e
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 24 13:11:25 2013 -0600

    utf8.c: Use mnemonics instead of hex numbers

M       utf8.c

commit cb85a949c31d3523708148969c7e08a22544d822
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 20 22:15:58 2013 -0600

    lib/Unicode/UCD.t: Allow to run under EBCDIC,

M       lib/Unicode/UCD.t

commit 3e51eeade80e3ce6dedeb857496eba35afe51e7c
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 15:27:31 2013 -0600

    t/op/quotemeta.t: EBCDIC fixes

M       t/op/quotemeta.t

commit 7b76842e1e14c171a80ecc1af21caa4e2894ee43
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 11:32:55 2013 -0600

    t/re/fold_grind.t: Fixes for EBCDIC

M       t/re/fold_grind.t

commit f87f308bc8e851125ee228c0d9c2ebff5c5c0623
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 11:21:09 2013 -0600

    t/lib/charnames/alias: Fix some EBCDIC problems

M       t/lib/charnames/alias

commit 3c75209d1bbebf85874ef304bd6500f0662eff0c
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 11:20:24 2013 -0600

    t/uni/class.t: Make work on EBCDIC

M       t/uni/class.t

commit 8c4693a99a6c22083e00e0fcd1a17f9f77fc5186
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 11:01:57 2013 -0600

    feature/unicode_strings.t: Fix to work on EBCDIC

M       lib/feature/unicode_strings.t

commit c486c5a0fb8557e26a3c044f999b29caf124b4c4
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 10:12:30 2013 -0600

    regen/regcharclass.pl: make more EBCDIC friendly
    
    One of the possible inputs to this process is a string.  This clarifies
    that it must be specified in Unicode characters, and adds code to
    translate it to native, if necessary.

M       regen/regcharclass.pl

commit 5e92a66e1e3bb528a8956ab5d50be9dfffb981fc
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 10:10:46 2013 -0600

    XXX regen/regcharclass.pl: maybe temp comment out utf8_char

M       regen/regcharclass.pl

commit 2ff621a6a1603631fa187585a2d6946194c4708b
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 19 10:09:53 2013 -0600

    XXX temporary comment out multi-char folds

M       regcomp.c
M       regen/regcharclass.pl
M       regexec.c
M       t/re/fold_grind.t
M       t/re/reg_fold.t

commit aa4938fe4e55528e76b62efc56d72b1a60e1f57e
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 18 22:00:29 2013 -0600

    XXX temp skip perl5db.t

M       lib/perl5db.t

commit aaed4cbdd3a2e0b567b1de6f680528a138aa99a9
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 18 11:45:06 2013 -0600

    pp.c: White-space only
    
    Make a ternary operation more clear

M       pp.c

commit 1eec681671f5286e72b1650d75951534b6cec352
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 18 11:43:42 2013 -0600

    Fix valid_utf8_to_uvchr() for EBCDIC

M       utf8.c

commit 1bfc853569c9cd5d9c053d9ba93c42720d5e1c20
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 17 21:42:20 2013 -0600

    t/test.pl: Add comment about EBCDIC

M       t/test.pl

commit c5286b8a496649fe9c81470dd0a34bb1feefbf90
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 17 17:39:33 2013 -0600

    XXX makedepend.SH: Why does 255 work and 250 not?

M       makedepend.SH

commit 0763d0dab028846302c76846cb8a8d97d4480219
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:48:22 2013 -0600

    XXX regen/mk_PL_charclass.pl: Make EBCDIC friendly
    
    need more of a commit message

M       regen/mk_PL_charclass.pl

commit eaa7fd6a65a09ea01fd6139858f22d8e9c6783d5
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:44:44 2013 -0600

    XXX make various things more EBCDIC friendly
    
    Adds trailing white space errors
    Need to know what to do about ^A meaning 0x1, and M-foo meaning meta

M       lib/DB.pm
M       lib/dumpvar.pl
M       lib/perl5db.pl
M       lib/sigtrap.pm

commit 46d37f258cd1352def18b58e009f374ac9e19d36
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:41:57 2013 -0600

    XXX charnames.t: Make more EBCDIC friendly
    
    Why need utf8::unicode_to_native

M       lib/charnames.t

commit 7d2b927e6dcd67684ae9f3f07ddd1cfc55849c5f
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 22:41:15 2013 -0600

    XXX: Fixup commit message.
    
    Fix UTF8_ACUUMULATE, utf8.c

M       utf8.c
M       utf8.h

commit 4ad649830424caba4d524baa7938be9e5f6389ec
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 16 16:52:45 2013 -0600

    regcomp.c: Fix bug in EBCDIC
    
    The POSIXA and NPOSIXA regnodes need to set the bits on only the ASCII
    code points, but under EBCDIC those code points are 0-127.

M       regcomp.c

commit 12986da3af444fbc830995b7d465c35805fc33a1
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 11:57:24 2013 -0600

    re/charset.t: Allow to work on EBCDIC
    
    This just converts the hard-coded character numbers to native, so will
    work on any platform.

M       t/re/charset.t

commit d7fefdb63ce8aa4ff6d61aed45a3a62c4c837965
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 15 11:50:35 2013 -0600

    XS-APItest/t/handy.t: Change output message
    
    On EBCDIC platforms, the output is not in terms of \N{U+}; change text
    to \x{ }

M       ext/XS-APItest/t/handy.t

commit 618559aaa77ff3422675d6a2762d4f7ded96b051
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 21:44:16 2013 -0600

    XXX Dumper.xs: Don't know why this stopped compiling

M       dist/Data-Dumper/Dumper.xs

commit ad50dbcbfdf1516ad449e230470dc9e787fc15aa
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:22:28 2013 -0600

    toke.c: Fix an ASCII-platform dependency

M       toke.c

commit bb0c89f59f779db8413b6c84fe40bf05b7de6121
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:20:23 2013 -0600

    toke.c: Simplify some code
    
    We don't have to test separately for lower vs uppercase here, as
    upper/lower case A-Z and a-z are not intermixed in the gaps in A-Z and
    a-z under EBCDIC.

M       toke.c

commit 497cbe345758836f34f7c4e65973d077d798c3c2
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:18:12 2013 -0600

    genpacksizetables.pl: Correct comment typo

M       genpacksizetables.pl

commit 7279160b703ca9cd175ec61b186c19145ceae776
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:17:39 2013 -0600

    APItest/t/handy.t: Make EBCDIC-friendly

M       ext/XS-APItest/t/handy.t

commit caafa998602fc1314b7b08a77f2a81c443f84e62
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:16:14 2013 -0600

    Data-Dumper: Make EBCDIC-friendly

M       dist/Data-Dumper/Dumper.xs

commit 35bffc33ab645e7838782814fdb2e967d533ba47
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:14:31 2013 -0600

    sv.c: Make less ASCII-centric

M       sv.c

commit f4e8c34033efdac46bb1957464a3c5939d141b34
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:07:52 2013 -0600

    lib/charnames.t: Make some tests work under EBCDIC

M       lib/charnames.t

commit 042278bfdc6954993ff49058d981c7ea62367f0e
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:05:46 2013 -0600

    dump.c: Make less ASCII-centric:
    
    This has the added advantage of being clearer as to what is going on.

M       dump.c

commit 21680756c5c7ed3e3f99f6643e37ff841ecdb5b9
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 13 16:02:52 2013 -0600

    hv.c: Stop being ASCII-centric
    
    This uses macros which work cross-platform.  This has the added advantge
    that it is much clearer what is going on.

M       hv.c

commit f087d60e9443956376d8958a96ac2c4819758512
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 12 22:34:17 2013 -0600

    t/TEST: Don't bail if fails in t/base unless minitest
    
    In order to completely compile Perl, many modules must have been parsed
    and compiled, so if there is a full perl, we know that things basically
    work.  The purpose of bailing out is that if these supposedly very base
    level functionality tests don't work, there's no point in continuing.
    But over the years, tests of more esoteric functionality have been
    added here, and if one of them doesn't work, it still could be that Perl
    pretty much does work.
    
    I believe it would be best to move such non-basic tests elsewhere, but
    that's work, and hasn't bitten us much so far; this change lessens the
    severity of the biting even more.  Where it will really bite is if
    things are so bad that a full perl binary can't be compiled, and we are
    trying to figure out why using minitest.

M       t/TEST

commit eec5fb872bb781e80b96a10c234b4bbf1360a57b
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 11 15:11:10 2013 -0600

    Added Porting/reorder_charclass_invlists.pl
    
    This program is used too bootstrap perl onto a non-ASCII platform with
    no pre-existing perl.

M       MANIFEST
A       Porting/reorder_charclass_invlists.pl

commit 81980159f91aa40a1370afde416685f69b6b7283
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 10 22:17:31 2013 -0600

    t/base/lex.t: Use char suitable for both ASCII and EBCDIC
    
    \xE2 is 'S' in EBCDIC, and so is going to be legal.  \xDF is an alpha
    which has no ASCII equivalent in either character set

M       t/base/lex.t

commit d42bef1e8a6bcbd80ab74c736d65ce18463c86c0
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 10 13:11:07 2013 -0600

    XXX Temporary comment out ParseXS check
    
    this is to get things to compile for now

M       dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

commit 7365074caa30ceb58cbaa3c44ff8e7a34685acc6
Author: Karl Williamson <[email protected]>
Date:   Sun Mar 10 11:34:10 2013 -0600

    XXX Collate, Normalize: Allow to compile under EBCDIC

M       cpan/Unicode-Collate/Collate.pm
M       cpan/Unicode-Collate/mkheader
M       cpan/Unicode-Normalize/Normalize.pm
M       cpan/Unicode-Normalize/mkheader

commit be2fce06f86a61881156fbe4adf11962893d6b6e
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 9 21:57:38 2013 -0700

    XXX dquote_static.c: Silence wrong warning on EBCDIC
    
    Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
    which should be a separate commit anyway.
    
    This silences an inappropriate warning that doesn't happen on ASCII
    platforms.  CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms.  But
    0x14 is a C1 control on EBCDIC, a C0 on ASCII.  Therefore the test that
    it's a control should include both C0 and C1, which isCNTRL_L1() does.
    
    Also has a white-space change, outdenting a line so it doesn't wrap in
    an 80 column window.

M       dquote_static.c

commit 3b79671771ce92453f272e8e4c5f4199c86dfd14
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 7 12:08:41 2013 -0700

    utfebcdic.h: Change 'unsigned char' to U8
    
    This is for consistency with the rest of Perl

M       utfebcdic.h

commit 6cdfae1555a06d60f172405456e44d63578c951d
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 8 08:11:38 2013 -0700

    regen/regcharclass.pl: Make more EBCDIC-friendly
    
    This commit changes the code generated by the macros so that they work
    right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs.  THEY
    ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
    onto the target platform, and regcharclass.pl can be run there,
    generating macros correct UTF-8.

M       regcharclass.h
M       regen/regcharclass.pl

commit b94eb2a9b4666aeda9d9862c12fdd7ce973fddf1
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:30:01 2013 -0700

    utfebcdic.h: Add (UV) cast
    
    The operand of this macro is implicitly a UV.  Make sure that it is.

M       utfebcdic.h

commit e0eb45f5d5e21382524b388cee5578bde88ff21f
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 17:04:58 2013 -0700

    handy.h: Allow bootstrapping to non-ASCII platform
    
    This adds a bunch of macros and moves things around to support
    conditional compilation when Configure is called with
    -DBOOTSTRAP_CHARSET.  Doing so causes the usual macros that are
    table-driven to not be used, since the table may not be valid when
    bringing Perl up for the first time on a non-ASCII platform.
    
    This allows it to compile using the platform's native C library ctype
    functions, which should work enough to compile miniperl, and allow the
    table to be changed to be valid.  Then Configure can be re-run to not
    bootstrap, and normal compilation can proceed

M       handy.h
M       inline.h

commit ffa03a5f40a5588769048e296701fb081e5bf0be
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:43:26 2013 -0700

    gv.c: Remove EBCDIC dependency

M       gv.c

commit 4cc6cd5866fec07ef8806b1d1456b046a89c219f
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:00:47 2013 -0700

    toke.c: Remove EBCDIC dependency

M       toke.c

commit 8cfec024ebedbc5ba59eba151c55a732d8212eea
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:14:25 2013 -0700

    toke.c: Remove character set dependency
    
    Instead of hard-coding the bit patterns that comprise the Byte Order
    Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
    the current platform.
    
    This removes some EBCDIC-only code.

M       toke.c

commit 6784624dd783704fabcb64eff9aa3cbb7bb14d18
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:10:27 2013 -0700

    unicode_constants.h: Add #defines for Byte Order Mark
    
    These will be used in future commits

M       regen/unicode_constants.pl
M       unicode_constants.h

commit 24c47803eedcb26a34635f02473d936f0f764664
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 15:04:18 2013 -0700

    XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
    
    This macro may not be present, and is currently used exclusively in
    IS_UTF8_CHAR, which itself may be undefined, and code should cope with
    that.  This is a work-around until a better solution is found.

M       utf8.c
M       utf8.h

commit 5c759033be7e87690843893c5806f0197cace8e2
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 14:09:04 2013 -0700

    Add Porting tool for help with non-ASCII platforms
    
    Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
    non-ASCII platform with no working Perl.

M       MANIFEST
A       Porting/reorder_l1_char_class_tab.pl
M       regen/mk_PL_charclass.pl

commit 5354f9d4ef512e6690ed493841ecc97f2166d729
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 13:06:58 2013 -0700

    inline.h: Reorder functions
    
    The comment implied that the functions below it in the file were
    deprecated, but in fact only the next two functions were.  This
    clarifies that and moves them so they are the final ones in the file

M       inline.h

commit b5cbff6bd44cebb2e19b4f3ee6b9aaf65a2b1971
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:33:42 2013 -0700

    utfebcdic.h: Add comment

M       utfebcdic.h

commit cfb6be2d802311f51de1530a332af88c80bd8d77
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:12:11 2013 -0700

    utf8.h: Clean up START_MARK definition and use
    
    The previous definition broke good encapsulation rules.  UTF_START_MARK
    should return something that fits in a byte; it shouldn't be the caller
    that does this.  So the mask is moved into the definition.  This means
    it can apply only to the portion that creates something larger than a
    byte.  Further, the EBCDIC version can be simplified, since 7 is the
    largest possible number of bytes in an EBCDIC UTF8 character.

M       utf8.h
M       utfebcdic.h

commit 07c82166c0dfd9f309490e04b62b78028757bed4
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:05:26 2013 -0700

    utf8.h: Move #includes
    
    These two files were only being #included for non-ebcdic compiles; they
    should be included always.

M       utf8.h

commit feb695e9ba82d6937a2d817c62a8f7484aa3bb54
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 11:49:14 2013 -0700

    utfebcdic.h: Remove extra parameter expansions
    
    These two macros were improperly expanding the parameters as well as
    defining the operation, leading to compile errors.

M       utfebcdic.h

commit 2a50367de7835b49de6626f07199ab71b83401cf
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 1 08:28:52 2013 -0700

    utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
    
    These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
    UTF8_TWO_BYTE_LO.  But the EIGHT_BIT versions can use the less general
    and simpler NATIVE_TO_LATN1 instead of NATIVE_TO_UNI because the input
    domain is restricted in the EIGHT_BIT.  Note that on ASCII platforms,
    these both expand to the same thing, so the difference matters only on
    EBCDIC.

M       utf8.h

commit 189618ca3f331df05bb9540318d53b6a1f689668
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 28 09:25:27 2013 -0700

    XXX temp:  show makedepend cerr

M       makedepend.SH

commit dbfa073d5c41e002400b4b7b53356a8b3f57ce0c
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too long lines; properly join
    
    I had thought that a continuation introduced a space.  But no,
    a continuation can happen in the middle of a token.
    
    And this splits lines that are getting very long to avoid preprocessor
    limitations.

M       makedepend.SH

commit 8fc175095f389ed4b30b44a7adf3997b82da6b38
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only
    
    Align continuation backslashes

M       makedepend.SH

commit 88d280674b681ef078fdb47e8bf56493aa0a0048
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space
    
    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle.  This
    commit removes blanks adjoining comments that get deleted.  This makes
    things somewhat less likely to exceed the limit.
    
    This commit also fixes several [] which were meant to each match a tab
    or a blank, but editors converted the tabs to blanks

M       makedepend.SH

commit de119e1a3b5c2d97511628a77dfa80f0ebcf0839
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments
    
    These comments may actually be necessary.

M       makedepend.SH

commit 253f31c17443efb5db7b17d2dcc2c85a544e296a
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 08:38:19 2013 -0700

    handy.h: Remove extraneous parens

M       handy.h

commit a4c1c9bd7de597fc9f0d52fb4a2812b213d1a28a
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.
    
    John Goodyear <[email protected]> reports that the z/OS C compiler
    supports the attribute keyword, but not exactly the same as gcc.
    Instead of a "warning", the compiler emits an "INFORMATIONAL" message
    that Configure fails to detect.  Until Configure is fixed, just disable
    the attributes altogether.
    
    John Goodyear

M       hints/os390.sh

commit eb459497fd538bf650df03c34b03c94764afa900
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.
    
    Grep appears to be limited to 2048 characters, and truncates
    the output for cppstin.  Fgrep apparently doesn't have that limit.
    Thanks to John Goodyear <[email protected]> for reporting this.

M       hints/os390.sh

commit d09a23d0f5a01660409a2411d35d891f4a478b7e
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:45:19 2013 -0700

    utf8.c: Use more clearly named macro
    
    In the case of invariants these two macros should do the same thing,
    but it seems to me that the latter name more clearly indicates what is
    going on.

M       utf8.c

commit e1e2c39dffd6cad0a13d3d5c507d4df54cd1817c
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:35:12 2013 -0700

    Add macro OFFUNISKIP
    
    This means use official Unicode code point numbering, not native.  Doing
    this converts the existing UNISKIP calls in the code to refer to native
    code points, which is what they meant anyway.  The terminology is
    somewhat ambiguous, but I don't think will cause real confusion.
    NATIVESKIP is also introduced for situations where it is important to be
    precise.

M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit aa8c3c3012a52cd67a6d6e265854658b0d25a3c8
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:22:19 2013 -0700

    toke.c: white space only

M       toke.c

commit 2ed2cd48a7d1c51e2754cbecaeb54660493ef2f7
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 12:08:50 2013 -0700

    utf8.c: Deprecate two functions
    
    This is to force any code that has been using these functions to change.
    Since the Unicode tables are now stored in native order, these functions
    should only rarely be needed.
    
    However, the functionality of these is needed, and in actuality, on
    ASCII platforms, the native functions are #defined to these.  So what
    this commit does is rename the functions to something else, and create
    wrappers with the old names, so that anyone using them will get the
    deprecation.

M       embed.fnc
M       embed.h
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 2c1ccbeeef9fba68b596ec38079e6961c2e8f7d9
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:26:09 2013 -0700

    Deprecate uvuni_to_utf8()
    
    Code should almost never be dealing with non-native code points

M       embed.fnc
M       embed.h
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 02d8a6706fb92d89979c3d0714e7d9140b3af2e0
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:02:33 2013 -0700

    Deprecate utf8_to_uni_buf()
    
    Now that the tables are stored in native order, there is almost no need
    for code to be dealing in Unicode order.

M       embed.fnc
M       proto.h
M       utf8.c

commit a1697544d95978c2379ff795972e4880f68f3733
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code
    
    This causes problems currently for z/OS.  But, since we don't know why
    it was there, I'm leaving it in as a placeholder.

M       makedepend.SH

commit 42abe7c670a392f5c827c2d7d9e4ef39f365953a
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:26:44 2013 -0700

    Deprecate valid_utf8_to_uvuni()
    
    Now that all the tables are stored in native format, there is very
    little reason to use this function; and those who do need this kind of
    functionality should be using the bottom level routine, so as to make it
    clear they are doing nonstandard stuff.

M       embed.fnc
M       proto.h
M       utf8.c

commit a001f01ba7dac7477861be5209392e0cf78bfa7e
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:14:26 2013 -0700

    utf8.c: Swap which fcn wraps the other
    
    This is in preparation for the current wrapee becoming deprecated

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
M       utf8.h

commit a0dfd3deaa20fd1c427cd60e09936c0b70b7bdaf
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:29:34 2013 -0700

    utf8.c: Skip a no-op
    
    Since the value is invariant under both UTF-8 and not, we already have
    it in 'uv'; no need to do anything else to get it

M       utf8.c

commit 81b338a34f9e0de0822bf4603b741d0ae6b69858
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:26:50 2013 -0700

    utf8.c: Move comment to where makes more sense

M       utf8.c

commit b68488ac984c4bd1ecc75f5c1b3110771d1969e9
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:30:10 2013 -0700

    APItest: Test native code points, instead of Unicode

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 52d893b0b080a84b6276d98734dfea9fbea9ee2c
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:25:08 2013 -0700

    XXX CPAN Normalize
    
    This converts Unicode::Normalize to use the native tables that are used
    by Perl starting in XXX, while using the Unicode-ordered ones that were
    used before then.
    
    Another alternative would be to have mktables generate just these tables
    in Unicode ordering.

M       cpan/Unicode-Normalize/Normalize.xs

commit 0919a8c8ca78e5720e1b705b0683f685a8dffac1
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:22:55 2013 -0700

    XXX CPAN prob wrong Collate
    
    This changes to implicity usenative code points.  This is likely wrong,
    as the module comes with its own data, that are probably in terms of
    Unicode

M       cpan/Unicode-Collate/Collate.xs

commit e1ab0d15e9a48d5cd99216eaea0c15201d5ea842
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:12:53 2013 -0700

    XXX CPAN Encode.xs
    
    Use core function if available.  This will insulate this code from any
    future changes.

M       cpan/Encode/Encode.xs

commit 638f5db6f93b6f6b38604a02c4d256d9cba52e51
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:04:24 2013 -0700

    XXX CPAN and unsure Encode

M       cpan/Encode/Encode.xs
M       cpan/Encode/Unicode/Unicode.xs

commit b1b4a8898cc78b14088357d81eacc1ac9d30f2e6
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:00:47 2013 -0700

    XXX CPAN Encode.xs: fix indent

M       cpan/Encode/Encode.xs

commit 835ad6a355310e6319a568aef7ce925167405540
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 17:23:15 2013 -0700

    Don't refer to U+XXXX when mean native
    
    These messages say the output number is Unicode, but it is really
    native, so change to saying is 0xXXXX.

M       regen/regcharclass_multi_char_folds.pl
M       regexec.c

commit 181e16f67b928929ce0fd592806a10c16b21cff5
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:43:59 2013 -0700

    Convert some uvuni() to uvchr()
    
    All the tables are now based on the native character set, so using
    uvuni() in almost all cases is wrong.

M       cygwin/cygwin.c
M       doop.c
M       op.c
M       pp_pack.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c

commit 21bf10bb7fb621369ce11a6dbfc07e38077a681f
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:25:47 2013 -0700

    handy.h: White space only

M       handy.h

commit ad97c384e9644b7d87867f5c6eac3810178d23f9
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:19:49 2013 -0700

    t/test.pl: Allow native/latin1 string conversions to work on utf8.
    
    These functions no longer have the hard-coded definitions in them,
    but now end up resolving to internal functions, so that new encodings
    could be added and these would automatically understand them.
    
    Instead of using tr///, these now go character by character and
    converting to/from ord, which is slower, but allows them to operate on
    utf8 strings.
    
    Peephole optimization should make these essentially no-ops on ascii
    platforms.

M       t/test.pl

commit ea634ecda01ff58e3bf8fd0832492dbceaae6d40
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:05:55 2013 -0700

    t/test.pl: Simplify ord to/from native fcns
    
    This commit changes these functions from converting to/from a string to
    calling utf8:: functions which operate on ordinals instead.

M       t/test.pl

commit eaad2876db01d65249827b3d2ef5535a61f14a44
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:35:38 2013 -0700

    Make casing tables native
    
    These are final tables that haven't been converted to native character
    set casing.

M       perl.h
M       utfebcdic.h

commit ff1e3df145bc4b60f313d307d0b696d350e4a694
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:32:30 2013 -0700

    utfebcdic.h: Remove trailing spaces

M       utfebcdic.h

commit 1b0faa78e65d34ad937520471b4078a12d5b4b4c
Author: Karl Williamson <[email protected]>
Date:   Fri Feb 22 18:55:26 2013 -0700

    EBCDIC has the unicode bug too
    
    We have not had a working modern Perl on EBCDIC for some years.  When I
    started out, comments and code led me to conclude erroneously that
    natively it supported semantics for all 256 characters 0-255.  It turns
    out that I was wrong; it natively (at least on some platforms) has the
    same rules (essentially none) for the characters which don't correspond
    to ASCII onees, as the rules for these on ASCII platforms.
    
    A previous commit for 5.18 changed the docs about this issue.  This
    current commit forces ASCII rules on EBCDIC platforms (even should there
    be one that natively uses all 256).  To get all 256, the same things
    like 'use feature "unicode_strings"' must now be done.

M       handy.h

commit 1e3138c53e263cb56c0c4a3b986af1f631f088c4
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:47:52 2013 -0700

    handy.h: Solve a failure to compile problem under EBCDIC
    
    handy.h is included in files that don't include perl.h, and hence not
    utf8.h.  We can't rely therefore on the ASCII/EBCDIC conversion
    macros being available to us.  The best way to cope is to use the native
    ctype functions.  Most, but not all, of the macros in this commit
    currently resolve to use those native ones, but a future commit will
    change that.

M       handy.h

commit 808efba14a00b6d41a80dbc3712510d2abead5a9
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:35:12 2013 -0700

    handy.h: Simplify some macro definitions
    
    Now, only one of the macros relies on magic numbers (isPRINT), leading
    to clearer definitions.

M       handy.h

commit 73f1343f04455f47faf333b4e9fb7317fd657817
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:26:49 2013 -0700

    handy.h: Combine macros that are same in ASCII, EBCDIC
    
    These 4 macros can have the same RHS for their ASCII and EBCDIC
    versions, so no need to duplicate their definitions
    
    This also enables the EBCDIC versions to not have undefined expansions
    when compiling without perl.h

M       handy.h

commit bf4d55027aa63551947e3a18d6aa9adb00723364
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:39:48 2013 -0700

    Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
    
    These macros are no longer called in the Perl core.  This commit turns
    them into functions so that they can use gcc's deprecation facility.
    
    I believe these were defective right from the beginning, and I have
    struggled to understand what's going on.  From the name, it appears
    NATIVE_TO_NEED taks a native byte and turns it into UTF-8 if the
    appropriate parameter indicates that.  But that is impossible to do
    correctly from that API, as for variant characters, it needs to return
    two bytes.  It could only work correctly if ch is an I8 byte, which
    isn't native, and hence the name would be wrong.
    
    Similar arguments for ASCII_TO_NEED.
    
    The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
    does what I think NATIVE_TO_NEED intended.

M       embed.fnc
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.h
M       utfebcdic.h

commit 927bf338cf1cf6fbc53e8d682ed738186e166752
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:26:43 2013 -0700

    Remove remaining calls of NATIVE_TO_NEED
    
    These calls are just copying the input to the output byte by byte.
    There is no need to worry about UTF-8 or not, as the output is just an
    exact copy of the input

M       toke.c

commit fcd3bb989fc588abbf2d470981c7de81eaf1665f
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:12:15 2013 -0700

    toke.c: Remove some NATIVE_TO_NEED calls
    
    I believe NATIVE_TO_NEED is defective, and will remove it in a future
    commit.  But, just in case I'm wrong, I'm doing it in small steps so
    bisects will show the culprit.  This removes the calls to it where the
    parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
    result can't be other than just the parameter.

M       toke.c

commit ac8e135e6fdf9b5832af054aa0d559ae14c7c6ba
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:22:07 2013 -0700

    toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
    
    This code is attempting to deal with the problem of holes in the ranges
    a-z and A-Z in EBCDIC.  Prior to this patch, it accepeted things like A
    WITH GRAVE, etc, which shouldn't have the special processing to deal
    with the holes

M       toke.c

commit 486af408521246dce85d1f9a2dc20dc47028be26
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 19 15:13:19 2013 -0700

    Use real illegal UTF-8 byte
    
    The code here was wrong in assuming that \xFF is not legal in UTF-8
    encoded strings.  It currently doesn't work due to a bug, but that may
    eventually be fixed: [perl #116867].  The comments are also wrong that
    all bytes are legal in UTF-EBCDIC.
    
    It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
    (C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
    of an illegal overlong sequence.
    
    This creates a #define for an illegal byte using one of the real illegal
    ones, and changes the code to use that.
    
    No test is included due to #116867.

M       op.c
M       toke.c
M       utf8.h

commit 8f081a7236ff5513fe24a4e270cc89908806bf71
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 14:00:13 2013 -0700

    toke.c: Don't remap \N{} for EBCDIC
    
    Everything is now in native,

M       toke.c

commit fe0abada5f5c95f25e4bbeb36210f34ec0630d27
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:50:45 2013 -0700

    toke.c: Remove remapping for EBCDIC for octal
    
    The code prior to this commit converted something like \04 into its
    EBCDIC equivalent only in double-quoted strings.  This was not done in
    patterns, and so gave inconsistent results.  The correct thing to do
    should be to do the native thing, what someone who works on a platform
    would think \04 do.  Platform independent characters are available
    through \N{}, either by name or by U+.
    
    The comment changed by this was wrong, as in some cases it was native,
    and in some cases Unicode.

M       toke.c

commit 5e01255a91ab53399921bc02114186ce35b57191
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:47:13 2013 -0700

    Remove EBCDIC remappings
    
    Now that the tables are stored in native format, we shouldn't be doing
    remapping.
    
    Note that this assumes that the Latin1 casing tables are stored in
    native order; not all of this has been done yet.

M       handy.h
M       perly.c
M       pp.c
M       regcomp.c
M       regexec.c
M       utf8.c

commit 4917d3d2e2eb04b80ea165aa9923a2418583b940
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 12:46:05 2013 -0700

    Add and use macro to return EBCDIC
    
    The conversion from UTF-8 to code point should generally be to the
    native code point.  This adds a macro to do that, and converts the
    core calls to the existing macro to use the new one instead.  The old
    macro is retained for possible backwards compatibility, though it
    probably should be deprecated.

M       handy.h
M       pp.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c
M       utf8.h

commit b4b1e68c2a9344d0f7b05c25e2b5efe692573c1c
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 09:18:06 2013 -0700

    charnames: fix nit in comment

M       lib/_charnames.pm

commit c09a71ec04c75d392aa81a9311f95a341974ee78
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 11:05:44 2013 -0700

    charnames: Make work in EBCDIC
    
    Now that mktables generates native tables, the only thing that was
    needed was to make U+ mean Unicode instead of native.

M       lib/_charnames.pm
M       lib/charnames.pm

commit e217bb4903778fa8ea4f82c1e313724a6bf99741
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 09:35:56 2013 -0700

    Unicode::UCD: Work on non-ASCII platforms
    
    Now that mktables generates native tables, it is a fairly simple matter
    to get Unicode::UCD to work on those platforms.

M       lib/Unicode/UCD.pm

commit cfa5852055a5d2b6e63adf5604ba0aa359e5cf2c
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 27 17:01:24 2013 -0600

    Unicode::UCD: Typo in comment

M       lib/Unicode/UCD.pm

commit 8bb9ab983f28d7c310b969b660c7c6e6d5f9b1b6
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 22:16:38 2013 -0700

    mktables: Generate native code-point tables
    
    The output tables for mktables are now in the platform's native
    character set.  This means there is no change for ASCII platforms, but
    is a change for EBCDIC ones.
    
    Since we currently don't have any EBCDIC test platforms, I tested this
    by faking it out to generate EBCDIC data, and then eye-balled the
    results.
    
    Code that didn't realize there was a potential difference between EBCDIC
    and non-EBCDIC platforms will now start to work; code that tried to do
    the right thing under these circumstances will no longer work.  Fixing
    that comes in later commits.

M       lib/unicore/mktables

commit 54bbb311ee53eee48b41d4aa0f7926534de32c6a
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 10:50:00 2013 -0700

    Fix some EBCDIC problems
    
    These spots have native code points, so should be using the macros for
    native code points, instead of Unicode ones.

M       regcomp.c
M       sv.c
M       toke.c

commit 11b65aba37003f3fb0efd391e3ec1274193e5e2c
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:10:19 2013 -0700

    Remove unnecessary temp variable in converting to UTF-8
    
    These areas of code included a temporary that is unnecessary.

M       inline.h
M       regcomp.c
M       sv.c

commit 3270112deb08fbcb6c5a9649c255ed912eb4291d
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:00:55 2013 -0700

    utf8.h: Correct macros for EBCDIC
    
    These macros were incorrect for EBCDIC.  The 3 step process given in
    utfebcdic.h wasn't being followed.

M       utf8.h

commit e91c781380941a9903120dd6f09710095293c1e5
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 9 21:23:30 2013 -0700

    Extract common code to an inline function
    
    This fairly short paradigm is repeated in several places; a later commit
    will improve it.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_pack.c
M       proto.h
M       sv.c
M       toke.c
M       utf8.c

commit 185a63c6de79c6cd8c7365ab9b0acaf8d5dfca94
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 21:35:57 2013 -0700

    Don't use EBCDIC macro for a C language escape
    
    C recognizes '\a' (for BEL); just use that instead of a look-up.
    
    regen/unicode_constants.pl could be used to generate the character for
    the ESC (set in surrounding code), but I didn't do that because of
    potential bootstrapping problems when porting to an EBCDIC platform
    without a working perl.  (The other characters generated in that .pl are
    less likely to cause problems when compiling perl.)

M       regcomp.c
M       toke.c

commit ec33453cc6aea1ace500e9a9d5e7885dbd08abb8
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 19:53:38 2013 -0700

    Use byte domain EBCDIC/LATIN1 macro where appropriate
    
    The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
    whole Unicode range.  In the locations affected by this commit, it is
    known that the domain is limited to a single byte, so the simpler ones
    whose names contain LATIN1 may be used.
    
    On ASCII platforms, all the macros are null, so there is no effective
    change.

M       handy.h
M       regcomp.c
M       utf8.c

commit 121d165d1ab39d4429c7013f68f568c9f36c7460
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 14:31:09 2013 -0700

    Use new clearer named #defines
    
    This converts several areas of code to use the more clearly named macros
    introduced in a recent commit

M       op.c
M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 60a8e87c9b2594aafbc0ba67266b77bcd9268d14
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 13:52:31 2013 -0700

    utf8.h, utfebcdic.h: Create less confusing #defines
    
    This commit creates macros whose names mean something to me, and I don't
    find confusing.  The older names are retained for backwards
    compatibility.  Future commits will fix bugs I introduced from
    misunderstanding the meaning of the older names.
    
    The older names are now #defined in terms of the newer ones, and moved
    so that they are only defined once, valid for both ASCII and EBCDIC
    platforms.

M       utf8.h
M       utfebcdic.h

commit 3c99c96b6594d6677f205609478e89d89bd1587c
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 4 14:22:02 2013 -0700

    pp_ctl.c: Use isCNTRL instead of hard-coded mask
    
    This is clearer and portable to EBCDIC.

M       pp_ctl.c

commit 18337bf95a417fbf0568ec68d362ab0b6a16fe03
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:51:05 2013 -0700

    utf8.c: is_utf8_char_slow() should use native length
    
    What is passed is the actual length of the native utf8 character.  What
    this was calculating was the length it would be if it were a Unicode
    character, and then compares, apples to oranges.

M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository

[perl.git] branch khw/ebcdic, created. v5.17.10-195-g939f924

Reply via email to