In perl.git, the branch khw/ebcdic has been created

<http://perl5.git.perl.org/perl.git/commitdiff/1666902a7f0e665fde904483048433402fef70b8?hp=0000000000000000000000000000000000000000>

        at  1666902a7f0e665fde904483048433402fef70b8 (commit)

- Log -----------------------------------------------------------------
commit 1666902a7f0e665fde904483048433402fef70b8
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 7 12:08:41 2013 -0700

    XXXu8

M       utfebcdic.h

commit 5862ed4338593c1d59b0178f88240a6444392f60
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:32:42 2013 -0700

    Revert "XXX get Configure to work on Linux"
    
    This reverts commit 587944ecf24503eddf45df4acf45ae60da17030d.

M       Configure

commit d3ecc9881bd8db224deb511f7a64fc900e812408
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:09:29 2013 -0700

    XXX get Configure to work on Linux

M       Configure

commit 11ecd11c92b627092d3412cc613b221f4bec8be5
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 8 08:11:38 2013 -0700

    l

M       regcharclass.h
M       regen/regcharclass.pl

commit e19baab085c00bc58dc375dc73655a6cccfb8143
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:47:21 2013 -0700

    XXX: Turn off debug tracing in perly.c
    
    This is somehow getting into lib/buildcustomize.pl

M       perly.c

commit 8d567935eceebfd6a8a43f05ee52359d8a6f1a0a
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:30:01 2013 -0700

    XXX: rebase: Add cast

M       utfebcdic.h

commit 2c9ce6f26574ba2e744321c81f6fe43f72ef341e
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 17:04:58 2013 -0700

    XXXtemp: Use native, canned values for isFOO()

M       handy.h

commit bdfea9b6e804071473e9ee595d59858e50140563
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 5 10:36:07 2013 -0700

    XXX Enable lex debugging without -DDEBUGGING

M       perly.c

commit ccdd424f7f7d982bae9705fa9ee97bc19d18cbc4
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 19:16:31 2013 -0700

    XXX: perly.c: Reinstate some ebcdic code
    
    This is an experiment to see if this fixes things

M       perly.c

commit b57d3780482a97974ec0b2a88343531fec7100bd
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:43:26 2013 -0700

    gv.c: Remove EBCDIC dependency

M       gv.c

commit 4a0bcad6671e62ca6a2d4c8c7d9d1d91fe52f659
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:00:47 2013 -0700

    toke.c: Remove EBCDIC dependency

M       toke.c

commit d0e4ed97768a7b025803b3fe2055e04bdf3e32a0
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:14:25 2013 -0700

    toke.c: Remove character set dependency
    
    Instead of hard-coding the bit patterns that comprise the Byte Order
    Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
    the current platform.
    
    This removes some EBCDIC-only code.

M       toke.c

commit afc05107ebad7c275487f943ed03cfebd6aca1e9
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:10:27 2013 -0700

    unicode_constants.h: Add #defines for Byte Order Mark
    
    These will be used in future commits

M       regen/unicode_constants.pl
M       unicode_constants.h

commit 370920f0a05e0d34b3b262e35e3b2615bfdbd4e7
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 20:53:04 2013 -0700

    regen/unicode_constants.pl: Change #define name
    
    This was added in the 5.17 series so there's no code relying on its
    current name.  I think that the abbreviation is clearer.

M       regen/unicode_constants.pl
M       unicode_constants.h
M       x2p/a2py.c

commit 909dc5300ce1e0a82c29fd90923cd3428896448b
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 20:43:56 2013 -0700

    regen/unicode_constants.pl: Make portable to non-ASCII
    
    This now uses the U+ notation to indicate code points, which is
    unambiguous no matter what the platform's character set is.  (charnames
    accepts the U+ notation)

M       regen/unicode_constants.pl
M       unicode_constants.h

commit 489b5502109a4553221369a191d113a31aa8ed3f
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 20:29:33 2013 -0700

    regen/unicode_constants.pl: Remove unused constant
    
    This was added in the 5.17 series, so it can't yet be in the field, and
    it isn't needed.

M       regen/unicode_constants.pl
M       unicode_constants.h

commit c27bd4fa16d55302b3cb55c1752f1196b66a20f7
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 19:28:43 2013 -0700

    regen/unicode_constants.pl: Pass through input comments
    
    The data can now have comments, which are converted to C and passed
    through

M       regen/unicode_constants.pl

commit d53e53b4e1ba0b278245bba26d92c33d9dc13aa2
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 19:19:02 2013 -0700

    regen/unicode_constants.pl: Convert '-' in names to '_'
    
    Unicode character names can have dashes in them.  These aren't accepted
    in C macro names.  Change so both blanks and the hyphen-minus are
    converted to underscores.

M       regen/unicode_constants.pl
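
The name conversion described above can be sketched in C as follows.  This
is only an illustration of the rule (blank or hyphen-minus becomes an
underscore); name_to_macro() is a hypothetical helper, not code from
regen/unicode_constants.pl, which is a Perl script.

```c
#include <stddef.h>

/* Hypothetical helper: copy a Unicode character name into 'out', replacing
 * each blank and each hyphen-minus with an underscore so that the result is
 * usable as a C macro name.  'out' must have room for strlen(name) + 1
 * bytes. */
static void name_to_macro(const char *name, char *out)
{
    size_t i;
    for (i = 0; name[i] != '\0'; i++) {
        out[i] = (name[i] == ' ' || name[i] == '-') ? '_' : name[i];
    }
    out[i] = '\0';
}
```

Given a buffer of at least 16 bytes, name_to_macro("BYTE ORDER MARK", buf)
leaves "BYTE_ORDER_MARK" in buf.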

commit 7a95b8f019e1f10ee01e451aa12ad463aaf758ef
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 15:04:18 2013 -0700

    XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
    
    This macro may not be present, and is currently used exclusively in
    IS_UTF8_CHAR, which itself may be undefined, and code should cope with
    that.  This is a work-around until a better solution is found.

M       utf8.c
M       utf8.h

commit c8c80721be763feb018f4d11a81fa92b45077cb8
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 14:09:04 2013 -0700

    Add Porting tool for help with non-ASCII platforms
    
    Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
    non-ASCII platform with no working Perl.

M       MANIFEST
A       Porting/reorder_l1_char_class_tab.pl

commit f800d0aca07e8397f2b8a5b9c88664b32501716d
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 13:06:58 2013 -0700

    inline.h: Reorder functions
    
    The comment implied that the functions below it in the file were
    deprecated, but in fact only the next two functions were.  This
    clarifies that and moves them so they are the final ones in the file

M       inline.h

commit bfbda92d5558694bac12861bedf2a615af923ac1
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:33:42 2013 -0700

    utfebcdic.h: Add comment

M       utfebcdic.h

commit 9b02083051e2df3c5a34d514aee6dc39e17f8b66
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 12:31:25 2013 -0700

    XXX Temporary for z/OS long long support

M       Configure
M       hints/os390.sh

commit 4b3e1d5c9a18e16aeeb9de16c8b9a50b361b9cfa
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:12:11 2013 -0700

    utf8.h: Clean up START_MARK definition and use
    
    The previous definition broke good encapsulation rules.  UTF_START_MARK
    should return something that fits in a byte; it shouldn't be the caller
    that does this.  So the mask is moved into the definition.  This means
    it can apply only to the portion that creates something larger than a
    byte.  Further, the EBCDIC version can be simplified, since 7 is the
    largest possible number of bytes in an EBCDIC UTF8 character.

M       utf8.h
M       utfebcdic.h
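
A minimal sketch of the encapsulation point made above, assuming a
start-byte marker built by shifting 0xFE.  START_MARK_SKETCH is a
hypothetical name used for illustration and is not claimed to match the
definition in utf8.h or utfebcdic.h.

```c
/* Hypothetical stand-in for a start-byte marker macro.  For a sequence of
 * 'len' bytes (2..7), the start byte has its top 'len' bits set, followed
 * by a 0 bit.  The shift yields a value wider than a byte, so the 0xFF
 * mask is applied here, inside the definition, rather than by each
 * caller. */
#define START_MARK_SKETCH(len) \
    (((len) > 7) ? 0xFF : (0xFF & (0xFE << (7 - (len)))))

/* Thin wrapper so the macro can be exercised like a function. */
static unsigned int start_mark_sketch(int len)
{
    return START_MARK_SKETCH(len);
}
```

With this definition a two-byte sequence gets the marker 0xC0 and a
three-byte sequence gets 0xE0, and every result fits in a byte without the
caller having to mask it.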

commit 85322e70306ca3ce06fd9d379a251c9c0c96220e
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:05:26 2013 -0700

    utf8.h: Move #includes
    
    These two files were only being #included for non-ebcdic compiles; they
    should be included always.

M       utf8.h

commit 7cf7364cdae81ae4f797b14bde51f965e238d60f
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 11:49:14 2013 -0700

    utfebcdic.h: Remove extra parameter expansions
    
    These two macros were improperly expanding the parameters as well as
    defining the operation, leading to compile errors.

M       utfebcdic.h

commit b2796e5e5c9100406af006082bde5f45004da0ce
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 1 08:28:52 2013 -0700

    utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
    
    These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
    UTF8_TWO_BYTE_LO.  But the EIGHT_BIT versions can use the less general
    and simpler NATIVE_TO_LATIN1 instead of NATIVE_TO_UNI because the input
    domain is restricted in the EIGHT_BIT.  Note that on ASCII platforms,
    these both expand to the same thing, so the difference matters only on
    EBCDIC.

M       utf8.h

commit 2aef1014f1e8fbdd1dd0a36d0585eecb1f3c83c6
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 28 21:34:38 2013 -0700

    XXX temp: makedepend.SH \{1000\} doesn't work on z/OS
    
    This tries 500 instead.  We'll keep going down until we get a number
    that works.

M       makedepend.SH

commit 6267c19421c706e544c5f63e068f9890c9e28fbf
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 28 09:25:27 2013 -0700

    XXX temp:  show makedepend cerr

M       makedepend.SH

commit 49472ae5de46cefd443f7cfab0ac94583a440b74
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too long lines; properly join
    
    I had thought that a continuation introduced a space.  But no,
    a continuation can happen in the middle of a token.
    
    And this splits lines that are getting very long to avoid preprocessor
    limitations.

M       makedepend.SH
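
The point about continuations can be seen directly in C, where a
backslash-newline is deleted before tokenization and no space is inserted
at the join.  The macro and function names below are made up for
illustration.

```c
/* The backslash-newline below is removed before the line is tokenized, so
 * the two physical lines form the single logical line
 * "#define SPLICED_VALUE 42".  The continuation falls in the middle of the
 * macro name, and the second line starts in column one. */
#define SPLICED_\
VALUE 42

static int spliced_value(void)
{
    return SPLICED_VALUE;
}
```

A tool that joins continued lines therefore has to concatenate them
exactly; adding a blank at the join would turn SPLICED_VALUE into two
tokens.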

commit e1a89034961dbc6597375261ecf2137393041541
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only
    
    Align continuation backslashes

M       makedepend.SH

commit f1d056ba2f5cf7c4729ee624a594e8a19b313a01
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space
    
    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle.  This
    commit removes blanks adjoining comments that get deleted.  This makes
    things somewhat less likely to exceed the limit.
    
    This commit also fixes several [] which were meant to each match a tab
    or a blank, but editors converted the tabs to blanks

M       makedepend.SH

commit c9246e5d0e3e21bfe24bb7c08e61b8257a62a5c4
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments
    
    These comments may actually be necessary.

M       makedepend.SH

commit d644e7ec2e6c9219430a8aeaf524cf037b3d9cdf
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 08:38:19 2013 -0700

    handy.h: Remove extraneous parens

M       handy.h

commit dd3bf60ccfb92fb4a69fd73cd0044ebfd8b12182
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.
    
    John Goodyear <[email protected]> reports that the z/OS C compiler
    supports the attribute keyword, but not exactly the same as gcc.
    Instead of a "warning", the compiler emits an "INFORMATIONAL" message
    that Configure fails to detect.  Until Configure is fixed, just disable
    the attributes altogether.
    
    John Goodyear

M       hints/os390.sh

commit 5aaa4dd964d09418d9abd24fb851de2c9b266fc0
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.
    
    Grep appears to be limited to 2048 characters, and truncates
    the output for cppstdin.  Fgrep apparently doesn't have that limit.
    Thanks to John Goodyear <[email protected]> for reporting this.

M       hints/os390.sh

commit 964b0eed9ba2b581183179e6bc541fa12dd9cf9e
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:45:19 2013 -0700

    utf8.c: Use more clearly named macro
    
    In the case of invariants these two macros should do the same thing,
    but it seems to me that the latter name more clearly indicates what is
    going on.

M       utf8.c

commit 77cfe4a263ec2abf1a02d81e9802121f992eb6be
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:35:12 2013 -0700

    Add macro OFFUNISKIP
    
    This means use official Unicode code point numbering, not native.  Doing
    this converts the existing UNISKIP calls in the code to refer to native
    code points, which is what they meant anyway.  The terminology is
    somewhat ambiguous, but I don't think will cause real confusion.
    NATIVESKIP is also introduced for situations where it is important to be
    precise.

M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 8018c873df17e625f9c418b9e4d3a0d1a329e238
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:22:19 2013 -0700

    toke.c: white space only

M       toke.c

commit d4b16339ba66cd932a9bdf5b2bc18a62b355abb1
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 12:08:50 2013 -0700

    utf8.c: Deprecate two functions
    
    This is to force any code that has been using these functions to change.
    Since the Unicode tables are now stored in native order, these functions
    should only rarely be needed.
    
    However, the functionality of these is needed, and in actuality, on
    ASCII platforms, the native functions are #defined to these.  So what
    this commit does is rename the functions to something else, and create
    wrappers with the old names, so that anyone using them will get the
    deprecation.

M       embed.fnc
M       embed.h
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 4d0bc895c403595c2462dbef33c3417384c0828b
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:26:09 2013 -0700

    Deprecate uvuni_to_utf8()
    
    Code should almost never be dealing with non-native code points

M       embed.fnc
M       embed.h
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 8fae5f78e1fe84115e71a89ad15dac75ece13b60
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:02:33 2013 -0700

    Deprecate utf8_to_uni_buf()
    
    Now that the tables are stored in native order, there is almost no need
    for code to be dealing in Unicode order.

M       embed.fnc
M       proto.h
M       utf8.c

commit 1266b620adbf02c0e1b80b015f0952c2a49517a1
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code
    
    This causes problems currently for z/OS.  But, since we don't know why
    it was there, I'm leaving it in as a placeholder.

M       makedepend.SH

commit 2fbbf5da9507193bfcc3d80e9f29cbf5b2b8cc04
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:26:44 2013 -0700

    Deprecate valid_utf8_to_uvuni()
    
    Now that all the tables are stored in native format, there is very
    little reason to use this function; and those who do need this kind of
    functionality should be using the bottom level routine, so as to make it
    clear they are doing nonstandard stuff.

M       embed.fnc
M       proto.h
M       utf8.c

commit 2ee912c297b9b805325b94cb2ae711a651d28a50
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:14:26 2013 -0700

    utf8.c: Swap which fcn wraps the other
    
    This is in preparation for the current wrapee becoming deprecated

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
M       utf8.h

commit 331dadac0f476906674c11845f52a74df396de95
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:29:34 2013 -0700

    utf8.c: Skip a no-op
    
    Since the value is invariant under both UTF-8 and not, we already have
    it in 'uv'; no need to do anything else to get it

M       utf8.c

commit 24cecd91ae5ca8a23a1f803136db676e10cfe2e7
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:26:50 2013 -0700

    utf8.c: Move comment to where makes more sense

M       utf8.c

commit 90382955d141821e0c43afe8d8296fab04f298e4
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:30:10 2013 -0700

    APItest: Test native code points, instead of Unicode

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 46b3759bec510089913575d9e27f61cec9fceba8
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:25:08 2013 -0700

    XXX CPAN Normalize
    
    This converts Unicode::Normalize to use the native tables that are used
    by Perl starting in XXX, while using the Unicode-ordered ones that were
    used before then.
    
    Another alternative would be to have mktables generate just these tables
    in Unicode ordering.

M       cpan/Unicode-Normalize/Normalize.xs

commit 416de2aad2ace2fe067b2196b9f1d16587dc9537
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:22:55 2013 -0700

    XXX CPAN prob wrong Collate
    
    This changes to implicitly use native code points.  This is likely wrong,
    as the module comes with its own data, that are probably in terms of
    Unicode

M       cpan/Unicode-Collate/Collate.xs

commit a4a49beb29bad9ec78c4724dc91ae8bd39b92bf0
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:12:53 2013 -0700

    XXX CPAN Encode.xs
    
    Use core function if available.  This will insulate this code from any
    future changes.

M       cpan/Encode/Encode.xs

commit 29a8d2b2b8c866ad418fa17123693643c3d99e47
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:04:24 2013 -0700

    XXX CPAN and unsure Encode

M       cpan/Encode/Encode.xs
M       cpan/Encode/Unicode/Unicode.xs

commit 903f85fbf9e9c5ad5c176da02065897144097702
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:00:47 2013 -0700

    XXX CPAN Encode.xs: fix indent

M       cpan/Encode/Encode.xs

commit 6602048a9aaec01ef1dcf5f27230f188b86e4745
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 17:23:15 2013 -0700

    Don't refer to U+XXXX when mean native
    
    These messages say the output number is Unicode, but it is really
    native, so change them to say 0xXXXX instead.

M       regen/regcharclass_multi_char_folds.pl
M       regexec.c

commit 1de86e61350d4fbbea2a9a8f39bd19443e21e75c
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:43:59 2013 -0700

    Convert some uvuni() to uvchr()
    
    All the tables are now based on the native character set, so using
    uvuni() in almost all cases is wrong.

M       cygwin/cygwin.c
M       doop.c
M       op.c
M       pp_pack.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c

commit bd481fc4b7254b0327fd3f47e71a572ec732bba2
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:25:47 2013 -0700

    handy.h: White space only

M       handy.h

commit 2a2492708f137a0ce38700fad7029fff7d57a8bd
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:19:49 2013 -0700

    t/test.pl: Allow native/latin1 string conversions to work on utf8.
    
    These functions no longer have the hard-coded definitions in them,
    but now end up resolving to internal functions, so that new encodings
    could be added and these would automatically understand them.
    
    Instead of using tr///, these now go character by character,
    converting to/from ord, which is slower, but allows them to operate on
    utf8 strings.
    
    Peephole optimization should make these essentially no-ops on ascii
    platforms.

M       t/test.pl

commit 2e30485fc8ba1e91eb2563672a535a1746cb8b5c
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:05:55 2013 -0700

    t/test.pl: Simplify ord to/from native fcns
    
    This commit changes these functions from converting to/from a string to
    calling utf8:: functions which operate on ordinals instead.

M       t/test.pl

commit 5ff0a67a2e15cd104a300b6baed8266ea85a1044
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:35:38 2013 -0700

    Make casing tables native
    
    These are the final tables that hadn't yet been converted to native
    character set casing.

M       perl.h
M       utfebcdic.h

commit ea3f4c8f0d8a9dee8fd4013855b78bcbd853a0ae
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:32:30 2013 -0700

    utfebcdic.h: Remove trailing spaces

M       utfebcdic.h

commit 88fe12aa11468daa24cb856699f0d8a48edf0355
Author: Karl Williamson <[email protected]>
Date:   Fri Feb 22 18:55:26 2013 -0700

    EBCDIC has the unicode bug too
    
    We have not had a working modern Perl on EBCDIC for some years.  When I
    started out, comments and code led me to conclude erroneously that
    natively it supported semantics for all 256 characters 0-255.  It turns
    out that I was wrong; it natively (at least on some platforms) has the
    same rules (essentially none) for the characters which don't correspond
    to ASCII ones, as the rules for these on ASCII platforms.
    
    This commit forces those rules on EBCDIC platforms (even should there be
    one that natively uses all 256).  To get all 256, the same things like
    'use feature "unicode_strings"' must now be done.

M       autodoc.pl
M       handy.h
M       pod/perlfunc.pod
M       pod/perlre.pod
M       pod/perlrecharclass.pod
M       pod/perlunicode.pod
M       pod/perlunifaq.pod

commit cda864f5944f770f3f9c780b1dc71323dd003e6f
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:47:52 2013 -0700

    handy.h: Solve a failure to compile problem under EBCDIC
    
    handy.h is included in files that don't include perl.h, and hence not
    utf8.h.  We can't rely therefore on the ASCII/EBCDIC conversion
    macros being available to us.  The best way to cope is to use the native
    ctype functions.  Most, but not all, of the macros in this commit
    currently resolve to use those native ones, but a future commit will
    change that.

M       handy.h

commit 0b61494f87ec00e33ac364d9ba979bbe74d81e8e
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:35:12 2013 -0700

    handy.h: Simplify some macro definitions
    
    Now, only one of the macros relies on magic numbers (isPRINT), leading
    to clearer definitions.

M       handy.h

commit 267f39dea998502b9e7101d2d04bc0b0bf7c8b80
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:26:49 2013 -0700

    handy.h: Combine macros that are same in ASCII, EBCDIC
    
    These 4 macros can have the same RHS for their ASCII and EBCDIC
    versions, so no need to duplicate their definitions
    
    This also enables the EBCDIC versions to not have undefined expansions
    when compiling without perl.h

M       handy.h

commit b6e687b3f446a971e336bda5aee730e7423cbff3
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:39:48 2013 -0700

    Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
    
    These macros are no longer called in the Perl core.  This commit turns
    them into functions so that they can use gcc's deprecation facility.
    
    I believe these were defective right from the beginning, and I have
    struggled to understand what's going on.  From the name, it appears
    NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
    appropriate parameter indicates that.  But that is impossible to do
    correctly from that API, as for variant characters, it needs to return
    two bytes.  It could only work correctly if ch is an I8 byte, which
    isn't native, and hence the name would be wrong.
    
    Similar arguments for ASCII_TO_NEED.
    
    The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
    does what I think NATIVE_TO_NEED intended.

M       embed.fnc
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.h
M       utfebcdic.h

commit c93f281bcce3de7214889e3bfcf65265fdc42e67
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:26:43 2013 -0700

    Remove remaining calls of NATIVE_TO_NEED
    
    These calls are just copying the input to the output byte by byte.
    There is no need to worry about UTF-8 or not, as the output is just an
    exact copy of the input

M       toke.c

commit 4d9d049f2ba63f41df8b43332a6b5f0545a78a14
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:12:15 2013 -0700

    toke.c: Remove some NATIVE_TO_NEED calls
    
    I believe NATIVE_TO_NEED is defective, and will remove it in a future
    commit.  But, just in case I'm wrong, I'm doing it in small steps so
    bisects will show the culprit.  This removes the calls to it where the
    parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
    result can't be other than just the parameter.

M       toke.c

commit e1fcc682bd6c16a1d161a8a3cd40b6ba15d91d8b
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:22:07 2013 -0700

    toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
    
    This code is attempting to deal with the problem of holes in the ranges
    a-z and A-Z in EBCDIC.  Prior to this patch, it accepted things like A
    WITH GRAVE, etc, which shouldn't have the special processing to deal
    with the holes

M       toke.c
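
A sketch of the holes mentioned above, assuming the byte values of an
EBCDIC code page such as 037 or 1047, where the lowercase ASCII letters sit
in three separate runs.  Both function names are hypothetical; this is not
the toke.c code.

```c
/* Sketch only: byte values assume an EBCDIC code page such as 037 or 1047,
 * in which the lowercase ASCII letters occupy three separate runs. */
static int is_ebcdic_lower_ascii(unsigned char c)
{
    return (c >= 0x81 && c <= 0x89)   /* a-i */
        || (c >= 0x91 && c <= 0x99)   /* j-r */
        || (c >= 0xA2 && c <= 0xA9);  /* s-z */
}

/* A naive range test treats every byte from 'a' (0x81) through 'z' (0xA9)
 * as a letter, including the bytes that fall in the holes between the
 * runs. */
static int naive_in_a_to_z(unsigned char c)
{
    return c >= 0x81 && c <= 0xA9;
}
```

Byte 0x8A lies between 'i' and 'j': the naive test accepts it, while the
run-aware test rejects it.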

commit 878483481276aedd0566edeef499b8ad406e4d4f
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 19 15:13:19 2013 -0700

    Use real illegal UTF-8 byte
    
    The code here was wrong in assuming that \xFF is not legal in UTF-8
    encoded strings.  It currently doesn't work due to a bug, but that may
    eventually be fixed: [perl #116867].  The comments are also wrong that
    all bytes are legal in UTF-EBCDIC.
    
    It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
    (C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
    of an illegal overlong sequence.
    
    This creates a #define for an illegal byte using one of the real illegal
    ones, and changes the code to use that.
    
    No test is included due to #116867.

M       op.c
M       toke.c
M       utf8.h
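
Why C0 and C1 never appear can be checked with a small sketch of ordinary
two-byte UTF-8 on an ASCII platform (UTF-EBCDIC is not modelled here).  The
function names are hypothetical.

```c
/* Start byte of the standard two-byte UTF-8 encoding of a code point in
 * the range 0x80..0x7FF: the bits 110xxxxx, where xxxxx are the top five
 * payload bits. */
static unsigned char two_byte_start(unsigned int cp)
{
    return (unsigned char)(0xC0 | (cp >> 6));
}

/* Smallest start byte over all code points that actually need two bytes. */
static unsigned char min_two_byte_start(void)
{
    unsigned char min = 0xFF;
    unsigned int cp;
    for (cp = 0x80; cp <= 0x7FF; cp++) {
        unsigned char b = two_byte_start(cp);
        if (b < min)
            min = b;
    }
    return min;
}
```

The minimum is 0xC2.  A start byte of C0 or C1 could only carry a code
point below 0x80, which already fits in a single byte, so any such
sequence would be overlong.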

commit e963e4bd4d4fcc0233d51a1ec33165fc9fb38fe5
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 14:00:13 2013 -0700

    toke.c: Don't remap \N{} for EBCDIC
    
    Everything is now in native.

M       toke.c

commit 3515a462c70ad75ede28000fdce1f971247c7e6a
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:50:45 2013 -0700

    toke.c: Remove remapping for EBCDIC for octal
    
    The code prior to this commit converted something like \04 into its
    EBCDIC equivalent only in double-quoted strings.  This was not done in
    patterns, and so gave inconsistent results.  The correct thing to do
    should be to do the native thing, what someone who works on a platform
    would think \04 does.  Platform independent characters are available
    through \N{}, either by name or by U+.
    
    The comment changed by this was wrong, as in some cases it was native,
    and in some cases Unicode.

M       toke.c

commit f330cc5a6b5ce74ecc67ecac97553fc9cfa76eae
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:47:13 2013 -0700

    Remove EBCDIC remappings
    
    Now that the tables are stored in native format, we shouldn't be doing
    remapping.
    
    Note that this assumes that the Latin1 casing tables are stored in
    native order; this hasn't been done yet.

M       handy.h
M       perly.c
M       pp.c
M       regcomp.c
M       regexec.c
M       utf8.c

commit 0caad749e72a4614c0d13d60a9eb63ac6e8fc631
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 12:46:05 2013 -0700

    Add and use  macro to return EBCDIC
    
    The conversion from UTF-8 to code point should generally be to the
    native code point.  This adds a macro to do that, and converts the
    core calls to the existing macro to use the new one instead.  The old
    macro is retained for possible backwards compatibility, though it
    probably should be deprecated.

M       handy.h
M       pp.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c
M       utf8.h

commit 0db527fc1b0478c7217dd4ce5c26704630c0c99a
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 09:18:06 2013 -0700

    charnames: fix nit in comment

M       lib/_charnames.pm

commit b7f4305e3b8453afbc55be5387c491afb1520702
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 11:05:44 2013 -0700

    charnames: Make work in EBCDIC
    
    Now that mktables generates native tables, the only thing that was
    needed was to make U+ mean Unicode instead of native.

M       lib/_charnames.pm
M       lib/charnames.pm

commit 752462c119e18471190dc471d2d2f26bcdc7f046
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 09:35:56 2013 -0700

    Unicode::UCD: Work on non-ASCII platforms
    
    Now that mktables generates native tables, it is a fairly simple matter
    to get Unicode::UCD to work on those platforms.

M       lib/Unicode/UCD.pm

commit 7b06773e3e2dc42be30feda7666582526cd6f71b
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 22:16:38 2013 -0700

    mktables: Generate native code-point tables
    
    The output tables for mktables are now in the platform's native
    character set.  This means there is no change for ASCII platforms, but
    is a change for EBCDIC ones.
    
    Since we currently don't have any EBCDIC test platforms, I tested this
    by faking it out to generate EBCDIC data, and then eye-balled the
    results.
    
    Code that didn't realize there was a potential difference between EBCDIC
    and non-EBCDIC platforms will now start to work; code that tried to do
    the right thing under these circumstances will no longer work.  Fixing
    that comes in later commits.

M       lib/unicore/mktables

commit fa0db3e97c7ea5ca2c041dcc1f1c6f7c6eca0468
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 10:50:00 2013 -0700

    Fix some EBCDIC problems
    
    These spots have native code points, so should be using the macros for
    native code points, instead of Unicode ones.

M       regcomp.c
M       sv.c
M       toke.c

commit 31f89ae93bb19cf093cd9a2b821bfb9c06951ebb
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:10:19 2013 -0700

    Remove unnecessary temp variable in converting to UTF-8
    
    These areas of code included a temporary that is unnecessary.

M       inline.h
M       regcomp.c
M       sv.c

commit 962db892cae07286131e9d194cc7af4b0a14990f
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:00:55 2013 -0700

    utf8.h: Correct macros for EBCDIC
    
    These macros were incorrect for EBCDIC.  The 3 step process given in
    utfebcdic.h wasn't being followed.

M       utf8.h

commit c265cfd30dc0eaae33d05de8dbb49a56fc9f4aaf
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 9 21:23:30 2013 -0700

    Extract common code to an inline function
    
    This fairly short paradigm is repeated in several places; a later commit
    will improve it.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_pack.c
M       proto.h
M       sv.c
M       toke.c
M       utf8.c

commit b60d0a8769db597c2fabe55907586e6af7caa123
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 21:35:57 2013 -0700

    Don't use EBCDIC macro for a C language escape
    
    C recognizes '\a' (for BEL); just use that instead of a look-up.
    
    regen/unicode_constants.pl could be used to generate the character for
    the ESC (set in surrounding code), but I didn't do that because of
    potential bootstrapping problems when porting to an EBCDIC platform
    without a working perl.  (The other characters generated in that .pl are
    less likely to cause problems when compiling perl.)

M       regcomp.c
M       toke.c

commit d4f37eacd972f87ac586279c80016d5def0a1d64
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 19:53:38 2013 -0700

    Use byte domain EBCDIC/LATIN1 macro where appropriate
    
    The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
    whole Unicode range.  In the locations affected by this commit, it is
    known that the domain is limited to a single byte, so the simpler ones
    whose names contain LATIN1 may be used.
    
    On ASCII platforms, all the macros are null, so there is no effective
    change.

M       handy.h
M       regcomp.c
M       utf8.c

commit b68a6369f3bef40813b6b2f9b9125330c329d15e
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 14:31:09 2013 -0700

    Use new clearer named #defines
    
    This converts several areas of code to use the more clearly named macros
    introduced in a recent commit

M       op.c
M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 316d80bd052136276ed5856d84f0fb2ac9d2a0b3
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 13:52:31 2013 -0700

    utf8.h, utfebcdic.h: Create less confusing #defines
    
    This commit creates macros whose names mean something to me, and I don't
    find confusing.  The older names are retained for backwards
    compatibility.  Future commits will fix bugs I introduced from
    misunderstanding the meaning of the older names.
    
    The older names are now #defined in terms of the newer ones, and moved
    so that they are only defined once, valid for both ASCII and EBCDIC
    platforms.

M       utf8.h
M       utfebcdic.h

commit 905eb24a5ca83a2eaa35ccae179717c1c0d94744
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 4 14:22:02 2013 -0700

    pp_ctl.c: Use isCNTRL instead of hard-coded mask
    
    This is clearer and portable to EBCDIC.

M       pp_ctl.c

commit 9e48f7a7c1b5f3568f07b884816396b9254a9750
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:51:05 2013 -0700

    utf8.c: is_utf8_char_slow() should use native length
    
    What is passed is the actual length of the native utf8 character.  What
    this was calculating was the length it would be if it were a Unicode
    character, and then comparing the two: apples to oranges.

M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
