In perl.git, the branch khw/ebcdic has been created

<http://perl5.git.perl.org/perl.git/commitdiff/f6a4f04153c9215dd9421b91734b4adbdcc02abf?hp=0000000000000000000000000000000000000000>

        at  f6a4f04153c9215dd9421b91734b4adbdcc02abf (commit)

- Log -----------------------------------------------------------------
commit f6a4f04153c9215dd9421b91734b4adbdcc02abf
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:32:42 2013 -0700

    Revert "XXX get Configure to work on Linux"
    
    This reverts commit 587944ecf24503eddf45df4acf45ae60da17030d.

M       Configure

commit f54dd51496eabcd33901e35fc6992da8a9842198
Author: Karl Williamson <[email protected]>
Date:   Thu Mar 7 12:08:41 2013 -0700

    XXXu8

M       utfebcdic.h

commit 083a4d1a866d5d63603ef28026a508a626d03295
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:09:29 2013 -0700

    XXX get Configure to work on Linux

M       Configure

commit 430835d3921e6d7f5bdc410861fedaa83259be9d
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 8 08:11:38 2013 -0700

    l

M       regcharclass.h
M       regen/regcharclass.pl

commit d3d3292c7649661fc95d819b1edb7779adf2e695
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:47:21 2013 -0700

    XXX: Turn off debug tracing in perly.c
    
    THis is somehow getting into lib/buildcustomize.pl

M       perly.c

commit 99efb6de12fb99b5684dfb206c38da2ca67df49d
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 21:30:01 2013 -0700

    XXX: rebase: Add cast

M       utfebcdic.h

commit cf85f616a122f1afe9fc87675cf7ad19a08ceb2e
Author: Karl Williamson <[email protected]>
Date:   Wed Mar 6 17:04:58 2013 -0700

    XXXtemp: Use native, canned values for isFOO()

M       handy.h

commit b147fcfc00a27612126a7bbe480135a2b477e33d
Author: Karl Williamson <[email protected]>
Date:   Tue Mar 5 10:36:07 2013 -0700

    XXX Enable lex debugging wihout -DDEBUGGING

M       perly.c

commit fd4667e70d783fac417da3efc55b30fc28a1b14b
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 19:16:31 2013 -0700

    XXX: perly.c: Reinstate some ebcdic code
    
    This is an experiment to see if this fixes things

M       perly.c

commit c3b22538eca47d909f4b31132ae69aa1f1b9ef08
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:43:26 2013 -0700

    gv.c: Remove EBCDIC dependency

M       gv.c

commit a353eb0ab2a5575783e7fd9826e76984267c9826
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 13:00:47 2013 -0700

    toke.c: Remove EBCDIC dependency

M       toke.c

commit 39626f92d43a5ecda86ea2d52c5179b151c493aa
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:14:25 2013 -0700

    toke.c: Remove character set dependency
    
    Instead of hard-coding the bit patterns that comprise the Byte Order
    Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
    the current platform.
    
    This removes some EBCDIC-only code.

M       toke.c

commit f0d3242950ed1ee136fc1b871c3b8531b72bef47
Author: Karl Williamson <[email protected]>
Date:   Mon Mar 4 09:10:27 2013 -0700

    unicode_constants.h: Add #defines for Byte Order Mark
    
    These will be used in future commits

M       regen/unicode_constants.pl
M       unicode_constants.h

commit 7e7ceb07bba0b03bb08074bf8351e0e583e1ba57
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 15:04:18 2013 -0700

    XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
    
    This macro may not be present, and is currently used exclusively in
    IS_UTF8_CHAR, which itself may be undefined, and code should cope with
    that.  This is a work-around until a better solution is found.

M       utf8.c
M       utf8.h

commit ab9fd6aab40cd724ca2adc6353eabd9606465985
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 14:09:04 2013 -0700

    Add Porting tool for help with non-ASCII platforms
    
    Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
    non-ASCII platform with no working Perl.

M       MANIFEST
A       Porting/reorder_l1_char_class_tab.pl

commit d07ee690bdb5905fb913adce156e76741c75d7bb
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 13:06:58 2013 -0700

    inline.h: Reorder functions
    
    The comment implied that the functions below it in the file were
    deprecated, but in fact only the next two functions were.  This
    clarifies that and moves them so they are the final ones in the file

M       inline.h

commit 5ada3e299ec1571c78f25b9519def795e4c8d677
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:33:42 2013 -0700

    utfebcdic.h: Add comment

M       utfebcdic.h

commit c7c7e4ae2c5e1f4e0c8bc4f17ab068884c4cdf96
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 12:31:25 2013 -0700

    XXX Temporary for z/OS long long support

M       Configure
M       hints/os390.sh

commit 06949e9f167f5d3f824ddf1b56467e1cda464d22
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:12:11 2013 -0700

    utf8.h: Clean up START_MARK definition and use
    
    The previous definition broke good encapsulation rules.  UTF_START_MARK
    should return something that fits in a byte; it shouldn't be the caller
    that does this.  So the mask is moved into the definition.  This means
    it can apply only to the portion that creates something larger than a
    byte.  Further, the EBCDIC version can be simplified, since 7 is the
    largest possible number of bytes in an EBCDIC UTF8 character.

M       utf8.h
M       utfebcdic.h

commit 4973c12132e2cfaafbf074b7e0a1dc1d79f910d8
Author: Karl Williamson <[email protected]>
Date:   Sat Mar 2 12:05:26 2013 -0700

    utf8.h: Move #includes
    
    These two files were only being #included for non-ebcdic compiles; they
    should be included always.

M       utf8.h

commit 6c641e738bbf9234b92b232363f1716972ff976b
Author: John Goodyear <[email protected]>
Date:   Sat Mar 2 11:49:14 2013 -0700

    utfebcdic.h: Remove extra parameter expansions
    
    These two macros were improperly expanding the parameters as well as
    defining the operation, leading to compile errors.

M       utfebcdic.h

commit 00b787ce24fc6885973a54dc614900d99e4e17c1
Author: Karl Williamson <[email protected]>
Date:   Fri Mar 1 08:28:52 2013 -0700

    utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
    
    These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
    UTF8_TWO_BYTE_LO.  But the EIGHT_BIT versions can use the less general
    and simpler NATIVE_TO_LATN1 instead of NATIVE_TO_UNI because the input
    domain is restricted in the EIGHT_BIT.  Note that on ASCII platforms,
    these both expand to the same thing, so the difference matters only on
    EBCDIC.

M       utf8.h

commit c131b71cc2ba966b0e81f21c8656e587f63b3a14
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 28 21:34:38 2013 -0700

    XXX temp: makedepend.SH \{1000\} doesn't work on z/OS
    
    This tries 500 instead.  We'll keep going down until we get a number
    that works.

M       makedepend.SH

commit 838c1f6f447874ac69f56defa1dfdb0cbf53bdc6
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 28 09:25:27 2013 -0700

    XXX temp:  show makedepend cerr

M       makedepend.SH

commit 809c2a7977c52914bf1e44f2d71047a32eeb6085
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too long lines; properly join
    
    I had thought that a continuation introduced a space.  But no,
    a continuation can happen in the middle of a token.
    
    And this splits lines that are getting very long to avoid preprocessor
    limitations.

M       makedepend.SH

commit 235ecc169ba9f49651bd328c6c995ca0dabc2044
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only
    
    Align continuation backslashes

M       makedepend.SH

commit 24e74bd168152f0b6626b444678d69c075fd37cd
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space
    
    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle.  This
    commit removes blanks adjoining comments that get deleted.  This makes
    things somewhat less likely to exceed the limit.
    
    This commit also fixes several [] which were meant to each match a tab
    or a blank, but editors converted the tabs to blanks

M       makedepend.SH

commit 73a7204e5c2bd5afbf60e4987b92fc8c68986695
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments
    
    These comments may actually be necessary.

M       makedepend.SH

commit 968a5b7deadd0871aeef1e711f782801b97809a3
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 08:38:19 2013 -0700

    handy.h: Remove extraneous parens

M       handy.h

commit bfee91ea55786caaca8fb9dd465afb59c7dbc1d2
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.
    
    John Goodyear <[email protected]> reports that the z/OS C compiler
    supports the attribute keyword, but not exactly the same as gcc.
    Instead of a "warning", the compiler emits an "INFORMATIONAL" message
    that Configure fails to detect.  Until Configure is fixed, just disable
    the attributes altogether.
    
    John Goodyear

M       hints/os390.sh

commit 1efb06c1dbaf413371e5459ab5be4af48858296e
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.
    
    Grep appears to be limited to 2048 characters, and truncates
    the output for cppstin.  Fgrep apparently doesn't have that limit.
    Thanks to John Goodyear <[email protected]> for reporting this.

M       hints/os390.sh

commit 5ad2fe211bfad97e8988eccc83d65d68dd912892
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:45:19 2013 -0700

    utf8.c: Use more clearly named macro
    
    In the case of invariants these two macros should do the same thing,
    but it seems to me that the latter name more clearly indicates what is
    going on.

M       utf8.c

commit 144058005222ffd147e176643b3a1c9cf8986974
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:35:12 2013 -0700

    Add macro OFFUNISKIP
    
    This means use official Unicode code point numbering, not native.  Doing
    this converts the existing UNISKIP calls in the code to refer to native
    code points, which is what they meant anyway.  The terminology is
    somewhat ambiguous, but I don't think will cause real confusion.
    NATIVESKIP is also introduced for situations where it is important to be
    precise.

M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 95f8782642ada776335e59144147a93b48895727
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:22:19 2013 -0700

    toke.c: white space only

M       toke.c

commit e6c12b915d181f63199b71d5cb04a90b16a57e6f
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 12:08:50 2013 -0700

    utf8.c: Deprecate two functions
    
    This is to force any code that has been using these functions to change.
    Since the Unicode tables are now stored in native order, these functions
    should only rarely be needed.
    
    However, the functionality of these is needed, and in actuality, on
    ASCII platforms, the native functions are #defined to these.  So what
    this commit does is rename the functions to something else, and create
    wrappers with the old names, so that anyone using them will get the
    deprecation.

M       embed.fnc
M       embed.h
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 6c2c5c1e0ca3f7779b04f3a1c9a4dfc3f0a505b7
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:26:09 2013 -0700

    Deprecate uvuni_to_utf8()
    
    Code should almost never be dealing with non-native code points

M       embed.fnc
M       embed.h
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit 9938a855872392f49271380ed1eb9f9764a94c29
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 11:02:33 2013 -0700

    Deprecate utf8_to_uni_buf()
    
    Now that the tables are stored in native order, there is almost no need
    for code to be dealing in Unicode order.

M       embed.fnc
M       proto.h
M       utf8.c

commit 36a058b9f98d1dfd68ef961f179be85e724d6af4
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code
    
    This causes problems currently for z/OS.  But, since we don't know why
    it was there, I'm leaving it in as a placeholder.

M       makedepend.SH

commit 73b57f000c86751c99cf01aa62cd9f130aef1c03
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:26:44 2013 -0700

    Deprecate valid_utf8_to_uvuni()
    
    Now that all the tables are stored in native format, there is very
    little reason to use this function; and those who do need this kind of
    functionality should be using the bottom level routine, so as to make it
    clear they are doing nonstandard stuff.

M       embed.fnc
M       proto.h
M       utf8.c

commit eb6f09d2e9cf2607185451d7d6056974d61d6e2a
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 20:14:26 2013 -0700

    utf8.c: Swap which fcn wraps the other
    
    This is in preparation for the current wrapee becoming deprecated

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
M       utf8.h

commit 0ed57e59d1dbf4134fa04c570d8f5991e4d1ce0b
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:29:34 2013 -0700

    utf8.c: Skip a no-op
    
    Since the value is invariant under both UTF-8 and not, we already have
    it in 'uv'; no need to do anything else to get it

M       utf8.c

commit c348babcfae7783bfbe0b0b20e92fdc8a3920919
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 19:26:50 2013 -0700

    utf8.c: Move comment to where makes more sense

M       utf8.c

commit 0db007c2e0d4de5fc0bd44667c4d20845b1f35ee
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:30:10 2013 -0700

    APItest: Test native code points, instead of Unicode

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 1c4bc4b1fbd68a3adf3a866cb950c1dfdff80516
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:25:08 2013 -0700

    XXX CPAN Normalize
    
    This converts Unicode::Normalize to use the native tables that are used
    by Perl starting in XXX, while using the Unicode-ordered ones that were
    used before then.
    
    Another alternative would be to have mktables generate just these tables
    in Unicode ordering.

M       cpan/Unicode-Normalize/Normalize.xs

commit 6d247f40dc06a2f5d1778f3b30e987ec7d5773f3
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:22:55 2013 -0700

    XXX CPAN prob wrong Collate
    
    This changes to implicity usenative code points.  This is likely wrong,
    as the module comes with its own data, that are probably in terms of
    Unicode

M       cpan/Unicode-Collate/Collate.xs

commit 2c7389bc06cf7d5fe6fdc394ae922a2f99134603
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:12:53 2013 -0700

    XXX CPAN Encode.xs
    
    Use core function if available.  This will insulate this code from any
    future changes.

M       cpan/Encode/Encode.xs

commit ca9288b251166468f8ca61747166acef35fa090a
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:04:24 2013 -0700

    XXX CPAN and unsure Encode

M       cpan/Encode/Encode.xs
M       cpan/Encode/Unicode/Unicode.xs

commit 483c8ca82941716746793172831e1a57708b4696
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 25 17:00:47 2013 -0700

    XXX CPAN Encode.xs: fix indent

M       cpan/Encode/Encode.xs

commit 49e082e52ddfc6946ccf2d05531b1e326cc03b9f
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 17:23:15 2013 -0700

    Don't refer to U+XXXX when mean native
    
    These messages say the output number is Unicode, but it is really
    native, so change to saying is 0xXXXX.

M       regen/regcharclass_multi_char_folds.pl
M       regexec.c

commit 0be92284f440b61ed62361dee0ac96203a9be0f5
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:43:59 2013 -0700

    Convert some uvuni() to uvchr()
    
    All the tables are now based on the native character set, so using
    uvuni() in almost all cases is wrong.

M       cygwin/cygwin.c
M       doop.c
M       op.c
M       pp_pack.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c

commit f620ab055c7c72916358640967e220fea27a510a
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:25:47 2013 -0700

    handy.h: White space only

M       handy.h

commit f405c31357d1e0d6a6b525e85cdf9390d7c760eb
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:19:49 2013 -0700

    t/test.pl: Allow native/latin1 string conversions to work on utf8.
    
    These functions no longer have the hard-coded definitions in them,
    but now end up resolving to internal functions, so that new encodings
    could be added and these would automatically understand them.
    
    Instead of using tr///, these now go character by character and
    converting to/from ord, which is slower, but allows them to operate on
    utf8 strings.
    
    Peephole optimization should make these essentially no-ops on ascii
    platforms.

M       t/test.pl

commit e5f3a009be8e27b085fa32fdd482abed7e6d638a
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 16:05:55 2013 -0700

    t/test.pl: Simplify ord to/from native fcns
    
    This commit changes these functions from converting to/from a string to
    calling utf8:: functions which operate on ordinals instead.

M       t/test.pl

commit e93971eb09a67e1b78269e12366e54f36647f5b1
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:35:38 2013 -0700

    Make casing tables native
    
    These are final tables that haven't been converted to native character
    set casing.

M       perl.h
M       utfebcdic.h

commit ca6450898c31015560d6284f2c9f33c85947cf55
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 24 15:32:30 2013 -0700

    utfebcdic.h: Remove trailing spaces

M       utfebcdic.h

commit aea4d6bab0299bc476a94fba51c1a91f5b7099c6
Author: Karl Williamson <[email protected]>
Date:   Fri Feb 22 18:55:26 2013 -0700

    EBCDIC has the unicode bug too
    
    We have not had a working modern Perl on EBCDIC for some years.  When I
    started out, comments and code led me to conclude erroneously that
    natively it supported semantics for all 256 characters 0-255.  It turns
    out that I was wrong; it natively (at least on some platforms) has the
    same rules (essentially none) for the characters which don't correspond
    to ASCII onees, as the rules for these on ASCII platforms.
    
    This commit forces those rules on EBCDIC platforms (even should there be
    one that natively uses all 256).  To get all 256, the same things like
    'use feature "unicode_strings"' must now be done.

M       autodoc.pl
M       handy.h
M       pod/perlfunc.pod
M       pod/perlre.pod
M       pod/perlrecharclass.pod
M       pod/perlunicode.pod
M       pod/perlunifaq.pod

commit bc95e089407f4ec1ff8a2a1302ec2ddb426c7eeb
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:47:52 2013 -0700

    handy.h: Solve a failure to compile problem under EBCDIC
    
    handy.h is included in files that don't include perl.h, and hence not
    utf8.h.  We can't rely therefore on the ASCII/EBCDIC conversion
    macros being available to us.  The best way to cope is to use the native
    ctype functions.  Most, but not all, of the macros in this commit
    currently resolve to use those native ones, but a future commit will
    change that.

M       handy.h

commit 5ad4976f6542f50b2d157960e9aee41940e66cd5
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:35:12 2013 -0700

    handy.h: Simplify some macro definitions
    
    Now, only one of the macros relies on magic numbers (isPRINT), leading
    to clearer definitions.

M       handy.h

commit cfd1b36d93668a9ed750e169b8079eed3aeca2f3
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 21 13:26:49 2013 -0700

    handy.h: Combine macros that are same in ASCII, EBCDIC
    
    These 4 macros can have the same RHS for their ASCII and EBCDIC
    versions, so no need to duplicate their definitions
    
    This also enables the EBCDIC versions to not have undefined expansions
    when compiling without perl.h

M       handy.h

commit 16e59989ffdee243e03c9993cf3f6c1ea184d213
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:39:48 2013 -0700

    Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
    
    These macros are no longer called in the Perl core.  This commit turns
    them into functions so that they can use gcc's deprecation facility.
    
    I believe these were defective right from the beginning, and I have
    struggled to understand what's going on.  From the name, it appears
    NATIVE_TO_NEED taks a native byte and turns it into UTF-8 if the
    appropriate parameter indicates that.  But that is impossible to do
    correctly from that API, as for variant characters, it needs to return
    two bytes.  It could only work correctly if ch is an I8 byte, which
    isn't native, and hence the name would be wrong.
    
    Similar arguments for ASCII_TO_NEED.
    
    The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
    does what I think NATIVE_TO_NEED intended.

M       embed.fnc
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.h
M       utfebcdic.h

commit 78928d40896552d7b9b8dafba8fb115aaa1941d8
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 10:26:43 2013 -0700

    Remove remaining calls of NATIVE_TO_NEED
    
    These calls are just copying the input to the output byte by byte.
    There is no need to worry about UTF-8 or not, as the output is just an
    exact copy of the input

M       toke.c

commit a1c1009e62f618a3d82cd35ba52dfda91ad5c9f8
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:12:15 2013 -0700

    toke.c: Remove some NATIVE_TO_NEED calls
    
    I believe NATIVE_TO_NEED is defective, and will remove it in a future
    commit.  But, just in case I'm wrong, I'm doing it in small steps so
    bisects will show the culprit.  This removes the calls to it where the
    parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
    result can't be other than just the parameter.

M       toke.c

commit d6eed1c42d47dd4de2726a80791015be521645a0
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 20 08:22:07 2013 -0700

    toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
    
    This code is attempting to deal with the problem of holes in the ranges
    a-z and A-Z in EBCDIC.  Prior to this patch, it accepeted things like A
    WITH GRAVE, etc, which shouldn't have the special processing to deal
    with the holes

M       toke.c

commit 38a90fc09b50554f2392d7ae4e3c3c4856b181e0
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 19 15:13:19 2013 -0700

    Use real illegal UTF-8 byte
    
    The code here was wrong in assuming that \xFF is not legal in UTF-8
    encoded strings.  It currently doesn't work due to a bug, but that may
    eventually be fixed: [perl #116867].  The comments are also wrong that
    all bytes are legal in UTF-EBCDIC.
    
    It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
    (C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
    of an illegal overlong sequence.
    
    This creates a #define for an illegal byte using one of the real illegal
    ones, and changes the code to use that.
    
    No test is included due to #116867.

M       op.c
M       toke.c
M       utf8.h

commit 3a2ea82ab857b782d4eb5c2ea02ddce9f37b7d5a
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 14:00:13 2013 -0700

    toke.c: Don't remap \N{} for EBCDIC
    
    Everything is now in native,

M       toke.c

commit c9a01f52c7a29fa8af197ee1c589d2557e987ded
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:50:45 2013 -0700

    toke.c: Remove remapping for EBCDIC for octal
    
    The code prior to this commit converted something like \04 into its
    EBCDIC equivalent only in double-quoted strings.  This was not done in
    patterns, and so gave inconsistent results.  The correct thing to do
    should be to do the native thing, what someone who works on a platform
    would think \04 do.  Platform independent characters are available
    through \N{}, either by name or by U+.
    
    The comment changed by this was wrong, as in some cases it was native,
    and in some cases Unicode.

M       toke.c

commit 2a0e17c8b9a1fa1edf46f232eaaca05932464a91
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 13:47:13 2013 -0700

    Remove EBCDIC remappings
    
    Now that the tables are stored in native format, we shouldn't be doing
    remapping.
    
    Note that this assumes that the Latin1 casing tables are stored in
    native order; this hasn't been done yet.

M       handy.h
M       perly.c
M       pp.c
M       regcomp.c
M       regexec.c
M       utf8.c

commit 24137c4dfdf9ecdb7bb467228663abce7285ba5b
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 12:46:05 2013 -0700

    Add and use  macro to return EBCDIC
    
    The converstion from UTF-8 to code point should generally be to the
    native code point.  This adds a macro to do that, and converts the
    core calls to the existing macro to use the new one instead.  The old
    macro is retained for possible backwards compatibility, though it
    probably should be deprecated.

M       handy.h
M       pp.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c
M       utf8.h

commit 21391754a4a3743a89b1cdf8b1a67bf68934c1d4
Author: Karl Williamson <[email protected]>
Date:   Sun Feb 17 09:18:06 2013 -0700

    charnames: fix nit in comment

M       lib/_charnames.pm

commit 85ea1e7595c04436bfdb830281663eb28f4fa162
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 11:05:44 2013 -0700

    charnames: Make work in EBCDIC
    
    Now that mktables generates native tables, the only thing that was
    needed was to make U+ mean Unicode instead of native.

M       lib/_charnames.pm
M       lib/charnames.pm

commit 7b2356609ed8d14c8968a274f75b17f677b2a436
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 16 09:35:56 2013 -0700

    Unicode::UCD: Work on non-ASCII platforms
    
    Now that mktables generates native tables, it is a fairly simple matter
    to get Unicode::UCD to work on those platforms.

M       lib/Unicode/UCD.pm

commit 6c6de99130661853313f06d1a4a8762cba4823ee
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 22:16:38 2013 -0700

    mktables: Generate native code-point tables
    
    The output tables for mktables are now in the platform's native
    character set.  This means there is no change for ASCII platforms, but
    is a change for EBCDIC ones.
    
    Since we currently don't have any EBCDIC test platforms, I tested this
    by faking it out to generate EBCDIC data, and then eye-balled the
    results.
    
    Code that didn't realize there was a potential difference between EBCDIC
    and non-EBCDIC platforms will now start to work; code that tried to do
    the right thing under these circumstances will no longer work.  Fixing
    that comes in later commits.

M       lib/unicore/mktables

commit d6f27906b0b67f5aad20e210ac8da549d5da57b3
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 14 10:50:00 2013 -0700

    Fix some EBCDIC problems
    
    These spots have native code points, so should be using the macros for
    native code points, instead of Unicode ones.

M       regcomp.c
M       sv.c
M       toke.c

commit 71b84e101ac96135b96ab185a9facf0e85a81be2
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:10:19 2013 -0700

    Remove unnecessary temp variable in converting to UTF-8
    
    These areas of code included a temporary that is unnecessary.

M       inline.h
M       regcomp.c
M       sv.c

commit 63bbc715c6be080b8058c3ed5c6e6e361c874c34
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 13 22:00:55 2013 -0700

    utf8.h: Correct macros for EBCDIC
    
    These macros were incorrect for EBCDIC.  The 3 step process given in
    utfebcdic.h wasn't being followed.

M       utf8.h

commit d23be4768feed998f3771c93b4a5c0fe4d47facc
Author: Karl Williamson <[email protected]>
Date:   Sat Feb 9 21:23:30 2013 -0700

    Extract common code to an inline function
    
    This fairly short paradigm is repeated in several places; a later commit
    will improve it.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_pack.c
M       proto.h
M       sv.c
M       toke.c
M       utf8.c

commit af2ca1171d6f889425162386dc60244a798b3ed6
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 21:35:57 2013 -0700

    Don't use EBCDIC macro for a C language escape
    
    C recognizes '\a' (for BEL); just use that instead of a look-up.
    
    regen/unicode_constants.pl could be used to generate the character for
    the ESC (set in surrounding code), but I didn't do that because of
    potential bootstrapping problems when porting to an EBCDIC platform
    without a working perl.  (The other characters generated in that .pl are
    less likely to cause problems when compiling perl.)

M       regcomp.c
M       toke.c

commit 81e046fdf144d581a9489d7537e821ef244d32fe
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 19:53:38 2013 -0700

    Use byte domain EBCDIC/LATIN1 macro where appropriate
    
    The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
    whole Unicode range.  In the locations affected by this commit, it is
    known that the domain is limited to a single byte, so the simpler ones
    whose names contain LATIN1 may be used.
    
    On ASCII platforms, all the macros are null, so there is no effective
    change.

M       handy.h
M       regcomp.c
M       utf8.c

commit 05e16206fac1fdfdbd33c807513536efb3b812e8
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 14:31:09 2013 -0700

    Use new clearer named #defines
    
    This converts several areas of code to use the more clearly named macros
    introduced in a recent commit

M       op.c
M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit fb78e2499b40287cc8cb4afb771446e456cc9ad2
Author: Karl Williamson <[email protected]>
Date:   Thu Feb 7 13:52:31 2013 -0700

    utf8.h, utfebcdic.h: Create less confusing #defines
    
    This commit creates macros whose names mean something to me, and I don't
    find confusing.  The older names are retained for backwards
    compatibility.  Future commits will fix bugs I introduced from
    misunderstanding the meaning of the older names.
    
    The older names are now #defined in terms of the newer ones, and moved
    so that they are only defined once, valid for both ASCII and EBCDIC
    platforms.

M       utf8.h
M       utfebcdic.h

commit 3bff881f5347d44cfdbae1bff84f05b6409ff932
Author: Karl Williamson <[email protected]>
Date:   Mon Feb 4 14:22:02 2013 -0700

    pp_ctl.c: Use isCNTRL instead of hard-coded mask
    
    This is clearer and portable to EBCDIC.

M       pp_ctl.c

commit a904197fb9500c81d1a789857d2bcf74e7af115f
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 13:51:05 2013 -0700

    utf8.c: is_utf8_char_slow() should use native length
    
    What is passed is the actual length of the native utf8 character.  What
    this was calculating was the length it would be if it were a Unicode
    character, and then compares, apples to oranges.

M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository

Reply via email to