In perl.git, the branch khw/ebcdic has been created
<http://perl5.git.perl.org/perl.git/commitdiff/23eee4fee281d2f0c93acacba21736f3408c5758?hp=0000000000000000000000000000000000000000>
at 23eee4fee281d2f0c93acacba21736f3408c5758 (commit)
- Log -----------------------------------------------------------------
commit 23eee4fee281d2f0c93acacba21736f3408c5758
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:16:08 2013 -0600
utf8.c: Fix so UTF-16 to UTF-8 conversion works under EBCDIC
M utf8.c
commit 3d67047b8623500c820f6e39f16b9fb9069f4bd5
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:14:34 2013 -0600
utf8.h, utfebcdic.h: Add #define
M utf8.h
M utfebcdic.h
commit de7d6fe14fc4c2146e1c885372fc0e76730afdcf
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:11:25 2013 -0600
utf8.c: Use mnemonics instead of hex numbers
M utf8.c
commit a2c7fa86b9b5306d8fe1cbf0ff52193d8402cc42
Author: Karl Williamson <[email protected]>
Date: Wed Mar 20 22:15:58 2013 -0600
XXX lib/Unicode/UCD.t: allow to run under EBCDIC,
with failures expected
M lib/Unicode/UCD.t
commit 1246c4e146d3a82a5b5044df45830702fcd6179e
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 15:27:31 2013 -0600
t/op/quotemeta.t: EBCDIC fixes
M t/op/quotemeta.t
commit 6bfee59b179ab5dea21b4f819d8d4a1eed58e618
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:32:55 2013 -0600
t/re/fold_grind.t: Fixes for EBCDIC
M t/re/fold_grind.t
commit 1b18a601055def18a922861795ce99038af2bdf1
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:21:09 2013 -0600
t/lib/charnames/alias: Fix some EBCDIC problems
M t/lib/charnames/alias
commit 9cc0ba4cad7bb2bfd38af02f7ef61816dc36ca8e
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:20:24 2013 -0600
t/uni/class.t: Make work on EBCDIC
M t/uni/class.t
commit 323bd409b88104f413cf2c2d5af328e17846c815
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:16:12 2013 -0600
Unicode::UCD: Fix EBCDIC bug
M lib/Unicode/UCD.pm
commit bdc20c8ac4ab3aca12c69c78b25535ceea02381f
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:01:57 2013 -0600
feature/unicode_strings.t: Fix to work on EBCDIC
M lib/feature/unicode_strings.t
commit 95f8f8a47445914fbf291d8273399c298c1c502d
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:33:59 2013 -0600
XXX lib/strict.t: Skip for now
this is to see if this is what is causing the tests to not finish.
M lib/strict.t
commit 13b1f22f1cfc8115e0b3b2bf111bb12c0e9b355a
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:12:30 2013 -0600
regen/regcharclass.pl: make more EBCDIC friendly
One of the possible inputs to this process is a string. This clarifies
that it must be specified in Unicode characters, and adds code to
translate it to native, if necessary.
M regen/regcharclass.pl
commit a7a2eb2e1a22c28347740b0b21304091d6a67d8e
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:10:46 2013 -0600
XXX regen/regcharclass.pl: maybe temp comment out utf8_char
M regen/regcharclass.pl
commit ce89aef967a3d5eb65edc9ee4f27de8f73f99518
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 22:23:47 2013 -0600
XXX temporary regcharclass.h
This has parts that were generated on z/OS, and look good; the ones that
are still problematic are omitted;
f
M regcharclass.h
commit 5ad609c7e9694d659f6ff494b4a4db9e04890e3d
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:09:53 2013 -0600
XXX temporary comment out multi-char folds
M regcomp.c
M regen/regcharclass.pl
M regexec.c
M t/re/fold_grind.t
M t/re/reg_fold.t
commit e06df4c4f8a43abc29de177fcf0cc5010de7e390
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 22:00:29 2013 -0600
XXX temp skip perl5db.t
M lib/perl5db.t
commit 9f4964fa1c307681d97d662d13a5b982e36d840b
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 11:45:06 2013 -0600
pp.c: White-space only
Make a ternary operation more clear
M pp.c
commit 2ebb7110fc5b0e6e8f3c2d973e660f779f75709c
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 11:43:42 2013 -0600
Fix valid_utf8_to_uvchr() for EBCDIC
M utf8.c
commit 81bf1cf4f451796d9deddea8bdd1a2db358580d8
Author: Karl Williamson <[email protected]>
Date: Sun Mar 17 21:42:20 2013 -0600
t/test.pl: Add comment about EBCDIC
M t/test.pl
commit 857ee1684387fcc62efea03c9e074d47a1c6e25a
Author: Karl Williamson <[email protected]>
Date: Sun Mar 17 17:39:33 2013 -0600
XXX makedepend.SH: Why does 255 work and 250 not?
M makedepend.SH
commit fd67fede34fdf7517bf963b8e2744dc1b016d265
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:48:22 2013 -0600
XXX regen/mk_PL_charclass.pl: Make EBCDIC friendly
need more of a commit message
M regen/mk_PL_charclass.pl
commit 84a84331a722104be58207afeb293147273008db
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:44:44 2013 -0600
XXX make various things more EBCDIC friendly
Adds trailing white space errors
Need to know what to do about ^A meaning 0x1, and M-foo meaning meta
M lib/DB.pm
M lib/dumpvar.pl
M lib/perl5db.pl
M lib/sigtrap.pm
commit 83ef91540c49ba4420f5a35297259e3d80f592c8
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:41:57 2013 -0600
XXX charnames.t: Make more EBCDIC friendly
Why need utf8::unicode_to_native
M lib/charnames.t
commit 97392b23fa138e68316a3247ac36393d237b099b
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:41:15 2013 -0600
XXX: Fixup commit message.
Fix UTF8_ACUUMULATE, utf8.c
M utf8.c
M utf8.h
commit 190cab34bfb22b532c01b146a487cc6237c0751a
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 16:52:45 2013 -0600
regcomp.c: Fix bug in EBCDIC
The POSIXA and NPOSIXA regnodes need to set the bits on only the ASCII
code points, but under EBCDIC those code points are 0-127.
M regcomp.c
commit 128563737568cda3ab8197bcd566d1acb16de7fa
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 12:26:15 2013 -0600
hints/os390.sh: Suppress bogus compiler message
M hints/os390.sh
commit 6bb53f2aad8c20c4ae1c4b5d3eed0c044198666c
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 11:57:24 2013 -0600
re/charset.t: Allow to work on EBCDIC
This just converts the hard-coded character numbers to native, so will
work on any platform.
M t/re/charset.t
commit 9db95e430a7310d64a974433550af8684f9741bc
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 11:50:35 2013 -0600
XS-APItest/t/handy.t: Change output message
On EBCDIC platforms, the output is not in terms of \N{U+}; change text
to \x{ }
M ext/XS-APItest/t/handy.t
commit cb0d11d054e8f64c6cc96d7464f93fffd027d5dd
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 21:44:16 2013 -0600
XXX Dumper.xs: Don't know why this stopped compiling
M dist/Data-Dumper/Dumper.xs
commit 6cbfa98f89a303db10f729c6fbaa7e37a17aed10
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:22:28 2013 -0600
toke.c: Fix an ASCII-platform dependency
M toke.c
commit a6e3f7f63e5dd6ec9f73d87d983279ea7a4ce476
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:20:23 2013 -0600
toke.c: Simplify some code
We don't have to test separately for lower vs uppercase here, as
upper/lower case A-Z and a-z are not intermixed in the gaps in A-Z and
a-z under EBCDIC.
M toke.c
commit 07350db58a71027c9350b63e20c6218c7867c06a
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:18:12 2013 -0600
genpacksizetables.pl: Correct comment typo
M genpacksizetables.pl
commit 9671d77fcf1bc7f2304984183246bed3e7cd0a6b
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:17:39 2013 -0600
APItest/t/handy.t: Make EBCDIC-friendly
M ext/XS-APItest/t/handy.t
commit 7d5279cdfd38601b68cbb17a213983538c8e2627
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:16:14 2013 -0600
Data-Dumper: Make EBCDIC-friendly
M dist/Data-Dumper/Dumper.xs
commit ad625024d1e7a923b056e45c8e653f6bb5491d01
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:14:31 2013 -0600
sv.c: Make less ASCII-centric
M sv.c
commit 2c80974552d64618181bd07bd723445386e46b60
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:07:52 2013 -0600
lib/charnames.t: Make some tests work under EBCDIC
M lib/charnames.t
commit f294757d1aabdde10f49952cdb012ad16f721630
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:05:46 2013 -0600
dump.c: Make less ASCII-centric:
This has the added advantage of being clearer as to what is going on.
M dump.c
commit 5d9fc6441e025af781cc28241a93cdbfc8275e99
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:02:52 2013 -0600
hv.c: Stop being ASCII-centric
This uses macros which work cross-platform. This has the added advantge
that it is much clearer what is going on.
M hv.c
commit 9641e807ca76c5e8e650882c5f22e7c04f155a51
Author: Karl Williamson <[email protected]>
Date: Tue Mar 12 22:34:17 2013 -0600
t/TEST: Don't bail if fails in t/base
M t/TEST
commit 763e51bce05e712943b94b625a4f8c1f2e50bbf2
Author: Karl Williamson <[email protected]>
Date: Mon Mar 11 15:11:10 2013 -0600
Added Porting/reorder_charclass_invlists.pl
This program is used too bootstrap perl onto a non-ASCII platform with
no pre-existing perl.
M MANIFEST
A Porting/reorder_charclass_invlists.pl
commit 48fd7bc12009f758064e0281f592d0eae5581cdb
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 22:17:31 2013 -0600
XXX See if changing \xE2 to \xE1 causes lex.t to work for EBCDIC
\xE2 is 'S' in EBCDIC, and so is going to be legal. \xE1 is not an
ASCII equivalent.
M t/base/lex.t
commit 17473d7902f3270db854c771029005b63322c7bb
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 11:01:32 2013 -0700
XXX EBCDIC header files
M charclass_invlists.h
M l1_char_class_tab.h
M unicode_constants.h
commit feb081c481a6fd8574c7f5aa0a6633eb4399501d
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 12:31:25 2013 -0700
XXX Temporary for z/OS long long support
M Configure
M hints/os390.sh
commit 6cc288ddb003e00b06fa4d802ce4fad277b57696
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 13:11:07 2013 -0600
XXX Temporary comment out ParseXS check
this is to get things to compile for now
M dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm
commit 3a001f2195d110b6c27148b6e4944bc1d3bab310
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 11:34:10 2013 -0600
XXX Collate, Normalize: Allow to compile under EBCDIC
M cpan/Unicode-Collate/Collate.pm
M cpan/Unicode-Collate/mkheader
M cpan/Unicode-Normalize/Normalize.pm
M cpan/Unicode-Normalize/mkheader
commit 3317fd49f23ba46bee8d1eccf50005d30f6775f8
Author: Karl Williamson <[email protected]>
Date: Sat Mar 9 21:57:38 2013 -0700
XXX dquote_static.c: Silence wrong warning on EBCDIC
Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
which should be a separate commit anyway.
This silences an inappropriate warning that doesn't happen on ASCII
platforms. CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms. But
0x14 is a C1 control on EBCDIC, a C0 on ASCII. Therefore the test that
it's a control should include both C0 and C1, which isCNTRL_L1() does.
Also has a white-space change, outdenting a line so it doesn't wrap in
an 80 column window.
M dquote_static.c
commit b866899152c5125c3125903c4c5a338fcc9f729a
Author: Karl Williamson <[email protected]>
Date: Thu Mar 7 12:08:41 2013 -0700
utfebcdic.h: Change 'unsigned char' to U8
This is for consistency with the rest of Perl
M utfebcdic.h
commit 95e365a67c842f5b50cb61e727d43e4df7f5cad9
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 08:11:38 2013 -0700
regen/regcharclass.pl: Make more EBCDIC-friendly
This commit changes the code generated by the macros so that they work
right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs. THEY
ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
onto the target platform, and regcharclass.pl can be run there,
generating macros correct UTF-8.
M regcharclass.h
M regen/regcharclass.pl
commit e040e7f654b6ef0bbf148c6d5238518434950602
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 21:30:01 2013 -0700
utfebcdic.h: Add (UV) cast
The operand of this macro is implicitly a UV. Make sure that it is.
M utfebcdic.h
commit 2f9c6f5e1f538b203619d91840e1c8b269f535c1
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 17:04:58 2013 -0700
handy.h: Allow bootstrapping to non-ASCII platform
This adds a bunch of macros and moves things around to support
conditional compilation when Configure is called with
-DBOOTSTRAP_CHARSET. Doing so causes the usual macros that are
table-driven to not be used, since the table may not be valid when
bringing Perl up for the first time on a non-ASCII platform.
This allows it to compile using the platform's native C library ctype
functions, which should work enough to compile miniperl, and allow the
table to be changed to be valid. Then Configure can be re-run to not
bootstrap, and normal compilation can proceed
M handy.h
M inline.h
commit 1e86bc2e82f4870aa2dc0bdef4dc3de82c12fac5
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:43:26 2013 -0700
gv.c: Remove EBCDIC dependency
M gv.c
commit 38f63847854921b4501504d229ea09e020539b31
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:00:47 2013 -0700
toke.c: Remove EBCDIC dependency
M toke.c
commit 50fd6f7decadfe7a20953c10a92231efa82fe0d5
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:14:25 2013 -0700
toke.c: Remove character set dependency
Instead of hard-coding the bit patterns that comprise the Byte Order
Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
the current platform.
This removes some EBCDIC-only code.
M toke.c
commit 0158a4754a0e5c78d5b2c3cea60e8608c55982b9
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:10:27 2013 -0700
unicode_constants.h: Add #defines for Byte Order Mark
These will be used in future commits
M regen/unicode_constants.pl
M unicode_constants.h
commit dde13e90bd8b84fcc20767b42d9390cfe4e1e9bc
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 15:04:18 2013 -0700
XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
This macro may not be present, and is currently used exclusively in
IS_UTF8_CHAR, which itself may be undefined, and code should cope with
that. This is a work-around until a better solution is found.
M utf8.c
M utf8.h
commit 58212e1268bd3cc2b45bf96d3797a17509de3365
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 14:09:04 2013 -0700
Add Porting tool for help with non-ASCII platforms
Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
non-ASCII platform with no working Perl.
M MANIFEST
A Porting/reorder_l1_char_class_tab.pl
M regen/mk_PL_charclass.pl
commit 31289460cd3261db150949260309d8d98c53c1d1
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 13:06:58 2013 -0700
inline.h: Reorder functions
The comment implied that the functions below it in the file were
deprecated, but in fact only the next two functions were. This
clarifies that and moves them so they are the final ones in the file
M inline.h
commit 51b73c6a6dd7573b702ac746c5dd2e339bd58ddd
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:33:42 2013 -0700
utfebcdic.h: Add comment
M utfebcdic.h
commit a2e9e426a824d92fef28b0c6ef6be3666226545f
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:12:11 2013 -0700
utf8.h: Clean up START_MARK definition and use
The previous definition broke good encapsulation rules. UTF_START_MARK
should return something that fits in a byte; it shouldn't be the caller
that does this. So the mask is moved into the definition. This means
it can apply only to the portion that creates something larger than a
byte. Further, the EBCDIC version can be simplified, since 7 is the
largest possible number of bytes in an EBCDIC UTF8 character.
M utf8.h
M utfebcdic.h
commit 986363683624a2bca000967a806c3eb9d7dc0741
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:05:26 2013 -0700
utf8.h: Move #includes
These two files were only being #included for non-ebcdic compiles; they
should be included always.
M utf8.h
commit bf2b569330056e4795e9fe7361bddf4d69809a76
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 11:49:14 2013 -0700
utfebcdic.h: Remove extra parameter expansions
These two macros were improperly expanding the parameters as well as
defining the operation, leading to compile errors.
M utfebcdic.h
commit 5288333c6081bb20a63aa320ca3a68bbecbda7d6
Author: Karl Williamson <[email protected]>
Date: Fri Mar 1 08:28:52 2013 -0700
utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
UTF8_TWO_BYTE_LO. But the EIGHT_BIT versions can use the less general
and simpler NATIVE_TO_LATN1 instead of NATIVE_TO_UNI because the input
domain is restricted in the EIGHT_BIT. Note that on ASCII platforms,
these both expand to the same thing, so the difference matters only on
EBCDIC.
M utf8.h
commit 72bcb9611a3be7e6b41999b4a2f01e1b58f90ba8
Author: Karl Williamson <[email protected]>
Date: Thu Feb 28 09:25:27 2013 -0700
XXX temp: show makedepend cerr
M makedepend.SH
commit 3e5fbd9e1b69188ab19d03051bb9918c19cf1e62
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 21:59:11 2013 -0700
makedepend.SH: Split too long lines; properly join
I had thought that a continuation introduced a space. But no,
a continuation can happen in the middle of a token.
And this splits lines that are getting very long to avoid preprocessor
limitations.
M makedepend.SH
commit 30d437af817ea2389bf2f104da1b4869d231e8d9
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 15:51:28 2013 -0700
makedepend.SH: White-space only
Align continuation backslashes
M makedepend.SH
commit cdc48d70bdb9728027d54f4279580ff532dc3417
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:39:28 2013 -0700
makedepend.SH: Remove some unnecessary white space
Multi-line preprocessor directives are now joined into single lines.
This can create lines too long for the preprocessor to handle. This
commit removes blanks adjoining comments that get deleted. This makes
things somewhat less likely to exceed the limit.
This commit also fixes several [] which were meant to each match a tab
or a blank, but editors converted the tabs to blanks
M makedepend.SH
commit 5997657d4ae459643f8cb1c738e878d0303e9815
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:30:51 2013 -0700
makedepend.SH: Retain '/**/' comments
These comments may actually be necessary.
M makedepend.SH
commit 969751de2821df8ef2ef4496e532505953016d4c
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 08:38:19 2013 -0700
handy.h: Remove extraneous parens
M handy.h
commit da2d668f4f2a2231c9a4e777ef8b09e63fb4c39f
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 13:06:07 2013 -0500
Disable gcc-style function attributes on z/OS.
John Goodyear <[email protected]> reports that the z/OS C compiler
supports the attribute keyword, but not exactly the same as gcc.
Instead of a "warning", the compiler emits an "INFORMATIONAL" message
that Configure fails to detect. Until Configure is fixed, just disable
the attributes altogether.
John Goodyear
M hints/os390.sh
commit 417f06c056d75635dfd37ff82bfe9e855cc49996
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 09:12:13 2013 -0500
Change os390 custom cppstdin script to use fgrep.
Grep appears to be limited to 2048 characters, and truncates
the output for cppstin. Fgrep apparently doesn't have that limit.
Thanks to John Goodyear <[email protected]> for reporting this.
M hints/os390.sh
commit 67d36a6f9d7a9945bea88633a95a3fe7280e4e3d
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:45:19 2013 -0700
utf8.c: Use more clearly named macro
In the case of invariants these two macros should do the same thing,
but it seems to me that the latter name more clearly indicates what is
going on.
M utf8.c
commit 18fcf9a3c9055829e6710d7e649fea2416684320
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:35:12 2013 -0700
Add macro OFFUNISKIP
This means use official Unicode code point numbering, not native. Doing
this converts the existing UNISKIP calls in the code to refer to native
code points, which is what they meant anyway. The terminology is
somewhat ambiguous, but I don't think will cause real confusion.
NATIVESKIP is also introduced for situations where it is important to be
precise.
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit 8f9c1b98ded6584cdbea409d55c6caa12ecf9425
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:22:19 2013 -0700
toke.c: white space only
M toke.c
commit ff0b1085f4172dc9ac1df34b03b8c1da097bb876
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 12:08:50 2013 -0700
utf8.c: Deprecate two functions
This is to force any code that has been using these functions to change.
Since the Unicode tables are now stored in native order, these functions
should only rarely be needed.
However, the functionality of these is needed, and in actuality, on
ASCII platforms, the native functions are #defined to these. So what
this commit does is rename the functions to something else, and create
wrappers with the old names, so that anyone using them will get the
deprecation.
M embed.fnc
M embed.h
M mathoms.c
M proto.h
M toke.c
M utf8.c
M utf8.h
commit c50ad99020a696a3137b8f82a9aa5eb59522cfb6
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:26:09 2013 -0700
Deprecate uvuni_to_utf8()
Code should almost never be dealing with non-native code points
M embed.fnc
M embed.h
M proto.h
M toke.c
M utf8.c
M utf8.h
commit 6ad99a7f0f9d7f1cecc8767e42ab30234ca0aed6
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:02:33 2013 -0700
Deprecate utf8_to_uni_buf()
Now that the tables are stored in native order, there is almost no need
for code to be dealing in Unicode order.
M embed.fnc
M proto.h
M utf8.c
commit 4e5e864a62e9ef154962ed77ba6fc885ec0c4760
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 09:00:18 2013 -0700
makedepend.SH: Comment out unnecessary code
This causes problems currently for z/OS. But, since we don't know why
it was there, I'm leaving it in as a placeholder.
M makedepend.SH
commit 98d6562eb35a5873bf2aa04bc74b09be8fce170e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:26:44 2013 -0700
Deprecate valid_utf8_to_uvuni()
Now that all the tables are stored in native format, there is very
little reason to use this function; and those who do need this kind of
functionality should be using the bottom level routine, so as to make it
clear they are doing nonstandard stuff.
M embed.fnc
M proto.h
M utf8.c
commit a90c9d707118b11afaf3ed4afdab5a8d8fbdc73d
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:14:26 2013 -0700
utf8.c: Swap which fcn wraps the other
This is in preparation for the current wrapee becoming deprecated
M embed.fnc
M embed.h
M proto.h
M utf8.c
M utf8.h
commit 5c9736946ef1e1705ce14946fb2aaed6f63664b8
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:29:34 2013 -0700
utf8.c: Skip a no-op
Since the value is invariant under both UTF-8 and not, we already have
it in 'uv'; no need to do anything else to get it
M utf8.c
commit 988c877afca26720044181fc8629b2b50c7cc874
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:26:50 2013 -0700
utf8.c: Move comment to where makes more sense
M utf8.c
commit 29111d445934aca43770ecd37140c708fbc61212
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:30:10 2013 -0700
APItest: Test native code points, instead of Unicode
M ext/XS-APItest/APItest.pm
M ext/XS-APItest/APItest.xs
M ext/XS-APItest/t/utf8.t
commit f72579d5a21ff0d2b9342900e7affe3e2950501c
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:25:08 2013 -0700
XXX CPAN Normalize
This converts Unicode::Normalize to use the native tables that are used
by Perl starting in XXX, while using the Unicode-ordered ones that were
used before then.
Another alternative would be to have mktables generate just these tables
in Unicode ordering.
M cpan/Unicode-Normalize/Normalize.xs
commit a06a5cbf64981ecc4a39856be6c75f1de4b8c1e3
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:22:55 2013 -0700
XXX CPAN prob wrong Collate
This changes to implicity usenative code points. This is likely wrong,
as the module comes with its own data, that are probably in terms of
Unicode
M cpan/Unicode-Collate/Collate.xs
commit 1c4655019f75410c7bb96889ce59c1f55008f8b4
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:12:53 2013 -0700
XXX CPAN Encode.xs
Use core function if available. This will insulate this code from any
future changes.
M cpan/Encode/Encode.xs
commit 22fed8e01ea8e5e115cec3881b8779189c83089d
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:04:24 2013 -0700
XXX CPAN and unsure Encode
M cpan/Encode/Encode.xs
M cpan/Encode/Unicode/Unicode.xs
commit f1d67942a238ecaafa6b44755766b353f8ab31f3
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:00:47 2013 -0700
XXX CPAN Encode.xs: fix indent
M cpan/Encode/Encode.xs
commit 15c9f46c19118642a4e0f3425f31f1b28abd703e
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 17:23:15 2013 -0700
Don't refer to U+XXXX when mean native
These messages say the output number is Unicode, but it is really
native, so change to saying is 0xXXXX.
M regen/regcharclass_multi_char_folds.pl
M regexec.c
commit fc09ecd53e1a438c3ffc76874196761fecc4bea7
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:43:59 2013 -0700
Convert some uvuni() to uvchr()
All the tables are now based on the native character set, so using
uvuni() in almost all cases is wrong.
M cygwin/cygwin.c
M doop.c
M op.c
M pp_pack.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
commit 6dddf71f33a09f1fca164ca55820d0f0f1ed1e40
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:25:47 2013 -0700
handy.h: White space only
M handy.h
commit dbb5eeedc75022fa0f1d5e6eb1246e1cf52a9c28
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:19:49 2013 -0700
t/test.pl: Allow native/latin1 string conversions to work on utf8.
These functions no longer have the hard-coded definitions in them,
but now end up resolving to internal functions, so that new encodings
could be added and these would automatically understand them.
Instead of using tr///, these now go character by character and
converting to/from ord, which is slower, but allows them to operate on
utf8 strings.
Peephole optimization should make these essentially no-ops on ascii
platforms.
M t/test.pl
commit a0b2cdff7e81c18eced9d82f30bafd7cc72bb55c
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:05:55 2013 -0700
t/test.pl: Simplify ord to/from native fcns
This commit changes these functions from converting to/from a string to
calling utf8:: functions which operate on ordinals instead.
M t/test.pl
commit b05bc866b3094bf05053debc9e0dbb8615d2a19f
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:35:38 2013 -0700
Make casing tables native
These are final tables that haven't been converted to native character
set casing.
M perl.h
M utfebcdic.h
commit cdb80266c2dce7e2327f763bbade9534d58818f3
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:32:30 2013 -0700
utfebcdic.h: Remove trailing spaces
M utfebcdic.h
commit 84f64500b7e41eae2225ec1ede470ad6352da601
Author: Karl Williamson <[email protected]>
Date: Fri Feb 22 18:55:26 2013 -0700
EBCDIC has the unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII onees, as the rules for these on ASCII platforms.
A previous commit for 5.18 changed the docs about this issue. This
current commit forces ASCII rules on EBCDIC platforms (even should there
be one that natively uses all 256). To get all 256, the same things
like 'use feature "unicode_strings"' must now be done.
M handy.h
commit 6a868dce68f6a2365cb388eab6fa6da63c832668
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:47:52 2013 -0700
handy.h: Solve a failure to compile problem under EBCDIC
handy.h is included in files that don't include perl.h, and hence not
utf8.h. We can't rely therefore on the ASCII/EBCDIC conversion
macros being available to us. The best way to cope is to use the native
ctype functions. Most, but not all, of the macros in this commit
currently resolve to use those native ones, but a future commit will
change that.
M handy.h
commit b9f9cb81c5d841b5af85220fe40d2f151de885db
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:35:12 2013 -0700
handy.h: Simplify some macro definitions
Now, only one of the macros relies on magic numbers (isPRINT), leading
to clearer definitions.
M handy.h
commit 963da8af5115db9b2ad4530244a307244454cebe
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:26:49 2013 -0700
handy.h: Combine macros that are same in ASCII, EBCDIC
These 4 macros can have the same RHS for their ASCII and EBCDIC
versions, so no need to duplicate their definitions
This also enables the EBCDIC versions to not have undefined expansions
when compiling without perl.h
M handy.h
commit f9cdcac7bab9b2e333bf4ba24c4506087165f4fb
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:39:48 2013 -0700
Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
These macros are no longer called in the Perl core. This commit turns
them into functions so that they can use gcc's deprecation facility.
I believe these were defective right from the beginning, and I have
struggled to understand what's going on. From the name, it appears
NATIVE_TO_NEED taks a native byte and turns it into UTF-8 if the
appropriate parameter indicates that. But that is impossible to do
correctly from that API, as for variant characters, it needs to return
two bytes. It could only work correctly if ch is an I8 byte, which
isn't native, and hence the name would be wrong.
Similar arguments for ASCII_TO_NEED.
The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
does what I think NATIVE_TO_NEED intended.
M embed.fnc
M mathoms.c
M proto.h
M toke.c
M utf8.h
M utfebcdic.h
commit 28a80a1d7c321f2e1f6e2e214255de7abcb4a8da
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:26:43 2013 -0700
Remove remaining calls of NATIVE_TO_NEED
These calls are just copying the input to the output byte by byte.
There is no need to worry about UTF-8 or not, as the output is just an
exact copy of the input
M toke.c
commit 689b516cd4a92a0292da0971c09ad4d21c2c87f7
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:12:15 2013 -0700
toke.c: Remove some NATIVE_TO_NEED calls
I believe NATIVE_TO_NEED is defective, and will remove it in a future
commit. But, just in case I'm wrong, I'm doing it in small steps so
bisects will show the culprit. This removes the calls to it where the
parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
result can't be other than just the parameter.
M toke.c
commit 8441d25c448302c7b84280a94c4765d078b8825c
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:22:07 2013 -0700
toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
This code is attempting to deal with the problem of holes in the ranges
a-z and A-Z in EBCDIC. Prior to this patch, it accepeted things like A
WITH GRAVE, etc, which shouldn't have the special processing to deal
with the holes
M toke.c
commit 8527789654b06cbc75d631b81326b0a600e8257a
Author: Karl Williamson <[email protected]>
Date: Tue Feb 19 15:13:19 2013 -0700
Use real illegal UTF-8 byte
The code here was wrong in assuming that \xFF is not legal in UTF-8
encoded strings. It currently doesn't work due to a bug, but that may
eventually be fixed: [perl #116867]. The comments are also wrong that
all bytes are legal in UTF-EBCDIC.
It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
(C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
of an illegal overlong sequence.
This creates a #define for an illegal byte using one of the real illegal
ones, and changes the code to use that.
No test is included due to #116867.
M op.c
M toke.c
M utf8.h
commit 706e9c39f6d6dc931cbde4a35a8ad2cd6bed9bde
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 14:00:13 2013 -0700
toke.c: Don't remap \N{} for EBCDIC
Everything is now in native,
M toke.c
commit 92a52e82ded6e0a95a7eee8f83353f8f8a51b498
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:50:45 2013 -0700
toke.c: Remove remapping for EBCDIC for octal
The code prior to this commit converted something like \04 into its
EBCDIC equivalent only in double-quoted strings. This was not done in
patterns, and so gave inconsistent results. The correct thing to do
should be to do the native thing, what someone who works on a platform
would think \04 do. Platform independent characters are available
through \N{}, either by name or by U+.
The comment changed by this was wrong, as in some cases it was native,
and in some cases Unicode.
M toke.c
commit 1cbbcd5247dcdf1dceff39ade164afd8c991dc0c
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:47:13 2013 -0700
Remove EBCDIC remappings
Now that the tables are stored in native format, we shouldn't be doing
remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; not all of this has been done yet.
M handy.h
M perly.c
M pp.c
M regcomp.c
M regexec.c
M utf8.c
commit dec20f03b7efeecb33e5b21ca23f795137b86702
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 12:46:05 2013 -0700
Add and use macro to return EBCDIC
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
M handy.h
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
M utf8.h
commit 52a9873ded052ec5b5a743c9116c628355e0b5d7
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 09:18:06 2013 -0700
charnames: fix nit in comment
M lib/_charnames.pm
commit 97a4b15afc43af8ca2a1a6cb940b288c9904262d
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 11:05:44 2013 -0700
charnames: Make work in EBCDIC
Now that mktables generates native tables, the only thing that was
needed was to make U+ mean Unicode instead of native.
M lib/_charnames.pm
M lib/charnames.pm
commit 8c35a04270f44f40259acaa338d02792483c136b
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 09:35:56 2013 -0700
Unicode::UCD: Work on non-ASCII platforms
Now that mktables generates native tables, it is a fairly simple matter
to get Unicode::UCD to work on those platforms.
M lib/Unicode/UCD.pm
commit c1fe2aef095379bbf654c0e273dfcef2dd86e256
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 22:16:38 2013 -0700
mktables: Generate native code-point tables
The output tables for mktables are now in the platform's native
character set. This means there is no change for ASCII platforms, but
is a change for EBCDIC ones.
Since we currently don't have any EBCDIC test platforms, I tested this
by faking it out to generate EBCDIC data, and then eye-balled the
results.
Code that didn't realize there was a potential difference between EBCDIC
and non-EBCDIC platforms will now start to work; code that tried to do
the right thing under these circumstances will no longer work. Fixing
that comes in later commits.
M lib/unicore/mktables
commit 1d00821c6b276605efb54969f1bb4189f063f5d3
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 10:50:00 2013 -0700
Fix some EBCDIC problems
These spots have native code points, so should be using the macros for
native code points, instead of Unicode ones.
M regcomp.c
M sv.c
M toke.c
commit bf4a47e9ba82e67bbbcb0631e1be09bb5fd87e20
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:10:19 2013 -0700
Remove unnecessary temp variable in converting to UTF-8
These areas of code included a temporary that is unnecessary.
M inline.h
M regcomp.c
M sv.c
commit 6094d1ec65a19891fc543e9b7ecafd5bdf9dc9c0
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:00:55 2013 -0700
utf8.h: Correct macros for EBCDIC
These macros were incorrect for EBCDIC. The 3 step process given in
utfebcdic.h wasn't being followed.
M utf8.h
commit 66479abfbd82f4feddd16942efdc5133a409e5c6
Author: Karl Williamson <[email protected]>
Date: Sat Feb 9 21:23:30 2013 -0700
Extract common code to an inline function
This fairly short paradigm is repeated in several places; a later commit
will improve it.
M embed.fnc
M embed.h
M inline.h
M pp_pack.c
M proto.h
M sv.c
M toke.c
M utf8.c
commit f9a89c7ae5714f9c3dee81610d19117fda4a8ae1
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 21:35:57 2013 -0700
Don't use EBCDIC macro for a C language escape
C recognizes '\a' (for BEL); just use that instead of a look-up.
regen/unicode_constants.pl could be used to generate the character for
the ESC (set in surrounding code), but I didn't do that because of
potential bootstrapping problems when porting to an EBCDIC platform
without a working perl. (The other characters generated in that .pl are
less likely to cause problems when compiling perl.)
M regcomp.c
M toke.c
commit 20bf9f725575bd4846307b894bbb11f6a5d677f2
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 19:53:38 2013 -0700
Use byte domain EBCDIC/LATIN1 macro where appropriate
The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
whole Unicode range. In the locations affected by this commit, it is
known that the domain is limited to a single byte, so the simpler ones
whose names contain LATIN1 may be used.
On ASCII platforms, all the macros are null, so there is no effective
change.
M handy.h
M regcomp.c
M utf8.c
commit a8e0b7e9f601876845c1d17d2757d05e77999033
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 14:31:09 2013 -0700
Use new clearer named #defines
This converts several areas of code to use the more clearly named macros
introduced in a recent commit
M op.c
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit ff84e6762c6da6341f178f026a3f911b4bbc6b37
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 13:52:31 2013 -0700
utf8.h, utfebcdic.h: Create less confusing #defines
This commit creates macros whose names mean something to me, and I don't
find confusing. The older names are retained for backwards
compatibility. Future commits will fix bugs I introduced from
misunderstanding the meaning of the older names.
The older names are now #defined in terms of the newer ones, and moved
so that they are only defined once, valid for both ASCII and EBCDIC
platforms.
M utf8.h
M utfebcdic.h
commit 921e82805a59d6bac1cab032d5656c521be0bb0e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 4 14:22:02 2013 -0700
pp_ctl.c: Use isCNTRL instead of hard-coded mask
This is clearer and portable to EBCDIC.
M pp_ctl.c
commit c750bd5e1f58abafaf9162939e7b66ec72702280
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:51:05 2013 -0700
utf8.c: is_utf8_char_slow() should use native length
What is passed is the actual length of the native utf8 character. What
this was calculating was the length it would be if it were a Unicode
character, and then compares, apples to oranges.
M utf8.c
-----------------------------------------------------------------------
--
Perl5 Master Repository