In perl.git, the branch khw/ebcdic has been created
<http://perl5.git.perl.org/perl.git/commitdiff/815223679662dca53f1fe4b6e07cac2af59f103b?hp=0000000000000000000000000000000000000000>
at 815223679662dca53f1fe4b6e07cac2af59f103b (commit)
- Log -----------------------------------------------------------------
commit 815223679662dca53f1fe4b6e07cac2af59f103b
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 13:09:09 2013 -0600
XXX fix \x{too large}
M dist/IO/IO.xs
M doop.c
M inline.h
M pp.c
M pp_pack.c
M regcomp.c
M sv.c
M toke.c
M utf8.c
M utf8.h
commit c1a7516c49b747abe114294966c25223065c6289
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 17:59:59 2013 -0600
mktables: Fix typo in comment
This used a real CTRL-X, instead of $^X
M lib/unicore/mktables
commit 5857960edff45804a3eca40fbaba8a97650e5659
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:16:08 2013 -0600
utf8.c: Fix so UTF-16 to UTF-8 conversion works under EBCDIC
M utf8.c
commit 682938cb2fdb8ae9b6537bffce56d7af2ce9353e
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:14:34 2013 -0600
utf8.h, utfebcdic.h: Add #define
M utf8.h
M utfebcdic.h
commit 5ba9ab104dc7f9215335789717f261eb23ed740a
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:11:25 2013 -0600
utf8.c: Use mnemonics instead of hex numbers
M utf8.c
commit 70235ee67bf96b7c5eb90da237187685f42f433e
Author: Karl Williamson <[email protected]>
Date: Wed Mar 20 22:15:58 2013 -0600
XXX lib/Unicode/UCD.t: allow to run under EBCDIC,
with failures expected
M lib/Unicode/UCD.t
commit ce5d9ac7c65227a184b73a45f19c87554e5b4d4d
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 15:27:31 2013 -0600
t/op/quotemeta.t: EBCDIC fixes
M t/op/quotemeta.t
commit 5d554d6cda2556cb5d479c3650b12a5ceca41b58
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:32:55 2013 -0600
t/re/fold_grind.t: Fixes for EBCDIC
M t/re/fold_grind.t
commit 12b47c41699a1ac9921aff7407edddf6b5ef95f2
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:21:09 2013 -0600
t/lib/charnames/alias: Fix some EBCDIC problems
M t/lib/charnames/alias
commit 0467ebde8d3da372919beb854927d54612efe1d1
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:20:24 2013 -0600
t/uni/class.t: Make work on EBCDIC
M t/uni/class.t
commit 54d1301e89f5a2e5f2f60950ebe3664d2cc3b480
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:16:12 2013 -0600
Unicode::UCD: Fix EBCDIC bug
M lib/Unicode/UCD.pm
commit 89bdcf9bb34367a5537911d66f032a40ab95c1ab
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:01:57 2013 -0600
feature/unicode_strings.t: Fix to work on EBCDIC
M lib/feature/unicode_strings.t
commit 23a42b2a645541cae5a3c0591bcf97d9edda1073
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:33:59 2013 -0600
XXX lib/strict.t: Skip for now
this is to see if this is what is causing the tests to not finish.
M lib/strict.t
commit 4439c201af57d4e516a7c7933f69ace2682f28fb
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:12:30 2013 -0600
regen/regcharclass.pl: make more EBCDIC friendly
One of the possible inputs to this process is a string. This clarifies
that it must be specified in Unicode characters, and adds code to
translate it to native, if necessary.
M regen/regcharclass.pl
commit c0125e614a78b4f3e864d0ed2fbf3c2cbb9c37ba
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:10:46 2013 -0600
XXX regen/regcharclass.pl: maybe temp comment out utf8_char
M regen/regcharclass.pl
commit e6685869e73608b4ca3c62ae53b13f4ccfb65c03
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 22:23:47 2013 -0600
XXX temporary regcharclass.h
This has parts that were generated on z/OS, and look good; the ones that
are still problematic are omitted;
f
M regcharclass.h
commit a0e5379224a4a931d430d2cf90eb542b4af13203
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:09:53 2013 -0600
XXX temporary comment out multi-char folds
M regcomp.c
M regen/regcharclass.pl
M regexec.c
M t/re/fold_grind.t
M t/re/reg_fold.t
commit 7e0272847ba485e60ada40c557a519dbf1eb9004
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 22:00:29 2013 -0600
XXX temp skip perl5db.t
M lib/perl5db.t
commit aff4a09fb30c0031b4ecbcfd5101ad87b1d2777d
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 11:45:06 2013 -0600
pp.c: White-space only
Make a ternary operation more clear
M pp.c
commit a1d569b82f83e0fc83e4b8809e1332115e9b3c46
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 11:43:42 2013 -0600
Fix valid_utf8_to_uvchr() for EBCDIC
M utf8.c
commit d173561f840ff9ee7eb0b9991b71759a5df1b6a5
Author: Karl Williamson <[email protected]>
Date: Sun Mar 17 21:42:20 2013 -0600
t/test.pl: Add comment about EBCDIC
M t/test.pl
commit a6b3cee287bce587658e3e16fc4499cd64c78310
Author: Karl Williamson <[email protected]>
Date: Sun Mar 17 17:39:33 2013 -0600
XXX makedepend.SH: Why does 255 work and 250 not?
M makedepend.SH
commit 94032f3099854e79a4c3e8b4dfdf38c2af7ea9ed
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:48:22 2013 -0600
XXX regen/mk_PL_charclass.pl: Make EBCDIC friendly
need more of a commit message
M regen/mk_PL_charclass.pl
commit 113f75663ffde3048fd83f24bfa8ee9841836834
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:44:44 2013 -0600
XXX make various things more EBCDIC friendly
Adds trailing white space errors
Need to know what to do about ^A meaning 0x1, and M-foo meaning meta
M lib/DB.pm
M lib/dumpvar.pl
M lib/perl5db.pl
M lib/sigtrap.pm
commit b81ad8866b8a014fefa2f710894e06344bc5a0ed
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:41:57 2013 -0600
XXX charnames.t: Make more EBCDIC friendly
Why need utf8::unicode_to_native
M lib/charnames.t
commit 8b43ffd4b156f536279533d772bb80a063c8a7ec
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:41:15 2013 -0600
XXX: Fixup commit message.
Fix UTF8_ACUUMULATE, utf8.c
M utf8.c
M utf8.h
commit 3446d548c661ef8fdb7958de1438099c2f8f053f
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 16:52:45 2013 -0600
regcomp.c: Fix bug in EBCDIC
The POSIXA and NPOSIXA regnodes need to set the bits on only the ASCII
code points, but under EBCDIC those code points are 0-127.
M regcomp.c
commit d4ca176621939b91bb80331529659e2d0dee5dd0
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 12:26:15 2013 -0600
hints/os390.sh: Suppress bogus compiler message
M hints/os390.sh
commit 7fdd892a616c5f3dc1b6daeab9fc26def3aa2680
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 11:57:24 2013 -0600
re/charset.t: Allow to work on EBCDIC
This just converts the hard-coded character numbers to native, so will
work on any platform.
M t/re/charset.t
commit 332f6b53422bc86565c28cfc3a3dac452a50e9c9
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 11:50:35 2013 -0600
XS-APItest/t/handy.t: Change output message
On EBCDIC platforms, the output is not in terms of \N{U+}; change text
to \x{ }
M ext/XS-APItest/t/handy.t
commit 5eb05f77e89b20bf902ed57bced827a0aec0ac77
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 21:44:16 2013 -0600
XXX Dumper.xs: Don't know why this stopped compiling
M dist/Data-Dumper/Dumper.xs
commit d42d2058bc17dc51ada101f8d2b67360dd8b8e37
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:22:28 2013 -0600
toke.c: Fix an ASCII-platform dependency
M toke.c
commit a7be3b128425424d0d805682847837df2e1f771c
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:20:23 2013 -0600
toke.c: Simplify some code
We don't have to test separately for lower vs uppercase here, as
upper/lower case A-Z and a-z are not intermixed in the gaps in A-Z and
a-z under EBCDIC.
M toke.c
commit 4a32e309741008564a83c271d30ae1b712c090b9
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:18:12 2013 -0600
genpacksizetables.pl: Correct comment typo
M genpacksizetables.pl
commit dd5a55b6fdbf8a078469044d059b920b221445a4
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:17:39 2013 -0600
APItest/t/handy.t: Make EBCDIC-friendly
M ext/XS-APItest/t/handy.t
commit dc760d4f4d7e0ecb53a0605d4136b5681058236d
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:16:14 2013 -0600
Data-Dumper: Make EBCDIC-friendly
M dist/Data-Dumper/Dumper.xs
commit a1e9796840396a669143612e462a2d037f534f62
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:14:31 2013 -0600
sv.c: Make less ASCII-centric
M sv.c
commit 3090fcaaaca19ff4b21cbce1469660d4c4201986
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:07:52 2013 -0600
lib/charnames.t: Make some tests work under EBCDIC
M lib/charnames.t
commit 6e8a62517d619be524745705fa4b372cb3b26aa1
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:05:46 2013 -0600
dump.c: Make less ASCII-centric:
This has the added advantage of being clearer as to what is going on.
M dump.c
commit b545fbf35e16a854a4fc9deefc7a17d8eaba2bb1
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:02:52 2013 -0600
hv.c: Stop being ASCII-centric
This uses macros which work cross-platform. This has the added advantge
that it is much clearer what is going on.
M hv.c
commit 0c40dc862a5add7f6356d3c72f11f3871dd1c244
Author: Karl Williamson <[email protected]>
Date: Tue Mar 12 22:34:17 2013 -0600
t/TEST: Don't bail if fails in t/base
M t/TEST
commit f181dc62f34176431050f7a58220ab64c8209fc3
Author: Karl Williamson <[email protected]>
Date: Mon Mar 11 15:11:10 2013 -0600
Added Porting/reorder_charclass_invlists.pl
This program is used too bootstrap perl onto a non-ASCII platform with
no pre-existing perl.
M MANIFEST
A Porting/reorder_charclass_invlists.pl
commit 39d9e9b64dc2daebc6e7b7cebdc2aaf518ea5f09
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 22:17:31 2013 -0600
XXX See if changing \xE2 to \xE1 causes lex.t to work for EBCDIC
\xE2 is 'S' in EBCDIC, and so is going to be legal. \xE1 is not an
ASCII equivalent.
M t/base/lex.t
commit 0c8c1ac7b9e989def45da3318739ea0ecf0777b3
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 11:01:32 2013 -0700
XXX EBCDIC header files
M charclass_invlists.h
M l1_char_class_tab.h
M unicode_constants.h
commit 4c32b094f5bd1f68ab8335ae3a4b02564a9217f3
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 12:31:25 2013 -0700
XXX Temporary for z/OS long long support
M Configure
M hints/os390.sh
commit 221aeabea45fbe3a41fa9d0a0b233b1f4070f1be
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 13:11:07 2013 -0600
XXX Temporary comment out ParseXS check
this is to get things to compile for now
M dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm
commit 336d3e4d3e321ae3a6c689a94d658f13f7fdff1a
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 11:34:10 2013 -0600
XXX Collate, Normalize: Allow to compile under EBCDIC
M cpan/Unicode-Collate/Collate.pm
M cpan/Unicode-Collate/mkheader
M cpan/Unicode-Normalize/Normalize.pm
M cpan/Unicode-Normalize/mkheader
commit 614d2ecabc91c3ecfdcb7921b6cd24737f0131ee
Author: Karl Williamson <[email protected]>
Date: Sat Mar 9 21:57:38 2013 -0700
XXX dquote_static.c: Silence wrong warning on EBCDIC
Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
which should be a separate commit anyway.
This silences an inappropriate warning that doesn't happen on ASCII
platforms. CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms. But
0x14 is a C1 control on EBCDIC, a C0 on ASCII. Therefore the test that
it's a control should include both C0 and C1, which isCNTRL_L1() does.
Also has a white-space change, outdenting a line so it doesn't wrap in
an 80 column window.
M dquote_static.c
commit fff4bbf56c73af6e38fee8290f628678f4531330
Author: Karl Williamson <[email protected]>
Date: Thu Mar 7 12:08:41 2013 -0700
utfebcdic.h: Change 'unsigned char' to U8
This is for consistency with the rest of Perl
M utfebcdic.h
commit f537c16cc369ee788ee47d48c78b44a1947d358f
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 08:11:38 2013 -0700
regen/regcharclass.pl: Make more EBCDIC-friendly
This commit changes the code generated by the macros so that they work
right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs. THEY
ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
onto the target platform, and regcharclass.pl can be run there,
generating macros correct UTF-8.
M regcharclass.h
M regen/regcharclass.pl
commit 3502f186fe84ce851338932f2cd15afbc2901b28
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 21:30:01 2013 -0700
utfebcdic.h: Add (UV) cast
The operand of this macro is implicitly a UV. Make sure that it is.
M utfebcdic.h
commit 4dffe023d31b65b3c5b6c309f67fa8b096e136b1
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 17:04:58 2013 -0700
handy.h: Allow bootstrapping to non-ASCII platform
This adds a bunch of macros and moves things around to support
conditional compilation when Configure is called with
-DBOOTSTRAP_CHARSET. Doing so causes the usual macros that are
table-driven to not be used, since the table may not be valid when
bringing Perl up for the first time on a non-ASCII platform.
This allows it to compile using the platform's native C library ctype
functions, which should work enough to compile miniperl, and allow the
table to be changed to be valid. Then Configure can be re-run to not
bootstrap, and normal compilation can proceed
M handy.h
M inline.h
commit 23092c83d68d68a676aefa4272ccb8277fa60c2d
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:43:26 2013 -0700
gv.c: Remove EBCDIC dependency
M gv.c
commit 7ef4edd2485cbcfe28c2fc1d6cccd9cf2bb95b8d
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:00:47 2013 -0700
toke.c: Remove EBCDIC dependency
M toke.c
commit 793e120589900f255abd1f2386e521fecdd5ded8
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:14:25 2013 -0700
toke.c: Remove character set dependency
Instead of hard-coding the bit patterns that comprise the Byte Order
Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
the current platform.
This removes some EBCDIC-only code.
M toke.c
commit 4d4c99218002cbe48c8c5d8e37308efb05918925
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:10:27 2013 -0700
unicode_constants.h: Add #defines for Byte Order Mark
These will be used in future commits
M regen/unicode_constants.pl
M unicode_constants.h
commit 0ed2172a63c27191cdefdf970a73ec3cf647ff90
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 15:04:18 2013 -0700
XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
This macro may not be present, and is currently used exclusively in
IS_UTF8_CHAR, which itself may be undefined, and code should cope with
that. This is a work-around until a better solution is found.
M utf8.c
M utf8.h
commit c6dbc6f2a3002060644e400a2196fe052afab8bd
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 14:09:04 2013 -0700
Add Porting tool for help with non-ASCII platforms
Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
non-ASCII platform with no working Perl.
M MANIFEST
A Porting/reorder_l1_char_class_tab.pl
M regen/mk_PL_charclass.pl
commit cbc75ad703dc65a1e2d1b2a3496b8bfb7d389919
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 13:06:58 2013 -0700
inline.h: Reorder functions
The comment implied that the functions below it in the file were
deprecated, but in fact only the next two functions were. This
clarifies that and moves them so they are the final ones in the file
M inline.h
commit 24d54f4e2876ee26e17d2a98bb0d86a98cc984a7
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:33:42 2013 -0700
utfebcdic.h: Add comment
M utfebcdic.h
commit 18ac31d44401af5bee213f8911aae9ec5ea71d2b
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:12:11 2013 -0700
utf8.h: Clean up START_MARK definition and use
The previous definition broke good encapsulation rules. UTF_START_MARK
should return something that fits in a byte; it shouldn't be the caller
that does this. So the mask is moved into the definition. This means
it can apply only to the portion that creates something larger than a
byte. Further, the EBCDIC version can be simplified, since 7 is the
largest possible number of bytes in an EBCDIC UTF8 character.
M utf8.h
M utfebcdic.h
commit e55bdf2186cd9c28626a40b9a6a877380eb4c5e0
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:05:26 2013 -0700
utf8.h: Move #includes
These two files were only being #included for non-ebcdic compiles; they
should be included always.
M utf8.h
commit 956c96f4fc08932c5cd56f076acc50a6c78da60c
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 11:49:14 2013 -0700
utfebcdic.h: Remove extra parameter expansions
These two macros were improperly expanding the parameters as well as
defining the operation, leading to compile errors.
M utfebcdic.h
commit c616415bddb138ad67e40f5d48752bd15ad6d4cd
Author: Karl Williamson <[email protected]>
Date: Fri Mar 1 08:28:52 2013 -0700
utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
UTF8_TWO_BYTE_LO. But the EIGHT_BIT versions can use the less general
and simpler NATIVE_TO_LATN1 instead of NATIVE_TO_UNI because the input
domain is restricted in the EIGHT_BIT. Note that on ASCII platforms,
these both expand to the same thing, so the difference matters only on
EBCDIC.
M utf8.h
commit 80e26d0018d924cc9d800e40c1ee8ea10650cf60
Author: Karl Williamson <[email protected]>
Date: Thu Feb 28 09:25:27 2013 -0700
XXX temp: show makedepend cerr
M makedepend.SH
commit fb485cc443630478afe8ac6308d89072a08b2646
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 21:59:11 2013 -0700
makedepend.SH: Split too long lines; properly join
I had thought that a continuation introduced a space. But no,
a continuation can happen in the middle of a token.
And this splits lines that are getting very long to avoid preprocessor
limitations.
M makedepend.SH
commit ceaa6a6389ab25a0c39f2a0a3b6969be3a69612d
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 15:51:28 2013 -0700
makedepend.SH: White-space only
Align continuation backslashes
M makedepend.SH
commit f119a27bb869cf6d54072ea977b47f83e82cea6f
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:39:28 2013 -0700
makedepend.SH: Remove some unnecessary white space
Multi-line preprocessor directives are now joined into single lines.
This can create lines too long for the preprocessor to handle. This
commit removes blanks adjoining comments that get deleted. This makes
things somewhat less likely to exceed the limit.
This commit also fixes several [] which were meant to each match a tab
or a blank, but editors converted the tabs to blanks
M makedepend.SH
commit e159b3d362008f5d9b8a679b70aa92a40e72be5e
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:30:51 2013 -0700
makedepend.SH: Retain '/**/' comments
These comments may actually be necessary.
M makedepend.SH
commit 9ca820fb73a833d55cb7912a27876193a7e9cd28
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 08:38:19 2013 -0700
handy.h: Remove extraneous parens
M handy.h
commit 8af37c5416b89b8361db3690bafe7034aae962ae
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 13:06:07 2013 -0500
Disable gcc-style function attributes on z/OS.
John Goodyear <[email protected]> reports that the z/OS C compiler
supports the attribute keyword, but not exactly the same as gcc.
Instead of a "warning", the compiler emits an "INFORMATIONAL" message
that Configure fails to detect. Until Configure is fixed, just disable
the attributes altogether.
John Goodyear
M hints/os390.sh
commit 08afbed90056aae62fc2b1609b35854969605f79
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 09:12:13 2013 -0500
Change os390 custom cppstdin script to use fgrep.
Grep appears to be limited to 2048 characters, and truncates
the output for cppstin. Fgrep apparently doesn't have that limit.
Thanks to John Goodyear <[email protected]> for reporting this.
M hints/os390.sh
commit 94fed72293230cf1724fcdcb9eb94fb0c82d16e2
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:45:19 2013 -0700
utf8.c: Use more clearly named macro
In the case of invariants these two macros should do the same thing,
but it seems to me that the latter name more clearly indicates what is
going on.
M utf8.c
commit ad36843c38a88c6a3b9401439b37dd95f93fe76d
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:35:12 2013 -0700
Add macro OFFUNISKIP
This means use official Unicode code point numbering, not native. Doing
this converts the existing UNISKIP calls in the code to refer to native
code points, which is what they meant anyway. The terminology is
somewhat ambiguous, but I don't think will cause real confusion.
NATIVESKIP is also introduced for situations where it is important to be
precise.
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit 24eb6f53a20dd1ab864b84058213e8f683ebe0a9
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:22:19 2013 -0700
toke.c: white space only
M toke.c
commit c6edba5f21a15a25dbe3345f1dd4cfc8a86227de
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 12:08:50 2013 -0700
utf8.c: Deprecate two functions
This is to force any code that has been using these functions to change.
Since the Unicode tables are now stored in native order, these functions
should only rarely be needed.
However, the functionality of these is needed, and in actuality, on
ASCII platforms, the native functions are #defined to these. So what
this commit does is rename the functions to something else, and create
wrappers with the old names, so that anyone using them will get the
deprecation.
M embed.fnc
M embed.h
M mathoms.c
M proto.h
M toke.c
M utf8.c
M utf8.h
commit c87b1587357676356947660ef915bf0aaa04097d
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:26:09 2013 -0700
Deprecate uvuni_to_utf8()
Code should almost never be dealing with non-native code points
M embed.fnc
M embed.h
M proto.h
M toke.c
M utf8.c
M utf8.h
commit 959c0f9e14f3418c74263f30b7ae69e7062a062e
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:02:33 2013 -0700
Deprecate utf8_to_uni_buf()
Now that the tables are stored in native order, there is almost no need
for code to be dealing in Unicode order.
M embed.fnc
M proto.h
M utf8.c
commit 99d576dd1501c2e10e27f4d11413485999e64015
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 09:00:18 2013 -0700
makedepend.SH: Comment out unnecessary code
This causes problems currently for z/OS. But, since we don't know why
it was there, I'm leaving it in as a placeholder.
M makedepend.SH
commit f26b961574643954fd707134342fb496622189ff
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:26:44 2013 -0700
Deprecate valid_utf8_to_uvuni()
Now that all the tables are stored in native format, there is very
little reason to use this function; and those who do need this kind of
functionality should be using the bottom level routine, so as to make it
clear they are doing nonstandard stuff.
M embed.fnc
M proto.h
M utf8.c
commit e0bc25f4b2bae6f22d030571682b10abe59c0b9d
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:14:26 2013 -0700
utf8.c: Swap which fcn wraps the other
This is in preparation for the current wrapee becoming deprecated
M embed.fnc
M embed.h
M proto.h
M utf8.c
M utf8.h
commit ef84693b293a5c61a879ee4cb920e15b1083cb37
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:29:34 2013 -0700
utf8.c: Skip a no-op
Since the value is invariant under both UTF-8 and not, we already have
it in 'uv'; no need to do anything else to get it
M utf8.c
commit 623629b2bd95eac14212abe7a163cc56f5ee0c50
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:26:50 2013 -0700
utf8.c: Move comment to where makes more sense
M utf8.c
commit 035a81b54333f45f22008b106f8f5825bdde4a86
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:30:10 2013 -0700
APItest: Test native code points, instead of Unicode
M ext/XS-APItest/APItest.pm
M ext/XS-APItest/APItest.xs
M ext/XS-APItest/t/utf8.t
commit 1711c27d3fd8a5cdb0589cc963fea0abb5bfde1b
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:25:08 2013 -0700
XXX CPAN Normalize
This converts Unicode::Normalize to use the native tables that are used
by Perl starting in XXX, while using the Unicode-ordered ones that were
used before then.
Another alternative would be to have mktables generate just these tables
in Unicode ordering.
M cpan/Unicode-Normalize/Normalize.xs
commit ee52808352e8568b7fde621dbd8eca08b821264c
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:22:55 2013 -0700
XXX CPAN prob wrong Collate
This changes to implicity usenative code points. This is likely wrong,
as the module comes with its own data, that are probably in terms of
Unicode
M cpan/Unicode-Collate/Collate.xs
commit 08b370111c57aeca3669383d26e3c4080bebb135
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:12:53 2013 -0700
XXX CPAN Encode.xs
Use core function if available. This will insulate this code from any
future changes.
M cpan/Encode/Encode.xs
commit b30b174e4362b7bae9d0b8ae0aae52ddbedc567e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:04:24 2013 -0700
XXX CPAN and unsure Encode
M cpan/Encode/Encode.xs
M cpan/Encode/Unicode/Unicode.xs
commit dbba785aed07bec7e35858a0a46526ad7b7da27e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:00:47 2013 -0700
XXX CPAN Encode.xs: fix indent
M cpan/Encode/Encode.xs
commit fca88e11a250681760fc96b42786c891a6324e48
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 17:23:15 2013 -0700
Don't refer to U+XXXX when mean native
These messages say the output number is Unicode, but it is really
native, so change to saying is 0xXXXX.
M regen/regcharclass_multi_char_folds.pl
M regexec.c
commit 53f8922321fd308df3c03ae09fe55bdd90ea41af
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:43:59 2013 -0700
Convert some uvuni() to uvchr()
All the tables are now based on the native character set, so using
uvuni() in almost all cases is wrong.
M cygwin/cygwin.c
M doop.c
M op.c
M pp_pack.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
commit a62ee910723a8273c02e840ffd173808ad211b34
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:25:47 2013 -0700
handy.h: White space only
M handy.h
commit 6bef9d65ce0d1f2cb26b5a33601d322c96733ef0
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:19:49 2013 -0700
t/test.pl: Allow native/latin1 string conversions to work on utf8.
These functions no longer have the hard-coded definitions in them,
but now end up resolving to internal functions, so that new encodings
could be added and these would automatically understand them.
Instead of using tr///, these now go character by character and
converting to/from ord, which is slower, but allows them to operate on
utf8 strings.
Peephole optimization should make these essentially no-ops on ascii
platforms.
M t/test.pl
commit dfaddc5b53b332177c8dca096dd65fa5089198af
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:05:55 2013 -0700
t/test.pl: Simplify ord to/from native fcns
This commit changes these functions from converting to/from a string to
calling utf8:: functions which operate on ordinals instead.
M t/test.pl
commit 89804c135940c0aa5900d442263e80c99bc48875
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:35:38 2013 -0700
Make casing tables native
These are final tables that haven't been converted to native character
set casing.
M perl.h
M utfebcdic.h
commit ea5b8ce2f931768ddd64c7485f4b9ec7fe2b827d
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:32:30 2013 -0700
utfebcdic.h: Remove trailing spaces
M utfebcdic.h
commit 3ebf66f4c2e853754fee951dc567f6e006ea8493
Author: Karl Williamson <[email protected]>
Date: Fri Feb 22 18:55:26 2013 -0700
EBCDIC has the unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII onees, as the rules for these on ASCII platforms.
A previous commit for 5.18 changed the docs about this issue. This
current commit forces ASCII rules on EBCDIC platforms (even should there
be one that natively uses all 256). To get all 256, the same things
like 'use feature "unicode_strings"' must now be done.
M handy.h
commit 3146837d3952ff7ea67777170195d42989a7556d
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:47:52 2013 -0700
handy.h: Solve a failure to compile problem under EBCDIC
handy.h is included in files that don't include perl.h, and hence not
utf8.h. We can't rely therefore on the ASCII/EBCDIC conversion
macros being available to us. The best way to cope is to use the native
ctype functions. Most, but not all, of the macros in this commit
currently resolve to use those native ones, but a future commit will
change that.
M handy.h
commit 6afceac43d6be6bf44055f42a9c0bc5cf529c537
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:35:12 2013 -0700
handy.h: Simplify some macro definitions
Now, only one of the macros relies on magic numbers (isPRINT), leading
to clearer definitions.
M handy.h
commit 29145ad4eb94d3855331734beef8a9a69d35f37b
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:26:49 2013 -0700
handy.h: Combine macros that are same in ASCII, EBCDIC
These 4 macros can have the same RHS for their ASCII and EBCDIC
versions, so no need to duplicate their definitions
This also enables the EBCDIC versions to not have undefined expansions
when compiling without perl.h
M handy.h
commit e79cfb4801b632f73469b1430d61a1775fa9aa90
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:39:48 2013 -0700
Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
These macros are no longer called in the Perl core. This commit turns
them into functions so that they can use gcc's deprecation facility.
I believe these were defective right from the beginning, and I have
struggled to understand what's going on. From the name, it appears
NATIVE_TO_NEED taks a native byte and turns it into UTF-8 if the
appropriate parameter indicates that. But that is impossible to do
correctly from that API, as for variant characters, it needs to return
two bytes. It could only work correctly if ch is an I8 byte, which
isn't native, and hence the name would be wrong.
Similar arguments for ASCII_TO_NEED.
The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
does what I think NATIVE_TO_NEED intended.
M embed.fnc
M mathoms.c
M proto.h
M toke.c
M utf8.h
M utfebcdic.h
commit b9c44891bbc5b3a627f31b81b1cf9e1ac6af0c86
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:26:43 2013 -0700
Remove remaining calls of NATIVE_TO_NEED
These calls are just copying the input to the output byte by byte.
There is no need to worry about UTF-8 or not, as the output is just an
exact copy of the input
M toke.c
commit 3ffba69dd6a676c5c27a606336d4aa0674f40f46
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:12:15 2013 -0700
toke.c: Remove some NATIVE_TO_NEED calls
I believe NATIVE_TO_NEED is defective, and will remove it in a future
commit. But, just in case I'm wrong, I'm doing it in small steps so
bisects will show the culprit. This removes the calls to it where the
parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
result can't be other than just the parameter.
M toke.c
commit 718c52e035c367bf19c87612d949c2d6f3c47856
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:22:07 2013 -0700
toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
This code is attempting to deal with the problem of holes in the ranges
a-z and A-Z in EBCDIC. Prior to this patch, it accepeted things like A
WITH GRAVE, etc, which shouldn't have the special processing to deal
with the holes
M toke.c
commit 58a990d4d4fea7d28327435cb1bfa18caa6fba1f
Author: Karl Williamson <[email protected]>
Date: Tue Feb 19 15:13:19 2013 -0700
Use real illegal UTF-8 byte
The code here was wrong in assuming that \xFF is not legal in UTF-8
encoded strings. It currently doesn't work due to a bug, but that may
eventually be fixed: [perl #116867]. The comments are also wrong that
all bytes are legal in UTF-EBCDIC.
It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
(C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
of an illegal overlong sequence.
This creates a #define for an illegal byte using one of the real illegal
ones, and changes the code to use that.
No test is included due to #116867.
M op.c
M toke.c
M utf8.h
commit 42143e59fbedcc5275f97edb3035e71acbf95632
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 14:00:13 2013 -0700
toke.c: Don't remap \N{} for EBCDIC
Everything is now in native,
M toke.c
commit 3589d3e936f15b8cf550e2c832515d75a09b6192
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:50:45 2013 -0700
toke.c: Remove remapping for EBCDIC for octal
The code prior to this commit converted something like \04 into its
EBCDIC equivalent only in double-quoted strings. This was not done in
patterns, and so gave inconsistent results. The correct thing to do
should be to do the native thing, what someone who works on a platform
would think \04 do. Platform independent characters are available
through \N{}, either by name or by U+.
The comment changed by this was wrong, as in some cases it was native,
and in some cases Unicode.
M toke.c
commit 37ac7768f27acc4fdabf4de02e0336167f232490
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:47:13 2013 -0700
Remove EBCDIC remappings
Now that the tables are stored in native format, we shouldn't be doing
remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; not all of this has been done yet.
M handy.h
M perly.c
M pp.c
M regcomp.c
M regexec.c
M utf8.c
commit dd5d1aae981ff54b9a980d0344c286e7effa32b6
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 12:46:05 2013 -0700
Add and use macro to return EBCDIC
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
M handy.h
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
M utf8.h
commit 429040f3a2763069f03df4dd29e3957f557c3782
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 09:18:06 2013 -0700
charnames: fix nit in comment
M lib/_charnames.pm
commit d8a1c221f8dea3d84552e2ca73156644a9606acd
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 11:05:44 2013 -0700
charnames: Make work in EBCDIC
Now that mktables generates native tables, the only thing that was
needed was to make U+ mean Unicode instead of native.
M lib/_charnames.pm
M lib/charnames.pm
commit 11675c614c3c8ca16fb32ad3651625dff48e0dae
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 09:35:56 2013 -0700
Unicode::UCD: Work on non-ASCII platforms
Now that mktables generates native tables, it is a fairly simple matter
to get Unicode::UCD to work on those platforms.
M lib/Unicode/UCD.pm
commit fb4b59668f9e6de1128b1f68578d473aff8b5993
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 22:16:38 2013 -0700
mktables: Generate native code-point tables
The output tables for mktables are now in the platform's native
character set. This means there is no change for ASCII platforms, but
is a change for EBCDIC ones.
Since we currently don't have any EBCDIC test platforms, I tested this
by faking it out to generate EBCDIC data, and then eye-balled the
results.
Code that didn't realize there was a potential difference between EBCDIC
and non-EBCDIC platforms will now start to work; code that tried to do
the right thing under these circumstances will no longer work. Fixing
that comes in later commits.
M lib/unicore/mktables
commit f72d8f8bfc645b689fd663f3efda5f54e67443ba
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 10:50:00 2013 -0700
Fix some EBCDIC problems
These spots have native code points, so should be using the macros for
native code points, instead of Unicode ones.
M regcomp.c
M sv.c
M toke.c
commit e8baa3c0a6071df9ab725b2c12199c55cbc53dee
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:10:19 2013 -0700
Remove unnecessary temp variable in converting to UTF-8
These areas of code included a temporary that is unnecessary.
M inline.h
M regcomp.c
M sv.c
commit 2eeb1517d94bbb3d7241094c6b128796e35e8158
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:00:55 2013 -0700
utf8.h: Correct macros for EBCDIC
These macros were incorrect for EBCDIC. The 3 step process given in
utfebcdic.h wasn't being followed.
M utf8.h
commit 810c1ca1390f5d04f4856ca0dc84594301611f77
Author: Karl Williamson <[email protected]>
Date: Sat Feb 9 21:23:30 2013 -0700
Extract common code to an inline function
This fairly short paradigm is repeated in several places; a later commit
will improve it.
M embed.fnc
M embed.h
M inline.h
M pp_pack.c
M proto.h
M sv.c
M toke.c
M utf8.c
commit 8d8340254a2038c51b1d5e09dbd2b414b9191873
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 21:35:57 2013 -0700
Don't use EBCDIC macro for a C language escape
C recognizes '\a' (for BEL); just use that instead of a look-up.
regen/unicode_constants.pl could be used to generate the character for
the ESC (set in surrounding code), but I didn't do that because of
potential bootstrapping problems when porting to an EBCDIC platform
without a working perl. (The other characters generated in that .pl are
less likely to cause problems when compiling perl.)
M regcomp.c
M toke.c
commit 9aee15f54af7ddbfe971e12aceeb6f28e4b14a4d
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 19:53:38 2013 -0700
Use byte domain EBCDIC/LATIN1 macro where appropriate
The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
whole Unicode range. In the locations affected by this commit, it is
known that the domain is limited to a single byte, so the simpler ones
whose names contain LATIN1 may be used.
On ASCII platforms, all the macros are null, so there is no effective
change.
M handy.h
M regcomp.c
M utf8.c
commit a1a885e7d3da2cfab60972843b607748ee01acc4
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 14:31:09 2013 -0700
Use new clearer named #defines
This converts several areas of code to use the more clearly named macros
introduced in a recent commit
M op.c
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit 25a438e686c856dd3bba7f0cab81c3532d76148d
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 13:52:31 2013 -0700
utf8.h, utfebcdic.h: Create less confusing #defines
This commit creates macros whose names mean something to me, and I don't
find confusing. The older names are retained for backwards
compatibility. Future commits will fix bugs I introduced from
misunderstanding the meaning of the older names.
The older names are now #defined in terms of the newer ones, and moved
so that they are only defined once, valid for both ASCII and EBCDIC
platforms.
M utf8.h
M utfebcdic.h
commit 1b94f2c1729bf12d85ff030b1411ac59dc9c492e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 4 14:22:02 2013 -0700
pp_ctl.c: Use isCNTRL instead of hard-coded mask
This is clearer and portable to EBCDIC.
M pp_ctl.c
commit 9e9f0504418f929aeb4052ba9bb917abb8973ff5
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:51:05 2013 -0700
utf8.c: is_utf8_char_slow() should use native length
What is passed is the actual length of the native utf8 character. What
this was calculating was the length it would be if it were a Unicode
character, and then compares, apples to oranges.
M utf8.c
-----------------------------------------------------------------------
--
Perl5 Master Repository