In perl.git, the branch khw/ebcdic has been created
<http://perl5.git.perl.org/perl.git/commitdiff/0f14fab69dfb040d266ecc5ddceba1d4c9ddda54?hp=0000000000000000000000000000000000000000>
at 0f14fab69dfb040d266ecc5ddceba1d4c9ddda54 (commit)
- Log -----------------------------------------------------------------
commit 0f14fab69dfb040d266ecc5ddceba1d4c9ddda54
Author: Karl Williamson <[email protected]>
Date: Fri Mar 29 15:22:28 2013 -0600
XXX t/op/tiehandle.t: skip for now; deep recursion
M t/op/tiehandle.t
commit b8d324a8c1301af6b8826e3579165a03dcead51f
Author: Karl Williamson <[email protected]>
Date: Fri Mar 29 14:56:16 2013 -0600
XXX better commit msg utf8.c: Avoid unnecessary UTF-8 conversions
This changes the code so that converting to UTF-8 is avoided unless
necessary. For inputs that don't need it, the conversion back from UTF-8
is avoided as well. The cost of doing this is that the first swatches are
combined into one that contains the values for all characters 0-255,
instead of having multiple swatches. That means that when the swatch is
first calculated, all 256 values are computed, instead of 128 (160 on
EBCDIC).
This also fixes an EBCDIC bug in which characters in this range were
being translated twice.
M utf8.c
commit da4ea2f8133af6807de1f16d9b66d26bd5679f69
Author: Karl Williamson <[email protected]>
Date: Fri Mar 29 13:34:59 2013 -0600
utf8.c: Check for UTF-8 malformations
This code, when UTF-8 warnings are off, allows malformed UTF-8. It
really shouldn't, as a safety measure, even though I think malformations
are currently caught and rejected before they get here.
M utf8.c
commit 901410a0716089df0ca78892d63e19a408ce2751
Author: Karl Williamson <[email protected]>
Date: Fri Mar 29 12:29:46 2013 -0600
handy.h: Remove docs for non-existent macro
In commit 3c3ecf18c35ad7832c6e454d304b30b2c0fef127, I mistakenly added
documentation for a non-existent macro. It turns out that only the
variants listed exist, and not the base. Since we are in code freeze,
the solution has to be not to add the base macro, but to delete the
documentation, or change it to refer to just the extant ones. To avoid
an entry that is anomalous compared to the others, I'm simply removing
the documentation for this release.
M handy.h
commit ab8bbf81c1dac1c51e2361dafbdbf30db34b2363
Author: Karl Williamson <[email protected]>
Date: Thu Mar 28 19:56:39 2013 -0600
utf8.c: Remove redundant assignment.
This variable is always set just below.
M utf8.c
commit 40403551e044b5bbe8aff2210582d48bb32a59db
Author: Karl Williamson <[email protected]>
Date: Thu Mar 28 17:19:16 2013 -0600
XXX enable _invlist_dump;
M embed.fnc
M embed.h
M proto.h
commit 78bc0c048cc6bec96a7ec98975d1da55d543a418
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 11:01:32 2013 -0700
XXX EBCDIC header files
M charclass_invlists.h
M l1_char_class_tab.h
M regcharclass.h
M unicode_constants.h
commit 0862a11a53531322a454b6b84a2c074ad448c85b
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 12:26:15 2013 -0600
hints/os390.sh: Suppress bogus compiler message
M hints/os390.sh
commit 004bb0cd02bba687180568d20ba5dfc4c35ea041
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 12:31:25 2013 -0700
XXX Temporary for z/OS long long support
M Configure
M hints/os390.sh
commit a0fca36fbca648f2ab359193fc2d0f34cfd09120
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 18:17:28 2013 -0600
Add test that to/from native character set works
For non-ASCII systems, there are character set translation tables. This
makes sure the two accessible ones are inverses of each other. If not,
nothing can be expected to work right.
M MANIFEST
A t/base/translate.t
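A minimal C sketch of the property this test checks, assuming the two
translation tables are plain 256-entry byte arrays (the function name and
the toy table below are hypothetical, not Perl's own): each table must
undo the other for every byte value.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if each table undoes the other for all 256 byte values, i.e. the
 * two tables are inverse permutations of each other. */
static bool tables_are_inverses(const unsigned char to_native[256],
                                const unsigned char from_native[256])
{
    for (size_t i = 0; i < 256; i++) {
        if (from_native[to_native[i]] != i || to_native[from_native[i]] != i)
            return false;
    }
    return true;
}

/* Hypothetical toy code page for demonstration only: flipping the high
 * bit is its own inverse. Real EBCDIC tables are irregular permutations. */
static void build_toy_tables(unsigned char to[256], unsigned char from[256])
{
    for (size_t i = 0; i < 256; i++)
        to[i] = from[i] = (unsigned char)(i ^ 0x80);
}
```

Checking both directions catches a table that is not a permutation at
all, not just one that is mismatched with its partner.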
commit 834849621f67a76746644903fd0ec2ee6bb06602
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 17:04:03 2013 -0600
XXX debugging info for UCD.t
M lib/Unicode/UCD.t
commit ee69a6b4656facb3cde1138038dea10baa6591f0
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 16:55:55 2013 -0600
lib/feature/bundle: Fix some things to pass under EBCDIC
M t/lib/feature/bundle
commit bb567052ecc1f5e8b2f51c677942e137b987678a
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 16:08:04 2013 -0600
XS-APItest/t/fetch_pad_names.t: Skip if EBCDIC
This could be ported, but there's a lot of stuff to convert; it would
need a function to convert byte strings that form legal UTF-8 into legal
UTF-EBCDIC.
M ext/XS-APItest/t/fetch_pad_names.t
commit c379ac0c6783d801996a05240ab26ec4667d8706
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 12:11:59 2013 -0600
XXX Temporary lib/charnames.t, comment out to see if it gets further
M lib/charnames.t
commit 9fbc01650facda3201b8aa7df82426be639fa6a5
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 12:05:53 2013 -0600
XXX ext/XS-APItest/t/utf8.t: Fix so passes EBCDIC
This involves skipping many of the tests. Re-examine later.
M ext/XS-APItest/t/utf8.t
commit 8b4a682b97255230b73504535722cd47b38b5876
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 11:27:06 2013 -0600
ext/re/t/re_funcs_u.t: Fix to work under EBCDIC
M ext/re/t/re_funcs_u.t
commit 5428bfc9018dbcb9b53ee5239d2bb4f2e5011ab2
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 11:11:22 2013 -0600
XXX dist/IO/t/io_utf8argv.t: Temporarily skip if EBCDIC
M dist/IO/t/io_utf8argv.t
commit 2f29dbb99b973c4e697a58adda7978135948f651
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 10:33:44 2013 -0600
t/op/print.t: Skip an EBCDIC test
This could be written (the values would probably change depending on the
code page), but the code that would get exercised is unlikely to vary
depending on character set.
M t/op/print.t
commit a132618dba9c2336a25774677380a53c66507f23
Author: Karl Williamson <[email protected]>
Date: Tue Mar 26 19:51:06 2013 -0600
XXX skip folding tests
M t/re/fold_grind.t
M t/re/reg_fold.t
M t/uni/fold.t
commit 2c70a7bf9e8ce8445e65fd85b6e1d7aabcac3026
Author: Karl Williamson <[email protected]>
Date: Tue Mar 26 15:44:59 2013 -0600
XXX t/TEST: Avoid SIGPIPEs
M t/TEST
commit e133e9687d9fcbccc3876930aa10ac2b50c538d8
Author: Karl Williamson <[email protected]>
Date: Tue Mar 26 15:49:08 2013 -0600
XXX Temporarily test normalization
M cpan/Unicode-Normalize/t/fcdc.t
M cpan/Unicode-Normalize/t/form.t
M cpan/Unicode-Normalize/t/func.t
M cpan/Unicode-Normalize/t/illegal.t
M cpan/Unicode-Normalize/t/norm.t
M cpan/Unicode-Normalize/t/null.t
M cpan/Unicode-Normalize/t/partial1.t
M cpan/Unicode-Normalize/t/partial2.t
M cpan/Unicode-Normalize/t/proto.t
M cpan/Unicode-Normalize/t/split.t
M cpan/Unicode-Normalize/t/test.t
M cpan/Unicode-Normalize/t/tie.t
commit c5cca075201ebecce5a5e232c50a3e85cd8ea880
Author: Karl Williamson <[email protected]>
Date: Tue Mar 26 14:06:50 2013 -0600
op/index.t: Fix tests for EBCDIC
Commit 8a38a836 erroneously translates literals into the native
encoding, causing a double translation, which yields garbage.
M t/op/index.t
commit b50fea4d78506e2f40f9d32d0573ae1d91e2169a
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 20:43:38 2013 -0600
op/chop.t: Fix for EBCDIC
One test is skipped because the code point is not representable on
EBCDIC platforms. Another test is modified to work on EBCDIC.
M t/op/chop.t
commit 6388791ed4af89a842def56eb329c50b8d592b21
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 19:56:50 2013 -0600
t/op/lc.t: Fix to work under EBCDIC
This had code that attempted this, but it was wrong. The conversion to
EBCDIC must be done before the \U, or similar.
M t/op/lc.t
commit e194abb501fc3ff5f1e800f96d881f3d042b8a2d
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 16:37:04 2013 -0600
XXX fix this later based on comment
M dist/IO/t/io_utf8argv.t
commit 3e859d0b9ea181d332fe61c175db235fb9e50637
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 15:33:55 2013 -0600
Skip some tests under EBCDIC
EBCDIC won't work on these because of inherent differences from ASCII
M t/porting/customized.t
M t/porting/manifest.t
commit 817d1631a076fb8a28f5e24060d8b8fe261bb222
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 15:04:14 2013 -0600
porting/bincompat.t: Skip under EBCDIC
because the sorting order is different
M t/porting/bincompat.t
commit 0ab6a93e9bd3ab322fd9b8881bc6df68846f484f
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 14:59:50 2013 -0600
t/re/regex_sets.t: Fix so it passes under EBCDIC
M t/re/regex_sets.t
commit e8a1aa715acd5877e6242135b143f2a90843dab0
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 14:59:26 2013 -0600
t/porting/bincompat.t: Typo in comment
M t/porting/bincompat.t
commit 9558e3e6d0ba1e9d563c9cedee8049cb029bc634
Author: Karl Williamson <[email protected]>
Date: Mon Mar 25 13:09:09 2013 -0600
XXX fix \x{too large}
M dist/IO/IO.xs
M doop.c
M inline.h
M pp.c
M pp_pack.c
M regcomp.c
M sv.c
M toke.c
M utf8.c
M utf8.h
commit be43339f20eca36fc580d20a1a9e168420df7e46
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 17:59:59 2013 -0600
mktables: Fix typo in comment
This used a real CTRL-X, instead of $^X
M lib/unicore/mktables
commit 43022b62f6c2963cbd7441ca8d4ef3631642874c
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:16:08 2013 -0600
utf8.c: Fix so UTF-16 to UTF-8 conversion works under EBCDIC
M utf8.c
commit 4d8aa980900ad3acf9c315aa0104c6114aa9a2db
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:14:34 2013 -0600
utf8.h, utfebcdic.h: Add #define
M utf8.h
M utfebcdic.h
commit 36645248627e16adb4da315620e55e01a27fe758
Author: Karl Williamson <[email protected]>
Date: Sun Mar 24 13:11:25 2013 -0600
utf8.c: Use mnemonics instead of hex numbers
M utf8.c
commit 04a508b9fb844ea97961a96baa71e5a62cba8901
Author: Karl Williamson <[email protected]>
Date: Wed Mar 20 22:15:58 2013 -0600
lib/Unicode/UCD.t: Allow it to run under EBCDIC
M lib/Unicode/UCD.t
commit fc6b029fadae1890832ad9f76b87ad80c0631ca9
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 15:27:31 2013 -0600
t/op/quotemeta.t: EBCDIC fixes
M t/op/quotemeta.t
commit f46f080d6137a8d83d39a68623a850f1297709a4
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:32:55 2013 -0600
t/re/fold_grind.t: Fixes for EBCDIC
M t/re/fold_grind.t
commit 5e4280a56d78ef1c952dbb736bf685b9500931c2
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:21:09 2013 -0600
t/lib/charnames/alias: Fix some EBCDIC problems
M t/lib/charnames/alias
commit cb2b553b2390a483795bfab391e13935028b27b3
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:20:24 2013 -0600
t/uni/class.t: Make work on EBCDIC
M t/uni/class.t
commit 8444560b7c029e2a5ceb11fcd0d481ecc68d7211
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 11:01:57 2013 -0600
feature/unicode_strings.t: Fix to work on EBCDIC
M lib/feature/unicode_strings.t
commit 55d9ff359792ec4a7a42388f21fb54c6d3514f8f
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:12:30 2013 -0600
regen/regcharclass.pl: make more EBCDIC friendly
One of the possible inputs to this process is a string. This clarifies
that it must be specified in Unicode characters, and adds code to
translate it to native, if necessary.
M regen/regcharclass.pl
commit f7c2b3ec3762a9719fc6ebbb33b4f0b33c58e8e4
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:10:46 2013 -0600
XXX regen/regcharclass.pl: maybe temp comment out utf8_char
M regen/regcharclass.pl
commit 31edcbbf395283f9c3c2e482f14b903782926f62
Author: Karl Williamson <[email protected]>
Date: Tue Mar 19 10:09:53 2013 -0600
XXX temporary comment out multi-char folds
M regcomp.c
M regen/regcharclass.pl
M regexec.c
M t/re/fold_grind.t
M t/re/reg_fold.t
commit 3092372de120c344fdf42e5a2646f508369e6808
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 22:00:29 2013 -0600
XXX temp skip perl5db.t
M lib/perl5db.t
commit d2284354e76864e37379cd892f901112f61dcc9f
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 11:45:06 2013 -0600
pp.c: White-space only
Make a ternary operation more clear
M pp.c
commit a8e366a73d1bdf1ef519e5e61e529ef1378066aa
Author: Karl Williamson <[email protected]>
Date: Mon Mar 18 11:43:42 2013 -0600
Fix valid_utf8_to_uvchr() for EBCDIC
M utf8.c
commit ddaf047fd3a48f36c35d5b070c6053fe80bb1a09
Author: Karl Williamson <[email protected]>
Date: Sun Mar 17 21:42:20 2013 -0600
t/test.pl: Add comment about EBCDIC
M t/test.pl
commit 128f5fa468d78b30531d24a86dc59fd82117a482
Author: Karl Williamson <[email protected]>
Date: Sun Mar 17 17:39:33 2013 -0600
XXX makedepend.SH: Why does 255 work and 250 not?
M makedepend.SH
commit 8b306e5f62cc424f1008bf496e276ed3b814612b
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:48:22 2013 -0600
XXX regen/mk_PL_charclass.pl: Make EBCDIC friendly
need more of a commit message
M regen/mk_PL_charclass.pl
commit 379034bc38fcb9ad5e05907dc6c2cb79d751233e
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:44:44 2013 -0600
XXX make various things more EBCDIC friendly
Adds trailing white space errors
Need to know what to do about ^A meaning 0x1, and M-foo meaning meta
M lib/DB.pm
M lib/dumpvar.pl
M lib/perl5db.pl
M lib/sigtrap.pm
commit 58e71200f0f43ef8e5765565299c13e4f7481e6a
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:41:57 2013 -0600
XXX charnames.t: Make more EBCDIC friendly
Why is utf8::unicode_to_native needed?
M lib/charnames.t
commit e69cf8618c591c1ea140e35801665fdd281962f6
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 22:41:15 2013 -0600
XXX: Fixup commit message.
Fix UTF8_ACCUMULATE, utf8.c
M utf8.c
M utf8.h
commit 2a4bec48efa0fd0c5dd165eb0f2a76ffd3e32cf9
Author: Karl Williamson <[email protected]>
Date: Sat Mar 16 16:52:45 2013 -0600
regcomp.c: Fix bug in EBCDIC
The POSIXA and NPOSIXA regnodes need to set the bits on only the ASCII
code points, but under EBCDIC those code points are not confined to
0-127.
M regcomp.c
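A C sketch of the general idea, not the regcomp.c code itself: iterate
over the ASCII ordinals 0-127 and translate each to its native position
before setting its bit. The bitmap layout, helper names, and the partial
table in the usage are hypothetical; digits are used because on EBCDIC
they live at 0xF0-0xF9, far outside 0-127.

```c
#include <stdint.h>
#include <string.h>

/* Example predicate on ASCII (Unicode) ordinals; hypothetical. */
static int is_ascii_digit(unsigned cp) { return cp >= 0x30 && cp <= 0x39; }

/* Fill a 256-bit native bitmap with the ASCII code points that satisfy
 * `matches`, placing each bit at the character's native position. */
static void set_ascii_bits(uint8_t bitmap[32],
                           const uint8_t ascii_to_native[128],
                           int (*matches)(unsigned))
{
    memset(bitmap, 0, 32);
    for (unsigned cp = 0; cp < 128; cp++) {
        if (matches(cp)) {
            uint8_t native = ascii_to_native[cp];
            bitmap[native >> 3] |= (uint8_t)(1u << (native & 7));
        }
    }
}

/* Test whether the bit for native byte c is set. */
static int bit_is_set(const uint8_t bitmap[32], uint8_t c)
{
    return (bitmap[c >> 3] >> (c & 7)) & 1;
}
```

With an identity table this degenerates to the ASCII-only loop over bits
0-127, which is why the bug is invisible on ASCII platforms.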
commit 0b47f0a7c1379cf897591c41f223ffc2624ff51e
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 11:57:24 2013 -0600
re/charset.t: Allow to work on EBCDIC
This just converts the hard-coded character numbers to native, so it
will work on any platform.
M t/re/charset.t
commit 6f07f345656e6358ceed8d3b79df2c875f10f208
Author: Karl Williamson <[email protected]>
Date: Fri Mar 15 11:50:35 2013 -0600
XS-APItest/t/handy.t: Change output message
On EBCDIC platforms, the output is not in terms of \N{U+}; change text
to \x{ }
M ext/XS-APItest/t/handy.t
commit 984a3fcca05678e7c6f90462fb7119f09fa5bded
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 21:44:16 2013 -0600
XXX Dumper.xs: Don't know why this stopped compiling
M dist/Data-Dumper/Dumper.xs
commit 5c08c2035985f45a417c8a171d5025a2a3523ec8
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:22:28 2013 -0600
toke.c: Fix an ASCII-platform dependency
M toke.c
commit 33921d0bc3aabec1231c08b8fe925034056d0ecb
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:20:23 2013 -0600
toke.c: Simplify some code
We don't have to test separately for lower vs uppercase here, as under
EBCDIC the gaps within A-Z contain no lowercase ASCII letters, and the
gaps within a-z contain no uppercase ones.
M toke.c
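To illustrate why one test suffices, here is a hypothetical C sketch of
expanding a byte range over EBCDIC (code pages such as 037 and 1047),
dropping gap bytes when both endpoints are ASCII letters. It is not the
toke.c code; the names are invented for illustration. For a range whose
endpoints share a case, a plain "is an ASCII letter" test gives the same
result as a per-case test, because no ASCII letter of the other case sits
in the gaps.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII letters in common EBCDIC code pages: each case is split into three
 * runs, with other characters filling the gaps between them. */
static bool ebcdic_is_ascii_alpha(unsigned char c)
{
    return (c >= 0x81 && c <= 0x89) || (c >= 0x91 && c <= 0x99)  /* a-i, j-r */
        || (c >= 0xA2 && c <= 0xA9)                               /* s-z      */
        || (c >= 0xC1 && c <= 0xC9) || (c >= 0xD1 && c <= 0xD9)  /* A-I, J-R */
        || (c >= 0xE2 && c <= 0xE9);                              /* S-Z      */
}

/* Expand lo..hi into out[]; if both endpoints are letters, skip gap bytes.
 * Returns the number of bytes written. */
static size_t expand_range(unsigned char lo, unsigned char hi,
                           unsigned char *out)
{
    bool skip_gaps = ebcdic_is_ascii_alpha(lo) && ebcdic_is_ascii_alpha(hi);
    size_t n = 0;
    for (unsigned c = lo; c <= hi; c++) {
        if (skip_gaps && !ebcdic_is_ascii_alpha((unsigned char)c))
            continue;
        out[n++] = (unsigned char)c;
    }
    return n;
}
```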
commit b1cabf98c21c26f4439c56362eb39e1185b63b9e
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:18:12 2013 -0600
genpacksizetables.pl: Correct comment typo
M genpacksizetables.pl
commit 2723ffe9c5cac536677e234fb4bfc2212c22ffe2
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:17:39 2013 -0600
APItest/t/handy.t: Make EBCDIC-friendly
M ext/XS-APItest/t/handy.t
commit a0d3c44ece8732413248bf46c95ee40c9b264ef9
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:16:14 2013 -0600
Data-Dumper: Make EBCDIC-friendly
M dist/Data-Dumper/Dumper.xs
commit d4679a99862cfc2e9b1f9c66f7e8e492ef8e784f
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:14:31 2013 -0600
sv.c: Make less ASCII-centric
M sv.c
commit f5230f9fccbd150d05c3fc98066186bb38517fa7
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:07:52 2013 -0600
lib/charnames.t: Make some tests work under EBCDIC
M lib/charnames.t
commit e49a139738f895cfcc428bc9a025b25f9894c906
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:05:46 2013 -0600
dump.c: Make less ASCII-centric
This has the added advantage of being clearer as to what is going on.
M dump.c
commit f393ca83e85de0e5250d0bb53c382c74dbeab70f
Author: Karl Williamson <[email protected]>
Date: Wed Mar 13 16:02:52 2013 -0600
hv.c: Stop being ASCII-centric
This uses macros which work cross-platform. This has the added advantage
that it is much clearer what is going on.
M hv.c
commit a8239dddf6a91d316d4275aa916a642394aea2f2
Author: Karl Williamson <[email protected]>
Date: Tue Mar 12 22:34:17 2013 -0600
t/TEST: Don't bail if fails in t/base unless minitest
In order to completely compile Perl, many modules must have been parsed
and compiled, so if there is a full perl, we know that things basically
work. The purpose of bailing out is that if these supposedly very base
level functionality tests don't work, there's no point in continuing.
But over the years, tests of more esoteric functionality have been
added here, and if one of them doesn't work, it still could be that Perl
pretty much does work.
I believe it would be best to move such non-basic tests elsewhere, but
that's work, and hasn't bitten us much so far; this change lessens the
severity of the biting even more. Where it will really bite is if
things are so bad that a full perl binary can't be compiled, and we are
trying to figure out why using minitest.
M t/TEST
commit 984b90db55c76bce7e839b24a0ddc0ab62c48397
Author: Karl Williamson <[email protected]>
Date: Mon Mar 11 15:11:10 2013 -0600
Added Porting/reorder_charclass_invlists.pl
This program is used to bootstrap perl onto a non-ASCII platform with
no pre-existing perl.
M MANIFEST
A Porting/reorder_charclass_invlists.pl
commit 70a8b8fefbd2409125a0d63b26a049d6bc750545
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 22:17:31 2013 -0600
t/base/lex.t: Use char suitable for both ASCII and EBCDIC
\xE2 is 'S' in EBCDIC, and so is going to be legal. \xDF is an alpha
which has no ASCII equivalent in either character set.
M t/base/lex.t
commit a32ae670ae9479f2038557791df8d0f0fbc43b0b
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 13:11:07 2013 -0600
XXX Temporary comment out ParseXS check
this is to get things to compile for now
M dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm
commit edf9bbc84d35c4efc2dae847d68dc38c1e73defe
Author: Karl Williamson <[email protected]>
Date: Sun Mar 10 11:34:10 2013 -0600
XXX Collate, Normalize: Allow to compile under EBCDIC
M cpan/Unicode-Collate/Collate.pm
M cpan/Unicode-Collate/mkheader
M cpan/Unicode-Normalize/Normalize.pm
M cpan/Unicode-Normalize/mkheader
commit 668186f1d46bc65ff2476f8722b5c36cdc86bdb2
Author: Karl Williamson <[email protected]>
Date: Sat Mar 9 21:57:38 2013 -0700
XXX dquote_static.c: Silence wrong warning on EBCDIC
Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
which should be a separate commit anyway.
This silences an inappropriate warning that doesn't happen on ASCII
platforms. CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms. But
0x14 is a C1 control on EBCDIC, a C0 on ASCII. Therefore the test that
it's a control should include both C0 and C1, which isCNTRL_L1() does.
Also has a white-space change, outdenting a line so it doesn't wrap in
an 80 column window.
M dquote_static.c
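A C sketch of the difference between the two kinds of control test,
expressed in Latin-1 (Unicode) ordinals. The function names are
hypothetical stand-ins, not Perl's isCNTRL_L1() itself. A control that
maps to the C1 block (0x80-0x9F) passes the Latin-1 test but fails the
ASCII-only one, which is how an EBCDIC control character can slip past a
C0-only check.

```c
#include <stdbool.h>

/* Latin-1 controls: C0 (0x00-0x1F), DEL (0x7F), and C1 (0x80-0x9F). */
static bool is_cntrl_latin1(unsigned cp)
{
    return cp < 0x20 || cp == 0x7F || (cp >= 0x80 && cp <= 0x9F);
}

/* An ASCII-only test covers just C0 and DEL. */
static bool is_cntrl_ascii(unsigned cp)
{
    return cp < 0x20 || cp == 0x7F;
}
```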
commit c7e47b692f68d5b6badd92580eb78ec0bdf0f89c
Author: Karl Williamson <[email protected]>
Date: Thu Mar 7 12:08:41 2013 -0700
utfebcdic.h: Change 'unsigned char' to U8
This is for consistency with the rest of Perl
M utfebcdic.h
commit 4601ae3b0aacfa307503cf89fd406db427dd0ad9
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 08:11:38 2013 -0700
regen/regcharclass.pl: Make more EBCDIC-friendly
This commit changes the code generated by the macros so that they work
right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs. THEY
ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
onto the target platform, and regcharclass.pl can be run there,
generating macros that are correct for UTF-8.
M regcharclass.h
M regen/regcharclass.pl
commit 1e193bb3ada764921b674ad4ba3aae3de3baf041
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 21:30:01 2013 -0700
utfebcdic.h: Add (UV) cast
The operand of this macro is implicitly a UV. Make sure that it is.
M utfebcdic.h
commit 8509ada089a1d417ae1027f18e9bc702d3153cb5
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 17:04:58 2013 -0700
handy.h: Allow bootstrapping to non-ASCII platform
This adds a bunch of macros and moves things around to support
conditional compilation when Configure is called with
-DBOOTSTRAP_CHARSET. Doing so causes the usual macros that are
table-driven to not be used, since the table may not be valid when
bringing Perl up for the first time on a non-ASCII platform.
This allows it to compile using the platform's native C library ctype
functions, which should work enough to compile miniperl, and allow the
table to be changed to be valid. Then Configure can be re-run without
bootstrapping, and normal compilation can proceed.
M handy.h
M inline.h
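A C sketch of the kind of conditional compilation described, assuming a
single hypothetical class bit and macro name (Perl's real table, bit
names, and macros differ). With -DBOOTSTRAP_CHARSET the platform's
<ctype.h> is trusted; otherwise a lookup table is used.

```c
#include <ctype.h>

/* Hypothetical class bit; not one of Perl's real names. */
#define SKETCH_CC_ALPHA 0x01

static unsigned char sketch_class_tab[256];

/* Stand-in for a generated header. It assumes ASCII letter positions,
 * which is exactly the kind of assumption that may not hold the first
 * time Perl is brought up on a non-ASCII platform. */
static void sketch_init_table(void)
{
    for (int c = 'A'; c <= 'Z'; c++) sketch_class_tab[c] |= SKETCH_CC_ALPHA;
    for (int c = 'a'; c <= 'z'; c++) sketch_class_tab[c] |= SKETCH_CC_ALPHA;
}

#ifdef BOOTSTRAP_CHARSET
/* Bootstrapping: rely on the platform's own C library. */
#  define SKETCH_isALPHA(c) (isalpha((unsigned char)(c)) != 0)
#else
/* Normal build: use the lookup table. */
#  define SKETCH_isALPHA(c) \
       ((sketch_class_tab[(unsigned char)(c)] & SKETCH_CC_ALPHA) != 0)
#endif
```

Keeping both definitions behind one macro name means callers need no
changes when Configure is re-run without bootstrapping.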
commit ad2eb79f2d050cd37c87971e451e3e26f100665f
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:43:26 2013 -0700
gv.c: Remove EBCDIC dependency
M gv.c
commit fbb52c2eee4b53e0cfb15cdbc72ef61b9b497508
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:00:47 2013 -0700
toke.c: Remove EBCDIC dependency
M toke.c
commit 67e12f7ccdeee161338aa0ad97c07911f8c31f04
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:14:25 2013 -0700
toke.c: Remove character set dependency
Instead of hard-coding the bit patterns that comprise the Byte Order
Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
the current platform.
This removes some EBCDIC-only code.
M toke.c
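A C sketch of using a per-platform constant instead of hard-coded
bytes. The macro names are hypothetical; the values shown are the UTF-8
encoding of U+FEFF (EF BB BF). A generated header for a UTF-EBCDIC
platform would supply a different byte sequence under the same names,
and the calling code would not change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generated constants for an ASCII/UTF-8 platform. */
#define SKETCH_BOM_UTF8      "\xEF\xBB\xBF"
#define SKETCH_BOM_UTF8_LEN  3

/* True if buf begins with the platform's encoded Byte Order Mark. */
static bool starts_with_bom(const unsigned char *buf, size_t len)
{
    return len >= SKETCH_BOM_UTF8_LEN
        && memcmp(buf, SKETCH_BOM_UTF8, SKETCH_BOM_UTF8_LEN) == 0;
}
```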
commit 9052cd2235fbb85e062b594597ec755ba7326a2e
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:10:27 2013 -0700
unicode_constants.h: Add #defines for Byte Order Mark
These will be used in future commits
M regen/unicode_constants.pl
M unicode_constants.h
commit 2afec14622bd157c3bcdf56e8b3fe7a07d0c7761
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 15:04:18 2013 -0700
XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
This macro may not be present, and is currently used exclusively in
IS_UTF8_CHAR, which itself may be undefined, and code should cope with
that. This is a work-around until a better solution is found.
M utf8.c
M utf8.h
commit 7ef67c39eb46d2aa84940354a66cbeab1b26c65f
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 14:09:04 2013 -0700
Add Porting tool for help with non-ASCII platforms
Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
non-ASCII platform with no working Perl.
M MANIFEST
A Porting/reorder_l1_char_class_tab.pl
M regen/mk_PL_charclass.pl
commit d8fa10df92f5848ebf0175f0afe6aad54657f3fd
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 13:06:58 2013 -0700
inline.h: Reorder functions
The comment implied that the functions below it in the file were
deprecated, but in fact only the next two functions were. This
clarifies that and moves them so they are the final ones in the file
M inline.h
commit 1dbeb6aa3698b8bd51ad10f27781d48f6f8e6b33
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:33:42 2013 -0700
utfebcdic.h: Add comment
M utfebcdic.h
commit 82015792433f268f4fcd0e35d38bc1c64d3bbeb5
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:12:11 2013 -0700
utf8.h: Clean up START_MARK definition and use
The previous definition broke good encapsulation rules. UTF_START_MARK
should return something that fits in a byte; it shouldn't be the caller
that does this. So the mask is moved into the definition. This means
it can apply only to the portion that creates something larger than a
byte. Further, the EBCDIC version can be simplified, since 7 is the
largest possible number of bytes in a UTF-EBCDIC character.
M utf8.h
M utfebcdic.h
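A C sketch of a start-mark helper that does its own masking, so every
caller gets a value that fits in a byte. The function name is
hypothetical. Returning 0xFF for lengths above 7 is an assumption
modelled on Perl's extended UTF-8; a version limited to 7 bytes, as
described for EBCDIC, could drop that branch.

```c
#include <stdint.h>

/* Start byte for a multi-byte sequence of `len` bytes (len >= 2):
 * 2 -> 0xC0, 3 -> 0xE0, ..., 7 -> 0xFE. The mask lives here, so the
 * shifted-out high bits never reach the caller. */
static uint8_t utf_start_mark(unsigned len)
{
    if (len > 7)
        return 0xFF;
    return (uint8_t)((0xFEu << (7 - len)) & 0xFF);
}
```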
commit b898e378fb56ca4f61cc470fe8c5f4f59448028d
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:05:26 2013 -0700
utf8.h: Move #includes
These two files were only being #included for non-EBCDIC compiles; they
should be included always.
M utf8.h
commit 096da7e9e6e4275f990e8d8ed3c72d418c3f1b57
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 11:49:14 2013 -0700
utfebcdic.h: Remove extra parameter expansions
These two macros were improperly expanding the parameters as well as
defining the operation, leading to compile errors.
M utfebcdic.h
commit 454b080ec835ef6b312aa58f42528d5a66423faf
Author: Karl Williamson <[email protected]>
Date: Fri Mar 1 08:28:52 2013 -0700
utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
UTF8_TWO_BYTE_LO. But the EIGHT_BIT versions can use the less general
and simpler NATIVE_TO_LATIN1 instead of NATIVE_TO_UNI because the input
domain is restricted in the EIGHT_BIT versions. Note that on ASCII platforms,
these both expand to the same thing, so the difference matters only on
EBCDIC.
M utf8.h
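On an ASCII platform, where the native and Latin-1 values coincide, the
two EIGHT_BIT macros reduce to the ordinary two-byte UTF-8 split of a
byte in 0x80-0xFF. A C sketch with hypothetical helper names:

```c
#include <stdint.h>

/* First and second UTF-8 bytes for a Latin-1 code point 0x80-0xFF. */
static uint8_t eight_bit_hi(uint8_t c) { return (uint8_t)(0xC0 | (c >> 6)); }
static uint8_t eight_bit_lo(uint8_t c) { return (uint8_t)(0x80 | (c & 0x3F)); }
```

For example, U+00E9 encodes as C3 A9.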
commit fc858af969844f6ec50e4713e84239c63e5095b1
Author: Karl Williamson <[email protected]>
Date: Thu Feb 28 09:25:27 2013 -0700
XXX temp: show makedepend cerr
M makedepend.SH
commit 1aae1e484a1fb66adf3a3c92b4b3fd7f2fad0ff8
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 21:59:11 2013 -0700
makedepend.SH: Split too long lines; properly join
I had thought that a continuation introduced a space. But no,
a continuation can happen in the middle of a token.
And this splits lines that are getting very long to avoid preprocessor
limitations.
M makedepend.SH
commit ed0dde6c49e6fdadd70b50e94cff7946cd6875d8
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 15:51:28 2013 -0700
makedepend.SH: White-space only
Align continuation backslashes
M makedepend.SH
commit 462c6f5868bbf09c2be670c318fbac196a69db04
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:39:28 2013 -0700
makedepend.SH: Remove some unnecessary white space
Multi-line preprocessor directives are now joined into single lines.
This can create lines too long for the preprocessor to handle. This
commit removes blanks adjoining comments that get deleted. This makes
things somewhat less likely to exceed the limit.
This commit also fixes several [] which were meant to each match a tab
or a blank, but editors had converted the tabs to blanks.
M makedepend.SH
commit 1cb42550dfc94132a5cdcc0d0e46681cb41a3e05
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:30:51 2013 -0700
makedepend.SH: Retain '/**/' comments
These comments may actually be necessary.
M makedepend.SH
commit 65d58f0b62bc661c77779a9364263e64b4085f63
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 08:38:19 2013 -0700
handy.h: Remove extraneous parens
M handy.h
commit 464c087a4e2b30ab78faf56d0f102467bd8598ee
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 13:06:07 2013 -0500
Disable gcc-style function attributes on z/OS.
John Goodyear <[email protected]> reports that the z/OS C compiler
supports the attribute keyword, but not in exactly the same way as gcc.
Instead of a "warning", the compiler emits an "INFORMATIONAL" message
that Configure fails to detect. Until Configure is fixed, just disable
the attributes altogether.
John Goodyear
M hints/os390.sh
commit b27cddefcadf1cd45bea397c3a7bd7e12b9b95f7
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 09:12:13 2013 -0500
Change os390 custom cppstdin script to use fgrep.
Grep appears to be limited to 2048 characters, and truncates
the output for cppstdin. Fgrep apparently doesn't have that limit.
Thanks to John Goodyear <[email protected]> for reporting this.
M hints/os390.sh
commit 13f61e587e136bb86c8fa3fb353c7dc64f17a2f7
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:45:19 2013 -0700
utf8.c: Use more clearly named macro
In the case of invariants these two macros should do the same thing,
but it seems to me that the latter name more clearly indicates what is
going on.
M utf8.c
commit 51a619d3e2a8e8ca149855f243617e1977d3c067
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:35:12 2013 -0700
Add macro OFFUNISKIP
This means use official Unicode code point numbering, not native. Doing
this converts the existing UNISKIP calls in the code to refer to native
code points, which is what they meant anyway. The terminology is
somewhat ambiguous, but I don't think it will cause real confusion.
NATIVESKIP is also introduced for situations where it is important to be
precise.
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
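A C sketch of a UNISKIP-style byte count for the ASCII-platform case,
where native and Unicode numbering agree and OFFUNISKIP and NATIVESKIP
would give the same answer. The function name is hypothetical. Values up
to 0x7FFFFFFF follow the original 6-byte UTF-8 scheme; the 7-byte result
for larger 32-bit values assumes Perl's extended UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode uv, in Unicode ordinals. */
static size_t unicode_skip(uint32_t uv)
{
    if (uv < 0x80)        return 1;
    if (uv < 0x800)       return 2;
    if (uv < 0x10000)     return 3;
    if (uv < 0x200000)    return 4;
    if (uv < 0x4000000)   return 5;
    if (uv < 0x80000000u) return 6;
    return 7;
}
```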
commit bfa7215f3a3366c9a431a2dc797c56ddc9bf0086
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:22:19 2013 -0700
toke.c: white space only
M toke.c
commit 798704d334774120b71dd3505b8aecf93cc50341
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 12:08:50 2013 -0700
utf8.c: Deprecate two functions
This is to force any code that has been using these functions to change.
Since the Unicode tables are now stored in native order, these functions
should only rarely be needed.
However, the functionality of these is needed, and in actuality, on
ASCII platforms, the native functions are #defined to these. So what
this commit does is rename the functions to something else, and create
wrappers with the old names, so that anyone using them will get the
deprecation.
M embed.fnc
M embed.h
M mathoms.c
M proto.h
M toke.c
M utf8.c
M utf8.h
commit 507817d9a37a2d2e3e555fbbb1e723f3b6de4f44
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:26:09 2013 -0700
Deprecate uvuni_to_utf8()
Code should almost never be dealing with non-native code points
M embed.fnc
M embed.h
M proto.h
M toke.c
M utf8.c
M utf8.h
commit a7ae8c101ef60dc6262aa312d9d03c0184465b78
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:02:33 2013 -0700
Deprecate utf8_to_uni_buf()
Now that the tables are stored in native order, there is almost no need
for code to be dealing in Unicode order.
M embed.fnc
M proto.h
M utf8.c
commit 0d1c2766f6d079e4ca48a8ee6e3454489ec0e5cd
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 09:00:18 2013 -0700
makedepend.SH: Comment out unnecessary code
This currently causes problems for z/OS. But, since we don't know why
it was there, I'm leaving it in as a placeholder.
M makedepend.SH
commit 7f2dbefedc23e0c62a955f8b55af4be7ea76ed39
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:26:44 2013 -0700
Deprecate valid_utf8_to_uvuni()
Now that all the tables are stored in native format, there is very
little reason to use this function; and those who do need this kind of
functionality should be using the bottom level routine, so as to make it
clear they are doing nonstandard stuff.
M embed.fnc
M proto.h
M utf8.c
commit 0a846fcd5350a008e83c9512e070cd1acfeaf5d3
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:14:26 2013 -0700
utf8.c: Swap which fcn wraps the other
This is in preparation for the current wrapee becoming deprecated
M embed.fnc
M embed.h
M proto.h
M utf8.c
M utf8.h
commit 47d97ebd4609b89042320e0f294d1b1e4f5b3695
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:29:34 2013 -0700
utf8.c: Skip a no-op
Since the value is invariant under both UTF-8 and not, we already have
it in 'uv'; no need to do anything else to get it
M utf8.c
commit 1be08bcf66d9f35a1969bd276a0fde302c1d0cb1
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:26:50 2013 -0700
utf8.c: Move comment to where makes more sense
M utf8.c
commit cc3f2078b8eb921a3a7ef48416381053b551cad2
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:30:10 2013 -0700
APItest: Test native code points, instead of Unicode
M ext/XS-APItest/APItest.pm
M ext/XS-APItest/APItest.xs
M ext/XS-APItest/t/utf8.t
commit af95e36608ad22c7b7573b957c438558b4c5337f
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:25:08 2013 -0700
XXX CPAN Normalize
This converts Unicode::Normalize to use the native tables that Perl
uses starting in XXX, while continuing to use the Unicode-ordered ones
on earlier Perls.
Another alternative would be to have mktables generate just these tables
in Unicode ordering.
M cpan/Unicode-Normalize/Normalize.xs
commit fd33971b61d5027ac9f058cc71fb6836ac70f92b
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:22:55 2013 -0700
XXX CPAN prob wrong Collate
This changes the module to implicitly use native code points. This is
likely wrong, as the module comes with its own data, which are probably
in terms of Unicode.
M cpan/Unicode-Collate/Collate.xs
commit ade8bf815fbe0ac178faf801d791870b1c46b5f6
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:12:53 2013 -0700
XXX CPAN Encode.xs
Use the core function if available. This will insulate this code from
any future changes.
M cpan/Encode/Encode.xs
commit 6383bc4a637ace3b0eac4d021ce27a6cc3861524
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:04:24 2013 -0700
XXX CPAN and unsure Encode
M cpan/Encode/Encode.xs
M cpan/Encode/Unicode/Unicode.xs
commit 276fa6119a9dd67d1b1d7288c71f0125bfc58a9e
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:00:47 2013 -0700
XXX CPAN Encode.xs: fix indent
M cpan/Encode/Encode.xs
commit 3fc3246859faed7654248211008d03cdcdf4952c
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 17:23:15 2013 -0700
Don't refer to U+XXXX when mean native
These messages say the output number is a Unicode code point, but it is
really native, so change them to say 0xXXXX instead.
M regen/regcharclass_multi_char_folds.pl
M regexec.c
commit eb3a5a33866a258272210b7814879dd3e4768c5d
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:43:59 2013 -0700
Convert some uvuni() to uvchr()
All the tables are now based on the native character set, so using
uvuni() in almost all cases is wrong.
M cygwin/cygwin.c
M doop.c
M op.c
M pp_pack.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
commit 526c978f62f4ba550086aee94ec4c0387b205204
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:25:47 2013 -0700
handy.h: White space only
M handy.h
commit 7775892d06898681dfae38be43a21f3991644b56
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:19:49 2013 -0700
t/test.pl: Allow native/latin1 string conversions to work on utf8.
These functions no longer have the hard-coded definitions in them,
but now end up resolving to internal functions, so that if new encodings
were added, these would automatically understand them.
Instead of using tr///, these now go character by character,
converting each one to/from its ordinal, which is slower, but allows
them to operate on utf8 strings.
Peephole optimization should make these essentially no-ops on ASCII
platforms.
M t/test.pl
commit d911c7ead752fdffadfb447315794e812ec199d7
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:05:55 2013 -0700
t/test.pl: Simplify ord to/from native fcns
This commit changes these functions from converting to/from a string to
calling utf8:: functions which operate on ordinals instead.
M t/test.pl
commit fb272f14f750ae07557e0025a4c0bb415da41800
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:35:38 2013 -0700
Make casing tables native
These are the final tables that hadn't yet been converted to native
character set casing.
M perl.h
M utfebcdic.h
commit 592b52e5d292e78c78057596c7a54e6fd58c1142
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:32:30 2013 -0700
utfebcdic.h: Remove trailing spaces
M utfebcdic.h
commit 322e98bf262240cbc859682260712336a2f66d6b
Author: Karl Williamson <[email protected]>
Date: Fri Feb 22 18:55:26 2013 -0700
EBCDIC has the unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters in the range
0-255. It turns out that I was wrong; natively (at least on some
platforms) it has the same rules (essentially none) for the characters
which don't correspond to ASCII ones as ASCII platforms have for these
characters.
A previous commit for 5.18 changed the docs about this issue. This
current commit forces ASCII rules on EBCDIC platforms (even should there
be one that natively uses all 256). To get all 256, the same things as
on ASCII platforms, such as 'use feature "unicode_strings"', must now be
done.
M handy.h
commit d817ead12edbc33a0da523540e5f0aab184a95ae
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:47:52 2013 -0700
handy.h: Solve a failure to compile problem under EBCDIC
handy.h is included in files that don't include perl.h, and hence not
utf8.h. We therefore can't rely on the ASCII/EBCDIC conversion
macros being available to us. The best way to cope is to use the native
ctype functions. Most, but not all, of the macros in this commit
currently resolve to use those native ones, but a future commit will
change that.
M handy.h
commit f09ffbc0ad55f7066951b50ddcd7140844f4ba43
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:35:12 2013 -0700
handy.h: Simplify some macro definitions
Now, only one of the macros relies on magic numbers (isPRINT), leading
to clearer definitions.
M handy.h
commit b7712ca8b82d991629ce57890d728f365ca9d950
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:26:49 2013 -0700
handy.h: Combine macros that are same in ASCII, EBCDIC
These 4 macros can have the same RHS for their ASCII and EBCDIC
versions, so there is no need to duplicate their definitions.
This also enables the EBCDIC versions to not have undefined expansions
when compiling without perl.h.
M handy.h
commit a824e75d3c2d7f4154b79d281da6303f68680cf0
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:39:48 2013 -0700
Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
These macros are no longer called in the Perl core. This commit turns
them into functions so that they can use gcc's deprecation facility.
I believe these were defective right from the beginning, and I have
struggled to understand what's going on. From the name, it appears
NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
appropriate parameter indicates that. But that is impossible to do
correctly through that API, as for variant characters it needs to return
two bytes. It could only work correctly if ch is an I8 byte, which
isn't native, and hence the name would be wrong.
Similar arguments apply to ASCII_TO_NEED.
The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
does what I think NATIVE_TO_NEED intended.
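To make the two-byte problem concrete, here is a minimal C sketch that assumes an ASCII platform, where native bytes are Latin-1 and standard UTF-8 applies. The function name append_utf8_from_latin1_byte is hypothetical; it is not the core's S_append_utf8_from_native_byte, only an illustration of the same idea: an invariant byte is copied unchanged, while a variant byte (0x80-0xFF) expands to two bytes, which a macro returning a single byte cannot express.

```c
#include <stddef.h>

typedef unsigned char U8;

/* Hypothetical helper, ASCII platforms only (native == Latin-1).
 * Appends the UTF-8 encoding of 'byte' to *dest, advances *dest past
 * what was written, and returns the number of bytes written (1 or 2). */
static size_t append_utf8_from_latin1_byte(const U8 byte, U8 **dest)
{
    if (byte < 0x80) {
        /* UTF-8 invariant: the encoding is the byte itself */
        *(*dest)++ = byte;
        return 1;
    }

    /* Variant: needs the two-byte sequence 110xxxxx 10xxxxxx */
    *(*dest)++ = (U8) (0xC0 | (byte >> 6));
    *(*dest)++ = (U8) (0x80 | (byte & 0x3F));
    return 2;
}
```

For example, appending 'A' and then 0xE9 (LATIN SMALL LETTER E WITH ACUTE) to an empty buffer writes the three bytes 41 C3 A9.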
M embed.fnc
M mathoms.c
M proto.h
M toke.c
M utf8.h
M utfebcdic.h
commit a728db65b084963b99b1ea2e7f13bbf0bc1c7604
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:26:43 2013 -0700
Remove remaining calls of NATIVE_TO_NEED
These calls are just copying the input to the output byte by byte.
There is no need to worry about UTF-8 or not, as the output is just an
exact copy of the input.
M toke.c
commit d2a22cc04dbba77c4f42d065b9ebc536ba94c86d
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:12:15 2013 -0700
toke.c: Remove some NATIVE_TO_NEED calls
I believe NATIVE_TO_NEED is defective, and will remove it in a future
commit. But, just in case I'm wrong, I'm doing it in small steps so
bisects will show the culprit. This removes the calls to it where the
parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
result can't be other than just the parameter.
M toke.c
commit 5f1ca3128671c7749c1639db2e856df96214dc55
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:22:07 2013 -0700
toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
This code is attempting to deal with the problem of holes in the ranges
a-z and A-Z in EBCDIC. Prior to this patch, it accepted things like A
WITH GRAVE, etc., which shouldn't get the special processing that deals
with the holes.
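To show what the holes look like: in common EBCDIC code pages such as IBM-1047 and IBM-037, the lowercase letters sit in three runs, 0x81-0x89 (a-i), 0x91-0x99 (j-r) and 0xA2-0xA9 (s-z), so the numeric range from 'a' to 'z' spans 41 code points, of which only 26 are letters. The sketch below hard-codes those runs (an assumption about the code page, not a query of the platform); the function names are hypothetical and this is not the toke.c code.

```c
#include <stdbool.h>

/* Lowercase ASCII-letter runs in EBCDIC code pages 1047/037 (assumed):
 * a-i = 0x81-0x89, j-r = 0x91-0x99, s-z = 0xA2-0xA9.
 * Non-ASCII letters such as accented ones are deliberately excluded. */
static bool ebcdic_is_lower_ascii_letter(unsigned int c)
{
    return (c >= 0x81 && c <= 0x89)
        || (c >= 0x91 && c <= 0x99)
        || (c >= 0xA2 && c <= 0xA9);
}

/* Counts the code points in [lo, hi] that are real lowercase letters.
 * Everything else in the range is a "hole". */
static unsigned int count_letters_in_range(unsigned int lo, unsigned int hi)
{
    unsigned int n = 0;
    for (unsigned int c = lo; c <= hi; c++) {
        if (ebcdic_is_lower_ascii_letter(c))
            n++;
    }
    return n;
}
```

With lo = 0x81 and hi = 0xA9, the loop visits 41 code points but counts only 26 letters, so a naive "everything between 'a' and 'z'" test would also accept the 15 holes.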
M toke.c
commit 4923944e1966e29409ad3dce7c5a8071962e2a59
Author: Karl Williamson <[email protected]>
Date: Tue Feb 19 15:13:19 2013 -0700
Use real illegal UTF-8 byte
The code here was wrong in assuming that \xFF is not legal in UTF-8
encoded strings. It currently doesn't work due to a bug, but that may
eventually be fixed: [perl #116867]. The comments are also wrong in
stating that all bytes are legal in UTF-EBCDIC.
It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
(C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
of an illegal overlong sequence.
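A quick way to see why C0 and C1 never appear in well-formed UTF-8: a two-byte sequence carries 11 payload bits, and with a start byte of C0 or C1 the top payload bits are zero, so even the largest value reachable is below 0x80 and must instead be encoded as a single byte. The sketch below covers only the two-byte case of standard UTF-8 (not UTF-EBCDIC), and its function names are hypothetical.

```c
#include <stdbool.h>

typedef unsigned char U8;

/* Decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) without any
 * validation, just to expose the value the bytes would represent. */
static unsigned int decode_two_byte_utf8(U8 start, U8 cont)
{
    return ((unsigned int) (start & 0x1F) << 6) | (cont & 0x3F);
}

/* True if every two-byte sequence with this start byte is overlong,
 * i.e. even the largest value it can produce fits in one byte. */
static bool two_byte_start_is_always_overlong(U8 start)
{
    /* 0xBF is the continuation byte with all payload bits set */
    return decode_two_byte_utf8(start, 0xBF) < 0x80;
}
```

Start byte C1 with continuation BF decodes to only 0x7F, whereas C2 80 already reaches 0x80, the first code point that genuinely needs two bytes.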
This creates a #define for an illegal byte using one of the real illegal
ones, and changes the code to use that.
No test is included due to #116867.
M op.c
M toke.c
M utf8.h
commit 2491a0ea85adc0e4ca3f939bac0cd834c7853ad0
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 14:00:13 2013 -0700
toke.c: Don't remap \N{} for EBCDIC
Everything is now native.
M toke.c
commit 3cfb3ce9ef337bc89446fcdd8d63e965758941c2
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:50:45 2013 -0700
toke.c: Remove remapping for EBCDIC for octal
The code prior to this commit converted something like \04 into its
EBCDIC equivalent, but only in double-quoted strings. This was not done
in patterns, and so gave inconsistent results. The correct thing to do
should be the native thing, i.e., what someone working on a given
platform would expect \04 to mean. Platform-independent characters are
available through \N{}, either by name or by U+.
The comment changed by this commit was wrong, as in some cases the value
was native, and in others Unicode.
M toke.c
commit a53a5a69a5d28930f930676a0cf59250a9f570cd
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:47:13 2013 -0700
Remove EBCDIC remappings
Now that the tables are stored in native format, we shouldn't be doing
remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; not all of this has been done yet.
M handy.h
M perly.c
M pp.c
M regcomp.c
M regexec.c
M utf8.c
commit 216e441d4814ee9d0ceece2ce6633f714d98b9a1
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 12:46:05 2013 -0700
Add and use macro to return EBCDIC
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
M handy.h
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
M utf8.h
commit 062844006bb6c54fbe32214e212f450033297a79
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 09:18:06 2013 -0700
charnames: fix nit in comment
M lib/_charnames.pm
commit 897d888ebbc20ce74a7941ea15104589a6e32c27
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 11:05:44 2013 -0700
charnames: Make work in EBCDIC
Now that mktables generates native tables, the only thing that was
needed was to make U+ mean Unicode instead of native.
M lib/_charnames.pm
M lib/charnames.pm
commit 0a8b9b6f253635b76dc2f0fff6b410e8ce861c5f
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 09:35:56 2013 -0700
Unicode::UCD: Work on non-ASCII platforms
Now that mktables generates native tables, it is a fairly simple matter
to get Unicode::UCD to work on those platforms.
M lib/Unicode/UCD.pm
commit 465eb0a77b8c0797961f2323ea7b290a8099fb36
Author: Karl Williamson <[email protected]>
Date: Wed Mar 27 17:01:24 2013 -0600
Unicode::UCD: Typo in comment
M lib/Unicode/UCD.pm
commit 5d143416e49edd3ea19a4ef60ce9d9b00f412ae0
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 22:16:38 2013 -0700
mktables: Generate native code-point tables
The output tables for mktables are now in the platform's native
character set. This means there is no change for ASCII platforms, but
there is one for EBCDIC platforms.
Since we currently don't have any EBCDIC test platforms, I tested this
by faking it out to generate EBCDIC data, and then eye-balled the
results.
Code that didn't realize there was a potential difference between EBCDIC
and non-EBCDIC platforms will now start to work; code that tried to do
the right thing under these circumstances will no longer work. Fixing
that comes in later commits.
M lib/unicore/mktables
commit 6cc8923be9f0ee4aa0b9c4a7f519c88ba79ec20b
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 10:50:00 2013 -0700
Fix some EBCDIC problems
These spots have native code points, so should be using the macros for
native code points, instead of Unicode ones.
M regcomp.c
M sv.c
M toke.c
commit c0b6d5bbd979c79d115894bd9a7427f8751a75a8
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:10:19 2013 -0700
Remove unnecessary temp variable in converting to UTF-8
These areas of code included a temporary that is unnecessary.
M inline.h
M regcomp.c
M sv.c
commit 3775e8da8514da0e9c02e787a80f0fdf948ec09d
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:00:55 2013 -0700
utf8.h: Correct macros for EBCDIC
These macros were incorrect for EBCDIC. The 3 step process given in
utfebcdic.h wasn't being followed.
M utf8.h
commit bb39733fdf9ddb445ff6c69881b2f430f5de8f4c
Author: Karl Williamson <[email protected]>
Date: Sat Feb 9 21:23:30 2013 -0700
Extract common code to an inline function
This fairly short paradigm is repeated in several places; a later commit
will improve it.
M embed.fnc
M embed.h
M inline.h
M pp_pack.c
M proto.h
M sv.c
M toke.c
M utf8.c
commit b257911234179f47a717d1e68d831a00318a00c1
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 21:35:57 2013 -0700
Don't use EBCDIC macro for a C language escape
C recognizes '\a' (for BEL); just use that instead of a look-up.
regen/unicode_constants.pl could be used to generate the character for
the ESC (set in surrounding code), but I didn't do that because of
potential bootstrapping problems when porting to an EBCDIC platform
without a working perl. (The other characters generated in that .pl are
less likely to cause problems when compiling perl.)
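As a small illustration of why no look-up is needed for BEL: the compiler translates the '\a' escape into BEL in the execution character set, so the same source is correct on ASCII and EBCDIC hosts. The check below assumes an ASCII execution character set, where BEL is 0x07 (in common EBCDIC code pages it is 0x2F). Note that standard C has no corresponding escape for ESC; '\e' is only a GCC extension. The names are hypothetical.

```c
/* '\a' is resolved at compile time to BEL in the execution character
 * set, so no run-time ASCII/EBCDIC table lookup is involved. */
static const char BEL_CHAR = '\a';

/* Illustrative check, valid only for an ASCII execution character set */
static int bel_is_ascii_value(void)
{
    return (unsigned char) BEL_CHAR == 0x07;
}
```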
M regcomp.c
M toke.c
commit 481e3d0c76e96e2025f6d97807cd543f047d85b6
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 19:53:38 2013 -0700
Use byte domain EBCDIC/LATIN1 macro where appropriate
Macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
whole Unicode range. In the locations affected by this commit, it is
known that the domain is limited to a single byte, so the simpler ones
whose names contain LATIN1 may be used.
On ASCII platforms, all these macros are no-ops, so there is no
effective change.
M handy.h
M regcomp.c
M utf8.c
commit 5fa2790ec793b6af8ac792e7c4ed2d0d8a2ee304
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 14:31:09 2013 -0700
Use new clearer named #defines
This converts several areas of code to use the more clearly named macros
introduced in a recent commit.
M op.c
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit 6e676870e3f112dcc8d4bd9a2492950959bb63ad
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 13:52:31 2013 -0700
utf8.h, utfebcdic.h: Create less confusing #defines
This commit creates macros whose names mean something to me, and I don't
find confusing. The older names are retained for backwards
compatibility. Future commits will fix bugs I introduced from
misunderstanding the meaning of the older names.
The older names are now #defined in terms of the newer ones, and moved
so that they are only defined once, valid for both ASCII and EBCDIC
platforms.
M utf8.h
M utfebcdic.h
commit c0a3903ba69c222dce050fbbb226903e6929e4b4
Author: Karl Williamson <[email protected]>
Date: Mon Feb 4 14:22:02 2013 -0700
pp_ctl.c: Use isCNTRL instead of hard-coded mask
This is clearer and portable to EBCDIC.
M pp_ctl.c
commit 01b08064a2855b72f9dce1435d1d5c4ef943feaa
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:51:05 2013 -0700
utf8.c: is_utf8_char_slow() should use native length
What is passed is the actual length of the native utf8 character. What
this was calculating was the length it would be if it were a Unicode
character, and then comparing the two: apples to oranges.
M utf8.c
-----------------------------------------------------------------------
--
Perl5 Master Repository