In perl.git, the branch khw/ebcdic has been created
<http://perl5.git.perl.org/perl.git/commitdiff/b7bbfb310ac62a1a0c752915e996d3ba744f73bb?hp=0000000000000000000000000000000000000000>
at b7bbfb310ac62a1a0c752915e996d3ba744f73bb (commit)
- Log -----------------------------------------------------------------
commit b7bbfb310ac62a1a0c752915e996d3ba744f73bb
Author: Karl Williamson <[email protected]>
Date: Wed Aug 26 17:06:35 2015 -0600
Revert "Revert "XXX Run Unicode's official normalization tests""
M MANIFEST
M Makefile.SH
M charclass_invlists.h
A lib/Unicode/testnorm.t
A lib/unicore/NormTest.txt
M regcharclass.h
commit 720be04e09d5fa2ec8be7c90051e7f9e85d8a6ae
Author: Karl Williamson <[email protected]>
Date: Wed Aug 26 15:40:17 2015 -0600
XXX Temporarily add sort debugging
M sv.c
M t/op/sort.t
commit af791c73c1da5791b9179c7cece4b7751231aeac
Author: Karl Williamson <[email protected]>
Date: Wed Aug 26 15:38:39 2015 -0600
Fix potential flaw in 2 EBCDIC macros.
It occurred to me in code reading that it was possible for these macros
to not give the correct result if passed a signed argument.
M utfebcdic.h
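The commit above does not say exactly how a signed argument could break the two macros. A minimal C sketch of one common way this happens, and of the usual cure, follows. The names demo_table and DEMO_LOOKUP are hypothetical and are not taken from utfebcdic.h.

```c
#include <stdio.h>

/* Hypothetical 256-entry lookup table; not Perl's real data. */
static const unsigned char demo_table[256] = { [0xC1] = 1 };

/* An unguarded form such as
 *     #define DEMO_LOOKUP_UNSAFE(c)  (demo_table[(c)])
 * would index before the start of the array whenever 'c' is a
 * negative signed char, which is what any byte above 0x7F becomes
 * on platforms where plain char is signed. */

/* Guarded form: force the argument into 0..255 before indexing. */
#define DEMO_LOOKUP(c)  (demo_table[(unsigned char) (c)])
```

The cast costs nothing at run time on typical compilers, which is why it is a common defensive habit in byte-classification macros.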
commit 28752c1564c7abf7f30bdd5ece584e23bd4af1e9
Author: Karl Williamson <[email protected]>
Date: Wed Aug 26 15:35:05 2015 -0600
utf8.h, utfebcdic.h: Add some assertions
These will detect an array bounds error that occurs on EBCDIC machines,
and by including the assert on non-EBCDIC, we verify that the code
wouldn't fail when built on EBCDIC.
M utf8.h
M utfebcdic.h
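The idea in the commit above, asserting the same bound on both kinds of platform so that an ASCII build can catch a call that would overrun a table on EBCDIC, might look roughly like the sketch below. All names are hypothetical; the real assertions live in utf8.h and utfebcdic.h and may be written differently.

```c
#include <assert.h>

/* True when 'c' can safely index a 256-entry table.  The cast makes
 * a negative value wrap to a huge one, so it fails the test too. */
#define DEMO_IN_TABLE_RANGE(c)  ((unsigned long) (c) < 256)

/* Hypothetical translation table with a single entry filled in. */
static const unsigned char demo_to_other[256] = { [0x41] = 0xC1 };

/* Table-driven variant (what an EBCDIC build would need).  The comma
 * operator runs the assertion, then yields the lookup.  Note that
 * 'c' is evaluated twice, so it must have no side effects. */
#define DEMO_CONVERT_TABLE(c) \
    (assert(DEMO_IN_TABLE_RANGE(c)), demo_to_other[(c)])

/* Identity variant (an ASCII build).  It needs no table, but keeps
 * the same assertion so out-of-range callers are caught here too. */
#define DEMO_CONVERT_IDENTITY(c) \
    (assert(DEMO_IN_TABLE_RANGE(c)), (c))
```

Building with NDEBUG defined turns both assertions into no-ops, so the check only costs anything in debugging builds.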
commit 007f9ddbcc20b4d11547247847c9219e69ed8409
Author: Karl Williamson <[email protected]>
Date: Tue Aug 25 22:55:31 2015 -0600
ext/XS-APItest/t/svcat.t: Generalize to run on EBCDIC
M ext/XS-APItest/t/svcat.t
commit e43950ab3ce7e5739fbc52c675b6beb8bb529557
Author: Karl Williamson <[email protected]>
Date: Sat Aug 15 12:53:57 2015 -0600
XXX Make EBCDIC cmp work when both operands are UTF-8
XXX fix indent
M sv.c
commit 9023167d674e73c6978231e77856862c57f6c0e2
Author: Karl Williamson <[email protected]>
Date: Sat Aug 15 12:53:17 2015 -0600
utf8.h: Add comment; white space changes
M utf8.h
commit ce4fc1606a42cc8e6c874d936a97e995c8090b30
Author: Karl Williamson <[email protected]>
Date: Thu Aug 13 20:15:53 2015 -0600
Revert "XXX Run Unicode's official normalization tests"
M MANIFEST
M Makefile.SH
M charclass_invlists.h
D lib/Unicode/testnorm.t
D lib/unicore/NormTest.txt
M regcharclass.h
commit 798971624776221ea55d5a106f1d4344b0687e5e
Author: Karl Williamson <[email protected]>
Date: Mon Aug 3 22:00:53 2015 -0600
XXX experimental: op/tr.t
M t/op/tr.t
commit a9431eab1455836c52bf68bb58ff4afe3c952dfb
Author: Karl Williamson <[email protected]>
Date: Mon Aug 3 10:35:26 2015 -0600
XXX temporary
M cpan/Encode/Encode.xs
commit 6e0d6238cf783a48c112cdbcb845b97d621b4839
Author: Karl Williamson <[email protected]>
Date: Mon Aug 3 10:17:08 2015 -0600
XXX op/sort.t: Add more tests
probably comment. These should currently fail on EBCDIC
M t/op/sort.t
commit d3d81cc290b1eef2af23f596b8d9730b4a828489
Author: Karl Williamson <[email protected]>
Date: Sun Aug 2 22:18:10 2015 -0600
XXX Test Unicode::Collate and Unicode::Normalize
M t/TEST
commit 6f506c9083dda385885ecbbfb65d864d6ca26063
Author: Karl Williamson <[email protected]>
Date: Sun Aug 2 21:20:44 2015 -0600
offuni
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit 6553249c00b459620611814eeea1d8d08ab4d903
Author: Karl Williamson <[email protected]>
Date: Sun Aug 2 21:21:25 2015 -0600
invariant
M utf8.h
commit a05f79ccd2c0082f08edba2857d00035e3ef7dff
Author: Karl Williamson <[email protected]>
Date: Sat Aug 1 22:15:18 2015 -0600
Change EBCDIC macro definition
This changes the definition of isUTF8_POSSIBLY_PROBLEMATIC() on EBCDIC
platforms to use PL_charclass[] instead of PL_e2a[]. The new array is
more likely to be in the memory cache.
M handy.h
M l1_char_class_tab.h
M regen/mk_PL_charclass.pl
M utf8.h
M utfebcdic.h
commit 87e37c23a633037baf67887244a2ea2a0e011d72
Author: Karl Williamson <[email protected]>
Date: Sun Aug 2 09:02:51 2015 -0600
Change EBCDIC macro definition
Prior to this commit, UVCHR_SKIP() had the same definition on both ASCII
and EBCDIC platforms, but it expanded to different things on each. Now
the two are defined separately, each directly as what it expands to, and
the fully expanded EBCDIC version is changed to use PL_charclass[]
instead of PL_e2a[]. The new array is more likely to be in the memory
cache.
M utf8.h
M utfebcdic.h
commit 9d27bff715e8da33374181111618b35476d26b50
Author: Karl Williamson <[email protected]>
Date: Sat May 16 10:43:40 2015 -0600
Change EBCDIC macro definition
Prior to this commit, UVCHR_IS_INVARIANT() had the same definition on
both ASCII and EBCDIC platforms, but it expanded to different things on
each. Now the two are defined separately, each directly as what it
expands to, and the fully expanded EBCDIC version is changed to use
PL_charclass[] instead of PL_e2a[]. The new array is more likely to be
in the memory cache.
M utf8.h
M utfebcdic.h
commit f6c5209934358bdd0d912190673384726f585b97
Author: Karl Williamson <[email protected]>
Date: Sat May 16 10:31:19 2015 -0600
utf8.h: Change defn of UNI_IS_INVARIANT
This changes it to be isASCII(), instead of repeating the "special"
number 0x80.
M utf8.h
commit 940aead84eb021f35bfeaadc38b51d707b6bec5b
Author: Karl Williamson <[email protected]>
Date: Fri May 15 14:49:21 2015 -0600
Remove no longer used #define
The previous commit removed all uses of this non-public #define.
M regen/unicode_constants.pl
M unicode_constants.h
commit cd26620d1d41b2c2719e144f460c314172f1dce9
Author: Karl Williamson <[email protected]>
Date: Fri May 15 14:48:23 2015 -0600
Change filter of problematic code points for EBCDIC
There are three classes of problematic Unicode code points that may
require special handling. Which code points are problematic is fairly
complicated, requiring lots of branches. However, the smallest of them
is 0xD800, which means that most code points in modern use are below
them all, and a single test can be used to exclude just about everything
likely to be encountered. The problem was that the way this test was
done on EBCDIC caused way too many things to pass and have to be checked
with the more complicated branches. The digits 0-9 and some capital
letters were not filtered out. This commit changes the EBCDIC test to
transform into I8 (an array lookup), and this fixes it to exclude things
that shouldn't have passed before.
M utf8.c
M utf8.h
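The "one cheap test first, complicated branches only afterwards" structure described in the commit above can be sketched in C as follows. This works on code points rather than on encoded bytes, so it is not the code in utf8.c. The commit does not name the three classes; the sketch assumes they are surrogates, non-characters, and code points above the Unicode maximum. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: classify a code point as "problematic" or not. */
static bool demo_is_problematic(uint32_t cp)
{
    /* Fast path: everything below the first surrogate is fine.
     * Most text in practice exits here after a single comparison. */
    if (cp < 0xD800)
        return false;

    /* Slow path: the more complicated checks. */
    if (cp <= 0xDFFF)
        return true;                    /* surrogate */
    if (cp > 0x10FFFF)
        return true;                    /* above Unicode */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;                    /* non-character block */
    return (cp & 0xFFFE) == 0xFFFE;     /* U+xxFFFE and U+xxFFFF */
}
```

The point of the commit is that the fast path only pays off if it really excludes the common cases; a first test that lets digits and capital letters through sends ordinary text down the slow path.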
commit 2f663aba60ad58bffc884ccfd466606aaa7458bb
Author: Karl Williamson <[email protected]>
Date: Fri May 15 14:35:45 2015 -0600
Change some UTF-EBCDIC macro handling defns
This commit changes the definitions of some macros for UTF-8 handling on
EBCDIC platforms. The previous definitions transformed the bytes into
I8 and did tests on the transformed values. The change is to use
previously unused bits in l1_char_class_tab.h so the transform isn't
needed, and generally only one branch is. These macros are called from
the inner loops of, for example, regex backtracking.
M l1_char_class_tab.h
M regen/mk_PL_charclass.pl
M utfebcdic.h
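A rough illustration of the approach in the commit above: store the answer in a spare bit of an existing per-byte class table, so that a test becomes one lookup and one mask instead of a transform followed by comparisons. The table contents, bit assignments, and names here are invented for the example and do not reflect l1_char_class_tab.h.

```c
#include <stdint.h>

/* Hypothetical bit assignments within one byte of class flags. */
#define DEMO_BIT_ALPHA         (1u << 0)  /* an existing use */
#define DEMO_BIT_CONTINUATION  (1u << 1)  /* a formerly spare bit */

/* Hypothetical class table with only two entries filled in. */
static const uint8_t demo_class[256] = {
    [0x41] = DEMO_BIT_ALPHA,
    [0xA0] = DEMO_BIT_CONTINUATION,
};

/* One array lookup and one mask; no byte transform is needed. */
#define DEMO_IS_CONTINUATION(b) \
    ((demo_class[(uint8_t) (b)] & DEMO_BIT_CONTINUATION) != 0)
```

Because the table is already consulted for ordinary character-class tests, reusing it also raises the odds that the needed entry is in cache.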
commit f08691bd6ddb1a2e4ae0b57d0ba4c3e70e949700
Author: Karl Williamson <[email protected]>
Date: Fri May 15 14:23:12 2015 -0600
l1_char_class_tab.h: Add bits for UTF-EBCDIC
This is for the next commit.
M handy.h
M l1_char_class_tab.h
M regen/mk_PL_charclass.pl
commit 672f00be255910ba6f8cf67f69118176d3545be1
Author: Karl Williamson <[email protected]>
Date: Fri May 15 14:21:25 2015 -0600
regen/mk_PL_charclass.pl: Refactor a print
This is in preparation for the next commits.
M regen/mk_PL_charclass.pl
commit c909957098c094870c8652b8892655df171673e6
Author: Karl Williamson <[email protected]>
Date: Fri May 15 10:59:54 2015 -0600
Add macro for converting Latin1 to UTF-8, and use it
This adds a macro that converts a code point in the 128-255 range (as
numbered on ASCII platforms)
to UTF-8, and changes existing code to use it when the range is known to
be restricted to this one, rather than the previous macro which accepted
a wider range (any code point representable by 2 bytes), but had an
extra test on EBCDIC platforms, hence was larger than necessary and
slightly slower.
M handy.h
M hv.c
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
M utf8.h
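For readers unfamiliar with why a restricted-range conversion can be smaller, here is a sketch of the standard UTF-8 two-byte encoding for code points 128-255 as it works on ASCII platforms. It is not the macro added by the commit above, whose name and definition are not given in the log, and UTF-EBCDIC uses a different layout. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: encode a code point in 0x80..0xFF as two UTF-8 bytes.
 * Because the range is known in advance, no test for "fits in one
 * byte" or "needs three bytes" is required. */
static void demo_latin1_upper_to_utf8(uint8_t c, uint8_t out[2])
{
    assert(c >= 0x80);                       /* caller's contract */
    out[0] = (uint8_t) (0xC0 | (c >> 6));    /* lead byte: top 2 bits */
    out[1] = (uint8_t) (0x80 | (c & 0x3F));  /* continuation: low 6 */
}
```

For example, 0xE9 (LATIN SMALL LETTER E WITH ACUTE) has top bits 3 and low bits 0x29, giving the bytes 0xC3 0xA9.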
commit 22ff8d6fee31a605f9c62c51f2a594a8f82d0539
Author: Karl Williamson <[email protected]>
Date: Fri May 15 10:55:30 2015 -0600
utf8.h: Add assertions to macro
M utf8.h
commit 2bcd915a6afd3e4018dc06f97f7ead6dc868e69a
Author: Karl Williamson <[email protected]>
Date: Wed May 13 17:38:08 2015 -0600
Change to use UVCHR_SKIP over UNI_SKIP
UNI_SKIP is somewhat ambiguous. Perl has long used 'uvchr' as part of a
name to mean the unsigned values using the native character set plus
Unicode values for those above 255.
This also changes two calls (one in dquote_static.c and one in
dquote_inline.h) to use UVCHR_SKIP; they should not have been OFFUNI, as
they are dealing with native values.
M dquote.c
M dquote_inline.h
M op.c
M perl.c
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
commit fe06c70efeda91494e8e9ede0a598ddbd748255d
Author: Karl Williamson <[email protected]>
Date: Sat Aug 1 08:52:52 2015 -0600
XXX Run Unicode's official normalization tests
M MANIFEST
M Makefile.SH
M charclass_invlists.h
A lib/Unicode/testnorm.t
A lib/unicore/NormTest.txt
M regcharclass.h
commit 24d8c50fa9818453adaad8ebbc5d0bacfdeddcd6
Author: Karl Williamson <[email protected]>
Date: Mon May 18 10:45:10 2015 -0600
XXX t/uni/lex_utf8.t: Do some of the tests on EBCDIC
XXX prob. the \xA2 and \377 will fail
M t/uni/lex_utf8.t
commit ef8b39d5e6a5be35b422811cd0a75415e7fe00b8
Author: Karl Williamson <[email protected]>
Date: Mon May 18 10:24:11 2015 -0600
XXX experimental t/op/tr.t
M t/op/tr.t
commit df9ee02ecab14462e95db03d4fda6a5a62ae4a8b
Author: Karl Williamson <[email protected]>
Date: Mon May 18 09:52:59 2015 -0600
XXX t/io/utf8.t: Experimental
M t/io/utf8.t
commit af29e72bb1e952f5364f0b6799b33c15e5112d89
Author: Karl Williamson <[email protected]>
Date: Mon May 18 08:49:37 2015 -0600
XXX japh/abigail.t
Experiment with running on EBCDIC, and using test.pl's skip()
M t/japh/abigail.t
commit ca09623c0f0c5db7cdeea03c0b89602135e94968
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:35:12 2015 -0600
perlapi: Nits
M sv.c
M util.c
commit ef0f470fd20f136cd8f2ec14141674f915437310
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:25:33 2015 -0600
XXX look for more has X bit set
M pad.c
M sv.c
commit 282c313df15c7d7098840199a97bd077eac5b596
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:22:32 2015 -0600
XXX look for more perlapi: Add L<>
M op.h
commit f598621de9b898f4f4f763563f827177a852fa84
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:21:51 2015 -0600
perlapi: Add link
M hv.c
commit 6cf80912d96fb661b1fe7a75b1426e34844b6db3
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:21:17 2015 -0600
XXX look for more perlapi UTF-8
M gv.c
M hv.c
M sv.h
commit 8e8ccd0a57b7b8ac520b74686de07527a77a0ce8
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:19:13 2015 -0600
XXX look for more eg to e.g.
M cv.h
M mg.c
commit 94dae751531e7efa7c22f08c8af07999ca27eeaf
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:17:48 2015 -0600
XXX check and look more to come Add S<>
M av.c
M mg.c
M op.c
M pad.c
M sv.c
M utf8.h
M util.h
commit d8a2df69b661a5e4318a223fc91c03da36d25382
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:12:55 2015 -0600
vutil.c Nits, C<> L<>, XXX cpan upstream
M vutil.c
commit 5c7dd3e11d4db7015b0a53bdbc3490918fe8a439
Author: Karl Williamson <[email protected]>
Date: Fri May 8 21:10:16 2015 -0600
XXX C<> for mro.xs
M ext/mro/mro.xs
commit 1d3077b6bad31899fc3770a49e8a99218a1b402b
Author: Karl Williamson <[email protected]>
Date: Thu May 7 10:58:54 2015 -0600
XXX perlapi: Add C<> around
Look through the code again, like for NUL(L)?
Removes 'the' in front of parameter in some instances.
M XSUB.h
M av.c
M dump.c
M gv.c
M handy.h
M hv.c
M hv.h
M inline.h
M intrpvar.h
M mathoms.c
M mg.c
M mro_core.c
M numeric.c
M op.c
M op.h
M pad.c
M pad.h
M perl.c
M pp_ctl.c
M pp_pack.c
M pp_sort.c
M pp_sys.c
M regexp.h
M sv.c
M sv.h
M utf8.c
M util.c
-----------------------------------------------------------------------
--
Perl5 Master Repository