In perl.git, the branch smoke-me/khw-ebcdic has been created
<https://perl5.git.perl.org/perl.git/commitdiff/1d8613eb85d0b932bc15c9651cca4a4a9f12d720?hp=0000000000000000000000000000000000000000>
at 1d8613eb85d0b932bc15c9651cca4a4a9f12d720 (commit)
- Log -----------------------------------------------------------------
commit 1d8613eb85d0b932bc15c9651cca4a4a9f12d720
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 22:34:37 2019 -0600
intrpvar.h: Add variable for use in tr///
This is part of this branch of changes.
commit 5b189c008bc66dd73c9e4774f558edbd22b03c91
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 15:29:05 2019 -0600
Allow core to work with code points above IV_MAX
Code points above IV_MAX have been reserved for core use, and a future
commit will finally want to use them.
commit c457a1c25f8324a25f646bfdb8c14c1276cabfd1
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 22:04:12 2019 -0600
utfebcdic.h: Add comments
commit 17fc5f69a3458f1e052329cf73846a08686d0de5
Author: Karl Williamson <[email protected]>
Date: Fri Sep 20 09:51:13 2019 -0600
Move Perl_regnext to regexec.c
This function is moved from regcomp.c, which uses it during pattern
compilation, to the file that calls it incessantly at match time.
Experience has shown that compilation can be less efficient without
affecting overall performance.
Now the compiler has full knowledge of this function in the translation
unit that performance is critical in, and can hopefully perform better
optimizations.
commit 454a76a2f3e3e50928ae7f31e61d39f3518f61f6
Author: Karl Williamson <[email protected]>
Date: Fri Sep 20 09:45:29 2019 -0600
regnext: Add some branch predictor hints
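A minimal sketch of what a branch predictor hint can look like, assuming a GCC-compatible compiler. The macro names, the toy node structure and toy_next() are hypothetical illustrations, not the code this commit touches:

```c
#include <stddef.h>

/* Hypothetical hint macros; fall back to the plain condition elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define HINT_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define HINT_UNLIKELY(cond) (cond)
#endif

/* Toy node: 'offset' is the distance to the next node, 0 meaning "end". */
struct hint_node {
    unsigned short offset;
};

/* Return the next node in the chain, or NULL at the end.  The hints tell
 * the compiler that the NULL and end-of-chain cases are the rare ones. */
static const struct hint_node *toy_next(const struct hint_node *p)
{
    if (HINT_UNLIKELY(p == NULL))
        return NULL;
    if (HINT_UNLIKELY(p->offset == 0))
        return NULL;
    return p + p->offset;
}
```

The hints do not change the result, only the code layout the compiler is encouraged to produce.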
commit b476eb6fbd2059d0a15e741de8ba072a49b99bb1
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 22:18:02 2019 -0600
Change data lookup from a macro to a function
commit 7c709529866d854dbde36a20315021eca5c4ef6a
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 21:54:03 2019 -0600
regen/regcomp.pl: Enforce all long nodes being last
commit e2ce2e65b98f2aae49f2b42a7b6f8019213aa813
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 20:34:17 2019 -0600
regcomp.sym: Move regnodes to end that don't use next_off
Most regnodes use the next_off field in a regnode structure, to link to
the next one in the chain. But some require more than the 16 bits it
contains, so they use a different, 32 bit, field.
Currently, there is a lookup array to distinguish between the types, but
that becomes unnecessary if all of one sort are grouped before or after
all of the other.
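The idea can be sketched as follows. The opcode names, their numbering and the structure layout are hypothetical; the only point is that once every wide-offset node type is numbered after every narrow one, a single comparison replaces the lookup array:

```c
#include <stdint.h>

/* Hypothetical opcodes: all types that need the wide offset come last. */
enum { TOY_EXACT, TOY_ANYOF, TOY_LONGJMP, TOY_BRANCHJ };
#define TOY_FIRST_WIDE TOY_LONGJMP

struct grp_node {
    uint8_t  op;
    uint16_t next_off;   /* 16-bit link used by most node types */
    uint32_t wide_off;   /* 32-bit link used by the grouped-last types */
};

/* Pick the right offset field with one comparison, no table. */
static uint32_t toy_offset(const struct grp_node *n)
{
    return (n->op >= TOY_FIRST_WIDE) ? n->wide_off : n->next_off;
}
```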
commit 2b6a348c0c269b48e0622e3af18ade88f102b0bf
Author: Karl Williamson <[email protected]>
Date: Sat Sep 21 09:51:52 2019 -0600
Add ANYOFRb regnode
This is like the ANYOFR regnode added in the previous commit, but all
code points in the range it matches are known to have the same first
UTF-8 start byte. That means it can't match UTF-8 invariant characters,
like ASCII, because the "start" byte is different on each one, so it
could only match a range of 1, and the compiler wouldn't generate this
node for that; instead using an EXACT.
Pattern matching can rule out most code points by looking at the first
character of their UTF-8 representation, before having to convert from
UTF-8.
On ASCII platforms, this simple comparison rules out all but 64 of the
2-byte UTF-8 characters. For 3-byte characters it's up to 4096, and for
4-byte ones, 2**18, so the test is less effective for higher code points.
I believe that most UTF-8 patterns that otherwise would compile to
ANYOFR will instead compile to this, as I can't envision real life
applications wanting to match large single ranges. Even the 2048
surrogates all have the same first byte.
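A rough sketch of the first-byte filter and of where the 64, 4096 and 2**18 figures come from, assuming standard UTF-8 on an ASCII platform. All names are hypothetical, and the decoder does no validation:

```c
#include <stdint.h>

/* Code points sharing one start byte: each continuation byte adds 6 bits. */
static uint32_t code_points_per_start_byte(unsigned seq_len)
{
    return (uint32_t)1 << (6 * (seq_len - 1));
}

/* Decode one well-formed UTF-8 sequence of 2 to 4 bytes (no validation). */
static uint32_t decode_utf8(const uint8_t *s, unsigned len)
{
    uint32_t cp = s[0] & (0xFFu >> (len + 1));   /* payload of start byte */
    for (unsigned i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3Fu);
    return cp;
}

/* Hypothetical node: a single range whose members share one start byte. */
struct range_same_start {
    uint8_t  start_byte;
    uint32_t lo, hi;       /* inclusive code point range */
};

static int match_range(const struct range_same_start *r,
                       const uint8_t *s, unsigned len)
{
    if (s[0] != r->start_byte)   /* cheap rejection, nothing decoded */
        return 0;
    uint32_t cp = decode_utf8(s, len);
    return cp >= r->lo && cp <= r->hi;
}
```

For example, the range U+0410 to U+042F lies entirely under start byte 0xD0, so any input beginning with another byte is rejected before decoding.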
commit 8fcfe7aa8b0d7beb151f43131ac4d7a884ca0830
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 16:03:04 2019 -0600
Add ANYOFR regnode
This matches a single range of code points. It is both faster and
smaller than other ANYOF-type nodes, requiring, after set-up, a single
subtraction and conditional branch.
The vast majority of Unicode properties match a single range, though
most of these are not likely to be used in real world applications. But
things like [ij] are a single range, and those are quite commonly
encountered. This matches them more efficiently than a bitmap would,
and doesn't require the space for one either.
The flags field is used to store the minimum matchable start byte for
UTF-8 strings, and is ignored for non-UTF-8 targets. This, like ANYOFH
nodes which have the same mechanism, allows for quick weeding out of
many possible matches without having to convert the UTF-8 to its
corresponding code point.
This regnode packs the 32 bit argument with 20 bits for the minimum code
point the node matches, and 12 bits for the maximum range. Values
outside those simply won't compile to this regnode, instead going to one
of the ANYOFH flavors. This is sufficient to match all of Unicode
except for the final (private use) 65K plane.
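A sketch of the packing and of the "single subtraction and conditional branch" test. Which end of the 32-bit argument holds which field is an assumption here, and all names are hypothetical:

```c
#include <stdint.h>

#define TOY_BASE_BITS  20u
#define TOY_BASE_MAX   ((1u << TOY_BASE_BITS) - 1u)   /* 0xFFFFF */
#define TOY_DELTA_MAX  ((1u << 12) - 1u)              /* 4095 */

/* Store the packed argument and return 1 if [lo, hi] fits, else return 0
 * (the caller would then fall back to some other node type). */
static int toy_pack_range(uint32_t lo, uint32_t hi, uint32_t *packed)
{
    if (lo > hi || lo > TOY_BASE_MAX || hi - lo > TOY_DELTA_MAX)
        return 0;
    *packed = ((hi - lo) << TOY_BASE_BITS) | lo;
    return 1;
}

/* One unsigned subtraction and one comparison: if cp < lo the subtraction
 * wraps around to a huge value, which is necessarily > delta. */
static int toy_match(uint32_t packed, uint32_t cp)
{
    uint32_t lo    = packed & TOY_BASE_MAX;
    uint32_t delta = packed >> TOY_BASE_BITS;
    return (uint32_t)(cp - lo) <= delta;
}
```

With [ij], lo is 0x69 and the delta is 1. A range starting at 0x100000 or above does not fit in the 20-bit minimum.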
commit 1f66929141c1546a92e8f14caf1a64a72042e75b
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 16:04:03 2019 -0600
regexec.c: Rmv some unnecessary casts
The called macro does the cast, and this makes it more legible.
commit 07f6a2858b23e5296e2079e5f403c5c5e03bb211
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 15:47:51 2019 -0600
regcomp.c: Use variables initialized to macro results
instead of the macros. This is in preparation for the next commit.
commit 1ac0615a383fb03372b3ae981f5c0d40f7dec0b9
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 14:20:59 2019 -0600
regcomp.c: Add parameter to static function
This further decouples this function from knowing details of the calling
structure, by passing this detail in.
commit e7e3c2bac261ba4213c8f26b9287e93160e497f2
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 13:20:42 2019 -0600
t/re/anyof.t: Add a test
This makes sure a non-folding above-Latin1 character is tested.
commit f6ec52c0aa40d59b291ba49d156e1d873bdf78ad
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 14:38:39 2019 -0600
regcomp.c: Comments/white-space
Included is outdenting code whose enclosing block was removed in the
previous commit.
commit 19cee5a38157fb9d28f0cb0e12657bdfe7497562
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 13:12:51 2019 -0600
XXX warning tests,Prefer EXACTish regnodes to ANYOFH nodes
ANYOFH nodes (that match code points above 255) are smaller than regular
ANYOF nodes because they don't have a 256-bit bitmap. But the
disadvantage of them over EXACT nodes is that the characters encountered
must first be converted from UTF-8 to code point. The difference is
less clear-cut with /i, because typically, currently, the UTF-8 must also
be converted to code point in order to fold them. But the EXACTFish
node doesn't have an inversion list to do lookup in, and occupies
less space, because it doesn't have inversion list data attached to it.
Also there is a bug in using ANYOFH under /l, as wide character warnings
should be emitted if the locale isn't a UTF-8 one.
The reason this change hasn't been made before (by me anyway) is that
the old way avoided upgrading the pattern to UTF-8. But having thought
about this for a long time, to match this node, the target string must
be in UTF-8 anyway, and having a UTF8ness mismatch slows down pattern
matching, as things have to be continually converted, and reconverted
after backtracking.
commit 61abf2a6735c2097d76bfa29442a53c18a4d2198
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:45:55 2019 -0600
t/re/anyof.t: Fix highest range tests
Previously we had infinity minus 1, but infinity should be beyond the
range, and the highest isn't infinity - 1, but the highest legal code
point.
commit 3ab95b5f760c8f54935043fc9d6c13a8f73f816f
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:41:41 2019 -0600
t/re/anyof.t: Remove duplicate test
These are covered by the single code point tests.
commit 6951daf3a03edc88df5fbba0b4146f6e8f7a17e7
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:34:23 2019 -0600
t/re/anyof.t: Remove invalid test
One shouldn't be able to specify an infinite code point. The tests have
the conceit that one can specify a range's upper limit as infinity, but
that is just shorthand for the range being unbounded.
commit 3f6c2208a11bfe634378cae164cae7aeed82db03
Author: Karl Williamson <[email protected]>
Date: Sat Sep 21 10:00:40 2019 -0600
t/re/anyof.t: Revise test
to make it correspond more with the test that precedes it
commit 312c4f08c723b082854cee6dbda7493d58ed42d7
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:31:11 2019 -0600
re/anyof.t: Clarify failing message
When a test fails, an extra test is run to output debugging info; this
will cause the planned number of tests to be wrong, which will output an
extra, confusing message. This adds an explanation that the number is
expected to be wrong, hence not to worry.
commit d2702cc3a0c56f93dd46392eae18535a12119732
Author: Karl Williamson <[email protected]>
Date: Thu Sep 12 20:19:07 2019 -0600
Allow some optimizations of qr/(?[...])/
Prior to this commit, this construct always returned an ANYOF node, even
if it could be optimized into something else.
commit 89a7a41e9c04bfdeb40c390ea9f2c4fc62157c66
Author: Karl Williamson <[email protected]>
Date: Thu May 30 20:57:27 2019 -0600
regcomp.c: Add invlist_lowest()
This function hides the invlist implementation from the calling code,
and will be called in more than one place in the future.
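For illustration only, a toy inversion list and a "lowest" accessor might look like this. The structure, the names and the empty-list convention are assumptions, not the layout regcomp.c actually uses:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy inversion list: sorted code points.  Elements at even indices begin
 * a run that is in the set; elements at odd indices begin a run that is
 * not in the set. */
struct toy_invlist {
    const uint32_t *elems;
    size_t          len;
};

/* Store the lowest matching code point and return 1, or return 0 if the
 * list is empty.  Callers need not know how the list is laid out. */
static int toy_invlist_lowest(const struct toy_invlist *l, uint32_t *out)
{
    if (l->len == 0)
        return 0;
    *out = l->elems[0];
    return 1;
}

/* Membership: count the elements <= cp by binary search; an odd count
 * means cp falls inside a run that is in the set. */
static int toy_invlist_contains(const struct toy_invlist *l, uint32_t cp)
{
    size_t lo = 0, hi = l->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (l->elems[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo & 1) == 1;
}
```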
commit ac28a3933729bfaa95dc685e104c6979cabc71cb
Author: Karl Williamson <[email protected]>
Date: Thu Sep 12 21:06:45 2019 -0600
regcomp.c: Code for qr/(?[...]) handle restart
There is an existing mechanism for code to realize it needs to restart
parsing from the beginning, say because it needs to upgrade to UTF-8.
The code for /(?[...])/ did not participate in this. Currently I don't
know of any case where it needs to, though perhaps there is some very
hard to reproduce case in which branch instructions start needing to
handle more than 16 bits, but I kind of doubt it. Anyway, the next few
commits introduce the possibility.
commit aa57d3ddae14a6dc02407c346ad26d60c60a1b1d
Author: Karl Williamson <[email protected]>
Date: Sat Sep 7 09:18:49 2019 -0600
malloc.c: Use isDIGIT macro instead of hand-rolling it
The macro is more efficient
commit acc4b28ce1effe1d5e263050c3154e659e6be59b
Author: Karl Williamson <[email protected]>
Date: Fri Sep 6 10:25:26 2019 -0600
doio.c: Use inRANGE macro
commit 7aa73dd377880890d0e8a3e6bada1c0bfbe9d5fc
Author: Karl Williamson <[email protected]>
Date: Tue Oct 1 22:34:25 2019 -0600
util.c: Use inRANGE macro
commit ea95e463e87b059e225b37e42db7f282c64f7eb2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:54:57 2019 -0600
t/op/tr_latin1.t: Skip ASCII-centric tests on EBCDIC
commit c5ad5972efe10e9349f833ec4698f98758e9afe2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:54:12 2019 -0600
dist/Data-Dumper/t/dumper.t: Skip ASCII-centric tests on EBCDIC
commit 0877c5838ad0e43d7feccb1700a8be5063f676ae
Author: Karl Williamson <[email protected]>
Date: Fri Sep 6 10:23:26 2019 -0600
t/re/regexp.t: Only convert to EBCDIC once
Some tests get added as we go along, and those added tests have already
been converted to EBCDIC if necessary. Don't reconvert, which messes
things up.
commit eeecb3a547f995cdf13caade237fc8bc629d1ba3
Author: Karl Williamson <[email protected]>
Date: Fri Sep 6 09:49:41 2019 -0600
re/regexp.t: Change variable name to be more meaningful
commit c8d5f250daaff03be7900cf8a21a63543b4162b2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:20:42 2019 -0600
Configure kludge about none optimize
commit 335abf19f226d3a6aff78dfda53d76a31f558858
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:19:59 2019 -0600
XXX regexec.c: debugging prints
commit 886ca8a5e4306a0dcc85c35e2a6b5e61e37252cc
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:19:22 2019 -0600
regcomp.c: Use inRANGE macro
This is faster and clearer
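One way a range-test macro can be both faster and clearer than a hand-rolled pair of comparisons is sketched below. TOY_IN_RANGE is a hypothetical stand-in, not the real macro's definition, and it assumes lo <= hi with all values representable as unsigned int:

```c
/* A two-sided test "lo <= c && c <= hi" folded into one unsigned
 * comparison: when c < lo the subtraction wraps to a very large value. */
#define TOY_IN_RANGE(c, lo, hi) \
    ((unsigned)(c) - (unsigned)(lo) <= (unsigned)(hi) - (unsigned)(lo))

/* C guarantees that '0'..'9' are contiguous in every execution character
 * set, so this holds on EBCDIC as well as ASCII platforms. */
static int toy_is_digit(int c)
{
    return TOY_IN_RANGE(c, '0', '9');
}
```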
commit 70dbbf206f00a7c10e25451b086d8af9b099e195
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:17:23 2019 -0600
lib/ExtUtils/t/Embed.t: Skip on EBCDIC
This is not currently implemented for EBCDIC
commit d8a24d110b7f8b9557cb1b82da434a50868a9f9b
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:16:42 2019 -0600
XXX Pod-Simple
commit 1010d3172f060118b1abf5a510a3c0c1a4b8108c
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:16:14 2019 -0600
XXX Encode
commit 38ecb0af36e514ff6ec7ad8a330cb47a5be3e6b0
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:12:24 2019 -0600
dist/Storable/t/regexp.t: Mark some tests as ASCII-only
These tests are ASCII centric
commit 67bcdd7fb16418e7e9d3e6b942bb56ff108a07e0
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:11:14 2019 -0600
ext/DynaLoader/dl_aix.xs: Use isDIGIT macro
which is more efficient
commit 857721c54b69b1886adc2a332a72f9959d816378
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:10:36 2019 -0600
pp_pack.c: Use inRANGE macro
which is more efficient
commit f1a8849bd4bb794ade578b7164fe1e0396bcefb9
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:09:24 2019 -0600
t/op/die.t: 'use utf8'
This file is encoded in UTF-8, even though it didn't say it was.
commit 10e34d7a9e654c6be69a0ac625a7c98acee0bb57
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:08:46 2019 -0600
t/op/qr.t: Don't use fancy apostrophe
when the ASCII one will do.
commit 46421741601ced21a9ebe794367fd15974016aa1
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:07:50 2019 -0600
t/op/threads-dirh.t: Add ability to skip on memory constrained
This ran out of memory on a very limited smoker; add a check for
environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
it.
commit 872a33940f9e9732dba6dcb5ba61f19022183be2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:06:50 2019 -0600
t/re/bigfuzzy_not_utf8.t: Add ability to skip on memory constrained
This test blew the memory on a very limited smoker; add a check
for environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
these.
commit 712f71782745887d1362cc28b994982d629ad17d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:05:02 2019 -0600
t/re/pat.t: Add ability to skip on memory constrained
A few tests were blowing the memory on a very limited smoker; add a check
for environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
these.
commit ec1e3c9c7723698e3489e0557e2c2b7c7cb335e2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:03:54 2019 -0600
t/re/re_tests: Skip ASCII-centric test for EBCDIC
Add a similar one for EBCDIC
commit 658f02c416eeb56dd71b0795feeafc0e2d238f2b
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:02:44 2019 -0600
win32/vdir.h: Use inRANGE macro
which is more efficient.
commit cca9123b8b9031d092171c95a3cc66e8877f1390
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:02:01 2019 -0600
win32/win32.c: Use inRANGE macro
which is more efficient.
commit b7b4203bbac29c50a4ff5276700028dfbbd0c399
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:01:21 2019 -0600
win32/win32io.c: Use inRANGE macro
which is more efficient.
commit 7bd697f75efec009b6a17cc1cddaf6bb05457569
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:58:15 2019 -0600
caretx.c: Use inRANGE()
This is more efficient
commit 6b2187c2aabf19061f272dd86f92e30661360eb3
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:43:49 2019 -0600
l1_char_class_tab.h: Remove some special EBCDIC cases
These are no longer needed.
commit 6e921b381e47fafae9cb2e220d8b8a70b0c94cd5
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:42:19 2019 -0600
utfebcdic.h: Move some #defines
commit 62a684ba66f0368ebdbb4c97611f181cec7ad51c
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:08:24 2019 -0600
Make defn of UTF_IS_CONTINUED common
This can be derived from other values, removing an EBCDIC dependency
commit 0aff96ddbde22883b56689cea35388fc82b30c59
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:37:17 2019 -0600
Make defn of UVCHR_IS_INVARIANT common
This can be derived from other values, removing an EBCDIC dependency
commit 3f467e3479860974d067a126779e4bc4831759c1
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 18:08:32 2019 -0600
Make defn of OFFUNI_IS_INVARIANT common
This can be derived from other values, removing an EBCDIC dependency
commit 8ca088d986b20be98d1e97f3408704db8d0d5063
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 18:03:26 2019 -0600
Make defn of UTF8_IS_DOWNGRADEABLE_START common
This can be derived from other values, removing an EBCDIC dependency
commit 888f8b0c6f183a92fc9839ede4663d125d869135
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:56:01 2019 -0600
Make defn of UTF_IS_ABOVE_LATIN1 common
This can be derived from other values, removing an EBCDIC dependency
commit d304db0ec06173513838cfcb6753303c8033b22a
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:52:34 2019 -0600
Make defn of UTF8_IS_START common
This can be derived from other values, removing an EBCDIC dependency
commit 98ddec29964c0349e887cc89d592575c510f728d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:13:31 2019 -0600
Make defn of UTF8_IS_CONTINUATION common
This can be derived from other values, removing an EBCDIC dependency
commit a440e9da578fb5293fea8cbfc1508eb7ee223ecc
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:07:50 2019 -0600
Make defn of UTF_CONTINUATION_MARK common
This can be derived from other values, removing an EBCDIC dependency
commit a416326d5f0aea540a75b9dddeba2990c0ec2291
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:48:38 2019 -0600
Make UTF_IS_CONTINUATION_MASK common
This variable can be defined from the same base in both UTF-8 and
UTF-EBCDIC, and doing so eliminates an EBCDIC dependency.
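A sketch of how such a mask could come from a single shared base value, here the number of payload bits in a continuation byte (6 in UTF-8, 5 in UTF-EBCDIC). The function names are hypothetical, and for UTF-EBCDIC the sketch works on the intermediate byte form, ignoring the final byte mapping:

```c
#include <stdint.h>

/* Bits of a byte that identify it as a continuation: everything above
 * the payload.  shift 6 gives 0xC0, shift 5 gives 0xE0. */
static uint8_t toy_continuation_mask(unsigned shift)
{
    return (uint8_t)(0xFFu << shift);
}

/* Value those bits must have: 10xxxxxx (0x80) for shift 6, 101xxxxx
 * (0xA0) for shift 5.  ANDing with 1011 0000 clears the second-highest
 * bit of the mask and leaves the rest. */
static uint8_t toy_continuation_mark(unsigned shift)
{
    return (uint8_t)(toy_continuation_mask(shift) & 0xB0u);
}

static int toy_is_continuation(uint8_t byte, unsigned shift)
{
    return (byte & toy_continuation_mask(shift))
           == toy_continuation_mark(shift);
}
```

Only the shift differs between the two encodings; the rest of the logic is shared.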
commit 90ed005930d214fc659416661d87776c9bdae5a4
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:43:50 2019 -0600
utf8.h: Add comment
commit 8055be4b30507dab0e91b4c158f62624b624ef86
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:40:44 2019 -0600
utf8.h: Remove redundant cast
The called macro does the cast already
commit 20fefbe119d08187cfdc8befc2beb9253c852c5d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:37:17 2019 -0600
utf8.h: Make sure macros not called with a ptr
By doing an '| 0' with a parameter in a macro expansion, a C compile-time
error will be generated if the macro is called with a pointer. This is
free protection.
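A small sketch of the technique with a hypothetical macro (not one from utf8.h). Bitwise OR is only defined for integer operands, so the "| 0" costs nothing for a correct call and rejects a pointer argument at compile time:

```c
/* Hypothetical macro: true if the high bit of byte value 'c' is set.
 * The "| 0" is a no-op for integers and invalid for pointers. */
#define TOY_HIGH_BIT_SET(c)  ((((c) | 0) & 0x80) != 0)

static int toy_first_byte_is_high(const unsigned char *s)
{
    /* Correct use: *s is an integer.  Writing TOY_HIGH_BIT_SET(s) here,
     * forgetting the dereference, would not compile. */
    return TOY_HIGH_BIT_SET(*s);
}
```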
commit cf7064c166d2561f124a842d5236328a3b7bfde1
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:23:39 2019 -0600
t/TEST: Test most of CPAN on EBCDIC
CPAN was mostly skipped before because so many distros raised errors,
but that is no longer true, so just skip about 10 that have big
problems, and test the rest
commit a085c39094b20ef3b3f0c59079577703e8fbb200
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:13:31 2019 -0600
mktables: Fix Named Sequences for EBCDIC
This table wasn't being translated into native code points
commit fea2e6ba7b3268b31ee4fb148322740f80367486
Author: Karl Williamson <[email protected]>
Date: Wed Jun 26 13:02:35 2019 -0600
XXX Configure
commit 010b6f9fb4d660c376568cb298ab3165b2977050
Author: Karl Williamson <[email protected]>
Date: Fri Aug 30 10:31:51 2019 -0600
ebcdic bridge alphas
-----------------------------------------------------------------------
--
Perl5 Master Repository