In perl.git, the branch smoke-me/khw-ebcdic has been created
<https://perl5.git.perl.org/perl.git/commitdiff/b68728aa1fa18100e00885c14faead9e0a84613d?hp=0000000000000000000000000000000000000000>
at b68728aa1fa18100e00885c14faead9e0a84613d (commit)
- Log -----------------------------------------------------------------
commit b68728aa1fa18100e00885c14faead9e0a84613d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 22:34:37 2019 -0600
intrpvar.h: Add variable for use in tr///
This is part of this branch of changes.
commit 6baddda157b548e8ccafc9caa52fbe4284b6c6cc
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 15:29:05 2019 -0600
Allow core to work with code points above IV_MAX
Code points above IV_MAX have been reserved for core use, and a future
commit will finally want to use them.
commit 7be70f378d1537352483db59ccc5610dcfbc0eb5
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 22:04:12 2019 -0600
utfebcdic.h: Add comments
commit aa66debe59fcb1e3e724d416fa85b74f3066231b
Author: Karl Williamson <[email protected]>
Date: Fri Sep 20 09:51:13 2019 -0600
Move Perl_regnext to regexec.c
This function is moved out of regcomp.c, which uses it only during
compilation, where experience has shown that being less efficient
doesn't affect the overall performance, and into regexec.c, which calls
it incessantly at match time.
Now the compiler has full knowledge of this function in the translation
unit that performance is critical in, and can hopefully perform better
optimizations.
commit 279a8ecd6df5a5b9458c4cfdd16fbeb7bb1c6b4f
Author: Karl Williamson <[email protected]>
Date: Fri Sep 20 09:45:29 2019 -0600
regnext: Add some branch predictor hints
commit 8b2c5c4665fd4ae84b578c748f24a13b344cf3dc
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 22:18:02 2019 -0600
Change data lookup from a macro to a function
commit 2edf274e5bdd94a33433fb171c95cc445bec7aac
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 21:54:03 2019 -0600
regen/regcomp.pl: Enforce all long nodes being last
commit da0f852b1261fefaae0dc71ed2a9efb18337a971
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 20:34:17 2019 -0600
regcomp.sym: Move regnodes to end that don't use next_off
Most regnodes use the next_off field in a regnode structure, to link to
the next one in the chain. But some require more than the 16 bits it
contains, so they use a different, 32 bit, field.
Currently, there is a lookup array to distinguish between the types, but
that becomes unnecessary if all of one sort are grouped before or after
all of the other.
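The grouping idea can be sketched roughly as follows. This is only an illustration, not the actual regcomp.sym layout: the opcode names, the FIRST_LONG_OP boundary, and the struct fields are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes: every node that needs the 32-bit offset is
 * numbered after every node that uses the 16-bit one. */
enum op {
    OP_EXACT, OP_ANYOF, OP_STAR,     /* link through the 16-bit field */
    OP_LONG_BRANCH, OP_LONG_JUMP,    /* link through the 32-bit field */
    OP_COUNT
};
#define FIRST_LONG_OP OP_LONG_BRANCH

struct node {
    uint8_t  op;
    uint16_t next_off;   /* 16-bit link, enough for most nodes */
    uint32_t long_off;   /* 32-bit link for the nodes that need it */
};

/* Because the two kinds are grouped, a single comparison against the
 * boundary opcode replaces a per-opcode lookup array. */
static uint32_t node_offset(const struct node *n)
{
    assert(n->op < OP_COUNT);
    return n->op >= FIRST_LONG_OP ? n->long_off : n->next_off;
}
```

Adding a new opcode then only requires placing it on the correct side of the boundary, which is the kind of ordering a regen script could enforce.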
commit b66fee5ca6a7ef65a91207b8ac53320f0f313fc7
Author: Karl Williamson <[email protected]>
Date: Sat Sep 21 09:51:52 2019 -0600
Add ANYOFRb regnode
This is like the ANYOFR regnode added in the previous commit, but all
code points in the range it matches are known to have the same first
UTF-8 start byte. That means it can't match UTF-8 invariant characters,
like ASCII, because the "start" byte is different on each one, so it
could only match a range of 1, and the compiler wouldn't generate this
node for that; instead using an EXACT.
Pattern matching can rule out most code points by looking at the first
character of their UTF-8 representation, before having to convert from
UTF-8.
On ASCII platforms, this simple comparison rules out all but 64 of the
2-byte UTF-8 characters. For 3-byte characters it's up to 4096, and for
4-byte, 2**18, so the test is less effective for higher code points.
I believe that most UTF-8 patterns that otherwise would compile to
ANYOFR will instead compile to this, as I can't envision real life
applications wanting to match large single ranges. Even the 2048
surrogates all have the same first byte.
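A minimal sketch of the first-byte rejection, assuming an ASCII platform and well-formed input. The struct and function names are hypothetical and are not the regnode layout used in regexec.c.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one well-formed UTF-8 character (no validation is done). */
static uint32_t decode_utf8(const uint8_t *s)
{
    if (s[0] < 0x80) return s[0];
    if (s[0] < 0xE0) return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if (s[0] < 0xF0) return ((uint32_t)(s[0] & 0x0F) << 12)
                          | ((uint32_t)(s[1] & 0x3F) << 6)
                          |  (s[2] & 0x3F);
    return ((uint32_t)(s[0] & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         |  (s[3] & 0x3F);
}

/* Hypothetical node: a range whose members all share one start byte. */
struct range_same_start {
    uint8_t  start_byte;
    uint32_t lo, hi;
};

static int range_matches_utf8(const struct range_same_start *r,
                              const uint8_t *s)
{
    assert(r->lo <= r->hi);
    if (s[0] != r->start_byte)
        return 0;                     /* cheap reject, nothing decoded */
    uint32_t cp = decode_utf8(s);     /* only now pay for the decode */
    return cp >= r->lo && cp <= r->hi;
}
```

For example, the range U+03B1..U+03BF encodes as CE B1 through CE BF, so any character whose first byte is not 0xCE is rejected without being decoded.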
commit 951c76af412102ad78ce037727bec523bc9027d6
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 16:03:04 2019 -0600
Add ANYOFR regnode
This matches a single range of code points. It is both faster and
smaller than other ANYOF-type nodes, requiring, after set-up, a single
subtraction and conditional branch.
The vast majority of Unicode properties match a single range, though
most of these are not likely to be used in real world applications. But
things like [ij] are a single range, and those are quite commonly
encountered. This matches them more efficiently than a bitmap would,
and doesn't require the space for one either.
The flags field is used to store the minimum matchable start byte for
UTF-8 strings, and is ignored for non-UTF-8 targets. This, like ANYOFH
nodes which have the same mechanism, allows for quick weeding out of
many possible matches without having to convert the UTF-8 to its
corresponding code point.
This regnode packs the 32 bit argument with 20 bits for the minimum code
point the node matches, and 12 bits for the maximum range. Values
outside those simply won't compile to this regnode, instead going to one
of the ANYOFH flavors. This is sufficient to match all of Unicode
except for the final (private use) 65K plane.
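The packing and the single-subtraction test might look something like the sketch below. It assumes that "maximum range" means the distance from the minimum code point to the maximum one, and that the minimum lives in the low 20 bits; neither detail is stated above, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_BITS    20
#define DELTA_BITS  12
#define MAX_MIN     ((1u << MIN_BITS) - 1)     /* 0xFFFFF */
#define MAX_DELTA   ((1u << DELTA_BITS) - 1)   /* 4095 */

/* Pack [lo, hi] into 32 bits. Returns 0 if the range doesn't fit, in
 * which case the caller would fall back to some other node type. */
static int pack_range(uint32_t lo, uint32_t hi, uint32_t *packed)
{
    assert(lo <= hi);
    if (lo > MAX_MIN || hi - lo > MAX_DELTA)
        return 0;
    *packed = ((hi - lo) << MIN_BITS) | lo;
    return 1;
}

/* One subtraction and one comparison: if cp < lo, the unsigned
 * subtraction wraps to a huge value, which also fails the test. */
static int packed_range_matches(uint32_t packed, uint32_t cp)
{
    uint32_t lo    = packed & MAX_MIN;
    uint32_t delta = packed >> MIN_BITS;
    return cp - lo <= delta;
}
```

With a 20-bit minimum, the largest representable starting point is 0xFFFFF, which is why ranges starting in the final plane (0x100000 and up) can't use this form.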
commit 4a5972f17b0c63abf4bfdd1e44bcad264793360e
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 16:04:03 2019 -0600
regexec.c: Rmv some unnecessary casts
The called macro does the cast, and this makes it more legible
commit e1a04aef35ca838aee68ad26845bce0476a4ad90
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 15:47:51 2019 -0600
regcomp.c: Use variables initialized to macro results
instead of the macros. This is in preparation for the next commit.
commit a73d5a16e9a5ccafeb44035a88ef5e67408c0a6f
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 14:20:59 2019 -0600
regcomp.c: Add parameter to static function
This further decouples this function from knowing details of the calling
structure, by passing this detail in.
commit 80e45f2b92d27e80207b601530c5bbb96ce06f38
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 13:20:42 2019 -0600
t/re/anyof.t: Add a test
This makes sure a non-folding above-Latin1 character is tested.
commit 567ce18e6521f1969f2436f703072a1197dc2cba
Author: Karl Williamson <[email protected]>
Date: Thu Sep 19 14:38:39 2019 -0600
regcomp.c: Comments/white-space
Included is outdenting code whose enclosing block was removed in the
previous commit.
commit f0fe8dd77686b1db50609732d77cad87bbea183b
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 13:12:51 2019 -0600
XXX warning tests, Prefer EXACTish regnodes to ANYOFH nodes
ANYOFH nodes (that match code points above 255) are smaller than regular
ANYOF nodes because they don't have a 256-bit bitmap. But the
disadvantage of them over EXACT nodes is that the characters encountered
must first be converted from UTF-8 to code point. The difference is
less clearcut with /i, because typically, currently, the UTF-8 must also
be converted to code point in order to fold them. But the EXACTFish
node doesn't have an inversion list to do lookup in, and occupies
less space, because it doesn't have inversion list data attached to it.
Also there is a bug in using ANYOFH under /l, as wide character warnings
should be emitted if the locale isn't a UTF-8 one.
The reason this change hasn't been made before (by me anyway) is that
the old way avoided upgrading the pattern to UTF-8. But having thought
about this for a long time, to match this node, the target string must
be in UTF-8 anyway, and having a UTF8ness mismatch slows down pattern
matching, as things have to be continually converted, and reconverted
after backtracking.
commit 2fa05d80418a0aa3feb1310550be877ce6d629b3
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:45:55 2019 -0600
t/re/anyof.t: Fix highest range tests
Previously we had infinity minus 1, but infinity should be beyond the
range, and the highest isn't infinity - 1, but the highest legal code
point.
commit 2f219c7edd72ddf9186bde7a8b389bf333975b1a
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:41:41 2019 -0600
t/re/anyof.t: Remove duplicate test
These are covered by the single code point tests.
commit 561d56c097eb2a70d29ace331a35f0716e1677ef
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:34:23 2019 -0600
t/re/anyof.t: Remove invalid test
One shouldn't be able to specify an infinite code point. The tests have
the conceit that one can specify a range's upper limit as infinity, but
that is just shorthand for the range being unbounded.
commit 6ab283c902eaf797e130902ba2b3fe25770e61ab
Author: Karl Williamson <[email protected]>
Date: Sat Sep 21 10:00:40 2019 -0600
t/re/anyof.t: Revise test
to make it correspond more with the test that precedes it
commit dc2747c0107f55c808879448ece90da38d33b7d4
Author: Karl Williamson <[email protected]>
Date: Wed Sep 18 12:31:11 2019 -0600
re/anyof.t: Clarify failing message
When a test fails, an extra test is run to output debugging info; this
will cause the planned number of tests to be wrong, which will output an
extra, confusing message. This adds an explanation that the number is
expected to be wrong, hence not to worry.
commit 0f3e53356ae5da6fb343e5db7b90c92369f8ef10
Author: Karl Williamson <[email protected]>
Date: Thu Sep 12 20:19:07 2019 -0600
Allow some optimizations of qr/(?[...])/
Prior to this commit, this construct always returned an ANYOF node, even
if it could be optimized into something else.
commit a0afc45f67e3c1a98850094df25850e882b66533
Author: Karl Williamson <[email protected]>
Date: Thu May 30 20:57:27 2019 -0600
regcomp.c: Add invlist_lowest()
This function hides the invlist implementation from the calling code,
and will be called in more than one place in the future.
commit 328893707f9b1a60d2d76ae446445c38a32bf2a6
Author: Karl Williamson <[email protected]>
Date: Thu Sep 12 21:06:45 2019 -0600
regcomp.c: Code for qr/(?[...]) handle restart
There is an existing mechanism for code to realize it needs to restart
parsing from the beginning, say because it needs to upgrade to UTF-8.
The code for /(?[...])/ did not participate in this. Currently I don't
know of any case where it needs to, though there is perhaps some very
hard to reproduce case in which branch instructions start needing to
handle more than 16 bits, but I kind of doubt it. Anyway, the next few
commits introduce the possibility.
commit a6706a92a186e79e5ea446b62591644e24ff5c3d
Author: Karl Williamson <[email protected]>
Date: Sat Sep 7 09:18:49 2019 -0600
malloc.c: Use isDIGIT macro instead of hand-rolling it
The macro is more efficient
commit 9d513ed30c437e6cbb4b0a6e999fd6cd38bf8108
Author: Karl Williamson <[email protected]>
Date: Fri Sep 6 10:25:26 2019 -0600
doio.c: Use inRANGE macro
commit a8e1c66fe956e639dfb8d691b3af821555ef97db
Author: Karl Williamson <[email protected]>
Date: Tue Oct 1 22:34:25 2019 -0600
util.c: Use inRANGE macro
commit cd82dbb8f9b43b80e3167f465308fbc7eff8885f
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:54:57 2019 -0600
t/op/tr_latin1.t: Skip ASCII-centric tests on EBCDIC
commit 69d38f5936a78bfec97019ad78e1e723dd285e7f
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:54:12 2019 -0600
dist/Data-Dumper/t/dumper.t: Skip ASCII-centric tests on EBCDIC
commit d1d511cf1f6a544a06f7f260cd6daa69dd804e93
Author: Karl Williamson <[email protected]>
Date: Fri Sep 6 10:23:26 2019 -0600
t/re/regexp.t: Only convert to EBCDIC once
Some tests get added as we go along, and those added tests have already
been converted to EBCDIC if necessary. Don't reconvert, which messes
things up.
commit bbfbb659b59b21d4674ebf0a1d6635cb724dd14b
Author: Karl Williamson <[email protected]>
Date: Fri Sep 6 09:49:41 2019 -0600
re/regexp.t: Change variable name to be more meaningful
commit 9a8c183cd121ab02b266dfd56066688da219530a
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:20:42 2019 -0600
Configure kludge about none optimize
commit 81881f0431d001e61402fd25bf8585a91d3f6629
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:19:59 2019 -0600
XXX regexec.c: debugging prints
commit 8299d9a65aa908797d995dcfc8c75cee303e3bde
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:19:22 2019 -0600
regcomp.c: Use inRANGE macro
This is faster and clearer
commit 41e5e8301f0e597790e8acd745d6d82b5b92b467
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:17:23 2019 -0600
lib/ExtUtils/t/Embed.t: Skip on EBCDIC
This is not currently implemented for EBCDIC
commit fb1b2d8ca83a56eda2fcb6c8951b87b79db73a71
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:16:42 2019 -0600
XXX Pod-Simple
commit 8f3a4f2627a4313a8ab35638c82db70a7dce846d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:16:14 2019 -0600
XXX Encode
commit aabcdcfd42a3fe08f73749835f652e8af1d8596c
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:12:24 2019 -0600
dist/Storable/t/regexp.t: Mark some tests as ASCII-only
These tests are ASCII centric
commit ba7e5a52b4b5fcc0e6609d083373b2892e9a2d6b
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:11:14 2019 -0600
ext/DynaLoader/dl_aix.xs: Use isDIGIT macro
which is more efficient
commit 62c7270cb33948bf6b4f04106a275136a27dfbe0
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:10:36 2019 -0600
pp_pack.c: Use inRANGE macro
which is more efficient
commit ad7e6a0fcea9eb84eeff7d9409675433f08c2a03
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:09:24 2019 -0600
t/op/die.t: 'use utf8'
This file is encoded in UTF-8, even though it didn't say it was.
commit 9e3afc13ac13bff3c52b38393dc781daa6e38f27
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:08:46 2019 -0600
t/op/qr.t: Don't use fancy apostrophe
when the ASCII one will do.
commit 5cacf99d51ecb8ba6a2ebcc7a05cca517973dd6e
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:07:50 2019 -0600
t/op/threads-dirh.t: Add ability to skip on memory constrained
This ran out of memory on a very limited smoker; add a check for
environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
it.
commit 5d340c0dff1c0ef03e2acb06d83e91a94b0849e2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:06:50 2019 -0600
t/re/bigfuzzy_not_utf8.t: Add ability to skip on memory constrained
This test blew the memory on a very limited smoker; add a check
for environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
these.
commit e59982955e95d22f70ad1119f8e072150165ea2f
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:05:02 2019 -0600
t/re/pat.t: Add ability to skip on memory constrained
A few tests were blowing the memory on a very limited smoker; add a check
for environment variable PERL_SKIP_BIG_MEM_TESTS being non-zero to skip
these.
commit 2ed9cd9019ab3c408b5b7e8dad64da76f619e1de
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:03:54 2019 -0600
t/re/re_tests: Skip ASCII-centric test for EBCDIC
Add a similar one for EBCDIC
commit cfda69717ede221d7a3df0974deac79553cffa40
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:02:44 2019 -0600
win32/vdir.h: Use inRANGE macro
which is more efficient.
commit 96c2cf42708b270b5a62cd18b474a782804cc89b
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:02:01 2019 -0600
win32/win32.c: Use inRANGE macro
which is more efficient.
commit 50541fc8a6adbf36f781171da7f292a7a3885fec
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 21:01:21 2019 -0600
win32/win32io.c: Use inRANGE macro
which is more efficient.
commit 15c7e4eceb39b412fe3d05157403a60f20dc165d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:58:15 2019 -0600
caretx.c: Use inRANGE()
This is more efficient
commit ce1b88ff3b80f9807193931a38eb4b283a82d203
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:43:49 2019 -0600
l1_char_class_tab.h: Remove some special EBCDIC cases
These are no longer needed.
commit 692e610fac8138f8229c805aedf34527d48b24c3
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:42:19 2019 -0600
utfebcdic.h: Move some #defines
commit 0cbd7e30d8496db35967e60b2a4b72b557e88e2f
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:08:24 2019 -0600
Make defn of UTF_IS_CONTINUED common
This can be derived from other values, removing an EBCDIC dependency
commit c142cb67e904426bd573736c1b5ae67d367cbc23
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 20:37:17 2019 -0600
Make defn of UVCHR_IS_INVARIANT common
This can be derived from other values, removing an EBCDIC dependency
commit f897e6ffc7abc9f7b9d0fd70b4c81be8be86f518
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 18:08:32 2019 -0600
Make defn of OFFUNI_IS_INVARIANT common
This can be derived from other values, removing an EBCDIC dependency
commit 9fee57bcbab0361d204b56075aeb8847a8cca1c3
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 18:03:26 2019 -0600
Make defn of UTF8_IS_DOWNGRADEABLE_START common
This can be derived from other values, removing an EBCDIC dependency
commit 3f0d59ea8b3b6a7feaa981db742c77f07bcccdd3
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:56:01 2019 -0600
Make defn of UTF_IS_ABOVE_LATIN1 common
This can be derived from other values, removing an EBCDIC dependency
commit f73ae121e048236a32b5eb31d60a31ba575fd2f8
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:52:34 2019 -0600
Make defn of UTF8_IS_START common
This can be derived from other values, removing an EBCDIC dependency
commit 76c9de5b3ee08e5562f26eaeeee87570ab9fa8b4
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:13:31 2019 -0600
Make defn of UTF8_IS_CONTINUATION common
This can be derived from other values, removing an EBCDIC dependency
commit 216873c6dbaf59a0baa8d877bd295745a6129a3b
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 17:07:50 2019 -0600
Make defn of UTF_CONTINUATION_MARK common
This can be derived from other values, removing an EBCDIC dependency
commit e7bfc74a64d3bfcab96ff66f245de63fc6344ba5
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:48:38 2019 -0600
Make UTF_IS_CONTINUATION_MASK common
This variable can be defined from the same base in both UTF-8 and
UTF-EBCDIC, and doing so eliminates an EBCDIC dependency.
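One way such a common derivation could work is sketched below, assuming the shared base is the number of payload bits in a continuation byte (6 for UTF-8, 5 for UTF-EBCDIC). The function names and the 0xB0 constant are choices made for this illustration, not necessarily what utf8.h does.

```c
#include <assert.h>
#include <stdint.h>

/* 'shift' is the number of payload bits per continuation byte:
 * assumed to be 6 for UTF-8 and 5 for UTF-EBCDIC. */

/* The bits of a byte that identify it as a continuation byte. */
static uint8_t continuation_mask(unsigned shift)
{
    assert(shift == 5 || shift == 6);
    return (uint8_t)(0xFFu << shift);
}

/* The value those bits must have: binary 10 for UTF-8 and 101 for
 * UTF-EBCDIC. ANDing the mask with 0xB0 yields both. */
static uint8_t continuation_mark(unsigned shift)
{
    return continuation_mask(shift) & 0xB0;
}

static int is_continuation(uint8_t byte, unsigned shift)
{
    return (byte & continuation_mask(shift)) == continuation_mark(shift);
}
```

The point is that nothing here is conditional on the platform beyond the one shift value, so the same definitions serve both encodings.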
commit cf6e138a71378c5a1fdc931600ae14d33e1ff3a5
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:43:50 2019 -0600
utf8.h: Add comment
commit f53aa87dcae4a0d1b4e054d5c9c4eeb60afa8eb2
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:40:44 2019 -0600
utf8.h: Remove redundant cast
The called macro does the cast already
commit 26dbade501a3fa222d3c17974c69826039360ee7
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:37:17 2019 -0600
utf8.h: Make sure macros not called with a ptr
By doing an '| 0' with a parameter in a macro expansion, a C compilation
error will be generated if that parameter is a pointer. This is free
protection.
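A small sketch of the technique, using a hypothetical macro rather than any of the real ones in utf8.h:

```c
#include <assert.h>

/* Hypothetical macro: true if the value is below 0x80. ORing with 0
 * leaves an integer unchanged and costs nothing at run time, but '|'
 * is not defined for pointer operands, so passing a pointer by
 * mistake fails to compile. */
#define IS_LOW_BYTE(c)  (((c) | 0) < 0x80)

static int count_low_bytes(const unsigned char *s, int len)
{
    assert(len >= 0);
    int n = 0;
    for (int i = 0; i < len; i++)
        n += IS_LOW_BYTE(s[i]);     /* fine: s[i] is an integer */
    /* n += IS_LOW_BYTE(s);            would not compile: s is a pointer */
    return n;
}
```

Without the '| 0', comparing the pointer itself against 0x80 would draw at most a warning from many compilers, and the bug could go unnoticed.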
commit e81cf5797705f445ffd76676cb49a034c56e88f1
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:23:39 2019 -0600
t/TEST: Test most of CPAN on EBCDIC
CPAN was mostly skipped before because so many distros raised errors,
but that is no longer true, so just skip about 10 that have big
problems, and test the rest
commit acc6895ab412287d0440ebf2747655f6d30490c1
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:19:49 2019 -0600
lib/charnames.t: Fix Named Sequence test for EBCDIC
The file from Unicode needs to be translated to native
commit 8534deafabafbbab7dbaf76afdea645483c4bc56
Author: Karl Williamson <[email protected]>
Date: Wed Oct 2 16:13:31 2019 -0600
mktables: Fix Named Sequences for EBCDIC
This table wasn't being translated into native code points
commit 8225f4a0c60b41fdb1bd4617c0f0e385afef8a2a
Author: Karl Williamson <[email protected]>
Date: Wed Jun 26 13:02:35 2019 -0600
XXX Configure
commit 4ae00defd5122bfe845f6071a97d7adddfcdc95c
Author: Karl Williamson <[email protected]>
Date: Fri Aug 30 10:31:51 2019 -0600
ebcdic bridge alphas
-----------------------------------------------------------------------
--
Perl5 Master Repository