  Branch: refs/heads/smoke-me/khw-regexec
  Home:   https://github.com/Perl/perl5
  Commit: 02036a61eaae3ae36c38c2782259799b99f6b7f8
      https://github.com/Perl/perl5/commit/02036a61eaae3ae36c38c2782259799b99f6b7f8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M cpan/Test-Harness/lib/App/Prove.pm
    M cpan/Test-Harness/lib/App/Prove/State.pm
    M cpan/Test-Harness/lib/App/Prove/State/Result.pm
    M cpan/Test-Harness/lib/App/Prove/State/Result/Test.pm
    M cpan/Test-Harness/lib/TAP/Base.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Base.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Color.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Console.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Console/ParallelSession.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Console/Session.pm
    M cpan/Test-Harness/lib/TAP/Formatter/File.pm
    M cpan/Test-Harness/lib/TAP/Formatter/File/Session.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Session.pm
    M cpan/Test-Harness/lib/TAP/Harness.pm
    M cpan/Test-Harness/lib/TAP/Harness/Env.pm
    M cpan/Test-Harness/lib/TAP/Object.pm
    M cpan/Test-Harness/lib/TAP/Parser.pm
    M cpan/Test-Harness/lib/TAP/Parser/Aggregator.pm
    M cpan/Test-Harness/lib/TAP/Parser/Grammar.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator/Array.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator/Process.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator/Stream.pm
    M cpan/Test-Harness/lib/TAP/Parser/IteratorFactory.pm
    M cpan/Test-Harness/lib/TAP/Parser/Multiplexer.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Bailout.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Comment.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Plan.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Pragma.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Test.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Unknown.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Version.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/YAML.pm
    M cpan/Test-Harness/lib/TAP/Parser/ResultFactory.pm
    M cpan/Test-Harness/lib/TAP/Parser/Scheduler.pm
    M cpan/Test-Harness/lib/TAP/Parser/Scheduler/Job.pm
    M cpan/Test-Harness/lib/TAP/Parser/Scheduler/Spinner.pm
    M cpan/Test-Harness/lib/TAP/Parser/Source.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/Executable.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/File.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/Handle.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/Perl.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/RawTAP.pm
    M cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm
    M cpan/Test-Harness/lib/TAP/Parser/YAMLish/Writer.pm
    M cpan/Test-Harness/lib/Test/Harness.pm

  Log Message:
  -----------
  TAP::Harness: Move timer initialization

Prior to this commit, the timers for counting elapsed time and CPU usage were started when a job's first output appeared. This yields inaccurate results. These results are saved in t/test_state for future runs so that they can start the longest-running tests first, which leads to using the available cores more efficiently. (If you start a long running test after everything else is nearly done, you have to wait for it to finish before the suite as a whole is; if you start the long ones first, and the shortest last, you don't have to wait very long for any stragglers to complete.)

Inaccurate results here lead to exactly this situation, which we were often seeing in the podcheck.t test. The worst case is if there is heavy computation at the beginning of the test being run. podcheck, for example, examines all the pods in the directory structure to find which links to other pods do or do not have corresponding anchors. Output doesn't happen until the analysis is complete. On my system, this takes over 30 seconds, but prior to this commit, what was noted was just the time required to do the output, about 200 milliseconds. The result was that podcheck was viewed as being one of the shortest tests run, so it was started late in the process, and generally held up the suite's completion.

This commit by itself doesn't improve the test completion very much, because tests are run a whole directory at a time, and the directory podcheck is in, for example, is run last.
The next commit addresses that.


  Commit: 9a3eb2657e35eff13204624da8751c993bcb4ede
      https://github.com/Perl/perl5/commit/9a3eb2657e35eff13204624da8751c993bcb4ede
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M t/harness

  Log Message:
  -----------
  XXX env name: t/harness: Add option for faster test suite execution

This commit adds an environment variable, XXX, which if set to non-zero increases the parallelism in the execution of the test suite, speeding it up on systems with multiple cores.

Normally, there are two main test sections, one for core and the second for non-core tests, and the testing of the non-core one doesn't begin until the first is complete. Within each section, there are a number of test categories, like 're' for regular expressions, and 'JSON::PP' for the pure perl implementation of JSON.

Within each category, there are various single .t test files. Some categories can have those be tested in parallel; some require them to be done in a particular order, say because an earlier .t does setup for subsequent ones. We already have this capability.

Completion of all the tests in a category is not needed before those of another category can be started. This is how it already works.

However, the core section categories are ordered so that they begin in a logical order for someone trying to get perl to work. First to start are the basic sanity tests, then by roughly decreasing order of widespread use in perl programs in the wild, with the final two categories, porting and perf, being mainly of use to perl5 porters. These two categories aren't started until all the tests in the earlier categories are started. We have some long running tests in those two categories, and generally they delay the start of the entire second section.
If those long running tests could be started sooner, shorter tests in the first section could be run in parallel with them, increasing the average CPU utilization, and the second section could begin (and hence end) earlier, shortening the total elapsed execution time of the entire suite.

The second section has some very long running tests. JSON-PP is one of them. If it could run in parallel with tests from the first section, that would also speed up the completion of the suite.

The environment variable added by this commit does both things. The basic sanity test categories in the first section continue to be started before anything else. But then all other tests are run in decreasing order of the elapsed time they take to run, removing the boundaries between some categories, and between the two sections.

The gain from this increases as the number of jobs run in parallel does; slower platforms with many cores gain the most. On the old dromedary with 24 cores, the gain is 20%, almost 2 minutes. On my more modern box with 12 cores, it is 8%.

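The effect of starting the longest tests first can be illustrated with a small simulation. This is only a sketch in Python, not the logic in t/harness or TAP::Harness: the test names, the timings, the worker count, and the `makespan` helper are all made up for illustration.

```python
import heapq

def makespan(durations, workers):
    """Greedy list scheduling: each job, taken in the given order, goes to
    the worker that becomes free first. Returns the total elapsed time."""
    free_at = [0.0] * workers          # min-heap of times at which workers are free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at) # earliest available worker
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

# Hypothetical timings in seconds: one long test among many short ones.
timings = {"short%02d.t" % i: 2.0 for i in range(12)}
timings["long.t"] = 30.0

# Long test started last, as happens when its recorded time is misleadingly small.
late = sorted(timings.values())
# Long test started first, i.e. decreasing order of recorded elapsed time.
early = sorted(timings.values(), reverse=True)

print("long test last: ", makespan(late, workers=4))
print("long test first:", makespan(early, workers=4))
```

With these invented numbers the run that starts the long test last has to wait for it alone at the end, while the run that starts it first overlaps it with all the short tests.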

  Commit: f7a5dcfa401b56f9fc28fa8446b3200f70331218
      https://github.com/Perl/perl5/commit/f7a5dcfa401b56f9fc28fa8446b3200f70331218
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexp.h

  Log Message:
  -----------
  regexp.h: White-space only

Indent preprocessor lines for clarity of program structure


  Commit: 32d6fac25622f4ed54e16b70e613204307b1d16c
      https://github.com/Perl/perl5/commit/32d6fac25622f4ed54e16b70e613204307b1d16c
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regen/unicode_constants.pl
    M unicode_constants.h

  Log Message:
  -----------
  regen/unicode_constants.pl: Add a couple constants

which will be needed in a future commit


  Commit: d7fb133f87023fb4eaee9ae0b72407428277dd48
      https://github.com/Perl/perl5/commit/d7fb133f87023fb4eaee9ae0b72407428277dd48
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Clarify comment


  Commit: 620d4b2023f9bf90fef91f7a71956fae823b1933
      https://github.com/Perl/perl5/commit/620d4b2023f9bf90fef91f7a71956fae823b1933
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M pod/perldebguts.pod
    M regcomp.sym
    M regnodes.h

  Log Message:
  -----------
  regcomp.sym: Update node comments


  Commit: bac230f5b9a9debb882822e6af75f5612a3ccee4
      https://github.com/Perl/perl5/commit/bac230f5b9a9debb882822e6af75f5612a3ccee4
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M pod/perldebguts.pod
    M regcomp.sym
    M regnodes.h

  Log Message:
  -----------
  regcomp.sym: Make adjacent opcodes for 2 similar regnodes

These are often tested together. By making them adjacent we can use inRANGE.

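The benefit of adjacent opcodes can be sketched as follows. This is an illustration in Python rather than the C in regcomp.c: the opcode names and numbers are hypothetical, and `in_range` only imitates the kind of single unsigned comparison a C range macro can use; it is not a copy of Perl's inRANGE.

```python
# Hypothetical opcode numbers, for illustration only; these are not the
# values generated into regnodes.h.
NODE_A, NODE_B, NODE_C, NODE_D = 10, 11, 12, 13

def in_range(value, low, high):
    """True if low <= value <= high, written the way C code can do it with
    one unsigned comparison: (value - low) <= (high - low)."""
    mask = 0xFFFFFFFF                  # emulate 32-bit unsigned wraparound
    return ((value - low) & mask) <= ((high - low) & mask)

def is_b_or_c_listed(op):
    # Before: each opcode of interest is compared individually.
    return op == NODE_B or op == NODE_C

def is_b_or_c_ranged(op):
    # After: NODE_B and NODE_C are adjacent, so one range test suffices.
    return in_range(op, NODE_B, NODE_C)

for op in (NODE_A, NODE_B, NODE_C, NODE_D):
    print(op, is_b_or_c_listed(op), is_b_or_c_ranged(op))
```

The two predicates agree for every opcode, but the ranged form does not grow if more adjacent node types are later added to the range.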

  Commit: 9101236bad2fb028bc7c4ea6a88f589a82540718
      https://github.com/Perl/perl5/commit/9101236bad2fb028bc7c4ea6a88f589a82540718
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Simplify

The previous commit made the opcodes for two regops adjacent, so that we can refer to them by a single range. This commit takes advantage of that change.


  Commit: de1af24502acd3a3745f617691ecae58b4ea57f2
      https://github.com/Perl/perl5/commit/de1af24502acd3a3745f617691ecae58b4ea57f2
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M globvar.sym
    M pod/perldebguts.pod
    M regcomp.sym
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regnodes.h: Add two convenience bit masks

These categorize the many types of EXACT nodes, so that code can refer to a particular subset of such nodes without having to list all of them out. This simplifies some 'if' statements, and makes updating things easier.


  Commit: cd13dde67b36e66479bfb49edc80d8a6230c7745
      https://github.com/Perl/perl5/commit/cd13dde67b36e66479bfb49edc80d8a6230c7745
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c
    M regexec.c

  Log Message:
  -----------
  regcomp.c,regexec.c: Simplify

This commit uses the new macros from the previous commit to simplify some code.


  Commit: ffcee89f2d31e05e76006937a0af324474a6f455
      https://github.com/Perl/perl5/commit/ffcee89f2d31e05e76006937a0af324474a6f455
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Simplify

This was a case statement of every type of EXACTish node. Instead, there is a simple way to see if something is EXACTish.

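The bit-mask idea from the regnodes.h commit above can be sketched like this. It is a Python illustration under assumed names: the node types, their numbering, and the two categories are hypothetical and are not the masks actually added to regnodes.h.

```python
# Hypothetical node types and numbering; the real list lives in regcomp.sym.
EXACT_A, EXACT_B, EXACT_C, OTHER = 0, 1, 2, 3

def mask_of(*ops):
    """Build a bit mask with one bit set per opcode."""
    m = 0
    for op in ops:
        m |= 1 << op
    return m

# Two illustrative categories of EXACT-like nodes (names are assumptions).
FOLDING_MASK = mask_of(EXACT_B, EXACT_C)
NEEDS_UTF8_MASK = mask_of(EXACT_C)

def in_category(op, mask):
    """One shift and one AND replace a chain of equality comparisons."""
    return bool((1 << op) & mask)

for op in (EXACT_A, EXACT_B, EXACT_C, OTHER):
    print(op, in_category(op, FOLDING_MASK), in_category(op, NEEDS_UTF8_MASK))
```

When a new node type joins a category, only the mask definition changes; every `if` that tests the mask stays as it is.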

  Commit: 4cb782acc36464022756dd79966b152527cb629d
      https://github.com/Perl/perl5/commit/4cb782acc36464022756dd79966b152527cb629d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Change member to method

This will allow more flexibility in future commits: instead of using a static format, they can use one based on the input value.

The only non-white space change from this commit is the reordering of a couple tests; I'm not sure why that happened.


  Commit: be056bf42f74ce7dcfd902f0ef0d968a15d38770
      https://github.com/Perl/perl5/commit/be056bf42f74ce7dcfd902f0ef0d968a15d38770
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Move parameter to caller

This commit changes a sub in this file to be passed a new parameter. This is in preparation for the value to be used in the caller. No need to derive it twice.


  Commit: ba162872a9013d5671fc12091275daf4f1d39b58
      https://github.com/Perl/perl5/commit/ba162872a9013d5671fc12091275daf4f1d39b58
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Use char instead of hex

This changes the generated macros to use a printable character or mnemonic instead of a hex value. This makes the macros easier to read.


  Commit: 99797d920b81670d4566bde637e990bd0de756aa
      https://github.com/Perl/perl5/commit/99797d920b81670d4566bde637e990bd0de756aa
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regcomp.c
    M regen/regcharclass_multi_char_folds.pl

  Log Message:
  -----------
  regcharclass.h: multi-folds: Add some unfoldeds

Prior to this commit, the generated macros for dealing with multi-char folds in UTF-8 strings only recognized completely folded strings. This commit changes that to add the uppercase for characters in the Latin1 range. An example should clarify.

The fold for U+0130: LATIN CAPITAL LETTER I WITH DOT ABOVE is 'i' followed by U+0307: COMBINING DOT ABOVE. But since we are doing /i matching, an 'I' followed by U+0307 should also match. This commit changes the macros to know this. Before this, if the fold were entirely ASCII, the macros would know all the possible combinations. This commit extends that to all code points < 256. (Since there are no folds to the upper Latin1 range, that really means all code points below 128. But making it general means it wouldn't have to be revised if a fold were ever added to the upper half range.)

The reason to make this change is that it makes some future code less complicated. And it adds very little complexity to the generated macros; less than the code it will save. I originally thought it would be more complex than it now turns out to be. Much of that is because the infrastructure has advanced since that decision.

I couldn't find any existing code that this change allows to be simplified. There could be some if the macros were extended to do this on all code points, not just the low ones. I tried that, but the generated macros were at least 400 lines longer than before. That does add significant complexity, so I backed that out.

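The U+0130 example can be checked with a few lines of Python. Here Python's `str.casefold()` stands in for Perl's /i folding, which is an assumption made for illustration; the generated macros in regcharclass.h work on UTF-8 bytes and are not shown. The helper name `matches_caselessly` is invented.

```python
CAPITAL_I_WITH_DOT = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
COMBINING_DOT = "\u0307"        # COMBINING DOT ABOVE

# Full case fold of U+0130: 'i' followed by U+0307.
folded = CAPITAL_I_WITH_DOT.casefold()

# A partially folded form: upper-case 'I' followed by the combining dot.
partially_folded = "I" + COMBINING_DOT

def matches_caselessly(a, b):
    """Caseless comparison by folding both sides (no normalization)."""
    return a.casefold() == b.casefold()

print([hex(ord(c)) for c in folded])
print(matches_caselessly(partially_folded, CAPITAL_I_WITH_DOT))
```

A matcher that only recognizes the completely folded sequence would miss `partially_folded`, which is the gap the commit closes for code points below 256.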

  Commit: 524ab4373f08411ffaa5cfc30184e2b3fda05344
      https://github.com/Perl/perl5/commit/524ab4373f08411ffaa5cfc30184e2b3fda05344
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass_multi_char_folds.pl

  Log Message:
  -----------
  regen/regcharclass_multi_char_folds.pl: White space, comment only

Outdent and remove lines from changes in the previous commit.


  Commit: 116b6a448b8ef39d38b7f12d977578f0edcd40af
      https://github.com/Perl/perl5/commit/116b6a448b8ef39d38b7f12d977578f0edcd40af
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass_multi_char_folds.pl

  Log Message:
  -----------
  regen/regcharclass_multi_char_folds.pl: Use case fold

Prior to this commit, only the upper case of Latin1 characters was dealt with. But we really want case folding, and there are a few other characters that fold to Latin1. This commit acknowledges them.


  Commit: 3750a4d09efbc0025ff925a5a93fc45494825f43
      https://github.com/Perl/perl5/commit/3750a4d09efbc0025ff925a5a93fc45494825f43
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Rmv unused macro


  Commit: eccf49a45f3ed41a272d48da3c11a2c0ed54afdb
      https://github.com/Perl/perl5/commit/eccf49a45f3ed41a272d48da3c11a2c0ed54afdb
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: White space only

This does some line wrapping, etc


  Commit: f3c196e32d4bdcbe02b14089b9eace4e241447dd
      https://github.com/Perl/perl5/commit/f3c196e32d4bdcbe02b14089b9eace4e241447dd
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  charclass_invlists.h: Add some inverse folds.

The MICRO SIGN folds to above the Latin1 range, the only character that does so in Unicode (or is ever likely to). This requires special handling. This commit reduces some of the need for that handling by creating the inversion map for it, which can be used in certain instances in pattern matching, without having to have a special case.

The actual use of this will come in a future commit.


  Commit: e26f33fe67b1a550702845eeb50507faae1ee1d5
      https://github.com/Perl/perl5/commit/e26f33fe67b1a550702845eeb50507faae1ee1d5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Rename local variable; change type

I found myself getting confused, as this most likely was named before UTF-8 came along. It actually is just a byte, plus an out-of-bounds value.

While I'm at it, I'm also changing the type from I32 to the perl equivalent of the C99 'int_fast16_t', as it doesn't need to be 32 bits, and we should let the compiler choose what size is the most efficient that still meets our needs.


  Commit: 6cc0a560e4c002e1492901e360d92132e6bfb802
      https://github.com/Perl/perl5/commit/6cc0a560e4c002e1492901e360d92132e6bfb802
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Change variable name in a function

This makes it like a corresponding variable.


  Commit: 22678b3e7ab696a79143b7099bf54ef7887e03ab
      https://github.com/Perl/perl5/commit/22678b3e7ab696a79143b7099bf54ef7887e03ab
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Store expression in a variable

This makes the text look cleaner, and prepares for a future commit, where we will want to change the variable (which can't be done with the expression).

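The MICRO SIGN fold described in the charclass_invlists.h commit above can be observed from Python. This sketch again uses `str.casefold()` as a stand-in for Perl's fold tables, so its results depend on the Unicode data bundled with the Python in use; the helper name `folds_above_latin1` is invented.

```python
MICRO_SIGN = "\u00b5"        # MICRO SIGN, in the Latin1 range
GREEK_SMALL_MU = "\u03bc"    # GREEK SMALL LETTER MU, above the Latin1 range
GREEK_CAPITAL_MU = "\u039c"  # GREEK CAPITAL LETTER MU

def folds_above_latin1(ch):
    """True if any code point in the case fold of ch is above 255."""
    return any(ord(c) > 0xFF for c in ch.casefold())

# Latin1 code points whose fold leaves the Latin1 range.
crossing = [cp for cp in range(0x100) if folds_above_latin1(chr(cp))]

print([hex(cp) for cp in crossing])
# MICRO SIGN and GREEK CAPITAL LETTER MU share the same fold.
print(MICRO_SIGN.casefold() == GREEK_CAPITAL_MU.casefold())
```

Because the fold target sits above 255 while the source sits below it, a matcher that treats the two ranges separately needs either a special case or a precomputed mapping for this one character.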

  Commit: 5c2d63db95126d94e2032bbe18087a6885b88d73
      https://github.com/Perl/perl5/commit/5c2d63db95126d94e2032bbe18087a6885b88d73
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Do some extra folding

Generally we have to wait until runtime to do folding for regnodes that are locale dependent, because we don't know what the locale at runtime will be, and hence what the folds will be.

But UTF-8 locales all have the same folding behavior, no matter what the locale is, with the exception of two fold pairs in Turkish. (Lithuanian too, but Perl doesn't support that language's special folding rules.)

UTF-8 is the only locale type that Perl supports that can represent code points above 255. Therefore we do know at compile time what the above-255 folds are (again excepting the two in Turkish), and so we can do the folding then. But only if both the components are above 255. There are a few folds that cross the 255/256 boundary, and they must be deferred. However, there are two instances where there are three characters that fold together in which two of them are above 255, and the third isn't. That the two high ones are equivalent under /i is known at compile time, and so that equivalence can be stated then.


  Commit: 6971bbdab53ad1403c9c84522e1f2c5c09ee8426
      https://github.com/Perl/perl5/commit/6971bbdab53ad1403c9c84522e1f2c5c09ee8426
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M cv.h
    M ext/B/Makefile.PL
    M op.h

  Log Message:
  -----------
  XXX Fix B broken C parser

Should these defines not be EXT?


Compare: https://github.com/Perl/perl5/compare/3e2dd33664dd...6971bbdab53a