Branch: refs/heads/smoke-me/khw-regexec
  Home:   https://github.com/Perl/perl5
  Commit: 02036a61eaae3ae36c38c2782259799b99f6b7f8
      
https://github.com/Perl/perl5/commit/02036a61eaae3ae36c38c2782259799b99f6b7f8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M cpan/Test-Harness/lib/App/Prove.pm
    M cpan/Test-Harness/lib/App/Prove/State.pm
    M cpan/Test-Harness/lib/App/Prove/State/Result.pm
    M cpan/Test-Harness/lib/App/Prove/State/Result/Test.pm
    M cpan/Test-Harness/lib/TAP/Base.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Base.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Color.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Console.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Console/ParallelSession.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Console/Session.pm
    M cpan/Test-Harness/lib/TAP/Formatter/File.pm
    M cpan/Test-Harness/lib/TAP/Formatter/File/Session.pm
    M cpan/Test-Harness/lib/TAP/Formatter/Session.pm
    M cpan/Test-Harness/lib/TAP/Harness.pm
    M cpan/Test-Harness/lib/TAP/Harness/Env.pm
    M cpan/Test-Harness/lib/TAP/Object.pm
    M cpan/Test-Harness/lib/TAP/Parser.pm
    M cpan/Test-Harness/lib/TAP/Parser/Aggregator.pm
    M cpan/Test-Harness/lib/TAP/Parser/Grammar.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator/Array.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator/Process.pm
    M cpan/Test-Harness/lib/TAP/Parser/Iterator/Stream.pm
    M cpan/Test-Harness/lib/TAP/Parser/IteratorFactory.pm
    M cpan/Test-Harness/lib/TAP/Parser/Multiplexer.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Bailout.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Comment.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Plan.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Pragma.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Test.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Unknown.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/Version.pm
    M cpan/Test-Harness/lib/TAP/Parser/Result/YAML.pm
    M cpan/Test-Harness/lib/TAP/Parser/ResultFactory.pm
    M cpan/Test-Harness/lib/TAP/Parser/Scheduler.pm
    M cpan/Test-Harness/lib/TAP/Parser/Scheduler/Job.pm
    M cpan/Test-Harness/lib/TAP/Parser/Scheduler/Spinner.pm
    M cpan/Test-Harness/lib/TAP/Parser/Source.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/Executable.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/File.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/Handle.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/Perl.pm
    M cpan/Test-Harness/lib/TAP/Parser/SourceHandler/RawTAP.pm
    M cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm
    M cpan/Test-Harness/lib/TAP/Parser/YAMLish/Writer.pm
    M cpan/Test-Harness/lib/Test/Harness.pm

  Log Message:
  -----------
  TAP::Harness: Move timer initialization

Prior to this commit, the timers for counting elapsed time and CPU usage
were begun when a job's first output appears.  This yields inaccurate
results.  These results are saved in t/test_state for future runs so
that they can start the longest-running tests first, which leads to
using the available cores more efficiently.  (If you start a long running
test after everything else is nearly done, you have to wait for it to
finish before the suite as a whole is; if you start the long ones first,
and the shortest last, you don't have to wait very long for any
stragglers to complete.)  Inaccurate results here lead to this
situation, which we were often seeing in the podcheck.t test.
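
To illustrate the parenthetical above, here is a minimal C sketch (not part of the commit, and not how TAP::Harness is implemented) that simulates handing jobs to a fixed number of workers in a given start order.  The function name, the MAX_WORKERS limit, and the durations in the note below are all assumptions chosen for illustration.

```c
#include <stddef.h>

/* Upper bound on workers, chosen arbitrarily for this sketch. */
#define MAX_WORKERS 64

/* Greedy list scheduling: hand each job, in the order given, to whichever
 * worker becomes free first.  Returns the time at which the last worker
 * finishes, or -1.0 if the worker count is unsupported. */
double makespan(const double *durations, size_t n_jobs, size_t n_workers)
{
    double busy_until[MAX_WORKERS] = { 0 };
    double finish = 0.0;

    if (n_workers == 0 || n_workers > MAX_WORKERS)
        return -1.0;

    for (size_t j = 0; j < n_jobs; j++) {
        /* Find the worker that is free earliest. */
        size_t earliest = 0;
        for (size_t w = 1; w < n_workers; w++)
            if (busy_until[w] < busy_until[earliest])
                earliest = w;

        busy_until[earliest] += durations[j];
        if (busy_until[earliest] > finish)
            finish = busy_until[earliest];
    }
    return finish;
}
```

With two workers and hypothetical durations of 30, 5, 4, 3, 2 and 1 seconds, starting longest-first finishes at 30 seconds, while starting shortest-first finishes at 36, because the 30-second job begins only after a worker has already spent 6 seconds on short jobs.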

The worst case is if there is heavy computation at the beginning of the
test being run.  podcheck, for example, examines all the pods in the
directory structure to find which links to other pods do or do not have
corresponding anchors.  Output doesn't happen until the analysis is
complete.  On my system, this takes over 30 seconds, but prior to this
commit, what was noted was just the time required to do the output,
about 200 milliseconds.  The result was that podcheck was viewed as
being one of the shortest tests run, so was started late in the process,
and generally held up the completion of it.

This commit by itself doesn't improve the test completion very much,
because the tests are run a whole directory at a time, and the
directory podcheck is in, for example, is run last.  The next commit
addresses that.


  Commit: 9a3eb2657e35eff13204624da8751c993bcb4ede
      
https://github.com/Perl/perl5/commit/9a3eb2657e35eff13204624da8751c993bcb4ede
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M t/harness

  Log Message:
  -----------
  XXX env name: t/harness: Add option for faster test suite execution

This commit adds an environment variable, XXX, which if set to non-zero
increases the parallelism in the execution of the test suite, speeding
it up on systems with multiple cores.

Normally, there are two main test sections, one for core and the second
for non-core tests, and the testing of the non-core one doesn't begin
until the first is complete.  Within each section, there are a number of
test categories, like 're' for regular expressions, and 'JSON::PP' for
the pure perl implementation of JSON.

Within each category, there are various single .t test files.  Some
categories can have those be tested in parallel; some require them to be
done in a particular order, say because an earlier .t does setup for
subsequent ones.  We already have this capability.

Completion of all the tests in a category is not needed before those of
another category can be started.  This is how it already works.

However, the core section categories are ordered so that they begin in a
logical order for someone trying to get perl to work.  First to start
are the basic sanity tests, then by roughly decreasing order of
widespread use in perl programs in the wild, with the final two
categories, porting and perf, being mainly of use to perl5 porters.
These two categories aren't started until all the tests in the earlier
categories are started.  We have some long running tests in those two
categories, and generally they delay the start of the entire second section.

If those long running tests could be started sooner, shorter tests in
the first section could be run in parallel with them, increasing the
average CPU utilization, and the second section could begin (and hence
end) earlier, shortening the total elapsed execution time of the entire
suite.

The second section has some very long running tests.  JSON-PP is one of
them.  If it could run in parallel with tests from the first section,
that would also speed up the completion of the suite.

The environment variable added by this commit does both things.  The
basic sanity test categories in the first section continue to be started
before anything else.  But then all other tests are run in decreasing
order of elapsed time they take to run, removing the boundaries between
some categories, and between the two sections.
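
The resulting start order can be sketched in C as a sort: sanity tests first, everything else by decreasing recorded elapsed time.  This is only an illustration under assumed names (the struct, its fields, and both functions are hypothetical); the real logic lives in the Perl script t/harness and is not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical record for one test file. */
struct test_entry {
    const char *name;
    int is_sanity;          /* nonzero for the basic sanity categories */
    double last_elapsed;    /* seconds recorded by an earlier run */
};

/* qsort comparator: sanity tests sort first; the rest sort by
 * decreasing elapsed time. */
static int by_start_priority(const void *a, const void *b)
{
    const struct test_entry *x = a;
    const struct test_entry *y = b;
    int xs = !!x->is_sanity;
    int ys = !!y->is_sanity;

    if (xs != ys)
        return ys - xs;     /* the sanity test goes first */

    /* Negative when x took longer, so longer tests come earlier. */
    return (x->last_elapsed < y->last_elapsed)
         - (x->last_elapsed > y->last_elapsed);
}

void order_tests(struct test_entry *tests, size_t n)
{
    qsort(tests, n, sizeof *tests, by_start_priority);
}
```

Note that qsort() is not guaranteed to be stable, so tests with identical recorded times may come out in either order.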

The gain from this increases as the number of jobs run in parallel does;
slower high core platforms have the highest increase.  On the old
dromedary with 24 cores, the gain is 20%, almost 2 minutes.  On my more
modern box with 12 cores, it is 8%.


  Commit: f7a5dcfa401b56f9fc28fa8446b3200f70331218
      
https://github.com/Perl/perl5/commit/f7a5dcfa401b56f9fc28fa8446b3200f70331218
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexp.h

  Log Message:
  -----------
  regexp.h: White-space only

Indent preprocessor lines for clarity of program structure


  Commit: 32d6fac25622f4ed54e16b70e613204307b1d16c
      
https://github.com/Perl/perl5/commit/32d6fac25622f4ed54e16b70e613204307b1d16c
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regen/unicode_constants.pl
    M unicode_constants.h

  Log Message:
  -----------
  regen/unicode_constants.pl: Add a couple constants

which will be needed in a future commit


  Commit: d7fb133f87023fb4eaee9ae0b72407428277dd48
      
https://github.com/Perl/perl5/commit/d7fb133f87023fb4eaee9ae0b72407428277dd48
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Clarify comment


  Commit: 620d4b2023f9bf90fef91f7a71956fae823b1933
      
https://github.com/Perl/perl5/commit/620d4b2023f9bf90fef91f7a71956fae823b1933
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M pod/perldebguts.pod
    M regcomp.sym
    M regnodes.h

  Log Message:
  -----------
  regcomp.sym: Update node comments


  Commit: bac230f5b9a9debb882822e6af75f5612a3ccee4
      
https://github.com/Perl/perl5/commit/bac230f5b9a9debb882822e6af75f5612a3ccee4
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M pod/perldebguts.pod
    M regcomp.sym
    M regnodes.h

  Log Message:
  -----------
  regcomp.sym: Make adjacent opcodes for 2 similar regnodes

These are often tested together.  By making them adjacent we can use
inRANGE.
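
As a rough illustration of why adjacency helps, the C sketch below compares two equality tests against a single range test.  The opcode names and values are hypothetical (the real ones are generated into regnodes.h), and IN_RANGE is a simplified stand-in written for this example, not Perl's actual inRANGE definition.

```c
/* Hypothetical opcode numbering; OP_FOO and OP_BAR are adjacent. */
enum demo_op { OP_OTHER = 0, OP_FOO = 1, OP_BAR = 2, OP_BAZ = 3 };

/* Simplified range test, assuming lo <= hi.  The unsigned subtraction
 * wraps around for values below lo, so one comparison covers both ends. */
#define IN_RANGE(c, lo, hi) \
    ((unsigned)(c) - (unsigned)(lo) <= (unsigned)(hi) - (unsigned)(lo))

/* Without adjacency: one comparison per opcode. */
int is_foo_or_bar_two_tests(enum demo_op op)
{
    return op == OP_FOO || op == OP_BAR;
}

/* With adjacency: a single range comparison. */
int is_foo_or_bar_range(enum demo_op op)
{
    return IN_RANGE(op, OP_FOO, OP_BAR);
}
```

Both functions return the same answer for every opcode in the enum; the second relies on OP_FOO and OP_BAR staying adjacent.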


  Commit: 9101236bad2fb028bc7c4ea6a88f589a82540718
      
https://github.com/Perl/perl5/commit/9101236bad2fb028bc7c4ea6a88f589a82540718
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Simplify

The previous commit made the opcodes for two regops adjacent, so that we
can refer to them by a single range.  This commit takes advantage of
that change.


  Commit: de1af24502acd3a3745f617691ecae58b4ea57f2
      
https://github.com/Perl/perl5/commit/de1af24502acd3a3745f617691ecae58b4ea57f2
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M globvar.sym
    M pod/perldebguts.pod
    M regcomp.sym
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regnodes.h: Add two convenience bit masks

These categorize the many types of EXACT nodes, so that code can refer
to a particular subset of such nodes without having to list all of them
out.  This simplifies some 'if' statements, and makes updating things
easier.
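
The general technique can be sketched in C as follows.  The opcode values, the DEMO_ names, and the choice of a "folding" subset are assumptions made for illustration; the actual masks and their names are generated into regnodes.h and are not shown here.

```c
#include <stdint.h>

/* Hypothetical numbering for a few EXACT-like nodes. */
enum demo_exact {
    DEMO_EXACT = 0,
    DEMO_EXACTL,
    DEMO_EXACTF,
    DEMO_EXACTFL,
    DEMO_EXACTFU,
    DEMO_NOT_EXACT
};

#define DEMO_BIT(op) (UINT32_C(1) << (op))

/* One mask per subset; adding a node type means editing only the mask. */
#define DEMO_FOLDING_MASK \
    (DEMO_BIT(DEMO_EXACTF) | DEMO_BIT(DEMO_EXACTFL) | DEMO_BIT(DEMO_EXACTFU))

/* Replaces: op == DEMO_EXACTF || op == DEMO_EXACTFL || op == DEMO_EXACTFU */
int demo_is_folding(unsigned op)
{
    /* Guard the shift: shifting a 32-bit value by 32 or more is undefined. */
    return op < 32 && (DEMO_BIT(op) & DEMO_FOLDING_MASK) != 0;
}
```

The 'if' statement that used to list every member of the subset becomes a single call such as demo_is_folding(op).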


  Commit: cd13dde67b36e66479bfb49edc80d8a6230c7745
      
https://github.com/Perl/perl5/commit/cd13dde67b36e66479bfb49edc80d8a6230c7745
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c
    M regexec.c

  Log Message:
  -----------
  regcomp.c,regexec.c: Simplify

This commit uses the new macros from the previous commit to simplify some
code.


  Commit: ffcee89f2d31e05e76006937a0af324474a6f455
      
https://github.com/Perl/perl5/commit/ffcee89f2d31e05e76006937a0af324474a6f455
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Simplify

This was a case statement of every type of EXACTish node.  Instead,
there is a simple way to see if something is EXACTish.


  Commit: 4cb782acc36464022756dd79966b152527cb629d
      
https://github.com/Perl/perl5/commit/4cb782acc36464022756dd79966b152527cb629d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Change member to method

This will allow more flexibility in future commits: instead of using a
static format, the code can use one based on the input value.

The only non-white space change from this commit is the reordering of a
couple tests; I'm not sure why that happened.


  Commit: be056bf42f74ce7dcfd902f0ef0d968a15d38770
      
https://github.com/Perl/perl5/commit/be056bf42f74ce7dcfd902f0ef0d968a15d38770
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Move parameter to caller

This commit changes a sub in this file to be passed a new parameter.
This is in preparation for the value to be used in the caller.  No need
to derive it twice.


  Commit: ba162872a9013d5671fc12091275daf4f1d39b58
      
https://github.com/Perl/perl5/commit/ba162872a9013d5671fc12091275daf4f1d39b58
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Use char instead of hex

This changes the generated macros to use a printable character or
mnemonic instead of a hex value.  This makes the macros easier to read.


  Commit: 99797d920b81670d4566bde637e990bd0de756aa
      
https://github.com/Perl/perl5/commit/99797d920b81670d4566bde637e990bd0de756aa
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regcomp.c
    M regen/regcharclass_multi_char_folds.pl

  Log Message:
  -----------
  regcharclass.h: multi-folds: Add some unfoldeds

Prior to this commit, the generated macros for dealing with multi-char
folds in UTF-8 strings only recognized completely folded strings.  This
commit changes that to add the uppercase for characters in the Latin1
range.  Hopefully an example will clarify.

The fold for U+0130: LATIN CAPITAL LETTER I WITH DOT ABOVE is 'i'
followed by U+0307: COMBINING DOT ABOVE.  But since we are doing /i
matching, an 'I' followed by U+307 should also match.  This commit
changes the macros to know this.  Before this, if the fold were entirely
ASCII, the macros would know all the possible combinations.  This commit
extends that to all code points < 256.  (Since there are no folds to the
upper Latin1 range, that really means all code points below 128.  But
making it general means it wouldn't have to be revised if a fold were
ever added to the upper half range.)
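
For the U+0130 example, the effect on a matcher can be sketched in C like this.  It is a hand-written illustration, not the generated macro: the function name is hypothetical, and it assumes an ASCII platform where U+0307 is encoded in UTF-8 as the bytes 0xCC 0x87.

```c
#include <stddef.h>

/* Returns the number of bytes matched (3) if the text at s is 'i' or 'I'
 * followed by U+0307 COMBINING DOT ABOVE, otherwise 0.  Assumes s <= end
 * and never reads at or beyond end. */
size_t match_i_dot_above(const unsigned char *s, const unsigned char *end)
{
    if (end - s < 3)
        return 0;

    /* Accepting only 'i' here would correspond to the old behavior;
     * also accepting 'I' is the unfolded form this commit adds. */
    if ((s[0] == 'i' || s[0] == 'I') && s[1] == 0xCC && s[2] == 0x87)
        return 3;

    return 0;
}
```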

The reason to make this change is that it makes some future code less
complicated.  And it adds very little complexity to the generated
macros; less than the code it will save.  I originally thought it would
be more complex than it now turns out to be.  Much of that is because
the infrastructure has advanced since that decision.

I couldn't find any current places that this change will allow to be
simplified.  There could be if the macros were extended to do this on
all code points, not just the low ones.  I tried that, but the generated
macros were at least 400 lines longer than before.  That does add
significant complexity, so I backed that out.


  Commit: 524ab4373f08411ffaa5cfc30184e2b3fda05344
      
https://github.com/Perl/perl5/commit/524ab4373f08411ffaa5cfc30184e2b3fda05344
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass_multi_char_folds.pl

  Log Message:
  -----------
  regen/regcharclass_multi_char_folds.pl: White space, comment only

Outdent and remove lines from changes in the previous commit.


  Commit: 116b6a448b8ef39d38b7f12d977578f0edcd40af
      
https://github.com/Perl/perl5/commit/116b6a448b8ef39d38b7f12d977578f0edcd40af
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass_multi_char_folds.pl

  Log Message:
  -----------
  regen/regcharclass_multi_char_folds.pl: Use case fold

Prior to this commit, only the upper case of Latin1 characters was dealt
with.  But we really want case folding, and there are a few other
characters that fold to Latin1.  This commit acknowledges them.


  Commit: 3750a4d09efbc0025ff925a5a93fc45494825f43
      
https://github.com/Perl/perl5/commit/3750a4d09efbc0025ff925a5a93fc45494825f43
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: Rmv unused macro


  Commit: eccf49a45f3ed41a272d48da3c11a2c0ed54afdb
      
https://github.com/Perl/perl5/commit/eccf49a45f3ed41a272d48da3c11a2c0ed54afdb
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcharclass.h
    M regen/regcharclass.pl

  Log Message:
  -----------
  regen/regcharclass.pl: White space only

This does some line wrapping, etc


  Commit: f3c196e32d4bdcbe02b14089b9eace4e241447dd
      
https://github.com/Perl/perl5/commit/f3c196e32d4bdcbe02b14089b9eace4e241447dd
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  charclass_invlists.h: Add some inverse folds.

The MICRO SIGN folds to above the Latin1 range, the only character that
does so in Unicode (or is ever likely to).  This requires special handling.
This commit reduces some of the need for that handling by creating the
inversion map for it, which can be used in certain instances in pattern
matching, without having to have a special case.  The actual use of this
will come in a future commit.


  Commit: e26f33fe67b1a550702845eeb50507faae1ee1d5
      
https://github.com/Perl/perl5/commit/e26f33fe67b1a550702845eeb50507faae1ee1d5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Rename local variable; change type

I found myself getting confused, as this most likely was named before
UTF-8 came along.  It actually is just a byte, plus an out-of-bounds
value.

While I'm at it, I'm also changing the type from I32, to the perl
equivalent of the C99 'int_fast16_t', as it doesn't need to be 32 bits,
and we should let the compiler choose what size is the most efficient
that still meets our needs.
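
A small C sketch of the idea, using the C99 type directly: a variable that holds either one byte (0..255) or an out-of-bounds marker needs only 16 bits, and int_fast16_t lets the compiler pick the fastest type at least that wide.  The names NO_BYTE and first_byte_or_none are hypothetical and do not appear in regexec.c.

```c
#include <stdint.h>

/* Hypothetical out-of-bounds marker: outside the 0..255 byte range. */
#define NO_BYTE (-1)

/* Returns the first byte of [s, end) as a value in 0..255, or NO_BYTE
 * when the range is empty. */
int_fast16_t first_byte_or_none(const unsigned char *s,
                                const unsigned char *end)
{
    return (s < end) ? (int_fast16_t)*s : NO_BYTE;
}
```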


  Commit: 6cc0a560e4c002e1492901e360d92132e6bfb802
      
https://github.com/Perl/perl5/commit/6cc0a560e4c002e1492901e360d92132e6bfb802
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Change variable name in a function

This makes it like a corresponding variable.


  Commit: 22678b3e7ab696a79143b7099bf54ef7887e03ab
      
https://github.com/Perl/perl5/commit/22678b3e7ab696a79143b7099bf54ef7887e03ab
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c:  Store expression in a variable

This makes the text look cleaner, and prepares for a future commit,
where we will want to change the variable (which can't be done with the
expression).


  Commit: 5c2d63db95126d94e2032bbe18087a6885b88d73
      
https://github.com/Perl/perl5/commit/5c2d63db95126d94e2032bbe18087a6885b88d73
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c: Do some extra folding

Generally we have to wait until runtime to do folding for regnodes that
are locale dependent, because we don't know what the locale at runtime
will be, and hence what the folds will be.

But UTF-8 locales all have the same folding behavior, no matter what the
locale is, with the exception of two fold pairs in Turkish.  (Lithuanian
too, but Perl doesn't support that language's special folding rules.)
UTF-8 is the only locale type that Perl supports that can represent code
points above 255.  Therefore we do know at compile time what the
above-255 folds are (again excepting the two in Turkish), and so we can
do the folding then.  But only if both the components are above 255.
There are a few folds that cross the 255/256 boundary, and they must be
deferred.

However, there are two instances where there are three characters that
fold together in which two of them are above 255, and the third isn't.
That the two high ones are equivalent under /i is known at compile time,
and so that equivalence can be stated then.
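
The rule described above can be sketched in C as a simple predicate.  This is an illustration only: the function name is hypothetical, it ignores the Turkish exception, and it is not the code in regcomp.c.

```c
#include <stdint.h>

/* Under a locale-dependent /i match, a fold between two code points can
 * be resolved at compile time only if both are above 255; a fold that
 * crosses the 255/256 boundary must wait until the runtime locale is
 * known. */
int can_fold_at_compile_time(uint32_t cp, uint32_t fold)
{
    return cp > 255 && fold > 255;
}
```

For example, going by Unicode's case folding data, U+039C GREEK CAPITAL LETTER MU and U+03BC GREEK SMALL LETTER MU are both above 255, so that pair qualifies, while U+00B5 MICRO SIGN, which also folds to U+03BC, crosses the boundary and does not.  This looks like one of the three-character cases the message refers to, though the commit message itself does not name them.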


  Commit: 6971bbdab53ad1403c9c84522e1f2c5c09ee8426
      
https://github.com/Perl/perl5/commit/6971bbdab53ad1403c9c84522e1f2c5c09ee8426
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-15 (Thu, 15 Oct 2020)

  Changed paths:
    M cv.h
    M ext/B/Makefile.PL
    M op.h

  Log Message:
  -----------
  XXX Fix B broken C parser

Should these defines not be EXT?


Compare: https://github.com/Perl/perl5/compare/3e2dd33664dd...6971bbdab53a
