Branch: refs/heads/smoke-me/khw-pr_case
  Home:   https://github.com/Perl/perl5
  Commit: c6068af01ae4d99d2b4aae8de7ef1d97c750bb25
      
https://github.com/Perl/perl5/commit/c6068af01ae4d99d2b4aae8de7ef1d97c750bb25
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-02 (Fri, 02 Oct 2020)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl: Generate #defines for UTF8ness

This causes #defines to be generated for regexec.c to use in switch
statements, so that each opcode that is a case: there actually gets 4
cases, one for each combination of the target being UTF-8 or not and
the pattern being UTF-8 or not.

This will be used in future commits to simplify things.
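
As a rough illustration of the idea (not the generated contents of
regnodes.h), the sketch below folds two UTF8ness bits into the low bits
of an opcode so that each opcode yields four distinct case labels.  The
opcode numbers, the CASE_KEY macro, and the _tb/_t8/_pb/_p8 suffixes are
all assumptions made up for this example.

```c
/* Hypothetical opcode numbers; the real ones are generated into
 * regnodes.h and are not reproduced here. */
enum { OP_EXACT = 5, OP_ANYOF = 6 };

/* Assumed encoding: opcode in the high bits, then one bit for the
 * target's UTF8ness and one bit for the pattern's. */
#define CASE_KEY(op, t_utf8, p_utf8) \
    (((op) << 2) | ((t_utf8) ? 2 : 0) | ((p_utf8) ? 1 : 0))

/* The kind of constants a generator could emit, four per opcode.
 * "tb"/"t8" = target is bytes/UTF-8, "pb"/"p8" = pattern likewise. */
#define OP_EXACT_tb_pb CASE_KEY(OP_EXACT, 0, 0)
#define OP_EXACT_tb_p8 CASE_KEY(OP_EXACT, 0, 1)
#define OP_EXACT_t8_pb CASE_KEY(OP_EXACT, 1, 0)
#define OP_EXACT_t8_p8 CASE_KEY(OP_EXACT, 1, 1)

/* Each constant is an integer constant expression, so it is usable
 * as a case label.  Returns 0..3 for the four EXACT variants, or -1
 * for any other key. */
int exact_variant(int key)
{
    switch (key) {
    case OP_EXACT_tb_pb: return 0;
    case OP_EXACT_tb_p8: return 1;
    case OP_EXACT_t8_pb: return 2;
    case OP_EXACT_t8_p8: return 3;
    default:             return -1;
    }
}
```

With this encoding, a caller computes the key once from the opcode and
the two flags, and the switch needs no further UTF8ness tests.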


  Commit: 4c094b098b411e2bbe6715f29fef19e3d60766dd
      
https://github.com/Perl/perl5/commit/4c094b098b411e2bbe6715f29fef19e3d60766dd
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-02 (Fri, 02 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: S_find_byclass(): utf8ness in switch()

This uses the #defines created in the previous commit to make the switch
statement in this function incorporate the UTF8ness of both the pattern
and the target string.

The reason for this is that the first statement in nearly every case of
the switch is to test if the target string being matched is UTF-8 or
not.  By putting that information into the case number, those
conditionals can be eliminated, leading to cleaner, more modular code.
I had hoped that this would also improve performance since there are
fewer conditionals, but Sergey Aleynikov did performance testing of this
change for me, and found no noticeable gain or loss.

Further, the cases involving matching EXACTish nodes have to also test
if the pattern is UTF-8 or not before doing anything else.  I added that
information as well to the case number, so that those conditionals can
be eliminated.  For the non-EXACTish nodes, it simply means that
two case statements execute the same code.

This is an intermediate commit, which only does the expansion of the
current cases into four for each.  The refactoring that takes advantage
of this is in the following commit.
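
A minimal sketch of what such an expansion could look like, using
made-up names rather than the actual code in S_find_byclass():  an
EXACT-like op gets four separate cases, while a class-like op, where the
pattern's UTF8ness does not matter, has pairs of case labels that share
the same code.  KEY, the OP_* values, and the PATH_* results are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical opcodes and key encoding, for illustration only. */
enum { OP_EXACT = 1, OP_CLASS = 2 };
#define KEY(op, t_utf8, p_utf8) \
    (((op) << 2) | ((t_utf8) ? 2 : 0) | ((p_utf8) ? 1 : 0))

/* Which code path the dispatcher selected. */
enum path {
    PATH_CLASS_BYTES, PATH_CLASS_UTF8,
    PATH_EXACT_TB_PB, PATH_EXACT_TB_P8,
    PATH_EXACT_T8_PB, PATH_EXACT_T8_P8,
    PATH_UNKNOWN
};

enum path choose_path(int op, bool t_utf8, bool p_utf8)
{
    switch (KEY(op, t_utf8, p_utf8)) {

    /* Class-like op: only the target's UTF8ness matters, so the two
     * pattern variants are simply two labels on the same code. */
    case KEY(OP_CLASS, 0, 0):
    case KEY(OP_CLASS, 0, 1):
        return PATH_CLASS_BYTES;
    case KEY(OP_CLASS, 1, 0):
    case KEY(OP_CLASS, 1, 1):
        return PATH_CLASS_UTF8;

    /* EXACT-like op: all four combinations are distinct. */
    case KEY(OP_EXACT, 0, 0): return PATH_EXACT_TB_PB;
    case KEY(OP_EXACT, 0, 1): return PATH_EXACT_TB_P8;
    case KEY(OP_EXACT, 1, 0): return PATH_EXACT_T8_PB;
    case KEY(OP_EXACT, 1, 1): return PATH_EXACT_T8_P8;

    default:
        return PATH_UNKNOWN;
    }
}
```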


  Commit: 3a5975da09ade4744dbb27d8b7d9123bb6345d11
      
https://github.com/Perl/perl5/commit/3a5975da09ade4744dbb27d8b7d9123bb6345d11
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-02 (Fri, 02 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: find_byclass(): Restructure

This is a follow-on to the previous commit.  The case number of the main
switch statement now includes three things: the regnode op, the UTF8ness
of the target, and the UTF8ness of the pattern.

This allows the conditionals within the previous cases (which only
encoded the op) to be removed, and things to be moved around so that
there are more fall-throughs and fewer gotos.  The macros that are
called no longer have to test for UTF8ness, so I teased the UTF8 ones
apart from the non-UTF8 ones.
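
To make the shape of that restructuring concrete, here is a toy
before/after pair.  It is not taken from regexec.c:  the opcode, the
"needs conversion" rule, and every name in it are assumptions chosen
only to show a conditional inside a case being replaced by extra case
labels and a fall-through.

```c
#include <stdbool.h>

/* Hypothetical opcode and key encoding, for illustration only. */
enum { OP_EXACT = 1 };
#define KEY(op, t_utf8, p_utf8) \
    (((op) << 2) | ((t_utf8) ? 2 : 0) | ((p_utf8) ? 1 : 0))

enum { SCAN_NONE = 0, SCAN_COMPARE = 1 };

/* Before: the switch encodes only the op, so the case body must test
 * the UTF8ness flags itself. */
int scan_before(int op, bool t_utf8, bool p_utf8, bool *converted)
{
    *converted = false;
    switch (op) {
    case OP_EXACT:
        if (t_utf8 != p_utf8)   /* toy rule: mismatch needs conversion */
            *converted = true;
        return SCAN_COMPARE;
    default:
        return SCAN_NONE;
    }
}

/* After: the flags are part of the case number.  The mismatched
 * variants do their extra step and then fall through into the code
 * shared with the matched variants; no conditional is left. */
int scan_after(int op, bool t_utf8, bool p_utf8, bool *converted)
{
    *converted = false;
    switch (KEY(op, t_utf8, p_utf8)) {
    case KEY(OP_EXACT, 0, 1):
    case KEY(OP_EXACT, 1, 0):
        *converted = true;
        /* FALLTHROUGH */
    case KEY(OP_EXACT, 0, 0):
    case KEY(OP_EXACT, 1, 1):
        return SCAN_COMPARE;
    default:
        return SCAN_NONE;
    }
}
```

Both functions are meant to give the same result for every combination
of flags; the second simply moves the decision into the dispatch.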


Compare: https://github.com/Perl/perl5/compare/c6068af01ae4%5E...3a5975da09ad
