  Branch: refs/heads/smoke-me/khw-pr_case
  Home:   https://github.com/Perl/perl5

  Commit: c6068af01ae4d99d2b4aae8de7ef1d97c750bb25
      https://github.com/Perl/perl5/commit/c6068af01ae4d99d2b4aae8de7ef1d97c750bb25
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-02 (Fri, 02 Oct 2020)
  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl: Generate #defines for UTF8ness

This causes #defines to be generated for regexec.c to use in switch
statements, so that each opcode that is a case: there actually becomes
4 cases: the target being UTF-8 or not, combined with the pattern being
UTF-8 or not. This will be used in future commits to simplify things.


  Commit: 4c094b098b411e2bbe6715f29fef19e3d60766dd
      https://github.com/Perl/perl5/commit/4c094b098b411e2bbe6715f29fef19e3d60766dd
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-02 (Fri, 02 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: S_find_byclass(): utf8ness in switch()

This uses the #defines created in the previous commit to make the switch
statement in this function incorporate the UTF8ness of both the pattern
and the target string.

The reason for this is that the first statement in nearly every case of
the switch is to test if the target string being matched is UTF-8 or
not. By putting that information into the case number, those
conditionals can be eliminated, leading to cleaner, more modular code.

I had hoped that this would also improve performance since there are
fewer conditionals, but Sergey Aleynikov did performance testing of this
change for me, and found no noticeable gain or loss.

Further, the cases involving matching EXACTish nodes also have to test
if the pattern is UTF-8 or not before doing anything else. I added that
information to the case number as well, so that those conditionals can
be eliminated too. For the non-EXACTish nodes, it simply means that two
case statements execute the same code.

This is an intermediate commit, which only expands each of the current
cases into four. The refactoring that takes advantage of this is in the
following commit.
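
As a rough illustration of the idea described above, here is a minimal
sketch of packing an opcode and the two UTF8ness flags into a single
case number. All names in it (CASE_ID, OP_EXACT, OP_ANYOF, the _tb/_t8
and _pb/_p8 suffixes, the handler enum) are hypothetical and chosen for
this example; they are not taken from regnodes.h or regexec.c, and the
bit layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes; the real regnode ops are generated elsewhere. */
enum { OP_EXACT = 0, OP_ANYOF = 1 };

/* Assumed layout: op in the high bits, bit 1 = target is UTF-8,
 * bit 0 = pattern is UTF-8. */
#define CASE_ID(op, t_utf8, p_utf8) \
    (((op) << 2) | (!!(t_utf8) << 1) | !!(p_utf8))

/* Four case numbers per opcode: t = target, p = pattern,
 * b = bytes (not UTF-8), 8 = UTF-8. */
#define OP_EXACT_tb_pb CASE_ID(OP_EXACT, 0, 0)
#define OP_EXACT_tb_p8 CASE_ID(OP_EXACT, 0, 1)
#define OP_EXACT_t8_pb CASE_ID(OP_EXACT, 1, 0)
#define OP_EXACT_t8_p8 CASE_ID(OP_EXACT, 1, 1)
#define OP_ANYOF_tb_pb CASE_ID(OP_ANYOF, 0, 0)
#define OP_ANYOF_tb_p8 CASE_ID(OP_ANYOF, 0, 1)
#define OP_ANYOF_t8_pb CASE_ID(OP_ANYOF, 1, 0)
#define OP_ANYOF_t8_p8 CASE_ID(OP_ANYOF, 1, 1)

static_assert(OP_ANYOF_t8_p8 == 7, "op 1 with both flags set packs to 7");

/* Which code path a case selects (stand-ins for real matching code). */
enum handler {
    H_EXACT_BYTES_BYTES, H_EXACT_BYTES_UTF8,
    H_EXACT_UTF8_BYTES,  H_EXACT_UTF8_UTF8,
    H_ANYOF_BYTES,       H_ANYOF_UTF8,
    H_UNKNOWN
};

static enum handler dispatch(int op, bool target_utf8, bool pattern_utf8)
{
    /* One switch replaces the per-case "is it UTF-8?" conditionals. */
    switch (CASE_ID(op, target_utf8, pattern_utf8)) {
    /* EXACT-like op: all four combinations need distinct handling. */
    case OP_EXACT_tb_pb: return H_EXACT_BYTES_BYTES;
    case OP_EXACT_tb_p8: return H_EXACT_BYTES_UTF8;
    case OP_EXACT_t8_pb: return H_EXACT_UTF8_BYTES;
    case OP_EXACT_t8_p8: return H_EXACT_UTF8_UTF8;

    /* Other op: the pattern's UTF8ness is irrelevant, so two labels
     * share the same code. */
    case OP_ANYOF_tb_pb:
    case OP_ANYOF_tb_p8: return H_ANYOF_BYTES;
    case OP_ANYOF_t8_pb:
    case OP_ANYOF_t8_p8: return H_ANYOF_UTF8;

    default: return H_UNKNOWN;
    }
}
```

In this sketch, dispatch(OP_ANYOF, true, false) and
dispatch(OP_ANYOF, true, true) select the same handler, which mirrors
the remark that for non-EXACTish nodes two case statements execute the
same code.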


  Commit: 3a5975da09ade4744dbb27d8b7d9123bb6345d11
      https://github.com/Perl/perl5/commit/3a5975da09ade4744dbb27d8b7d9123bb6345d11
  Author: Karl Williamson <k...@cpan.org>
  Date:   2020-10-02 (Fri, 02 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: find_byclass(): Restructure

This is a follow-on to the previous commit. The case number of the main
switch statement now includes three things: the regnode op, the UTF8ness
of the target, and the UTF8ness of the pattern.

This allows the conditionals within the previous cases (which only
encoded the op) to be removed, and things to be moved around so that
there are more fall-throughs and fewer gotos. The macros that are called
no longer have to test for UTF8ness, so I teased the UTF8 ones apart
from the non_UTF8 ones.


Compare: https://github.com/Perl/perl5/compare/c6068af01ae4%5E...3a5975da09ad