In perl.git, the branch smoke-me/khw-masked has been created <https://perl5.git.perl.org/perl.git/commitdiff/b8d3d6be94068ca0adf73db3e909ab2d6c1ca694?hp=0000000000000000000000000000000000000000>
        at  b8d3d6be94068ca0adf73db3e909ab2d6c1ca694 (commit)

- Log -----------------------------------------------------------------
commit b8d3d6be94068ca0adf73db3e909ab2d6c1ca694
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Feb 4 19:43:00 2018 -0700

    regcomp.c: Under/i segregate folding vs non-folding characters

    For matching sequences of characters, the regex compiler generates
    various flavors of EXACT-type nodes.  The optimizer uses those nodes
    to look for sequences that must be in the matched string.  In this
    way, the pattern matching engine may be able to quickly rule out any
    possible match altogether, or to narrow down the places in the
    target string that might match.

    Under /i matching, this generally has not been possible, because
    there is no fixed string that the optimizer can grab onto, as
    something can match, say, either 'A' or 'a', etc.  However, in many
    patterns that contain text, there are characters that are fixed even
    under /i.  Things like tabs, space, and punctuation, for example.
    This commit segregates such folding vs non-folding characters into
    separate nodes.

    I proposed this 7 months ago:
    http://nntp.perl.org/group/perl.perl5.porters/245342
    and in talking with Yves recently, decided to go ahead with it.

    In the proposal of July, I suggested that a new node type be used to
    mark those nodes which are under /i but contain no characters that
    match other than themselves under /i.  These nodes could be joined
    with either a plain EXACT node, or an EXACTFish one to create longer
    nodes.  The reason for joining is that there is overhead in the
    engine whenever we switch to the next node.  But the reason for not
    doing it is that it is slower to match /i nodes than ones that a
    memEQ can be used on.  I suppose we could join short nodes, and
    leave longer ones separate, but that decision can be deferred based
    on real-world experience.

    This patch also consolidates into one place the handling of the
    Latin Sharp S, in order to avoid extra #ifdefs and to make the logic
    read linearly.

-----------------------------------------------------------------------

-- 
Perl5 Master Repository
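
Illustration (not part of the commit): the segregation described in the
log above can be sketched in a few lines of C. This is only a sketch
under stated assumptions, not the code in regcomp.c. It assumes
ASCII-only input, where the only bytes that match something other than
themselves under /i are the letters A-Z and a-z. The names lit_run,
byte_folds and split_literal are hypothetical and do not exist in the
Perl source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical run descriptor; NOT a structure from regcomp.c. */
typedef struct {
    size_t start;  /* byte offset of the run within the literal text */
    size_t len;    /* number of bytes in the run */
    int    folds;  /* 1: every byte has a case variant; 0: none does */
} lit_run;

/* Assumption: ASCII only, so a byte "folds" exactly when it is a letter.
 * Real Unicode folding (e.g. the Latin Sharp S) is far more involved. */
static int byte_folds(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Split the literal text s into maximal runs of folding and non-folding
 * bytes.  Writes at most max_runs entries to out and returns the number
 * of entries written. */
static size_t split_literal(const char *s, lit_run *out, size_t max_runs)
{
    size_t n = 0, i = 0, len;

    assert(s != NULL && out != NULL);
    len = strlen(s);

    while (i < len && n < max_runs) {
        int    folds = byte_folds((unsigned char)s[i]);
        size_t start = i;

        /* Extend the run while the folding property stays the same. */
        while (i < len && byte_folds((unsigned char)s[i]) == folds)
            i++;

        out[n].start = start;
        out[n].len   = i - start;
        out[n].folds = folds;
        n++;
    }
    return n;
}
```

For the literal text "foo: bar", this yields three runs: "foo"
(folding), ": " (non-folding) and "bar" (folding). Under /i, the middle
run is the kind of fixed substring that can be compared byte for byte,
which is what gives the optimizer something to search for.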