In perl.git, the branch smoke-me/khw-anyof has been created <https://perl5.git.perl.org/perl.git/commitdiff/abce1e48ab079baffb02ae6c44a73dd56ca00b7d?hp=0000000000000000000000000000000000000000>
at abce1e48ab079baffb02ae6c44a73dd56ca00b7d (commit)

- Log -----------------------------------------------------------------
commit abce1e48ab079baffb02ae6c44a73dd56ca00b7d
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 17:05:50 2018 -0700

    f

commit 0883c204eb7c5f979db4ef57ecc3502317d8bb29
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 17:05:20 2018 -0700

    f

commit 4d2b534b9bdf5352eb504a7301cc0a14fb0874b0
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 16:57:17 2018 -0700

    regen/mk_invlists.pl: Add new table

    This table contains all the code points that are in any
    multi-character fold (not the folded-from character, but what that
    character folds to). It will be used in a future commit.

commit f1b6826efbd73bffc5c521ceb595c1da4cbf13b2
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 16:53:23 2018 -0700

    regen/mk_invlists.pl: Rmv no longer used array

commit 2530835317281049140c0ee7cdcde6e3312117d5
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Nov 26 20:16:09 2018 -0700

    XXX need to do process; figure name Configure Fix alignment needed probe

commit b2e4f00ee2ccc0db0570a9d3b89c042be4bcb7ae
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 13:53:20 2018 -0700

    regcomp.c: Remove no longer used static function

commit deff9afede8e5d9f0cd20482df155856b5511224
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 13:25:13 2018 -0700

    f later

commit f3f885228835ec8cebfb781cdc0d62ff869a4f97
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Dec 6 09:28:42 2018 -0700

    change engine size

commit 760f2beeb0d46964d30041c62e5426d899612f55
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 28 08:50:06 2018 -0700

    XXX tests: Revamp compile optimizations of /[bar]/

commit d37cbc3506fc7249c9cdb92ed42d22c0008aa187
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 28 08:40:29 2018 -0700

    regcomp.c: White-space, comments only

commit 364ecd24b6149eb96ed12bd6efb5e1addb1bc55b
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 28 08:13:31 2018 -0700

    regcomp.c: Add variable that is an OR of several

    This makes the code easier to read, as it summarizes the purposes of
    the three

commit 3be5bee79248f9c0932eba14c472dbaf9f7e24a0
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 12:15:56 2018 -0700

    regcomp.c: White space only

    Indent after the previous commit created a new outer loop

commit 45e229cc58918a54fce7056693af52816b608b9a
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:59:03 2018 -0700

    regcomp.c: Refactor looking for POSIX optimizations

    Instead of repeating the code, slightly modified, this uses a loop.
    This is in preparation for a future commit where a third instance
    would have been required

commit 1ba2862a7eb424fe835e70d2da7dcb868dfaf012
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:20:56 2018 -0700

    regcomp.c: Rename a variable

    The new name more accurately expresses the usage, as what gets
    generated may not actually be an ANYOFD.

commit 48cf5a25d86bd5d146782c4e61519e8df1633d1c
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:12:15 2018 -0700

    regcomp.c: Consolidate common code

    These flags can be set in one place, rather than in multiple ones.

commit e2d735e014bfa07bf4852e3284f88400b644aaa6
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 11:05:34 2018 -0700

    regcomp.c: Simplify ANYOFM node generation

    This refactors the code somewhat. When we discover a deal-breaker
    code point we can just break out of the loop (using a goto) instead
    of setting a flag, continuing, and later testing it.

commit e16fed43284022b984fa0f1f33f848f210ddd9e8
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 27 10:51:46 2018 -0700

    regcomp.c: Don't zap larger scope variables

    It doesn't matter currently, but it's best to declare more limited
    scope variables for doing limited scope work, rather than using the
    more global variable, which someday might want to be used later,
    outside the block that zapped it, and would lead to a surprise.

commit 9316900986eda67f57f831e6bf72e5d88919044d
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Nov 17 12:45:24 2018 -0700

    Remove ASCII/NASCII regnodes

    The ANYOFM/NANYOFM regnodes are generalizations of these. They have
    more masks and shifts than the removed nodes, but not more branches,
    so are effectively the same speed. Remove the ASCII/NASCII nodes in
    favor of having less code to maintain.

commit 9132356d52e6368d0d3597429cb629edf0bb9813
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 20 22:22:56 2018 -0700

    regcomp.c: Prefer ANYOFM/NANYOFM regnodes

    These two regnodes are faster than regular /[[:posix:]]/ ones, and
    some of the latter are equivalent to some of the former. So try the
    faster optimizations first.

    This commit just swaps the two blocks of code, and outdents
    appropriately.

commit 09f7773dc48ca02dfac98669155f4eb36e6d8874
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Dec 4 09:58:13 2018 -0700

    regcomp.c: Allow more EXACTFish nodes to be trieable

    The previous two commits fixed bugs where it would be possible during
    optimization to join two EXACTFish nodes together, and the result
    would not work properly with LATIN SMALL LETTER SHARP S. But by doing
    so, the commits prevented all non-UTF-8 EXACTFU nodes that begin or
    end with [Ss] from being trieable. This commit changes things so that
    the only non-trieable ones are those that, when joined, have the
    sequence [Ss][Ss] in them.

    To do so, I created three new node types that indicate if the node
    begins with [Ss] or ends with them, or both. These preclude having to
    examine the node contents at joining to determine this. And since
    there are plenty of node types available, it seemed the best choice.
    But other options would be available should we run out of nodes.
    Examining the first and final characters of a node is not expensive,
    for example.

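The alternative mentioned at the end of the commit message above, examining the first and final characters of each node at join time, might look roughly like the following C sketch. It is only an illustration and not code from regcomp.c: the names `is_s` and `join_creates_ss` are hypothetical, and nodes are modeled as plain byte strings rather than real regnodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true for 's' or 'S'. */
static bool is_s(char c)
{
    return c == 's' || c == 'S';
}

/* Hypothetical check: would concatenating the two node strings place an
 * [Ss][Ss] pair across the join point?  Only the final byte of the first
 * string and the first byte of the second need to be examined, so the
 * cost does not depend on the node lengths. */
static bool join_creates_ss(const char *first, size_t first_len,
                            const char *second, size_t second_len)
{
    if (first_len == 0 || second_len == 0)
        return false;   /* nothing on one side of the join point */
    return is_s(first[first_len - 1]) && is_s(second[0]);
}
```

Under these assumptions, joining "mas" with "Sive" would be flagged because the boundary reads "sS", while joining "mas" with "ter" would not. The dedicated node types described above record the same facts up front, so no string inspection is needed when joining.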
commit a11df35ea2a3530611a9ddaa088ab26e56015716
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Nov 30 09:31:46 2018 -0700

    regcomp.c: Make sure /di nodes beginning in 's' are EXACTF

    This is defensive coding. The previous commit changed things so under
    /di a node ending in [Ss] doesn't get made an EXACTFU. This commit
    does the same for nodes that begin with [Ss]. This isn't actually
    necessary as one needs two EXACTFU nodes in a row for the problem to
    occur, and the previous commit appears to remove the possibility for
    the first node being an EXACTFU. But I'm leery of relying on this. So
    this commit makes sure that a node beginning with 'S' or 's' under
    /di remains EXACTF.

commit d55a68e43bd28f297c456c915a73d6cbc9caa28e
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Nov 29 20:43:32 2018 -0700

    regcomp.c: Make sure /di nodes ending in 's' are EXACTF

    Prior to this commit only nodes that filled fully were guaranteed not
    to be upgraded to EXACTFU.

    EXACTF nodes are used when /u rules aren't to be in effect unless the
    string being matched against is in UTF-8. EXACTFU nodes are used when
    the /u rules are to be used always.

    The regex compilation keeps track of what's in an EXACTFish node, and
    if possible uses EXACTFU even under /d. It does this because EXACTFU
    nodes are trieable, and are faster at runtime due to not having to
    check the UTF-8ness of the target string, and because the pattern is
    also folded at compile time, avoiding that step at runtime.

    If what a node matches is the precise same thing under /d and /u,
    whether the node is EXACTF or EXACTFU is irrelevant, so it is changed
    to the more desirable EXACTFU.

    The sequences 'ss', 'SS', 'Ss', 'sS' are very tricky, for several
    reasons. For this commit, the trickiness lies in that they are the
    only sequences that can match a target string differently under /ui
    rules than /di, with both not being encoded in UTF-8. In all other
    cases, one or the other must be UTF-8 for there to be a difference.

    The code has long taken special care for these sequences, but
    overlooked two fairly obscure cases where it matters. This commit
    addresses one of those cases; the next commit, the other.

    Because these are sequences, one of them might be split across two
    EXACTFish nodes, so that the first 's' or 'S' is the last thing in
    the first node, and the second 's' or 'S' is the first thing in the
    second. Should these nodes get joined during optimization, the
    sequence becomes obvious. The code has long recognized this
    possibility if the first node gets filled up, necessitating a split,
    and it doesn't make the first node EXACTFU if it ends in 's' or 'S'.
    But we don't fill those nodes completely, and optimization can join
    two together. (It would require some extra work to fully pack them,
    which is possible to do, but hasn't ever been done. They are more
    fully packed than they used to be.)

    But future commits in the pipeline will join nodes in more cases
    during optimization, and so, we need to not create an EXACTFU for
    trailing 's' or 'S' always, not just if the first node fills up.
    This commit moves the code that accomplishes this so it always gets
    executed at node end of /di nodes.

commit d408e6259242a82865faaa3dbde383183dda6be8
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 12:27:46 2018 -0700

    regcomp.c: Add assertion

commit 9be035a8b11b2250cb6604216761e74bf3b6221e
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 17:39:05 2018 -0700

    regcomp.c: Simplify a bit of code

    By using a macro with a slightly different API, we don't have to mess
    with the parse pointer.

commit 05a8b02e6855967844bcba239521c76df805b0f0
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 12:38:27 2018 -0700

    Remove one use of static function

    Previous commits in 5.29 have removed all but two calls to this
    function, and the remaining ones take radically different paths in
    it, with very little common code. It simplifies things if we expand
    each call to the code that gets evaluated.
    This commit does one; the next commit, the other. The need for an
    #ifdef is removed by adding a flag and setting it in an existing
    #ifdef.

commit 408ca8acc70549ddeeee3384f5eb5c5d02dc195d
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 13:05:47 2018 -0700

    regcomp.c: Use simpler variable name as long as possible

    This just extends the use of a variable name a little longer, as it's
    easier to read than the nested macro calls that eventually have to be
    used.

commit de083f8cc63f528e8fa39427b19949cacce9f479
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 12:22:39 2018 -0700

    regcomp.c: Prefer one of similarly named vars

    These two variables are similarly named, but have slightly different
    purposes. Comment out one of them, and convert to using the other
    consistently.

commit 7f0f8d404d73e61160a688348f5ddffff2e9705a
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 11:46:07 2018 -0700

    t/re/anyof.t: Remove duplicate test case

commit 99620b744afc9a04dd8efe25fc76f27db5b570f9
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 11:17:26 2018 -0700

    Use consistent spelling in qr// dumping

    Under -Dr (or use re 'Debug') the compiled regex engine program is
    displayed. I noticed that it used two different spellings for
    'infinity'. This commit changes that so only one is used, the one
    that has been in the field the longest.

commit 5ed704648680590f57b5e6278a2750427c901ce4
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Dec 3 11:33:49 2018 -0700

    regcomp.c: Can join certain EXACTish node types

    The optimization phase of regular expression pattern compilation
    looks for adjacent EXACTish nodes and joins them if they are the same
    flavor of EXACT. Commits a9f8c7ac75c364c3e05305718f38c5f8ccd935d8 and
    f6b4b99d2e584fbcd85eeed475eea10b87858e54 introduced two new nodes
    that are so close to existing flavors that they are joinable with
    their respective flavor. This commit does that.

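The joining rule described in the commit message just above can be pictured as mapping each node type to the flavor it behaves like, and treating two adjacent nodes as joinable when those flavors agree. The C sketch below is illustrative only: the enum values `NODE_EXACT_VARIANT` and `NODE_EXACTFU_VARIANT` are hypothetical stand-ins for the two new node types, whose real names are not given in this log, and `base_flavor` and `can_join` are made-up helper names, not functions in regcomp.c. The real optimizer applies further conditions that are not modeled here.

```c
#include <stdbool.h>

/* Simplified, hypothetical set of node types for illustration. */
typedef enum {
    NODE_EXACT,
    NODE_EXACT_VARIANT,     /* stand-in for a new node close to EXACT */
    NODE_EXACTFU,
    NODE_EXACTFU_VARIANT,   /* stand-in for a new node close to EXACTFU */
    NODE_EXACTF
} node_type;

/* Map a node type to the flavor it is treated as when joining. */
static node_type base_flavor(node_type t)
{
    switch (t) {
    case NODE_EXACT_VARIANT:   return NODE_EXACT;
    case NODE_EXACTFU_VARIANT: return NODE_EXACTFU;
    default:                   return t;
    }
}

/* Two adjacent nodes are joinable in this sketch when they share a
 * base flavor. */
static bool can_join(node_type a, node_type b)
{
    return base_flavor(a) == base_flavor(b);
}
```

With this mapping, a variant node joins with its respective flavor in either order, while nodes of different flavors (for example the EXACT and EXACTFU stand-ins) are still kept apart.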
commit 22bd677ff47c7a032784728e90dd41b283a22d9f
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Dec 2 17:45:06 2018 -0700

    regcomp.c: Move clause of while() conditional into loop

    This is in preparation for making the conditional more complicated
    than can be easily done in the condition.

commit 3932608dbb1b96ec4ded560508a23bb714e8101a
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Dec 4 18:22:05 2018 -0700

    regcomp.c: Clarify comment

commit 7b102b2b79dbdf17c0bc18e8206ed2b3514191a9
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Dec 26 18:09:08 2017 -0700

    XXX don't push, khw customization for bench.pl

-----------------------------------------------------------------------

--
Perl5 Master Repository