Branch: refs/heads/blead Home: https://github.com/Perl/perl5 Commit: 145531d596acd6392a32c8fbd47fba2b6356cd64 https://github.com/Perl/perl5/commit/145531d596acd6392a32c8fbd47fba2b6356cd64 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025)
Changed paths: M t/porting/regen.t Log Message: ----------- Temporarily skip regen porting test in this branch The digest numbers keep changing in this branch. Turn this test off until near its end. Commit: c67acbf1623afc3236358ebff71f0238caa1e9f0 https://github.com/Perl/perl5/commit/c67acbf1623afc3236358ebff71f0238caa1e9f0 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Temporarily don't generate some porting info This series of commits has dozens of commits that would otherwise require much more work to generate. This commit temporarily turns off generating EBCDIC tables, and the tables that only change when a new Unicode release happens. Bisecting on an ASCII machine is unaffected Commit: f56978c93aad2b855e07b020116228909d9f3300 https://github.com/Perl/perl5/commit/f56978c93aad2b855e07b020116228909d9f3300 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add stack trace facility This is useful in debugging Commit: 15ac5219107e36ab3e62baf529ec35c1e55bcf40 https://github.com/Perl/perl5/commit/15ac5219107e36ab3e62baf529ec35c1e55bcf40 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regexec.c Log Message: ----------- regexec.c: Rename a couple of variables The previous names erroneously implied these were associated with the parameters to these functions; instead rename to indicate they are associated with some local variables. 
Commit: bd887497638322ff6ff95aa438d6a4f70d96b212 https://github.com/Perl/perl5/commit/bd887497638322ff6ff95aa438d6a4f70d96b212 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Change doubled semicolon to single Commit: 3744b0fd0e82ae9935ce6152dbc96538a2d9d56d https://github.com/Perl/perl5/commit/3744b0fd0e82ae9935ce6152dbc96538a2d9d56d Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists.pl: Use feature signatures Commit: 6e7862d2c246b0fd3e9ad3954b7eff125a9732e5 https://github.com/Perl/perl5/commit/6e7862d2c246b0fd3e9ad3954b7eff125a9732e5 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: White-space comments This includes outdenting and indenting where future commits will add or remove blocks Commit: 7e5d1ecd5b730af1345c15bb133f29ce17146b58 https://github.com/Perl/perl5/commit/7e5d1ecd5b730af1345c15bb133f29ce17146b58 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Clarify output table headings Changes the wording for some table headings in the generated file to indicate where to find what the abbreviations mean Commit: 42f18f06977f9c443a4b3eeb449268ec08b194b5 https://github.com/Perl/perl5/commit/42f18f06977f9c443a4b3eeb449268ec08b194b5 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Sort some lists These lists are densely packed. 
It is easier to find something if they are sorted Commit: 118c65df1e30f702109f85c0f734e287151e327d https://github.com/Perl/perl5/commit/118c65df1e30f702109f85c0f734e287151e327d Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Fix rule LB11 This rule is not affected by spaces, yet the code was saying it should be. Commit: c9f1e896377ade7707595df081c280a3e22ea5eb https://github.com/Perl/perl5/commit/c9f1e896377ade7707595df081c280a3e22ea5eb Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Fix rule LB12 This rule is not affected by spaces, yet the code was saying it should be. Commit: ffa9a006c9ed7ceef2bcf2a4406c55bc576db843 https://github.com/Perl/perl5/commit/ffa9a006c9ed7ceef2bcf2a4406c55bc576db843 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Fix rule LB13 This rule was written here to not include the actions when the character before the candidate break position is a number. This is just plain wrong. The Unicode rules have never said this. 
Commit: 6c708475ecf243377447b652fa3daae1db9d7995 https://github.com/Perl/perl5/commit/6c708475ecf243377447b652fa3daae1db9d7995 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add extensive comments Commit: f805c77a1ad9c62dc5312847decfdeb92decb5de https://github.com/Perl/perl5/commit/f805c77a1ad9c62dc5312847decfdeb92decb5de Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Narrow some output tables Future Unicode releases will greatly explode the size of certain tables. Prior to this commit, the minimum column size was two, but some table columns fit in a single window column. This commit changes to use the minimum required. Commit: 64120a957ace112845c4ed8ada47722021806802 https://github.com/Perl/perl5/commit/64120a957ace112845c4ed8ada47722021806802 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Center row labels in output tables This improves readability Commit: 7cab98315092d6a93151014335ca0cd4dfdadbb8 https://github.com/Perl/perl5/commit/7cab98315092d6a93151014335ca0cd4dfdadbb8 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Move break tables edge cells These tables are placed in charclass_invlists.h. They have a row and column for what happens when the position being checked for is at the start or end of the text. This commit reorders the tables so that the edge row and column are, well, at the edges. And it relabels the labels to be '^' and '$' respectively. 
Commit: 0c7052a09144130f451b412f76204428f417a5ce https://github.com/Perl/perl5/commit/0c7052a09144130f451b412f76204428f417a5ce Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Improve output table column headings This uses a more complex algorithm to generate short labels to demarcate rows and columns in some output tables. This doesn't affect the current tables for Unicode 15.0, but will in future Unicode releases. Commit: 46742bc2050fff54f622f095b188bd82638cea55 https://github.com/Perl/perl5/commit/46742bc2050fff54f622f095b188bd82638cea55 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Change two formal parameter names Commit: 99431fcab99449e5e563d560464253b7ab9d81e8 https://github.com/Perl/perl5/commit/99431fcab99449e5e563d560464253b7ab9d81e8 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Move some lines earlier in their functions where the next commits will want them Commit: df8dddf47fef2b139711279dcbc3a7c9774d768f https://github.com/Perl/perl5/commit/df8dddf47fef2b139711279dcbc3a7c9774d768f Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Change a word to be more accurate Everything is an action. Some are accomplished via DFAs. This commit uses the latter word in places where it is a DFA. It actually uses this new term where it doesn't apply. Future commits will remove those inaccuracies. 
Commit: 43bb2c4938757f17896dd53128953f5225e0e86f https://github.com/Perl/perl5/commit/43bb2c4938757f17896dd53128953f5225e0e86f Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Set and get break table values with functions Previously, we would just set an individual element directly. This changes most of those to use function calls instead. This has two main benefits. The function can change what's being done without having to change many lines; and these sets had a lot of visual noise with sigils and hash references. The result is a lot easier to read. The next few commits will continue this process. Note that the generated tables are unchanged by this commit. It has no effect on runtime processing. That will be true of the next commits as well. It became obvious in doing this that the rule for Perl_Tailored_HSpace does not belong in the 3's, but comes immediately before that. Arbitrarily use '2z' Commit: 7c7314732c902c1957578c1f74e01a8521cfac31 https://github.com/Perl/perl5/commit/7c7314732c902c1957578c1f74e01a8521cfac31 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Hoist calculation to sub callers And pass the result to the subroutine. This is in preparation for this value to be needed in additional places. Commit: af259bb098268d29818d6aa028a9a82995883d4e https://github.com/Perl/perl5/commit/af259bb098268d29818d6aa028a9a82995883d4e Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Set values in unused table cells to 0 These cells exist so that code is less likely to need to be changed when a new Unicode release comes along. Currently it doesn't matter at all what is in those cells, because they are never read. 
But future commits will want to make sure they don't refer to DFAs that are obsolete, references to which could be undefined symbols that would abort the compilation. The choice of 0 or 1 to put in the cells was arbitrary; I know of no reason to prefer one or the other. Commit: 4828646c386be6a4712641c309369d9324539acb https://github.com/Perl/perl5/commit/4828646c386be6a4712641c309369d9324539acb Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Reorder two statements This now matches the order that Unicode gives, for easier checking that our code matches their demands. Commit: c9814be231b244576ef4d80fe4dfab1e66e8f08d https://github.com/Perl/perl5/commit/c9814be231b244576ef4d80fe4dfab1e66e8f08d Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add ability to specify an entire row simply Instead of having to loop through all the cells of a row or column, this commit uses '*' to represent the whole thing. This is more in keeping with the text of the Unicode rules, which just leave things blank if they mean everything. Commit: 394ab97c8e8bc58bd5f7f7fb70388d22e3596497 https://github.com/Perl/perl5/commit/394ab97c8e8bc58bd5f7f7fb70388d22e3596497 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Allow arbitrary list of cells This follows up on the previous commit which allowed simply specifying an entire row or column. This adds the ability to specify a list.
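The whole-row and list forms described in the two commits above can be sketched in C as follows. This is only an illustration of the idea, not the Perl code in regen/mk_invlists.pl: the names `set_cells`, `set_cells_list`, `ALL` and `N_CLASSES` are hypothetical, and `ALL` merely plays the role that '*' plays in the commit text.

```c
#include <stddef.h>

enum { N_CLASSES = 4 };   /* hypothetical: a tiny table for illustration */
enum { ALL = -1 };        /* stands in for the '*' wildcard */

/* Set table[r][c] = value for every row r matching `row` and every
 * column c matching `col`; ALL matches every index. */
static void set_cells(int table[N_CLASSES][N_CLASSES],
                      int row, int col, int value)
{
    for (int r = 0; r < N_CLASSES; r++) {
        if (row != ALL && row != r)
            continue;
        for (int c = 0; c < N_CLASSES; c++) {
            if (col != ALL && col != c)
                continue;
            table[r][c] = value;
        }
    }
}

/* List form: apply the same value to the cross product of the given
 * rows and columns. Each list element may itself be ALL. */
static void set_cells_list(int table[N_CLASSES][N_CLASSES],
                           const int *rows, size_t n_rows,
                           const int *cols, size_t n_cols, int value)
{
    for (size_t i = 0; i < n_rows; i++)
        for (size_t j = 0; j < n_cols; j++)
            set_cells(table, rows[i], cols[j], value);
}
```

With helpers of this shape, a rule that Unicode writes with one side left blank becomes a single call such as `set_cells(table, 1, ALL, value)` rather than an explicit loop at each call site.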
Commit: a3928940e2de937ee09225e0077b3371877531e8 https://github.com/Perl/perl5/commit/a3928940e2de937ee09225e0077b3371877531e8 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add no_nobreak_override() This new function allows removing loops from the main code Commit: b5b7ffab978dd7b5c864890ed22a1073c68adbbf https://github.com/Perl/perl5/commit/b5b7ffab978dd7b5c864890ed22a1073c68adbbf Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add ability to specify a complement of list And use it in one instance. Previous commits have added the ability to pass multiple items simply to the functions that work on rows and columns. This now gives the ability to complement the set of the multiple items passed. Commit: ef5c16bcbc4da046b75cfb7457f6b531de80d946 https://github.com/Perl/perl5/commit/ef5c16bcbc4da046b75cfb7457f6b531de80d946 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Handle Combining Mark: changes CMxZWJ This is separated out from the previous commit because it is tricky XXX Commit: ada974d9729d8611132325727d5474932879f570 https://github.com/Perl/perl5/commit/ada974d9729d8611132325727d5474932879f570 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: move decls comments around Commit: 7a110968b0078953aec740ce8787ece46631d261 https://github.com/Perl/perl5/commit/7a110968b0078953aec740ce8787ece46631d261 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Allow abbreviations for break classes And use them in a couple of places. 
This will allow the rules to more closely align with the Unicode text, which uses abbreviations just sometimes. Commit: ef591789205839779fa39477c88771f14f1b74fb https://github.com/Perl/perl5/commit/ef591789205839779fa39477c88771f14f1b74fb Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add effectively macro expansions Unicode's Word Break rules have shortcut names that really mean multiple ones. For example, AHLetter means either ALetter or Hebrew_Letter. This commit allows "macros" to be defined like this so that the statements in this file more closely resemble those of the Unicode text. More importantly, Unicode's rules in recent times need subdivided equivalence classes, such as Alphabetics that are also East Asian. What has been done so far is when that happened, extra rules were added that were all possible combinations of these subdivisions. It is easy to miss a combination; and it turns out there are bugs. This new capability allows us to say that an Alphabetic (ALetter) is a combination of plain ALetters plus East Asian letters, and the code generates all the combinations automatically. This makes the text cleaner and safer. Commit: fd333fa304a3c2aa616a768db0f4bf1cc67509c2 https://github.com/Perl/perl5/commit/fd333fa304a3c2aa616a768db0f4bf1cc67509c2 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Use new split capability with ALetter ALetter also contains the class ExtPict_LE. Prior to this commit, there had to be a rule for each ALetter doing the same thing with ExtPict_LE. But the new splits capability allows ALetter to expand automatically to both. This uncovers a bug. There should have been a rule WB5 ALetter x ExtPict_LE which was missing. 
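The "macro" expansion described above amounts to applying one rule to the cross product of the macro's members. A minimal C sketch follows, assuming a reduced set of Word Break classes; the type and function names are hypothetical, and the real implementation is Perl code in regen/mk_invlists.pl. Here AHLetter expands to ALetter, its split-off ExtPict_LE part, and Hebrew_Letter, so a rule like WB5 (AHLetter x AHLetter) fills every combination, including the ALetter x ExtPict_LE cell that was reported missing.

```c
#include <stddef.h>

/* Hypothetical, reduced set of Word Break classes. */
enum wb_class {
    WB_ALetter, WB_Hebrew_Letter, WB_ExtPict_LE, WB_Numeric, WB_COUNT
};

/* A "macro" is a named list of concrete classes. */
struct wb_macro {
    const enum wb_class *members;
    size_t count;
};

/* AHLetter: ALetter (including its ExtPict_LE split) or Hebrew_Letter. */
static const enum wb_class ahletter_members[] = {
    WB_ALetter, WB_ExtPict_LE, WB_Hebrew_Letter
};
static const struct wb_macro AHLetter = { ahletter_members, 3 };

/* Apply "before x after => value" to every combination of members, so
 * no combination can be forgotten by hand. */
static void set_rule(int table[WB_COUNT][WB_COUNT],
                     const struct wb_macro *before,
                     const struct wb_macro *after, int value)
{
    for (size_t i = 0; i < before->count; i++)
        for (size_t j = 0; j < after->count; j++)
            table[before->members[i]][after->members[j]] = value;
}
```

Calling `set_rule(table, &AHLetter, &AHLetter, 0)` then covers all nine cells at once, where 0 is an assumed encoding for "no break".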
Commit: 80864093f81e4a8261bc4ed5ede8a6d249c910cf https://github.com/Perl/perl5/commit/80864093f81e4a8261bc4ed5ede8a6d249c910cf Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Use new split capability with ExtPict \p{Extended_Pictographic} is not fully implemented yet because unlike other properties, it can match a string instead of a single character. And it is kind of a kludge here. I analyzed the 14.0 release, and the rules here were customized based on that analysis. For example, in the Line Break property, a clause was added by Unicode to Rule LB30b that required taking the intersection of this property and all the Unassigned code points. It turns out that everything in that intersection had the Line Break class of Ideographic, so I modified mktables to split the Ideographic class into two components: the elements of the intersection went into the long-named "Unassigned_Extended_Pictographic_Ideographic" and plain Ideographic was left with the remainder. To match all of Ideographic you have to specify both classes. By using the new split capability, this can be done effectively as a macro expansion, and the special cases can be removed from the code. This commit does this. Similarly, both the Word Break and Grapheme Cluster Break properties have somewhat different interactions with Extended_Pictographic that this commit smooths over. This situation is brittle. A new release of Unicode might change things so that Ideographic isn't the only LB class in the intersection mentioned above, so the customization has to be checked in every release. A few commits later in this branch, this will be automated, and no longer a concern.
Commit: 84fdf56f436f0826385b872273c858ced3765886 https://github.com/Perl/perl5/commit/84fdf56f436f0826385b872273c858ced3765886 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists: Use new split capability with AHLetter The description in UAX #29 of Unicode's Word Break property uses two convenience macros to simplify some of their rules. The split capability introduced several commits ago, allows this program to follow along, making the rules here more closely aligned to the text in UAX 29, hence simpler. This commit creates one macro, AHLetter; the next commit does the other macro. The name of the DFAs involving this name are changed to correspond. Commit: 1b7556f51ff53c10b271418e122b2e64b3400b35 https://github.com/Perl/perl5/commit/1b7556f51ff53c10b271418e122b2e64b3400b35 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists: Use new split capability with MidNumLetQ This follows on the previous commit, with the other Word Break property name that Unicode macroizes Commit: 5810cf7b466509ce3d592c73dc128a39d02798cb https://github.com/Perl/perl5/commit/5810cf7b466509ce3d592c73dc128a39d02798cb Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add ability to get set subtraction This capability will be used in future commits, so that the implementation can more closely follow Unicode's text Commit: 9a02f4d2eb2d2fcda3e9872e14de18b3d45530d5 https://github.com/Perl/perl5/commit/9a02f4d2eb2d2fcda3e9872e14de18b3d45530d5 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Use new set 
subtraction ability This allows the removal of some combinatorial complexity, thus showing a bug in which the combination of PO to EOP had not been added when it should have been. Currently, mktables splits the Line Break OP and CP classes into East Asian ones, and the remainders. The extra combinations occurred because the code here needed to take every existing OP and add an East_Asian (EA_OP) equivalent; same with CP. It's easy to miss one, and I did. This commit allows this split to be hidden from most places in mk_invlists. Commit: 394edbac7d9f3043ecb1b6f65308b540599ec6ae https://github.com/Perl/perl5/commit/394edbac7d9f3043ecb1b6f65308b540599ec6ae Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Use abbreviations for Line Break Unicode UAX #14 gives rules for the Line Break property using the short names for them. Prior to this commit, we mostly used the full names for the classes in this property. This commit changes to use the short names. This makes it easier to compare the code here with the UAX text. The abbreviations aren't always straightforward, so it was easy to go astray.
Commit: 4195bab6b4c07c0361b230785d6c9869917ff59e https://github.com/Perl/perl5/commit/4195bab6b4c07c0361b230785d6c9869917ff59e Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Use 'for' statement modifier This significantly cuts down on the verbiage, and makes the rules in this file more closely match the text from which they are derived in UAX 14 and UAX 29 Commit: a783a45b66258f465322f0f49e8a4106f5a8ce9d https://github.com/Perl/perl5/commit/a783a45b66258f465322f0f49e8a4106f5a8ce9d Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists: Improve DFA names This commit now imposes more structure on the names. The names are sort of pseudo code that lays out what the DFA is to do. The most significant change is to standardize what has been done in recent commits with newly added DFAs. That is to use the string '_v_' in the name, where the tip of the 'v' points to the position in the input string being processed at which this rule applies. Commit: 659c76f27c28c1fa769c4ef360ba01e03aa5e5e4 https://github.com/Perl/perl5/commit/659c76f27c28c1fa769c4ef360ba01e03aa5e5e4 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add rule numbers to break table output This is very helpful in debugging, and correlating the tables with the Unicode UAX documents from which they are derived.
Commit: 947d355cc2d2ca705e199c55c75d3c1c17e8deb5 https://github.com/Perl/perl5/commit/947d355cc2d2ca705e199c55c75d3c1c17e8deb5 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Reorder some hash entries The new order is based on the order of their respective rules; the next commit expands these, and it makes it easier for a human to look up. Commit: ffa0be843b515845148d554c191859468bccbfe1 https://github.com/Perl/perl5/commit/ffa0be843b515845148d554c191859468bccbfe1 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add fields to data structure This converts each DFA from just a number into a separate hash in a bigger hash with more information besides that number. This extra information will be needed in a future commit. Commit: e796006b4f9d9fc7ad1d2e43ee2f3b86306b0bb8 https://github.com/Perl/perl5/commit/e796006b4f9d9fc7ad1d2e43ee2f3b86306b0bb8 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists/regexec.c: Generate and use macros With this commit, mk_invlists.pl now generates #define macros isFOO that regexec.c now uses to determine if a character is in a particular line breaking class. Previously, x == foo was used. This change insulates the code from having to worry about when classes get changed to be combinations.
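A sketch of what an isFOO-style macro buys over a bare `x == foo` comparison follows. The macro and enum names are invented for illustration; the real macros are generated by mk_invlists.pl and may look quite different. The example uses the Ideographic split mentioned earlier in this series, where matching "all of Ideographic" means matching either of two classes.

```c
/* Hypothetical class values; the real ones are generated into
 * charclass_invlists.inc. */
enum lb_class {
    LB_Alphabetic,
    LB_Ideographic,
    LB_Unassigned_Extended_Pictographic_Ideographic,
    LB_Space
};

/* If a class is later split into several, only the macro changes; the
 * call sites in the matching code stay the same. Note that the
 * argument is evaluated twice, so it should have no side effects. */
#define isLB_Ideographic(x)                                         \
    (   (x) == LB_Ideographic                                       \
     || (x) == LB_Unassigned_Extended_Pictographic_Ideographic)

#define isLB_Space(x) ((x) == LB_Space)
```

A caller that previously wrote `before == LB_Ideographic` would silently stop matching the split-off class; writing `isLB_Ideographic(before)` keeps matching both.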
Commit: 7bda4036b3fe11b2dcb9d3cf2b1a870a511368c5 https://github.com/Perl/perl5/commit/7bda4036b3fe11b2dcb9d3cf2b1a870a511368c5 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Reverse order of break property rules Before this commit, the rules for populating the tables for break properties were laid out in reverse order, so that the lowest priority rule was executed first. It filled a cell, which then would be overwritten by any higher priority rule that applied to it. This reverse order made it harder to compare the rules with the text of the Unicode rules these are trying to implement. This commit changes things to have the rules in the same order as Unicode lists them. The previous scheme had certain advantages that this has to make up for by using temporary code to override what would otherwise have gone into the cells. This code will no longer be needed in a few commits when a general purpose stacking DFA scheme is implemented. Because of this temporary code, only two cells in one property change as a result of this complete reversal. They change to using a DFA which ends up returning the same results as the original unconditional value. Commit: 5beca8178fdce168b42655ce9c2d873273cfbeaa https://github.com/Perl/perl5/commit/5beca8178fdce168b42655ce9c2d873273cfbeaa Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove obsolete function This function was used when the previous scheme of applying the rules in reverse order needed to be overridden in a few cases by prohibiting changes to existing seemingly lower priority values. Now there's no lower priority value in the cell that we would need to preserve.
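The difference between the two rule orders can be illustrated with a small C sketch. It rests on an assumption not spelled out in the commit messages: that with rules listed highest priority first, a cell keeps the first value written to it. The names `UNSET`, `set_overwrite` and `set_if_unset` are hypothetical.

```c
enum { UNSET = -1 };   /* hypothetical marker for a cell no rule has touched */

/* Old layout: rules run lowest priority first, and each later (higher
 * priority) rule simply overwrites the cell. */
static void set_overwrite(int *cell, int value)
{
    *cell = value;
}

/* Assumed new layout: rules run highest priority first, in the order
 * Unicode lists them, and a cell that is already filled is left alone. */
static void set_if_unset(int *cell, int value)
{
    if (*cell == UNSET)
        *cell = value;
}
```

Either way the higher priority rule decides the final cell value; only the order in which the rules appear in the source differs, which is what makes the new layout easier to compare with the UAX text.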
Commit: 2c66fa6a3195efd21f887891ece84fb11174be76 https://github.com/Perl/perl5/commit/2c66fa6a3195efd21f887891ece84fb11174be76 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove some special cases These were added to compensate for reversing the order of handling the break property rules. This commit hides the need for that in one place per table, except for a second place for Line Break. The only changes to the tables occur in the garbage row and column which aren't actually accessed, so those changes are harmless. It is a temporary commit. A few commits from now, this will be removed. Commit: df2c9e56e9ffac45e6d8e96508c4a6eb4d630726 https://github.com/Perl/perl5/commit/df2c9e56e9ffac45e6d8e96508c4a6eb4d630726 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Add ability to tie table cell changes together Some Unicode rules say that some cells are supposed to be changed at the same time their master cells are. This commit forms that linkage. Commit: 84f6854246ee6f4f1f37720b3ab5bdda0a3ce417 https://github.com/Perl/perl5/commit/84f6854246ee6f4f1f37720b3ab5bdda0a3ce417 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: rm no longer used sub The previous commit took away the need for this. Commit: c0d42ef124187763922b8bcbc8f42fbd67d3a607 https://github.com/Perl/perl5/commit/c0d42ef124187763922b8bcbc8f42fbd67d3a607 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M embed.fnc M embed.h M proto.h M regexec.c Log Message: ----------- regexec.c: Change static function API This makes it clearer to use. 
Instead of having a boolean flag to change the behavior, there are now two macros that call the underlying function, and their names reflect the desired behavior Commit: 8ebc2205e8e5f22a2aeaaff5260e39eb5f5d3826 https://github.com/Perl/perl5/commit/8ebc2205e8e5f22a2aeaaff5260e39eb5f5d3826 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M embed.fnc M embed.h M proto.h M regexec.c Log Message: ----------- regexec.c: Change function name The new name is longer, but it makes clear that it does something that the reader of this code might find unexpected. Commit: f5541f64a3636d5e2d9a38eafb783ca644b0e7dd https://github.com/Perl/perl5/commit/f5541f64a3636d5e2d9a38eafb783ca644b0e7dd Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M embed.fnc M embed.h M proto.h M regexec.c Log Message: ----------- regexec.c: Change static function API Sometimes this functionality is needed to also skip over certain intervening classes of characters while backing up in the parse string. This commit creates two macros to call the modified underlying function with a boolean flag. The names of the macros make it easy to know what's happening. Commit: 0feb9f9b94f03c72d08fa63af2b29372dd6a8419 https://github.com/Perl/perl5/commit/0feb9f9b94f03c72d08fa63af2b29372dd6a8419 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regexec.c Log Message: ----------- regexec.c: Skip CM and ZWJ in look behind in LB parsing The Unicode standard says that these two characters are to be ignored for the purposes of determining if there is a Line Break just before certain characters. That is, you have to back up in the parse string past all adjacent ones of these, and then examine it. This applies to any lower priority rule than LB9. This commit fixes two cases that didn't do that.
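The look-behind that skips CM and ZWJ can be sketched as below. This is a simplified illustration over an array of already-classified characters, not the code in regexec.c, which walks a (possibly UTF-8) string; the function name and the `LB_EDGE` value are hypothetical. The sketch also ignores the exceptions in rule LB9 of UAX #14, which does not absorb CM and ZWJ after certain classes such as spaces and hard line breaks.

```c
#include <stddef.h>

/* Hypothetical, reduced set of Line Break classes. */
enum lb_class { LB_AL, LB_CM, LB_ZWJ, LB_SP, LB_EDGE };

/* Return the class of the nearest character before position `pos` that
 * is neither CM nor ZWJ, or LB_EDGE if the start of text is reached
 * first. classes[i] is the Line Break class of the i-th character. */
static enum lb_class
lb_backup_skipping_cm_zwj(const enum lb_class *classes, size_t pos)
{
    while (pos > 0) {
        enum lb_class c = classes[--pos];
        if (c != LB_CM && c != LB_ZWJ)
            return c;
    }
    return LB_EDGE;
}
```

For the sequence AL CM ZWJ followed by the candidate break position, the function returns `LB_AL`, so a lower priority rule sees the alphabetic rather than the combining mark.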
Commit: c974f460e0188ea893e805dd197763fd6e16d2c1 https://github.com/Perl/perl5/commit/c974f460e0188ea893e805dd197763fd6e16d2c1 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M lib/unicore/mktables Log Message: ----------- mktables: Consolidate code into a single function Some properties in Unicode essentially form equivalence classes for all possible code points. For example, Unicode publishes the Line Break (LB) property, where each possible code point is given a type, like Alphabetic, or Opening Parenthesis. All code points that act as alphabetics have the AL equivalence class. All that act like Opening Parentheses have the OP class. Unicode also publishes rules as to whether it is permissible to break between code points of any two types. For the Line Break property, you wouldn't break a line between two alphabetics or between an opening parenthesis and an alphabetic, but you could between a Space and almost any other type or between a closing parenthesis and many types. Perl uses these properties to implement the \b{lb} etc regular expression constructs. It uses a two-dimensional array where the value in the cell [x,y] tells whether a break is permissible between characters of type x and characters of type y. (Some cases can't be done with this simple lookup; knowing the surrounding context is necessary to make a decision. Those are implemented as DFAs in regexec.c.) Unicode used to publish such an array for the Line Break property, and still publishes some non-normative .html files that contain similar information. But to really know what to do, one has to read documents UAX#14 and UAX#29 that contain textual descriptions of the rules. These change with each new release, and are the major pain in upgrading to a new release.
In recent releases, Unicode has mostly stopped creating new equivalence classes as it has refined the rules for the boundary conditions. For example, the line boundary conditions are very different for East Asian (EA) characters than the Western scripts. Effectively there are thus two sets of rules. But instead of creating new equivalence classes that reflect this reality, Unicode has chosen to just document it in those two UAX documents. I don't know the motivation for this. But perl wants that table to divvy up all the possible boundary conditions, so it can continue to use the array to make most of the decisions, so mktables splits the equivalence classes that Unicode provides into new ones that reflect what the UAXes say. At first, I thought this was a one-off matter, so wrote a few lines to handle a special case; then when the next release came out, added a few more for another one, etc. But Unicode 15.1 and 16.0 continue the trend, so it's become an effort. This commit consolidates the previous one-off code snippets into one generalized function. It should be able to handle future instances without having to craft something new each time. It also creates a new data structure that mk_invlists.pl can look at so that it doesn't have to repeat the logic found here, as it currently does. Commit: 41dc8a96c74d56297b9460f7f7e3dc6a26edfd55 https://github.com/Perl/perl5/commit/41dc8a96c74d56297b9460f7f7e3dc6a26edfd55 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists: Use new mktables enhancements Now mk_invlists no longer has to know what the details are of properties that have been split into more, smaller equivalence classes. mktables handles that and provides the information in new hashes.
Commit: 46eb5e9419bde70ea40fef5f9ea69ed8a52d4c3a https://github.com/Perl/perl5/commit/46eb5e9419bde70ea40fef5f9ea69ed8a52d4c3a Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists: Generalize to stack DFAs for break properties The Unicode breaking algorithms are supposed to be implemented by executing DFAs in priority order, stopping at the first one that succeeds. (In many cases a DFA isn't needed, and we can unconditionally say that there is or isn't a break at a given position simply by looking at the characters on either side of it.) But it was a significant amount of work to get from where perl started to be able to do that. And it hasn't been necessary until now. In most cases, a single DFA sufficed, and where not, a more complicated single DFA took care of the stacking. But this has become untenable in Unicode 15.1, so I ended up doing the work to implement their algorithm. The result is more, but simpler, DFAs, and it becomes easier to add new ones, as they don't have to interact with other ones. The stacking does that for them. This commit implements a separate DFA table beyond the x,y lookup table. If the decision that this is a breakable position requires a DFA, the x,y contents are an index into this separate table, which contains the DFA to follow. The first element gives the case statement number to use to execute the DFA. The second element gives the value to return if the DFA succeeds. If it fails, the code adds 2 to the index to get the next thing to try.
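A minimal Python sketch of the stacked lookup described in the message above may help. The real code is C in regexec.c driven by generated tables; everything named here (`DFA_FLAG`, the `ALWAYS` sentinel that ends a stack, the two context checks, and the table contents) is an assumption made for illustration only.

```python
# Hypothetical sketch of the stacking scheme; names and encodings are assumptions.
NO_BREAK, BREAK = 0, 1
DFA_FLAG = 0x100   # assumed marker: a cell value >= DFA_FLAG indexes dfa_table

# Assumed DFA numbers; in the C code these would select a case in a switch.
ALWAYS = 0             # assumed sentinel that always succeeds (the default)
PRECEDED_BY_DIGIT = 1  # made-up context check
FOLLOWED_BY_UPPER = 2  # made-up context check

def run_dfa(dfa_id, text, pos):
    """Stand-in for the switch statement; pos is the boundary before text[pos]."""
    if dfa_id == ALWAYS:
        return True
    if dfa_id == PRECEDED_BY_DIGIT:
        return pos >= 2 and text[pos - 2].isdigit()
    if dfa_id == FOLLOWED_BY_UPPER:
        return pos + 1 < len(text) and text[pos + 1].isupper()
    raise ValueError(f"unknown DFA {dfa_id}")

# Flat table of (dfa number, value returned on success) pairs, in priority order.
dfa_table = [
    PRECEDED_BY_DIGIT, NO_BREAK,
    FOLLOWED_BY_UPPER, BREAK,
    ALWAYS, NO_BREAK,
]

def resolve(cell, text, pos):
    """Turn an x,y cell into a final answer, running stacked DFAs if needed."""
    if cell < DFA_FLAG:
        return cell                  # unconditional answer, no DFA required
    i = cell - DFA_FLAG
    while True:
        dfa_id, value = dfa_table[i], dfa_table[i + 1]
        if run_dfa(dfa_id, text, pos):
            return value             # first DFA that succeeds wins
        i += 2                       # failure: move on to the next pair
```

Because each entry only knows its own test and result, adding a rule means adding one pair at the right priority rather than reworking a larger combined DFA.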
Commit: 62f71dcc711ec8a153adf8ea2fb9b8bc0184dd8a https://github.com/Perl/perl5/commit/62f71dcc711ec8a153adf8ea2fb9b8bc0184dd8a Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Use new DFA scheme for horizontal white space Perl doesn't follow the Unicode standard with regard to its treatment of white space, in particular sequences of horizontal white space. Unicode allows "tailoring" of its rules for local situations, and Perl traditionally with \B has treated all sequences of white space as a single unit. Unicode originally considered each space in a sequence of them as a separate unit. A perl program would want them all as a single unit. Unicode eventually came round to our way of thinking, but not entirely, as comments unaffected by this commit indicate. The DFA for this situation does not fit in with the new stackable DFA scheme, and would start failing tests a few commits later as the shim code is removed. Convert to the new scheme, which allows us to call the functions that affect a single cell twice, with both calls taking effect. The order is immaterial, but one call installs a default behavior, and the other a DFA that ends up being executed first to override that behavior in certain (rare) situations. Commit: 728a27770d0c3b202ef0e963f7b60c3ebe8cb4a3 https://github.com/Perl/perl5/commit/728a27770d0c3b202ef0e963f7b60c3ebe8cb4a3 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove accesses of unused parameter set_cells() no longer reads this parameter; no need to pass it nor set it up.
Commit: 14bb467a0970422e0b2f95290b08cecb5f88ffc3 https://github.com/Perl/perl5/commit/14bb467a0970422e0b2f95290b08cecb5f88ffc3 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove a temporary work-around This code was due to a few commits ago having reversed the ordering the Unicode rules are applied in. After updating to use a generalized DFA scheme, it is no longer needed Commit: 5222f5c1b67ecd5d277ff47827801345bf312b3a https://github.com/Perl/perl5/commit/5222f5c1b67ecd5d277ff47827801345bf312b3a Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove a no longer used enum The new generalized DFA scheme makes this value moot; it was used to get around not having such a scheme. Commit: 300fbe95670cc110ea5712e27577df1094bab860 https://github.com/Perl/perl5/commit/300fbe95670cc110ea5712e27577df1094bab860 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove no longer used function Commit: 67707f3cf58a75fb1ff36c2e2115bd7ffeb9d21e https://github.com/Perl/perl5/commit/67707f3cf58a75fb1ff36c2e2115bd7ffeb9d21e Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Look for a DFA optimization possibility If both branches of an else lead to the same result, skip the else and set the result unconditionally. That's what this commit does for DFAs that get the same value if they succeed as when they don't. There is one current case where the DFA can return an anomalous result, so it can't be optimized out. Add a field to the hash entry defining that entry, so it doesn't get optimized. 
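The optimization in the last message above could look something like the following Python sketch. It assumes a cell's DFAs are kept as a list in priority order together with the value used when all of them fail; the function name `simplify` and the `keep` field (standing in for the field the message says marks the one entry that must not be optimized out) are hypothetical.

```python
def simplify(stack, default):
    """Drop trailing DFAs whose success value equals the fall-through value.

    stack:   list of dicts like {"dfa": name, "value": result}, priority order
    default: value used when every DFA in the stack fails
    """
    stack = list(stack)
    while stack:
        last = stack[-1]
        # Succeeding or failing gives the same answer, so running it is pointless,
        # unless the entry is flagged as able to return an anomalous result.
        if last["value"] == default and not last.get("keep", False):
            stack.pop()
        else:
            break
    return stack
```

Only trailing entries are removed in this sketch, because a failing DFA earlier in the stack falls through to later DFAs whose outcome can still differ.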
Commit: 6224d071fc869aa0ad6ec62e5852188a4bcecbb0 https://github.com/Perl/perl5/commit/6224d071fc869aa0ad6ec62e5852188a4bcecbb0 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl Log Message: ----------- mk_invlists: Remove hard-coded numbers A couple of commits ago, the last necessarily-hard-coded DFA enum besides 0 and 1 was removed. This allows for all the rest to be assigned by using the value of an incrementing variable. This makes it easy to add DFAs in the middle of existing ones, as will happen as future Unicode releases come our way. Commit: 2521a049a533f398a8ed71304b7e395555a7f08a https://github.com/Perl/perl5/commit/2521a049a533f398a8ed71304b7e395555a7f08a Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c M regexp_constants.h Log Message: ----------- mk_invlists: Add a shorter form DFA This is just for legibility of reading the rules Commit: cb4b9028b02f8e42806258aca1b5f3d5f18eb498 https://github.com/Perl/perl5/commit/cb4b9028b02f8e42806258aca1b5f3d5f18eb498 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M lib/Unicode/UCD.t Log Message: ----------- lib/Unicode/UCD.t: Prepare for Unicode 15.1 The numeric value for U+5146 changed in 15.1 Commit: 01207563b6f7538f67abc5108cc899067f25db3d https://github.com/Perl/perl5/commit/01207563b6f7538f67abc5108cc899067f25db3d Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M lib/Unicode/UCD.t Log Message: ----------- UCD.t: Skip test which fails on 32 bit words In Unicode 15.1, the ideograph U+4EAC now has a numeric value, and that value is 10 quadrillion (1e+16). This is the first instance in Unicode of an integer not fitting in a 32 bit word, as this requires 54 bits.
One of the tests in UCD.t requires round-trip equality in converting from string to number and back; skip it for this case and any future similar ones. I find it interesting that U+4EAC is listed as having the meaning "capital city". Commit: 3bd216255ce5263db558c9ed82d4aa4bb4816663 https://github.com/Perl/perl5/commit/3bd216255ce5263db558c9ed82d4aa4bb4816663 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M regen/mk_invlists.pl M regexec.c Log Message: ----------- mk_invlists/regexec.c: Prepare for Unicode 15.1 Commit: 1f497ea027b2899b12daefa48c6be602f4e91bb3 https://github.com/Perl/perl5/commit/1f497ea027b2899b12daefa48c6be602f4e91bb3 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M lib/unicore/mktables M regcharclass.h Log Message: ----------- mktables: Prepare for Unicode 15.1 Commit: b75c7517558990201b788d174f73bd2d4248da89 https://github.com/Perl/perl5/commit/b75c7517558990201b788d174f73bd2d4248da89 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M MANIFEST M charclass_invlists.inc M lib/Unicode/UCD.t M lib/unicore/ArabicShaping.txt M lib/unicore/BidiBrackets.txt M lib/unicore/BidiMirroring.txt M lib/unicore/Blocks.txt M lib/unicore/CJKRadicals.txt M lib/unicore/CaseFolding.txt M lib/unicore/CompositionExclusions.txt M lib/unicore/DAge.txt M lib/unicore/DCoreProperties.txt M lib/unicore/DNormalizationProps.txt M lib/unicore/EastAsianWidth.txt M lib/unicore/EmojiSources.txt M lib/unicore/EquivalentUnifiedIdeograph.txt M lib/unicore/HangulSyllableType.txt M lib/unicore/IdStatus.txt M lib/unicore/IdType.txt M lib/unicore/Index.txt M lib/unicore/IndicPositionalCategory.txt M lib/unicore/IndicSyllabicCategory.txt M lib/unicore/Jamo.txt M lib/unicore/LineBreak.txt M lib/unicore/NameAliases.txt M lib/unicore/NamedSequences.txt M lib/unicore/NamedSqProv.txt M lib/unicore/NamesList.txt M lib/unicore/NormTest.txt M 
lib/unicore/NormalizationCorrections.txt M lib/unicore/PropList.txt M lib/unicore/PropValueAliases.txt M lib/unicore/PropertyAliases.txt M lib/unicore/ReadMe.txt M lib/unicore/ScriptExtensions.txt M lib/unicore/Scripts.txt M lib/unicore/SpecialCasing.txt M lib/unicore/StandardizedVariants.txt M lib/unicore/UnicodeData.txt M lib/unicore/VerticalOrientation.txt M lib/unicore/auxiliary/GCBTest.txt M lib/unicore/auxiliary/GraphemeBreakProperty.txt M lib/unicore/auxiliary/LBTest.txt M lib/unicore/auxiliary/SBTest.txt M lib/unicore/auxiliary/SentenceBreakProperty.txt M lib/unicore/auxiliary/WBTest.txt M lib/unicore/auxiliary/WordBreakProperty.txt M lib/unicore/emoji/emoji.txt M lib/unicore/extracted/DBidiClass.txt M lib/unicore/extracted/DBinaryProperties.txt M lib/unicore/extracted/DCombiningClass.txt M lib/unicore/extracted/DDecompositionType.txt M lib/unicore/extracted/DEastAsianWidth.txt M lib/unicore/extracted/DGeneralCategory.txt M lib/unicore/extracted/DJoinGroup.txt M lib/unicore/extracted/DJoinType.txt M lib/unicore/extracted/DLineBreak.txt M lib/unicore/extracted/DNumType.txt M lib/unicore/extracted/DNumValues.txt A lib/unicore/intentional.txt M lib/unicore/uni_keywords.pl M lib/unicore/version M regcharclass.h M regen/mk_invlists.pl M regexp_constants.h M uni_keywords.h M unicode_constants.h Log Message: ----------- mk_invlists: Restore calculation of new keywords, etc Now we are ready to use a new Unicode version, we have to regenerate everything. This was turned off earlier in this branch temporarily until now so as to speed up the testing, as it was known these values wouldn't change until now. 
Commit: 2fbb3d680f790e39d4ec24b3bbebe9a4039f8e34 https://github.com/Perl/perl5/commit/2fbb3d680f790e39d4ec24b3bbebe9a4039f8e34 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M lib/unicore/uni_keywords.pl M regen/mk_invlists.pl M regexp_constants.h M uni_keywords.h Log Message: ----------- mk_invlists: Include cells in calculating column widths This program generates tables for the Break properties that are somewhat human readable. Before this commit, just the heading line for a column determined its width. This commit factors in the maximum width of any cell in the column as well. It used to be that this required a separate pass, and so wasn't done. But now that separate pass is required anyway for other reasons, and it is simple to add to it this check. Commit: 5ed43b075c536111a720f254966a942c715ef48c https://github.com/Perl/perl5/commit/5ed43b075c536111a720f254966a942c715ef48c Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M lib/unicore/uni_keywords.pl M regen/mk_invlists.pl M regexec.c M regexp_constants.h M uni_keywords.h Log Message: ----------- mk_invlists/regexec.c: Prepare for Unicode 16.0 Commit: 845c437d2e5081da6297f0327b7077699fe0469a https://github.com/Perl/perl5/commit/845c437d2e5081da6297f0327b7077699fe0469a Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M lib/unicore/mktables M lib/unicore/uni_keywords.pl M regcharclass.h M regexp_constants.h M uni_keywords.h Log Message: ----------- mktables: Prepare for Unicode 16.0 Commit: 0fb7536d663f8f5e08bf23a72974e7e8a87ae60e https://github.com/Perl/perl5/commit/0fb7536d663f8f5e08bf23a72974e7e8a87ae60e Author: Unicode Consortium <unicode.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M MANIFEST M charclass_invlists.inc M lib/Unicode/UCD.t M lib/unicore/ArabicShaping.txt M 
lib/unicore/BidiBrackets.txt M lib/unicore/BidiMirroring.txt M lib/unicore/Blocks.txt M lib/unicore/CJKRadicals.txt M lib/unicore/CaseFolding.txt M lib/unicore/CompositionExclusions.txt M lib/unicore/DAge.txt M lib/unicore/DCoreProperties.txt M lib/unicore/DNormalizationProps.txt A lib/unicore/DoNotEmit.txt M lib/unicore/EastAsianWidth.txt M lib/unicore/EmojiSources.txt M lib/unicore/EquivalentUnifiedIdeograph.txt M lib/unicore/HangulSyllableType.txt M lib/unicore/IdStatus.txt M lib/unicore/IdType.txt M lib/unicore/Index.txt M lib/unicore/IndicPositionalCategory.txt M lib/unicore/IndicSyllabicCategory.txt M lib/unicore/Jamo.txt M lib/unicore/LineBreak.txt M lib/unicore/NameAliases.txt M lib/unicore/NamedSequences.txt M lib/unicore/NamedSqProv.txt M lib/unicore/NamesList.txt M lib/unicore/NormTest.txt M lib/unicore/NormalizationCorrections.txt M lib/unicore/PropList.txt M lib/unicore/PropValueAliases.txt M lib/unicore/PropertyAliases.txt M lib/unicore/ReadMe.txt M lib/unicore/ScriptExtensions.txt M lib/unicore/Scripts.txt M lib/unicore/SpecialCasing.txt M lib/unicore/StandardizedVariants.txt M lib/unicore/TestNorm.pl M lib/unicore/UnicodeData.txt A lib/unicore/Unikemet.txt M lib/unicore/VerticalOrientation.txt M lib/unicore/auxiliary/GCBTest.txt M lib/unicore/auxiliary/GraphemeBreakProperty.txt M lib/unicore/auxiliary/LBTest.txt M lib/unicore/auxiliary/SBTest.txt M lib/unicore/auxiliary/SentenceBreakProperty.txt M lib/unicore/auxiliary/WBTest.txt M lib/unicore/auxiliary/WordBreakProperty.txt M lib/unicore/emoji/emoji.txt M lib/unicore/extracted/DBidiClass.txt M lib/unicore/extracted/DBinaryProperties.txt M lib/unicore/extracted/DCombiningClass.txt M lib/unicore/extracted/DDecompositionType.txt M lib/unicore/extracted/DEastAsianWidth.txt M lib/unicore/extracted/DGeneralCategory.txt M lib/unicore/extracted/DJoinGroup.txt M lib/unicore/extracted/DJoinType.txt M lib/unicore/extracted/DLineBreak.txt M lib/unicore/extracted/DNumType.txt M 
lib/unicore/extracted/DNumValues.txt M lib/unicore/uni_keywords.pl M lib/unicore/version M regcharclass.h M regexp_constants.h M uni_keywords.h M unicode_constants.h Log Message: ----------- Add Unicode 16.0 This includes updates to a few perl files that need to know the current Unicode version, and regenerating perl files that depend on the Unicode data Commit: 23d838275b4819f8e6b3768a67dff7ed62cc3133 https://github.com/Perl/perl5/commit/23d838275b4819f8e6b3768a67dff7ed62cc3133 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M lib/unicore/mktables M lib/unicore/uni_keywords.pl M regcharclass.h M regexp_constants.h M uni_keywords.h Log Message: ----------- mktables: Note break table code for Unicode 16.0 is updated Commit: 87ab6eb9b6671eddae97502a08ac0b33d3367d0e https://github.com/Perl/perl5/commit/87ab6eb9b6671eddae97502a08ac0b33d3367d0e Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M lib/unicore/uni_keywords.pl M regcharclass.h M regen/mk_invlists.pl M regexp_constants.h M uni_keywords.h Log Message: ----------- mk_invlists: Restore generating EBCDIC This had been turned off in this branch to speed up compilation, and hence development. The code mostly changed in this branch is the same as in ASCII anyway. It could have become an issue only if someone tries to bisect on an EBCDIC machine, which I don't believe has happened, if ever, in decades. Commit: 1b7d99229cac051299930927aec2c25c44d69823 https://github.com/Perl/perl5/commit/1b7d99229cac051299930927aec2c25c44d69823 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M t/porting/regen.t Log Message: ----------- Revert "Temporarily skip regen porting test in this branch" This temporary commit has now served its purpose.
Commit: 9a483158fe0f1bdcd4461c56f3bb4afbcb29e4ae https://github.com/Perl/perl5/commit/9a483158fe0f1bdcd4461c56f3bb4afbcb29e4ae Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M charclass_invlists.inc M lib/unicore/uni_keywords.pl M regen/mk_invlists.pl M regexp_constants.h M uni_keywords.h Log Message: ----------- mk_invlists: Update comments Commit: 7c4efc433361b4259ac336d885701248c97471a2 https://github.com/Perl/perl5/commit/7c4efc433361b4259ac336d885701248c97471a2 Author: Karl Williamson <k...@cpan.org> Date: 2025-04-20 (Sun, 20 Apr 2025) Changed paths: M pod/perldelta.pod Log Message: ----------- perldelta for Unicode update Compare: https://github.com/Perl/perl5/compare/c308ac4c9085...7c4efc433361