Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 145531d596acd6392a32c8fbd47fba2b6356cd64
      
https://github.com/Perl/perl5/commit/145531d596acd6392a32c8fbd47fba2b6356cd64
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M t/porting/regen.t

  Log Message:
  -----------
  Temporarily skip regen porting test in this branch

The digest numbers keep changing in this branch.  Turn this test off
until near its end.


  Commit: c67acbf1623afc3236358ebff71f0238caa1e9f0
      
https://github.com/Perl/perl5/commit/c67acbf1623afc3236358ebff71f0238caa1e9f0
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Temporarily don't generate some porting info

This series of commits has dozens of commits that would otherwise
require much more work to generate.  This commit temporarily turns off
generating EBCDIC tables, and the tables that only change when a new
Unicode release happens.  Bisecting on an ASCII machine is unaffected.


  Commit: f56978c93aad2b855e07b020116228909d9f3300
      
https://github.com/Perl/perl5/commit/f56978c93aad2b855e07b020116228909d9f3300
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add stack trace facility

This is useful in debugging


  Commit: 15ac5219107e36ab3e62baf529ec35c1e55bcf40
      
https://github.com/Perl/perl5/commit/15ac5219107e36ab3e62baf529ec35c1e55bcf40
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Rename a couple of variables

The previous names erroneously implied these were associated with the
parameters to these functions; instead rename to indicate they are
associated with some local variables.


  Commit: bd887497638322ff6ff95aa438d6a4f70d96b212
      
https://github.com/Perl/perl5/commit/bd887497638322ff6ff95aa438d6a4f70d96b212
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Change doubled semicolon to single


  Commit: 3744b0fd0e82ae9935ce6152dbc96538a2d9d56d
      
https://github.com/Perl/perl5/commit/3744b0fd0e82ae9935ce6152dbc96538a2d9d56d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists.pl: Use feature signatures


  Commit: 6e7862d2c246b0fd3e9ad3954b7eff125a9732e5
      
https://github.com/Perl/perl5/commit/6e7862d2c246b0fd3e9ad3954b7eff125a9732e5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: White-space comments

This includes outdenting and indenting where future commits will add or
remove blocks


  Commit: 7e5d1ecd5b730af1345c15bb133f29ce17146b58
      
https://github.com/Perl/perl5/commit/7e5d1ecd5b730af1345c15bb133f29ce17146b58
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Clarify output table headings

Changes the wording for some table headings in the generated file to
indicate where to find what the abbreviations mean


  Commit: 42f18f06977f9c443a4b3eeb449268ec08b194b5
      
https://github.com/Perl/perl5/commit/42f18f06977f9c443a4b3eeb449268ec08b194b5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Sort some lists

These lists are densely packed.  It is easier to find something if they
are sorted


  Commit: 118c65df1e30f702109f85c0f734e287151e327d
      
https://github.com/Perl/perl5/commit/118c65df1e30f702109f85c0f734e287151e327d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Fix rule LB11

This rule is not affected by spaces, yet the code was saying it should
be.


  Commit: c9f1e896377ade7707595df081c280a3e22ea5eb
      
https://github.com/Perl/perl5/commit/c9f1e896377ade7707595df081c280a3e22ea5eb
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Fix rule LB12

This rule is not affected by spaces, yet the code was saying it should
be.


  Commit: ffa9a006c9ed7ceef2bcf2a4406c55bc576db843
      
https://github.com/Perl/perl5/commit/ffa9a006c9ed7ceef2bcf2a4406c55bc576db843
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Fix rule LB13

This rule was written here to not include the actions when the character
before the candidate break position is a number.  This is just plain
wrong.  The Unicode rules have never said this.


  Commit: 6c708475ecf243377447b652fa3daae1db9d7995
      
https://github.com/Perl/perl5/commit/6c708475ecf243377447b652fa3daae1db9d7995
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add extensive comments


  Commit: f805c77a1ad9c62dc5312847decfdeb92decb5de
      
https://github.com/Perl/perl5/commit/f805c77a1ad9c62dc5312847decfdeb92decb5de
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Narrow some output tables

Future Unicode releases will greatly explode the size of certain tables.
Prior to this commit, the minimum column size was two, but some table
columns fit in a single window column.  This commit changes to use the
minimum required.
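
A minimal sketch of the idea, in Python rather than the Perl of
mk_invlists.pl: each column is made only as wide as its widest entry, so
a column of single-digit values takes one character.  The function name
and the sample data are hypothetical, not taken from the generator.

```python
def render(headings, rows):
    """Render a table, giving each column the minimum width it needs."""
    # Group the heading and the values of each column together.
    columns = list(zip(headings, *rows))
    widths = [max(len(str(cell)) for cell in col) for col in columns]
    lines = []
    for row in [headings] + rows:
        # Right-justify every cell to its own column's width.
        lines.append(" ".join(str(c).rjust(w) for c, w in zip(row, widths)))
    return "\n".join(lines)


print(render(["A", "BC"], [[1, 0], [0, 12]]))
```

The first column here is one character wide and the second is two,
instead of both being padded to a fixed minimum.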


  Commit: 64120a957ace112845c4ed8ada47722021806802
      
https://github.com/Perl/perl5/commit/64120a957ace112845c4ed8ada47722021806802
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Center row labels in output tables

This improves readability


  Commit: 7cab98315092d6a93151014335ca0cd4dfdadbb8
      
https://github.com/Perl/perl5/commit/7cab98315092d6a93151014335ca0cd4dfdadbb8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Move break tables edge cells

These tables are placed in charclass_invlists.inc.  They have a row and
column for what happens when the position being checked for is at the
start or end of the text.  This commit reorders the tables so that the
edge row and column are, well, at the edges.  And it relabels the labels
to be '^' and '$' respectively.


  Commit: 0c7052a09144130f451b412f76204428f417a5ce
      
https://github.com/Perl/perl5/commit/0c7052a09144130f451b412f76204428f417a5ce
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Improve output table column headings

This uses a more complex algorithm to generate short labels to demarcate
rows and columns in some output tables.

This doesn't affect the current tables for Unicode 15.0, but will in
future Unicode releases.


  Commit: 46742bc2050fff54f622f095b188bd82638cea55
      
https://github.com/Perl/perl5/commit/46742bc2050fff54f622f095b188bd82638cea55
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Change two formal parameter names


  Commit: 99431fcab99449e5e563d560464253b7ab9d81e8
      
https://github.com/Perl/perl5/commit/99431fcab99449e5e563d560464253b7ab9d81e8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Move some lines earlier in their functions

where the next commits will want them


  Commit: df8dddf47fef2b139711279dcbc3a7c9774d768f
      
https://github.com/Perl/perl5/commit/df8dddf47fef2b139711279dcbc3a7c9774d768f
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Change a word to be more accurate

Everything is an action.  Some are accomplished via DFAs.  This commit
uses the latter word in places where it is a DFA.  It also uses this
new term in some places where it doesn't apply.  Future commits will
remove those inaccuracies.


  Commit: 43bb2c4938757f17896dd53128953f5225e0e86f
      
https://github.com/Perl/perl5/commit/43bb2c4938757f17896dd53128953f5225e0e86f
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Set and get break table values with functions

Previously, we would just set an individual element directly.  This
changes most of those to use function calls instead.  This has two main
benefits.  The function can change what's being done without having to
change many lines; and these sets had a lot of visual noise with sigils
and hash references.  The result is a lot easier to read.

The next few commits will continue this process.

Note that the generated tables are unchanged by this commit.  It has no
effect on runtime processing.  That will be true of the next commits as
well.

It became obvious in doing this that the rule for Perl_Tailored_HSpace
does not belong in the 3's, but comes immediately before that.
Arbitrarily use '2z'


  Commit: 7c7314732c902c1957578c1f74e01a8521cfac31
      
https://github.com/Perl/perl5/commit/7c7314732c902c1957578c1f74e01a8521cfac31
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Hoist calculation to sub callers

And pass the result to the subroutine.

This is in preparation for this value to be needed in additional places.


  Commit: af259bb098268d29818d6aa028a9a82995883d4e
      
https://github.com/Perl/perl5/commit/af259bb098268d29818d6aa028a9a82995883d4e
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Set values in unused table cells to 0

These cells exist so that code is less likely to need to be changed when
a new Unicode release comes along.  Currently it doesn't matter at all
what is in those cells, because they are never read.  But future commits
will want to make sure they don't refer to DFAs that are obsolete,
references to which could be undefined symbols that would abort the
compilation.

The choice of 0 or 1 to put in the cells was arbitrary; I know of no
reason to prefer one or the other


  Commit: 4828646c386be6a4712641c309369d9324539acb
      
https://github.com/Perl/perl5/commit/4828646c386be6a4712641c309369d9324539acb
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Reorder two statements

This now matches the order that Unicode gives; for easier checking that
our code matches their demands.


  Commit: c9814be231b244576ef4d80fe4dfab1e66e8f08d
      
https://github.com/Perl/perl5/commit/c9814be231b244576ef4d80fe4dfab1e66e8f08d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add ability to specify an entire row simply

Instead of having to loop through all the cells of a row or column, this
commit uses '*' to represent the whole thing.  This is more in keeping
with the text of the Unicode rules, which just leaves a slot blank if it
means everything.
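
As a rough illustration in Python (the real code is Perl), a '*'
specification can be expanded to every class before the cells are set.
The class list, the function names and the cell values below are
hypothetical.

```python
CLASSES = ["AL", "NU", "SP"]  # hypothetical subset of break classes


def expand(spec):
    """Turn '*' into every class; leave a single class name alone."""
    return list(CLASSES) if spec == "*" else [spec]


def set_cells(table, before, after, value):
    """Set every cell selected by the (before, after) specifications."""
    for b in expand(before):
        for a in expand(after):
            table[(b, a)] = value


table = {}
set_cells(table, "*", "SP", "no_break")  # a whole column in one call
```

The caller no longer writes the loop; one call covers the entire column.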


  Commit: 394ab97c8e8bc58bd5f7f7fb70388d22e3596497
      
https://github.com/Perl/perl5/commit/394ab97c8e8bc58bd5f7f7fb70388d22e3596497
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Allow arbitrary list of cells

This follows up on the previous commit which allowed simply specifying
an entire row or column.  This adds the ability to specify a list.


  Commit: a3928940e2de937ee09225e0077b3371877531e8
      
https://github.com/Perl/perl5/commit/a3928940e2de937ee09225e0077b3371877531e8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add no_nobreak_override()

This new function allows removing loops from the main code


  Commit: b5b7ffab978dd7b5c864890ed22a1073c68adbbf
      
https://github.com/Perl/perl5/commit/b5b7ffab978dd7b5c864890ed22a1073c68adbbf
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add ability to specify a complement of list

And use it in one instance.

Previous commits have added the ability to pass multiple items simply to
the functions that work on rows and columns.  This now gives the ability
to complement the set of the multiple items passed.
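
A sketch of how a list and its complement might be selected, written in
Python for illustration only.  The 'complement' keyword and the class
names are assumptions; the actual syntax used in mk_invlists.pl is not
shown in this message.

```python
CLASSES = ["AL", "NU", "SP", "OP"]  # hypothetical subset of break classes


def select(spec, complement=False):
    """Return the classes named by spec, or every other class if complemented."""
    if spec == "*":
        chosen = set(CLASSES)
    elif isinstance(spec, str):
        chosen = {spec}
    else:
        chosen = set(spec)
    if complement:
        chosen = set(CLASSES) - chosen
    # Preserve the canonical class order in the result.
    return [c for c in CLASSES if c in chosen]


print(select(["AL", "NU"]))                   # the listed classes
print(select(["AL", "NU"], complement=True))  # everything except them
```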


  Commit: ef5c16bcbc4da046b75cfb7457f6b531de80d946
      
https://github.com/Perl/perl5/commit/ef5c16bcbc4da046b75cfb7457f6b531de80d946
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Handle Combining Mark: changes CMxZWJ

This is separated out from the previous commit because it is tricky XXX


  Commit: ada974d9729d8611132325727d5474932879f570
      
https://github.com/Perl/perl5/commit/ada974d9729d8611132325727d5474932879f570
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: move decls comments around


  Commit: 7a110968b0078953aec740ce8787ece46631d261
      
https://github.com/Perl/perl5/commit/7a110968b0078953aec740ce8787ece46631d261
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Allow abbreviations for break classes

And use them in a couple of places.  This will allow the rules to more
closely align with the Unicode text, which uses abbreviations just
sometimes.


  Commit: ef591789205839779fa39477c88771f14f1b74fb
      
https://github.com/Perl/perl5/commit/ef591789205839779fa39477c88771f14f1b74fb
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add effectively macro expansions

Unicode's Word Break rules have shortcut names that really mean multiple
ones.  For example, AHLetter means either ALetter or Hebrew_Letter.

This commit allows "macros" to be defined like this so that the
statements in this file more closely resemble those of the Unicode text.

More importantly, Unicode's rules in recent times need subdivided
equivalence classes, such as Alphabetics that are also East Asian.  What
has been done so far is when that happened, extra rules were added that
were all possible combinations of these subdivisions.  It is easy to
miss a combination; and it turns out there are bugs.  This new
capability allows us to say that an Alphabetic (ALetter) is a
combination of plain ALetters plus East Asian letters, and the code
generates all the combinations automatically.  This makes the text
cleaner and safer.
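
The expansion can be pictured with the following Python sketch (the
generator itself is Perl).  Only the AHLetter definition comes from the
text above; the table layout and function names are hypothetical.

```python
from itertools import product

# A macro name stands for several underlying classes.
MACROS = {
    "AHLetter": ["ALetter", "Hebrew_Letter"],
}


def expand(name):
    """Return the classes a name stands for; a plain class is itself."""
    return MACROS.get(name, [name])


def expand_rule(before, after):
    """Generate every (before, after) class pair that a rule covers."""
    return list(product(expand(before), expand(after)))


pairs = expand_rule("AHLetter", "AHLetter")
```

One rule written with the macro yields all four pairs, so no
combination can be forgotten by hand.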


  Commit: fd333fa304a3c2aa616a768db0f4bf1cc67509c2
      
https://github.com/Perl/perl5/commit/fd333fa304a3c2aa616a768db0f4bf1cc67509c2
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Use new split capability with ALetter

ALetter also contains the class ExtPict_LE.  Prior to this commit, there
had to be a rule for each ALetter doing the same thing with ExtPict_LE.
But the new splits capability allows ALetter to expand automatically to
both.

This uncovers a bug.  There should have been a rule
 WB5 ALetter x ExtPict_LE
which was missing.


  Commit: 80864093f81e4a8261bc4ed5ede8a6d249c910cf
      
https://github.com/Perl/perl5/commit/80864093f81e4a8261bc4ed5ede8a6d249c910cf
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Use new split capability with ExtPict

\p{Extended_Pictographic} is not fully implemented yet because unlike
other properties, it can match a string instead of a single character.

And it is kind of a kludge here.  The 14.0 release was analyzed by me and
the rules here were customized based on that analysis.  For example, in
the Line Break property, a clause was added by Unicode to Rule LB30b
that required taking the intersection of this property and all the
Unassigned code points.  It turns out that everything in that
intersection had the Line Break class of Ideographic, so I modified
mktables to split the Ideographic class into two components, the
elements of the intersection went into the long-named
"Unassigned_Extended_Pictographic_Ideographic" and plain Ideographic was
left with the remainder.  To match all of Ideographic you have to
specify both classes.  By using the new split capability, this can be
done effectively as a macro expansion, and the special cases can be
removed from the code.  This commit does this.

Similarly, both the Word Break and Grapheme Cluster Break properties
have somewhat different interactions with Extended_Pictographic that
this commit smooths over.

This situation is brittle.  A new release of Unicode might change things
so that Ideographic isn't the only LB class in the intersection
mentioned above, so the customization has to be checked in every
release.  A few commits later in this branch, this will be automated,
and no longer a concern.


  Commit: 84fdf56f436f0826385b872273c858ced3765886
      
https://github.com/Perl/perl5/commit/84fdf56f436f0826385b872273c858ced3765886
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists: Use new split capability with AHLetter

The description in UAX #29 of Unicode's Word Break property uses two
convenience macros to simplify some of their rules.

The split capability introduced several commits ago allows this program
to follow along, making the rules here more closely aligned to the text
in UAX 29, hence simpler.

This commit creates one macro, AHLetter; the next commit does the other
macro.

The names of the DFAs involving this name are changed to correspond.


  Commit: 1b7556f51ff53c10b271418e122b2e64b3400b35
      
https://github.com/Perl/perl5/commit/1b7556f51ff53c10b271418e122b2e64b3400b35
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists: Use new split capability with MidNumLetQ

This follows on the previous commit, with the other Word Break property
name that Unicode macroizes


  Commit: 5810cf7b466509ce3d592c73dc128a39d02798cb
      
https://github.com/Perl/perl5/commit/5810cf7b466509ce3d592c73dc128a39d02798cb
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add ability to get set subtraction

This capability will be used in future commits, so that the
implementation can more closely follow Unicode's text


  Commit: 9a02f4d2eb2d2fcda3e9872e14de18b3d45530d5
      
https://github.com/Perl/perl5/commit/9a02f4d2eb2d2fcda3e9872e14de18b3d45530d5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Use new set subtraction ability

This allows the removal of some combinatorial complexity, thus showing
a bug in which the combination of PO to EOP had not been added when it
should have been.

Currently, mktables splits the Line Break OP and CP classes into East
Asian ones, and the remainders.  The extra combinations occurred because
the code here needed to take every existing OP and add an East_Asian
(EA_OP) equivalent; same with CP.  It's easy to miss one, and I did.

This commit allows this split to be hidden from most places in
mk_invlists.


  Commit: 394edbac7d9f3043ecb1b6f65308b540599ec6ae
      
https://github.com/Perl/perl5/commit/394edbac7d9f3043ecb1b6f65308b540599ec6ae
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Use abbreviations for Line Break

Unicode UAX #14 gives rules for the Line Break property using the short
names for them.  Prior to this commit, we mostly used the full names for
the classes in this property.  This commit changes to use the short
names.  This makes it easier to compare the code here with the UAX text.
The abbreviations aren't always straightforward, so it was easy to go
astray.


  Commit: 4195bab6b4c07c0361b230785d6c9869917ff59e
      
https://github.com/Perl/perl5/commit/4195bab6b4c07c0361b230785d6c9869917ff59e
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Use 'for' statement modifier

This significantly cuts down on the verbiage, and makes the rules in
this file more closely match the text from which they are derived in UAX
14 and UAX 29


  Commit: a783a45b66258f465322f0f49e8a4106f5a8ce9d
      
https://github.com/Perl/perl5/commit/a783a45b66258f465322f0f49e8a4106f5a8ce9d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists: Improve DFA names

This commit now imposes more structure on the names.

The names are sort of pseudo code that lays out what the DFA is to do.
The most significant change is to standardize what has been done in
recent commits with newly added DFAs.  And that is to use the string
'_v_' in the name, where the tip of the 'v' points to the position in
the input string being processed to which this rule applies.


  Commit: 659c76f27c28c1fa769c4ef360ba01e03aa5e5e4
      
https://github.com/Perl/perl5/commit/659c76f27c28c1fa769c4ef360ba01e03aa5e5e4
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add rule numbers to break table output

This is very helpful in debugging, and correlating the tables with the
Unicode UAX documents from which they are derived.


  Commit: 947d355cc2d2ca705e199c55c75d3c1c17e8deb5
      
https://github.com/Perl/perl5/commit/947d355cc2d2ca705e199c55c75d3c1c17e8deb5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Reorder some hash entries

The new order is based on the order of their respective rules; the next
commit expands these, and it makes it easier for a human to look up.


  Commit: ffa0be843b515845148d554c191859468bccbfe1
      
https://github.com/Perl/perl5/commit/ffa0be843b515845148d554c191859468bccbfe1
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add fields to data structure

This converts each DFA from just a number into a separate hash in a
bigger hash with more information besides that number.

This extra information will be needed in a future commit.


  Commit: e796006b4f9d9fc7ad1d2e43ee2f3b86306b0bb8
      
https://github.com/Perl/perl5/commit/e796006b4f9d9fc7ad1d2e43ee2f3b86306b0bb8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists/regexec.c: Generate and use macros

With this commit, mk_invlists.pl now generates #define macros isFOO that
regexec.c now uses to determine if a character is in a particular line
breaking class.  Previously, x == foo was used.  This change insulates
the code from having to worry about when classes get changed to be
combinations.
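
The generated macros are C #defines; the Python sketch below only
mirrors the idea.  A membership test keeps working when a class is later
split into several pieces, whereas an equality test against one value
would not.  The helper name is hypothetical, and the OP/EA_OP pairing is
borrowed from an earlier commit in this series purely as an example.

```python
def make_is_class(*members):
    """Build a predicate that is true for any of the given class values."""
    members = frozenset(members)

    def predicate(cls):
        return cls in members

    return predicate


# If OP is split into OP and EA_OP, only this definition changes;
# every caller of isOP() stays the same.
isOP = make_is_class("OP", "EA_OP")
isSP = make_is_class("SP")
```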


  Commit: 7bda4036b3fe11b2dcb9d3cf2b1a870a511368c5
      
https://github.com/Perl/perl5/commit/7bda4036b3fe11b2dcb9d3cf2b1a870a511368c5
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Reverse order of break property rules

Before this commit, the rules for populating the tables for break
properties were laid out in reverse order, so that the lowest priority
rule was executed first.  It filled a cell, which then would be
overwritten by any higher priority rule that applied to it.

This reverse order made it harder to compare the rules with the text of
the Unicode rules these are trying to implement.

This commit changes things to have the rules in the same order as
Unicode lists them.

The previous scheme had certain advantages that this has to make up for
by using temporary code to override what would otherwise have gone into
the cells.  This code will no longer be needed in a few commits when a
general purpose stacking DFA scheme is implemented.

As a result of this temporary code, only two cells in one property
change from this complete reversal.  They change to using a
DFA which ends up returning the same results as the original
unconditional value.
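
The two orderings can be compared with a small Python sketch.  It is a
simplification: it uses a "first rule to claim a cell wins" policy for
the forward order, which is not necessarily the temporary override code
this commit uses.  The classes, rules and values are all hypothetical.

```python
CLASSES = ["AL", "SP", "OP"]  # hypothetical subset of break classes

# (before, after, value); '*' means any class.  Highest priority first.
RULES = [
    ("*", "SP", "no_break"),
    ("OP", "*", "no_break"),
    ("*", "*", "break"),
]


def matches(spec, cls):
    return spec == "*" or spec == cls


def fill_low_priority_first(rules):
    """Old scheme: apply rules in reverse; later writes overwrite earlier ones."""
    table = {}
    for before, after, value in reversed(rules):
        for b in CLASSES:
            for a in CLASSES:
                if matches(before, b) and matches(after, a):
                    table[(b, a)] = value
    return table


def fill_high_priority_first(rules):
    """Forward order: the first rule to claim a cell keeps it."""
    table = {}
    for before, after, value in rules:
        for b in CLASSES:
            for a in CLASSES:
                if matches(before, b) and matches(after, a):
                    table.setdefault((b, a), value)
    return table


old = fill_low_priority_first(RULES)
new = fill_high_priority_first(RULES)
```

Under these assumptions both orders produce the same table, while the
forward version lets the rules be read in the order Unicode lists them.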


  Commit: 5beca8178fdce168b42655ce9c2d873273cfbeaa
      
https://github.com/Perl/perl5/commit/5beca8178fdce168b42655ce9c2d873273cfbeaa
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove obsolete function

This function was used when the previous scheme of applying the rules in
reverse order needed to be overridden in a few cases by prohibiting
changes to existing seemingly lower priority values.  Now there's
no lower priority value in the cell that we would need to preserve.


  Commit: 2c66fa6a3195efd21f887891ece84fb11174be76
      
https://github.com/Perl/perl5/commit/2c66fa6a3195efd21f887891ece84fb11174be76
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove some special cases

These were added to compensate for reversing the order of handling the
break property rules.  This commit hides the need for that in one place
per table, except for a second place for Line Break.

The only changes to the tables occur in the garbage row and column which
aren't actually accessed, so those changes are harmless.

It is a temporary commit.  A few commits from now, this will be removed.


  Commit: df2c9e56e9ffac45e6d8e96508c4a6eb4d630726
      
https://github.com/Perl/perl5/commit/df2c9e56e9ffac45e6d8e96508c4a6eb4d630726
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Add ability to tie table cell changes together

Some Unicode rules say that some cells are supposed to be changed at the
same time their master cells are.  This commit forms that linkage.
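
One way to picture such a linkage, as a Python sketch with hypothetical
names (the real implementation is Perl and may differ): setting a master
cell also sets every cell tied to it.

```python
class BreakTable:
    """Table whose cells can be tied to a master cell."""

    def __init__(self):
        self.cells = {}
        self.links = {}  # master cell -> list of dependent cells

    def tie(self, master, dependent):
        """Record that 'dependent' must change whenever 'master' does."""
        self.links.setdefault(master, []).append(dependent)

    def set(self, cell, value):
        """Set a cell and propagate the value to any cells tied to it."""
        self.cells[cell] = value
        for dep in self.links.get(cell, []):
            self.cells[dep] = value


t = BreakTable()
t.tie(("AL", "NU"), ("HL", "NU"))  # hypothetical pair of cells
t.set(("AL", "NU"), "no_break")
```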


  Commit: 84f6854246ee6f4f1f37720b3ab5bdda0a3ce417
      
https://github.com/Perl/perl5/commit/84f6854246ee6f4f1f37720b3ab5bdda0a3ce417
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: rm no longer used sub

The previous commit took away the need for this.


  Commit: c0d42ef124187763922b8bcbc8f42fbd67d3a607
      
https://github.com/Perl/perl5/commit/c0d42ef124187763922b8bcbc8f42fbd67d3a607
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M regexec.c

  Log Message:
  -----------
  regexec.c: Change static function API

This makes it clearer to use.  Instead of having a boolean flag to
change the behavior, there are now two macros that call the underlying
function, and their names reflect the desired behavior


  Commit: 8ebc2205e8e5f22a2aeaaff5260e39eb5f5d3826
      
https://github.com/Perl/perl5/commit/8ebc2205e8e5f22a2aeaaff5260e39eb5f5d3826
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M regexec.c

  Log Message:
  -----------
  regexec.c: Change function name

The new name is longer, but it makes clear that it does something that
the reader of this code might find unexpected.


  Commit: f5541f64a3636d5e2d9a38eafb783ca644b0e7dd
      
https://github.com/Perl/perl5/commit/f5541f64a3636d5e2d9a38eafb783ca644b0e7dd
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M regexec.c

  Log Message:
  -----------
  regexec.c: Change static function API

Sometimes this functionality is needed to also skip over certain
intervening classes of characters while backing up in the parse string.
This commit creates two macros to call the modified underlying function
with a boolean flag.  The names of the macros make it easy to know
what's happening.


  Commit: 0feb9f9b94f03c72d08fa63af2b29372dd6a8419
      
https://github.com/Perl/perl5/commit/0feb9f9b94f03c72d08fa63af2b29372dd6a8419
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Skip CM and ZWJ in look behind in LB parsing

The Unicode standard says that these two characters are to be ignored
for the purposes of determining if there is a Line Break just before
certain characters.  That is, you have to back up in the parse string
past all adjacent ones of these, and then examine it.

This applies to any lower priority rule than LB9.  This commit fixes two
cases that didn't do that.
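
The look-behind can be sketched as follows.  This is Python for
illustration, not the C in regexec.c; the function name and the list
representation of the input are assumptions.

```python
SKIPPED = {"CM", "ZWJ"}  # classes ignored when looking behind


def previous_significant(classes, pos):
    """Index of the nearest class before pos that is not CM or ZWJ, else None."""
    i = pos - 1
    while i >= 0 and classes[i] in SKIPPED:
        i -= 1
    return i if i >= 0 else None


# Looking behind position 3 skips the ZWJ and CM and lands on the AL.
idx = previous_significant(["AL", "CM", "ZWJ", "NU"], 3)
```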


  Commit: c974f460e0188ea893e805dd197763fd6e16d2c1
      
https://github.com/Perl/perl5/commit/c974f460e0188ea893e805dd197763fd6e16d2c1
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M lib/unicore/mktables

  Log Message:
  -----------
  mktables: Consolidate code into a single function

Some properties in Unicode essentially form equivalence classes for all
possible code points.

For example, Unicode publishes the Line Break (LB) property, where each
possible code point is given a type, like Alphabetic, or Opening
Parenthesis.  All code points that act as alphabetics have the AL
equivalence class.  All that act like Opening Parentheses have the OP
class.

Unicode also publishes rules as to whether it is permissible to break
between code points of any two types.  For the Line Break property, you wouldn't
break a line between two alphabetics or between an opening parenthesis
and an alphabetic, but you could between a Space and almost any other
type or between a closing parenthesis and many types.

Perl uses these properties to implement the \b{lb} etc regular
expression constructs.  It uses a two-dimensional array where the value
in the cell [x,y] tells whether a break is permissible between
characters of type x and characters of type y.  (Some cases can't be
decided with this simple lookup, because the surrounding context must be
known to make a decision.  Those are implemented as DFAs in
regexec.c.)
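
A minimal sketch of such a lookup, assuming a tiny hypothetical subset
of the classes and illustrative cell values (the real tables are
generated and far larger):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the Line Break classes. */
enum lb_class { LB_AL, LB_OP, LB_CP, LB_SP, LB_CLASS_COUNT };

/* Illustrative values only.  Row = class before the position,
 * column = class after it; true means a break is permissible. */
static const bool lb_break_allowed[LB_CLASS_COUNT][LB_CLASS_COUNT] = {
    /*            AL     OP     CP     SP   */
    /* AL */ { false, false, false, false },
    /* OP */ { false, false, false, false },
    /* CP */ { false, true,  false, false },
    /* SP */ { true,  true,  false, false },
};

/* Answer the question with a single array access. */
static bool
break_allowed_between(enum lb_class before, enum lb_class after)
{
    assert((int) before >= 0 && before < LB_CLASS_COUNT);
    assert((int) after  >= 0 && after  < LB_CLASS_COUNT);
    return lb_break_allowed[before][after];
}
```

Here break_allowed_between(LB_AL, LB_AL) is false and
break_allowed_between(LB_SP, LB_AL) is true, matching the examples in
the preceding paragraphs.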

Unicode used to publish such an array for the Line Break property, and
still publishes some non-normative .html files that contain similar
information.  But to really know what to do, one has to read documents
UAX#14 and UAX#29 that contain textual descriptions of the rules.  These
change with each new release, and are the major pain in upgrading to a
new release.

In recent releases, Unicode has mostly stopped creating new equivalence
classes as it has refined the rules for the boundary conditions.  For
example, the line boundary conditions are very different for East Asian
(EA) characters than for the Western scripts.  Effectively there are thus
two sets of rules.  But instead of creating new equivalence classes that
reflect this reality, Unicode has chosen to just document it in those
two UAX documents.  I don't know the motivation for this.

But perl wants that table to divvy up all the possible boundary
conditions, so it can continue to use the array to make most of the
decisions, so mktables splits the equivalence classes that Unicode
provides into new ones that reflect what the UAXes say.  At first, I
thought this was a one-off matter, so wrote a few lines to handle a
special case; then when the next release came out, added a few more for
another one, etc.  But Unicode 15.1 and 16.0 continue the trend, so it's
become an effort.

This commit consolidates the previous one-off code snippets into one
generalized function.  It should be able to handle future instances
without having to craft something new each time.

It also creates a new data structure that mk_invlists.pl can look at so
that it doesn't have to repeat the logic found here, as it currently
does.


  Commit: 41dc8a96c74d56297b9460f7f7e3dc6a26edfd55
      
https://github.com/Perl/perl5/commit/41dc8a96c74d56297b9460f7f7e3dc6a26edfd55
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists: Use new mktables enhancements

Now mk_invlists no longer has to know what the details are of properties
that have been split into more, smaller equivalence classes.  mktables
handles that and provides the information in new hashes.


  Commit: 46eb5e9419bde70ea40fef5f9ea69ed8a52d4c3a
      
https://github.com/Perl/perl5/commit/46eb5e9419bde70ea40fef5f9ea69ed8a52d4c3a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists: Generalize to stack DFAs for break properties

The Unicode breaking algorithms are supposed to be implemented by
executing DFAs in priority order, stopping at the first one that
succeeds.  (In many cases a DFA isn't needed, and we can unconditionally
say that there is or isn't a break at a given position simply by looking
at the characters on either side of it.)

But it was a significant amount of work to get from where perl started
to be able to do that.  And it hasn't been necessary until now.  In most
cases, a single DFA sufficed, and where it didn't, a more complicated
single DFA took care of the stacking.

But this has become untenable in Unicode 15.1, so I ended up doing the
work to implement their algorithm.  The result is more, but simpler
DFAs, and it becomes easier to add new ones, as they don't have to
interact with other ones.  The stacking does that for them.

This commit implements a separate DFA table beyond the x,y lookup table.
If the decision that this is a breakable position requires a DFA, the
x,y contents are an index into this separate table, which contains the
DFA to follow.  The first element gives the case statement number to use
to execute the DFA.  The second element gives the value to return if the
DFA succeeds.  If it fails, the code adds 2 to the index to get the next
thing to try.
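
The table walk described above might look roughly like the following
sketch.  Everything named here is hypothetical: the case numbers, the
context struct, and the stand-in predicates (real DFAs examine the
surrounding string).  It also assumes each chain ends with an entry that
always succeeds, which the commit message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical case numbers; DFA_ALWAYS is an assumed chain terminator. */
enum { DFA_ALWAYS, DFA_PREV_IS_DIGIT, DFA_NEXT_IS_DIGIT };

/* Pairs of (case number, value to return if that DFA succeeds).
 * In this sketch 0 means "no break" and 1 means "break". */
static const int dfa_table[] = {
    /* index 0 */ DFA_PREV_IS_DIGIT, 0,
    /* index 2 */ DFA_NEXT_IS_DIGIT, 0,
    /* index 4 */ DFA_ALWAYS,        1,
};

/* Stand-in for whatever context a real DFA would inspect. */
struct context { bool prev_is_digit; bool next_is_digit; };

/* Execute one DFA, selected by its case number. */
static bool
run_dfa(int case_number, const struct context *ctx)
{
    switch (case_number) {
    case DFA_ALWAYS:        return true;
    case DFA_PREV_IS_DIGIT: return ctx->prev_is_digit;
    case DFA_NEXT_IS_DIGIT: return ctx->next_is_digit;
    default:
        assert(!"unknown DFA case number");
        return false;
    }
}

/* Start at the index stored in the x,y cell; try each DFA in priority
 * order, moving on by 2 whenever one fails. */
static int
resolve(size_t index, const struct context *ctx)
{
    assert(ctx != NULL);

    for (;; index += 2) {
        assert(index + 1 < sizeof dfa_table / sizeof dfa_table[0]);
        if (run_dfa(dfa_table[index], ctx))
            return dfa_table[index + 1];
    }
}
```

Because each entry only decides for itself, a new rule can be added by
inserting another pair into the chain without touching its neighbors.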


  Commit: 62f71dcc711ec8a153adf8ea2fb9b8bc0184dd8a
      
https://github.com/Perl/perl5/commit/62f71dcc711ec8a153adf8ea2fb9b8bc0184dd8a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Use new DFA scheme for horizontal white space

Perl doesn't follow the Unicode standard with regard to its treatment of
white space, in particular sequences of horizontal white space.  Unicode
allows "tailoring" of its rules for local situations, and Perl
traditionally with \B has treated all sequences of white space as a
single unit.  Unicode originally considered each space in a sequence of
them as a separate unit.  A perl program would want them all as a single
unit.  Unicode eventually came round to our way of thinking, but not
entirely, as comments unaffected by this commit indicate.

The DFA for this situation does not fit in with the new stackable DFA
scheme, and would start failing tests a few commits later as the shim
code is removed.  Convert to the new scheme, which allows us to call the
functions that affect a single cell twice with effect.  The order is
immaterial, but one call installs a default behavior, and the other a DFA
that ends up being executed first to override that behavior in certain
(rare) situations.


  Commit: 728a27770d0c3b202ef0e963f7b60c3ebe8cb4a3
      
https://github.com/Perl/perl5/commit/728a27770d0c3b202ef0e963f7b60c3ebe8cb4a3
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove accesses of unused parameter

set_cells() no longer reads this parameter; no need to pass it nor set it
up.


  Commit: 14bb467a0970422e0b2f95290b08cecb5f88ffc3
      
https://github.com/Perl/perl5/commit/14bb467a0970422e0b2f95290b08cecb5f88ffc3
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove a temporary work-around

This code was needed because a commit a few back reversed the order in
which the Unicode rules are applied.  Now that a generalized DFA scheme
is in use, it is no longer needed.


  Commit: 5222f5c1b67ecd5d277ff47827801345bf312b3a
      
https://github.com/Perl/perl5/commit/5222f5c1b67ecd5d277ff47827801345bf312b3a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove a no longer used enum

The new generalized DFA scheme makes this value moot; it was used to get
around not having such a scheme.


  Commit: 300fbe95670cc110ea5712e27577df1094bab860
      
https://github.com/Perl/perl5/commit/300fbe95670cc110ea5712e27577df1094bab860
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove no longer used function


  Commit: 67707f3cf58a75fb1ff36c2e2115bd7ffeb9d21e
      
https://github.com/Perl/perl5/commit/67707f3cf58a75fb1ff36c2e2115bd7ffeb9d21e
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Look for a DFA optimization possibility

If both branches of an if/else lead to the same result, skip the test and
set the result unconditionally.  That's what this commit does for DFAs
that get the same value if they succeed as when they don't.

There is one current case where the DFA can return an anomalous result,
so it can't be optimized out.  Add a field to the hash entry defining
that entry, so it doesn't get optimized.
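
The check amounts to something like this sketch.  The struct and field
names are hypothetical, and the real logic lives in Perl in
mk_invlists.pl rather than in C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one DFA entry. */
struct dfa_rule {
    int  case_number;     /* which DFA to execute */
    int  on_success;      /* value returned if it succeeds */
    bool never_optimize;  /* set for the one anomalous DFA */
};

/* True when executing the rule cannot change the outcome, so the
 * value can be stored directly and the DFA dropped. */
static bool
rule_is_redundant(const struct dfa_rule *rule, int value_on_failure)
{
    assert(rule != NULL);
    return !rule->never_optimize
        && rule->on_success == value_on_failure;
}
```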


  Commit: 6224d071fc869aa0ad6ec62e5852188a4bcecbb0
      
https://github.com/Perl/perl5/commit/6224d071fc869aa0ad6ec62e5852188a4bcecbb0
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl

  Log Message:
  -----------
  mk_invlists: Remove hard-coded numbers

A couple of commits ago, the last necessarily-hard-coded DFA enum
besides 0 and 1 was removed.  This allows for all the rest to be
assigned by using the value of an incrementing variable.

This makes it easy to add DFAs in the middle of existing ones, as will
happen as future Unicode releases come our way.


  Commit: 2521a049a533f398a8ed71304b7e395555a7f08a
      
https://github.com/Perl/perl5/commit/2521a049a533f398a8ed71304b7e395555a7f08a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c
    M regexp_constants.h

  Log Message:
  -----------
  mk_invlists: Add a shorter form DFA

This is just to make the rules easier to read.


  Commit: cb4b9028b02f8e42806258aca1b5f3d5f18eb498
      
https://github.com/Perl/perl5/commit/cb4b9028b02f8e42806258aca1b5f3d5f18eb498
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M lib/Unicode/UCD.t

  Log Message:
  -----------
  lib/Unicode/UCD.t: Prepare for Unicode 15.1

The numeric value for U+5146 changed in 15.1.


  Commit: 01207563b6f7538f67abc5108cc899067f25db3d
      
https://github.com/Perl/perl5/commit/01207563b6f7538f67abc5108cc899067f25db3d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M lib/Unicode/UCD.t

  Log Message:
  -----------
  UCD.t: Skip test which fails on 32 bit words

In Unicode 15.1, the ideograph U+4EAC now has a numeric value, and that
value is 10 quadrillion (1e+16).  This is the first instance in Unicode
of an integer not fitting in a 32 bit word, as this requires 54 bits.
One of the tests in UCD.t requires round-trip equality in converting
from string to number and back; skip it for this case and any future
similar ones.
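
For anyone wanting to check the arithmetic, here is a small sketch (the
helper names are made up for this illustration) that counts the bits a
value needs and tests it against a word size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bits needed to represent value as an unsigned integer. */
static unsigned
bits_needed(uint64_t value)
{
    unsigned bits = 0;

    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Does value fit in an unsigned word of word_bits bits? */
static bool
fits_in_word(uint64_t value, unsigned word_bits)
{
    assert(word_bits >= 1 && word_bits <= 64);
    return bits_needed(value) <= word_bits;
}
```

Calling fits_in_word(UINT64_C(10000000000000000), 32) reports that the
new value for U+4EAC does not fit, while the same call with 64 reports
that it does.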

I find it interesting that U+4EAC is listed as having the meaning
"capital city".


  Commit: 3bd216255ce5263db558c9ed82d4aa4bb4816663
      
https://github.com/Perl/perl5/commit/3bd216255ce5263db558c9ed82d4aa4bb4816663
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M regen/mk_invlists.pl
    M regexec.c

  Log Message:
  -----------
  mk_invlists/regexec.c: Prepare for Unicode 15.1


  Commit: 1f497ea027b2899b12daefa48c6be602f4e91bb3
      
https://github.com/Perl/perl5/commit/1f497ea027b2899b12daefa48c6be602f4e91bb3
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M lib/unicore/mktables
    M regcharclass.h

  Log Message:
  -----------
  mktables: Prepare for Unicode 15.1


  Commit: b75c7517558990201b788d174f73bd2d4248da89
      
https://github.com/Perl/perl5/commit/b75c7517558990201b788d174f73bd2d4248da89
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M MANIFEST
    M charclass_invlists.inc
    M lib/Unicode/UCD.t
    M lib/unicore/ArabicShaping.txt
    M lib/unicore/BidiBrackets.txt
    M lib/unicore/BidiMirroring.txt
    M lib/unicore/Blocks.txt
    M lib/unicore/CJKRadicals.txt
    M lib/unicore/CaseFolding.txt
    M lib/unicore/CompositionExclusions.txt
    M lib/unicore/DAge.txt
    M lib/unicore/DCoreProperties.txt
    M lib/unicore/DNormalizationProps.txt
    M lib/unicore/EastAsianWidth.txt
    M lib/unicore/EmojiSources.txt
    M lib/unicore/EquivalentUnifiedIdeograph.txt
    M lib/unicore/HangulSyllableType.txt
    M lib/unicore/IdStatus.txt
    M lib/unicore/IdType.txt
    M lib/unicore/Index.txt
    M lib/unicore/IndicPositionalCategory.txt
    M lib/unicore/IndicSyllabicCategory.txt
    M lib/unicore/Jamo.txt
    M lib/unicore/LineBreak.txt
    M lib/unicore/NameAliases.txt
    M lib/unicore/NamedSequences.txt
    M lib/unicore/NamedSqProv.txt
    M lib/unicore/NamesList.txt
    M lib/unicore/NormTest.txt
    M lib/unicore/NormalizationCorrections.txt
    M lib/unicore/PropList.txt
    M lib/unicore/PropValueAliases.txt
    M lib/unicore/PropertyAliases.txt
    M lib/unicore/ReadMe.txt
    M lib/unicore/ScriptExtensions.txt
    M lib/unicore/Scripts.txt
    M lib/unicore/SpecialCasing.txt
    M lib/unicore/StandardizedVariants.txt
    M lib/unicore/UnicodeData.txt
    M lib/unicore/VerticalOrientation.txt
    M lib/unicore/auxiliary/GCBTest.txt
    M lib/unicore/auxiliary/GraphemeBreakProperty.txt
    M lib/unicore/auxiliary/LBTest.txt
    M lib/unicore/auxiliary/SBTest.txt
    M lib/unicore/auxiliary/SentenceBreakProperty.txt
    M lib/unicore/auxiliary/WBTest.txt
    M lib/unicore/auxiliary/WordBreakProperty.txt
    M lib/unicore/emoji/emoji.txt
    M lib/unicore/extracted/DBidiClass.txt
    M lib/unicore/extracted/DBinaryProperties.txt
    M lib/unicore/extracted/DCombiningClass.txt
    M lib/unicore/extracted/DDecompositionType.txt
    M lib/unicore/extracted/DEastAsianWidth.txt
    M lib/unicore/extracted/DGeneralCategory.txt
    M lib/unicore/extracted/DJoinGroup.txt
    M lib/unicore/extracted/DJoinType.txt
    M lib/unicore/extracted/DLineBreak.txt
    M lib/unicore/extracted/DNumType.txt
    M lib/unicore/extracted/DNumValues.txt
    A lib/unicore/intentional.txt
    M lib/unicore/uni_keywords.pl
    M lib/unicore/version
    M regcharclass.h
    M regen/mk_invlists.pl
    M regexp_constants.h
    M uni_keywords.h
    M unicode_constants.h

  Log Message:
  -----------
  mk_invlists: Restore calculation of new keywords, etc

Now that we are ready to use a new Unicode version, we have to regenerate
everything.  This was turned off earlier in this branch temporarily
until now so as to speed up the testing, as it was known these values
wouldn't change until now.


  Commit: 2fbb3d680f790e39d4ec24b3bbebe9a4039f8e34
      
https://github.com/Perl/perl5/commit/2fbb3d680f790e39d4ec24b3bbebe9a4039f8e34
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regexp_constants.h
    M uni_keywords.h

  Log Message:
  -----------
  mk_invlists: Include cells in calculating column widths

This program generates tables for the Break properties that are somewhat
human readable.  Before this commit, just the heading line for a column
determined its width.  This commit factors in the maximum width of any
cell in the column as well.  It used to be that this required a separate
pass, and so wasn't done.  But now that separate pass is required anyway
for other reasons, and it is simple to add to it this check.
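
The width calculation is, in spirit, the following sketch (written in C
for illustration only; the generator itself is Perl, and the function
name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Width of one column: the longest of its heading and all its cells. */
static size_t
column_width(const char *heading, const char *const *cells, size_t n_cells)
{
    size_t width;

    assert(heading != NULL && (cells != NULL || n_cells == 0));

    width = strlen(heading);
    for (size_t i = 0; i < n_cells; i++) {
        size_t len = strlen(cells[i]);

        if (len > width)
            width = len;
    }
    return width;
}
```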


  Commit: 5ed43b075c536111a720f254966a942c715ef48c
      
https://github.com/Perl/perl5/commit/5ed43b075c536111a720f254966a942c715ef48c
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regexec.c
    M regexp_constants.h
    M uni_keywords.h

  Log Message:
  -----------
  mk_invlists/regexec.c: Prepare for Unicode 16.0


  Commit: 845c437d2e5081da6297f0327b7077699fe0469a
      
https://github.com/Perl/perl5/commit/845c437d2e5081da6297f0327b7077699fe0469a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M lib/unicore/mktables
    M lib/unicore/uni_keywords.pl
    M regcharclass.h
    M regexp_constants.h
    M uni_keywords.h

  Log Message:
  -----------
  mktables: Prepare for Unicode 16.0


  Commit: 0fb7536d663f8f5e08bf23a72974e7e8a87ae60e
      
https://github.com/Perl/perl5/commit/0fb7536d663f8f5e08bf23a72974e7e8a87ae60e
  Author: Unicode Consortium <unicode.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M MANIFEST
    M charclass_invlists.inc
    M lib/Unicode/UCD.t
    M lib/unicore/ArabicShaping.txt
    M lib/unicore/BidiBrackets.txt
    M lib/unicore/BidiMirroring.txt
    M lib/unicore/Blocks.txt
    M lib/unicore/CJKRadicals.txt
    M lib/unicore/CaseFolding.txt
    M lib/unicore/CompositionExclusions.txt
    M lib/unicore/DAge.txt
    M lib/unicore/DCoreProperties.txt
    M lib/unicore/DNormalizationProps.txt
    A lib/unicore/DoNotEmit.txt
    M lib/unicore/EastAsianWidth.txt
    M lib/unicore/EmojiSources.txt
    M lib/unicore/EquivalentUnifiedIdeograph.txt
    M lib/unicore/HangulSyllableType.txt
    M lib/unicore/IdStatus.txt
    M lib/unicore/IdType.txt
    M lib/unicore/Index.txt
    M lib/unicore/IndicPositionalCategory.txt
    M lib/unicore/IndicSyllabicCategory.txt
    M lib/unicore/Jamo.txt
    M lib/unicore/LineBreak.txt
    M lib/unicore/NameAliases.txt
    M lib/unicore/NamedSequences.txt
    M lib/unicore/NamedSqProv.txt
    M lib/unicore/NamesList.txt
    M lib/unicore/NormTest.txt
    M lib/unicore/NormalizationCorrections.txt
    M lib/unicore/PropList.txt
    M lib/unicore/PropValueAliases.txt
    M lib/unicore/PropertyAliases.txt
    M lib/unicore/ReadMe.txt
    M lib/unicore/ScriptExtensions.txt
    M lib/unicore/Scripts.txt
    M lib/unicore/SpecialCasing.txt
    M lib/unicore/StandardizedVariants.txt
    M lib/unicore/TestNorm.pl
    M lib/unicore/UnicodeData.txt
    A lib/unicore/Unikemet.txt
    M lib/unicore/VerticalOrientation.txt
    M lib/unicore/auxiliary/GCBTest.txt
    M lib/unicore/auxiliary/GraphemeBreakProperty.txt
    M lib/unicore/auxiliary/LBTest.txt
    M lib/unicore/auxiliary/SBTest.txt
    M lib/unicore/auxiliary/SentenceBreakProperty.txt
    M lib/unicore/auxiliary/WBTest.txt
    M lib/unicore/auxiliary/WordBreakProperty.txt
    M lib/unicore/emoji/emoji.txt
    M lib/unicore/extracted/DBidiClass.txt
    M lib/unicore/extracted/DBinaryProperties.txt
    M lib/unicore/extracted/DCombiningClass.txt
    M lib/unicore/extracted/DDecompositionType.txt
    M lib/unicore/extracted/DEastAsianWidth.txt
    M lib/unicore/extracted/DGeneralCategory.txt
    M lib/unicore/extracted/DJoinGroup.txt
    M lib/unicore/extracted/DJoinType.txt
    M lib/unicore/extracted/DLineBreak.txt
    M lib/unicore/extracted/DNumType.txt
    M lib/unicore/extracted/DNumValues.txt
    M lib/unicore/uni_keywords.pl
    M lib/unicore/version
    M regcharclass.h
    M regexp_constants.h
    M uni_keywords.h
    M unicode_constants.h

  Log Message:
  -----------
  Add Unicode 16.0

This includes updates to a few perl files that need to know the
current Unicode version, and regenerating perl files that depend on the
Unicode data.


  Commit: 23d838275b4819f8e6b3768a67dff7ed62cc3133
      
https://github.com/Perl/perl5/commit/23d838275b4819f8e6b3768a67dff7ed62cc3133
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M lib/unicore/mktables
    M lib/unicore/uni_keywords.pl
    M regcharclass.h
    M regexp_constants.h
    M uni_keywords.h

  Log Message:
  -----------
  mktables: Note break table code for Unicode 16.0 is updated


  Commit: 87ab6eb9b6671eddae97502a08ac0b33d3367d0e
      
https://github.com/Perl/perl5/commit/87ab6eb9b6671eddae97502a08ac0b33d3367d0e
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M lib/unicore/uni_keywords.pl
    M regcharclass.h
    M regen/mk_invlists.pl
    M regexp_constants.h
    M uni_keywords.h

  Log Message:
  -----------
  mk_invlists: Restore generating EBCDIC

This had been turned off in this branch to speed up compilation, and
hence development.  The code mostly changed in this branch is the same
as in ASCII anyway.  It could have become an issue only if someone tried
to bisect on an EBCDIC machine, which I don't believe has happened, if
ever, in decades.


  Commit: 1b7d99229cac051299930927aec2c25c44d69823
      
https://github.com/Perl/perl5/commit/1b7d99229cac051299930927aec2c25c44d69823
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M t/porting/regen.t

  Log Message:
  -----------
  Revert "Temporarily skip regen porting test in this branch"

This temporary commit has now served its purpose.


  Commit: 9a483158fe0f1bdcd4461c56f3bb4afbcb29e4ae
      
https://github.com/Perl/perl5/commit/9a483158fe0f1bdcd4461c56f3bb4afbcb29e4ae
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M charclass_invlists.inc
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regexp_constants.h
    M uni_keywords.h

  Log Message:
  -----------
  mk_invlists: Update comments


  Commit: 7c4efc433361b4259ac336d885701248c97471a2
      
https://github.com/Perl/perl5/commit/7c4efc433361b4259ac336d885701248c97471a2
  Author: Karl Williamson <k...@cpan.org>
  Date:   2025-04-20 (Sun, 20 Apr 2025)

  Changed paths:
    M pod/perldelta.pod

  Log Message:
  -----------
  perldelta for Unicode update


Compare: https://github.com/Perl/perl5/compare/c308ac4c9085...7c4efc433361
