[Perl/perl5] a09142: Fix \d script run with unusual Unicode data layout

Karl Williamson via perl5-changes Mon, 02 Sep 2024 11:58:12 -0700

  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: a091427ee8a210221dc488b4d263632edc72e29c
      https://github.com/Perl/perl5/commit/a091427ee8a210221dc488b4d263632edc72e29c
  Author: Karl Williamson <k...@cpan.org>
  Date:   2024-09-02 (Mon, 02 Sep 2024)


  Changed paths:
    M regexec.c
    M t/re/script_run.t

  Log Message:
  -----------
  Fix \d script run with unusual Unicode data layout

This fixes GH #22535.

Unicode guarantees that the code points matched by \d occur in groups of
10 consecutive code points, with the lowest in each group having a
numeric value of 0 and the highest having a value of 9.

A script run in a regular expression pattern matches only characters in
a single script.  Further, if more than a single digit is matched, all
must come from the same group of 10 consecutive code points.

The 'Common' script has many such groups, not just 0-9.  Perl's
implementation assumed that all groups were isolated from each other in
the Unicode ordering of code points.  This is true in all but one case,
in which 5 groups adjoin each other.  This commit changes the
implementation to account for that possibility.
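
The following is a minimal sketch of the idea and is not the code in
regexec.c.  It assumes the adjoining case is the block of mathematical
digits, U+1D7CE through U+1D7FF, which holds 5 groups of 10 with no gap
between them.  The table and the names digit_range,
naive_zero_of_group, zero_of_group and same_digit_group are
hypothetical, and the table lists only three ranges for illustration.
Taking the start of a contiguous range as the group's zero works only
when groups are isolated; using the offset within the range modulo 10
also handles groups that adjoin.

```c
#include <stddef.h>
#include <stdint.h>

/* A contiguous range of decimal-digit code points (hypothetical type). */
typedef struct { uint32_t start, end; } digit_range;

/* Illustrative subset only; real Unicode data has many more ranges. */
static const digit_range ranges[] = {
    { 0x0030,  0x0039  },  /* ASCII 0-9 */
    { 0xFF10,  0xFF19  },  /* fullwidth digits */
    { 0x1D7CE, 0x1D7FF },  /* assumed: 5 adjoining mathematical groups */
};

/* Find the table entry containing cp, or NULL if cp is not listed. */
static const digit_range *find_range(uint32_t cp)
{
    for (size_t i = 0; i < sizeof ranges / sizeof ranges[0]; i++) {
        if (cp >= ranges[i].start && cp <= ranges[i].end)
            return &ranges[i];
    }
    return NULL;
}

/* Assumes every range is one isolated group: wrong for adjoining groups.
 * Returns 0 when cp is not in the table. */
static uint32_t naive_zero_of_group(uint32_t cp)
{
    const digit_range *r = find_range(cp);
    return r ? r->start : 0;
}

/* Relies on each group being exactly 10 consecutive code points, so the
 * zero is found by stepping back (offset % 10) from cp.
 * Returns 0 when cp is not in the table. */
static uint32_t zero_of_group(uint32_t cp)
{
    const digit_range *r = find_range(cp);
    return r ? cp - (cp - r->start) % 10 : 0;
}

/* True if both code points are listed digits from the same group of 10. */
static int same_digit_group(uint32_t a, uint32_t b)
{
    uint32_t za = zero_of_group(a);
    return za != 0 && za == zero_of_group(b);
}
```

With this table, U+1D7D7 and U+1D7D8 are neighbouring code points in
different groups, so same_digit_group() rejects the pair, while
naive_zero_of_group() would assign both the same zero, U+1D7CE.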



