Branch: refs/heads/blead
Home: https://github.com/Perl/perl5
Commit: 31c9996116abb1e2bec2f9a6477ad164dd32062d
https://github.com/Perl/perl5/commit/31c9996116abb1e2bec2f9a6477ad164dd32062d
Author: Karl Williamson <[email protected]>
Date: 2026-01-15 (Thu, 15 Jan 2026)
Changed paths:
M charclass_invlists.inc
M embed.h
M handy.h
M l1_char_class_tab.h
M lib/unicore/uni_keywords.pl
M regen/mk_PL_charclass.pl
M regexp_constants.h
M uni_keywords.h
Log Message:
-----------
Add class for underscore character to l1_char_class_tab.h
l1_char_class_tab.h categorizes characters in the Latin1 range into
various classes, mostly the POSIX classes like [:word:]. Each
character has a bit set for every class it is a member of.
These values are placed in a 256-element array, and the ordinal value of
a character is used as an index into it to determine quickly whether the
character is a member of a given class.
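As a rough illustration of the lookup described above, here is a minimal sketch. The names (class_bits, CC_MASK, is_class) and the class numbering are hypothetical, not the identifiers used in handy.h or l1_char_class_tab.h. The real table is generated at build time and covers all of Latin1; this one is filled at run time for ASCII letters, digits, and underscore only, and assumes an ASCII platform.

```c
#include <stdint.h>

/* Hypothetical class numbers, for illustration only. */
enum { CC_DIGIT, CC_ALPHA, CC_WORDCHAR };

/* One bit per class. */
#define CC_MASK(classnum) (UINT32_C(1) << (classnum))

/* One entry per code point 0..255; each entry is the OR of the masks
 * of every class the character belongs to. */
static uint32_t class_bits[256];

/* Fill the table for ASCII only (simplification: the real table also
 * classifies the upper Latin1 characters). */
void init_class_bits(void)
{
    for (int c = 0; c < 256; c++) {
        uint32_t bits = 0;
        if (c >= '0' && c <= '9')
            bits |= CC_MASK(CC_DIGIT) | CC_MASK(CC_WORDCHAR);
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            bits |= CC_MASK(CC_ALPHA) | CC_MASK(CC_WORDCHAR);
        if (c == '_')
            bits |= CC_MASK(CC_WORDCHAR);
        class_bits[c] = bits;
    }
}

/* Membership is one array index plus one mask test. */
int is_class(unsigned char c, int classnum)
{
    return (class_bits[c] & CC_MASK(classnum)) != 0;
}
```

After init_class_bits() has run, is_class('7', CC_DIGIT) is true and is_class('_', CC_ALPHA) is false.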
Besides the POSIX classes, there are some classes that make it more
convenient and/or faster for our code. For example, there is a class
that allows us to quickly know if a given character is one that needs to
be preceded by a backslash by quotemeta().
This commit adds a class for the single character underscore '_', and a
macro that tests, using a single conditional, whether a character is
either an underscore or a member of some other given class.
This means code that checks whether character X is either an underscore
or a member of class Y can change to eliminate one conditional.
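The saving can be sketched as follows. This is an assumption-laden illustration, not the macro the commit adds: class_bits, CC_MASK, CC_UNDERSCORE, and both function names are hypothetical, and the table covers only ASCII digits, letters, and underscore on an ASCII platform.

```c
#include <stdint.h>

/* Hypothetical class numbers; CC_UNDERSCORE stands in for the new class. */
enum { CC_DIGIT, CC_ALPHA, CC_UNDERSCORE };

#define CC_MASK(n) (UINT32_C(1) << (n))

static uint32_t class_bits[256];

/* Simplified run-time fill, ASCII only. */
void init_class_bits(void)
{
    for (int c = 0; c < 256; c++) {
        uint32_t bits = 0;
        if (c >= '0' && c <= '9')
            bits |= CC_MASK(CC_DIGIT);
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            bits |= CC_MASK(CC_ALPHA);
        if (c == '_')
            bits |= CC_MASK(CC_UNDERSCORE);
        class_bits[c] = bits;
    }
}

/* Without an underscore class: a comparison plus a table test. */
int is_class_or_underscore_two_tests(unsigned char c, int classnum)
{
    return c == '_' || (class_bits[c] & CC_MASK(classnum)) != 0;
}

/* With an underscore class: OR the two masks together and test once.
 * When classnum is a compile-time constant, the combined mask is too. */
int is_class_or_underscore(unsigned char c, int classnum)
{
    return (class_bits[c]
            & (CC_MASK(classnum) | CC_MASK(CC_UNDERSCORE))) != 0;
}
```

Both functions return the same answer for every byte value; the second simply reaches it with one mask test instead of a comparison followed by a mask test.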
Thus the reason to do this is efficiency. Currently, the only places
that do this explicitly are in non-hot code. But I have work in progress
with hot code that could benefit from this.
The only downside of doing this is that it uses up one bit of the 32
available (without shenanigans) for such classes, leaving 4 spare. But
before this release, the last time any new bit had been used up was
5.32, so the rate of using up these spare bits is quite low.
This bit could be reclaimed because the IDFIRST class in the Latin1
range is identical to ALPHA plus the underscore, so it could be
rewritten as that combination and its bit freed up. However, this would
require adding some macros that take two class parameters instead of
one. I briefly thought about doing that now, but since we have spare
bits and the rate of using them up is low, I didn't think it was worth
it at this time.
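For readers wondering what a macro taking two class parameters might look like, here is a hypothetical sketch. Nothing here is in the commit: is_either_class, class_bits, and the class numbers are invented for illustration, and the table is built by hand (ASCII letters and underscore only, ASCII platform assumed) so that the IDFIRST bit matches ALPHA plus underscore, as the message says is true in the Latin1 range.

```c
#include <stdint.h>

/* Hypothetical class numbers, for illustration only. */
enum { CC_ALPHA, CC_UNDERSCORE, CC_IDFIRST };

#define CC_MASK(n) (UINT32_C(1) << (n))

static uint32_t class_bits[256];

/* Simplified run-time fill: IDFIRST is set exactly where ALPHA or
 * UNDERSCORE is set, mirroring the equivalence described above. */
void init_class_bits(void)
{
    for (int c = 0; c < 256; c++) {
        uint32_t bits = 0;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            bits |= CC_MASK(CC_ALPHA) | CC_MASK(CC_IDFIRST);
        if (c == '_')
            bits |= CC_MASK(CC_UNDERSCORE) | CC_MASK(CC_IDFIRST);
        class_bits[c] = bits;
    }
}

/* Hypothetical two-class test: true if c is in class_a or class_b. */
#define is_either_class(c, class_a, class_b)                      \
    ((class_bits[(unsigned char)(c)]                              \
      & (CC_MASK(class_a) | CC_MASK(class_b))) != 0)

/* IDFIRST through its own dedicated bit. */
int is_idfirst_dedicated(unsigned char c)
{
    return (class_bits[c] & CC_MASK(CC_IDFIRST)) != 0;
}

/* IDFIRST expressed as ALPHA plus underscore; with this in place the
 * dedicated bit would no longer be needed. */
int is_idfirst_combined(unsigned char c)
{
    return is_either_class(c, CC_ALPHA, CC_UNDERSCORE);
}
```

The two functions agree on every byte value in this table, which is the property that would let the dedicated bit be reclaimed.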
\w in this range is ALPHANUMERIC plus underscore. But its use is more
embedded than IDFIRST is, so an attempt to reclaim its bit would require
more effort.
Commit: 2bdcfe0cb24c894afd4c2a62625040158e2fc684
https://github.com/Perl/perl5/commit/2bdcfe0cb24c894afd4c2a62625040158e2fc684
Author: Karl Williamson <[email protected]>
Date: 2026-01-15 (Thu, 15 Jan 2026)
Changed paths:
M handy.h
M toke.c
Log Message:
-----------
Add isDIGIT_or_UNDERSCORE() and use it
This uses the macro added in the previous commit to create this new
macro, and changes code in toke.c to use it.
toke.c is not hot code, but this demonstrates that the new scheme works,
and makes the code in toke.c a bit cleaner.
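A sketch of how a digit-or-underscore test could be used when scanning something like a numeric literal with underscore separators. This is not the code changed in toke.c: is_digit_or_underscore is a stand-in for isDIGIT_or_UNDERSCORE() whose real definition in handy.h may differ, and the bit positions, table, and digit_run_length() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DIGIT_BIT      (UINT32_C(1) << 0)
#define UNDERSCORE_BIT (UINT32_C(1) << 1)

/* Entries not listed are zero, i.e. members of neither class. */
static const uint32_t class_bits[256] = {
    ['0'] = DIGIT_BIT, ['1'] = DIGIT_BIT, ['2'] = DIGIT_BIT,
    ['3'] = DIGIT_BIT, ['4'] = DIGIT_BIT, ['5'] = DIGIT_BIT,
    ['6'] = DIGIT_BIT, ['7'] = DIGIT_BIT, ['8'] = DIGIT_BIT,
    ['9'] = DIGIT_BIT,
    ['_'] = UNDERSCORE_BIT,
};

/* One table lookup and one mask test cover both cases. */
#define is_digit_or_underscore(c)                                 \
    ((class_bits[(unsigned char)(c)]                              \
      & (DIGIT_BIT | UNDERSCORE_BIT)) != 0)

/* Length of the leading run of digits and underscores in s.
 * The terminating NUL is in neither class, so the loop stops there. */
size_t digit_run_length(const char *s)
{
    size_t n = 0;
    while (is_digit_or_underscore(s[n]))
        n++;
    return n;
}
```

For the input "1_000_000;" digit_run_length() returns 9, stopping at the semicolon.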
Compare: https://github.com/Perl/perl5/compare/b116971bb030...2bdcfe0cb24c