Branch: refs/heads/blead
Home: https://github.com/Perl/perl5
Commit: a67a651c71e3c673a0d1d99f5c041832ba16e0e0
    https://github.com/Perl/perl5/commit/a67a651c71e3c673a0d1d99f5c041832ba16e0e0
Author: Karl Williamson <k...@cpan.org>
Date: 2024-11-04 (Mon, 04 Nov 2024)

Changed paths:
  M perl.h

Log Message:
-----------
perl.h: Add comments regarding UTF-8 conversion table

UTF-8 is one of several sanctioned ways of encoding Unicode "code points". But a code point is, at its heart, just a non-negative integer. The mechanism of UTF-8 can't handle numbers 2**36 and higher. (And Unicode and other standards artificially limit what numbers are considered acceptable.) Perl decided to create an extension to UTF-8 for representing higher values, so it could be used for any 64-bit number.

We now have a DFA that translates UTF-8 for numbers less than 2**36. For larger numbers, a different mechanism (the older one) is used.

The DFA uses table lookup. To get it to accept larger numbers, the table would have to be widened from U8 to U16 (and the numbers in it recalculated). The table is about 180 bytes now. Widening it wouldn't consume that many more bytes in the grand scheme of things, but I don't know of anyone actually using these extremely large numbers, so I haven't felt that it is worth it.

But every so often, I get curious about what it would take, so this commit sketches that out, for possible future reference.
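
The 2**36 ceiling mentioned in the log message can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from perl.h; the function names utf8_payload_bits and utf8_max_value are hypothetical. It assumes the original UTF-8 layout with no Unicode or Perl-specific restrictions: a multi-byte start byte has as many leading 1 bits as the sequence has bytes, followed by a 0 bit, and each continuation byte carries 6 payload bits. The longest sequence this allows starts with 0xFE and is 7 bytes long; 0xFF has no 0 bit left to end the length prefix, so it falls outside this scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bits carried by a UTF-8 sequence of `len` bytes (1..7) under the
 * original start-byte scheme. A single byte (0xxxxxxx) carries 7 bits. A
 * multi-byte start byte spends `len` bits on leading 1s plus one 0 bit,
 * leaving 7 - len payload bits; each of the len - 1 continuation bytes
 * (10xxxxxx) carries 6 more. */
static unsigned utf8_payload_bits(unsigned len)
{
    assert(len >= 1 && len <= 7);
    if (len == 1)
        return 7;
    return (7 - len) + 6 * (len - 1);
}

/* Largest non-negative integer representable in `len` bytes. */
static uint64_t utf8_max_value(unsigned len)
{
    return (UINT64_C(1) << utf8_payload_bits(len)) - 1;
}
```

For len = 7 the start byte contributes no payload bits and the six continuation bytes contribute 36, so the largest representable value is 2**36 - 1, which matches the limit stated above.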