Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: a67a651c71e3c673a0d1d99f5c041832ba16e0e0
      
https://github.com/Perl/perl5/commit/a67a651c71e3c673a0d1d99f5c041832ba16e0e0
  Author: Karl Williamson <k...@cpan.org>
  Date:   2024-11-04 (Mon, 04 Nov 2024)

  Changed paths:
    M perl.h

  Log Message:
  -----------
  perl.h: Add comments regarding UTF-8 conversion table

UTF-8 is one of several sanctioned ways of encoding Unicode "code
points".  But a code point is, at its heart, just a non-negative integer.
The UTF-8 mechanism itself can't handle numbers 2**36 and higher: even
its longest form, a 0xFE start byte followed by six continuation bytes,
carries only 36 bits.  (And Unicode and other standards artificially
limit which numbers are considered acceptable.)
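The 2**36 limit falls out of UTF-8's byte pattern. The small C sketch below is not code from perl.h, and its helper names are hypothetical. It counts the payload bits of an n-byte sequence that follows the pattern. For n >= 2, the start byte has n leading 1-bits and then a 0-bit, and each of the n - 1 continuation bytes (10xxxxxx) carries 6 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: payload bits in an n-byte sequence following the
 * UTF-8 byte pattern.  For n >= 2 the start byte holds n 1-bits, a 0-bit,
 * then (7 - n) payload bits; each continuation byte adds 6 more.
 * n == 7 is a start byte of 0xFE, which has no payload bits left, and
 * 0xFF has no room for the terminating 0-bit at all. */
static unsigned utf8_payload_bits(unsigned n)
{
    assert(n >= 1 && n <= 7);
    if (n == 1)
        return 7;                       /* 0xxxxxxx: plain ASCII */
    return (7 - n) + 6 * (n - 1);       /* start-byte bits + continuation bits */
}

/* Largest value an n-byte sequence can hold: 2**bits - 1. */
static uint64_t utf8_max_value(unsigned n)
{
    return (UINT64_C(1) << utf8_payload_bits(n)) - 1;
}
```

Here `utf8_payload_bits(4)` is 21, which is enough for Unicode's U+10FFFF. `utf8_payload_bits(7)` is 36, so this pattern cannot go past 2**36 - 1.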

Perl created an extension to UTF-8 for representing higher values, so
that it can be used for any 64-bit number.

We now have a DFA (deterministic finite automaton) that decodes UTF-8
for numbers less than 2**36.  For larger numbers, a different mechanism
(the older one) is used.  The DFA is driven by table lookup.  To get it
to accept larger numbers, the table would have to be widened from U8 to
U16 (and the numbers in it recalculated).
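A table-driven decoder can be sketched as follows. This is a simplified, hypothetical C example in the general style of Björn Höhrmann's DFA decoder, not the table in perl.h. All names are invented for illustration.

The sketch only checks byte structure. It does not reject overlong forms, surrogates, or anything else Unicode forbids. It follows the UTF-8 pattern up to 7-byte sequences, so values stay below 2**36.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte classes: ASCII, continuation, start bytes of 2..7-byte sequences, invalid. */
enum { C_ASCII, C_CONT, C_LEAD2, C_LEAD3, C_LEAD4, C_LEAD5, C_LEAD6, C_LEAD7, C_BAD, N_CLASSES };

/* States: ACCEPT (at a character boundary), REJECT (malformed),
 * NEEDk (waiting for k more continuation bytes). */
enum { S_ACCEPT, S_REJECT, S_NEED1, S_NEED2, S_NEED3, S_NEED4, S_NEED5, S_NEED6, N_STATES };

static uint8_t byte_class[256];

/* Fill the byte -> class table from the start-byte ranges. */
static void init_byte_class(void)
{
    for (unsigned b = 0; b < 256; b++) {
        byte_class[b] = b < 0x80   ? C_ASCII
                      : b < 0xC0   ? C_CONT
                      : b < 0xE0   ? C_LEAD2
                      : b < 0xF0   ? C_LEAD3
                      : b < 0xF8   ? C_LEAD4
                      : b < 0xFC   ? C_LEAD5
                      : b < 0xFE   ? C_LEAD6
                      : b == 0xFE  ? C_LEAD7
                      :              C_BAD;
    }
}

/* transitions[state][class]; every entry here fits in a U8. */
#define R S_REJECT
static const uint8_t transitions[N_STATES][N_CLASSES] = {
    /*            ASCII     CONT      L2       L3       L4       L5       L6       L7       BAD */
    [S_ACCEPT] = {S_ACCEPT, R,        S_NEED1, S_NEED2, S_NEED3, S_NEED4, S_NEED5, S_NEED6, R},
    [S_REJECT] = {R,        R,        R,       R,       R,       R,       R,       R,       R},
    [S_NEED1]  = {R,        S_ACCEPT, R,       R,       R,       R,       R,       R,       R},
    [S_NEED2]  = {R,        S_NEED1,  R,       R,       R,       R,       R,       R,       R},
    [S_NEED3]  = {R,        S_NEED2,  R,       R,       R,       R,       R,       R,       R},
    [S_NEED4]  = {R,        S_NEED3,  R,       R,       R,       R,       R,       R,       R},
    [S_NEED5]  = {R,        S_NEED4,  R,       R,       R,       R,       R,       R,       R},
    [S_NEED6]  = {R,        S_NEED5,  R,       R,       R,       R,       R,       R,       R},
};
#undef R

/* Feed one byte: at a boundary, keep the start byte's payload bits
 * (mask 0xFF >> (class + 1)); otherwise shift in 6 continuation bits. */
static uint8_t decode_step(uint8_t *state, uint64_t *cp, uint8_t byte)
{
    uint8_t cls = byte_class[byte];
    *cp = (*state == S_ACCEPT) ? (uint64_t)(byte & (0xFFu >> (cls + 1)))
                               : (*cp << 6) | (byte & 0x3Fu);
    *state = transitions[*state][cls];
    return *state;
}

/* Decode one code point from s[0..len).  Returns the number of bytes
 * consumed, or 0 if the input is malformed or ends mid-sequence. */
static size_t decode_one(const uint8_t *s, size_t len, uint64_t *cp)
{
    uint8_t state = S_ACCEPT;
    for (size_t i = 0; i < len; i++) {
        decode_step(&state, cp, s[i]);
        if (state == S_ACCEPT)
            return i + 1;
        if (state == S_REJECT)
            return 0;
    }
    return 0;
}
```

In this toy version, there are 8 states and 9 classes, so every transition entry fits comfortably in a U8. Call `init_byte_class()` once before decoding.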

The table is about 180 bytes now.  Widening it wouldn't consume that
many more bytes in the grand scheme of things, but I don't know of
anyone actually using these extremely large numbers, so I haven't felt
that it is worth it.
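For a rough sense of scale, assume the table holds about 180 one-byte (U8) entries, and that widening stores the same number of entries as U16. Under those assumptions the size roughly doubles, to about 360 bytes. This ignores any change in the number of entries that recalculating the table might bring. The macro and variable names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: roughly 180 entries, matching "about 180 bytes" at one byte each. */
#define DFA_TABLE_ENTRIES 180

/* Bytes needed for the same number of entries at each width.  Extra
 * entries a recalculated table might need are not accounted for. */
static const size_t u8_table_bytes  = DFA_TABLE_ENTRIES * sizeof(uint8_t);
static const size_t u16_table_bytes = DFA_TABLE_ENTRIES * sizeof(uint16_t);
```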

But every so often I get curious about what it would take, so this
commit sketches that out in comments in perl.h, for possible future
reference.


