In perl.git, the branch smoke-me/khw-invariant has been created <https://perl5.git.perl.org/perl.git/commitdiff/0053f1991c830aa13e56a140877e41c7f7db28d6?hp=0000000000000000000000000000000000000000>
        at  0053f1991c830aa13e56a140877e41c7f7db28d6 (commit)

- Log -----------------------------------------------------------------
commit 0053f1991c830aa13e56a140877e41c7f7db28d6
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Nov 26 23:12:24 2017 -0700

    f

commit 3e2eef46aa55fe5dc028208fcb08c1a3a9984bc2
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Nov 26 22:32:58 2017 -0700

    Fix and clarify the pod for utf8_length()

    Contrary to what it previously said, it does not croak.  This
    clarifies what happens if the start and end pointers have the same
    value.

commit 30b1ba44956228f29bf7068c4595480d2af6e10a
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 22 23:10:58 2017 -0700

    pp_multiconcat() Use faster UTF-8 variant counting

commit 7c096f35089a6ef113a94a3c4b3fb3a366c81061
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 22 23:10:01 2017 -0700

    S_multiconcat() Use faster variant counting

commit cf890f508c70b9c4f560c27f62cf4f6d570d81ba
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 22 23:12:37 2017 -0700

    toke.c: lex_stuff_pvn() Use faster UTF-8 variant count

commit a75c2ea362da78769ef0538ea078d07460297e59
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Nov 23 09:42:24 2017 -0700

    XXX Clean up, names columns comments svgrow mention EBCDIC

commit 32a07c33d2ed6c1cc42fb5e27c2fa431fb72e2de
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Nov 22 22:30:16 2017 -0700

    Add variant_under_utf8_count() core function

    This function takes a string that isn't encoded in UTF-8 (hence is
    assumed to be in Latin1), and counts how many of the bytes therein
    would change if it were to be translated into UTF-8.  Each such byte
    will occupy two UTF-8 bytes.

    This function is useful for calculating the expansion factor
    precisely when converting to UTF-8, so as to know how much to malloc.

    This function uses a non-obvious method to do the calculations
    word-at-a-time, as opposed to the byte-at-a-time method used now, and
    hence should be much faster than the current methods.

    And, in fact the comparison below shows that the new method uses 1/8
    the conditionals and 1/8 of the reads that the old one did on a
    platform with 64-bit words.

               byte   word
             ------ ------
        Ir   100.00 363.49
        Dr   100.00 798.48
        Dw   100.00 102.22
      COND   100.00 799.38
       IND   100.00 100.00

    COND_m   100.00 100.00
     IND_m   100.00 100.00

     Ir_m1   100.00 100.00
     Dr_m1   100.00 100.00
     Dw_m1   100.00 100.00

     Ir_mm   100.00 100.00
     Dr_mm   100.00 100.00
     Dw_mm   100.00 100.00

    I found this trick on the internet many years ago, but I can't seem
    to find it again to give them credit.

-----------------------------------------------------------------------

--
Perl5 Master Repository
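
The commit message for variant_under_utf8_count() above describes counting, a word at a time, the bytes that would change under conversion to UTF-8. On an ASCII platform those are the bytes with the high bit set. The following is only a minimal sketch of one well-known way to do such a count, assuming 64-bit words and an ASCII (not EBCDIC) platform; it is not taken from the commit, and the name count_high_bit_bytes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT the function added in the commit.
 * Counts the bytes in s[0..len-1] whose high bit is set, i.e. the
 * Latin1 bytes that would each need two bytes in UTF-8 (ASCII
 * platforms only; EBCDIC is not handled by this sketch). */
static size_t
count_high_bit_bytes(const unsigned char *s, size_t len)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);
    size_t count = 0;

    /* Word-at-a-time part: one load and no per-byte branch. */
    while (len >= sizeof(uint64_t)) {
        uint64_t word;

        /* memcpy avoids unaligned-access and aliasing problems */
        memcpy(&word, s, sizeof word);

        /* Shift each byte's high bit down to its low bit and mask
         * off everything else, leaving 0 or 1 in every byte.
         * Multiplying by 'ones' then sums those eight values into
         * the top byte; the sum is at most 8, so it cannot overflow. */
        count += (size_t) ((((word >> 7) & ones) * ones) >> 56);

        s   += sizeof word;
        len -= sizeof word;
    }

    /* Byte-at-a-time part for the trailing partial word. */
    while (len-- > 0) {
        count += (size_t) (*s++ >> 7);
    }

    return count;
}
```

With such a count in hand, the exact size of the UTF-8 form of a Latin1 string of len bytes is len plus the count, which is the precise expansion the commit message refers to when sizing a malloc. The result does not depend on byte order, since only the per-word sum is used.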