In perl.git, the branch smoke-me/khw-utf8 has been created

<https://perl5.git.perl.org/perl.git/commitdiff/eda76014614ada6c83b771d33a470be78e5af652?hp=0000000000000000000000000000000000000000>

        at  eda76014614ada6c83b771d33a470be78e5af652 (commit)

- Log -----------------------------------------------------------------
commit eda76014614ada6c83b771d33a470be78e5af652
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Feb 4 21:52:13 2018 -0700

    Significantly speed up utf8_to_bytes()
    
    This function converts its input, in place, into its non-UTF-8
    equivalent.  For malformed inputs or those that can't be translated
    into only bytes, it returns an error, leaving the input unchanged.
    
    Prior to this commit, the meat of this function was two loops, one to
    figure out if the input was downgradable, and the second to do the
    translation if the first didn't find any problems.
    
    This commit changes that first loop to use per-word operations, at the
    expense of extra shifting and masking per iteration.  But on a 64-bit
    machine, there are 1/8 as many iterations as before.  I haven't
    benchmarked this, because I've benchmarked several similar changes
    recently, and having 1/8 the conditionals is a big win.
    
    That new loop doesn't do a full syntax check; it just verifies that
    the only start bytes in it are ones that are not for wide characters.
    
    The second loop does the translation, while verifying well-formedness.
    If a malformation is found, the changes to the input so far are backed
    out.  Since the first loop has ruled out all problems except
    malformations, the backout should very rarely be needed.
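    
    As an illustration of the per-word first loop described above, here
    is a minimal sketch in C.  It is not the code from this commit: the
    function name only_narrow_start_bytes_sketch() is hypothetical, the
    particular shift-and-mask test is an assumption about one way to do
    it, and it assumes an ASCII (not EBCDIC) platform.  On such a
    platform the only start bytes for code points below 256 are 0xC2
    and 0xC3, so any byte >= 0xC4 means a wide character (or garbage).
    Like the loop in the commit, it does not check well-formedness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the perl core API.
 * Returns false if any byte is >= 0xC4, i.e. has bits 7 and 6 set
 * (a start byte) plus at least one of bits 5..2 (code point > 0xFF). */
bool only_narrow_start_bytes_sketch(const unsigned char *s, size_t len)
{
    const uint64_t HIGH_BITS = UINT64_C(0x8080808080808080);
    size_t i = 0;

    assert(s != NULL || len == 0);

    /* Word-at-a-time part: one conditional per 8 bytes. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);       /* safe for unaligned input */

        /* Shifting left by n (n <= 7) moves bit (7 - n) of each byte
         * into that byte's bit 7; bits that spill in from the
         * neighbouring byte land lower down and are masked off. */
        uint64_t start = w & (w << 1);                    /* bits 7 and 6 */
        uint64_t wide  = (w << 2) | (w << 3) | (w << 4) | (w << 5);

        if (start & wide & HIGH_BITS)
            return false;
    }

    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; i < len; i++) {
        if (s[i] >= 0xC4)
            return false;
    }
    return true;
}
```

    The test only asks whether any lane is non-zero, so the result does
    not depend on byte order.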

commit 87c767ac1782b17093cd5a3adbc64dd67e61aebe
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Feb 4 21:47:09 2018 -0700

    APItest: Add tests for utf8_to_bytes()

commit 50e3c09c17783d2d6e1edf26531cba79a0663f59
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Feb 4 21:44:17 2018 -0700

    APItest:t/utf8_setup.pl: Display printables as themselves
    
    Instead of the harder to read \xXX

commit 8fba606f87731f6317061e54561b8d10e4aaf8ac
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jun 6 02:06:30 2017 -0600

    XXX Don't push; experimental utf8_to_bytes backout if wide char
    
    This changes utf8_to_bytes() to not do an initial scan before starting
    the conversion.  If it encounters a wide character that means it should
    fail, it undoes what it's already done.
    
    This is faster if the frequency of being called on input that can't be
    downgraded is small.
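    
    The convert-then-undo idea can be sketched in C as follows.  This is
    an illustration only, not the code from this commit: the name
    downgrade_in_place_sketch() and its signature are hypothetical, and
    it assumes an ASCII platform where 0xC2 and 0xC3 are the only valid
    start bytes for code points below 256.  Because only those two start
    bytes are accepted, every converted byte maps back to exactly one
    original sequence, so the backout can rebuild the input from the
    bytes already written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the perl core API.
 * Converts buf from UTF-8 to single bytes in place.  On success returns
 * true and stores the new length in *len.  On a wide character or a
 * malformation, restores buf to its original contents and returns false. */
bool downgrade_in_place_sketch(unsigned char *buf, size_t *len)
{
    assert(buf != NULL && len != NULL);

    const size_t n = *len;
    size_t r = 0;   /* read index into the not-yet-converted UTF-8 */
    size_t w = 0;   /* write index; always <= r */

    while (r < n) {
        unsigned char b = buf[r];

        if (b < 0x80) {                       /* ASCII: copy as-is */
            buf[w++] = b;
            r++;
        }
        else if ((b == 0xC2 || b == 0xC3)
                 && r + 1 < n
                 && (buf[r + 1] & 0xC0) == 0x80)
        {                                     /* two-byte sequence, < 0x100 */
            buf[w++] = (unsigned char)(((b & 0x03) << 6) | (buf[r + 1] & 0x3F));
            r += 2;
        }
        else {
            /* Wide character or malformation: undo.  Walk the converted
             * bytes backwards, re-expanding each one into the space just
             * below r.  The write position never drops below the bytes
             * still to be read, so nothing is clobbered. */
            while (w > 0) {
                unsigned char c = buf[--w];
                if (c < 0x80) {
                    buf[--r] = c;
                }
                else {
                    buf[--r] = (unsigned char)(0x80 | (c & 0x3F));
                    buf[--r] = (unsigned char)(0xC0 | (c >> 6));
                }
            }
            return false;
        }
    }

    *len = w;
    return true;
}
```

    The cost of the undo is proportional to how much had already been
    converted, which is why this only pays off when failures are rare.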

-----------------------------------------------------------------------

-- 
Perl5 Master Repository
