[perl.git] branch smoke-me/khw-encode created. v5.27.8-37-g35f52632db

Karl Williamson Sat, 27 Jan 2018 17:03:49 -0800

In perl.git, the branch smoke-me/khw-encode has been created

<https://perl5.git.perl.org/perl.git/commitdiff/35f52632dbc74b979cd001d46b31b95088497d28?hp=0000000000000000000000000000000000000000>


        at  35f52632dbc74b979cd001d46b31b95088497d28 (commit)

- Log -----------------------------------------------------------------
commit 35f52632dbc74b979cd001d46b31b95088497d28
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 27 17:43:00 2018 -0700

    Add utf8n_to_uvchr_msgs()

commit ec8ed495a8f923b898912beec0d3cdf246ef0629
Author: Pali <[email protected]>
Date:   Wed Sep 13 00:30:29 2017 +0200

    Rewrite encode, decode, encode_utf8, decode_utf8 and from_to functions to XS

commit 3f035f5843f7ffb8bafcba315e0e1dac7a834f84
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 28 14:57:22 2017 -0700

    encengine.c: Properly indent code within blocks
    
    This makes it much more legible

commit c3a80fcede5376caf82d7e599813b7e834db0625
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 28 14:29:43 2017 -0700

    Speed up UTF-8 validation checking on modern perls
    
    Perl 5.26 introduced infrastructure in the core that can be used by
    Encode to check UTF-8 stream validity much faster than before.
    
    It is not clear when or if this functionality will be backported into
    Devel::PPPort, in part because there is no one available currently who
    knows how to do it, and in part because it may be that everyone else
    relies on Encode, so a general backport may not be needed.
    
    This commit replaces the current scheme for checking UTF-8 validity if
    the infrastructure is available, with one in which normal processing
    doesn't require having to decode the UTF-8 into code points.  The
    copying of characters individually from the input to the output is
    changed to be a single operation for each entire span of valid input at
    once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
    
    There are currently some problems with Encode on EBCDIC platforms.  The
    infrastructure is known to correctly work there, so I'm hopeful this
    will solve these portability issues.

commit 05d4825675873646ecdd3e4d2304f460956d7097
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 28 14:09:06 2017 -0700

    Encode/Encode.xs: Pull condition out of loop
    
    The value for this condition is known before the loop, so move it
    outside the loop.

commit e111589f21cee62da340286501e971292eea56e7
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 28 14:06:45 2017 -0700

    Encode/encode.h: Use system REPLACEMENT char if available
    
    On modern perls, there is a definition for the REPLACEMENT CHARACTER
    UTF-8 string.  Use this if available, as it is portable to EBCDIC, and
    this one isn't.

commit 000dafcffe3b1c3b3ee7114dc3e9aa23c2b20cf2
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 28 14:04:15 2017 -0700

    Encode: Add comments
    
    This documents process_utf8(), and adds another helpful comment

commit 28259d78b909be18709053dae284e8f50dc9c3af
Author: Karl Williamson <[email protected]>
Date:   Thu Dec 28 14:01:34 2017 -0700

    Encode: White space only
    
    This correctly indents things in blocks, and removes trailing space

-----------------------------------------------------------------------
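
The validate-then-copy scheme described in commit c3a80fcede can be
sketched roughly as below. This is a minimal illustration, not the code
in Encode.xs: valid_char_len() is a hypothetical stand-in for the core's
validity check (here it enforces strict Unicode well-formedness, which
may differ from what Encode accepts), copy_valid_utf8() is a made-up
name, and "handling the character in error" is simplified to emitting
U+FFFD and skipping one byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the core's check: returns the length (1..4)
 * of the well-formed UTF-8 character starting at s, or 0 if the bytes at
 * s are malformed or truncated.  Requires s < end. */
static size_t valid_char_len(const unsigned char *s, const unsigned char *end)
{
    unsigned char c = s[0];
    size_t len, i;

    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) len = 3;
    else if (c >= 0xF0 && c <= 0xF4) len = 4;
    else return 0;                       /* stray continuation or bad start */

    if ((size_t)(end - s) < len) return 0;
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;

    /* Reject overlongs, surrogates, and code points above U+10FFFF. */
    if (c == 0xE0 && s[1] < 0xA0) return 0;
    if (c == 0xED && s[1] > 0x9F) return 0;
    if (c == 0xF0 && s[1] < 0x90) return 0;
    if (c == 0xF4 && s[1] > 0x8F) return 0;
    return len;
}

/* Copies n bytes of src to dst, replacing each offending byte with
 * U+FFFD.  Returns the number of bytes written to dst. */
static size_t copy_valid_utf8(unsigned char *dst,
                              const unsigned char *src, size_t n)
{
    static const unsigned char repl[] = { 0xEF, 0xBF, 0xBD };
    const unsigned char *s = src, *end = src + n;
    unsigned char *d = dst;

    while (s < end) {
        const unsigned char *span = s;
        size_t len;

        /* Tight loop: only checks validity, never decodes code points. */
        while (s < end && (len = valid_char_len(s, end)) != 0)
            s += len;

        /* One copy for the whole valid span. */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        if (s < end) {                   /* stopped on an error */
            memcpy(d, repl, sizeof repl);
            d += sizeof repl;
            s++;                         /* resume after the bad byte */
        }
    }
    return (size_t)(d - dst);
}
```

For fully valid input the outer loop runs once, so the work is one
validation pass plus one memmove. The caller must supply a dst of at
least 3 * n bytes, since every input byte could in the worst case be
replaced by the three-byte U+FFFD sequence.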

-- 
Perl5 Master Repository