In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/19da912b851259f7c1c05868b9e2bee2eb376fec?hp=0000000000000000000000000000000000000000>

        at  19da912b851259f7c1c05868b9e2bee2eb376fec (commit)

- Log -----------------------------------------------------------------
commit 19da912b851259f7c1c05868b9e2bee2eb376fec
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 21 16:15:08 2016 -0600

    Centralize definitions of MIN, MAX
    
    Instead of having each file have them, keep them in handy.h, but only
    for core compilations.

M       handy.h
M       regcomp.c
M       utf8.c
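
A minimal sketch of what the centralized definitions might look like, assuming the usual PERL_CORE guard is what restricts them to core compilations (the exact guard and spelling used in handy.h are not shown in this log). PERL_CORE is defined by hand here only to simulate a core build, and clamp_len() is a hypothetical helper added to exercise the macros.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: defined here only to simulate compiling a core file. */
#define PERL_CORE

#ifdef PERL_CORE
/* The #ifndef guards avoid redefining a MIN/MAX supplied by a system header. */
#  ifndef MIN
#    define MIN(a,b) ((a) < (b) ? (a) : (b))
#  endif
#  ifndef MAX
#    define MAX(a,b) ((a) > (b) ? (a) : (b))
#  endif
#endif

/* Hypothetical helper: clamp a requested length into the range [lo, hi]. */
static size_t clamp_len(size_t want, size_t lo, size_t hi)
{
    assert(lo <= hi);
    return MAX(lo, MIN(want, hi));
}
```

As with any function-like macro of this shape, each argument may be evaluated twice, so arguments with side effects should be avoided.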

commit 84248416e5ca5915303eb3dc910f8e45f006b874
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 19 10:14:30 2016 -0600

    Add is_utf8_fixed_width_buf_flags() and use it
    
    This encodes a simple pattern that may not be immediately obvious to
    someone needing it.  If you have a fixed-size buffer that is full of
    purportedly UTF-8 bytes, is it valid or not?  It's easy to do, as shown
    in this commit.  The file test operators -T and -B can be simplified by
    using this function.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       inline.h
M       pp_sys.c
M       proto.h
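
The fixed-size buffer pattern described above can be sketched in standalone C. This is not the code from inline.h: fixed_buf_is_utf8(), seq_len() and cont_ok() are hypothetical names, the flags argument is omitted, and the check is stricter than Perl's default (it rejects surrogates and anything above U+10FFFF). The sketch assumes the key point of the pattern is that a character cut off by the end of the buffer still counts as acceptable, provided the bytes present could begin a valid character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Total length of a sequence given its first byte; 0 if it cannot start one. */
static size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Is b acceptable as continuation byte number idx (1-based) after lead? */
static bool cont_ok(unsigned char lead, size_t idx, unsigned char b)
{
    unsigned char lo = 0x80, hi = 0xBF;
    if (idx == 1) {
        if (lead == 0xE0) lo = 0xA0;        /* no overlong 3-byte forms */
        else if (lead == 0xED) hi = 0x9F;   /* no surrogates */
        else if (lead == 0xF0) lo = 0x90;   /* no overlong 4-byte forms */
        else if (lead == 0xF4) hi = 0x8F;   /* nothing above U+10FFFF */
    }
    return b >= lo && b <= hi;
}

/* True if every complete character in the buffer is valid and any
 * character truncated by the end of the buffer is a valid prefix. */
static bool fixed_buf_is_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        size_t need = seq_len(s[i]);
        size_t avail = len - i;
        size_t check, k;
        if (need == 0) return false;
        check = need < avail ? need : avail;  /* only look at bytes we have */
        for (k = 1; k < check; k++)
            if (!cont_ok(s[i], k, s[i + k])) return false;
        i += check;
    }
    return true;
}
```

A caller such as a -T/-B style heuristic would read one block of a file and pass it in; a multi-byte character split across the block boundary then does not make the block look like binary data.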

commit b301f8f39450d363f00e651bae8e17252833eb52
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 19 09:59:32 2016 -0600

    Add API Unicode handling functions
    
    These functions are all extensions of the is_utf8_string_foo()
    functions, that restrict the UTF-8 recognized as valid in various ways.
    There are named ones for the two definitions that Unicode makes, and
    foo_flags ones for more custom restrictions.
    
    The named ones are implemented as tries, while the flags ones provide
    complete generality.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       inline.h
M       proto.h
M       utf8.h

commit c956f9ea81f0cbb1d7f9c0cc29a572d616f6a8b5
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Sep 20 10:12:45 2016 -0600

    XS-APItest/t/utf8.t:  Add some tests
    
    These will help in testing the string functions coming in the next
    commit.  These add problematic code points to the first testing loop.
    As a result some of the tests in the final loop may be redundant, but
    since this .t is quick to run, I chose not to investigate and remove any
    such.

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/t/utf8.t

commit abb5d36533f4c0fc6087e0a739c4e97ade12b1e2
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 19 09:52:57 2016 -0600

    perlapi: Clarifications, nits in Unicode support docs
    
    This also does a white space change to inline.h

M       inline.h
M       utf8.h

commit 9f3c6bbefb373a75803c40429ad56efba5cb925a
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 19:57:46 2016 -0600

    Move #define to different header
    
    Instead of having a comment in one header pointing to the #define in the
    other, remove the indirection and just have the #define itself where it
    is needed.

M       inline.h
M       utf8.h

commit deb2936be5c0385cdff112c661bf386e0f40b100
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 15 09:09:07 2016 -0600

    XXX incomplete: Add sv_utf8_decode_flags

M       embed.fnc
M       embed.h
M       proto.h
M       sv.c
M       sv.h

commit 802702d423ec5d183d103acc61efcec2cc83d01d
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 15 09:06:39 2016 -0600

    perlapi: Minor clarifications to sv_utf8_decode

M       sv.c

commit 2cb8c070042d310249701ef0669e719e590f6246
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 22:40:23 2016 -0600

    customized

M       t/porting/customized.dat

commit c9ad34addd75e92479b58a6cd9e7870c31f3a8dc
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:20:52 2016 -0600

    Use core REPLACEMENT CHARACTER definition
    
    This allows the code to now work on EBCDIC as well.

M       cpan/Encode/Encode/encode.h

commit d28fad6645ec4631230697c29525524860cf47b7
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:16:00 2016 -0600

    XXX commit msg: Encode.xs: Rmv unused function

M       cpan/Encode/Encode.xs

commit f9fb00a51842c92f0bf34d214f4f318d8d710a85
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:12:39 2016 -0600

    Encode.xs: white-space only

M       cpan/Encode/Encode.xs

commit dd3345e977fd030e06af998def5adea0a9a88ab2
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:12:06 2016 -0600

    XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity by one
    in which normal processing doesn't require having to decode the UTF-8
    into code points.  The copying of characters individually from the input
    to the output is changed to be a single operation for each entire span
    of valid input at once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    It uses the functionality available from the Perl 5 core to look at
    just the bytes that comprise the UTF-8 to make the determination,
    converting to code points only those that are somehow defective, in
    order to display them in warnings and error messages.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
    
    This cannot be pushed to CPAN until Devel::PPPort has been updated to
    implement all the functions now needed.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
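
The validate-then-copy loop described in the commit message above can be sketched in standalone C. This is not the Encode.xs code: valid_seq_len(), valid_prefix_len() and copy_utf8_spans() are hypothetical names, and the local validator stands in for the core routines the commit relies on. How a bad character is handled is also an assumption; here each offending byte is replaced by U+FFFD, whereas Encode's real behaviour depends on the caller's check flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the well-formed sequence at s, or 0 if none fits in len bytes. */
static size_t valid_seq_len(const unsigned char *s, size_t len)
{
    size_t need, k;
    unsigned char lo = 0x80, hi = 0xBF;
    if (len == 0) return 0;
    if (s[0] < 0x80) return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF) need = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) need = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) need = 4;
    else return 0;
    if (need > len) return 0;
    if (s[0] == 0xE0) lo = 0xA0;        /* no overlong 3-byte forms */
    else if (s[0] == 0xED) hi = 0x9F;   /* no surrogates */
    else if (s[0] == 0xF0) lo = 0x90;   /* no overlong 4-byte forms */
    else if (s[0] == 0xF4) hi = 0x8F;   /* nothing above U+10FFFF */
    if (s[1] < lo || s[1] > hi) return 0;
    for (k = 2; k < need; k++)
        if (s[k] < 0x80 || s[k] > 0xBF) return 0;
    return need;
}

/* Tight validity loop: bytes examined only, no code points decoded. */
static size_t valid_prefix_len(const unsigned char *s, size_t len)
{
    size_t i = 0, n;
    while (i < len && (n = valid_seq_len(s + i, len - i)) != 0)
        i += n;
    return i;
}

/* Copy src to dst one valid span at a time.  dst must have room for
 * 3 * len bytes.  Returns the number of bytes written. */
static size_t copy_utf8_spans(const unsigned char *src, size_t len,
                              unsigned char *dst)
{
    static const unsigned char repl[3] = { 0xEF, 0xBF, 0xBD }; /* U+FFFD */
    size_t in = 0, out = 0;
    assert(src != NULL && dst != NULL);
    while (in < len) {
        size_t span = valid_prefix_len(src + in, len - in);
        memmove(dst + out, src + in, span);  /* whole valid span at once */
        in += span;
        out += span;
        if (in < len) {                      /* stopped on a bad byte */
            memcpy(dst + out, repl, sizeof repl);
            out += sizeof repl;
            in += 1;                         /* resume after the bad byte */
        }
    }
    return out;
}
```

For fully valid input the outer loop runs once: one validation pass and one memmove, which is the fast path the commit message describes.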
-----------------------------------------------------------------------

--
Perl5 Master Repository
