In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/c2c4578f00d535f111698c2aadbb4aa2cee2caf3?hp=0000000000000000000000000000000000000000>

        at  c2c4578f00d535f111698c2aadbb4aa2cee2caf3 (commit)

- Log -----------------------------------------------------------------
commit c2c4578f00d535f111698c2aadbb4aa2cee2caf3
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 15 09:09:07 2016 -0600

    XXX incomplete: Add sv_utf8_decode_flags

M       embed.fnc
M       embed.h
M       proto.h
M       sv.c
M       sv.h

commit cb16414c43485413e3146d3d560dcb40b088abce
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 15 09:06:39 2016 -0600

    perlapi: Minor clarifications to sv_utf8_decode

M       sv.c

commit 85759ad705520295924451f81a52050a70de23c3
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 22:40:23 2016 -0600

    customized

M       t/porting/customized.dat

commit 467d40072504c288dc2ffad3dc50aeecf6448526
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:20:52 2016 -0600

    Use core REPLACEMENT CHARACTER definition
    
    This allows the code to now work on EBCDIC as well.

M       cpan/Encode/Encode/encode.h

commit 5f03026600264f8f446fd8a06d49d3b42a83b03d
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:16:00 2016 -0600

    XXX commit msg: Encode.xs: Rmv unused function

M       cpan/Encode/Encode.xs

commit 17cc6f7ed3774e3f472f61e103d1a1fda982a3b1
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:12:39 2016 -0600

    Encode.xs: white-space only

M       cpan/Encode/Encode.xs

commit 1962be345e86b9fa3c90f5a6b041895b62b4149a
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:12:06 2016 -0600

    XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity by one
    in which normal processing doesn't require having to decode the UTF-8
    into code points.  The copying of characters individually from the input
    to the output is changed to be a single operation for each entire span
    of valid input at once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    It uses the functionality available from the Perl 5 core to look at
    just the bytes that comprise the UTF-8 to make the determination,
    converting to code points only those that are defective somehow in
    order to display them in warnings and error messages.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
    
    This cannot be pushed to CPAN until Devel::PPPort has been updated to
    implement all the functions now needed.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
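
The validate-then-bulk-copy scheme described in the commit message above can be sketched as follows. This is only an illustration, not the Encode.xs code: `utf8_char_len` and `copy_valid_utf8` are hypothetical names, the checker accepts only strict standard UTF-8 (the Perl core accepts more), and the error handling is assumed to be "emit U+FFFD and skip one byte".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the well-formed UTF-8 character that
 * starts at s (never reading at or past e), or 0 if it is malformed or
 * truncated.  Only bytes are inspected; no code point is computed. */
static size_t utf8_char_len(const unsigned char *s, const unsigned char *e)
{
    size_t len, i;
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */

    if (s >= e) return 0;
    if (s[0] < 0x80) return 1;
    else if (s[0] >= 0xC2 && s[0] <= 0xDF) len = 2;
    else if (s[0] == 0xE0) { len = 3; lo = 0xA0; }  /* no overlongs */
    else if (s[0] == 0xED) { len = 3; hi = 0x9F; }  /* no surrogates */
    else if (s[0] >= 0xE1 && s[0] <= 0xEF) len = 3;
    else if (s[0] == 0xF0) { len = 4; lo = 0x90; }  /* no overlongs */
    else if (s[0] == 0xF4) { len = 4; hi = 0x8F; }  /* <= U+10FFFF */
    else if (s[0] >= 0xF1 && s[0] <= 0xF3) len = 4;
    else return 0;            /* continuation byte or invalid start */

    if ((size_t)(e - s) < len) return 0;
    if (s[1] < lo || s[1] > hi) return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return len;
}

/* Copy src to dst, substituting U+FFFD for each offending byte.
 * dst needs room for 3 * srclen bytes.  Returns the bytes written. */
static size_t copy_valid_utf8(unsigned char *dst,
                              const unsigned char *src, size_t srclen)
{
    static const unsigned char repl[3] = { 0xEF, 0xBF, 0xBD };
    const unsigned char *s = src, *e = src + srclen;
    unsigned char *d = dst;

    while (s < e) {
        const unsigned char *span = s;
        size_t len = 0;

        /* Tight loop: step over well-formed characters only. */
        while (s < e && (len = utf8_char_len(s, e)) != 0)
            s += len;

        /* One bulk copy for the whole valid span. */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        if (s < e) {          /* stopped at a malformation */
            memcpy(d, repl, sizeof repl);
            d += sizeof repl;
            s++;              /* reposition, then repeat */
        }
    }
    return (size_t)(d - dst);
}
```

For fully valid input the outer loop runs once: one validation pass, one memmove, and return.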

commit 95b7397c9f5b0bc6b6f59ea73dd254453ea66803
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 20:15:56 2016 -0600

    XXX tests: Add is_utf8_buf_flags() and use it
    
    This encodes a simple pattern that may not be immediately obvious to
    someone needing it.  If you have a fixed-size buffer that is full of
    purportedly UTF-8 bytes, is it valid or not?  It's easy to do, as shown
    in this commit.  The file test operators -T and -B can be simplified by
    using this function.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_sys.c
M       proto.h
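
The commit message above does not spell out the pattern. One plausible reading, assumed here, is that a fixed-size buffer filled from a file may cut its last character short, so a truncated but otherwise well-formed final character should still count as valid. The sketch below is standalone C with hypothetical names (`start_info`, `fixed_buf_is_utf8`), strict standard UTF-8 only, and no flags parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: expected length of a character given its start
 * byte, or 0 if the byte cannot start a character.  Also reports the
 * range the second byte must fall in. */
static size_t start_info(unsigned char b, unsigned char *lo,
                         unsigned char *hi)
{
    *lo = 0x80; *hi = 0xBF;
    if (b < 0x80) return 1;
    if (b < 0xC2) return 0;          /* continuation or overlong start */
    if (b < 0xE0) return 2;
    if (b < 0xF0) {
        if (b == 0xE0) *lo = 0xA0;   /* exclude overlongs */
        if (b == 0xED) *hi = 0x9F;   /* exclude surrogates */
        return 3;
    }
    if (b < 0xF5) {
        if (b == 0xF0) *lo = 0x90;   /* exclude overlongs */
        if (b == 0xF4) *hi = 0x8F;   /* exclude above U+10FFFF */
        return 4;
    }
    return 0;
}

/* True if buf[0..len) is well-formed UTF-8, where the final character
 * is allowed to be cut off by the end of the buffer. */
static bool fixed_buf_is_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char lo, hi;
        size_t need  = start_info(buf[i], &lo, &hi);
        size_t avail = len - i;
        size_t n     = need < avail ? need : avail;  /* bytes present */
        size_t j;

        if (need == 0) return false;
        if (n > 1 && (buf[i + 1] < lo || buf[i + 1] > hi)) return false;
        for (j = 2; j < n; j++)
            if (buf[i + j] < 0x80 || buf[i + j] > 0xBF) return false;
        i += n;   /* a truncated final character simply ends the loop */
    }
    return true;
}
```

Under this reading, a buffer ending in the first two bytes of a three-byte character passes, while a start byte followed by a non-continuation byte fails.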

commit 313692bc0c0c41eda9d5b85ace7621fa9dff4a07
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 20:03:16 2016 -0600

    XXX Flesh out, tests: Add is_utf8_foo()

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h

commit 5015592ed0469e0292dde6a5ca5692b772b1510f
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 19:57:46 2016 -0600

    Move #define to different header
    
    Instead of having a comment in one header pointing to the #define in the
    other, remove the indirection and just have the #define itself where it
    is needed.

M       inline.h
M       utf8.h

commit 95a1ac9157055b547b0731096a6b5fb8325264c0
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 19:49:52 2016 -0600

    perlapi: Clarify docs for some is_utf8_foo functions

M       inline.h

commit 03f8936e099dc9853a80cdc5544478e3ae94048e
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 18:54:23 2016 -0600

    Add isUTF8_CHAR_flags() macro
    
    This is like the previous 2 commits, but the macro takes a flags
    parameter so any combination of the disallowed flags may be used.  The
    others, along with the original isUTF8_CHAR(), are the most commonly
    desired strictures, and use an implementation of a, hopefully, inlined
    trie for speed.  This is for generality and the major portion of its
    implementation isn't inlined.

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       utf8.h

commit b34eff6f793201d3a6c30e679cbd328fd6de49e3
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 12 16:52:41 2016 -0600

    Add macro for Unicode Corrigendum #9 strict
    
    This macro follows Unicode Corrigendum #9 to allow non-character code
    points.  These are still discouraged but not completely forbidden.
    
    Code that isn't intended to operate on arbitrary text written by other
    code should keep using the original definition, but code that does,
    such as source code control, should change to use this definition if it
    wants to be Unicode-strict.
    
    Perl can't adopt C9 wholesale, as it might create security holes in
    existing applications that rely on Perl keeping non-chars out.

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h
M       utfebcdic.h
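
To make the distinction in the commit message above concrete, here is a small sketch at the code-point level. The macros added by the commit work on UTF-8 bytes, so this is not their implementation. The function names are hypothetical, and it assumes the original strict definition rejects surrogates, non-characters, and anything above U+10FFFF, while the Corrigendum #9 variant differs only in accepting non-characters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-characters are U+FDD0..U+FDEF plus the last two code points of
 * each of the 17 planes (U+xxFFFE and U+xxFFFF). */
static bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF) return false;
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

static bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Assumed "original" strict definition: no surrogates, no
 * non-characters, nothing above Unicode. */
static bool strict_ok(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp) && !is_noncharacter(cp);
}

/* Corrigendum #9 style: identical, except non-characters pass. */
static bool c9_strict_ok(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp);
}
```

With these definitions U+FDEF is rejected by `strict_ok` but accepted by `c9_strict_ok`, while U+D800 is rejected by both.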

commit f2ee67210ff845671b84be61117b77b4653ba396
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 12 13:38:22 2016 -0600

    Add macro for determining if UTF-8 is Unicode-strict

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h
M       utfebcdic.h

commit bd3bc7853f81dddc4a9b4d4c7e90c579b6daa23f
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 12 14:30:15 2016 -0600

    perlapi: Clarify isUTF8_CHAR()

M       utf8.h

commit 1cf20e5c1daac8241495d7ab3b7395ffd1beb574
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 17:09:51 2016 -0600

    inline.h: Add 'const's; avoid hiding outer variable
    
    This changes some formal parameters to be const, and avoids reusing the
    same variable name within an inner block, to avoid confusion.

M       embed.fnc
M       inline.h
M       proto.h

commit 1bd1a5e91eb89c68ca877437520e7e6b29e5e530
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 8 11:34:15 2016 -0600

    Add tests for is_utf8_valid_partial_char_flags()

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 6823d9254c49c8c32592dec7d4c993f22ab5850d
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Sep 11 22:18:57 2016 -0600

    Add is_utf8_valid_partial_char_flags()
    
    This is a generalization of is_utf8_valid_partial_char to allow the
    caller to automatically exclude things such as surrogates.

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h

commit 6df9c77fd6294ecd27190557d4b199ec003d4008
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Sep 11 09:40:37 2016 -0600

    perlapi: Reword description of is_utf8_valid_partial_char

M       inline.h

commit c48e530e5e830cb857a0e600429ea398e1afeb18
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:27:37 2016 -0600

    Fix off-by-one error in is_utf8_valid_partial_char()

M       inline.h

commit 1bd73796eae740f0f363fce9fe65d1f7a4db350d
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:24:48 2016 -0600

    handy.h: Comment memEQs and memNEs

M       handy.h

commit 900e9387f57443f5bdbb9393f5d699ff12d1982a
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:18:59 2016 -0600

    utf8.c: Add some UNLIKELYs

M       utf8.c

commit 1959a7939b6acdec6cecc2a674518af85ca11398
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:18:16 2016 -0600

    utf8.h: Add comment, white-space changes

M       utf8.h

commit d9f80678aa773efd3c04c0bdc97ef65b00a3c381
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:09:44 2016 -0600

    Enhance and rename is_utf8_char_slow()
    
    This changes the name of this helper function and adds a parameter and
    functionality to allow it to exclude problematic classes of code
    points, the same ones excludable by utf8n_to_uvchr(), like surrogates
    or non-character code points.

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h
M       utf8.c
M       utf8.h

commit 188a2b00ada8185fa536473a43bab20aa7605840
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 7 22:22:01 2016 -0600

    APItest/t/utf8.t:   Add tests
    
    These fill in gaps in current testing.  In particular all the possible
    overlong UTF-8 edge cases are now tested.

M       ext/XS-APItest/t/utf8.t

commit b6f7cebb4d97b8666b24766d43e369f7fe77fea4
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 7 22:14:38 2016 -0600

    APItest/utf8.t: Some clean up
    
    This adds some information to test names, does some white-space
    alignments, changes one test to stress things slightly more, and adds a
    'use bytes' because in some cases the desired byte-oriented output was
    not showing up.

M       ext/XS-APItest/t/utf8.t

commit 0ff82715eeaee878beafe899a0dca8c6f670cec0
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Sep 4 21:32:08 2016 -0600

    Test isUTF8_CHAR()

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 59c60a40e62af5eabbbc6fe073120d5d2daac783
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:19:42 2016 -0600

    lib/warnings/utf8:  Reinstate warning test
    
    I removed this in 35f8c9bd0ff4f298f8bc09ae9848a14a9667a95a, thinking the
    warning was no longer being raised.  But in fact, it was showing a bug,
    now fixed by the previous commit.

M       t/lib/warnings/utf8

commit 0094884088c3d72085333f53c123e60e5ab04bd4
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 21:15:04 2016 -0600

    Revamp overlong handling in is_utf8_char_slow, fixing a bug
    
    This combines EBCDIC and ASCII branches as much as possible, and fixes a
    bug that showed up only on EBCDIC platforms, and 64-bit ASCII ones for
    the highest overlong, where it could erroneously conclude that a
    sequence was an overlong.

M       utf8.c

commit a4f913a9ff912ba9d59d1ea42a91fb0e407efe0b
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 21:43:42 2016 -0600

    Forbid UTF-8 start bytes 0x FF on 32-bit ASCII
    
    These all are for code points that won't fit into a 32 bit word.

M       utf8.h

commit 62d802bd3241f7cd03f406344de7516a1cbc2ba8
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 21:06:39 2016 -0600

    utf8.c: Fix typo in comment, add some comments

M       utf8.c

commit f04b0b8d2d2525eed3f5cbfd17ad04f7d6433c10
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 09:00:03 2016 -0600

    utf8.c: Extract duplicate code to common fcn
    
    Actually the code isn't quite duplicate, but should be because one
    instance is wrong.  This failure would only show up on 64-bit EBCDIC
    platforms.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/t/utf8.t
M       proto.h
M       utf8.c

commit 4adf9e30152c6b04a2be2384ff2f08eba17d7ab3
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 08:54:36 2016 -0600

    handy.h: Add memLT, memLE, memGT, memGE
    
    These correspond to strLT, etc.  I am deferring documenting them in case
    this turns out to be a bad idea for some reason.

M       handy.h
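
As a rough illustration of what memcmp-based counterparts to strLT and friends could look like, here is a sketch. The `MY_` names are hypothetical, and the actual handy.h definitions are not shown in this log and may differ.

```c
#include <string.h>

/* Hypothetical spellings of length-bounded ordering tests.  Unlike the
 * str* forms, these keep comparing past embedded NUL bytes. */
#define MY_memLT(s1, s2, l) (memcmp((s1), (s2), (l)) <  0)
#define MY_memLE(s1, s2, l) (memcmp((s1), (s2), (l)) <= 0)
#define MY_memGT(s1, s2, l) (memcmp((s1), (s2), (l)) >  0)
#define MY_memGE(s1, s2, l) (memcmp((s1), (s2), (l)) >= 0)
```

For example, `MY_memLT("ab\0x", "ab\0y", 4)` is true, whereas strcmp() on the same two buffers stops at the NUL and reports them equal.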

commit 2ceab79252696bcdcd1a85aa33f7894890124f8f
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 08:46:18 2016 -0600

    XXX unconditionally do memcmp if not sane

M       perl.h

commit 45c86a51c68c42f7b5dccb4685a9e1edf5e4868f
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 3 14:12:27 2016 -0600

    isUTF8_CHAR(): Bring UTF-EBCDIC to parity with ASCII
    
    This changes the macro isUTF8_CHAR to have the same number of code
    points built-in for EBCDIC as ASCII.  This obsoletes the
    IS_UTF8_CHAR_FAST macro, which is removed.
    
    Previously, the code generated by regen/regcharclass.pl for ASCII
    platforms was hand copied into utf8.h, and LIKELY's manually added, then
    the generating code was commented out.  Now this has been done with
    EBCDIC platforms as well.  This makes regenerating regcharclass.h
    faster.
    
    The copied macro in utf8.h is moved by this commit to within the main
    code section for non-EBCDIC compiles, cutting the number of #ifdef's
    down, and the comments about it changed somewhat.

M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h
M       utfebcdic.h

commit 8d9b3365a77be3b6ad6cfbfe520b458da2e08f7e
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 3 12:15:29 2016 -0600

    regen/regcharclass.pl: surrogates are code points
    
    They are not "characters"

M       regcharclass.h
M       regen/regcharclass.pl

commit 0899da10c082013172c212fd291d9e558c849339
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 3 16:13:15 2016 -0600

    Add IS_UTF8_INVARIANT and IS_UVCHR_INVARIANT to API

M       utf8.h

commit a2432ca2e2ba85e8c8c00ef4febf80c842fb5d44
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 7 22:03:21 2016 -0600

    utfebcdic.h: Fix typo in comment

M       utfebcdic.h

commit 26c211867588c59e51aae4b9132dba1a35dcb364
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 16:05:35 2016 -0600

    Add #defines for XS code for Unicode Corrigendum 9
    
    These are convenience macros.

M       utf8.c
M       utf8.h

commit 056961ce93cc98dc2f60658fc864f7393ab98942
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 16:02:50 2016 -0600

    perlapi: Clarify utf8n_to_uvchr entry

M       utf8.c

commit e3fbbd1878d66b0d7d180ed8526964c7124e32d9
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 15:57:34 2016 -0600

    perlunicode: Fix typo

M       pod/perlunicode.pod

commit 5f4c87effa7a251db8fbc5d04dbb05b59cd98291
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Sep 13 16:40:44 2016 -0600

    append_utf8_from_native_byte: Add parens for clarity
    
    I can never remember the precedence of dereference and ++.

M       inline.h
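
The precedence question in the commit above can be shown with a short sketch. The diff itself is not part of this log, so the function below is a hypothetical stand-in, assuming an output pointer passed by address. Postfix ++ binds tighter than unary *, so `*(*dest)++` already means `*((*dest)++)`; the extra parentheses only make that explicit.

```c
#include <stddef.h>

/* Hypothetical stand-in: write two bytes through *dest and leave the
 * caller's pointer just past them. */
static void append_two(unsigned char **dest,
                       unsigned char a, unsigned char b)
{
    *((*dest)++) = a;   /* store at *dest, then advance *dest */
    *((*dest)++) = b;
}
```

Writing `**dest++` instead would parse as `*(*(dest++))` and advance `dest` itself rather than the pointer it refers to.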
-----------------------------------------------------------------------

--
Perl5 Master Repository
