In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/adf9b819500501defb89b82745d8d368303bec57?hp=0000000000000000000000000000000000000000>

        at  adf9b819500501defb89b82745d8d368303bec57 (commit)

- Log -----------------------------------------------------------------
commit adf9b819500501defb89b82745d8d368303bec57
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Sep 16 22:21:17 2016 -0600

    smoke

M       utf8.c

commit d05927bc3d33f576c529353f46502c626f820d80
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 15 09:09:07 2016 -0600

    XXX incomplete: Add sv_utf8_decode_flags

M       embed.fnc
M       embed.h
M       proto.h
M       sv.c
M       sv.h

commit f0467b8c149eb86abef90c614cbfbaa184d68d3e
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 15 09:06:39 2016 -0600

    perlapi: Minor clarifications to sv_utf8_decode

M       sv.c

commit f066d559c1d5c2d6c7cb659bd3ba9a626c8d519f
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 22:40:23 2016 -0600

    customized

M       t/porting/customized.dat

commit f62b4cdfb78820b4a46c9835ad655a8cf4792c14
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:20:52 2016 -0600

    Use core REPLACEMENT CHARACTER definition
    
    This allows the code to now work on EBCDIC as well.

M       cpan/Encode/Encode/encode.h

commit bd211626afa26b7657c5e23e03877512d7f54004
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:16:00 2016 -0600

    XXX commit msg: Encode.xs: Rmv unused function

M       cpan/Encode/Encode.xs

commit 69ce7f673fa40415c7407ef08e17c38598da49ca
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:12:39 2016 -0600

    Encode.xs: white-space only

M       cpan/Encode/Encode.xs

commit 738ff5eb5c2e8aff55dbe274aee9fa783187c040
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 1 12:12:06 2016 -0600

    XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity by one
    in which normal processing doesn't require having to decode the UTF-8
    into code points.  The copying of characters individually from the input
    to the output is changed to be a single operation for each entire span
    of valid input at once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    It uses the functionality available from the Perl 5 core to look at
    just the bytes that comprise the UTF-8 to make the determination,
    converting to code points only those that are defective somehow, in
    order to display them in warnings and error messages.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
    
    This cannot be pushed to CPAN until Devel::PPPort has been updated to
    implement all the functions now needed.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
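
The span-at-a-time scheme described in the message above can be sketched
roughly as follows.  This is not the Encode.xs code.  utf8_char_len() and
copy_valid_spans() are hypothetical names, the validity check is a
simplified stand-in for the core's (ASCII platforms only, rejecting
overlongs, surrogates and anything above U+10FFFF), and replacing each
offending byte with U+FFFD is an assumed error policy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the well-formed UTF-8 character that
 * starts at s (s < end), or 0 if the bytes there are malformed or cut off. */
static size_t utf8_char_len(const unsigned char *s, const unsigned char *end)
{
    size_t len, i;
    unsigned char c = *s;

    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) len = 3;
    else if (c >= 0xF0 && c <= 0xF4) len = 4;
    else return 0;                            /* not a legal start byte */

    if ((size_t)(end - s) < len) return 0;    /* truncated */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */

    if (c == 0xE0 && s[1] < 0xA0) return 0;   /* overlong */
    if (c == 0xED && s[1] > 0x9F) return 0;   /* surrogate */
    if (c == 0xF0 && s[1] < 0x90) return 0;   /* overlong */
    if (c == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
    return len;
}

/* Copy src to dst one valid span at a time.  dst needs room for 3 * srclen
 * bytes in the worst case.  Returns the number of bytes written. */
size_t copy_valid_spans(unsigned char *dst, const unsigned char *src,
                        size_t srclen)
{
    const unsigned char *s = src, *end = src + srclen;
    unsigned char *d = dst;

    while (s < end) {
        const unsigned char *span = s;
        size_t n = 0;

        /* Tight loop: look only at the bytes, never decode. */
        while (s < end && (n = utf8_char_len(s, end)) != 0)
            s += n;

        /* One copy for the whole valid span. */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        if (s < end) {                        /* stopped on a bad byte */
            memcpy(d, "\xEF\xBF\xBD", 3);     /* U+FFFD (assumed policy) */
            d += 3;
            s++;                              /* resume after the bad byte */
        }
    }
    return (size_t)(d - dst);
}
```

For fully valid input the outer loop runs once: one validation pass, one
memmove, then return.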

commit 05b298e243da060f60f30d6391ec5c67e4b0eef3
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 20:15:56 2016 -0600

    XXX tests: Add is_utf8_buf_flags() and use it
    
    This encodes a simple pattern that may not be immediately obvious to
    someone needing it.  If you have a fixed-size buffer that is full of
    purportedly UTF-8 bytes, is it valid or not?  It's easy to do, as shown
    in this commit.  The file test operators -T and -B can be simplified by
    using this function.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_sys.c
M       proto.h
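
A minimal sketch of the fixed-size buffer pattern, under the assumption
that the subtlety is a buffer cut off in the middle of a character: every
complete character must be well-formed, and the final bytes may be the
valid beginning of one.  The names below are hypothetical, not the
functions added by this commit, and the checks are simplified (ASCII
platforms, no flags parameter).

```c
#include <stdbool.h>
#include <stddef.h>

/* Length implied by a start byte, or 0 if it cannot start a character. */
static size_t expected_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* True if the first min(avail, len) bytes at s could begin (or be) a
 * well-formed character of length len. */
static bool prefix_ok(const unsigned char *s, size_t avail, size_t len)
{
    size_t i, n = avail < len ? avail : len;

    for (i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80) return false;
    if (n >= 2) {
        if (s[0] == 0xE0 && s[1] < 0xA0) return false;  /* overlong */
        if (s[0] == 0xED && s[1] > 0x9F) return false;  /* surrogate */
        if (s[0] == 0xF0 && s[1] < 0x90) return false;  /* overlong */
        if (s[0] == 0xF4 && s[1] > 0x8F) return false;  /* > U+10FFFF */
    }
    return true;
}

/* True if buf holds valid UTF-8, allowing a partial character at the end. */
bool fixed_buf_is_utf8(const unsigned char *buf, size_t buflen)
{
    const unsigned char *s = buf, *end = buf + buflen;

    while (s < end) {
        size_t len = expected_len(*s);
        size_t avail = (size_t)(end - s);

        if (len == 0 || !prefix_ok(s, avail, len)) return false;
        if (avail < len) return true;   /* valid partial char at the end */
        s += len;
    }
    return true;
}
```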

commit 502a38034364d3fce09dbeaec8bab135b92170da
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 20:03:16 2016 -0600

    XXX Flesh out, tests: Add is_utf8_foo()

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h

commit b543ac64c28bb4d5c1b4da2b54a466d94186b7bf
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 19:57:46 2016 -0600

    Move #define to different header
    
    Instead of having a comment in one header pointing to the #define in the
    other, remove the indirection and just have the #define itself where it
    is needed.

M       inline.h
M       utf8.h

commit f56a8a8778df5243e73b421332b67ada3657b773
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 19:49:52 2016 -0600

    perlapi: Clarify docs for some is_utf8_foo functions

M       inline.h

commit 48624323edfb1387b785116a36a0517803c1c9c5
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 18:54:23 2016 -0600

    Add isUTF8_CHAR_flags() macro
    
    This is like the previous 2 commits, but the macro takes a flags
    parameter so any combination of the disallowed flags may be used.  The
    others, along with the original isUTF8_CHAR(), implement the most
    commonly desired strictures, using a trie that is, hopefully, inlined
    for speed.  This one is for generality, and the major portion of its
    implementation isn't inlined.

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       utf8.h

commit d3044bce6779aaa538c761d54600aa393b5c6a3c
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 12 16:52:41 2016 -0600

    Add macro for Unicode Corrigendum #9 strict
    
    This macro follows Unicode Corrigendum #9 to allow non-character code
    points.  These are still discouraged but not completely forbidden.
    
    Code that isn't intended to operate on arbitrary text from other code
    is best off using the original definition, but code that does, such as
    source code control, should change to use this definition if it wants
    to be Unicode-strict.
    
    Perl can't adopt C9 wholesale, as it might create security holes in
    existing applications that rely on Perl keeping non-chars out.

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h
M       utfebcdic.h
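
To make the distinction concrete, here is a small sketch at the code point
level.  The non-character set (U+FDD0..U+FDEF plus every code point whose
low 16 bits are FFFE or FFFF) is defined by Unicode.  The function names
are hypothetical, and it is an assumption that the original strict
definition rejects surrogates, non-characters and anything above U+10FFFF,
while the Corrigendum #9 variant differs only in accepting non-characters.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 66 Unicode non-character code points. */
bool is_unicode_nonchar(uint32_t cp)
{
    if (cp > 0x10FFFF) return false;
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Assumed original strict rule: no surrogates, no non-characters,
 * nothing above Unicode. */
bool strict_ok(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp) && !is_unicode_nonchar(cp);
}

/* Assumed Corrigendum #9 rule: as above, but non-characters are allowed. */
bool c9_strict_ok(uint32_t cp)
{
    return cp <= 0x10FFFF && !is_surrogate(cp);
}
```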

commit 435f2b411812edbbcf8e3b9d6f22c1f3f8aafdb1
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 12 13:38:22 2016 -0600

    Add macro for determining if UTF-8 is Unicode-strict

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t
M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h
M       utfebcdic.h

commit 8abfbe338708a13de82a75896c9b6405f24dc7d3
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Sep 12 14:30:15 2016 -0600

    perlapi: Clarify isUTF8_CHAR()

M       utf8.h

commit e026936ba1a284adec6fa4b00989aa3c395df5ce
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 17:09:51 2016 -0600

    inline.h: Add 'const's; avoid hiding outer variable
    
    This changes some formal parameters to be const, and avoids reusing the
    same variable name within an inner block, to avoid confusion.

M       embed.fnc
M       inline.h
M       mathoms.c
M       proto.h

commit c6a3a1a663c525b88ba7002cba8bc5a325916ba4
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Sep 8 11:34:15 2016 -0600

    Add tests for is_utf8_valid_partial_char_flags()

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 3e7888e357befaf1c44194f1f002d959668aed3a
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Sep 11 22:18:57 2016 -0600

    Add is_utf8_valid_partial_char_flags()
    
    This is a generalization of is_utf8_valid_partial_char to allow the
    caller to automatically exclude things such as surrogates.

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h

commit d939808d01f53e56eaa4f92a1f4a9683c2a13baa
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Sep 11 09:40:37 2016 -0600

    perlapi: Reword description of is_utf8_valid_partial_char

M       inline.h

commit 0a1007ccbfeae6e1e1779c9d321f8cf64808ca49
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:27:37 2016 -0600

    Fix off-by-one error in is_utf8_valid_partial_char()

M       inline.h

commit 90defc4a2c69ccf6aa3850e1917cc3e8b1fc979f
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:24:48 2016 -0600

    handy.h: Comment memEQs and memNEs

M       handy.h

commit c1cf1a0a1a7e32092e7e86ebeabab4f274027fd7
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:18:59 2016 -0600

    utf8.c: Add some UNLIKELYs

M       utf8.c

commit 1ecdbc8ddd9859a4f81e57815fb6c8d0d7df4a27
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:18:16 2016 -0600

    utf8.h: Add comment, white-space changes

M       utf8.h

commit 882c2c3c2992acb6a67221d2055d317d43eee106
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:09:44 2016 -0600

    Enhance and rename is_utf8_char_slow()
    
    This changes the name of this helper function and adds a parameter and
    functionality to allow it to exclude problematic classes of code
    points, the same ones excludable by utf8n_to_uvchr(), like surrogates
    or non-character code points.

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h
M       utf8.c
M       utf8.h

commit df57aa9ba86deeda358f0345ab9145564c4ec06d
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 7 22:22:01 2016 -0600

    APItest/t/utf8.t:   Add tests
    
    These fill in gaps in current testing.  In particular all the overlong
    UTF-8 possible edge cases are now tested.

M       ext/XS-APItest/t/utf8.t

commit b6f7cebb4d97b8666b24766d43e369f7fe77fea4
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 7 22:14:38 2016 -0600

    APItest/utf8.t: Some clean up
    
    This adds some information to test names, does some white-space
    alignments, changes one test to stress things slightly more, and adds a
    'use bytes' because in some cases the desired byte-oriented output was
    not showing up.

M       ext/XS-APItest/t/utf8.t

commit 0ff82715eeaee878beafe899a0dca8c6f670cec0
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Sep 4 21:32:08 2016 -0600

    Test isUTF8_CHAR()

M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 59c60a40e62af5eabbbc6fe073120d5d2daac783
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 22:19:42 2016 -0600

    lib/warnings/utf8:  Reinstate warning test
    
    I removed this in 35f8c9bd0ff4f298f8bc09ae9848a14a9667a95a, thinking the
    warning was no longer being raised.  But in fact, it was showing a bug,
    now fixed by the previous commit.

M       t/lib/warnings/utf8

commit 0094884088c3d72085333f53c123e60e5ab04bd4
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 21:15:04 2016 -0600

    Revamp overlong handling in is_utf8_char_slow, fixing a bug
    
    This combines EBCDIC and ASCII branches as much as possible, and fixes a
    bug that showed up only on EBCDIC platforms, and 64-bit ASCII ones for
    the highest overlong, where it could erroneously conclude that a
    sequence was an overlong.

M       utf8.c

commit a4f913a9ff912ba9d59d1ea42a91fb0e407efe0b
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 21:43:42 2016 -0600

    Forbid UTF-8 start bytes 0x FF on 32-bit ASCII
    
    These all are for code points that won't fit into a 32 bit word.

M       utf8.h

commit 62d802bd3241f7cd03f406344de7516a1cbc2ba8
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 21:06:39 2016 -0600

    utf8.c: Fix typo in comment, add some comments

M       utf8.c

commit f04b0b8d2d2525eed3f5cbfd17ad04f7d6433c10
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 09:00:03 2016 -0600

    utf8.c: Extract duplicate code to common fcn
    
    Actually the code isn't quite duplicate, but should be because one
    instance is wrong.  This failure would only show up on 64-bit EBCDIC
    platforms.

M       embed.fnc
M       embed.h
M       ext/XS-APItest/t/utf8.t
M       proto.h
M       utf8.c

commit 4adf9e30152c6b04a2be2384ff2f08eba17d7ab3
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 08:54:36 2016 -0600

    handy.h: Add memLT, memLE, memGT, memGE
    
    These correspond to strLT, etc.  I am deferring documenting them in case
    this turns out to be a bad idea for some reason.

M       handy.h
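
Since the message says these correspond to strLT and friends, they are
presumably thin wrappers around memcmp() with an explicit length.  The
definitions below are a guess at that shape, not a copy of handy.h.

```c
#include <string.h>

/* Illustrative definitions only; the real ones live in handy.h. */
#define memLT(s1, s2, l) (memcmp((s1), (s2), (l)) <  0)
#define memLE(s1, s2, l) (memcmp((s1), (s2), (l)) <= 0)
#define memGT(s1, s2, l) (memcmp((s1), (s2), (l)) >  0)
#define memGE(s1, s2, l) (memcmp((s1), (s2), (l)) >= 0)
```

Unlike the str* forms, these keep comparing past an embedded NUL: with
these definitions memGT("a\0b", "a\0a", 3) is true, whereas strcmp() on
the same two buffers sees them as equal.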

commit 2ceab79252696bcdcd1a85aa33f7894890124f8f
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 10 08:46:18 2016 -0600

    XXX unconditionally do memcmp if not sane

M       perl.h

commit 45c86a51c68c42f7b5dccb4685a9e1edf5e4868f
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 3 14:12:27 2016 -0600

    isUTF8_CHAR(): Bring UTF-EBCDIC to parity with ASCII
    
    This changes the macro isUTF8_CHAR to have the same number of code
    points built-in for EBCDIC as ASCII.  This obsoletes the
    IS_UTF8_CHAR_FAST macro, which is removed.
    
    Previously, the code generated by regen/regcharclass.pl for ASCII
    platforms was hand copied into utf8.h, and LIKELY's manually added, then
    the generating code was commented out.  Now this has been done with
    EBCDIC platforms as well.  This makes regenerating regcharclass.h
    faster.
    
    The copied macro in utf8.h is moved by this commit to within the main
    code section for non-EBCDIC compiles, cutting the number of #ifdef's
    down, and the comments about it changed somewhat.

M       regcharclass.h
M       regen/regcharclass.pl
M       utf8.h
M       utfebcdic.h

commit 8d9b3365a77be3b6ad6cfbfe520b458da2e08f7e
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 3 12:15:29 2016 -0600

    regen/regcharclass.pl: surrogates are code points
    
    They are not "characters"

M       regcharclass.h
M       regen/regcharclass.pl

commit 0899da10c082013172c212fd291d9e558c849339
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Sep 3 16:13:15 2016 -0600

    Add IS_UTF8_INVARIANT and IS_UVCHR_INVARIANT to API

M       utf8.h

commit a2432ca2e2ba85e8c8c00ef4febf80c842fb5d44
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 7 22:03:21 2016 -0600

    utfebcdic.h: Fix typo in comment

M       utfebcdic.h

commit 26c211867588c59e51aae4b9132dba1a35dcb364
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 16:05:35 2016 -0600

    Add #defines for XS code for Unicode Corrigendum 9
    
    These are convenience macros.

M       utf8.c
M       utf8.h

commit 056961ce93cc98dc2f60658fc864f7393ab98942
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 16:02:50 2016 -0600

    perlapi: Clarify utf8n_to_uvchr entry

M       utf8.c

commit e3fbbd1878d66b0d7d180ed8526964c7124e32d9
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Sep 14 15:57:34 2016 -0600

    perlunicode: Fix typo

M       pod/perlunicode.pod

commit 5f4c87effa7a251db8fbc5d04dbb05b59cd98291
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Sep 13 16:40:44 2016 -0600

    append_utf8_from_native_byte: Add parens for clarity
    
    I can never remember the precedence of dereference and ++.

M       inline.h
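
For reference, postfix ++ binds tighter than unary *, so *p++ means
*(p++).  The sketch below shows the pointer-to-pointer case with the
explicit parentheses.  append_byte() is a hypothetical stand-in, not the
body of append_utf8_from_native_byte.

```c
/* Store byte at the caller's current position, then advance the caller's
 * pointer by one. */
void append_byte(unsigned char **dest, unsigned char byte)
{
    *((*dest)++) = byte;    /* same meaning as: *(*dest)++ = byte; */
}
```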
-----------------------------------------------------------------------

--
Perl5 Master Repository
