In perl.git, the branch smoke-me/khw-encode has been created
<http://perl5.git.perl.org/perl.git/commitdiff/c909068903c393b8f3230477bce1fa59c7d5f740?hp=0000000000000000000000000000000000000000>
at c909068903c393b8f3230477bce1fa59c7d5f740 (commit)
- Log -----------------------------------------------------------------
commit c909068903c393b8f3230477bce1fa59c7d5f740
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 20:17:22 2016 -0700
customized
M t/porting/customized.dat
commit 5d45040612ea3b088ce88050a959d3bd5f72fd8f
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 17:47:35 2016 -0700
Split diagnostics for two UTF-8 malformations
Some UTF-8 sequences may have multiple malformations. Commit
2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930 tried to make sure that all
possible ones are raised, instead of abandoning searching after one is
found. Since then, I have realized that there was yet another case of
two malformations in which it returned only one or the other.
An input buffer may be too short to fully express the code point it
purports to. This can be determined by the first byte of the UTF-8
sequence indicating a longer sequence is required than the space
available. But that shortened sequence can also contain the premature
start of another character before the point where the buffer ends.
This commit causes both to be raised, instead of the previous behavior
of noting just one.
M ext/XS-APItest/t/utf8.t
M t/op/utf8decode.t
M utf8.c
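The two overlapping malformations described in the commit above can be
illustrated with a minimal sketch. This is not the code in utf8.c: the
names expected_length and find_malformations are hypothetical, and the
sketch assumes standard ASCII-platform UTF-8 of at most 4 bytes (no
EBCDIC, no Perl extended UTF-8). It only shows how the "too short" check
and the "premature start of another character" check can run
independently, so that both are reported.

```python
from typing import List


def expected_length(first_byte: int) -> int:
    """Length promised by a start byte (standard UTF-8 only); 0 if invalid."""
    if first_byte < 0x80:
        return 1
    if 0xC2 <= first_byte <= 0xDF:
        return 2
    if 0xE0 <= first_byte <= 0xEF:
        return 3
    if 0xF0 <= first_byte <= 0xF4:
        return 4
    return 0  # a continuation byte or an illegal start byte


def find_malformations(buf: bytes) -> List[str]:
    """Hypothetical helper: list every problem found in one sequence."""
    if not buf:
        return ["empty"]
    need = expected_length(buf[0])
    if need == 0:
        return ["bad start byte"]

    problems = []
    # Check 1: the start byte promises more bytes than are available.
    if len(buf) < need:
        problems.append("short")
    # Check 2: run regardless of check 1. A byte outside 0x80..0xBF in the
    # bytes we do have means another character started prematurely.
    for i in range(1, min(len(buf), need)):
        if not 0x80 <= buf[i] <= 0xBF:
            problems.append("non-continuation at offset %d" % i)
            break
    return problems
```

For example, the three bytes F0 90 41 promise a 4-byte character, stop
after 3 bytes, and contain an ASCII "A" where a continuation byte was
expected, so the sketch reports both problems rather than just one.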
commit e06d0cb8d87985bb125e1f1472162f054a04944a
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 18:14:45 2016 -0700
APItest/t/utf8.t: Partially refactor to use table data
This removes kludgy code that was trying, given a partial
character, to determine if there were enough bytes present to guarantee that
the whole character must belong to a class of characters or not. Now
the necessary length to make that determination has instead manually
been placed in a table, so it can be looked up. In doing so, I
corrected one length that was failing on EBCDIC.
M ext/XS-APItest/t/utf8.t
commit 85769f8a02f708b8924e48bb881d13eb98dde92b
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 18:07:31 2016 -0700
APItest/t/utf8.t: Fix test
It turns out that this test has two malformations, and should only have
one; a future commit will remove the masking of the 2nd one.
M ext/XS-APItest/t/utf8.t
commit 80c8df918bb0f0937dd52a7b92001e6e1e83b5d3
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 18:01:21 2016 -0700
APItest/t/utf8.t: Comments only
M ext/XS-APItest/t/utf8.t
commit 1d4fb888183a988e0e2d1be8ce5cda42c88d836a
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 17:55:10 2016 -0700
APItest/t/utf8.t: Add some indentation to diagnostics
This is so they don't interrupt reading the output when there are
errors.
M ext/XS-APItest/t/utf8.t
commit 84c893c5a4df989f87863fb2178ac3637ded50f7
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 13:15:18 2016 -0700
utf8.c: Clarify warning message.
This warning was changed recently in the 5.25 series, and has not been
in a stable release.
M ext/XS-APItest/t/utf8.t
M lib/utf8.t
M t/op/lex.t
M t/op/utf8decode.t
M utf8.c
commit f1a13e0dce2b3abf6dc9ff9a75d8c017bb6695d2
Author: Karl Williamson <[email protected]>
Date: Mon Nov 21 14:59:47 2016 -0700
APItest/t/utf8.t: Simplify expression slightly
M ext/XS-APItest/t/utf8.t
commit 0c4e4418cc06eb9fcaab8eb05f9b399c67d34fbf
Author: Karl Williamson <[email protected]>
Date: Sun Nov 20 07:56:40 2016 -0700
APItest/t/handy.t: Output details if test fails
There should be no warnings generated, but if there are, we want to see
what they were.
M ext/XS-APItest/t/handy.t
commit 7718e670f9d9ce88fadcf97189ce51eccdf9b7f1
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 05:03:37 2016 -0600
XXX For EBCDIC debug
M utf8.c
commit 7c09d0dced9a08427afef7f5f0f4c4a7a64b2735
Author: Karl Williamson <[email protected]>
Date: Tue Nov 1 22:23:47 2016 -0600
customized
M t/porting/customized.dat
commit 826ef9c124630e79250abb54ec6ef4e09f0f211f
Author: Karl Williamson <[email protected]>
Date: Tue Oct 18 14:09:43 2016 -0600
pali
M cpan/Encode/Encode.xs
commit 2e3464509f6483c5cc3c570773b469cf76f638ac
Author: Karl Williamson <[email protected]>
Date: Wed Oct 12 20:33:29 2016 -0600
later
M utf8.h
commit 3551a510b962c0b1fcc0ff103814056f6d37caa0
Author: Karl Williamson <[email protected]>
Date: Thu Sep 15 09:09:07 2016 -0600
XXX incomplete: Add sv_utf8_decode_flags
M embed.fnc
M embed.h
M proto.h
M sv.c
M sv.h
commit ff8eda8bf226b1554d2fbbbbb3d6e411ebbcec45
Author: Karl Williamson <[email protected]>
Date: Wed Sep 14 22:40:23 2016 -0600
customized
M t/porting/customized.dat
commit d505a589f0294a6580f94f9508bfcd9c7e0ee5e6
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:20:52 2016 -0600
Use core REPLACEMENT CHARACTER definition
This allows the code to now work on EBCDIC as well.
M cpan/Encode/Encode/encode.h
commit 6807fad30a4be4f68591665ab7d4c5699bd5e0cf
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:16:00 2016 -0600
XXX commit msg: Encode.xs: Rmv unused function
M cpan/Encode/Encode.xs
commit 098893d6a99ef0bea880c51e5359a1d8294bf305
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:39 2016 -0600
Encode.xs: white-space, comment only
This removes some trailing white space, indents various blocks
properly according to perl standards, adds a comment, and fixes grammar
in another.
M cpan/Encode/Encode.xs
commit c3acaadcc97be01166f88fb9a38fdf17c6751615
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:06 2016 -0600
XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
This replaces the current scheme for checking UTF-8 validity by one
in which normal processing doesn't require having to decode the UTF-8
into code points. The copying of characters individually from the input
to the output is changed to be a single operation for each entire span
of valid input at once.
Thus in the normal case, what ends up happening is a tight loop to
check the validity, and then a memmove of the entire input to the
output, then return.
If an error is found, it copies all the valid input before the error,
then handles the character in error, then positions to the next input
position, and repeats the whole process starting from there.
It uses the functionality available from the Perl 5 core to look at
just the bytes that comprise the UTF-8 to make the determination,
converting to code points only those that are defective somehow in
order to display them in warnings and error messages.
Thus, this does not need to know about the intricacies of UTF-8
malformations, relying on the core to handle this.
This cannot be pushed to CPAN until Devel::PPPort has been updated to
implement all the functions now needed.
M cpan/Encode/Encode.pm
M cpan/Encode/Encode.xs
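The validate-then-bulk-copy scheme described in the commit above can be
sketched roughly as follows. This is an illustration only, not the
Encode.xs implementation: char_len and copy_valid_spans are hypothetical
names, the validator accepts only strict standard UTF-8 (Perl's own rules
are more permissive), and substituting U+FFFD and skipping a single byte
on error is an assumed simplification of the real error handling.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # U+FFFD encoded as UTF-8


def char_len(buf: bytes, pos: int) -> int:
    """Byte length of the well-formed character at pos, or 0 if malformed.

    Looks only at byte values; no code point is ever computed.
    """
    b0 = buf[pos]
    if b0 < 0x80:
        return 1
    if 0xC2 <= b0 <= 0xDF:
        need, lo, hi = 2, 0x80, 0xBF
    elif 0xE0 <= b0 <= 0xEF:
        need = 3
        lo = 0xA0 if b0 == 0xE0 else 0x80  # reject overlongs
        hi = 0x9F if b0 == 0xED else 0xBF  # reject surrogates
    elif 0xF0 <= b0 <= 0xF4:
        need = 4
        lo = 0x90 if b0 == 0xF0 else 0x80  # reject overlongs
        hi = 0x8F if b0 == 0xF4 else 0xBF  # reject above U+10FFFF
    else:
        return 0
    if pos + need > len(buf):
        return 0  # sequence is cut off by the end of the buffer
    if not lo <= buf[pos + 1] <= hi:
        return 0
    for i in range(pos + 2, pos + need):
        if not 0x80 <= buf[i] <= 0xBF:
            return 0
    return need


def copy_valid_spans(src: bytes) -> bytes:
    """Copy each maximal valid span in one operation; patch over errors."""
    out = bytearray()
    pos = 0
    while pos < len(src):
        start = pos
        # Tight loop: advance over valid characters without decoding them.
        while pos < len(src):
            n = char_len(src, pos)
            if n == 0:
                break
            pos += n
        out += src[start:pos]  # one bulk copy for the whole valid span
        if pos < len(src):
            # Assumed error policy: emit U+FFFD, skip one byte, and resume.
            out += REPLACEMENT
            pos += 1
    return bytes(out)
```

With fully valid input the outer loop runs once, so the work reduces to
one validation pass plus one copy, which is the normal case the commit
message describes.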
-----------------------------------------------------------------------
--
Perl5 Master Repository