In perl.git, the branch smoke-me/khw-encode has been created
<http://perl5.git.perl.org/perl.git/commitdiff/d55177b09f9baa25b3a9dbd30a74b5795313da0a?hp=0000000000000000000000000000000000000000>
at d55177b09f9baa25b3a9dbd30a74b5795313da0a (commit)
- Log -----------------------------------------------------------------
commit d55177b09f9baa25b3a9dbd30a74b5795313da0a
Author: Karl Williamson <[email protected]>
Date: Wed Nov 23 13:27:43 2016 -0700
Add isFOO_utf8_safe() macros
The original API assumed that we could keep malformed UTF-8 out by use
of gatekeepers, but that is currently impossible. This commit adds
"safe" versions to macros for determining if a UTF-8 sequence represents
an alphabetic, a digit, etc. Each new macro has an extra parameter
pointing to the end of the sequence, so that looking beyond the input
string can be avoided.
The macros aren't currently completely safe, as they don't test that
there is at least a single valid character in the input, except by an
assertion in DEBUGGING builds. This is because typically they are
called in code that makes that assumption, and frequently tests the
current character for one thing or another. While debugging this and
future commits, the assertion exposed some existing errors where that
assumption turned out to be false.
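The idea of passing an end pointer can be sketched as follows. This is a
minimal illustration only, not the code in handy.h or utf8.c: the function
name is_digit_utf8_safe_sketch is hypothetical, and it recognizes just ASCII
digits and the two-byte ARABIC-INDIC DIGITs (U+0660..U+0669) to keep the
example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: returns 1 if the character starting at s is an ASCII
 * digit or an ARABIC-INDIC DIGIT (U+0660..U+0669, encoded as D9 A0..D9 A9),
 * and 0 otherwise.  e points one past the last readable byte, so the second
 * byte is examined only when it lies inside the buffer. */
int is_digit_utf8_safe_sketch(const unsigned char *s, const unsigned char *e)
{
    /* As in the commit message, "at least one byte is available" is only
     * asserted; the check disappears when NDEBUG is defined. */
    assert(s < e);

    if (*s >= '0' && *s <= '9')
        return 1;

    if (*s == 0xD9) {
        if (e - s < 2)          /* truncated: never read beyond e */
            return 0;
        return s[1] >= 0xA0 && s[1] <= 0xA9;
    }

    return 0;
}
```

A caller that cannot guarantee s < e still has to test for that itself,
which is the remaining gap the commit message describes.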
M embed.fnc
M embed.h
M handy.h
M proto.h
M utf8.c
commit 67c3cdde83c5c08bcb59ba878c43c84dac24533e
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 20:17:22 2016 -0700
customized
M t/porting/customized.dat
commit c8512d9d49349893ccc92ddc7be8ebc179b19ba9
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 20:30:56 2016 -0700
APItest/t/utf8.t: White space only
This indents the new block formed by the previous commit. However,
since the indentation is getting too much, it also changes the indents
for all the nested for loops to 2 spaces to allow room on the line.
M ext/XS-APItest/t/utf8.t
commit 7973f1a7221bb44f6a648d021434d244a0fb5011
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 17:47:35 2016 -0700
Split diagnostics for two UTF-8 malformations
Some UTF-8 sequences may have multiple malformations. Commit
2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930 tried to make sure that all
possible ones are raised, instead of abandoning searching after one is
found. Since then, I have realized that there was yet another case of two
malformations in which only one or the other was returned.
An input buffer may be too short to fully express the code point it
purports to. This can be determined by the first byte of the UTF-8
sequence indicating a longer sequence is required than the space
available. But that shortened sequence can also contain a premature
beginning of another character before the point where the input runs
out. This commit causes both to be raised, instead of the previous
behavior of noting just one.
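Reporting both conditions for one sequence can be sketched like this. It is
an illustration under stated assumptions, not the logic in utf8.c: the flag
names SKETCH_GOT_SHORT and SKETCH_GOT_NON_CONTINUATION and both function
names are hypothetical, and other malformations (overlongs, surrogates, and
so on) are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only. */
enum {
    SKETCH_GOT_SHORT            = 1 << 0, /* buffer ends before the sequence does */
    SKETCH_GOT_NON_CONTINUATION = 1 << 1  /* a non-continuation byte came too early */
};

/* Length implied by the first byte, or 0 if it cannot start a sequence. */
static size_t expected_len_sketch(unsigned char first)
{
    if (first < 0x80)                  return 1;
    if (first >= 0xC0 && first <= 0xDF) return 2;
    if (first >= 0xE0 && first <= 0xEF) return 3;
    if (first >= 0xF0 && first <= 0xF7) return 4;
    return 0;
}

/* Returns the set of flags that apply to the sequence starting at s, where
 * e points one past the last available byte. */
unsigned check_malformations_sketch(const unsigned char *s,
                                    const unsigned char *e)
{
    unsigned flags = 0;
    size_t expected, avail, i;

    assert(s < e);
    expected = expected_len_sketch(*s);
    avail = (size_t)(e - s);

    if (expected == 0)      /* not a start byte: out of scope for this sketch */
        return 0;

    if (avail < expected)
        flags |= SKETCH_GOT_SHORT;

    /* Keep scanning the bytes that are present even when the input is
     * short, so that both problems can be reported together. */
    for (i = 1; i < expected && i < avail; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            flags |= SKETCH_GOT_NON_CONTINUATION;
            break;
        }
    }

    return flags;
}
```

For the two-byte input E2 41, the first byte promises three bytes and the
second is not a continuation byte, so both flags are set.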
M ext/XS-APItest/t/utf8.t
M t/op/utf8decode.t
M utf8.c
commit dbcc0d750af7ecc30f3566706c24755d0b129b8c
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 18:14:45 2016 -0700
APItest/t/utf8.t: Partially refactor to use table data
This removes kludgy code that was trying, given a partial
character, to determine if there were enough bytes present to guarantee that
the whole character must belong to a class of characters or not. Now
the necessary length to make that determination has instead manually
been placed in a table, so it can be looked up. In doing so, I
corrected one length that was failing on EBCDIC.
M ext/XS-APItest/t/utf8.t
commit 129d245c5ae102fc623235afdc0fa1ee9e3d4672
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 18:07:31 2016 -0700
APItest/t/utf8.t: Fix test
It turns out that this test has two malformations, and should only have
one; a future commit will remove the masking of the 2nd one.
M ext/XS-APItest/t/utf8.t
commit 4789005cae55867636d2d352998829cef68267cc
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 18:01:21 2016 -0700
APItest/t/utf8.t: Comments only
M ext/XS-APItest/t/utf8.t
commit a641ad00d9b27ce86ab1c4c3c8972860813f671c
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 17:55:10 2016 -0700
APItest/t/utf8.t: Add some indentation to diagnostics
This is so they don't interrupt reading the output when there are
errors.
M ext/XS-APItest/t/utf8.t
commit 1134175ba7b7901d529e08b8bb39580d4badda2e
Author: Karl Williamson <[email protected]>
Date: Tue Nov 22 13:15:18 2016 -0700
utf8.c: Clarify warning message.
This warning was changed recently in the 5.25 series, and has not been
in a stable release.
M ext/XS-APItest/t/utf8.t
M lib/utf8.t
M t/op/lex.t
M t/op/utf8decode.t
M utf8.c
commit 06a6f490d6add3b94ecd25c893d489d26dc46831
Author: Karl Williamson <[email protected]>
Date: Mon Nov 21 14:59:47 2016 -0700
APItest/t/utf8.t: Simplify expression slightly
M ext/XS-APItest/t/utf8.t
commit 8d440bfd9b51e0148a1f7cbe9a9e50f66c2f895a
Author: Karl Williamson <[email protected]>
Date: Sun Nov 20 07:56:40 2016 -0700
APItest/t/handy.t: Output details if test fails
There should be no warnings generated, but if there are, we want to see
what they were.
M ext/XS-APItest/t/handy.t
commit 8e95415bbb780d5362cdbf65c7a45c8db9f8ec80
Author: Karl Williamson <[email protected]>
Date: Thu Sep 15 09:09:07 2016 -0600
XXX incomplete: Add sv_utf8_decode_flags
M embed.fnc
M embed.h
M proto.h
M sv.c
M sv.h
commit 7339ac10881a902c0b7777b1801fe4d7699fc9a7
Author: Karl Williamson <[email protected]>
Date: Tue Nov 1 22:23:47 2016 -0600
customized
M t/porting/customized.dat
commit 79203adb2b13741585378a97d2ccf02ba87051e4
Author: Karl Williamson <[email protected]>
Date: Tue Oct 18 14:09:43 2016 -0600
pali
M cpan/Encode/Encode.xs
commit 7e649a0a0c000ae33fed56865a02127e93b2ea6d
Author: Karl Williamson <[email protected]>
Date: Wed Oct 12 20:33:29 2016 -0600
later
M utf8.h
commit c89fc8c252e6ad9351c83e6bb2b18f6ba57a7295
Author: Karl Williamson <[email protected]>
Date: Wed Sep 14 22:40:23 2016 -0600
customized
M t/porting/customized.dat
commit 69015b32dfa88911c82c2d5e948cab2fe82673d6
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:20:52 2016 -0600
Use core REPLACEMENT CHARACTER definition
This allows the code to now work on EBCDIC as well.
M cpan/Encode/Encode/encode.h
commit 12005725e31c7105bda2e86739c0cbd750889d4d
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:16:00 2016 -0600
XXX commit msg: Encode.xs: Rmv unused function
M cpan/Encode/Encode.xs
commit 1f5da59b7986c06d68ebffde0f49e04cef0fe522
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:39 2016 -0600
Encode.xs: white-space, comment only
This removes some trailing white space, indents various blocks
properly according to perl standards, adds a comment, and fixes the
grammar in another.
M cpan/Encode/Encode.xs
commit f1fe0f20838dc854ba99f295d66783fe56fe5232
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:06 2016 -0600
XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
This replaces the current scheme for checking UTF-8 validity by one
in which normal processing doesn't require having to decode the UTF-8
into code points. The copying of characters individually from the input
to the output is changed to be a single operation for each entire span
of valid input at once.
Thus in the normal case, what ends up happening is a tight loop to
check the validity, and then a memmove of the entire input to the
output, then return.
If an error is found, it copies all the valid input before the error,
then handles the character in error, then positions to the next input
position, and repeats the whole process starting from there.
It uses the functionality available from the Perl 5 core to look at
just the bytes that comprise the UTF-8 to make the determination,
converting to code points only those that are defective somehow, in
order to display them in warnings and error messages.
Thus, this does not need to know about the intricacies of UTF-8
malformations, relying on the core to handle this.
This cannot be pushed to CPAN until Devel::PPPort has been updated to
implement all the functions now needed.
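The validate-then-copy scheme described in this commit message can be
sketched roughly as below. This is not the Encode.xs code and does not use
the core functions it relies on: valid_char_len_sketch and
copy_valid_spans_sketch are hypothetical names, the validator applies strict
Unicode rules as an assumption, and replacing each bad byte with U+FFFD is
only one possible error policy chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: length of the well-formed UTF-8 character at s,
 * or 0 if it is malformed or truncated.  It inspects bytes only and never
 * computes a code point. */
size_t valid_char_len_sketch(const unsigned char *s, const unsigned char *e)
{
    unsigned char c = *s;
    size_t len, i;

    if (c < 0x80)                   return 1;
    else if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) len = 3;
    else if (c >= 0xF0 && c <= 0xF4) len = 4;
    else                            return 0;

    if ((size_t)(e - s) < len)
        return 0;
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;

    /* Reject overlongs, surrogates, and values above U+10FFFF. */
    if (c == 0xE0 && s[1] < 0xA0) return 0;
    if (c == 0xED && s[1] > 0x9F) return 0;
    if (c == 0xF0 && s[1] < 0x90) return 0;
    if (c == 0xF4 && s[1] > 0x8F) return 0;
    return len;
}

/* Copies n bytes from src to dst, one whole valid span at a time.  Each
 * offending byte is replaced by U+FFFD (an assumed policy).  dst must have
 * room for 3 * n bytes.  Returns the number of bytes written. */
size_t copy_valid_spans_sketch(const unsigned char *src, size_t n,
                               unsigned char *dst)
{
    static const unsigned char replacement[] = { 0xEF, 0xBF, 0xBD };
    const unsigned char *s = src, *e = src + n;
    unsigned char *d = dst;

    assert(src != NULL && dst != NULL);

    while (s < e) {
        const unsigned char *span = s;
        size_t len;

        /* Tight loop: advance over valid characters without decoding. */
        while (s < e && (len = valid_char_len_sketch(s, e)) != 0)
            s += len;

        /* One copy for the entire valid span. */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        if (s < e) {    /* stopped on an error: handle it, then resume */
            memcpy(d, replacement, sizeof replacement);
            d += sizeof replacement;
            s++;        /* simplification: skip a single byte */
        }
    }
    return (size_t)(d - dst);
}
```

With fully valid input the outer loop runs once, so the work reduces to one
validation pass and one memmove, which is the normal case the message
describes.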
M cpan/Encode/Encode.pm
M cpan/Encode/Encode.xs
commit e3000d7e9ad6eb78cbe90d1b8ddb53cc3e4ab41e
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 05:03:37 2016 -0600
XXX For EBCDIC debug
M utf8.c
-----------------------------------------------------------------------
--
Perl5 Master Repository