In perl.git, the branch smoke-me/khw-encode has been created
<http://perl5.git.perl.org/perl.git/commitdiff/2b232ed1ce442e9ea50a5b2453cfd53173ca77ee?hp=0000000000000000000000000000000000000000>
at 2b232ed1ce442e9ea50a5b2453cfd53173ca77ee (commit)
- Log -----------------------------------------------------------------
commit 2b232ed1ce442e9ea50a5b2453cfd53173ca77ee
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 08:46:53 2016 -0600
XS-APItest/t/utf8.t: Test with longest possible overlong
As part of testing, certain malformations are perturbed to also be
overlong to see that the combination of them is properly handled. To do
this, the code will take a test case and calculate an overlong that is
longer than it. However, if the test case is already as long as the overlong
would be, this can't be done, and the test is skipped. This commit now
uses a longer overlong than previously (now the maximum possible) so
that fewer tests have to be skipped.
M ext/XS-APItest/t/utf8.t
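A minimal C sketch of the overlong idea (not the test's actual Perl code): re-encode a code point in more bytes than it needs. It assumes the original UTF-8 layout of at most 6 bytes, where an n-byte lead byte carries 7 - n payload bits and each continuation byte carries 6. Perl's own extended UTF-8 allows longer sequences, which this sketch does not model. The helper names `utf8_min_len` and `make_overlong` are hypothetical. Returning 0 when the requested length is not longer than the minimal encoding corresponds to the skip described above, so the larger the target length, the fewer inputs must be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload bits in an n-byte sequence (2 <= n <= 6) in the original UTF-8
 * layout: the lead byte carries 7 - n bits, each continuation byte 6. */
static unsigned utf8_capacity_bits(size_t n)
{
    return (unsigned)(7 - n) + 6u * (unsigned)(n - 1);
}

/* Shortest length that can hold cp; 7 means "more than this sketch allows". */
static size_t utf8_min_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    for (size_t n = 2; n <= 6; n++)
        if (cp < ((uint32_t)1 << utf8_capacity_bits(n)))
            return n;
    return 7;
}

/* Write cp into out as an n-byte overlong sequence. Returns n, or 0 when
 * n is out of range or not longer than the shortest encoding of cp, in
 * which case no overlong of that length exists (the "skip" case). */
static size_t make_overlong(uint32_t cp, size_t n, uint8_t out[6])
{
    assert(out != NULL);
    if (n < 2 || n > 6 || n <= utf8_min_len(cp))
        return 0;
    for (size_t i = n - 1; i > 0; i--) {        /* continuation bytes, last first */
        out[i] = (uint8_t)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (uint8_t)((0xFF << (8 - n)) | cp); /* lead byte plus leftover bits */
    return n;
}
```

For example, `make_overlong(0x41, 2, buf)` produces the familiar overlong `C1 81` for "A", while asking for 6 bytes produces `FC 80 80 80 81 81`.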
commit 24b9ea9efb57e390d1662b6fe2a7909acb69cbbd
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 08:44:43 2016 -0600
XS-APItest/t/utf8.t: White-space only
M ext/XS-APItest/t/utf8.t
commit ce0c9378fad5fa98af70f1121c198c7f29761c27
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 08:42:38 2016 -0600
XS-APItest/t/utf8.t: Fix EBCDIC bug
This number needs to be adjusted for EBCDIC platforms.
M ext/XS-APItest/t/utf8.t
commit 2c36d7f40639db7241be5e06fae4daff0e9b769d
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 08:36:56 2016 -0600
XS-APItest/t/utf8.t: Move a common expression to $var
The maximum byte length of a single code point's UTF-8 representation is
used in a bunch of places. Calculate it once.
M ext/XS-APItest/t/utf8.t
commit d7099f298bc4a657b9bd5aca87419c065176f8f8
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 08:31:09 2016 -0600
XS-APItest/t/utf8.t: Fix wrong test on EBCDIC
The I8 string doesn't work the same way as UTF-8: it takes only 5 bits
from each continuation byte instead of 6.
M ext/XS-APItest/t/utf8.t
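A minimal C sketch of the difference, assuming a well-formed sequence of 2 to 6 bytes: the same decoding loop handles both forms once the number of payload bits per continuation byte is a parameter (6 for UTF-8's `10xxxxxx`, 5 for I8's `101xxxxx`). `decode_seq` is a hypothetical name, and the sketch stops at I8; it does not model the final UTF-EBCDIC step that maps I8 bytes onto a particular EBCDIC code page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one multi-byte sequence of length n (2..6). cont_bits is 6 for
 * UTF-8 and 5 for I8. No validation is done: the caller must pass a
 * well-formed sequence. */
static uint32_t decode_seq(const uint8_t *s, size_t n, unsigned cont_bits)
{
    assert(n >= 2 && n <= 6);
    assert(cont_bits == 5 || cont_bits == 6);

    const uint32_t cont_mask = ((uint32_t)1 << cont_bits) - 1;
    /* In both forms, an n-byte lead byte keeps its low 7 - n bits. */
    uint32_t cp = s[0] & (((uint32_t)1 << (7 - n)) - 1);

    for (size_t i = 1; i < n; i++)
        cp = (cp << cont_bits) | (s[i] & cont_mask);
    return cp;
}
```

With this, the I8 bytes `C7 A9` decode to U+00E9 when 5 bits are taken per continuation byte, but to U+01E9 if they are wrongly treated with UTF-8's 6 bits, which is the kind of mismatch that arises when I8 data is handled with UTF-8 arithmetic.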
commit 885bdc7f97303962bdb4b985c2a6855be5fec51f
Author: Karl Williamson <[email protected]>
Date: Fri Oct 28 05:03:37 2016 -0600
XXX For EBCDIC debug
M utf8.c
commit db52803e1d0aaa935087812ff9de23097ed06acb
Author: Karl Williamson <[email protected]>
Date: Tue Oct 18 14:09:43 2016 -0600
pali
M cpan/Encode/Encode.xs
commit ee6f151c413ac3f8ac3c057a104362b95a16c0dc
Author: Karl Williamson <[email protected]>
Date: Wed Oct 12 20:33:29 2016 -0600
later
M utf8.h
commit d1d6eb574c621c54c1142fd92c401ba9573d8c78
Author: Karl Williamson <[email protected]>
Date: Thu Sep 15 09:09:07 2016 -0600
XXX incomplete: Add sv_utf8_decode_flags
M embed.fnc
M embed.h
M proto.h
M sv.c
M sv.h
commit e5af3cab40f1af3cecf217354460706b20c7e8fe
Author: Karl Williamson <[email protected]>
Date: Wed Sep 14 22:40:23 2016 -0600
customized
M t/porting/customized.dat
commit 06a50a8c7aaf2f3a6ac646e62910fc9d9b4a1063
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:20:52 2016 -0600
Use core REPLACEMENT CHARACTER definition
This allows the code to now work on EBCDIC as well.
M cpan/Encode/Encode/encode.h
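Why a hardcoded REPLACEMENT CHARACTER byte string cannot work on EBCDIC can be sketched by encoding U+FFFD both ways. This C sketch is hypothetical (`encode_seq` is not a Perl or Encode function), and it stops at I8, before UTF-EBCDIC's code-page mapping, so the I8 bytes it produces are not the final bytes an EBCDIC platform would store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp with the UTF-8 layout (cont_bits == 6, continuation bytes
 * 10xxxxxx) or the I8 layout (cont_bits == 5, continuation bytes 101xxxxx).
 * In both, an n-byte lead byte is n one-bits, a zero, then 7 - n payload
 * bits. Returns the number of bytes written, or 0 if cp needs more than 6. */
static size_t encode_seq(uint32_t cp, unsigned cont_bits, uint8_t out[6])
{
    assert(cont_bits == 5 || cont_bits == 6);
    const uint8_t cont_marker = (cont_bits == 6) ? 0x80 : 0xA0;
    const uint32_t cont_mask = ((uint32_t)1 << cont_bits) - 1;

    /* Single-byte invariants: below 0x80 in UTF-8, below 0xA0 in I8,
     * which happen to equal the continuation markers. */
    if (cp < cont_marker) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    for (size_t n = 2; n <= 6; n++) {
        unsigned bits = (unsigned)(7 - n) + cont_bits * (unsigned)(n - 1);
        if (cp >= ((uint32_t)1 << bits))
            continue;                         /* does not fit; try one byte more */
        for (size_t i = n - 1; i > 0; i--) {  /* continuation bytes, last first */
            out[i] = (uint8_t)(cont_marker | (cp & cont_mask));
            cp >>= cont_bits;
        }
        out[0] = (uint8_t)((0xFF << (8 - n)) | cp);
        return n;
    }
    return 0;
}
```

U+FFFD comes out as `EF BF BD` in UTF-8 but as the 4-byte `F1 BF BF BD` in I8, so a byte string baked in for one platform is wrong on the other; taking the definition from the core avoids that.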
commit 7306f2d8504d76c65a625bfe00c05df066225831
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:16:00 2016 -0600
XXX commit msg: Encode.xs: Rmv unused function
M cpan/Encode/Encode.xs
commit 7b3b2fe7348dc3abe96d0e0cfa28aa919a1978fe
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:39 2016 -0600
Encode.xs: white-space only
M cpan/Encode/Encode.xs
commit dbef7dbf59c4a73c7db6e66206da544db210d796
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:06 2016 -0600
XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
This replaces the current scheme for checking UTF-8 validity with one
in which normal processing doesn't require having to decode the UTF-8
into code points. The copying of characters individually from the input
to the output is changed to be a single operation for each entire span
of valid input at once.
Thus, in the normal case, what ends up happening is a tight loop to
check the validity, and then a memmove of the entire input to the
output, then return.
If an error is found, it copies all the valid input before the error,
then handles the character in error, then advances to the next input
position, and repeats the whole process starting from there.
It uses the functionality available from the Perl 5 core to look at
just the bytes that comprise the UTF-8 to make the determination,
converting to code points only those that are defective somehow in
order to display them in warnings and error messages.
Thus, this does not need to know about the intricacies of UTF-8
malformations, relying on the core to handle this.
This cannot be pushed to CPAN until Devel::PPPort has been updated to
implement all the functions now needed.
M cpan/Encode/Encode.pm
M cpan/Encode/Encode.xs
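The span-at-a-time scheme described in this commit can be sketched in standalone C. This is not the Encode.xs code: the byte-range validator `wf_len` stands in for the Perl core functions the commit relies on, and it follows Unicode's table of well-formed UTF-8 byte sequences, so, as in the commit, validity is decided from the bytes without decoding to code points. Substituting U+FFFD and skipping one byte per error are simplifying assumptions; Encode's real behavior depends on its CHECK settings. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the well-formed UTF-8 character at s, or 0 if malformed.
 * Works purely on byte ranges; no code point is computed. */
static size_t wf_len(const uint8_t *s, const uint8_t *e)
{
    uint8_t b = s[0], lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */
    size_t n;

    if (b <= 0x7F)
        return 1;
    if (b >= 0xC2 && b <= 0xDF)
        n = 2;                                /* C0, C1 would be overlong */
    else if (b >= 0xE0 && b <= 0xEF) {
        n = 3;
        if (b == 0xE0) lo = 0xA0;             /* reject overlongs */
        else if (b == 0xED) hi = 0x9F;        /* reject surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        n = 4;
        if (b == 0xF0) lo = 0x90;             /* reject overlongs */
        else if (b == 0xF4) hi = 0x8F;        /* reject above U+10FFFF */
    } else
        return 0;

    if ((size_t)(e - s) < n || s[1] < lo || s[1] > hi)
        return 0;                             /* truncated or bad 2nd byte */
    for (size_t i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return n;
}

/* The tight loop: length of the longest well-formed prefix of [s, e). */
static size_t valid_prefix(const uint8_t *s, const uint8_t *e)
{
    const uint8_t *p = s;
    size_t n;
    while (p < e && (n = wf_len(p, e)) != 0)
        p += n;
    return (size_t)(p - s);
}

/* Copy in to out span by span, replacing each bad byte with U+FFFD.
 * out must have room for 3 * len bytes (worst case: every byte bad). */
static size_t copy_valid_utf8(const uint8_t *in, size_t len, uint8_t *out)
{
    static const uint8_t repl[3] = { 0xEF, 0xBF, 0xBD };
    assert(in != NULL && out != NULL);

    const uint8_t *p = in, *e = in + len;
    uint8_t *o = out;
    while (p < e) {
        size_t span = valid_prefix(p, e);
        memmove(o, p, span);        /* the whole valid span in one call */
        o += span;
        p += span;
        if (p == e)
            break;                  /* normal case: no error was found */
        memcpy(o, repl, sizeof repl);
        o += sizeof repl;
        p++;                        /* resume scanning after the bad byte */
    }
    return (size_t)(o - out);
}
```

Fully valid input costs one scan plus a single `memmove`; only malformed input falls into the slower per-error path, which matches the trade-off the commit message describes.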
-----------------------------------------------------------------------
--
Perl5 Master Repository