In perl.git, the branch smoke-me/khw-encode has been created
<https://perl5.git.perl.org/perl.git/commitdiff/35f52632dbc74b979cd001d46b31b95088497d28?hp=0000000000000000000000000000000000000000>
at 35f52632dbc74b979cd001d46b31b95088497d28 (commit)
- Log -----------------------------------------------------------------
commit 35f52632dbc74b979cd001d46b31b95088497d28
Author: Karl Williamson
Date: Sat Jan 27 17:43:00 2018 -0700
Add utf8n_to_uvchr_msgs()
commit ec8ed495a8f923b898912beec0d3cdf246ef0629
Author: Pali
Date: Wed Sep 13 00:30:29 2017 +0200
Rewrite encode, decode, encode_utf8, decode_utf8 and from_to functions to XS
commit 3f035f5843f7ffb8bafcba315e0e1dac7a834f84
Author: Karl Williamson
Date: Thu Dec 28 14:57:22 2017 -0700
encengine.c: Properly indent code within blocks
This makes it much more legible.
commit c3a80fcede5376caf82d7e599813b7e834db0625
Author: Karl Williamson
Date: Thu Dec 28 14:29:43 2017 -0700
Speed up UTF-8 validation checking on modern perls
Perl 5.26 introduced infrastructure in the core that can be used by
Encode to check UTF-8 stream validity much faster than before.
It is not clear when, or if, this functionality will be backported into
Devel::PPPort, partly because no one currently available knows how to
do it, and partly because everyone else may rely on Encode, so it
generally doesn't need to be backported.
When that infrastructure is available, this commit replaces the current
scheme for checking UTF-8 validity with one in which normal processing
doesn't have to decode the UTF-8 into code points. Instead of copying
characters individually from the input to the output, it copies each
entire span of valid input in a single operation.
Thus, in the normal case, the code runs a tight loop to check validity,
does a single memmove of the entire input to the output, and returns.
If an error is found, it copies all the valid input preceding the error,
handles the character in error, advances to the next input position,
and repeats the whole process from there.
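The scheme described above can be sketched in standalone C as follows. This is a minimal sketch, not Encode's code, and it does not use the Perl core API: valid_prefix_len() is a hypothetical stand-in for the core's validity check (it applies the RFC 3629 rules for ASCII-platform UTF-8), copy_valid_utf8() is a hypothetical name, and writing U+FFFD for each bad byte and then skipping exactly one byte is an assumption made for illustration. Encode's real error handling depends on its CHECK argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the core's checker: returns the length in bytes
 * of the longest well-formed UTF-8 prefix of s[0..len), using the RFC 3629
 * rules (no overlongs, no surrogates, nothing above U+10FFFF). */
static size_t valid_prefix_len(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t n = 0;                       /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (c < 0x80) { i++; continue; }    /* ASCII: single byte */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }   /* reject overlongs */
        else if ((c >= 0xE1 && c <= 0xEC) || c == 0xEE || c == 0xEF) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }   /* reject surrogates */
        else if (c == 0xF0) { n = 3; lo = 0x90; }   /* reject overlongs */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }   /* cap at U+10FFFF */
        else return i;                      /* invalid start byte */

        if (len - i <= n) return i;         /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) return i;
        for (size_t k = 2; k <= n; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF) return i;
        i += n + 1;
    }
    return len;
}

/* Hypothetical copy routine: copies src to dst one valid span at a time,
 * writing U+FFFD for each byte where validation stops.
 * dst must not overlap src and must hold up to 3 * len bytes.
 * Returns the number of bytes written. */
static size_t copy_valid_utf8(unsigned char *dst, const unsigned char *src,
                              size_t len)
{
    static const unsigned char repl[] = { 0xEF, 0xBF, 0xBD };
    size_t out = 0;

    while (len > 0) {
        size_t ok = valid_prefix_len(src, len);

        memmove(dst + out, src, ok);    /* one copy for the whole valid span */
        out += ok;
        src += ok;
        len -= ok;
        if (len == 0)
            break;                      /* normal case: everything was valid */

        memcpy(dst + out, repl, sizeof repl);   /* handle the bad byte */
        out += sizeof repl;
        src++;                          /* resume just after it */
        len--;
    }
    return out;
}
```

When the input is entirely valid, the loop body runs once, so the common case reduces to the validation scan plus a single memmove.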
Thus, this code does not need to know about the intricacies of UTF-8
malformations; it relies on the core to handle them.
There are currently some problems with Encode on EBCDIC platforms. The
infrastructure is known to work correctly there, so I'm hopeful this
will solve those portability issues.
commit 05d4825675873646ecdd3e4d2304f460956d7097
Author: Karl Williamson
Date: Thu Dec 28 14:09:06 2017 -0700
Encode/Encode.xs: Pull condition out of loop
The value for this condition is known before the loop, so move it
outside the loop.
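The commit message doesn't say which condition was moved, so the following C sketch is only a generic illustration of hoisting a loop-invariant test out of a loop. CHECK_HIGH_BYTES and both count functions are hypothetical names, not taken from Encode.xs.

```c
#include <assert.h>
#include <stddef.h>

#define CHECK_HIGH_BYTES 0x1u   /* hypothetical flag, for illustration only */

/* Before: the flag test is re-evaluated on every iteration,
 * even though 'flags' cannot change inside the loop. */
static size_t count_flagged_before(const unsigned char *s, size_t len,
                                   unsigned flags)
{
    size_t count = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if ((flags & CHECK_HIGH_BYTES) && s[i] >= 0x80)
            count++;
    return count;
}

/* After: the invariant condition is tested once, before the loop. */
static size_t count_flagged_after(const unsigned char *s, size_t len,
                                  unsigned flags)
{
    size_t count = 0;

    assert(s != NULL || len == 0);
    if (!(flags & CHECK_HIGH_BYTES))
        return 0;               /* nothing to count; skip the loop entirely */
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x80)
            count++;
    return count;
}
```

Both versions return the same result; the second simply stops re-testing a value that is already known before the loop starts.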
commit e111589f21cee62da340286501e971292eea56e7
Author: Karl Williamson
Date: Thu Dec 28 14:06:45 2017 -0700
Encode/encode.h: Use system REPLACEMENT char if available
On modern perls, there is a definition for the UTF-8 string of the
REPLACEMENT CHARACTER. Use it if available, as it is portable to EBCDIC,
whereas Encode's own definition isn't.
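A minimal C sketch of this "use the system definition if available" pattern follows. It assumes the core macro is named REPLACEMENT_CHARACTER_UTF8 and expands to a string literal (check the unicode_constants.h shipped with your perl); FALLBACK_CHAR_UTF8 and append_replacement() are hypothetical names, not Encode's own. Compiled outside perl, the macro is undefined, so the hard-coded ASCII-platform bytes are used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: modern perls define REPLACEMENT_CHARACTER_UTF8 as the
 * platform's encoding of U+FFFD (UTF-8 on ASCII, UTF-EBCDIC on EBCDIC). */
#ifdef REPLACEMENT_CHARACTER_UTF8
#  define FALLBACK_CHAR_UTF8 REPLACEMENT_CHARACTER_UTF8
#else
   /* U+FFFD as UTF-8 on an ASCII platform; not correct on EBCDIC. */
#  define FALLBACK_CHAR_UTF8 "\xEF\xBF\xBD"
#endif

/* Hypothetical helper: appends the replacement character to buf at
 * offset pos and returns the new offset. */
static size_t append_replacement(char *buf, size_t cap, size_t pos)
{
    const size_t n = sizeof(FALLBACK_CHAR_UTF8) - 1;   /* exclude the NUL */

    assert(pos + n <= cap);
    memcpy(buf + pos, FALLBACK_CHAR_UTF8, n);
    return pos + n;
}
```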
commit 000dafcffe3b1c3b3ee7114dc3e9aa23c2b20cf2
Author: Karl Williamson
Date: Thu Dec 28 14:04:15 2017 -0700
Encode: Add comments
This documents process_utf8() and adds another helpful comment.
commit 28259d78b909be18709053dae284e8f50dc9c3af
Author: Karl Williamson
Date: Thu Dec 28 14:01:34 2017 -0700
Encode: White space only
This correctly indents code within blocks and removes trailing white space.
-----------------------------------------------------------------------
--
Perl5 Master Repository