In perl.git, the branch smoke-me/khw-bitwise has been created
<http://perl5.git.perl.org/perl.git/commitdiff/08b81986b9548c9d68affebd2234dd31ea8e43c6?hp=0000000000000000000000000000000000000000>
at 08b81986b9548c9d68affebd2234dd31ea8e43c6 (commit)
- Log -----------------------------------------------------------------
commit 08b81986b9548c9d68affebd2234dd31ea8e43c6
Author: Karl Williamson <[email protected]>
Date: Tue Jun 6 16:25:10 2017 -0600
Fatalize bit ops on wide characters
These deprecations are scheduled to become fatal in 5.28.
It turns out that a bunch of the code dealing with UTF-8 operands could
be cut out. The operands are simply converted to non-UTF-8, croaking if
that fails. Then the operation is done on the non-UTF-8 operands, using the
existing code. To preserve current behavior, the result is left as non-UTF-8
if the op is "~", and is changed back to UTF-8 for the other ops.
The part of this that took the most time was going through t/op/bop.t,
looking at all the failing tests and trying to determine if the tests
are simply testing the wide character behavior, and hence can just be
deleted; or if they are testing general UTF-8 behavior, and hence should
be modified to not use above-FF code points, while retaining UTF-8ness.
I went through the blame log for each such test, reading the commit
messages. Most turned out, unless I erred, to be testing wide character
behavior, which is now illegal.
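As a rough illustration of the approach this commit message describes for the binary ops (reduce the operands to non-UTF-8, fail on wide characters, operate on plain byte values, convert the result back to UTF-8), here is a minimal standalone sketch. It is not the code in doop.c or pp.c: the names next_narrow() and utf8_string_or() are hypothetical, it assumes an ASCII platform, it works character by character instead of converting whole operands first, and it returns -1 where perl would croak.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one code point <= 0xFF from UTF-8 at *pp and
 * advance *pp past it.  Returns -1 for a wide (above 0xFF) or malformed
 * character. */
static int
next_narrow(const unsigned char **pp, const unsigned char *end)
{
    const unsigned char *p = *pp;

    assert(p < end);
    if (*p < 0x80) { *pp = p + 1; return *p; }
    if ((*p == 0xC2 || *p == 0xC3) && end - p >= 2 && (p[1] & 0xC0) == 0x80) {
        *pp = p + 2;
        return ((p[0] & 0x03) << 6) | (p[1] & 0x3F);
    }
    return -1;
}

/* Hypothetical sketch: OR two UTF-8 strings character by character and write
 * the result to out as UTF-8.  out must have room for twice the longer input
 * length.  A shorter operand is treated as padded with zeros.  Returns the
 * result length in bytes, or -1 if either operand has a wide character. */
static long
utf8_string_or(const unsigned char *a, size_t alen,
               const unsigned char *b, size_t blen,
               unsigned char *out)
{
    const unsigned char *aend = a + alen, *bend = b + blen;
    unsigned char *d = out;

    while (a < aend || b < bend) {
        int ca = (a < aend) ? next_narrow(&a, aend) : 0;
        int cb = (b < bend) ? next_narrow(&b, bend) : 0;
        int c;

        if (ca < 0 || cb < 0) return -1;   /* real code would croak here */
        c = ca | cb;                        /* the op itself is on byte values */
        if (c < 0x80) {
            *d++ = (unsigned char)c;
        } else {                            /* re-encode 0x80..0xFF as two bytes */
            *d++ = (unsigned char)(0xC0 | (c >> 6));
            *d++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (long)(d - out);
}
```

For "~", the message says the result is instead left as non-UTF-8, so a sketch of that op would skip the re-encoding step.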
M doop.c
M op.h
M pod/perldeprecation.pod
M pod/perldiag.pod
M pod/perlop.pod
M pod/perlunicode.pod
M pod/perluniintro.pod
M pp.c
M t/lib/warnings/doop
M t/lib/warnings/pp
M t/op/bop.t
M t/op/substr.t
commit 3b453e5211c18241e67c47002f60a54af3098831
Author: Karl Williamson <[email protected]>
Date: Tue Jun 6 02:06:30 2017 -0600
XXX Don't push; experimental
M utf8.c
commit 75ae2faee31e5e8c4f17e99b3e7e637c4298a363
Author: Karl Williamson <[email protected]>
Date: Tue Jun 6 01:54:46 2017 -0600
utf8.c: Clarify pod for three functions
utf8_to_bytes(), bytes_from_utf8(), bytes_to_utf8()
M utf8.c
commit f81623ad1b96b2757bdfd01f3e8b80531609e902
Author: Karl Williamson <[email protected]>
Date: Tue Jun 6 01:45:32 2017 -0600
utf8.c: Change formal parameter name
The parameter "len" really is a pointer in utf8_to_bytes(),
bytes_from_utf8(), and bytes_to_utf8(). Call it lenp. The
documentation was sloppy about it; clean that up.
M embed.fnc
M proto.h
M utf8.c
commit e66ed070c5080d99d3658156556f957b2a00e149
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 19:26:37 2017 -0600
utf8.c: White-space, comment only
This adjusts the indentation to reflect changes in the previous commit.
M utf8.c
commit 3777dc190b3317790cef5698af73d7da190cdd14
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 19:21:41 2017 -0600
utf8_to_bytes(): Avoid work if possible
This converts to use the new function is_utf8_invariant_string_loc() to
find the first variant in the input. If none are found, the function is
a no-op. If the initial part of the input is all invariants, they are
now skipped during conversion, resulting in less work for such input.
The new function could also be optimized to speed up searching.
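The idea in this message can be sketched as follows. This is not the utf8.c code: downgrade_in_place() is a hypothetical name, an ASCII platform is assumed (invariant means a byte below 0x80), and only code points up to 0xFF are accepted. The point of the sketch is that both the validation and the conversion loops start at the first variant, not at the start of the string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: convert s in place from UTF-8 to one byte per
 * character and update *lenp.  Returns false, leaving s untouched, if any
 * character is above 0xFF or malformed. */
static bool
downgrade_in_place(unsigned char *s, size_t *lenp)
{
    unsigned char *end = s + *lenp;
    unsigned char *first = s;
    unsigned char *p, *d;

    assert(s != NULL && lenp != NULL);

    /* Locate the first variant byte. */
    while (first < end && *first < 0x80) first++;
    if (first == end) return true;          /* all invariant: a no-op */

    /* Validate, starting at the first variant. */
    for (p = first; p < end; ) {
        if (*p < 0x80) { p++; continue; }
        if ((*p != 0xC2 && *p != 0xC3) || end - p < 2 || (p[1] & 0xC0) != 0x80)
            return false;
        p += 2;
    }

    /* Convert; the invariant prefix before 'first' is already correct. */
    d = first;
    for (p = first; p < end; ) {
        if (*p < 0x80) {
            *d++ = *p++;
        } else {
            *d++ = (unsigned char)(((p[0] & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        }
    }
    *lenp = (size_t)(d - s);
    return true;
}
```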
M utf8.c
commit f887848e9a70140370d3c5a2b6e1ac206f51f4cb
Author: Karl Williamson <[email protected]>
Date: Tue Jun 6 02:01:10 2017 -0600
utf8.c: Change UTF8 to UVCHR
There is no practical difference between UTF8_IS_INVARIANT and
UVCHR_IS_INVARIANT, except that the latter is supposed to be used on
characters. Fix to conform.
M utf8.c
commit c4275f4156fe8a564c681e9e9dcac55a1eea0b25
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 18:51:28 2017 -0600
sv.c: Convert to use is_utf8_invariant_string_loc
This inline function was added in the previous commit.
And the function has the potential to be sped up by using word-at-a-time
operations.
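One possible shape for such a word-at-a-time speed-up is sketched below. This is an assumption about what the optimization could look like, not code from this branch; all_invariant_wordwise() is a hypothetical name, and an ASCII platform is assumed, where a variant is any byte with its high bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: returns true if no byte in s[0..len) has its high
 * bit set, testing eight bytes per iteration where possible. */
static bool
all_invariant_wordwise(const unsigned char *s, size_t len)
{
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;

    assert(s != NULL || len == 0);

    /* Whole words: the mask is the same in every byte, so byte order does
     * not matter. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);  /* sidesteps alignment issues */
        if (word & high_bits) return false;
    }

    /* Remaining tail, one byte at a time. */
    for (; i < len; i++) {
        if (s[i] & 0x80) return false;
    }
    return true;
}
```

A version that also reports the location of the first variant would need to fall back to a per-byte scan of the word in which the mask test fails.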
M sv.c
commit 95d4efeb8843043526c82a37dffc2ae52cb32235
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 18:33:05 2017 -0600
Add XS-callable function is_utf8_invariant_string_loc()
This is like is_utf8_invariant_string(), but takes an additional
parameter, a pointer into which it stores the location of the first
variant if any are found.
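The behavior described here can be illustrated with a small standalone function. The name invariant_string_loc() and its signature are hypothetical stand-ins, not the declaration added to inline.h, and the sketch assumes an ASCII platform where a byte is invariant exactly when it is below 0x80.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: returns true if every byte in s[0..len) is
 * invariant.  Otherwise returns false and, if first_variant is non-NULL,
 * stores the address of the first variant byte there. */
static bool
invariant_string_loc(const unsigned char *s, size_t len,
                     const unsigned char **first_variant)
{
    const unsigned char *p = s;
    const unsigned char *end = s + len;

    assert(s != NULL || len == 0);

    for (; p < end; p++) {
        if (*p >= 0x80) {
            if (first_variant) *first_variant = p;
            return false;
        }
    }
    return true;    /* *first_variant is left unchanged in this sketch */
}
```

A caller can then resume work at the stored location instead of rescanning the invariant prefix.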
M embed.fnc
M embed.h
M inline.h
M pod/perldelta.pod
M proto.h
commit 9ba6b2542b5aa6eb4b37d4ce8d3397481ab7cc4b
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 12:56:28 2017 -0600
bytes_to_utf8(): Remove obsolete comment
It said the logic was duplicated elsewhere, but now the essence of the
logic is in an inlined function used in both places.
M utf8.c
commit aa514a50457482ec2628529290fe20c49308349b
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 12:26:06 2017 -0600
bytes_from_utf8(): Can just do memcpy if all invariant
This function does two passes over the input. In the first it decides
if the string can be downgraded, and computes the size needed for the
downgraded string. In the 2nd pass, it does the conversion.
Likely, this wouldn't be called unless there is something to downgrade,
but adding a single 'if' can cause the downgrading to be entirely
skipped, replaced by a memcpy().
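A minimal sketch of the two-pass structure with the added shortcut follows. It is a simplification, not the utf8.c function: downgraded_copy() is a hypothetical name, it uses malloc() and returns NULL on failure (perl's own allocation and return conventions are not modeled), and it assumes an ASCII platform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns a malloc'd, NUL-terminated downgraded copy of
 * the UTF-8 string s and stores its length in *lenp.  Returns NULL if s has
 * a character above 0xFF, is malformed, or allocation fails. */
static unsigned char *
downgraded_copy(const unsigned char *s, size_t *lenp)
{
    const unsigned char *end = s + *lenp;
    const unsigned char *p;
    size_t two_byte = 0;        /* count of characters needing conversion */
    unsigned char *out, *d;

    assert(s != NULL && lenp != NULL);

    /* Pass 1: decide whether downgrading is possible, and size the result. */
    for (p = s; p < end; ) {
        if (*p < 0x80) { p++; continue; }
        if ((*p != 0xC2 && *p != 0xC3) || end - p < 2 || (p[1] & 0xC0) != 0x80)
            return NULL;
        two_byte++;
        p += 2;
    }

    out = malloc(*lenp - two_byte + 1);
    if (out == NULL) return NULL;

    /* The single 'if': nothing to convert, so copy the bytes wholesale. */
    if (two_byte == 0) {
        memcpy(out, s, *lenp);
        out[*lenp] = '\0';
        return out;
    }

    /* Pass 2: the actual conversion. */
    d = out;
    for (p = s; p < end; ) {
        if (*p < 0x80) {
            *d++ = *p++;
        } else {
            *d++ = (unsigned char)(((p[0] & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        }
    }
    *d = '\0';
    *lenp = (size_t)(d - out);
    return out;
}
```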
M utf8.c
commit 2c804234475c315e794aa204c7583122f96f2e52
Author: Karl Williamson <[email protected]>
Date: Mon Jun 5 12:24:39 2017 -0600
utf8.c: A byte count should be Size_t, not I32
M utf8.c
commit 0798b0ee861fda64feb9c25adf590f02595b89df
Author: Karl Williamson <[email protected]>
Date: Sat Jun 3 08:34:56 2017 -0600
t/op/bop.t: Verify complement downgrades UTF-8.
This adds a test for the existing result, so it doesn't get changed in
the future by mistake. I make no claims that it should work this way.
M t/op/bop.t
commit 075f37e3aa2c3c2c9213dd388e2c226b5cf6cf57
Author: Karl Williamson <[email protected]>
Date: Sat Jun 3 09:09:38 2017 -0600
vutil.c: Use setpvf to avoid uninit.
M vutil.c
commit 09471b7cb78b838b9a183dffe716ee85b7856ce8
Author: Karl Williamson <[email protected]>
Date: Sat Jun 3 09:08:50 2017 -0600
Make LOCK_LC_NUMERIC_STANDARD recursive
Same for UNLOCK_LC_NUMERIC_STANDARD.
M perl.h
-----------------------------------------------------------------------
--
Perl5 Master Repository