Branch: refs/heads/blead
Home:   https://github.com/Perl/perl5
Commit: 0401b9c802499120e98cb9792159086a07fefb79
    https://github.com/Perl/perl5/commit/0401b9c802499120e98cb9792159086a07fefb79
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M utf8.c

Log Message:
-----------
utf8_to_uv_msgs: Move code to ensure initialization

There was a path through this function in which &msgs, a parameter the
caller had asked to have set, did not get set. Doing the initialization
at the beginning also means it is needed in only one place.

Similarly for &errors. There was no path where it didn't get set, but it
is cleaner to set it at the same time as msgs.


Commit: bb07e051599ac6ef0b043d378e133aa19156e7ec
    https://github.com/Perl/perl5/commit/bb07e051599ac6ef0b043d378e133aa19156e7ec
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M utf8.c

Log Message:
-----------
utf8_to_uv_msgs: Add branch predictions

These two input parameters are for very specialized uses.


Commit: be1548c6afe60f52ec4e2fa49a90b1fc6ec7f813
    https://github.com/Perl/perl5/commit/be1548c6afe60f52ec4e2fa49a90b1fc6ec7f813
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M inline.h
  M proto.h
  M utf8.c

Log Message:
-----------
Inline utf8_to_uvchr_buf

This is a one-line function that just calls another function.


Commit: be98641b0d29edf79bb33f46445d6139c02b1b7b
    https://github.com/Perl/perl5/commit/be98641b0d29edf79bb33f46445d6139c02b1b7b
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M inline.h
  M proto.h
  M utf8.h

Log Message:
-----------
Merge utf8_to_uvchr_buf() and its helper

The helper adds no value.


Commit: 395e3b63ac5b0925e14d1347803dd3aca9892e0f
    https://github.com/Perl/perl5/commit/395e3b63ac5b0925e14d1347803dd3aca9892e0f
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M proto.h
  M utf8.c
  M utf8.h

Log Message:
-----------
Convert utf8n_to_uvchr_error to macro

It was a macro, but had a long-name function as well. This converts to
using two macros.
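The first two commits in this run (setting the caller's optional out-parameters once at function entry, and adding branch predictions for rarely used parameters) can be illustrated with a minimal standalone C sketch. It is not perl's `utf8_to_uv_msgs()` code: `toy_decode_msgs`, `TOY_UNLIKELY` and the `TOY_ERR_*` bits are hypothetical, a `const char **` stands in for perl's `AV **msgs`, and treating the optional error outputs as the "specialized" parameters is an assumption made for illustration. `__builtin_expect` is a GCC/Clang builtin; other compilers get a plain truth test. To stay short, the decoder handles only ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch hint, in the spirit of perl's UNLIKELY(); falls back
 * to a plain truth test on compilers without __builtin_expect. */
#if defined(__GNUC__) || defined(__clang__)
#  define TOY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define TOY_UNLIKELY(cond) (!!(cond))
#endif

#define TOY_ERR_EMPTY     0x1u   /* hypothetical error bits */
#define TOY_ERR_MALFORMED 0x2u

/* Hypothetical decoder with two optional, rarely used out-parameters.
 * Only ASCII is decoded; any other byte is reported as malformed. */
static bool
toy_decode_msgs(const unsigned char *s, const unsigned char *e,
                uint32_t *cp_p, size_t *advance_p,
                uint32_t *errors, const char **msgs)
{
    assert(s != NULL && e != NULL && cp_p != NULL && advance_p != NULL);

    /* Initialize the optional outputs once, at entry, so that no return
     * path below can leave them unset. */
    if (TOY_UNLIKELY(errors != NULL)) *errors = 0;
    if (TOY_UNLIKELY(msgs != NULL))   *msgs = NULL;

    if (s >= e) {
        *cp_p = 0xFFFD;
        *advance_p = 0;
        if (TOY_UNLIKELY(errors != NULL)) *errors |= TOY_ERR_EMPTY;
        if (TOY_UNLIKELY(msgs != NULL))   *msgs = "empty input";
        return false;
    }
    if (*s < 0x80) {
        *cp_p = *s;
        *advance_p = 1;
        return true;          /* *errors and *msgs already hold 0 / NULL */
    }
    *cp_p = 0xFFFD;
    *advance_p = 1;
    if (TOY_UNLIKELY(errors != NULL)) *errors |= TOY_ERR_MALFORMED;
    if (TOY_UNLIKELY(msgs != NULL))   *msgs = "non-ASCII byte (unsupported here)";
    return false;
}
```

Because the reset happens before any branching, the success path needs no code of its own for `errors` or `msgs`, and a stale value left in the caller's variable from an earlier call is always cleared.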

Commit: ddfa240a44db6da8bd50dabce39a9b39616ddadd
    https://github.com/Perl/perl5/commit/ddfa240a44db6da8bd50dabce39a9b39616ddadd
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M proto.h
  M utf8.c
  M utf8.h

Log Message:
-----------
Convert utf8n_to_uvchr() to macro

It was a macro, but had a long-name function as well. This converts to
using two macros.


Commit: 0187f3d9c375c14d50be5f9d9c75bd696bba8b0a
    https://github.com/Perl/perl5/commit/0187f3d9c375c14d50be5f9d9c75bd696bba8b0a
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M inline.h
  M proto.h
  M utf8.c

Log Message:
-----------
Add utf8_to_uv_msgs()

This is the first of several functions named in the style of
utf8_to_uv(), designed to be used instead of the problematic existing
ones named like utf8_to_uvchr(). The existing ones basically throw away
crucial information when they fail, creating hassles for the caller.
With them, it is hard to recover from malformed input and keep parsing,
which is what modern UTF-8 handlers have settled on doing.

Originally I planned to replace just the most problematic one,
utf8_to_uvchr_buf(), but I realized that each level threw away
information, so it would be better to start at the base-level one, which
utf8_to_uvchr_buf() eventually calls with a bunch of 0 parameters.

The return values of the previous functions all had to be disambiguated
on failure. This stops that at the root. The new series all return a
boolean indicating success, with a consistent API throughout. The old
series had one outlier, again utf8_to_uvchr_buf(), which had a different
calling convention and return values.

The basic logic in the base-level function, which this commit handles,
was sound. It just failed to return relevant information upon failure.
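The key property described here is that the new functions return a success boolean while still handing back usable values on failure. Below is a minimal standalone C sketch of that contract, not perl's implementation: `toy_utf8_to_uv()` is a hypothetical name, `uint32_t`/`size_t` stand in for perl's `UV`/`Size_t`, and the recovery policy (substitute U+FFFD, resume after the bytes already examined) plus the validity rules (reject overlongs, surrogates and values above U+10FFFF) are choices made for the example; perl's own rules differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical new-style decoder: returns true on success, false on
 * failure, and in BOTH cases fills *cp_p and *advance_p with usable
 * values.  On failure *cp_p is U+FFFD and *advance_p (>= 1 when s < e)
 * says where to resume parsing.  Input is [s, e). */
static bool
toy_utf8_to_uv(const unsigned char *s, const unsigned char *e,
               uint32_t *cp_p, size_t *advance_p)
{
    assert(s != NULL && e != NULL && cp_p != NULL && advance_p != NULL);

    if (s >= e) {                        /* no input: nothing consumed */
        *cp_p = TOY_REPLACEMENT_CHAR;
        *advance_p = 0;
        return false;
    }

    size_t avail = (size_t)(e - s);      /* bytes available */
    unsigned char c = s[0];
    size_t need;                         /* bytes this sequence should have */
    uint32_t cp, min;                    /* min: smallest non-overlong value */

    if (c < 0x80) {                      /* ASCII */
        *cp_p = c;
        *advance_p = 1;
        return true;
    }
    else if (c >= 0xC2 && c <= 0xDF) { need = 2; cp = c & 0x1F; min = 0x80; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 3; cp = c & 0x0F; min = 0x800; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 4; cp = c & 0x07; min = 0x10000; }
    else {                               /* continuation or invalid lead byte */
        *cp_p = TOY_REPLACEMENT_CHAR;
        *advance_p = 1;
        return false;
    }

    for (size_t i = 1; i < need; i++) {
        if (i >= avail || (s[i] & 0xC0) != 0x80) {
            /* Truncated, or not a continuation byte: resume at byte i. */
            *cp_p = TOY_REPLACEMENT_CHAR;
            *advance_p = i;
            return false;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *advance_p = need;
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        /* Overlong, above Unicode, or a surrogate. */
        *cp_p = TOY_REPLACEMENT_CHAR;
        return false;
    }
    *cp_p = cp;
    return true;
}
```

Because success is reported separately from the code point, a decoded NUL (true, code point 0) and a literal U+FFFD in the input (true, code point 0xFFFD) can no longer be mistaken for a malformation (false), and `*advance_p` is meaningful either way.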

The new API has somewhat different formal parameter names and uses
Size_t instead of STRLEN for one of the parameters. It also takes the
end-of-string position instead of a length. A length is problematic
when it could go negative, because it instead becomes a huge positive
number.

The old base function now merely calls the new one and throws away the
relevant information, as it always has.


Commit: e6e110f113b55f3f9b842a5a000b2c2063932429
    https://github.com/Perl/perl5/commit/e6e110f113b55f3f9b842a5a000b2c2063932429
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M proto.h
  M utf8.h

Log Message:
-----------
Add utf8_to_uv_error(s)

This is just utf8n_to_uvchr_error() with a more convenient API that is
harder to misuse. New code should use this new function instead of the
old one.


Commit: 16d0f3cb1f7f954597e48cb4ea5d7e1c97bf5ffa
    https://github.com/Perl/perl5/commit/16d0f3cb1f7f954597e48cb4ea5d7e1c97bf5ffa
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M proto.h
  M utf8.h

Log Message:
-----------
Add utf8_to_uv_flags()

This is just utf8n_to_uvchr() with a more convenient API that is harder
to misuse. New code should use this new function instead of the old one.


Commit: 95f8a0bcabcf4b646686b22122f5a38a014bf369
    https://github.com/Perl/perl5/commit/95f8a0bcabcf4b646686b22122f5a38a014bf369
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M proto.h
  M utf8.h

Log Message:
-----------
Add utf8_to_uv()

This performs the same function as utf8_to_uvchr_buf() with a more
convenient API that is much harder to misuse. All code should be
converted to use this new function instead of the old one.
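The utf8_to_uv_msgs() commit's argument for passing an end-of-string position rather than a length comes down to unsigned arithmetic. The small sketch below uses hypothetical helper names and is not perl code; it shows how a length computed from two positions that have crossed wraps around to a huge value, while a pointer comparison simply reports that no input remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the bytes left, computed as an unsigned length the
 * way a length-taking API would need it.  If cur has moved past end, the
 * mathematically negative difference wraps to a huge value. */
static size_t
toy_remaining_length(const unsigned char *cur, const unsigned char *end)
{
    return (size_t)(end - cur);
}

/* Hypothetical helper: the check an end-pointer API can make instead.
 * Positions that have crossed simply mean "no input left". */
static bool
toy_has_input(const unsigned char *cur, const unsigned char *end)
{
    assert(cur != NULL && end != NULL);
    return cur < end;
}
```

For example, with `unsigned char buf[8]`, `end = buf + 3` and a cursor that has overshot to `cur = buf + 5`, `toy_remaining_length(cur, end)` yields `SIZE_MAX - 1`, which a length-taking decoder would happily trust, whereas `toy_has_input(cur, end)` is false. Both pointers stay inside the same array, so the subtraction and comparison are well defined.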

The behavior of utf8_to_uvchr_buf() varies depending on whether <utf8>
warnings are enabled, and no code in core actually takes that into
account.

If warnings are enabled:
  A zero return can mean either success or failure, so it must be
  disambiguated. Success would come from the next character being a NUL.
  On failure, <retlen> will be -1, so it can't be used to find where to
  start parsing again.

If warnings are disabled:
  Both the return and <retlen> will be usable values, but a return of the
  REPLACEMENT CHARACTER is ambiguous. It could mean failure, or it could
  mean that that was the next character in the input and it was
  successfully decoded.

It may very well not matter to you what the source of this particular
value was; it likely means a failure somewhere. But there are occasions
where you might care.

The new function returns true on success and false on failure. It is
passed pointers into which it returns the computed code point and the
byte length. These values always contain the correct information,
whether or not the input is malformed. It is easy to test for failure in
a conditional and then take appropriate action. Most often, though, the
appropriate action seems to be to carry on using the REPLACEMENT
CHARACTER returned in failure cases. And if you don't particularly care
whether it succeeds, you just use the values without testing the result.
This happens when you are confident that the input is well-formed, or,
say, when converting a string for display.


Commit: 137c0f08f75d51af8799a78b69158b6a7e40d2ff
    https://github.com/Perl/perl5/commit/137c0f08f75d51af8799a78b69158b6a7e40d2ff
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M inline.h

Log Message:
-----------
Implement utf8_to_uvchr_buf in terms of utf8_to_uv_flags

This is simpler than the existing implementation.
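The ambiguities the utf8_to_uv() commit lists for utf8_to_uvchr_buf(), the "just use it without testing" pattern, and the idea of implementing the old interface on top of the new one can all be shown in one standalone sketch. Everything here is hypothetical and is not perl's code: `toy_utf8_to_uv()` follows the new contract (boolean result; U+FFFD and a resume offset on failure) and, for brevity, requires non-empty input; `toy_legacy_to_uvchr()` imitates the two documented legacy behaviors by discarding that boolean; and `toy_decode_for_display()` is a caller that never tests the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical new-style decoder for non-empty input [s, e): true on
 * success; on failure *cp_p is U+FFFD and *advance_p (>= 1) is where to
 * resume.  Rejects overlongs, surrogates and values above U+10FFFF. */
static bool
toy_utf8_to_uv(const unsigned char *s, const unsigned char *e,
               uint32_t *cp_p, size_t *advance_p)
{
    assert(s != NULL && e != NULL && s < e && cp_p != NULL && advance_p != NULL);
    size_t avail = (size_t)(e - s), need;
    uint32_t cp, min;
    unsigned char c = s[0];

    if (c < 0x80)                    { *cp_p = c; *advance_p = 1; return true; }
    else if (c >= 0xC2 && c <= 0xDF) { need = 2; cp = c & 0x1F; min = 0x80; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 3; cp = c & 0x0F; min = 0x800; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 4; cp = c & 0x07; min = 0x10000; }
    else { *cp_p = 0xFFFD; *advance_p = 1; return false; }

    for (size_t i = 1; i < need; i++) {
        if (i >= avail || (s[i] & 0xC0) != 0x80) {
            *cp_p = 0xFFFD; *advance_p = i; return false;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *advance_p = need;
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *cp_p = 0xFFFD; return false;
    }
    *cp_p = cp;
    return true;
}

/* Hypothetical legacy-style wrapper imitating the two behaviors described
 * for utf8_to_uvchr_buf(): it calls the new-style decoder and discards its
 * success flag, so callers must guess what went wrong. */
static uint32_t
toy_legacy_to_uvchr(const unsigned char *s, const unsigned char *e,
                    size_t *retlen, bool warnings_enabled)
{
    uint32_t cp;
    size_t advance;

    if (toy_utf8_to_uv(s, e, &cp, &advance) || !warnings_enabled) {
        *retlen = advance;
        return cp;            /* 0xFFFD may be real input or a failure */
    }
    *retlen = (size_t)-1;     /* where to resume is no longer known */
    return 0;                 /* looks the same as a decoded NUL */
}

/* Display-style caller: the boolean is deliberately ignored, because the
 * U+FFFD substituted on failure is what should be shown anyway.  Returns
 * how many code points were stored in out (at most out_len). */
static size_t
toy_decode_for_display(const unsigned char *s, const unsigned char *e,
                       uint32_t *out, size_t out_len)
{
    size_t n = 0;

    while (s < e && n < out_len) {
        size_t advance;
        (void) toy_utf8_to_uv(s, e, &out[n++], &advance);
        s += advance;         /* always >= 1 and never past e */
    }
    return n;
}
```

In the legacy wrapper with warnings "enabled", a malformed byte and a real NUL both come back as 0, and the resume position is lost; with them "disabled", a malformation and a genuine U+FFFD both come back as 0xFFFD. The boolean from the new-style call keeps those cases apart, while the display loop shows that callers who don't care can still ignore it safely.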

Commit: 77b3314b8c71cddecf3ca7f12754a560d115cf08
    https://github.com/Perl/perl5/commit/77b3314b8c71cddecf3ca7f12754a560d115cf08
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M embed.h
  M proto.h
  M utf8.h

Log Message:
-----------
Add utf8_to_uv() flavors

One of these is a more explicit synonym for that function; the other two
restrict what is acceptable to Unicode's definition of legal interchange,
or to its looser Corrigendum #9 (C9) definition.


Commit: ae865e73a1fb69da88b4ad0f4b1a2443c73ab8fc
    https://github.com/Perl/perl5/commit/ae865e73a1fb69da88b4ad0f4b1a2443c73ab8fc
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M embed.fnc
  M inline.h
  M mathoms.c
  M utf8.c

Log Message:
-----------
Document new utf8_to_uv function family


Commit: cffb5af552dad60649b537b4ffa7cbe9d2f5fcfa
    https://github.com/Perl/perl5/commit/cffb5af552dad60649b537b4ffa7cbe9d2f5fcfa
Author: Karl Williamson <k...@cpan.org>
Date:   2024-12-02 (Mon, 02 Dec 2024)

Changed paths:
  M pod/perldelta.pod
  M utf8.c

Log Message:
-----------
perldelta for utf8_to_uv() family


Compare: https://github.com/Perl/perl5/compare/6a9d009e69dd...cffb5af552da

To unsubscribe from these emails, change your notification settings at https://github.com/Perl/perl5/settings/notifications