In perl.git, the branch blead has been updated

<https://perl5.git.perl.org/perl.git/commitdiff/ee0ff0f58536ba7975a4b8f1d21309ae9f451df7?hp=a578d0f3e37a8500429796cdeeba96dbba029778>

- Log -----------------------------------------------------------------
commit ee0ff0f58536ba7975a4b8f1d21309ae9f451df7
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 9 10:02:31 2019 -0600

    Add UTF8_CHK_SKIP() macro
    
    This is a safer version of UTF8SKIP for use when the input could be
    possibly malformed.  It uses strnlen() to not read past a NUL in the
    input.  Since Perl adds NULs to the end of SV's, this will likely
    prevent reading beyond the end of a buffer.
    
    A still safer version could be written that doesn't look for just a NUL,
    but any unexpected byte, and stops just before that.  I suspect that is
    overkill, and since strnlen() can be very fast, I went with this
    approach instead.  Nothing precludes adding another version that does
    this full checking.

commit a281f16cacceabade4e75fbbbeb567285d462ba0
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 9 10:01:32 2019 -0600

    Document UTF8_SKIP()

commit bd350c85f2b40fbbcd57c61670e9aff330675586
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 9 09:39:27 2019 -0600

    Fix pod entry for toLOWER_utf8
    
    It was missing a parameter

-----------------------------------------------------------------------

Summary of changes:
 handy.h |  2 +-
 utf8.h  | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/handy.h b/handy.h
index 8349fd1699..e89b43449d 100644
--- a/handy.h
+++ b/handy.h
@@ -1153,7 +1153,7 @@ The first code point of the lowercased version is returned
 (but note, as explained at L<the top of this section|/Character case
 changing>, that there may be more).
 
-=for apidoc Am|UV|toLOWER_utf8|U8* p|U8* s|STRLEN* lenp
+=for apidoc Am|UV|toLOWER_utf8|U8* p|U8* e|U8* s|STRLEN* lenp
 Converts the first UTF-8 encoded character in the sequence starting at C<p> and
 extending no further than S<C<e - 1>> to its lowercase version, and
 stores that in UTF-8 in C<s>, and its length in bytes in C<lenp>.  Note
diff --git a/utf8.h b/utf8.h
index 889324e587..83cccf16c3 100644
--- a/utf8.h
+++ b/utf8.h
@@ -530,15 +530,61 @@ encoded as UTF-8.  C<cp> is a native (ASCII or EBCDIC) code point if less than
 /*
 
 =for apidoc Am|STRLEN|UTF8SKIP|char* s
-returns the number of bytes in the UTF-8 encoded character whose first (perhaps
-only) byte is pointed to by C<s>.
+returns the number of bytes a non-malformed UTF-8 encoded character whose first
+(perhaps only) byte is pointed to by C<s>.
+
+If there is a possibility of malformed input, use instead:
+
+=over
+
+=item L</C<UTF8_SAFE_SKIP>> if you know the maximum ending pointer in the
+buffer pointed to by C<s>; or
+
+=item L</C<UTF8_CHK_SKIP>> if you don't know it.
+
+=back
+
+It is better to restructure your code so the end pointer is passed down so that
+you know what it actually is at the point of this call, but if that isn't
+possible, L</C<UTF8_CHK_SKIP>> can minimize the chance of accessing beyond the end
+of the input buffer.
 
 =cut
  */
 #define UTF8SKIP(s)  PL_utf8skip[*(const U8*)(s)]
+
+/*
+=for apidoc Am|STRLEN|UTF8_SKIP|char* s
+This is a synonym for L</C<UTF8SKIP>>
+
+=cut
+*/
+
 #define UTF8_SKIP(s) UTF8SKIP(s)
 
 /*
+=for apidoc Am|STRLEN|UTF8_CHK_SKIP|char* s
+
+This is a safer version of L</C<UTF8SKIP>>, but still not as safe as
+L</C<UTF8_SAFE_SKIP>>.  This version doesn't blindly assume that the input
+string pointed to by C<s> is well-formed, but verifies that there isn't a NUL
+terminating character before the expected end of the next character in C<s>.
+The length C<UTF8_CHK_SKIP> returns stops just before any such NUL.
+
+Perl tends to add NULs, as an insurance policy, after the end of strings in
+SV's, so it is likely that using this macro will prevent inadvertent reading
+beyond the end of the input buffer, even if it is malformed UTF-8.
+
+This macro is intended to be used by XS modules where the inputs could be
+malformed, and it isn't feasible to restructure to use the safer
+L</C<UTF8_SAFE_SKIP>>, for example when interfacing with a C library.
+
+=cut
+*/
+
+#define UTF8_CHK_SKIP(s)                                                       \
+            (s[0] == '\0' ? 1 : MIN(my_strnlen((char *) (s), UTF8SKIP(s))))
+/*
 
 =for apidoc Am|STRLEN|UTF8_SAFE_SKIP|char* s|char* e
 returns 0 if S<C<s E<gt>= e>>; otherwise returns the number of bytes in the

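The behaviour described for UTF8_CHK_SKIP in the log above can be sketched in
standalone C. This is only an illustration under stated assumptions, not the
code in utf8.h: sketch_utf8skip(), sketch_strnlen() and sketch_chk_skip() are
hypothetical names, the lead-byte lookup stands in for PL_utf8skip[] on an
ASCII platform only, and sketch_strnlen() stands in for my_strnlen().

```c
#include <stddef.h>

/* Hypothetical stand-in for the PL_utf8skip[] lookup: the sequence length
 * implied by the lead byte alone.  Assumes an ASCII platform; lead bytes of
 * 0xF8 and above are treated as length 1 here, which is a simplification
 * made for this sketch. */
static size_t sketch_utf8skip(const unsigned char *s)
{
    unsigned char c = s[0];

    if (c < 0x80) return 1;   /* ASCII */
    if (c < 0xC0) return 1;   /* stray continuation byte: step over it */
    if (c < 0xE0) return 2;   /* lead byte of a 2-byte sequence */
    if (c < 0xF0) return 3;   /* lead byte of a 3-byte sequence */
    if (c < 0xF8) return 4;   /* lead byte of a 4-byte sequence */
    return 1;                 /* simplification, see above */
}

/* Hypothetical stand-in for my_strnlen(): counts the bytes before the first
 * NUL, looking at no more than maxlen bytes. */
static size_t sketch_strnlen(const char *s, size_t maxlen)
{
    size_t n = 0;

    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
}

/* Sketch of the checked skip: a NUL counts as one character; otherwise the
 * result is the length implied by the lead byte, shortened so that it stops
 * just before any NUL found inside that span. */
static size_t sketch_chk_skip(const char *s)
{
    if (s[0] == '\0')
        return 1;
    return sketch_strnlen(s, sketch_utf8skip((const unsigned char *) s));
}
```

With these definitions, sketch_chk_skip("\xC3\xA9") is 2, because the
sequence is complete. sketch_chk_skip("\xE2\x82") is also 2: the lead byte
claims three bytes, but the terminating NUL arrives after two, so the skip
stops short of it. As the log notes, this only catches a NUL; other
unexpected bytes inside the span are not detected.
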
-- 
Perl5 Master Repository