Change 18242 by jhi@kosh on 2002/12/03 15:04:07

        Slight tweaks on the XS-and-Unicode docs, inspired by [perl #17852].

Affected files ...

.... //depot/maint-5.8/perl/pod/perlguts.pod#2 edit
.... //depot/maint-5.8/perl/pod/perlunicode.pod#3 edit

Differences ...

==== //depot/maint-5.8/perl/pod/perlguts.pod#2 (text) ====
Index: perl/pod/perlguts.pod
--- perl/pod/perlguts.pod#1~17645~      Fri Jul 19 12:29:57 2002
+++ perl/pod/perlguts.pod       Tue Dec  3 07:04:07 2002
@@ -2230,13 +2230,15 @@
 over. You're on your own about bounds checking, though, so don't use it
 lightly.
 
-All bytes in a multi-byte UTF8 character will have the high bit set, so
-you can test if you need to do something special with this character
-like this:
+All bytes in a multi-byte UTF8 character will have the high bit set,
+so you can test if you need to do something special with this
+character like this (UTF8_IS_CONTINUED() is a macro that tests
+whether the byte is part of a multi-byte UTF-8 character):
 
-    UV uv;
+    U8 *utf;
+    UV uv;     /* Note: a UV, not a U8, not a char */
 
-    if (utf & 0x80)
+    if (UTF8_IS_CONTINUED(*utf))
         /* Must treat this as UTF8 */
         uv = utf8_to_uv(utf);
     else
@@ -2247,7 +2249,7 @@
 value of the character; the inverse function C<uv_to_utf8> is available
 for putting a UV into UTF8:
 
-    if (uv > 0x80)
+    if (UTF8_IS_CONTINUED(uv))
         /* Must treat this as UTF8 */
         utf8 = uv_to_utf8(utf8, uv);
     else
@@ -2309,6 +2311,10 @@
 not it's dealing with UTF8 data, so that it can handle the string
 appropriately.
 
+Just passing an SV to an XS function and copying the data of the SV
+is not enough to copy the UTF8 flags, and just passing a C<char *> to
+an XS function is even less adequate.
+
 =head2 How do I convert a string to UTF8?
 
 If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
@@ -2349,12 +2355,13 @@
 =item *
 
 If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
-unless C<!(*s & 0x80)> in which case you can use C<*s>.
+unless C<!UTF8_IS_CONTINUED(*s)> in which case you can use C<*s>.
 
 =item *
 
-When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
-C<uv < 0x80> in which case you can use C<*s = uv>.
+When writing a character C<uv> to a UTF8 string, B<always> use
+C<uv_to_utf8>, unless C<!UTF8_IS_CONTINUED(uv)> in which case
+you can use C<*s = uv>.
 
 =item *
 

==== //depot/maint-5.8/perl/pod/perlunicode.pod#3 (text) ====
Index: perl/pod/perlunicode.pod
--- perl/pod/perlunicode.pod#2~18080~   Sun Nov  3 21:23:04 2002
+++ perl/pod/perlunicode.pod    Tue Dec  3 07:04:07 2002
@@ -1015,8 +1015,10 @@
 
 =head2 Using Unicode in XS
 
-If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful.  See L<perlapi> for details.
+If you want to handle Perl Unicode in XS extensions, you may find the
+following C APIs useful.  See also L<perlguts/"Unicode Support"> for an
+explanation about Unicode at the XS level, and L<perlapi> for the API
+details.
 
 =over 4
 
End of Patch.
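
For readers without a Perl source tree at hand, the rule the patched text describes (a byte with the high bit clear stands for itself, anything else belongs to a multi-byte sequence) can be sketched in standalone C. This is only an illustration under stated assumptions: byte_has_high_bit(), sketch_decode() and sketch_encode() are hypothetical names invented here, not Perl's UTF8_IS_CONTINUED(), utf8_to_uv() or uv_to_utf8(), and the sketch handles standard UTF-8 up to U+10FFFF only, without rejecting overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the high-bit test the patch describes.
 * This is NOT Perl's UTF8_IS_CONTINUED() macro. */
int byte_has_high_bit(uint8_t b)
{
    return (b & 0x80) != 0;
}

/* Decode one code point from s, with len bytes available.
 * Stores the number of bytes consumed in *used.
 * Returns U+FFFD and consumes one byte on malformed or truncated input.
 * Simplification: overlong forms and surrogates are not rejected. */
uint32_t sketch_decode(const uint8_t *s, size_t len, size_t *used)
{
    uint32_t cp;
    size_t need, i;

    assert(s != NULL && used != NULL && len > 0);
    *used = 1;

    if (!byte_has_high_bit(s[0]))
        return s[0];                      /* single byte: use it as-is */

    if ((s[0] & 0xE0) == 0xC0)      { cp = s[0] & 0x1F; need = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 4; }
    else
        return 0xFFFD;                    /* stray continuation byte */

    if (len < need)
        return 0xFFFD;                    /* truncated sequence */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0xFFFD;                /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *used = need;
    return cp;
}

/* Encode cp into buf, which must have room for 4 bytes.
 * Returns the number of bytes written, or 0 if cp is above U+10FFFF. */
size_t sketch_encode(uint32_t cp, uint8_t *buf)
{
    assert(buf != NULL);

    if (cp < 0x80) {                      /* fits in one byte: store directly */
        buf[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (uint8_t)(0xC0 | (cp >> 6));
        buf[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (uint8_t)(0xE0 | (cp >> 12));
        buf[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        buf[0] = (uint8_t)(0xF0 | (cp >> 18));
        buf[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Note that the decoded value is held in a 32-bit integer rather than a byte, which is the same point the patch makes with its "a UV, not a U8, not a char" comment. In real XS code the Perl API functions named in the patch should be used instead of anything like this sketch.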
