Change 18242 by jhi@kosh on 2002/12/03 15:04:07
Slight tweaks on the XS-and-Unicode docs, inspired by [perl #17852].
Affected files ...
.... //depot/maint-5.8/perl/pod/perlguts.pod#2 edit
.... //depot/maint-5.8/perl/pod/perlunicode.pod#3 edit
Differences ...
==== //depot/maint-5.8/perl/pod/perlguts.pod#2 (text) ====
Index: perl/pod/perlguts.pod
--- perl/pod/perlguts.pod#1~17645~ Fri Jul 19 12:29:57 2002
+++ perl/pod/perlguts.pod Tue Dec 3 07:04:07 2002
@@ -2230,13 +2230,15 @@
over. You're on your own about bounds checking, though, so don't use it
lightly.
-All bytes in a multi-byte UTF8 character will have the high bit set, so
-you can test if you need to do something special with this character
-like this:
+All bytes in a multi-byte UTF8 character will have the high bit set,
+so you can test if you need to do something special with this
+character like this (UTF8_IS_CONTINUED() is a macro that tests
+whether the byte is part of a multi-byte UTF-8 character):
- UV uv;
+ U8 *utf;
+ UV uv; /* Note: a UV, not a U8, not a char */
- if (utf & 0x80)
+ if (UTF8_IS_CONTINUED(*utf))
/* Must treat this as UTF8 */
uv = utf8_to_uv(utf);
else
@@ -2247,7 +2249,7 @@
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF8:
- if (uv > 0x80)
+ if (UTF8_IS_CONTINUED(uv))
/* Must treat this as UTF8 */
utf8 = uv_to_utf8(utf8, uv);
else
@@ -2309,6 +2311,10 @@
not it's dealing with UTF8 data, so that it can handle the string
appropriately.
+Note that passing an SV to an XS function and copying only the data of
+the SV does not copy the UTF8 flags, and passing just a C<char *> to an
+XS function preserves even less.
+
=head2 How do I convert a string to UTF8?
If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
@@ -2349,12 +2355,13 @@
=item *
If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
-unless C<!(*s & 0x80)> in which case you can use C<*s>.
+unless C<!UTF8_IS_CONTINUED(*s)> in which case you can use C<*s>.
=item *
-When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
-C<uv < 0x80> in which case you can use C<*s = uv>.
+When writing a character C<uv> to a UTF8 string, B<always> use
+C<uv_to_utf8>, unless C<!UTF8_IS_CONTINUED(uv)> in which case
+you can use C<*s = uv>.
=item *
==== //depot/maint-5.8/perl/pod/perlunicode.pod#3 (text) ====
Index: perl/pod/perlunicode.pod
--- perl/pod/perlunicode.pod#2~18080~ Sun Nov 3 21:23:04 2002
+++ perl/pod/perlunicode.pod Tue Dec 3 07:04:07 2002
@@ -1015,8 +1015,10 @@
=head2 Using Unicode in XS
-If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful. See L<perlapi> for details.
+If you want to handle Perl Unicode in XS extensions, you may find the
+following C APIs useful. See also L<perlguts/"Unicode Support"> for an
+explanation about Unicode at the XS level, and L<perlapi> for the API
+details.
=over 4
End of Patch.
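
Reviewer's sketch (not part of the change): the high-bit test described in the perlguts hunks can be tried outside of Perl with a few lines of standalone C. The macro BYTE_HAS_HIGH_BIT and both helper functions below are hypothetical stand-ins written for illustration; they are not Perl's definition of UTF8_IS_CONTINUED and they use no Perl headers. The sketch only assumes what the patched text states, namely that every byte of a multi-byte UTF-8 character has its high bit set.

```c
#include <stddef.h>

/* Hypothetical stand-in for the byte test the patch describes.
 * This is an illustration only, not Perl's own macro. */
#define BYTE_HAS_HIGH_BIT(c) ((((unsigned char)(c)) & 0x80) != 0)

/* Count the bytes in buf[0..len) that belong to multi-byte UTF-8
 * characters, i.e. the bytes that have the high bit set. */
static size_t count_multibyte_bytes(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (BYTE_HAS_HIGH_BIT(buf[i]))
            n++;
    }
    return n;
}

/* Return 1 when the byte stands for a whole character on its own and can
 * be read directly, 0 when it needs multi-byte decoding. */
static int is_single_byte_char(unsigned char c)
{
    return !BYTE_HAS_HIGH_BIT(c);
}
```

Note that the test is applied to a single byte of the encoded string. A decoded character value is wider than a byte, which is the point of the "a UV, not a U8, not a char" comment in the first hunk.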