On Tue, Apr 11, 2006 at 11:30:41AM -0400, John E. Malmberg wrote:
> What I would like to know is if I have figured out this patch fragment
> correctly for getting the UTF8 attribute passed back and forth.
>
> Specifically, when I am returning a UTF8 encoded string back to Perl, do
> I need to run it through sv_utf8_upgrade(), or is there a better method?
Sorry, missed this question, which I knew the answer to.
> + if (rslt != NULL) {
> + sv_usepvn(ST(0),rslt,strlen(rslt));
> + if (fs_utf8) {
> + sv_utf8_upgrade(ST(0));
> + }
> + }
No, sv_utf8_upgrade is for converting an SV holding a sequence of bytes that
are ISO-8859-1 characters into an SV holding a (longer) sequence of bytes
that are those same characters encoded in UTF-8.
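
To illustrate why that is the wrong tool for bytes that are already UTF-8, here is a rough standalone sketch of the ISO-8859-1 to UTF-8 conversion described above. It is not Perl's implementation; latin1_to_utf8() is a hypothetical helper written only for this example. Feeding it bytes that are already UTF-8 encodes them a second time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of the Perl API: encode len ISO-8859-1
 * bytes as UTF-8. Returns a malloc'd, NUL-terminated buffer that the
 * caller must free, and stores the encoded length in *out_len.
 * Returns NULL if allocation fails. */
unsigned char *latin1_to_utf8(const unsigned char *src, size_t len,
                              size_t *out_len)
{
    /* Worst case: every input byte becomes two output bytes. */
    unsigned char *dst = (unsigned char *)malloc(2 * len + 1);
    size_t i, j = 0;

    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            /* ASCII is unchanged in UTF-8. */
            dst[j++] = src[i];
        } else {
            /* 0x80..0xFF become a two-byte sequence. */
            dst[j++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[j++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    dst[j] = '\0';
    *out_len = j;
    return dst;
}
```

With this sketch, the single ISO-8859-1 byte 0xE9 (e-acute) becomes the two bytes 0xC3 0xA9. Running those two bytes through the same conversion again yields four bytes, 0xC3 0x83 0xC2 0xA9, which is the double encoding you would get by upgrading a string that was already UTF-8.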
What I think you need here is:
ST(0) = sv_newmortal();
- if (rslt != NULL) sv_usepvn(ST(0),rslt,strlen(rslt));
+ if (rslt != NULL) {
+ sv_usepvn(ST(0),rslt,strlen(rslt));
+ if (fs_utf8) {
+ SvUTF8_on(ST(0));
+ }
+ }
because you need to signal to the internals that the sequence of bytes in
the SV is in UTF-8.
(I'm assuming that the sequence of bytes in rslt was in ISO-8859-1 if fs_utf8
was false, and UTF-8 if fs_utf8 was true. If not, I misunderstood something.)
If you're re-using an existing SV (rather than the new one created here by
sv_newmortal()), I'd add an else block with SvUTF8_off(...), as there have
been bugs in the core caused by scalars getting SvUTF8(...) turned on, but
then never turned off, so it "leaks" through on scalar re-use.
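
The re-use problem can be shown with a toy model. This is only a sketch: struct toy_sv and both functions are hypothetical stand-ins invented for the example, not the real SV layout or any Perl API. The first setter only ever turns the flag on, which mirrors the patch without an else block; the second sets or clears it on every store.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a scalar: a byte buffer plus a "bytes are UTF-8" flag.
 * This is NOT how Perl lays out an SV. */
struct toy_sv {
    char   buf[64];
    size_t len;
    int    utf8;
};

/* Copy the bytes into the toy scalar, truncating blindly if too long
 * (good enough for a toy). Does not touch the flag. */
static void toy_sv_store(struct toy_sv *sv, const char *bytes)
{
    size_t n = strlen(bytes);
    if (n >= sizeof sv->buf)
        n = sizeof sv->buf - 1;
    memcpy(sv->buf, bytes, n);
    sv->buf[n] = '\0';
    sv->len = n;
}

/* Leaky variant: turns the flag on when asked, never turns it off. */
void toy_sv_set_leaky(struct toy_sv *sv, const char *bytes, int is_utf8)
{
    toy_sv_store(sv, bytes);
    if (is_utf8)
        sv->utf8 = 1;
}

/* Safe variant: the else branch clears a flag left over from earlier use. */
void toy_sv_set_safe(struct toy_sv *sv, const char *bytes, int is_utf8)
{
    toy_sv_store(sv, bytes);
    if (is_utf8)
        sv->utf8 = 1;
    else
        sv->utf8 = 0;
}
```

If the same toy scalar first receives UTF-8 bytes and then non-UTF-8 bytes, the leaky variant leaves the flag set for the second value, while the safe variant clears it. With a freshly created scalar the flag starts off, so the else branch only matters when the scalar is re-used.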
Nicholas Clark