Erland Sommarskog <[EMAIL PROTECTED]> writes:
>Jean-Michel Hiver ([EMAIL PROTECTED]) writes:
>> Erland Sommarskog wrote:
>>> I'm working with an XS module that passes queries to MS SQL Server and
>>> returns data back using SQLOLEDB. MS SQL Server stores Unicode data
>>> as UTF-16. Also, all metadata is UTF-16.
>>>
>>> Currently when I get Unicode data back from SQL Server, I convert it to
>>> UTF-8, stash it in an SV, and then set the UTF-8 flag, without checking
>>> whether this is really necessary.
That should be okay. A reasonably cheap option is to convert to UTF-8 as
above, then scan to see whether any bytes have the high bit set, and call
SvUTF8_on only if they do. That way pure ASCII isn't "penalized" by having
the UTF-8 flag set.

Converting to iso-8859-1 is the alternative, but note that NOT setting the
UTF-8 flag on high characters (even if they are representable) sadly
affects the semantics. Unless "locale" is used (which is a bit alien to
Win32), 'Ñ' (N with tilde) etc. are not treated as alphabetic, because
perl defaults to the C locale.

Note too that the normal Windows "latin 1" code page is a superset of
iso-8859-1, so converting to that is wrong: it encodes the Euro sign,
smart quotes, the em dash etc. into places (0x80..0x9F) that are not what
perl expects.

>> Personally I try to use Encode as much as possible, which does The Right
>> Thing for me.
>>
>> $string = Encode::decode ('utf-16', $octets); is pretty safe.

As far as I recall, Encode::decode leaves the SvUTF8 flag on once it has
done its thing. But Dan may have cleaned that up.

>> Regarding speed, Encode seems pretty fast to me - but YMMV I guess.
>
> Alright, I failed to say that this is an XS module, so I convert with
> WideCharToMultiByte, a Windows routine(*), put the result in an SV, and
> then say SvUTF8_on.

The possible danger here is if the "multi byte" encoding for the user's
environment is not UTF-8 but (say) a Japanese one. Using Encode avoids
that.

> (*) SQLOLEDB is available on Windows only, so portability is not an issue.