ElfData has a few methods for this!
The first one is completely reliable, as long as the file has a BOM.
function DetectUTFType(s as string) as string
  dim e as ElfData
  dim bomLen as integer
  e = s.ElfData
  bomLen = e.EncodingReadBOM // a UTF-16 BOM has a length of 2
  if bomLen > 0 then
    e = e.Mid( bomLen + 1 ) // strip the BOM
  end if
  // .ToString automatically sets the encoding to UTF-8 or UTF-16
  return e.ToString
end function
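For anyone curious what a BOM check boils down to, here is a minimal sketch using only REALbasic's built-in byte functions (LenB, MidB, AscB). SniffBOM and its encName parameter are hypothetical names made up for illustration. This is not ElfData's implementation, and it ignores UTF-32 BOMs.

```realbasic
// Hypothetical helper, not part of ElfData.
// Returns the BOM length in bytes (0 if none) and reports the
// encoding name through the byref parameter. UTF-32 is ignored.
function SniffBOM(s as string, byref encName as string) as integer
  dim n, b1, b2, b3 as integer
  encName = ""
  n = LenB(s)
  if n < 2 then
    return 0 // too short to hold any BOM
  end if
  b1 = AscB(MidB(s, 1, 1))
  b2 = AscB(MidB(s, 2, 1))
  if b1 = &hFE and b2 = &hFF then
    encName = "UTF-16BE" // FE FF
    return 2
  end if
  if b1 = &hFF and b2 = &hFE then
    encName = "UTF-16LE" // FF FE
    return 2
  end if
  if n >= 3 then
    b3 = AscB(MidB(s, 3, 1))
    if b1 = &hEF and b2 = &hBB and b3 = &hBF then
      encName = "UTF-8" // EF BB BF
      return 3
    end if
  end if
  return 0
end function
```

The returned length can be used to skip the BOM with MidB(s, length + 1), much like the e.Mid call above.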
An even simpler way to do this, in one line:
function DetectUTFType(s as string) as string
  return s.ElfData.ConvertToUTF8
end function
That will also detect the BOM, strip it if it exists, and set the
encoding correctly. The only difference is that it will convert
UTF-16 to UTF-8 as well ;) For what I do, that's actually
*desirable*, because I do all my string processing in UTF-8. The
ElfData plugin is designed around UTF-8, and REALbasic seems to
favour UTF-8 over UTF-16 too, if only because string literals in
RB are all UTF-8.
Besides, what if the data is UTF-16LE? Then you'll need it
converted anyway!
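As a rough illustration of that conversion step without the plugin, REALbasic's built-in DefineEncoding and ConvertEncoding can do it. This is only a sketch: UTF16LEToUTF8 is a hypothetical name, and it assumes the bytes are already known to be UTF-16LE with any BOM stripped.

```realbasic
// Hypothetical helper, not part of ElfData.
// Assumes raw holds UTF-16LE bytes with no BOM at the front.
function UTF16LEToUTF8(raw as string) as string
  dim tagged as string
  // DefineEncoding only labels the bytes; it does not change them
  tagged = DefineEncoding(raw, Encodings.UTF16LE)
  // ConvertEncoding rewrites the bytes as UTF-8
  return ConvertEncoding(tagged, Encodings.UTF8)
end function
```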
Now, what if the file *does not* start with a BOM?
Well, two possible answers:
1) If the first 4 characters are always ASCII (like in XML), ElfData
can still detect the encoding.
2) Otherwise, ElfData can't help you.
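The reason case 1 works is that ASCII characters stored as UTF-16 leave a zero byte in every pair, and the position of the zero gives away the byte order. A minimal sketch of that idea, using only built-in functions, might look like this. GuessFromASCIIPrefix is a hypothetical name, and I'm not claiming this is how ElfData does it internally.

```realbasic
// Hypothetical helper, not part of ElfData.
// Guesses the encoding of BOM-less data whose first characters are
// known to be ASCII, e.g. "<?xm" at the start of an XML file.
// Returns "" when no guess can be made.
function GuessFromASCIIPrefix(s as string) as string
  dim b1, b2, b3, b4 as integer
  if LenB(s) < 4 then
    return "" // not enough data to guess
  end if
  b1 = AscB(MidB(s, 1, 1))
  b2 = AscB(MidB(s, 2, 1))
  b3 = AscB(MidB(s, 3, 1))
  b4 = AscB(MidB(s, 4, 1))
  if b1 = 0 and b2 <> 0 and b3 = 0 and b4 <> 0 then
    return "UTF-16BE" // 00 xx 00 xx
  elseif b1 <> 0 and b2 = 0 and b3 <> 0 and b4 = 0 then
    return "UTF-16LE" // xx 00 xx 00
  elseif b1 <> 0 and b2 <> 0 and b3 <> 0 and b4 <> 0 then
    return "UTF-8" // no zero bytes, so an ASCII-compatible encoding
  end if
  return ""
end function
```

The "UTF-8" answer is really "ASCII-compatible"; for XML, the encoding declaration would still need to be read to tell UTF-8 from, say, Latin-1.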
Here's a really short way of doing this:
function XMLorBOMToUTF8(e as ElfData) as string
  e.EncodingXMLGuess
  return e.ConvertToUTF8.ToString
end function
Handy, eh? Just two lines!
You can even call it with a plain string, thanks to
ElfData.Operator_Convert:
// although "hello" is a string, it can be passed as an ElfData parameter
msgbox XMLorBOMToUTF8( "hello" )
You'd need my plugin, of course. It compiles and is tested on
Windows, Mac, and Linux.
--
http://elfdata.com/plugin/