From: Matthias Buercher <[EMAIL PROTECTED]>
Date: Sun, 12 Feb 2006 10:25:54 +0100

two utf16 questions:

(1) with

        defineencoding(encodings.utf16)

i can define the string as utf16.
but how can i define it as bigendian or littleendian?

(2) given a binarystream and i know that i have to read a utf16
string with a given character length, what would be the proper method
to read this string? the bytelength can be bigger than
2*characterlength. i thought to read first 2*characterlength and then
test if the character length is achieved, else read chunks as long
until the string has its length.

You can't set a string to have an endian property. However, as Boris correctly mentioned, binarystreams and memoryblocks can have endian properties.

The simplest way to read a binary stream is usually to read it all at once. I don't know whether RB also manages to make that the fastest way.

dim s as string

s = mybinarystream.read( mybinarystream.length )
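On question (2): the byte length can exceed 2*characterlength because a character outside the Basic Multilingual Plane takes two 16-bit code units (a surrogate pair). Here is a rough sketch of reading exactly N characters, written in Python rather than RB purely as a language-neutral illustration. It assumes "character" means a Unicode code point, and the function name read_utf16_chars is made up for this example.

```python
import io

def read_utf16_chars(stream, char_count, little_endian=True):
    """Read char_count code points of UTF-16 text from a binary stream."""
    order = "little" if little_endian else "big"
    raw = bytearray()
    for _ in range(char_count):
        unit = stream.read(2)
        if len(unit) < 2:
            raise EOFError("stream ended before the string was complete")
        raw += unit
        # A high surrogate (0xD800-0xDBFF) means this character
        # continues in the next 16-bit code unit, so read that too.
        if 0xD800 <= int.from_bytes(unit, order) <= 0xDBFF:
            low = stream.read(2)
            if len(low) < 2:
                raise EOFError("stream ended inside a surrogate pair")
            raw += low
    return raw.decode("utf-16-le" if little_endian else "utf-16-be")
```

This leaves the stream positioned right after the string, so whatever follows it can still be read normally.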

If you want to then swap the endianness, use a memoryblock, or a plugin such as my own.

With my plugin you'd do this:

ed = s.ElfData
ed.UTF=16
ed = ed.ConvertTo(16, ed.BigEndian=false)
s = ed.ToString

That's it, you've swapped the endianness, and you did it through an optimised method. .ConvertTo won't do any more work than necessary, so here it only swaps bytes and does nothing else.
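To show what "only swaps bytes" means, here is a small sketch of the same operation done by hand. It is in Python as a language-neutral illustration and is not the plugin's actual code; the function name swap_utf16_endianness is made up for this example.

```python
def swap_utf16_endianness(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves first
    swapped[1::2] = data[0::2]  # first byte of each pair moves second
    return bytes(swapped)
```

Surrogate pairs need no special handling here, since each 16-bit unit is swapped independently and the order of the units is unchanged.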

If you wanted to convert it into UTF-8, you could just do this:

ed = ed.ConvertToUTF8
s = ed.ToString

That's also a lot faster than swapping it manually. My encoding converter is at least 2x faster than RB's encoding converter, although RB might just be going through Apple's UTF converter, so it's not necessarily a comparison of my coding skills against theirs.

Also, if you know that the first 4 characters of your string are always ASCII (as in an XML file, an HTML file or even a human-editable config file), then you can use my .EncodingXMLGuess method.

ed = s.ElfData
ed.EncodingXMLGuess
ed = ed.ConvertToUTF8
s = ed.ToString

Those 4 lines of code will convert a string containing any UTF (even UTF-32) into UTF-8, and do it reliably, as long as the first 4 characters of the string are ASCII! (EncodingXMLGuess also checks for a BOM, if one is present.)
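The general idea behind this kind of guess can be sketched as follows. This is in Python as a language-neutral illustration, and it is only an assumption about how such detection can work, not the plugin's implementation: check for a BOM first, otherwise look at where the zero bytes fall in the first 4 bytes. The names guess_utf and to_utf8 are made up for this example.

```python
import codecs

def guess_utf(data: bytes) -> str:
    """Guess the UTF of data whose leading characters are ASCII."""
    # BOM check first. UTF-32 must be tested before UTF-16, because the
    # UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if data.startswith(bom):
            return name
    head = data[:4]
    if len(head) < 4:
        return "utf-8"
    # With ASCII text, the position of the zero bytes reveals the
    # code unit width and the byte order.
    zeros = tuple(b == 0 for b in head)
    if zeros == (True, True, True, False):
        return "utf-32-be"
    if zeros == (False, True, True, True):
        return "utf-32-le"
    if zeros == (True, False, True, False):
        return "utf-16-be"
    if zeros == (False, True, False, True):
        return "utf-16-le"
    return "utf-8"

def to_utf8(data: bytes) -> bytes:
    """Decode with the guessed encoding and re-encode as UTF-8."""
    return data.decode(guess_utf(data)).encode("utf-8")
```

The BOM-aware codec names ("utf-16", "utf-32", "utf-8-sig") strip the BOM while decoding, so the UTF-8 result carries no BOM.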

--
http://elfdata.com/plugin/


