From: Marcus Bointon <[EMAIL PROTECTED]>
Date: Thu, 3 Aug 2006 13:17:42 +0100

> I may be wrong here, but I'm fairly sure that the dominant Unicode
> library (IBM's ICU) is centred around UTF-16.

ElfData isn't too bad at Unicode stuff. Maybe nowhere near as rich as ICU, but it still does a lot. It even does NFD and NFC.

Also, I do NFD and NFC on UTF-8, directly.
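
(For anyone who hasn't met the two forms: NFC is the composed form, NFD the decomposed one. A minimal illustration in Python, using the standard unicodedata module rather than anything from ElfData, and working on decoded strings rather than directly on UTF-8 the way the plugin does:)

    # Sketch only: this decodes from UTF-8 and re-encodes; it just
    # shows what NFC and NFD mean for the same text.
    import unicodedata

    text = b"caf\xc3\xa9".decode("utf-8")     # "café", already composed (NFC)

    nfd = unicodedata.normalize("NFD", text)  # "cafe" + combining acute accent
    nfc = unicodedata.normalize("NFC", nfd)   # back to the single é code point

    print(nfd.encode("utf-8"))                # b'cafe\xcc\x81'
    print(nfc.encode("utf-8"))                # b'caf\xc3\xa9'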

I've been told over and over that this isn't possible.

I knew before I wrote this code that it was possible, and also that it would be fast and simple (for me) to implement.

They still told me it wasn't possible.

I went and built it, and showed them.

Then they shut up :)

> That sounds like a good reason for using it. Generally I've got the
> impression that UTF-8 is much better for web use as it's more
> space-efficient, but it's also apparently slower to process than
> UTF-16, which would explain the choice in a library.

Not necessarily. I haven't seen any evidence that processing it is slower, and I know that because of its compactness it could even be quicker. The fact that we don't have to interconvert between encodings also speeds things up.

UTF-16 has endian issues too, which UTF-8 does not.

And you can very reliably detect whether text is valid UTF-8, even without a BOM. I have such a detection function in my ElfData plugin. You can't reliably detect UTF-16 without a BOM, unfortunately.
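
To make that concrete, here's a rough sketch of the idea in Python (the standard library's strict decoder stands in for the plugin's byte-level check; nothing ElfData-specific):

    # UTF-8 has a strict byte structure, so a strict decode is a good
    # validity test; text in another 8-bit encoding almost never passes.
    def looks_like_utf8(data: bytes) -> bool:
        try:
            data.decode("utf-8", errors="strict")
            return True
        except UnicodeDecodeError:
            return False

    print(looks_like_utf8("déjà vu".encode("latin-1")))   # False
    print(looks_like_utf8("déjà vu".encode("utf-8")))     # True

    # UTF-16, by contrast, decodes "successfully" in either byte order,
    # so without a BOM you can't tell which one you were given:
    sample = "hello".encode("utf-16-le")
    print(sample.decode("utf-16-le"))   # 'hello'
    print(sample.decode("utf-16-be"))   # CJK-looking garbage, but no error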

> I know that Valentina went UTF-16 for precisely this reason.

Could be a mistake :( Is he processing the full code points? If he is, then the variable width of UTF-16 kills off its advantage over UTF-8.
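
Just to show what I mean about the variable width (a small Python sketch, nothing to do with Valentina's code):

    # Any code point above U+FFFF needs a surrogate pair in UTF-16,
    # so UTF-16 is variable-width too once you leave the BMP.
    s = "A\U0001D11E"                        # 'A' + MUSICAL SYMBOL G CLEF (U+1D11E)

    print(len(s))                            # 2 code points
    print(len(s.encode("utf-16-le")) // 2)   # 3 UTF-16 code units (1 + surrogate pair)
    print(len(s.encode("utf-8")))            # 5 UTF-8 bytes (1 + 4)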

RB's regex requires UTF-8, btw. If UTF-16 is so much easier, then why is it using UTF-8?

--
http://elfdata.com/plugin/


