From: Marcus Bointon <[EMAIL PROTECTED]>
Date: Thu, 3 Aug 2006 13:17:42 +0100
> I may be wrong here, but I'm fairly sure that the dominant Unicode
> library (IBM's ICU) is centred around UTF-16.
ElfData isn't too bad at Unicode either. It's maybe nowhere near as
rich as ICU, but it still does a lot, including NFD and NFC.
Also, I do NFD and NFC on UTF-8, directly. I'd been told over and
over that this wasn't possible. Before I even wrote the code I knew
it was possible, and that it would be fast and simple (for me) to
implement. They still insisted it couldn't be done, so I went and
built it and showed them. Then they shut up :)
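To give a concrete picture of what that means at the byte level (this
is just an illustration, not the plugin's actual code): the same
character can arrive composed (NFC) or decomposed (NFD), and
normalizing UTF-8 directly is just a matter of rewriting one byte
sequence as the other.

  #include <stdio.h>

  /* e-acute composed (NFC) is the single code point U+00E9, two UTF-8
     bytes; decomposed (NFD) it is U+0065 + U+0301 (e + combining
     acute), three UTF-8 bytes. */
  int main(void)
  {
      const unsigned char nfc[] = { 0xC3, 0xA9 };        /* U+00E9        */
      const unsigned char nfd[] = { 0x65, 0xCC, 0x81 };  /* U+0065 U+0301 */

      printf("NFC: %02X %02X\n", nfc[0], nfc[1]);
      printf("NFD: %02X %02X %02X\n", nfd[0], nfd[1], nfd[2]);
      return 0;
  }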
> That sounds like a good reason for using it. Generally I've got the
> impression that UTF-8 is much better for web use as it's more
> space-efficient, but it's also apparently slower to process than
> UTF-16, which would explain the choice in a library.
Not necessarily. I haven't seen any evidence that processing it is
slower, and because of its compactness it could even be quicker. The
fact that we don't have to interconvert from UTF-8 to UTF-16 and back
also speeds things up.
UTF-16 has endian issues too, which UTF-8 does not.
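Just to make that concrete with a toy example: 'A' is U+0041, and its
bytes look like this in each encoding, which is why UTF-16 streams
need a BOM or an explicit BE/LE label while UTF-8 needs neither.

  #include <stdio.h>

  /* 'A' (U+0041): one byte in UTF-8, two in UTF-16, and in UTF-16 the
     byte order depends on the platform. */
  int main(void)
  {
      const unsigned char utf8[]    = { 0x41 };
      const unsigned char utf16be[] = { 0x00, 0x41 };
      const unsigned char utf16le[] = { 0x41, 0x00 };

      printf("UTF-8:    %02X\n",      utf8[0]);
      printf("UTF-16BE: %02X %02X\n", utf16be[0], utf16be[1]);
      printf("UTF-16LE: %02X %02X\n", utf16le[0], utf16le[1]);
      return 0;
  }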
And you can detect very reliably whether text is valid UTF-8, even
without a BOM; I have such a detection function in my ElfData plugin.
You can't reliably detect UTF-16 without a BOM, unfortunately.
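I won't paste the plugin's function here, but the core idea behind
UTF-8 detection is simple, because the byte patterns are
self-describing. A bare-bones structural check looks roughly like
this (sketch only; a real checker should also reject overlong forms
and surrogate code points):

  #include <stddef.h>

  /* Returns 1 if buf[0..len) is structurally valid UTF-8, else 0.
     Checks lead-byte classes and continuation bytes only. */
  static int looks_like_utf8(const unsigned char *buf, size_t len)
  {
      size_t i = 0;
      while (i < len) {
          unsigned char c = buf[i];
          size_t extra;

          if      (c < 0x80)           extra = 0;  /* ASCII       */
          else if ((c & 0xE0) == 0xC0) extra = 1;  /* 2-byte lead */
          else if ((c & 0xF0) == 0xE0) extra = 2;  /* 3-byte lead */
          else if ((c & 0xF8) == 0xF0) extra = 3;  /* 4-byte lead */
          else                         return 0;   /* stray byte  */

          if (extra > len - 1 - i)     return 0;   /* truncated   */
          for (size_t k = 1; k <= extra; k++)
              if ((buf[i + k] & 0xC0) != 0x80) return 0;
          i += extra + 1;
      }
      return 1;
  }

Random binary data, or Latin-1 text with accented characters, almost
never passes those checks, which is why detection without a BOM works
so well. With UTF-16 there's nothing comparable to test: almost any
even-length byte string decodes to *some* sequence of code units.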
> I know that Valentina went UTF-16 for precisely this reason.
Could be a mistake :( Is he processing full code points? If he is,
then the variable width of UTF-16 kills off its advantage over UTF-8.
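The reason UTF-16 is variable-width: anything above U+FFFF takes a
surrogate pair, so code that works on real code points can't just
index 16-bit units; it has to do something like this (illustrative
sketch, assumes well-formed input):

  #include <stddef.h>
  #include <stdint.h>

  /* Read one code point from a UTF-16 code-unit array, advancing *pos
     by 1 unit (BMP) or 2 units (surrogate pair). */
  static uint32_t utf16_next(const uint16_t *units, size_t *pos)
  {
      uint16_t hi = units[(*pos)++];
      if (hi >= 0xD800 && hi <= 0xDBFF) {          /* high surrogate */
          uint16_t lo = units[(*pos)++];           /* low surrogate  */
          return 0x10000u
               + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
      }
      return hi;                                   /* BMP code point */
  }

So UTF-16 ends up with the same "variable number of units per code
point" property it's supposed to save you from.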
RB's regex requires UTF-8, btw. If UTF-16 is so much easier, then why
is it using UTF-8?
--
http://elfdata.com/plugin/