On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote:

> As painful as it may sound (codingwise) I would urge to spare some
> thought to using (internally) UTF-32 for those encodings for which
> UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts).

Most CPUs can load a 32 bit quantity in 1 machine instruction.
Most CPUs would take 2 or 3 machine instructions to load 2 or 3 bytes of
a variable length encoding, and I'd guess that on most RISC CPUs those
three instructions take three times the space (and three times as long as
the single load instruction).
And that's ignoring the code to bit shuffle those bytes into the
character.

So it may be more space efficient in total (code plus data) to use 32
bits per character. And although it feels like we'd be shifting 32 bits
of data around per character instead of 8-40 with an average of less than
32, the variable length encoding might still take longer because we're
handling it less efficiently.
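
A rough sketch in C of the two access patterns being compared. The
function names utf32_get and utf8_get are hypothetical, made up for
illustration, and the variable length decoder assumes well-formed input
of at most 3 bytes per character with no validation, so real code would
need more instructions than this, not fewer.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed width: one 32-bit load per character. */
static uint32_t utf32_get(const uint32_t *s, size_t i)
{
    return s[i];
}

/* Variable width: decode one UTF-8 character of 1 to 3 bytes at p and
 * store its byte length in *len. Assumes well-formed input (hypothetical
 * simplification: no validation, no 4+ byte sequences). */
static uint32_t utf8_get(const unsigned char *p, size_t *len)
{
    if (p[0] < 0x80) {                      /* 1 byte: 0xxxxxxx */
        *len = 1;
        return p[0];
    }
    if ((p[0] & 0xE0) == 0xC0) {            /* 2 bytes: 110xxxxx 10xxxxxx */
        *len = 2;
        return ((uint32_t)(p[0] & 0x1F) << 6)
             |  (uint32_t)(p[1] & 0x3F);
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    *len = 3;
    return ((uint32_t)(p[0] & 0x0F) << 12)
         | ((uint32_t)(p[1] & 0x3F) << 6)
         |  (uint32_t)(p[2] & 0x3F);
}
```

The 3 byte path needs three byte loads plus the masks, shifts and ORs
to assemble the character, where utf32_get needs a single load. How that
translates into instruction counts and timings depends on the CPU and
compiler.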

Just a passing thought. Extrapolated up from 1 RISC CPU I know quite well.

Nicholas Clark
