Re: Internal Representation of Unicode

Rick McGowan Fri, 26 Sep 2003 09:45:12 -0700

myrkraverk.......sourceforge.... wrote:

> In a plain text environment, there is often a need to encode more than
> just the plain character.
...
> Since I'm using 64 bits, I call it Excessive Memory Usage Encoding, or
> EMUE.
...
> I thought of dividing the 64 bit code space into 32 variably wide
> planes, one for control characters, one for Latin characters, one for
> Han characters, and so on;


This all seems to me like something of a pointless exercise. Or maybe
you're not making clear who your intended audience is and what
problems you're trying to solve.

Decent libraries exist that already do nice things with strings having
attributes. And that, in my opinion, is a better model than bit-hacking in
a 64-bit space with vague implementation-defined attributes that change
depending on the "script" of a character. Such "attributed strings" are
easy to work with and provide a much higher-level model than this.
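
To make the model concrete, here is a minimal sketch in Python. It is not
Cocoa's API or any real library's; the class name AttributedString and the
methods add_attribute and attributes_at are made up for illustration. The
point it tries to show is that the text stays an ordinary string, while the
attributes live beside it as ranges rather than inside each character code.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedString:
    """Hypothetical sketch: plain text plus attribute runs stored separately."""
    text: str
    # Each run is (start, end, attrs) with end exclusive; runs may overlap.
    runs: list = field(default_factory=list)

    def add_attribute(self, name, value, start, end):
        """Attach one named attribute to the characters in [start, end)."""
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("range outside the string")
        self.runs.append((start, end, {name: value}))

    def attributes_at(self, index):
        """Merge every run covering index; later runs win on conflicts."""
        merged = {}
        for start, end, attrs in self.runs:
            if start <= index < end:
                merged.update(attrs)
        return merged


s = AttributedString("Hello, world")
s.add_attribute("bold", True, 0, 5)      # covers "Hello"
s.add_attribute("color", "red", 3, 12)   # covers "lo, world"
```

With this layout, s.text is still plain character data that any ordinary
string routine can handle, and s.attributes_at(4) gathers whatever applies
at that position without any bits being borrowed from the character itself.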

You might want to check out Apple's Cocoa environment, particularly the
definitions of the attributed string classes. For example...
http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Java/Classes/NSAttributedString.html
or even the intro:
http://developer.apple.com/documentation/Cocoa/Conceptual/AttributedStrings/index.html

I'm sure there are libraries with similar capabilities for storing
characters + attributes in Java and other languages; I'm just not familiar
with them. Maybe some of the developers can chime in with their favorite
attributed string libraries. Even if you don't use one, you might find the
attributed string model educational.

(All of the above of course reflects only my personal opinion.)

        Rick
