In message <[EMAIL PROTECTED]>
          Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 07:03 PM 10/8/2001 -0500, Gibbs Tanton - tgibbs wrote:
> >This looks good.
> >
> >Also, WRT the utf8_t, utf16_t, and utf32_t can we not just use utf32_t and
> >then mask off the lower 8 or 16 bits?  We can still have utf8_t be defined
> >as char to allow sizeof to work right and we can do sizeof(utf8_t)*2 to get
> >the utf16_t's size.
>
> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it.

I think he was just referring to the internal type used to hold a
character during processing, not to expanding the whole string.
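
To make the distinction concrete, here is a minimal sketch in which the string stays in its compact UTF-8 form and only the one character being processed is widened into a utf32_t. The typedefs and the decode_utf8_char helper are assumptions made up for illustration, not anything in the current tree, and the decoder does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned char utf8_t;   /* assumption: one UTF-8 code unit */
typedef uint32_t      utf32_t;  /* assumption: wide enough for any code point */

/* Decode the character starting at s (len bytes available) into *out.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * malformed or truncated. Hypothetical helper, for illustration only. */
size_t decode_utf8_char(const utf8_t *s, size_t len, utf32_t *out)
{
    size_t n, i;
    utf32_t c;

    if (len == 0) return 0;

    /* The lead byte tells us how many bytes the character occupies. */
    if (s[0] < 0x80)                { c = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    if (n > len) return 0;  /* truncated sequence */

    /* Each continuation byte contributes its low six bits. */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    *out = c;
    return n;
}
```

The caller advances through the buffer by the returned byte count, so the string itself is never expanded.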

> On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.

What does US-ASCII correspond to internally? We don't have an
encoding for that, unless you're planning to mark it as UTF-8 and
rely on US-ASCII being a subset of UTF-8, of course ;-)
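
The subset property is easy to state in code: a buffer whose bytes are all below 0x80 is valid US-ASCII and is byte-for-byte the same text when read as UTF-8. A minimal sketch follows; the is_us_ascii name is hypothetical, not an existing function.

```c
#include <stddef.h>

/* Return 1 if every byte is in the US-ASCII range 0x00..0x7F, else 0.
 * Such a buffer can be tagged as UTF-8 without any transcoding.
 * Hypothetical helper, for illustration only. */
int is_us_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            return 0;  /* high bit set: not US-ASCII */
    }
    return 1;
}
```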

The only other thing is that writing tests for UTF-8 and UTF-16 strings
and the transcoder is going to be quite tricky if we can't generate
them using the assembler.

Other than that I'll sort out a patch for this later today.

Moving on, my next target is to get string comparison working. That's
not too difficult until you have to compare strings whose encodings
are different. Comparing two Unicode strings is OK, as we can always
transcode the second to the same type as the first. But if we're
comparing a native string with a Unicode string, we will have to
transcode from native to Unicode even if the native string is first.
So I think the transcoding will have to be done at the string layer
rather than in the strnative/strutfn layers.
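
Roughly, the decision the string layer would have to make looks like the sketch below. The encoding_t enum and the comparison_encoding function are invented names for illustration, and picking the Unicode operand's own encoding as the target for the native string is an assumption rather than a settled design.

```c
/* Hypothetical encoding tags, for illustration only. */
typedef enum { ENC_NATIVE, ENC_UTF8, ENC_UTF16, ENC_UTF32 } encoding_t;

/* Pick the encoding both operands should be in before they are compared.
 * The rule sketched here:
 *   - same encoding:       no transcoding needed
 *   - native vs Unicode:   always move to the Unicode side, whichever
 *                          operand comes first
 *   - Unicode vs Unicode:  transcode the second to match the first */
encoding_t comparison_encoding(encoding_t first, encoding_t second)
{
    if (first == second)
        return first;
    if (first == ENC_NATIVE)
        return second;  /* native is first, but Unicode still wins */
    if (second == ENC_NATIVE)
        return first;
    return first;       /* both Unicode: the first operand's type */
}
```

Because the native-first case needs the first operand transcoded rather than the second, the choice cannot be left to whichever per-encoding layer happens to own the first string.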

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
