In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 07:03 PM 10/8/2001 -0500, Gibbs Tanton - tgibbs wrote:
> >This looks good.
> >
> >Also, WRT the utf8_t, utf16_t, and utf32_t can we not just use utf32_t and
> >then mask off the lower 8 or 16 bits? We can still have utf8_t be defined
> >as char to allow sizeof to work right and we can do sizeof(utf8_t)*2 to get
> >the utf16_t's size.
>
> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it.

I think he was just referring to the internal type used to hold a
character during processing, not to expanding the whole string.

> On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.

What does US-ASCII correspond to internally? We don't have an encoding
for that, unless you're planning to mark it as UTF-8 and rely on
US-ASCII being a subset of UTF-8 of course ;-)

The only other thing is that writing tests for UTF-8 and UTF-16 strings
and the transcoder is going to be quite tricky if we can't generate
them using the assembler.

Other than that I'll sort out a patch for this later today.

Moving on, my next target is to get string comparison working.
That's not too difficult until you have to compare strings whose
encodings are different. Comparing two Unicode strings is OK, as we can
always transcode the second to the same type as the first. But if we're
comparing a native string with a Unicode string, we will have to
transcode from native to Unicode even if the native string is first, so
I think the transcoding will have to be done at the string layer rather
than in the strnative/strutfn layers.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
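
As a rough illustration of the string-layer comparison described above,
here is a minimal sketch in C. Everything in it is an assumption made
for the example rather than the actual Parrot code: the names
sketch_string, encoding_t and sketch_string_compare are hypothetical,
the native character set is assumed to be Latin-1-like (each byte is
its own code point), and UTF-8 input is assumed to be well formed.
Instead of transcoding a whole buffer, it decodes both sides to code
points as it goes, which stands in for the native-to-Unicode transcode
step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags; not the real Parrot definitions. */
typedef enum { ENC_NATIVE, ENC_UTF8, ENC_UTF32 } encoding_t;

typedef struct {
    encoding_t  encoding;
    const void *data;    /* bytes for native/UTF-8, uint32_t units for UTF-32 */
    size_t      length;  /* number of code units in data */
} sketch_string;

/* Decode the code point starting at *pos and advance *pos past it.
 * Assumes well-formed input; native bytes map 1:1 to code points. */
static uint32_t next_codepoint(const sketch_string *s, size_t *pos)
{
    switch (s->encoding) {
    case ENC_UTF32:
        return ((const uint32_t *)s->data)[(*pos)++];
    case ENC_UTF8: {
        const unsigned char *p = (const unsigned char *)s->data;
        unsigned char lead = p[(*pos)++];
        uint32_t cp;
        int extra;  /* number of continuation bytes that follow */
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        while (extra-- > 0 && *pos < s->length)
            cp = (cp << 6) | (uint32_t)(p[(*pos)++] & 0x3F);
        return cp;
    }
    case ENC_NATIVE:
    default:
        return ((const unsigned char *)s->data)[(*pos)++];
    }
}

/* Compare by code point regardless of which operand is native.
 * Returns -1, 0 or 1, in the manner of strcmp. */
int sketch_string_compare(const sketch_string *a, const sketch_string *b)
{
    size_t i = 0, j = 0;
    assert(a != NULL && b != NULL);
    while (i < a->length && j < b->length) {
        uint32_t ca = next_codepoint(a, &i);
        uint32_t cb = next_codepoint(b, &j);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (i < a->length) return 1;   /* a has characters left over */
    if (j < b->length) return -1;  /* b has characters left over */
    return 0;
}
```

Because the dispatch on encoding happens in one place above the
per-encoding decoders, the order of the operands no longer matters: a
native string compared with a Unicode string is handled the same way
as the reverse.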