In message <[EMAIL PROTECTED]>
        Gibbs Tanton <[EMAIL PROTECTED]> wrote:

> I've applied this patch.

I just did an update and noticed the new files had appeared about
two seconds before your mail arrived ;-)

> I realize that we have a ways to go before we can fully support unicode, but
> I felt that this patch was a big step in the right direction; with it
> committed we can now start incrementally cleaning it up and making it work
> correctly.  Since it doesn't affect anything we are working on it shouldn't
> get in the way at all.

Absolutely. A few other issues that I remembered last night are:

  - The current code assumes that the string data will be two
    byte aligned for UTF-16 and four byte aligned for UTF-32, which
    is probably reasonable but may not always hold.

  - The utf8_t, utf16_t and utf32_t types will need to be determined
    by configure as they will currently break on some machines. Plus
    machines without native 8, 16 and 32 bit types will be a problem.

  - There are byte ordering issues for UTF-16 and UTF-32 strings. The
    current code assumes host byte ordering; should we instead be
    spotting byte order marks in the strings and adjusting to cope?
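A minimal sketch of how the byte order question might be handled, assuming a raw byte buffer holding UTF-16 text. The names byte_order, utf16_detect_bom and utf16_unit_at are hypothetical and not part of the current code. The sketch checks for a U+FEFF mark and builds each code unit from individual bytes, which avoids relying on either the host byte order or two byte alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte order tag for a UTF-16 buffer. */
typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;

/* Look for a U+FEFF byte order mark in the first two bytes.
 * Returns how many bytes to skip (0 or 2) and stores the detected
 * order in *out; with no mark, the caller's fallback is used. */
static size_t utf16_detect_bom(const unsigned char *buf, size_t len,
                               byte_order fallback, byte_order *out)
{
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *out = ORDER_BIG;
        return 2;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *out = ORDER_LITTLE;
        return 2;
    }
    *out = fallback;
    return 0;
}

/* Read the code unit starting at byte offset `off`. Assembling it
 * byte by byte works regardless of host byte order and never does
 * a misaligned 16-bit load. */
static uint16_t utf16_unit_at(const unsigned char *buf, size_t off,
                              byte_order order)
{
    if (order == ORDER_BIG)
        return (uint16_t)((buf[off] << 8) | buf[off + 1]);
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}
```

UTF-32 could be handled the same way with four-byte marks (00 00 FE FF and FF FE 00 00), although FF FE 00 00 is ambiguous with a little-endian UTF-16 mark followed by U+0000. The uint16_t here comes from C99's <stdint.h>, which is one possible alternative to having configure choose the types, though the exact-width types are optional in the standard.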

> We do need to figure out how to change from unicode to native.  We also need
> to make sure that we don't hardcode the encoding in the assembler, the
> assembler should be able to get what encoding to use from a file.

A fundamental question (which I think Simon was hinting at with his
cryptic comment) is whether the native encoding is fixed when parrot
is built or can change on the fly as the user changes their locale
settings. If it's the latter then conversion to and from native will
have to work by loading an appropriate conversion table at run time.
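If the native encoding can change at run time, conversion might be driven by whichever table is currently loaded. Below is a minimal sketch, assuming a single-byte native encoding; native_table, load_latin1, native_to_unicode and unicode_to_native are hypothetical names. load_latin1 stands in for reading a table from a file chosen by the locale, and multibyte native encodings would need a more elaborate scheme.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion table for a single-byte native encoding:
 * entry b holds the Unicode code point for native byte b. */
typedef struct {
    const char *name;
    uint32_t to_unicode[256];
} native_table;

/* Stand-in for loading a table at run time: ISO-8859-1 maps each
 * byte value directly to the code point with the same value. */
static void load_latin1(native_table *t)
{
    t->name = "ISO-8859-1";
    for (unsigned b = 0; b < 256; b++)
        t->to_unicode[b] = b;
}

/* Native bytes to code points, using whichever table is active. */
static size_t native_to_unicode(const native_table *t,
                                const unsigned char *in, size_t len,
                                uint32_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = t->to_unicode[in[i]];
    return len;
}

/* One code point back to native; returns -1 if the active encoding
 * has no byte for it. A linear scan keeps the sketch short. */
static int unicode_to_native(const native_table *t, uint32_t cp)
{
    for (unsigned b = 0; b < 256; b++)
        if (t->to_unicode[b] == cp)
            return (int)b;
    return -1;
}
```

Switching locale would then mean loading a different table, with no change to the conversion routines themselves.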

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
