In message <[EMAIL PROTECTED]> Gibbs Tanton <[EMAIL PROTECTED]> wrote:
> I've applied this patch.

I just did an update and noticed the new files had appeared about two
seconds before your mail arrived ;-)

> I realize that we have a ways to go before we can fully support unicode, but
> I felt that this patch was a big step in the right direction; with it
> committed we can now start incrementally cleaning it up and making it work
> correctly. Since it doesn't affect anything we are working on it shouldn't
> get in the way at all.

Absolutely. A few other issues that I remembered last night are:

  - The current code assumes that the string data will be two byte
    aligned for UTF-16 and four byte aligned for UTF-32, which is
    probably reasonable but maybe not.

  - The utf8_t, utf16_t and utf32_t types will need to be determined
    by configure, as they will currently break on some machines. Plus
    machines without native 8, 16 and 32 bit types will be a problem.

  - There are byte ordering issues for UTF-16 and UTF-32 strings. The
    current code assumes host byte ordering, but should we be spotting
    byte order markers in the strings and adjusting to cope?

> We do need to figure out how to change from unicode to native. We also need
> to make sure that we don't hardcode the encoding in the assembler, the
> assembler should be able to get what encoding to use from a file.

A fundamental question (which I think Simon was hinting at with his
cryptic comment) is whether the native encoding is fixed when parrot
is built or can change on the fly as the user changes their locale
settings.

If it's the latter, then conversion to and from native will have to
work by loading an appropriate conversion table at run time.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
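
On the byte ordering question, here is a minimal sketch in C of spotting a byte order marker at the start of a UTF-16 buffer instead of assuming host order. The names byte_order_t, utf16_detect_bom and utf16_read_unit are made up for illustration and are not taken from the parrot source. Reading a byte at a time also avoids relying on the two byte alignment assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not from the parrot source. */
typedef enum { BO_UNKNOWN, BO_BIG, BO_LITTLE } byte_order_t;

/* Look for a byte order marker (U+FEFF) in the first two bytes of a
 * UTF-16 buffer. Works on individual bytes, so the buffer does not
 * need to be two byte aligned. */
byte_order_t utf16_detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return BO_UNKNOWN;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return BO_BIG;      /* FE FF: big endian */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return BO_LITTLE;   /* FF FE: little endian */
    return BO_UNKNOWN;      /* no marker; caller must pick a default */
}

/* Read the 16 bit code unit at byte offset off using the given order.
 * Anything other than BO_LITTLE is treated as big endian here, which
 * is an assumption of this sketch. */
unsigned int utf16_read_unit(const unsigned char *buf, size_t off,
                             byte_order_t order)
{
    if (order == BO_LITTLE)
        return (unsigned int)buf[off] | ((unsigned int)buf[off + 1] << 8);
    return ((unsigned int)buf[off] << 8) | (unsigned int)buf[off + 1];
}
```

A caller would detect the marker once, skip the first two bytes if one was found, and then read every code unit through utf16_read_unit. What to do when no marker is present (assume host order, or big endian) is left open here, as it is in the question above.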