----- Original Message -----
From: "Curt Arnold" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 14, 2001 10:06 PM
Subject: Re: XMLCh & wchar_t conversion on multiple platforms


> The encoding used in the source document or the current code page is
> immaterial to converting UTF-16 to ISO-8859-1.  I definitely don't
> want to be using any arbitrary encoding as an internal representation.
> Whatever code point was used in the source file, it was converted
> to the corresponding Unicode code point by the transcoder, and either
> the XMLCh* string can be converted to ISO-8859-1 by truncating
> the high bytes or it can't.
>

Ok, yes, if you go through the parser and get the data out in XMLCh format,
then you could do a trivial truncation to get 8859-1. But you'd have to
scan the entire outgoing contents to figure out whether you could do it. If
there is a lot of source, once you add up that extra overhead, plus the fact
that you now have to do a switch inside every operation in DOMString,
that's kind of robbing Peter (performance) to pay Paul (memory).
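
Just to make the cost concrete, here is a rough sketch of that
scan-then-truncate step. This is not Xerces code: XMLCh is assumed to be a
16-bit UTF-16 code unit, and fitsIn88591/truncateTo88591 are made-up helper
names for illustration only.

#include <cstddef>
#include <string>

typedef unsigned short XMLCh;   // assumption: 16-bit UTF-16 code unit

// First pass: scan the whole string to see whether every code unit
// fits in ISO-8859-1 (i.e. is <= 0xFF). This is the extra per-string
// overhead being discussed.
bool fitsIn88591(const XMLCh* src, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        if (src[i] > 0xFF)
            return false;
    return true;
}

// Second pass: the "trivial truncation" itself, only valid when the
// scan above said yes. The low byte of each code unit is the 8859-1
// byte, because Unicode's first 256 code points are ISO-8859-1.
std::string truncateTo88591(const XMLCh* src, std::size_t len)
{
    std::string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<char>(src[i] & 0xFF));
    return out;
}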

Also, if you did that on a per-callback basis, you might find that some
strings have to be stored in UTF-8 because they contain code points above
255, while others don't. So you'd have some DOMStrings in UTF-8 and some in
8859-1. What happens when you have to compare them and such, as you'd
probably do a LOT in a complicated XSL transformation on a large file?

Wouldn't you be better off just unconditionally using UTF-8, at least in the
scheme you are trying to implement (which I do not at all advocate for the
DOM in general)?
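
By way of comparison, the unconditional route is a single pass with no
per-string decision to make. Again only a sketch under the same assumptions:
utf16ToUtf8 is a made-up helper name, and unpaired surrogates are not given
any special handling here.

#include <cstddef>
#include <string>

typedef unsigned short XMLCh;   // assumption: 16-bit UTF-16 code unit

// One unconditional pass: UTF-16 in, UTF-8 out. No scan, and every
// string ends up in the same representation.
std::string utf16ToUtf8(const XMLCh* src, std::size_t len)
{
    std::string out;
    for (std::size_t i = 0; i < len; ++i)
    {
        unsigned long cp = src[i];

        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        }

        if (cp < 0x80)
            out.push_back(static_cast<char>(cp));
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

Since every string comes out in the same representation, comparisons never
have to transcode first.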

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
[EMAIL PROTECTED]
http://www.charmedquark.com

"Why put off until tomorrow what you can
put off until the day after tomorrow?"


