I would like to go on the record as saying that I'm very much against this
proposal; nothing personal.

I believe that the simplicity and maintainability of the current
architecture is more important. The work to make the parser core
conditionally do Unicode or ASCII is more than I'm willing to stomach. It
would require twice the development and testing effort. Great pain has been
taken to avoid conditional code in the parser, and its stability and
maintainability reflect that.

For those few people who need to do so, transcoding from Unicode to the local
code page is trivial. It would be far simpler for you to put a transcoding
event-handler layer over our events, one which transcodes into local buffers
and then passes them on, so that the client code only ever sees whatever form
it wants to.

The original intent when I designed the system was that XMLCh would float
to wchar_t. Unfortunately, I never made this clear, and this was not done
on all platforms. We hope to address this in the 3.1.0 release. This will
make the output on all platforms passable directly to the wide character
APIs of the platform. It's currently done on VC++ and Borland, and perhaps
just by accident on other platforms, but it needs to be done explicitly in
the next release.

Having the size of the character float to either 16 or 32 bits (which is
all that's involved in floating to wchar_t) is trivial, and basically only
comes into play in a couple of places. The encoding is still Unicode
internally, and all of the code that assumes this is unaffected by the size
of the actual character storage.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



"Pardoe, Julian" <[EMAIL PROTECTED]> on 12/20/99 10:26:37 AM

Please respond to [EMAIL PROTECTED]

To:   "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject:  RE: PROPOSAL: DOMString



On Monday, December 13, 1999 at 8:17 PM [EMAIL PROTECTED] wrote:
<<<And if someone wants to change the typedefs and recompile, they could
create a 32-bit for efficiency, or 8-bit for ASCII-only with memory
constraints.>>>

I'd be very keen to see an option to allow XMLCh to be equated to char.  We
currently have minimal interest in supporting anything other than ASCII.
Storing non-ASCII characters using Java-style NUL-free UTF-8 would be a
simple way of handling the occasional non-ASCII character we might see, as
these will typically be in fields that we never examine but simply copy
around and compare.

Having XMLChs as something other than chars makes life a major pain.
Suddenly all the regular facilities one's used to using aren't there any
more.  Suddenly you're having to convert strings before you can pass them
to any part of your existing system.  The answer is of course transcoding:
one can wrap every string access in a call to a transcoder.  But this is
clumsy and inefficient -- it would be nice if the transcoding were done
long before the input ever reached you!

A define would be nice, e.g. XMLCH_IS_CHAR so that one could use #ifdefs to
control the inclusion of potentially clashing overloads.

I guess some people would like to see the same for wchar_t.

Talking of potentially clashing overloads, I've also found the theft of a
basic type with a different meaning (a character is not an integer of any
size, shape or form!) a pain.  I can't see any way around this though
(without the Ada-like ability to say "type XMLCh = new Integer;").  One
could use a wrapper class with a suitable non-explicit constructor and
conversion operator, but the cost of having a constructor would probably
make this solution unacceptable.

