Before writing a character to the output stream, the serializer tries to
determine whether it can be represented in the output encoding. With a
few exceptions, if the character is representable it is converted to the
appropriate byte sequence; otherwise it is written as a character
reference (or, if it must be escaped, as a reference to one of the
predefined entities: amp, lt, quot, etc.). These details really shouldn't
matter to the application developer, as all of these forms carry the same
information.
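
To make the idea concrete, here is a rough sketch of that decision using
only the JDK's java.nio.charset API. It is not the Xerces serializer
code; the class name CharRefSketch and the method escape are made up for
illustration, and it assumes element content only (no attribute quoting,
no check for characters that are illegal in XML, no handling of unpaired
surrogates).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of Xerces: illustrates the
// "representable? write it : write a character reference" decision.
final class CharRefSketch {

    /** Escapes text for element content, targeting the named output encoding. */
    static String escape(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();

        for (int i = 0; i < text.length(); ) {
            // Walk by code point so surrogate pairs stay together.
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String s = text.substring(i, i + len);
            i += len;

            switch (cp) {
                // Characters that must be escaped use predefined entities.
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:
                    if (encoder.canEncode(s)) {
                        // Representable: the writer will encode it to bytes.
                        out.append(s);
                    } else {
                        // Not representable: decimal character reference.
                        out.append("&#").append(cp).append(';');
                    }
            }
        }
        return out.toString();
    }
}
```

With this sketch, escape("caf\u00e9", "US-ASCII") gives "caf&#233;",
while the same call with "UTF-8" leaves the string unchanged. Note that
the encoder itself answers the "can it be represented" question, so the
code does not need its own table of what each encoding supports.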

On Tue, 23 Dec 2003, Bob Foster wrote:

> Really? You mean the serializer should look at the encoding of the
> document, determine that the character cannot be represented in the
> encoding and automatically convert it to &#nnnn; form?
>
> I don't know that you're wrong; I'm just surprised. It seems quite
> ambitious to try to know what characters can be represented in all the
> encodings in the world. (Some encodings even have user-defined spaces,
> that can be assigned glyphs by local convention.) Does Xerces do that?
>
> Even if a serializer tried to do that, it would have no justification
> for converting a character to &#nnnn; form if the character was
> representable in the encoding, and _all_ Unicode characters can be
> represented in, e.g., UTF-8.
>
> Bob Foster
> http://xmlbuddy.com/
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
