Before writing a character to the output stream, the serializer attempts to determine whether it can be represented in the output encoding. With a few exceptions, if the character is representable it is converted to the appropriate byte sequence; otherwise it is written as a character reference (or, if it must be escaped, as a reference to one of the predefined entities: amp, lt, quot, etc.). These details shouldn't really matter to the application developer, since all of these forms carry the same information.
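
To illustrate the idea, here is a minimal sketch built on the JDK's java.nio.charset.CharsetEncoder. It is not the Xerces serializer code: the class name EncodingAwareEscaper and the method escapeText are made up for this example, it only covers character data (so quot is not escaped), and it does not check for characters that are illegal in XML.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Xerces).
public class EncodingAwareEscaper {

    /** Escapes character data so it can be written in the given charset. */
    static String escapeText(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Work on code points so surrogate pairs are tested as a unit.
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            switch (cp) {
                // Markup-significant characters always use predefined entities.
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (encoder.canEncode(text.subSequence(i, i + len))) {
                        // Representable: leave it for the encoder to turn into bytes.
                        out.appendCodePoint(cp);
                    } else {
                        // Not representable: fall back to a character reference.
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase())
                           .append(';');
                    }
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \u20AC & <tag>";
        // US-ASCII:   caf&#xE9; &#x20AC; &amp; &lt;tag&gt;
        System.out.println(escapeText(text, StandardCharsets.US_ASCII));
        // ISO-8859-1: only the euro sign becomes &#x20AC;
        System.out.println(escapeText(text, StandardCharsets.ISO_8859_1));
        // UTF-8:      only & < > are escaped
        System.out.println(escapeText(text, StandardCharsets.UTF_8));
    }
}
```

With UTF-8 as the output encoding nothing needs a character reference, which matches the point in the quoted message below that all Unicode characters are representable there.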

On Tue, 23 Dec 2003, Bob Foster wrote:

> Really? You mean the serializer should look at the encoding of the
> document, determine that the character cannot be represented in the
> encoding and automatically convert it to &#nnnn; form?
>
> I don't know that you're wrong; I'm just surprised. It seems quite
> ambitious to try to know what characters can be represented in all the
> encodings in the world. (Some encodings even have user-defined spaces,
> that can be assigned glyphs by local convention.) Does Xerces do that?
>
> Even if a serializer tried to do that, it would have no justification
> for converting a character to &#nnnn; form if the character was
> representable in the encoding, and _all_ Unicode characters can be
> represented in, e.g., UTF-8.
>
> Bob Foster
> http://xmlbuddy.com/

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
