XML serializer; handling characters outside the encoding

Christopher Painter-Wakefield Mon, 15 Dec 2003 08:31:17 -0800



I have a data consumer who is pulling XML from our Cocoon webapp.  His
software couldn't handle UTF-8, so I gave him the option to pull the data
in US-ASCII encoding.  However, when I did that, symbol characters such as
Greek letters and math symbols were still sent over, even though they
cannot be represented in that encoding.  When I saved a result from our
system and opened it with XML Spy, it complained about these characters,
and on my consumer's end they make his software blow up.  I'm not sure
exactly how these characters are being output (I don't have a good
byte-level editor), but I assume the serializer is writing some kind of
multi-byte sequence, producing bytes outside the range US-ASCII defines,
or something similar.
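
Without a byte-level editor, one way to check what is actually on the wire
is to list every byte above 0x7F in a saved response, since US-ASCII defines
only 0x00 through 0x7F. The following is a minimal sketch, not part of
Cocoon; the class name NonAsciiBytes and the method describe are made up for
illustration, and the file to inspect is passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not a Cocoon class): reports bytes that US-ASCII cannot contain.
public class NonAsciiBytes {

    /** Returns one line per byte whose value is above 0x7F, with its offset. */
    public static String describe(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;          // treat the byte as unsigned
            if (b > 0x7F) {                  // outside the US-ASCII range
                sb.append("offset ").append(i)
                  .append(": 0x").append(String.format("%02X", b))
                  .append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of a response saved from the webapp
        System.out.print(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

If the output shows pairs such as 0xCE 0x94, the document most likely still
contains UTF-8 sequences (that pair is the UTF-8 form of Greek capital delta)
despite declaring US-ASCII.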

My question is, what should the behavior be when coping with characters
outside the encoding, and where does the responsibility lie?  My assumption
would be that the XML serializer should take characters outside the
encoding and turn them into numeric character references (&#916; for Greek
capital delta, for instance).  I am on Cocoon 2.0.3, so maybe that has been
done in a later release, but if not, should it be?  I am going to explore a
change to the serializer for just that purpose, but if it has already been
done, I'd like to grab the code for it.  I'm assuming you can use numeric
character references in any encoding, regardless of whether the characters
thus specified have a code in that encoding.
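
The behavior asked about here can be sketched in a few lines of Java. This
is only an illustration of the idea, not the Cocoon serializer's code: the
class CharRefEscaper and the method escapeUnencodable are hypothetical names,
and the sketch assumes a JDK with code point support (Java 5 or later). It
uses java.nio.charset.CharsetEncoder.canEncode to decide, code point by code
point, whether the target encoding can represent the character, and writes a
decimal character reference when it cannot.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper (not a Cocoon class): escapes text for a limited encoding.
public class CharRefEscaper {

    /**
     * Replaces each code point that the named charset cannot encode with a
     * decimal numeric character reference such as &#916;. Markup characters
     * like '<' and '&' are left alone; escaping those is a separate step.
     */
    public static String escapeUnencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int width = Character.charCount(codePoint);   // 2 for surrogate pairs
            CharSequence piece = text.subSequence(i, i + width);
            if (encoder.canEncode(piece)) {
                out.append(piece);                        // representable: keep as-is
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Greek capital delta (U+0394) and less-than-or-equal (U+2264)
        System.out.println(escapeUnencodable("\u0394x \u2264 1", "US-ASCII"));
        // prints: &#916;x &#8804; 1
    }
}
```

Two caveats on this sketch: character references are only legal in character
data and attribute values, so an element or attribute name containing an
unencodable character cannot be rescued this way, and the input is assumed to
contain no unpaired surrogates.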

Thanks,
Christopher

