Hi all,
It is a good idea to use a different encoding (like US-ASCII) to convert all the special characters into their entity references.
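Just so we are talking about the same thing, here is roughly how I picture doing that with a plain JAXP identity Transformer (a minimal sketch only -- the class name is made up and I have not run this against our real stylesheets):

import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AsciiOutputSketch {
    public static void main(String[] args) throws Exception {
        // Identity transform; a real stylesheet would be passed to
        // newTransformer(new StreamSource(...)) instead.
        Transformer t = TransformerFactory.newInstance().newTransformer();

        // Asking the serializer for US-ASCII output forces every character
        // outside the 7-bit range to be written as a character reference.
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader("<doc>\u00AE</doc>")),
                    new StreamResult(out));

        // Should print something like:
        // <?xml version="1.0" encoding="US-ASCII"?><doc>&#174;</doc>
        System.out.println(out.toString("US-ASCII"));
    }
}

With UTF-8 or ISO-8859-1 as the output encoding, the same character would (as far as I understand) be written directly instead of being escaped.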
 
But the problem is that there are many places in the application where the XML gets processed again using XML processors (JDOM) after the transformation is done, and at those places we use encodings like ISO-8859-1 and UTF-8, while the transformed XML is now US-ASCII encoded. Do you think those entity references will remain as they are (without getting converted back into direct characters) during such XML processing? If they do, then that is the best solution. If they don't, it would take a huge amount of time to change the remaining parts of the code (where the XML processing is done), so is there any way to change the encoding back to the original (without costly processing) and turn those "®" characters back into "&#174;"?
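To make the question concrete, this is the kind of round trip I mean, sketched with JDOM (I am assuming a JDOM version that has the Format class; the exact behavior may differ in older betas, and the sample document is made up):

import java.io.StringReader;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class RoundTripSketch {
    public static void main(String[] args) throws Exception {
        // The US-ASCII output of the earlier transformation, with the
        // registered sign written as a character reference.
        String transformed =
            "<?xml version=\"1.0\" encoding=\"US-ASCII\"?><doc>&#174;</doc>";

        // The parser expands &#174; into the actual character while
        // building the in-memory document.
        Document doc = new SAXBuilder().build(new StringReader(transformed));

        // Re-serializing with UTF-8 (what the downstream code does today):
        // the character can be encoded directly, so the reference is gone.
        new XMLOutputter(Format.getRawFormat().setEncoding("UTF-8"))
                .output(doc, System.out);
        System.out.println();

        // Re-serializing with US-ASCII instead: the character cannot be
        // represented, so (as I understand it) it should come back out
        // as &#174;.
        new XMLOutputter(Format.getRawFormat().setEncoding("US-ASCII"))
                .output(doc, System.out);
    }
}

My understanding is that the parser expands &#174; while building the Document no matter what, and whether it comes back out as a reference then depends only on the encoding the outputter is told to use -- please correct me if that is wrong.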
 
And the main goal is to make the XML source use entity references (like &#174; for ®) for ALL special characters, so that no XML processor fails because of encoding. Is that a reasonable thing to do (if not the smartest thing :-))?
 
Please suggest!
 
Thanks,
 
Pramodh.
----- Original Message -----
Sent: Wednesday, January 14, 2004 1:01 PM
Subject: RE: replacing ALL chars in a string while transforming


It sounds like the answer is to specify that your output encoding is US ASCII, thus forcing all characters outside the 7-bit set to be escaped, or otherwise to handle this at the encoding/serialization level rather than in the stylesheet.

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.  
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk
