A real-world example:
1) An XML text node starts out as follows:
These tight elastic stockings keep blood from staying in the legs and
causing clots. The stockings are also called Ted Hose® or Jobst
Stockings®. These stockings can keep you from getting blood clots.
Note the character immediately following 'Ted Hose' and 'Jobst
Stockings'. It may be unreadable in some editors, but it is the registered
sign (U+00AE, decimal 174).
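For reference, the registered sign ® is U+00AE, decimal 174. It is not to be confused with U+2122 (decimal 8482), the trade mark sign ™. The short Java sketch below prints both values and builds the decimal numeric character reference for ®; the class and method names are hypothetical, chosen only for illustration.

```java
public class RegisteredSign {
    // Build a decimal numeric character reference such as "&#174;" for a BMP character.
    static String numericRef(char c) {
        return "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        char reg = '\u00AE';                           // REGISTERED SIGN
        System.out.println(Integer.toHexString(reg));  // ae
        System.out.println(numericRef(reg));           // &#174;
        System.out.println((int) '\u2122');            // 8482, the TRADE MARK SIGN
    }
}
```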
2a) An XSL stylesheet is applied using Xalan-J. The manifest of our
xalan.jar reads:
    Manifest-Version: 1.0
    Created-By: 1.2.2 (Sun Microsystems Inc.)
    Main-Class: org.apache.xalan.xslt.Process
    Class-Path: jaxp.jar xerces.jar crimson.jar
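2b) A Xalan transformer is applied using the following code:
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
// output properties can only be set once the transformer exists
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
DOMSource src = new DOMSource(node); // text node is within node's hierarchy
FileOutputStream fostream = new FileOutputStream("some_file.html");
OutputStreamWriter writer = new OutputStreamWriter(fostream, "ISO-8859-1"); // "latin1" is an alias
StreamResult result = new StreamResult(writer);
transformer.transform(src, result);
writer.close(); // flush buffered output to the file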
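The transform in 2b can be reproduced in isolation with a small, self-contained harness. This is a sketch under several assumptions of ours: it uses the JAXP implementation bundled with the JDK rather than the xalan.jar above, builds the text node in code, writes to a StringWriter instead of a file, and sets the `html` output method and an `ISO-8859-1` encoding explicitly (the original code sets neither). As we understand JAXP, when a StreamResult wraps a Writer the serializer cannot see the Writer's charset, so its escaping decisions follow the `encoding` output property instead. The class name `EntityRepro` is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EntityRepro {
    // Serialize a <p> element holding a single text node and return the output.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode(text));
        doc.appendChild(p);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");          // assumption
        transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");  // assumption
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Count non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        for (int i = haystack.indexOf(needle); i >= 0;
                i = haystack.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        String s = serialize("Ted Hose\u00AE or Jobst Stockings\u00AE");
        System.out.println(s);
        // Tally each form so that runs and serializers can be compared.
        System.out.println("named: " + count(s, "&reg;")
                + ", numeric: " + count(s, "&#174;"));
    }
}
```

Comparing the named/numeric tallies across runs, or against the output of the xalan.jar listed in 2a, is one way to check whether the mix is deterministic for a given serializer.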
3) The resulting HTML document reads:
These tight elastic stockings keep blood from staying in the legs and
causing clots. The stockings are also called Ted Hose® or Jobst
Stockings®. These stockings can keep you from getting blood clots.
Note that two different kinds of reference are generated for the same
character in a single conversion: the named entity &reg; and the numeric
character reference &#174;. Both are legal in HTML 4. The unusual aspect is
that the choice between them appears random. This is only one example we
have encountered; we have seen the same anomalous mix of numeric and named
references for the other characters listed below.
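&#174; -> &reg;
&#193; -> &Aacute;
&#201; -> &Eacute;
&#205; -> &Iacute;
&#211; -> &Oacute;
&#218; -> &Uacute;
&#225; -> &aacute;
&#233; -> &eacute;
&#237; -> &iacute;
&#241; -> &ntilde;
&#243; -> &oacute;
&#250; -> &uacute;
&#252; -> &uuml;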
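Because each pair in the list names the same character, one way to compare outputs while this inconsistency persists is to normalize the named entities to numeric references first. A minimal sketch, covering only the thirteen characters listed; the class and method names are hypothetical, and a plain string replace is used rather than a full HTML entity decoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityNormalizer {
    // Named entities from the list above, mapped to their decimal code points.
    static final Map<String, Integer> NAMED = new LinkedHashMap<>();
    static {
        NAMED.put("reg", 174);
        NAMED.put("Aacute", 193);
        NAMED.put("Eacute", 201);
        NAMED.put("Iacute", 205);
        NAMED.put("Oacute", 211);
        NAMED.put("Uacute", 218);
        NAMED.put("aacute", 225);
        NAMED.put("eacute", 233);
        NAMED.put("iacute", 237);
        NAMED.put("ntilde", 241);
        NAMED.put("oacute", 243);
        NAMED.put("uacute", 250);
        NAMED.put("uuml", 252);
    }

    // Rewrite each listed named entity as its decimal numeric character reference.
    static String toNumeric(String html) {
        String result = html;
        for (Map.Entry<String, Integer> e : NAMED.entrySet()) {
            result = result.replace("&" + e.getKey() + ";", "&#" + e.getValue() + ";");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: Ted Hose&#174; or Jobst Stockings&#174;
        System.out.println(toNumeric("Ted Hose&reg; or Jobst Stockings&#174;"));
    }
}
```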
Steve Ogden
(303) 486-9069
Micromedex
6200 S. Syracuse Way, #300
Greenwood Village, CO 80111