[ https://issues.apache.org/activemq/browse/SM-414?page=comments#action_36106 ]
Juergen Mayrbaeurl commented on SM-414:
---------------------------------------

Since my Eclipse setup is getting better and better, I think I could provide a patch for the SourceTransformer class soon. Anyone interested?

Kind regards
Juergen

> SourceTransformer can't transform to DOM with non-US-ASCII characters like 'ä' or 'ü'
> ------------------------------------------------------------------------------------
>
>          Key: SM-414
>          URL: https://issues.apache.org/activemq/browse/SM-414
>      Project: ServiceMix
>         Type: Bug
>   Components: servicemix-core
>     Versions: 3.0-M1, 3.0-M2, 3.0, incubation
>  Environment: W2K, J2SE 1.4.2, Xerces 2.7.1, default locale of OS with character set 'windows-1252'
>     Reporter: Juergen Mayrbaeurl
>     Priority: Blocker
>      Fix For: 3.0, incubation
>  Attachments: SampleInMessage.xml, SourceTransformer-sources.zip, SourceTransformerTest_patch.txt
>
>
> The class org.apache.servicemix.jbi.jaxp.SourceTransformer, which belongs to the core classes of ServiceMix and is used very often, has major problems transforming a Source to DOM data structures when the source contains non-US-ASCII characters like 'ä' or 'ü'.
> The class uses DocumentBuilders for the transformation (see method 'public DOMSource toDOMSourceFromStream(StreamSource source) throws ParserConfigurationException, IOException, SAXException') and calls the method 'public Document parse(InputStream is, String systemId) throws SAXException, IOException' without explicitly telling the DocumentBuilder which character encoding to use. This results in fatal errors (exceptions) from the DocumentBuilder (Xerces 2.7.1), because it encounters invalid character code sequences (especially with UTF-8 and multi-byte characters like 'ä' or 'ö'). As a consequence, you can't use non-US-ASCII characters in messages as soon as ServiceMix uses an instance of SourceTransformer to do any transformation to DOM.
> This is the case, for example, when tracing messages in the DeliveryChannel or when evaluating an XPath expression for content-based routing.
> The solution to this problem is straightforward: tell the DocumentBuilder which character encoding it has to use. It looks like this:
>
>     public DOMSource toDOMSourceFromStream(StreamSource source)
>             throws ParserConfigurationException, IOException, SAXException {
>         DocumentBuilder builder = createDocumentBuilder();
>         String systemId = source.getSystemId();
>         Document document = null;
>         InputStream inputStream = source.getInputStream();
>         if (inputStream != null) {
>             InputSource inputsource = new InputSource(inputStream);
>             inputsource.setSystemId(systemId);
>             inputsource.setEncoding(defaultCharEncodingName); // <-- Very important
>             document = builder.parse(inputsource);
>         }
>         else {
>             Reader reader = source.getReader();
>             if (reader != null) {
>                 document = builder.parse(new InputSource(reader));
>             }
>             else {
>                 throw new IOException("No input stream or reader available");
>             }
>         }
>         return new DOMSource(document, systemId);
>     }
>
> I've attached the original source file of SourceTransformer (3.0 SNAPSHOT, 2006-04-20) and the changed version (unfortunately I can't create a real patch).
>
> Kind regards
> Juergen
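A minimal, self-contained sketch of the failure and of the fix described in the issue, assuming a modern JDK (7 or later) with its built-in JAXP parser rather than J2SE 1.4.2 and Xerces 2.7.1. The class name `EncodingDemo` and the methods `toDOMSource` and `fails` are hypothetical; `toDOMSource` mirrors the patched `toDOMSourceFromStream`, but takes the encoding as a parameter instead of reading the `defaultCharEncodingName` field of the real class. ISO-8859-1 is used because it is guaranteed to be available; it encodes 'ä' as the same single byte (0xE4) as windows-1252.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the patched SourceTransformer method (not ServiceMix code).
public class EncodingDemo {

    // Parses a StreamSource into a DOMSource. A non-null encoding is handed to
    // the parser, as in the patch; with null the parser auto-detects, which
    // means UTF-8 when there is no byte order mark and no encoding declaration.
    static DOMSource toDOMSource(StreamSource source, String encoding) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr
        builder.setErrorHandler(new DefaultHandler());
        String systemId = source.getSystemId();
        InputSource input;
        InputStream stream = source.getInputStream();
        if (stream != null) {
            input = new InputSource(stream);
            if (encoding != null) {
                input.setEncoding(encoding);
            }
        } else {
            Reader reader = source.getReader();
            if (reader == null) {
                throw new IOException("No input stream or reader available");
            }
            input = new InputSource(reader); // characters are already decoded
        }
        input.setSystemId(systemId);
        Document document = builder.parse(input);
        return new DOMSource(document, systemId);
    }

    // Returns true if parsing the bytes fails with the given encoding setting.
    static boolean fails(byte[] bytes, String encoding) {
        try {
            toDOMSource(new StreamSource(new ByteArrayInputStream(bytes)), encoding);
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // 0xE4 followed by '<' is not a valid UTF-8 byte sequence
        byte[] latin1 = "<msg>\u00e4</msg>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("without encoding, parse fails: " + fails(latin1, null));

        DOMSource dom = toDOMSource(new StreamSource(new ByteArrayInputStream(latin1)), "ISO-8859-1");
        Document doc = (Document) dom.getNode();
        System.out.println("with encoding: " + doc.getDocumentElement().getTextContent());
    }
}
```

One design consequence worth keeping in mind: the explicit encoding is applied to every byte stream, so a stream without an encoding declaration whose bytes are really UTF-8 would be decoded as ISO-8859-1 (or windows-1252) without any error, and 'ä' would arrive as two wrong characters rather than being rejected.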