[ https://issues.apache.org/activemq/browse/SM-414?page=comments#action_36106 ]

Juergen Mayrbaeurl commented on SM-414:
---------------------------------------

Since my Eclipse setup is getting better and better, I think I could provide a 
patch for the SourceTransformer class soon. Anyone interested?

Kind regards
Juergen

> SourceTransformer can't transform to DOM with non-US-ASCII characters like 'ä' or 'ü'
> ------------------------------------------------------------------------------------
>
>          Key: SM-414
>          URL: https://issues.apache.org/activemq/browse/SM-414
>      Project: ServiceMix
>         Type: Bug
>
>   Components: servicemix-core
>     Versions: 3.0-M1, 3.0-M2, 3.0, incubation
>  Environment: W2K, J2SE 1.4.2, Xerces 2.7.1, default locale of OS with
>               character set 'windows-1252'
>     Reporter: Juergen Mayrbaeurl
>     Priority: Blocker
>      Fix For: 3.0, incubation
>  Attachments: SampleInMessage.xml, SourceTransformer-sources.zip,
>               SourceTransformerTest_patch.txt
>
>
> The class org.apache.servicemix.jbi.jaxp.SourceTransformer, which belongs to
> the core classes of ServiceMix and is used very often, has major problems
> transforming a Source into DOM data structures when the source contains
> non-US-ASCII characters like 'ä' or 'ü'.
> The class uses DocumentBuilders for the transformation (see the method
> 'public DOMSource toDOMSourceFromStream(StreamSource source) throws
> ParserConfigurationException, IOException, SAXException') and calls the
> method 'public Document parse(InputStream is, String systemId) throws
> SAXException, IOException' without explicitly telling the DocumentBuilder
> which character encoding to use. As a result, the DocumentBuilder
> (Xerces 2.7.1) throws fatal errors (exceptions), because it encounters
> invalid byte sequences (especially with UTF-8, in which characters like 'ä'
> or 'ö' are encoded as multi-byte sequences). This means that you can't use
> non-US-ASCII characters in messages as soon as ServiceMix uses an instance
> of SourceTransformer to transform anything to DOM. This happens, for
> example, when tracing messages in the DeliveryChannel or when evaluating an
> XPath expression for content-based routing.
> The solution to this problem is straightforward: tell the DocumentBuilder
> which character encoding to use. For example:
>     public DOMSource toDOMSourceFromStream(StreamSource source)
>             throws ParserConfigurationException, IOException, SAXException {
>         DocumentBuilder builder = createDocumentBuilder();
>         String systemId = source.getSystemId();
>         Document document = null;
>         InputStream inputStream = source.getInputStream();
>         if (inputStream != null) {
>             InputSource inputsource = new InputSource(inputStream);
>             inputsource.setSystemId(systemId);
>             // Very important: tell the parser how to decode the bytes.
>             // defaultCharEncodingName is presumably a field of the modified
>             // class; it is not declared in this snippet.
>             inputsource.setEncoding(defaultCharEncodingName);
>             document = builder.parse(inputsource);
>         }
>         else {
>             // A Reader already delivers decoded characters, so no encoding
>             // is set here (setEncoding has no effect on a character stream).
>             Reader reader = source.getReader();
>             if (reader != null) {
>                 InputSource readerSource = new InputSource(reader);
>                 readerSource.setSystemId(systemId);
>                 document = builder.parse(readerSource);
>             }
>             else {
>                 throw new IOException("No input stream or reader available");
>             }
>         }
>         return new DOMSource(document, systemId);
>     }
> I've attached the original source file of SourceTransformer (3.0-SNAPSHOT,
> 2006-04-20) and the changed version (unfortunately I can't create a real patch).
> Kind regards
> Juergen
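The failure described above can be reproduced outside ServiceMix with a short, self-contained Java program. This is a minimal sketch, not code from SourceTransformer: the class EncodingDemo and its parse helper are hypothetical names. It assumes a current JDK with its built-in JAXP parser (not J2SE 1.4.2 with Xerces 2.7.1), and it uses ISO-8859-1 as a stand-in for windows-1252, since 'ä' (0xE4) and 'ü' (0xFC) have the same single-byte codes in both. Without an XML declaration or byte order mark the parser falls back to UTF-8 and rejects the Latin-1 bytes. Passing the encoding through InputSource.setEncoding, as the proposed fix does, lets the same bytes parse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of ServiceMix.
public class EncodingDemo {

    // Hypothetical helper: parses the bytes, optionally telling the parser
    // which encoding to use. Returns the root element's text, or null if
    // parsing failed.
    static String parse(byte[] bytes, String encoding) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            InputSource source = new InputSource(new ByteArrayInputStream(bytes));
            if (encoding != null) {
                source.setEncoding(encoding); // the same call the proposed fix adds
            }
            Document doc = builder.parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Malformed byte sequences surface as a SAXParseException (or an
            // IOException in some parser versions); both count as a failed parse.
            return null;
        }
    }

    public static void main(String[] args) {
        // No XML declaration and no BOM, so the parser assumes UTF-8.
        String xml = "<msg>\u00e4\u00fc</msg>";
        // Single-byte Latin-1: 'ä' = 0xE4, 'ü' = 0xFC (same codes as windows-1252).
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        String withoutEncoding = parse(latin1, null);
        String withEncoding = parse(latin1, "ISO-8859-1");
        System.out.println("without encoding parsed: " + (withoutEncoding != null));
        System.out.println("with ISO-8859-1 parsed:  " + (withEncoding != null));
    }
}
```

A fixed encoding only helps if every incoming stream really uses it: decoded as ISO-8859-1, the two UTF-8 bytes of 'ä' turn into two wrong characters instead of raising an error.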

-- 
This message is automatically generated by JIRA.