Axiom/Woodstox is of course able to handle special characters,
provided that it gets the right information about the charset encoding
of the message. To me this looks like the Content-Type header doesn't
contain the correct charset encoding.
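
As a rough illustration of the charset point (a sketch only, not code from
Axiom, Woodstox or Synapse): the class name CharsetCheck and both helper
methods are made up for this example, and it assumes the legacy service
actually sends ISO-8859-1, which the thread does not confirm. In ISO-8859-1
the pound sign is the single byte 0xA3, which is not valid UTF-8 on its own,
so a parser that is told to expect UTF-8 will reject it. Decoding with the
charset the service really uses recovers the character, and it can then be
written out as a numeric character reference if that is still wanted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharsetCheck {

    // Strictly decodes the bytes as UTF-8; returns false on malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Replaces every non-ASCII code point with a decimal numeric character reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed payload: the pound sign (U+00A3) encoded as ISO-8859-1.
        byte[] payload = "<description>a \u00A3 sign</description>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // prints false: the lone byte 0xA3 is not a valid UTF-8 sequence
        System.out.println(isValidUtf8(payload));

        // Decode with the charset the sender really used, then escape.
        String text = new String(payload, StandardCharsets.ISO_8859_1);
        // prints <description>a &#163; sign</description>
        System.out.println(escapeNonAscii(text));
    }
}
```

If the Content-Type header carried the right charset, the decoding step alone
would be enough and no character replacement would be needed.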

Andreas

On Tue, Feb 14, 2012 at 17:41, Hiranya Jayathilaka <[email protected]> wrote:
> Hi Matthew,
>
> We use Axiom as the underlying XML infoset. AFAIK it usually works well
> with special characters. Not sure why it cannot handle this pound sign.
> Maybe Andreas can shed some light on the matter? Actually, in this case the
> exception is thrown by the Woodstox parser, which is at a layer lower than
> Axiom. So this could be a Woodstox issue.
>
> However if the underlying XML parser cannot handle this payload, then I
> don't think any of our built-in utils will be able to parse it without
> throwing an error. So your best option is to serialize this into a string
> buffer or a byte buffer and run the necessary replacement operations.
> Anyway, let's wait and see what others have to say.
>
> Thanks,
> Hiranya
>
> On Tue, Feb 14, 2012 at 7:55 PM, Matthew Clark
> <[email protected]> wrote:
>
>> Sure - the service I'm looking at right now is very simple - the input just
>> looks like this:
>>
>> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
>>   <function>findOrderByReference</function>
>>   <args>
>>       <arg id="1">SomeRef123</arg>
>>   </args>
>> </oxxml>
>>
>> The response then looks like this (I've removed a large chunk of it):
>>
>> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
>>   <response function="findOrderByReference" uuid="4444-4444-4444-4444">
>>        <matches count="1">
>>            <order id="1234567">
>>               <description>Some description including a £ (pound) sign</description>
>>            </order>
>>       </matches>
>>   </response>
>> </oxxml>
>>
>> The pound sign causes StAX to throw an exception, so I'd like to replace
>> it as follows:
>>
>> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
>>   <response function="findOrderByReference" txn-uuid="4444-4444-4444-4444">
>>        <matches count="1">
>>            <order id="1234567">
>>               <description>Some description including a &#163; (ampersandhash163;) sign</description>
>>            </order>
>>       </matches>
>>   </response>
>> </oxxml>
>>
>>
>> On 14 February 2012 13:16, Hiranya Jayathilaka <[email protected]>
>> wrote:
>>
>> > On Tue, Feb 14, 2012 at 5:01 PM, Matthew Clark
>> > <[email protected]> wrote:
>> >
>> > > Hi, thanks for that. For some reason I had overlooked the message
>> > > builders.
>> > >
>> > > I have a rudimentary version of this working now, but given the various
>> > > classes available (XMLStreamReader, StAXBuilder and so on), what would be
>> > > the most efficient way to do the replacement?
>> > >
>> >
>> > If the input byte stream contains invalid characters then I don't think you
>> > can use any of the above classes to process your inputs.
>> >
>> >
>> > >
>> > > I have about 40 characters (such as the pound sign) that I would like to
>> > > replace with entity references. For the first version, I simply converted
>> > > to a string and used StringUtils.replaceEach(), but this is obviously not
>> > > ideal.
>> > >
>> >
>> > Can you please share an input message and a preprocessed message for us to
>> > get a better understanding of your requirement?
>> >
>> > Thanks,
>> > Hiranya
>> >
>> >
>> > >
>> > >
>> > > On 14 February 2012 04:32, Hiranya Jayathilaka <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi Mark,
>> > > >
>> > > > If you want to preprocess the responses then I'd recommend that you write a
>> > > > custom message builder. You can register the custom message builder in the
>> > > > axis2.xml file against the content type of your responses. There you will
>> > > > be able to include any custom logic along with code for handling invalid
>> > > > characters in the payload.
>> > > >
>> > > > Here are some useful resources I found on the web:
>> > > >
>> > > > http://charithwiki.blogspot.com/2010/11/how-to-write-axis2-message-builder.html
>> > > >
>> > > > http://wso2.org/library/articles/axis2-configuration-part2-learning-axis2-xml
>> > > >
>> > > > Thanks,
>> > > > Hiranya
>> > > >
>> > > > On Tue, Feb 14, 2012 at 4:34 AM, Matthew Clark
>> > > > <[email protected]> wrote:
>> > > >
>> > > > > Hi all, I'd really appreciate some help with this one... it's hurting my
>> > > > > brain!
>> > > > >
>> > > > > We have a legacy service that I would like to include in some of our ESB
>> > > > > operations.
>> > > > > The legacy service uses XML for both request and response payloads, making
>> > > > > it a very easy integration.
>> > > > >
>> > > > > I've created a very simple proxy service (see below).
>> > > > >
>> > > > > The problem I am having is that the legacy service can return some invalid
>> > > > > characters, which cause the StAX parser to blow up in such a way that I
>> > > > > can't even handle it gracefully with a fault sequence. I'd really like to
>> > > > > pre-process the responses (before they are parsed/built), as 99% of the time
>> > > > > it is simply a case of replacing characters with numeric character
>> > > > > references or character entity references.
>> > > > >
>> > > > > We are unable to modify the legacy service to remove these erroneous
>> > > > > responses.
>> > > > >
>> > > > > Here's the proxy config (I said it was simple!!) followed by the exception
>> > > > > thrown. The exception causes the service to hang and the fault sequence
>> > > > > is only entered after a 60 second timeout.
>> > > > >
>> > > > > <proxy xmlns="http://ws.apache.org/ns/synapse" name="legacyservice"
>> > > > >        transports="http" startOnLoad="true">
>> > > > >   <target endpoint="legacyXMLReceiver">
>> > > > >      <inSequence>
>> > > > >         <log level="full">
>> > > > >            <property name="MESSAGE" value="InSequence" />
>> > > > >         </log>
>> > > > >      </inSequence>
>> > > > >      <outSequence>
>> > > > >         <log level="full">
>> > > > >            <property name="MESSAGE" value="OutSequence" />
>> > > > >         </log>
>> > > > >         <send />
>> > > > >      </outSequence>
>> > > > >      <faultSequence>
>> > > > >         <makefault version="soap11">
>> > > > >            <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/"
>> > > > >                  value="soap11Env:Server" />
>> > > > >            <reason expression="get-property('ERROR_MESSAGE')" />
>> > > > >            <role />
>> > > > >         </makefault>
>> > > > >         <log level="full">
>> > > > >            <property name="MESSAGE" value="FaultSequence" />
>> > > > >         </log>
>> > > > >         <property name="HTTP_SC" value="500" scope="axis2" />
>> > > > >         <send />
>> > > > >      </faultSequence>
>> > > > >   </target>
>> > > > > </proxy>
>> > > > >
>> > > > > <endpoint xmlns="http://ws.apache.org/ns/synapse" name="legacyXMLReceiver">
>> > > > >   <address uri="http://a.b.c.d:8080/legacyService/LegacyServlet" format="pox">
>> > > > >   </address>
>> > > > > </endpoint>
>> > > > >
>> > > > >
>> > > > > ERROR {org.apache.axis2.transport.base.threads.NativeWorkerPool} - Uncaught exception {org.apache.axis2.transport.base.threads.NativeWorkerPool}
>> > > > > *org.apache.axiom.om.OMException: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char #714, byte #127)*
>> > > > > at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:296)
>> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.buildNext(OMElementImpl.java:653)
>> > > > > at org.apache.axiom.om.impl.llom.OMNodeImpl.getNextOMSibling(OMNodeImpl.java:122)
>> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.getNextOMSibling(OMElementImpl.java:343)
>> > > > > at org.apache.axiom.om.impl.traverse.OMChildrenIterator.getNextNode(OMChildrenIterator.java:36)
>> > > > > at org.apache.axiom.om.impl.traverse.OMAbstractIterator.hasNext(OMAbstractIterator.java:58)
>> > > > > at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:555)
>> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
>> > > > > at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
>> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
>> > > > > at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
>> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
>> > > > > at org.apache.axiom.soap.impl.llom.SOAPEnvelopeImpl.internalSerialize(SOAPEnvelopeImpl.java:230)
>> > > > > at org.apache.axiom.om.impl.llom.OMSerializableImpl.serialize(OMSerializableImpl.java:125)
>> > > > > at org.apache.axiom.om.impl.llom.OMSerializableImpl.serialize(OMSerializableImpl.java:113)
>> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.toString(OMElementImpl.java:988)
>> > > > > at java.lang.String.valueOf(String.java:2826)
>> > > > > at java.lang.StringBuffer.append(StringBuffer.java:219)
>> > > > > at org.apache.synapse.mediators.builtin.LogMediator.getFullLogMessage(LogMediator.java:184)
>> > > > > at org.apache.synapse.mediators.builtin.LogMediator.getLogMessage(LogMediator.java:123)
>> > > > > at org.apache.synapse.mediators.builtin.LogMediator.mediate(LogMediator.java:91)
>> > > > > at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:60)
>> > > > > at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:114)
>> > > > > at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:229)
>> > > > > at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:370)
>> > > > > at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:160)
>> > > > > at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:181)
>> > > > > at org.apache.synapse.transport.nhttp.ClientWorker.run(ClientWorker.java:275)
>> > > > > at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:173)
>> > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > > > > at java.lang.Thread.run(Thread.java:680)
>> > > > > *Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x3c (at char #714, byte #127)*
>> > > > > at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
>> > > > > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
>> > > > > at org.apache.axiom.util.stax.wrapper.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:225)
>> > > > > at org.apache.axiom.util.stax.dialect.DisallowDoctypeDeclStreamReaderWrapper.next(DisallowDoctypeDeclStreamReaderWrapper.java:34)
>> > > > > at org.apache.axiom.util.stax.wrapper.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:225)
>> > > > > at org.apache.axiom.om.impl.builder.StAXOMBuilder.parserNext(StAXOMBuilder.java:681)
>> > > > > at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:214)
>> > > > > ... 31 more
>> > > > > *Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x3c (at char #714, byte #127)*
>> > > > > at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
>> > > > > at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
>> > > > > at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
>> > > > > at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
>> > > > > at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
>> > > > > at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1046)
>> > > > > at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1053)
>> > > > > at com.ctc.wstx.sr.StreamScanner.getNextInCurrAfterWS(StreamScanner.java:892)
>> > > > > at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:2963)
>> > > > > at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
>> > > > > at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
>> > > > > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Hiranya Jayathilaka
>> > > > Associate Technical Lead;
>> > > > WSO2 Inc.;  http://wso2.org
>> > > > E-mail: [email protected];  Mobile: +94 77 633 3491
>> > > > Blog: http://techfeast-hiranya.blogspot.com
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Hiranya Jayathilaka
>> > Associate Technical Lead;
>> > WSO2 Inc.;  http://wso2.org
>> > E-mail: [email protected];  Mobile: +94 77 633 3491
>> > Blog: http://techfeast-hiranya.blogspot.com
>> >
>>
>
>
>
> --
> Hiranya Jayathilaka
> Associate Technical Lead;
> WSO2 Inc.;  http://wso2.org
> E-mail: [email protected];  Mobile: +94 77 633 3491
> Blog: http://techfeast-hiranya.blogspot.com
