Hi, yes, sorry, I should have been more specific: the characters I mentioned are invalid for the charset declared in the Content-Type response header. We're told to expect UTF-8, yet these characters are encoded in a Latin character set. Woodstox is justified in throwing an exception, so I really just want to make sure we're replacing the illegal characters in the most efficient way.
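
For illustration, here is a minimal sketch of the replacement step, not a definitive implementation. It assumes the legacy service actually sends ISO-8859-1 bytes (an assumption; Windows-1252 would need a different charset name) and that it is acceptable to turn every non-ASCII character into a numeric character reference, which avoids maintaining a list of the 40 or so characters. Latin1Escaper and its methods are hypothetical names, not part of Synapse, Axis2 or Axiom:

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not a Synapse/Axis2/Axiom class.
final class Latin1Escaper {

    // Assumption: the legacy service really emits ISO-8859-1.
    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    private static final Charset ASCII = Charset.forName("US-ASCII");

    private Latin1Escaper() {}

    /**
     * Decodes the raw bytes as ISO-8859-1 and replaces every non-ASCII
     * character with a numeric character reference (pound sign -> &#163;).
     */
    static String escapeNonAscii(byte[] raw) {
        // ISO-8859-1 maps each byte to exactly one code point, so decoding cannot fail.
        String decoded = new String(raw, LATIN1);
        StringBuilder out = new StringBuilder(decoded.length() + 16);
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    /** Same result as bytes; the output is pure ASCII and therefore also valid UTF-8. */
    static byte[] escapeToBytes(byte[] raw) {
        return escapeNonAscii(raw).getBytes(ASCII);
    }
}
```

Because the result is plain ASCII, it matches what the Content-Type header promises. This would have to run on the raw bytes before the parser sees them, for example inside a custom message builder. Note that any character in the response that genuinely is UTF-8 encoded as multiple bytes would be mangled by this approach.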
The easiest and best solution would be to fix the legacy service, but unfortunately we don't have that option at this point. Sadly, because this is a content type issue, I think that serializing to a string, replacing the characters and deserializing is probably our best option.

Kind Regards
Matthew

On 14 February 2012 18:42, Andreas Veithen <[email protected]> wrote:

> Axiom/Woodstox is of course able to handle special characters,
> provided that it gets the right information about the charset encoding
> of the message. To me this looks like the Content-Type header doesn't
> contain the correct charset encoding.
>
> Andreas
>
> On Tue, Feb 14, 2012 at 17:41, Hiranya Jayathilaka <[email protected]>
> wrote:
> > Hi Matthew,
> >
> > We use Axiom as the underlying XML infoset. AFAIK it usually works well
> > with special characters. Not sure why it cannot handle this pound sign.
> > Maybe Andreas can shed some light on the matter? Actually in this case the
> > exception is thrown by the Woodstox parser which is at a layer lower than
> > Axiom. So this could be a Woodstox issue.
> >
> > However if the underlying XML parser cannot handle this payload, then I
> > don't think any of our built-in utils will be able to parse it without
> > throwing an error. So your best option is to serialize this into a string
> > buffer or a byte buffer and run the necessary replacement operations.
> > Anyway let's wait and see what others have to say.
> >
> > Thanks,
> > Hiranya
> >
> > On Tue, Feb 14, 2012 at 7:55 PM, Matthew Clark
> > <[email protected]>wrote:
> >
> >> Sure - the service I'm looking at right now is very simple - the input just
> >> looks like this:
> >>
> >> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
> >>   <function>findOrderByReference</function>
> >>   <args>
> >>     <arg id="1">SomeRef123</arg>
> >>   </args>
> >> </oxxml>
> >>
> >> The response then looks like this (I've removed a large chunk of it):
> >>
> >> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
> >>   <response function="findOrderByReference" uuid="4444-4444-4444-4444">
> >>     <matches count="1">
> >>       <order id="1234567">
> >>         <description>Some description including a £ (pound) sign</description>
> >>       </order>
> >>     </matches>
> >>   </response>
> >> </oxxml>
> >>
> >> The pound sign causes StAX to throw an exception.. so I'd like to replace
> >> it as follows:
> >>
> >> <oxxml version="1.0" xmlns="http://xyz.com/xmlapi/">
> >>   <response function="findOrderByReference" txn-uuid="4444-4444-4444-4444">
> >>     <matches count="1">
> >>       <order id="1234567">
> >>         <description>Some description including a £ (ampersandhash163;) sign</description>
> >>       </order>
> >>     </matches>
> >>   </response>
> >> </oxxml>
> >>
> >>
> >> On 14 February 2012 13:16, Hiranya Jayathilaka <[email protected]>
> >> wrote:
> >>
> >> > On Tue, Feb 14, 2012 at 5:01 PM, Matthew Clark
> >> > <[email protected]>wrote:
> >> >
> >> > > Hi thanks for that - for some reason I had overlooked the message
> >> > > builders..
> >> > >
> >> > > I have a rudimentary version of this working now but given the various
> >> > > classes available (XMLStreamReader, StAXbuilder and so on), what would be
> >> > > the most efficient way to do the replacement?
> >> > >
> >> >
> >> > If the input byte stream contains invalid characters then I don't think you
> >> > can use any of the above classes to process your inputs.
> >> >
> >> >
> >> > >
> >> > > I have about 40 characters (such as the pound sign) that I would like to
> >> > > replace with entity references... For the first version, I simply converted
> >> > > to a string and used StringUtils.replaceEach() but this is obviously not
> >> > > ideal..
> >> > >
> >> >
> >> > Can you please share an input message and a preprocessed message for us to
> >> > get a better understanding of your requirement?
> >> >
> >> > Thanks,
> >> > Hiranya
> >> >
> >> >
> >> > >
> >> > >
> >> > > On 14 February 2012 04:32, Hiranya Jayathilaka <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi Mark,
> >> > > >
> >> > > > If you want to preprocess the responses then I'd recommend you to write a
> >> > > > custom message builder. You can register the custom message builder in the
> >> > > > axis2.xml file against the content type of your responses. There you will
> >> > > > be able to include any custom logic along with code for handling invalid
> >> > > > characters in the payload.
> >> > > >
> >> > > > Here are some useful resources I found on the web:
> >> > > >
> >> > > > http://charithwiki.blogspot.com/2010/11/how-to-write-axis2-message-builder.html
> >> > > >
> >> > > > http://wso2.org/library/articles/axis2-configuration-part2-learning-axis2-xml
> >> > > >
> >> > > > Thanks,
> >> > > > Hiranya
> >> > > >
> >> > > > On Tue, Feb 14, 2012 at 4:34 AM, Matthew Clark
> >> > > > <[email protected]>wrote:
> >> > > >
> >> > > > > Hi all, I'd really appreciate some help with this one... it's hurting my
> >> > > > > brain!
> >> > > > >
> >> > > > > We have a legacy service that I would like to include in some of our ESB
> >> > > > > operations.
> >> > > > > The legacy service uses XML for both request and response payloads making
> >> > > > > it a very easy integration.
> >> > > > >
> >> > > > > I've created a very simple proxy service (see below).
> >> > > > >
> >> > > > > The problem I am having is that the legacy service can return some invalid
> >> > > > > characters, which causes the StAX parser to blow up in such a way that I
> >> > > > > can't even handle it gracefully with a fault sequence. I'd really like to
> >> > > > > pre-process the responses (before they are parsed/built) as 99% of the time
> >> > > > > it is simply a case of replacing characters with numeric character
> >> > > > > references or character entity references..
> >> > > > >
> >> > > > > We are unable to modify the legacy service to remove these erroneous
> >> > > > > responses.
> >> > > > >
> >> > > > > Here's the proxy config (I said it was simple!!) followed by the Exception
> >> > > > > thrown... The exception causes the service to hang and the fault sequence
> >> > > > > is only entered after a 60 second timeout.
> >> > > > >
> >> > > > > <proxy xmlns="http://ws.apache.org/ns/synapse" name="legacyservice"
> >> > > > >        transports="http" startOnLoad="true">
> >> > > > >   <target endpoint="legacyXMLReceiver">
> >> > > > >     <inSequence>
> >> > > > >       <log level="full">
> >> > > > >         <property name="MESSAGE" value="InSequence" />
> >> > > > >       </log>
> >> > > > >     </inSequence>
> >> > > > >     <outSequence>
> >> > > > >       <log level="full">
> >> > > > >         <property name="MESSAGE" value="OutSequence" />
> >> > > > >       </log>
> >> > > > >       <send />
> >> > > > >     </outSequence>
> >> > > > >     <faultSequence>
> >> > > > >       <makefault version="soap11">
> >> > > > >         <code xmlns:soap11Env="http://schemas.xmlsoap.org/soap/envelope/" value="soap11Env:Server" />
> >> > > > >         <reason expression="get-property('ERROR_MESSAGE')" />
> >> > > > >         <role />
> >> > > > >       </makefault>
> >> > > > >       <log level="full">
> >> > > > >         <property name="MESSAGE" value="FaultSequence" />
> >> > > > >       </log>
> >> > > > >       <property name="HTTP_SC" value="500" scope="axis2" />
> >> > > > >       <send />
> >> > > > >     </faultSequence>
> >> > > > >   </target>
> >> > > > > </proxy>
> >> > > > >
> >> > > > > <endpoint xmlns="http://ws.apache.org/ns/synapse" name="legacyXMLReceiver">
> >> > > > >   <address uri="http://a.b.c.d:8080/legacyService/LegacyServlet" format="pox">
> >> > > > >   </address>
> >> > > > > </endpoint>
> >> > > > >
> >> > > > >
> >> > > > > ERROR {org.apache.axis2.transport.base.threads.NativeWorkerPool} - Uncaught exception
> >> > > > > {org.apache.axis2.transport.base.threads.NativeWorkerPool}
> >> > > > > *org.apache.axiom.om.OMException: com.ctc.wstx.exc.WstxIOException: Invalid
> >> > > > > UTF-8 middle byte 0x3c (at char #714, byte #127)*
> >> > > > > at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:296)
> >> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.buildNext(OMElementImpl.java:653)
> >> > > > > at org.apache.axiom.om.impl.llom.OMNodeImpl.getNextOMSibling(OMNodeImpl.java:122)
> >> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.getNextOMSibling(OMElementImpl.java:343)
> >> > > > > at org.apache.axiom.om.impl.traverse.OMChildrenIterator.getNextNode(OMChildrenIterator.java:36)
> >> > > > > at org.apache.axiom.om.impl.traverse.OMAbstractIterator.hasNext(OMAbstractIterator.java:58)
> >> > > > > at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:555)
> >> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
> >> > > > > at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
> >> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
> >> > > > > at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
> >> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:875)
> >> > > > > at org.apache.axiom.soap.impl.llom.SOAPEnvelopeImpl.internalSerialize(SOAPEnvelopeImpl.java:230)
> >> > > > > at org.apache.axiom.om.impl.llom.OMSerializableImpl.serialize(OMSerializableImpl.java:125)
> >> > > > > at org.apache.axiom.om.impl.llom.OMSerializableImpl.serialize(OMSerializableImpl.java:113)
> >> > > > > at org.apache.axiom.om.impl.llom.OMElementImpl.toString(OMElementImpl.java:988)
> >> > > > > at java.lang.String.valueOf(String.java:2826)
> >> > > > > at java.lang.StringBuffer.append(StringBuffer.java:219)
> >> > > > > at org.apache.synapse.mediators.builtin.LogMediator.getFullLogMessage(LogMediator.java:184)
> >> > > > > at org.apache.synapse.mediators.builtin.LogMediator.getLogMessage(LogMediator.java:123)
> >> > > > > at org.apache.synapse.mediators.builtin.LogMediator.mediate(LogMediator.java:91)
> >> > > > > at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:60)
> >> > > > > at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:114)
> >> > > > > at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:229)
> >> > > > > at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:370)
> >> > > > > at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:160)
> >> > > > > at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:181)
> >> > > > > at org.apache.synapse.transport.nhttp.ClientWorker.run(ClientWorker.java:275)
> >> > > > > at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:173)
> >> > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> > > > > at java.lang.Thread.run(Thread.java:680)
> >> > > > > *Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte
> >> > > > > 0x3c (at char #714, byte #127)*
> >> > > > > at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> >> > > > > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
> >> > > > > at org.apache.axiom.util.stax.wrapper.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:225)
> >> > > > > at org.apache.axiom.util.stax.dialect.DisallowDoctypeDeclStreamReaderWrapper.next(DisallowDoctypeDeclStreamReaderWrapper.java:34)
> >> > > > > at org.apache.axiom.util.stax.wrapper.XMLStreamReaderWrapper.next(XMLStreamReaderWrapper.java:225)
> >> > > > > at org.apache.axiom.om.impl.builder.StAXOMBuilder.parserNext(StAXOMBuilder.java:681)
> >> > > > > at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:214)
> >> > > > > ... 31 more
> >> > > > > *Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x3c
> >> > > > > (at char #714, byte #127)*
> >> > > > > at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
> >> > > > > at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
> >> > > > > at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
> >> > > > > at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
> >> > > > > at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
> >> > > > > at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1046)
> >> > > > > at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1053)
> >> > > > > at com.ctc.wstx.sr.StreamScanner.getNextInCurrAfterWS(StreamScanner.java:892)
> >> > > > > at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:2963)
> >> > > > > at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
> >> > > > > at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
> >> > > > > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
> >> > > > >
> >> > > >
> >> > > > --
> >> > > > Hiranya Jayathilaka
> >> > > > Associate Technical Lead;
> >> > > > WSO2 Inc.; http://wso2.org
> >> > > > E-mail: [email protected]; Mobile: +94 77 633 3491
> >> > > > Blog: http://techfeast-hiranya.blogspot.com
> >> > > >
> >> > >
> >> >
> >> > --
> >> > Hiranya Jayathilaka
> >> > Associate Technical Lead;
> >> > WSO2 Inc.; http://wso2.org
> >> > E-mail: [email protected]; Mobile: +94 77 633 3491
> >> > Blog: http://techfeast-hiranya.blogspot.com
> >> >
> >>
> >
> > --
> > Hiranya Jayathilaka
> > Associate Technical Lead;
> > WSO2 Inc.; http://wso2.org
> > E-mail: [email protected]; Mobile: +94 77 633 3491
> > Blog: http://techfeast-hiranya.blogspot.com
>
