This is most likely because the document was read
in using an incorrect code page.  Java strings can hold
characters that are not allowed in XML 1.0, such as the
control character 0x2 in your trace.  If you have such a
character in your document and try to serialize the CAS
to XML, you will see the error you list below.  However,
getting such a character into your in-memory document
in the first place most likely means that the document
wasn't read in with the correct code page.
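To confirm that a bad character is the cause, you could scan the document text for characters outside the XML 1.0 "Char" production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) before serializing. The class below is a minimal sketch only. XmlCharCheck, isXml10Char and firstInvalidXmlChar are hypothetical names and are not part of the UIMA API.

```java
/**
 * Sketch (hypothetical helper, not UIMA API): find the first character in a
 * string that the XML 1.0 "Char" production does not allow.
 */
public class XmlCharCheck {

    // True if the code point is allowed by XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first illegal code point, or -1 if none.
    // An unpaired surrogate is returned by codePointAt as its own value
    // (0xD800-0xDFFF), which falls outside the allowed ranges and is reported.
    static int firstInvalidXmlChar(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return -1;
    }

    public static void main(String[] args) {
        String ok = "\u65E5\u672C\u8A9E text"; // Japanese text plus ASCII
        String bad = "abc\u0002def";            // contains 0x2, as in the trace
        System.out.println(firstInvalidXmlChar(ok));  // -1
        System.out.println(firstInvalidXmlChar(bad)); // 3
    }
}
```

If this reports a position in a Japanese document that looks fine in an editor, the text was most likely decoded with a different code page than the one it was saved in.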

Make sure that when you run your documents through Document
Analyzer, you use the same code page (character encoding) as
when you run your component outside of it.
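In your own code, one way to keep the code page consistent is to decode the document bytes with an explicitly named Charset instead of relying on the JVM's platform default, which can differ depending on how the JVM was started. The sketch below assumes Java 7 or later (java.nio.file, StandardCharsets). ExplicitEncodingReader and readText are hypothetical names. It writes Japanese text as UTF-8 and shows that decoding the same bytes as ISO-8859-1 gives different text.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch (hypothetical helper): read document text with an explicitly named
 * encoding so the same bytes always decode to the same characters.
 */
public class ExplicitEncodingReader {

    // Decode a file's bytes with the given charset, never the platform default.
    static String readText(Path file, Charset encoding) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws IOException {
        String original = "\u65E5\u672C\u8A9E"; // three Japanese characters
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, original.getBytes(StandardCharsets.UTF_8));

        String right = readText(tmp, StandardCharsets.UTF_8);      // same code page
        String wrong = readText(tmp, StandardCharsets.ISO_8859_1); // different code page

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        System.out.println(wrong.length());         // 9: one char per UTF-8 byte
        Files.delete(tmp);
    }
}
```

The design choice is simply that the encoding is a parameter the caller must supply, so the same value can be used inside and outside Document Analyzer.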

--Thilo

SAITO, Isao Isaac wrote:
> Hi all,
> 
> Could anyone help me with the problem below?
> 
> Thanks in advance,
>  Isaac
> 
> <phenomenon>
> - When I use Document Analyzer to run analysis on a Japanese
> document, the exception shown in list-1 is thrown and processing halts.
> - English documents are processed properly.
> - The result is the same whether I start it from Eclipse or from
> documentAnalyzer.bat in %UIMA_HOME%/bin.
> - When I run the component I developed without Document Analyzer, even
> Japanese documents are processed properly.
> 
> - The System.out output with the VM argument "-Djaxp.debug=1" is in list-2
>  (I tried this after reading the following thread:
>    http://www.mail-archive.com/[email protected]/msg00810.html)
> 
> 
> <environment>
> - uimaj-2.2.0-incubating
> - jre1.5.0_14
> - Windows XP Pro SP2
> 
> 
> 
> <list-1>
> org.apache.uima.analysis_engine.AnalysisEngineProcessException
>       at org.apache.uima.analysis_engine.impl.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:101)
>       at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:371)
>       at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:291)
>       at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:556)
>       at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:398)
>       at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:331)
>       at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:261)
>       at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:217)
>       at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1221)
>       at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
> Caused by: org.apache.uima.resource.ResourceProcessException
>       at org.apache.uima.tools.components.XmiWriterCasConsumer.processCas(XmiWriterCasConsumer.java:122)
>       at org.apache.uima.analysis_engine.impl.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:99)
>       ... 9 more
> Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x2
>       at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:235)
>       at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:155)
>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:839)
>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:592)
>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:537)
>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:221)
>       at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$500(XmiCasSerializer.java:99)
>       at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1324)
>       at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1304)
>       at org.apache.uima.tools.components.XmiWriterCasConsumer.writeXmi(XmiWriterCasConsumer.java:146)
>       at org.apache.uima.tools.components.XmiWriterCasConsumer.processCas(XmiWriterCasConsumer.java:118)
>       ... 10 more
> </list-1>
> 
> <list-2>
> (The list is too long to post to the mailing list, so please see the URL below.)
> http://bruch.sfc.keio.ac.jp:5130/dsweb/Get/Document-18049/stacktrace_uima-saxparceexception_2007dec18.txt
> </list-2>
> 
> - ---   - ---   - ---   - ---   - ---   - ---   - ---
>  (Mr.) SAITO, Isao Isaac / [EMAIL PROTECTED]
>     Research assistant
>   Keio University, DMC Research Institute
>    http://www.dmc.keio.ac.jp/en/
> - ---   - ---   - ---   - ---   - ---   - ---   - ---
