Thilo,
With your help I solved my problem.
In Document Analyzer's UI, changing "Character Encoding" to the
document's actual encoding ("Shift_JIS" in this case) fixed it.
Thanks, and sorry for such a simple mistake on my part:
it seems the behavior I saw is a feature of Document
Analyzer (or of the UIMA framework?)...
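
For the archives, here is a minimal sketch of the two things involved: decoding the same bytes with the right and the wrong code page, and checking a string for characters that XML 1.0 cannot carry. It is not UIMA code. The class and method names are made up for illustration, the ISO-8859-1 mis-decoding is only an example of a wrong code page (not necessarily the one that produced 0x2 in my case), and it assumes a JDK that ships the Shift_JIS charset and is newer than the 1.5 JRE listed below.

```java
import java.nio.charset.Charset;

// Illustrative only: class and method names are hypothetical, not UIMA APIs.
class EncodingCheck {

    /** Decodes raw file bytes using the named charset (e.g. "Shift_JIS"). */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first char XML 1.0 cannot represent, or -1 if there is none. */
    static int firstInvalidXml10Index(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three kanji, written as escapes so the source file encoding does not matter.
        String original = "\u65E5\u672C\u8A9E";
        byte[] onDisk = original.getBytes(Charset.forName("Shift_JIS"));

        // Same bytes, two code pages: only the matching one restores the text.
        System.out.println(decode(onDisk, "Shift_JIS").equals(original));
        System.out.println(decode(onDisk, "ISO-8859-1").equals(original));

        // A control character such as 0x2 is a legal Java char but not legal XML 1.0.
        System.out.println(firstInvalidXml10Index("ab\u0002c"));
    }
}
```

A check like firstInvalidXml10Index, run on the document text before it is handed to a serializer, would point at the offending character early instead of failing at XML output time.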
Regards,
Isaac
On Dec 18, 2007 9:20 PM, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> This is most likely because the document was read
> in using an incorrect code page. Java strings can hold
> characters that are not valid XML 1.0 characters, such as
> the control character 0x2. If you have such a character in
> your document and try to serialize the CAS to XML, you
> will see the error you list below. A character like that
> most likely ends up in your in-memory document because
> the file wasn't read in with the correct code page.
>
> Make sure that when you run your documents in DocumentAnalyzer,
> you use the same code page as when you run them outside of it.
>
> --Thilo
>
>
> SAITO, Isao Isaac wrote:
> > Hi all,
> >
> > Could anyone help me with the problem below?
> >
> > Thanks in advance,
> > Isaac
> >
> > <phenomenon>
> > - While running an analysis on a Japanese document in Document
> > Analyzer, I get the exception shown in list-1 and processing halts
> > - English documents are processed properly
> > - The result is the same from both Eclipse and documentAnalyzer.bat in %UIMA_HOME%/bin
> > - When I run the component I developed without Document Analyzer, even
> > Japanese documents are processed properly
> >
> > - The System.out output with the VM argument "-Djaxp.debug=1" is in list-2
> > (I did this because of the following thread:
> > http://www.mail-archive.com/[email protected]/msg00810.html)
> >
> >
> > <environment>
> > - uimaj-2.2.0-incubating
> > - jre1.5.0_14
> > - Windows XP Pro SP2
> >
> >
> >
> > <list-1>
> > org.apache.uima.analysis_engine.AnalysisEngineProcessException
> >     at org.apache.uima.analysis_engine.impl.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:101)
> >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:371)
> >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:291)
> >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:556)
> >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:398)
> >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:331)
> >     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:261)
> >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:217)
> >     at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:1221)
> >     at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:668)
> > Caused by: org.apache.uima.resource.ResourceProcessException
> >     at org.apache.uima.tools.components.XmiWriterCasConsumer.processCas(XmiWriterCasConsumer.java:122)
> >     at org.apache.uima.analysis_engine.impl.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:99)
> >     ... 9 more
> > Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x2
> >     at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:235)
> >     at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:155)
> >     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:839)
> >     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:592)
> >     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:537)
> >     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:221)
> >     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$500(XmiCasSerializer.java:99)
> >     at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1324)
> >     at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1304)
> >     at org.apache.uima.tools.components.XmiWriterCasConsumer.writeXmi(XmiWriterCasConsumer.java:146)
> >     at org.apache.uima.tools.components.XmiWriterCasConsumer.processCas(XmiWriterCasConsumer.java:118)
> >     ... 10 more
> > </list-1>
> >
> > <list-2>
> > (please refer to the URL below because the list is too long to post to
> > the ML)
> > http://bruch.sfc.keio.ac.jp:5130/dsweb/Get/Document-18049/stacktrace_uima-saxparceexception_2007dec18.txt
> > </list-2>
> >
> > - --- - --- - --- - --- - --- - --- - ---
> > (Mr.) SAITO, Isao Isaac / [EMAIL PROTECTED]
> > Research assistant
> > Keio University, DMC Research Institute
> > http://www.dmc.keio.ac.jp/en/
> > - --- - --- - --- - --- - --- - --- - ---
>