Hey folks,
I've got a problem with the UIMA SimpleServer[1][5]: it cannot correctly
run an aggregate analysis engine[6]. The aggregate AE does work as
expected, however, when I test it with the "UIMA CAS Visual Debugger",
"UIMA Run AE" and "UIMA Document Analyzer".
The analysis engine[2] is relatively simple (so far). It is composed
of the following components:
AE PDF Text Extractor[3]
:: takes a URL from the "initial view", downloads
the file, extracts the text and puts it in a new
view named "extractedText".
-> Input Sofa: urlString
-> Output Sofa: extractedText
AE Email Annotator[4]
:: simple annotator, just annotates email addresses.
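To make the intended data flow concrete, here is a toy Java model of the two components. It is NOT the UIMA API: `ToyCas`, `extractText` and `annotateEmails` are hypothetical names, views are just a map from name to text, the download/PDF step is replaced by a function parameter, and the missing-view message merely imitates the wording in the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the two-stage pipeline; NOT the UIMA API.
public class PipelineSketch {

    // Hypothetical stand-in for a CAS: maps view names to view text.
    public static class ToyCas {
        private final Map<String, String> views = new HashMap<>();

        public void createView(String name, String text) {
            if (views.containsKey(name)) {
                throw new IllegalStateException("View already exists: " + name);
            }
            views.put(name, text);
        }

        public String getViewText(String name) {
            String text = views.get(name);
            if (text == null) {
                throw new IllegalStateException("No sofaFS with name " + name + " found.");
            }
            return text;
        }
    }

    // Stage 1 (PDF Text Extractor): reads the URL from inView and stores the
    // extracted text in a new view outView. fetchAndExtract is a hypothetical
    // placeholder for the download + PDF-to-text step.
    public static void extractText(ToyCas cas, String inView, String outView,
                                   Function<String, String> fetchAndExtract) {
        String url = cas.getViewText(inView);
        cas.createView(outView, fetchAndExtract.apply(url));
    }

    // Stage 2 (Email Annotator): returns email-like strings found in inView,
    // using a deliberately simplified regular expression.
    public static List<String> annotateEmails(ToyCas cas, String inView) {
        Pattern email = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");
        Matcher m = email.matcher(cas.getViewText(inView));
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        cas.createView("_InitialView", "http://example.org/paper.pdf");
        // Fixed text instead of a real download, so the sketch runs offline.
        extractText(cas, "_InitialView", "extractedText", url -> "Contact: ben@example.org");
        System.out.println(annotateEmails(cas, "extractedText")); // [ben@example.org]
    }
}
```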
When I run the aggregate analysis engine through the SimpleServer, it
terminates with the following error before returning any results (taken
from the Tomcat log file):
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException:
Annotator processing failed.
...
Caused by: org.apache.uima.cas.CASRuntimeException:
No sofaFS with name plainText found.
...
"plainText" is the Sofa name in the aggregate analysis engine that is
mapped to the PDF Text Extractor's output view "extractedText".
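If I understand sofa mapping correctly, the CAS itself holds the aggregate-level name ("plainText"), and each component's local name (e.g. "extractedText") is translated to it. The toy model below, again not the UIMA implementation, shows how the reported error would appear if the output view were created without that translation, or never created at all. `resolve` and `requireView` are hypothetical helpers, and mapping the Email Annotator's default view to "plainText" is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of sofa-name mapping; NOT the UIMA implementation.
public class SofaMappingSketch {

    // Translates a component-local view name into the CAS-level name,
    // falling back to the local name when no mapping is declared.
    public static String resolve(Map<String, String> mapping, String localName) {
        return mapping.getOrDefault(localName, localName);
    }

    // Looks up a CAS-level view, failing with the same wording as the log.
    public static void requireView(Set<String> casViews, String casName) {
        if (!casViews.contains(casName)) {
            throw new IllegalStateException("No sofaFS with name " + casName + " found.");
        }
    }

    public static void main(String[] args) {
        // Hypothetical mappings: both components' local names point at "plainText".
        Map<String, String> extractorMap = new HashMap<>();
        extractorMap.put("extractedText", "plainText");
        Map<String, String> annotatorMap = new HashMap<>();
        annotatorMap.put("_InitialView", "plainText");

        // Case 1: mapping applied when the extractor creates its output view.
        Set<String> views = new HashSet<>();
        views.add(resolve(extractorMap, "extractedText"));
        requireView(views, resolve(annotatorMap, "_InitialView")); // succeeds

        // Case 2: mapping not applied (or process() never ran), so "plainText" is missing.
        Set<String> unmapped = new HashSet<>();
        unmapped.add("extractedText");
        try {
            requireView(unmapped, resolve(annotatorMap, "_InitialView"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No sofaFS with name plainText found.
        }
    }
}
```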
I took the aggregate analysis engine apart piece by piece, starting with
the Email Annotator AE. That worked fine with the SimpleServer.
Then I tested the PDF Text Extractor on its own (I changed its input view
to _InitialView). When I submitted a URL, the result came back as XML, but
it contained only the initial view and not the extracted text. Also, when
I test the text extractor with the other tools, it takes around 3 seconds
to download the PDF file, whereas the SimpleServer sent back its results
immediately. So what is going on there? Does it not even run the code in
process()?
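One way to check whether process() runs at all is to time it. The sketch below wraps a step with System.nanoTime(); `timeMillis` is a hypothetical helper, and the sleeping lambda is a stand-in for the extractor, not its real code. A near-zero time under the SimpleServer, versus roughly 3 seconds elsewhere, would suggest the download never happens.

```java
import java.util.function.Consumer;

public class TimingSketch {

    // Runs one step on the given input and returns the elapsed time in milliseconds.
    public static <T> long timeMillis(Consumer<T> step, T input) {
        long start = System.nanoTime();
        step.accept(input);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the extractor's process(): sleeps instead of downloading.
        Consumer<String> fakeProcess = url -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long ms = timeMillis(fakeProcess, "http://example.org/paper.pdf");
        System.out.println("process() took " + ms + " ms");
    }
}
```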
That's my problem, and I wonder whether there is something special you
need to do when there are multiple views or different output sofas. I
cannot for the life of me figure out what is wrong and why it does not
work.
Thanks for your help,
Ben Morgan
_______________________________________________________________________________
1: SimpleServer user guide:
http://uima.apache.org/downloads/sandbox/simpleServerUserGuide/simpleServerUserGuide.html
2: Aggregate AE descriptor:
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/referenceAnnotatorDescriptor.xml
3: PDF Extractor descriptor:
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/PDFTextExtractorDescriptor.xml
PDF Extractor java source:
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/src/de/uniwue/informatik/bibrefext/pdf/TextExtractor.java
4: Email Annotator descriptor:
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/EmailAnnotatorDescriptor.xml
5: SimpleServer web.xml:
https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceWebService/WebContent/WEB-INF/web.xml
6: Complete WAR file:
https://github.com/downloads/cassava/bibrefext/bibrefext.war