Hey folks,

I've got a problem with the UIMA SimpleServer[1][5]: it does not correctly run my aggregate analysis engine[6]. The aggregate AE does work as expected, however, when I test it with the "UIMA CAS Visual Debugger", "UIMA Run AE" and "UIMA Document Analyzer" tools.

The analysis engine[2] is relatively simple (so far). It is composed of the following components:

    AE PDF Text Extractor[3]
        :: gets a URL as the "initial view" and downloads
           the file, extracts the text and puts it in a new
           view by the name of "extractedText".
        -> Input Sofa: urlString
        -> Output Sofa: extractedText
    AE Email Annotator[4]
        :: simple annotator, just annotates email addresses.

When I run the aggregate analysis engine through the SimpleServer, it terminates with an error before giving any results (taken from the Tomcat log file):

    SEVERE: Exception occurred
    org.apache.uima.analysis_engine.AnalysisEngineProcessException:
        Annotator processing failed.
    ...
    Caused by: org.apache.uima.cas.CASRuntimeException:
        No sofaFS with name plainText found.
    ...

"plainText" is the Sofa in the aggregate analysis engine that is mapped to "extractedText", the output Sofa of the PDF Text Extractor.
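
For reference, a Sofa mapping of this kind generally has the following shape in an aggregate descriptor. This is only a sketch: the componentKey value "PDFTextExtractor" is a placeholder and is not copied from my descriptor; the actual mapping is in [2].

    <sofaMappings>
      <sofaMapping>
        <!-- placeholder key; must match the delegate's key in the aggregate -->
        <componentKey>PDFTextExtractor</componentKey>
        <!-- Sofa name as the delegate knows it -->
        <componentSofaName>extractedText</componentSofaName>
        <!-- Sofa name as the aggregate knows it -->
        <aggregateSofaName>plainText</aggregateSofaName>
      </sofaMapping>
    </sofaMappings>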

I took the aggregate analysis engine apart piece by piece, starting with the Email Annotator AE. That worked fine with the SimpleServer.

Then I tested the PDF Text Extractor on its own (I changed the input view to _InitialView). When I submitted a URL, the result came back as XML, but it contained only the initial view and not the extracted text. In fact, when I test the text extractor any other way, it takes around 3 seconds to download the PDF file, whereas the SimpleServer sent back its results immediately. (So what is that all about? Does it not even run the code in process()?)

That's my problem. I wonder whether there is something special you need to do when there are views or different output Sofas. I cannot for the life of me figure out what is wrong or why it does not work.

Thanks for your help,
Ben Morgan

_______________________________________________________________________________

1: http://uima.apache.org/downloads/sandbox/simpleServerUserGuide/simpleServerUserGuide.html

2: Aggregate AE Descriptor: https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/referenceAnnotatorDescriptor.xml

3: PDF Extractor descriptor: https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/PDFTextExtractorDescriptor.xml
   PDF Extractor Java source: https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/src/de/uniwue/informatik/bibrefext/pdf/TextExtractor.java

4: Email Annotator descriptor: https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceAnnotator/desc/EmailAnnotatorDescriptor.xml

5: SimpleServer web.xml: https://github.com/cassava/bibrefext/blob/uima/UIMA/workspace/ReferenceWebService/WebContent/WEB-INF/web.xml

6: Complete WAR file: https://github.com/downloads/cassava/bibrefext/bibrefext.war
