Problem using Capabilities - OutputSofa

Christoph Büscher Wed, 04 Jun 2008 04:54:39 -0700

Hi,

So far I have ignored the analysis engine's "capabilities" section, but after declaring an "outputSofa" for the first time, I ran into trouble using the analysis engine in a CPE.

I have an AE that takes web pages in HTML format as input and removes the HTML tags etc. The result is stored in a new CAS view named "plainTextView". Until now I didn't declare any capabilities in the AE's descriptor, but now I tried this:


<capabilities>
      <capability>
        <inputs/>
        <outputs/>
        <outputSofas>
          <sofaName>plainTextView</sofaName>
        </outputSofas>
        <languagesSupported/>
      </capability>
</capabilities>

The AE's process() method accesses the default view of the JCas, does some processing and stores the result in the new view. The code goes something like this:


// get the text from the default CAS view
String originalText = jcas.getDocumentText();
JCas plainTextView = null;

// extract plain text from the original document
String documentWithoutHTML = someProcessing();

// create a view for the stripped HTML document and store the result in it
try {
   plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
   plainTextView.setDocumentText(documentWithoutHTML);
} catch (CASException e) {
   logger.warn(e.getMessage());
   throw new AnalysisEngineProcessException(e);
}

Using this AE in a CPE (inside an aggregate AE) worked until I declared the outputSofa as described above. Now, retrieving the original text from the default view with "jcas.getDocumentText()" always returns "null". Some debugging shows that CASImpl.getSofaDataString() appears to take this branch:


if (this == this.svd.baseCAS) {
      // base CAS has no document
      return null;
}
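
To make that branch concrete, here is a toy model of a CAS with named views. It is only a sketch and not UIMA code: the class ToyCas and all of its methods are made up for illustration. It assumes that an AE which declares input or output sofas is treated as sofa-aware and is handed the base CAS instead of the default view, and that the default view is named "_InitialView". Under that assumption, getDocumentText() on the object passed to process() returns null, while asking for the default view by name still returns the original text.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a CAS with named views. Hypothetical, NOT the UIMA implementation. */
class ToyCas {
    /** Assumed name of the default view. */
    static final String DEFAULT_VIEW = "_InitialView";

    // View registry shared by the base CAS and every view created from it.
    private final Map<String, ToyCas> views;
    private final boolean isBase;
    private String documentText;

    /** Creates the base CAS, which carries no document of its own. */
    ToyCas() {
        this.views = new HashMap<>();
        this.isBase = true;
    }

    /** Creates a view that shares the registry of its base CAS. */
    private ToyCas(Map<String, ToyCas> views) {
        this.views = views;
        this.isBase = false;
    }

    /** Registers and returns a new named view; fails if the name is taken. */
    ToyCas createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalStateException("view already exists: " + name);
        }
        ToyCas view = new ToyCas(views);
        views.put(name, view);
        return view;
    }

    /** Looks up an existing view by name; fails if it does not exist. */
    ToyCas getView(String name) {
        ToyCas view = views.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }

    /** Sets the document text of a view; the base CAS cannot hold one. */
    void setDocumentText(String text) {
        if (isBase) {
            throw new IllegalStateException("base CAS has no document");
        }
        documentText = text;
    }

    /** Mirrors the branch quoted above: the base CAS has no document. */
    String getDocumentText() {
        return isBase ? null : documentText;
    }
}
```

If the assumption holds, the equivalent step in the real AE would be to fetch the default view explicitly, e.g. with jcas.getView(CAS.NAME_DEFAULT_SOFA), before calling getDocumentText(). I have not verified that this is the intended usage.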

What am I missing when I declare the output sofa capability of the AE? Why does the JCas default view seem to be inaccessible after I declared the outputSofa?


Thanks for any hints and information!


--
--------------------------------
Christoph Büscher
