Hi,
I have ignored the analysis engine's "capabilities" section so far, but after
declaring an "outputSofa" for the first time, I ran into trouble using the
analysis engine in a CPE.
I have an AE that takes web pages in HTML format as input and removes the
HTML tags etc. The result is stored in a new CAS view named "plainTextView".
So far I had not declared any capabilities in the AE's descriptor, but now I tried
this:
<capabilities>
  <capability>
    <inputs/>
    <outputs/>
    <outputSofas>
      <sofaName>plainTextView</sofaName>
    </outputSofas>
    <languagesSupported/>
  </capability>
</capabilities>
The AE's process() method accesses the default view of the JCas, does some
processing and stores the result in the new view. The code goes something like this:
// get the text from the default CAS view
String originalText = jcas.getDocumentText();
JCas plainTextView = null;
// extract plain text from the original document
String documentWithoutHTML = someProcessing(originalText);
// create a view for the stripped HTML document
try {
    plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
    plainTextView.setDocumentText(documentWithoutHTML);
} catch (CASException e) {
    logger.warn(e.getMessage());
    throw new AnalysisEngineProcessException(e);
}
Using this AE in a CPE (inside an aggregate AE) worked until I declared the
outputSofa as described above. Now, retrieving the original text from the
default view with "jcas.getDocumentText()" always returns "null". Some
debugging shows that the reason is that CASImpl.getSofaDataString() takes
this branch:
if (this == this.svd.baseCAS) {
    // base CAS has no document
    return null;
}
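To make the consequence of that branch concrete, here is a small self-contained toy model in plain Java. It is NOT UIMA code: ToyCas, readOriginal and BaseCasSketch are hypothetical names invented for illustration, and the only assumption taken from the debugging above is that the object handed to process() is the base CAS rather than a view. The sketch shows why getDocumentText() on the base object yields null while the same text is still reachable through the named default view.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a CAS with views; hypothetical, not the UIMA API. */
public class BaseCasSketch {
    // Same string value that UIMA uses for its default view name.
    public static final String DEFAULT_VIEW = "_InitialView";

    public static class ToyCas {
        private final ToyCas base;               // null when this object is the base CAS
        private final Map<String, ToyCas> views; // shared by the base and all its views
        private String documentText;

        public ToyCas() {
            this.base = null;
            this.views = new HashMap<>();
        }

        private ToyCas(ToyCas base) {
            this.base = base;
            this.views = base.views;
        }

        public boolean isBase() {
            return base == null;
        }

        public ToyCas createView(String name) {
            if (views.containsKey(name)) {
                throw new IllegalStateException("view exists: " + name);
            }
            ToyCas root = isBase() ? this : base;
            ToyCas view = new ToyCas(root);
            views.put(name, view);
            return view;
        }

        public ToyCas getView(String name) {
            ToyCas view = views.get(name);
            if (view == null) {
                throw new IllegalArgumentException("no such view: " + name);
            }
            return view;
        }

        public void setDocumentText(String text) {
            this.documentText = text;
        }

        public String getDocumentText() {
            // Mirrors the branch quoted above: the base object has no document.
            return isBase() ? null : documentText;
        }
    }

    /** Reads the original text, switching to the default view if given the base. */
    public static String readOriginal(ToyCas cas) {
        ToyCas view = cas.isBase() ? cas.getView(DEFAULT_VIEW) : cas;
        return view.getDocumentText();
    }

    public static void main(String[] args) {
        ToyCas base = new ToyCas();
        base.createView(DEFAULT_VIEW).setDocumentText("<p>hi</p>");
        System.out.println(base.getDocumentText()); // base object: no document
        System.out.println(readOriginal(base));     // text found via the named view
    }
}
```

If the same thing is happening in the real AE, the analogous step would presumably be to request the default view by name, e.g. jcas.getView(CAS.NAME_DEFAULT_SOFA), before calling getDocumentText(). That is untested here and only a guess at the cause.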
What am I missing when I declare the output sofa capability of the AE? Why does
the JCas default view seem to be inaccessible after I declared the outputSofa?
Thanks for any hints and information!
--
--------------------------------
Christoph Büscher