Re: Problem using Capabilities - OutputSofa

Thilo Goetz Wed, 04 Jun 2008 08:21:24 -0700

And here's some background if you're interested:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00945.html


There's a lot of discussion before that message,
and a lot afterwards.

So we were mostly agreed that this was broken, but
couldn't agree on the proper fix and finally gave
up.  If we ever do a UIMA 3, we'll have the same
discussion all over :-)

--Thilo

Eddie Epstein wrote:

The CAS reference passed to the annotator process method changes when
Sofa capabilities are declared. See
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.deciding_multi_view

After declaring an output Sofa, process gets the "base CAS". To get
the text from the "default" view, try

String originalText = jcas.getCas().getCurrentView().getDocumentText();

Eddie

PS looks like the JCas interface is missing the getCurrentView() method.
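
The behaviour Eddie describes can be pictured with a small stand-alone model. This is only a sketch and not UIMA code: ToyCas and BaseCasDemo are hypothetical names, and the model assumes nothing beyond what the thread states, namely that the base CAS carries no document text of its own while each named view does. The name "_InitialView" is used for the default view.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT UIMA code): a base CAS owns named views; only views hold text. */
class ToyCas {
    static final String DEFAULT_VIEW = "_InitialView";

    private final Map<String, ToyCas> views; // shared by the base CAS and its views
    private final boolean isBase;
    private String documentText;

    /** Creates a base CAS that already contains an empty default view. */
    ToyCas() {
        this.views = new HashMap<>();
        this.isBase = true;
        this.views.put(DEFAULT_VIEW, new ToyCas(this.views));
    }

    /** Creates a view that shares the view table of its base CAS. */
    private ToyCas(Map<String, ToyCas> sharedViews) {
        this.views = sharedViews;
        this.isBase = false;
    }

    ToyCas getView(String name) {
        ToyCas view = views.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }

    ToyCas createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalArgumentException("view already exists: " + name);
        }
        ToyCas view = new ToyCas(views);
        views.put(name, view);
        return view;
    }

    void setDocumentText(String text) {
        if (isBase) {
            throw new IllegalStateException("base CAS has no document");
        }
        this.documentText = text;
    }

    /** Returns null on the base CAS, like the branch quoted in the original question. */
    String getDocumentText() {
        return isBase ? null : documentText;
    }
}

class BaseCasDemo {
    public static void main(String[] args) {
        ToyCas base = new ToyCas();
        base.getView(ToyCas.DEFAULT_VIEW).setDocumentText("<html><p>Hello</p></html>");

        // Asking the base CAS directly yields no text.
        System.out.println(base.getDocumentText());

        // Going through the default view first yields the document.
        String html = base.getView(ToyCas.DEFAULT_VIEW).getDocumentText();
        ToyCas plain = base.createView("plainTextView");
        plain.setDocumentText(html.replaceAll("<[^>]+>", ""));
        System.out.println(plain.getDocumentText());
    }
}
```

In this model, an annotator handed the base object has to select a view explicitly before reading or writing text, which is the step Eddie's one-liner performs.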

On 6/4/08, Christoph Büscher <[EMAIL PROTECTED]> wrote:

Hi,

I have ignored the analysis engine's "capabilities" section so far, but
after declaring an "outputSofa" for the first time, I ran into trouble
using the analysis engine in a CPE.

I have an AE that takes web pages in HTML format as input and removes the
HTML tags etc. The result is stored in a new CAS view named
"plainTextView". So far I hadn't declared any capabilities in the AE's
descriptor, but now I tried this:

<capabilities>
  <capability>
    <inputs/>
    <outputs/>
    <outputSofas>
      <sofaName>plainTextView</sofaName>
    </outputSofas>
    <languagesSupported/>
  </capability>
</capabilities>

The AE's process() method usually accesses the default view of the JCas,
does some processing and stores the result in the new view. The code goes
something like this:

// get the text from the default CAS view
String originalText = jcas.getDocumentText();
JCas plainTextView = null;

// extract plain text from the original document
String documentWithoutHTML = someProcessing();

// create a view for the stripped HTML document and store the result in it
try {
    plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
    plainTextView.setDocumentText(documentWithoutHTML);
} catch (CASException e) {
    logger.warn(e.getMessage());
    throw new AnalysisEngineProcessException(e);
}


Using this AE in a CPE (inside an aggregate AE) worked until I declared
the outputSofa as described above. Now, retrieving the original text from
the default view with "jcas.getDocumentText()" always returns "null". Some
debugging shows that the reason is that CASImpl.getSofaDataString()
appears to take this branch:

if (this == this.svd.baseCAS) {
    // base CAS has no document
    return null;
}


What am I missing when I declare the output sofa capability of the AE? Why
does the JCas default view seem to be inaccessible after I declared the
outputSofa?

Thanks for any hints and information!


--
--------------------------------
Christoph Büscher
