The CAS reference passed to the annotator process method changes when
Sofa capabilities are declared. See
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.deciding_multi_view

After declaring an output Sofa, process() gets the "base CAS", which has
no document of its own. To get the text from the "default" view, try

String originalText = jcas.getCas().getCurrentView().getDocumentText();
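
To make the base CAS vs. view distinction concrete, here is a small toy model in plain Java. It is only a sketch: ToyCas, SofaViewSketch and stripTags are hypothetical names invented for illustration and are not the UIMA API. The one behaviour it mirrors is that the base CAS returns null for its document text while a named view holds the actual text. In UIMA itself, I believe the explicit way to ask for the default view is jcas.getView(CAS.NAME_DEFAULT_SOFA), but please check that against the Javadoc for your release.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, NOT the UIMA API.
class ToyCas {
    static final String DEFAULT_VIEW = "_InitialView";

    // The view table is shared by the base CAS and all of its views.
    private final Map<String, ToyCas> views;
    private final boolean isBase;
    private String documentText;

    // Creates a base CAS, which owns the view table but has no document.
    ToyCas() {
        this.views = new HashMap<>();
        this.isBase = true;
    }

    // Creates a view that shares the view table of its base CAS.
    private ToyCas(Map<String, ToyCas> views) {
        this.views = views;
        this.isBase = false;
    }

    ToyCas createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalArgumentException("view exists: " + name);
        }
        ToyCas view = new ToyCas(views);
        views.put(name, view);
        return view;
    }

    ToyCas getView(String name) {
        ToyCas view = views.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }

    void setDocumentText(String text) {
        // Toy design choice: refuse to attach text to the base CAS.
        if (isBase) {
            throw new IllegalStateException("base CAS has no document");
        }
        documentText = text;
    }

    String getDocumentText() {
        // The base CAS has no document, so it always reports null.
        return isBase ? null : documentText;
    }
}

class SofaViewSketch {
    // Hypothetical stand-in for the real HTML stripping step.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Shape of a multi-view process(): ask for the default view by name
    // instead of reading the document text from the CAS that was passed in.
    static void process(ToyCas cas) {
        String originalText = cas.getView(ToyCas.DEFAULT_VIEW).getDocumentText();
        ToyCas plainTextView = cas.createView("plainTextView");
        plainTextView.setDocumentText(stripTags(originalText));
    }

    public static void main(String[] args) {
        ToyCas base = new ToyCas();
        base.createView(ToyCas.DEFAULT_VIEW)
            .setDocumentText("<p>Hello <b>world</b></p>");

        System.out.println(base.getDocumentText()); // prints "null"
        process(base);
        System.out.println(base.getView("plainTextView").getDocumentText()); // prints "Hello world"
    }
}
```

The point of the sketch is that process() never reads the text from the object it was handed; it always goes through a named view, so it behaves the same whether it receives the base CAS or one of the views.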

Eddie

PS: It looks like the JCas interface is missing the getCurrentView() method.

On 6/4/08, Christoph Büscher <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have ignored the analysis engine's "capabilities" section so far, but after
> declaring an "outputSofa" for the first time, I ran into trouble using the
> analysis engine in a CPE.
>
> I have an AE that takes web pages in HTML format as input and removes the
> HTML tags etc. The result is stored in a new CAS view named "plainTextView".
> So far I didn't declare any capabilities in the AE's descriptor, but now I
> tried this:
>
> <capabilities>
>        <capability>
>          <inputs/>
>          <outputs/>
>          <outputSofas>
>            <sofaName>plainTextView</sofaName>
>          </outputSofas>
>          <languagesSupported/>
>        </capability>
> </capabilities>
>
> The AE's process() method usually accesses the default view of the JCas, does
> some processing and stores the result in the new view. The code goes
> something like this:
>
> // get the text from the default CAS view
> String originalText = jcas.getDocumentText();
> JCas plainTextView = null;
>
> // extract plain text from the original document
> documentWithoutHTML = someProcessing();
>
> // create a view for the stripped HTML document and store the result in it
> try {
>     plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
>     plainTextView.setDocumentText(documentWithoutHTML);
> } catch (CASException e) {
>     logger.warn(e.getMessage());
>     throw new AnalysisEngineProcessException(e);
> }
>
>
> Using this AE in a CPE (inside an aggregate AE) was working until I declared
> the outputSofa as described above. Now, trying to retrieve the original text
> from the default view with "jcas.getDocumentText()" always returns "null".
> Some debugging shows that the reason is that in CASImpl.getSofaDataString()
> this branch appears to be taken:
>
> if (this == this.svd.baseCAS) {
>        // base CAS has no document
>        return null;
> }
>
>
> What am I missing when I declare the output sofa capability of the AE? Why
> does the JCas default view seem to be inaccessible after I declared the
> outputSofa?
>
> Thanks for any hints and information!
>
>
> --
> --------------------------------
> Christoph Büscher
>
