And here's some background if you're interested: http://www.mail-archive.com/[EMAIL PROTECTED]/msg00945.html
There's a lot of discussion before that message, and a lot afterwards. So we were mostly agreed that this was broken, but couldn't agree on the proper fix and finally gave up. If we ever do a UIMA 3, we'll have the same discussion all over :-)

--Thilo

Eddie Epstein wrote:
The CAS reference passed to the annotator's process method changes when Sofa capabilities are declared. See
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.deciding_multi_view

After declaring an output Sofa, process gets the "base CAS". To get the text from the "default" view, try

    String originalText = jcas.getCas().getCurrentView().getDocumentText();

Eddie

PS: It looks like the JCas interface is missing the getCurrentView() method.

On 6/4/08, Christoph Büscher <[EMAIL PROTECTED]> wrote:

Hi,

I have ignored the analysis engine's "capabilities" section so far, but after I tried declaring an "outputSofa" for the first time, I ran into trouble using the analysis engine in a CPE. I have an AE that takes web pages in HTML format as input and removes the HTML tags etc. The result is stored in a new CAS view named "plainTextView". So far I hadn't declared any capabilities in the AE's descriptor, but now I tried this:

    <capabilities>
      <capability>
        <inputs/>
        <outputs/>
        <outputSofas>
          <sofaName>plainTextView</sofaName>
        </outputSofas>
        <languagesSupported/>
      </capability>
    </capabilities>

The AE's process() method usually accesses the default view of the JCas, does some processing and stores the result in the new view. The code goes something like this:

    // get the text from the default CAS view
    String originalText = jcas.getDocumentText();
    JCas plainTextView = null;
    // extract plain text from the original document
    String documentWithoutHTML = someProcessing();
    // create a view for the stripped HTML document
    try {
        plainTextView = jcas.createView(DOCUMENT_PLAINTEXT_VIEWNAME);
        plainTextView.setDocumentText(documentWithoutHTML);
    } catch (CASException e) {
        logger.warn(e.getMessage());
        throw new AnalysisEngineProcessException(e);
    }

Using this AE in a CPE (inside an aggregate AE) was working until I declared the outputSofa as described above.
Now, trying to retrieve the original text from the default view with "jcas.getDocumentText()" always returns "null". Some debugging shows that the reason is that in CASImpl.getSofaDataString() this branch appears to be taken:

    if (this == this.svd.baseCAS) {
        // base CAS has no document
        return null;
    }

What am I missing when I declare the output sofa capability of the AE? Why does the JCas default view seem to be inaccessible after I declared the outputSofa?

Thanks for any hints and information!

--
--------------------------------
Christoph Büscher
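
For readers who want to see the behavior in isolation, here is a minimal, self-contained sketch of what the thread describes: the base CAS has no document of its own, so the text has to be read from a named view. Everything in it (ToyCas, BaseCasSketch, stripTags) is a hypothetical stand-in written for illustration and is not part of the UIMA API. Only the view name "_InitialView" is meant to mirror UIMA's default view name. In UIMA itself the corresponding step would presumably be the getCurrentView() call Eddie suggests, or a lookup of the default view by name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a CAS. NOT the UIMA API; it only mimics the
// "base CAS has no document" behavior discussed in the thread.
final class ToyCas {
    static final String DEFAULT_VIEW = "_InitialView";

    private final Map<String, ToyCas> views; // shared by the base and all its views
    private final boolean isBase;
    private String documentText;

    private ToyCas(Map<String, ToyCas> views, boolean isBase) {
        this.views = views;
        this.isBase = isBase;
    }

    static ToyCas newBase() {
        return new ToyCas(new HashMap<>(), true);
    }

    ToyCas createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalStateException("view already exists: " + name);
        }
        ToyCas view = new ToyCas(views, false);
        views.put(name, view);
        return view;
    }

    ToyCas getView(String name) {
        ToyCas view = views.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }

    void setDocumentText(String text) {
        if (isBase) {
            throw new IllegalStateException("base CAS has no document");
        }
        documentText = text;
    }

    String getDocumentText() {
        // Same idea as the branch quoted from CASImpl: the base has no document.
        return isBase ? null : documentText;
    }
}

final class BaseCasSketch {
    static final String PLAIN_TEXT_VIEW = "plainTextView";

    // Hypothetical, deliberately naive tag stripper standing in for someProcessing().
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    // Works whether it is handed the base or a view, because it asks for
    // the default view by name instead of reading text from its argument.
    static void process(ToyCas cas) {
        String originalText = cas.getView(ToyCas.DEFAULT_VIEW).getDocumentText();
        ToyCas plainTextView = cas.createView(PLAIN_TEXT_VIEW);
        plainTextView.setDocumentText(stripTags(originalText));
    }

    public static void main(String[] args) {
        ToyCas base = ToyCas.newBase();
        base.createView(ToyCas.DEFAULT_VIEW).setDocumentText("<p>Hello <b>UIMA</b></p>");
        System.out.println(base.getDocumentText()); // the base itself has no text
        process(base);
        System.out.println(base.getView(PLAIN_TEXT_VIEW).getDocumentText());
    }
}
```

The design point is that process() never assumes its argument is a view: looking the view up by name behaves the same way with or without declared Sofa capabilities, at least in this toy model.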
