Re: Dictionary annotator - added parameter to make analysis on different SOFAs

Richard Eckart de Castilho Tue, 04 Feb 2014 13:42:06 -0800

On 04.02.2014, at 16:44, Luca Foppiano <l...@foppiano.org> wrote:
> On Tue, Feb 4, 2014 at 4:14 PM, Luca Foppiano <l...@foppiano.org> wrote:
>> I've tried to specify two different View names, or the same (input/output
>> views) but without success. It seems that either the mapping is not
>> effective or I'm doing something wrong.
>> 
>> If you could quickly have a look, here is what I've changed:
>> 
>> https://github.com/lfoppiano/uima-fit-sample-pipeline/commit/148ef74601d28d2c3781786160121c94dde487dd
> 
> Apologies for the large number of emails.
> 
> Might it be that I have to use @SofaCapability in the Dictionary Annotator
> to enable it?
> 
> If so, how could I integrate an AE whose code I don't control into
> uimaFIT?


Adding a @SofaCapability should only have one effect: UIMA will not try to 
supply the default view "_InitialView" (constant: CAS.NAME_DEFAULT_SOFA) to the 
AE, but rather expect that the AE fetches the views it needs from the CAS.

If you do not specify a @SofaCapability, then UIMA should supply the default 
view CAS.NAME_DEFAULT_SOFA to the AE.

Let's look at your project:

You read a TEI file with annotations into the CAS.

Then you create a new view (SOFA_NAME_TEXT_ONLY) containing only the text from 
the TEI file - no markup.

AnalysisEngineDescription whitespaceEngine = createEngineDescription(
  WhitespaceTokenizer.class,
  "SofaNames", new String[]{SimpleParserAE.SOFA_NAME_TEXT_ONLY});

It looks like the SofaNames parameter of the WhitespaceTokenizer is meant for 
processing multiple views at once: it lets a single tokenizer in the pipeline 
handle several views. With view mappings, the tokenizer would instead need to 
be added to the pipeline once per view. Instead of using this parameter, you 
could also use a mapping.

Finally, you write the result out.

Now to the views:

The reader loads the TEI data into the default view (CAS.NAME_DEFAULT_SOFA).
The SimpleParserAE fetches the data from the default view and stores it into 
SOFA_NAME_TEXT_ONLY.
The WhitespaceTokenizer operates on the SOFA_NAME_TEXT_ONLY (currently via 
parameter).
The DictionaryAnnotator knows nothing about views - thus it operates on the 
default view (CAS.NAME_DEFAULT_SOFA).
The consumer explicitly fetches the SOFA_NAME_TEXT_ONLY in its code and works 
on that.
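
To make the consequence of this flow concrete, here is a toy model in plain 
Java. It is not the UIMA API: the "CAS" is just a map from view name to 
annotation labels, "textOnly" is a made-up stand-in for the value of 
SOFA_NAME_TEXT_ONLY, and "Token" / "DictionaryMatch" are placeholder labels 
for whatever each component would write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "CAS" is a map from view name to the annotation labels
// stored in that view. None of this is the UIMA API.
final class ViewFlowSketch {
    static final String DEFAULT_VIEW = "_InitialView"; // value of CAS.NAME_DEFAULT_SOFA
    static final String TEXT_ONLY = "textOnly";        // hypothetical value of SOFA_NAME_TEXT_ONLY

    // Runs the pipeline without any view mappings and returns the toy CAS.
    static Map<String, List<String>> runWithoutMappings() {
        Map<String, List<String>> cas = new HashMap<>();
        cas.put(DEFAULT_VIEW, new ArrayList<>()); // reader: TEI data lands here
        cas.put(TEXT_ONLY, new ArrayList<>());    // SimpleParserAE: creates the text-only view
        cas.get(TEXT_ONLY).add("Token");              // WhitespaceTokenizer (via SofaNames parameter)
        cas.get(DEFAULT_VIEW).add("DictionaryMatch"); // DictionaryAnnotator (knows nothing about views)
        return cas;
    }

    // The consumer fetches TEXT_ONLY explicitly, so this is all it sees.
    static List<String> consumerSees() {
        return runWithoutMappings().get(TEXT_ONLY);
    }

    public static void main(String[] args) {
        System.out.println(consumerSees()); // prints [Token]
    }
}
```

In this model, anything the DictionaryAnnotator writes ends up in the default 
view, while the consumer only ever looks at the text-only view.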

Now to the mappings. Currently you have this mapping:

builder.add(preparationEngine);
builder.add(whitespaceEngine);
builder.add(dictionaryEngine, 
  SimpleParserAE.SOFA_NAME_TEXT_ONLY, SimpleParserAE.SOFA_NAME_TEXT_ONLY);

This means that view SOFA_NAME_TEXT_ONLY is renamed to SOFA_NAME_TEXT_ONLY for 
the dictionaryEngine (so actually this has no effect at all). All other AEs 
have no mappings.

The correct mapping for the dictionaryEngine should be 

builder.add(dictionaryEngine, 
  CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);

so the SOFA_NAME_TEXT_ONLY is supplied as the default view to the 
dictionaryEngine.
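
The difference between the two mappings can be sketched with another toy 
model in plain Java (again not the UIMA API). It assumes, as described above, 
that each pair is (name the component asks for, name of the view in the 
pipeline), and that a name without a mapping is passed through unchanged. The 
class, its methods and the "textOnly" value are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of view-name mapping; not the UIMA or uimaFIT implementation.
final class SofaMappingSketch {
    static final String DEFAULT_VIEW = "_InitialView"; // value of CAS.NAME_DEFAULT_SOFA
    static final String TEXT_ONLY = "textOnly";        // hypothetical value of SOFA_NAME_TEXT_ONLY

    // Builds a lookup table from (componentName, pipelineName) pairs.
    static Map<String, String> mapping(String... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("view names must come in pairs");
        }
        Map<String, String> table = new HashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            table.put(pairs[i], pairs[i + 1]);
        }
        return table;
    }

    // Returns the pipeline view a component gets when it asks for 'requested'.
    static String resolve(Map<String, String> table, String requested) {
        return table.getOrDefault(requested, requested);
    }

    public static void main(String[] args) {
        Map<String, String> current = mapping(TEXT_ONLY, TEXT_ONLY);
        Map<String, String> corrected = mapping(DEFAULT_VIEW, TEXT_ONLY);
        // A view-unaware component always asks for the default view.
        System.out.println(resolve(current, DEFAULT_VIEW));   // prints _InitialView
        System.out.println(resolve(corrected, DEFAULT_VIEW)); // prints textOnly
    }
}
```

With the current mapping the component still ends up on "_InitialView"; only 
the corrected mapping redirects its request for the default view.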

Similarly, it should be possible to remove the view parameter from 
whitespaceEngine and the getView call from the consumer and use these mappings:

builder.add(preparationEngine);
builder.add(whitespaceEngine,
  CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
builder.add(dictionaryEngine, 
  CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
builder.add(casConsumer, 
  CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);

I didn't actually try to modify your code and run this, because your code uses 
absolute paths.

Cheers,

-- Richard
