On 04.02.2014, at 16:44, Luca Foppiano wrote:
> On Tue, Feb 4, 2014 at 4:14 PM, Luca Foppiano wrote:
>> I've tried to specify two different view names, or the same (input/output
>> views), but without success. It seems that either the mapping is not
>> effective or I'm doing something wrong.
>>
>> If you could quickly have a look, here is what I've changed:
>>
>> https://github.com/lfoppiano/uima-fit-sample-pipeline/commit/148ef74601d28d2c3781786160121c94dde487dd
>
> Apologies for the large number of emails.
>
> Could it be that I have to use @SofaCapability in the Dictionary
> Annotator to enable it?
>
> If so, how could I integrate an AE whose code I don't control into
> uimaFIT?
Adding a @SofaCapability should only have one effect: UIMA will not try to
supply the default view "_InitialView" (constant: CAS.NAME_DEFAULT_SOFA) to the
AE, but rather expect that the AE fetches the views it needs from the CAS.
If you do not specify a @SofaCapability, then UIMA should supply the default
view CAS.NAME_DEFAULT_SOFA to the AE.
Let's look at your project:
You read a TEI file with annotations into the CAS.
Then you create a new view (SOFA_NAME_TEXT_ONLY) containing only the text from
the TEI file - no markup.
    AnalysisEngineDescription whitespaceEngine = createEngineDescription(
            WhitespaceTokenizer.class,
            "SofaNames", new String[]{SimpleParserAE.SOFA_NAME_TEXT_ONLY});
It looks like the SofaNames parameter of the WhitespaceTokenizer is meant for
processing multiple views at once: it lets a single tokenizer in the pipeline
handle several views. With view mappings, the tokenizer would instead have to
be added to the pipeline once per view.
Instead of this parameter, you could also use a mapping.
Finally, you write the result out.
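To make the SofaNames-versus-mapping trade-off concrete, here is a small stdlib-only Java model; it is not uimaFIT code. A CAS is reduced to a map from view name to text, token counting stands in for the tokenizer, and "textOnly" and "notes" are hypothetical view names. Assuming SofaNames behaves as described above, one configured entry and one mapped entry per view should give the same result; the difference is only how many pipeline entries you need.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not UIMA code): a CAS is reduced to a map from view name to text.
final class SofaNamesVsMappingSketch {
    static final String NAME_DEFAULT_SOFA = "_InitialView"; // value of CAS.NAME_DEFAULT_SOFA

    // Stand-in for a tokenizer: counts whitespace-separated tokens.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Option 1: a single pipeline entry whose parameter lists every view to process.
    static Map<String, Integer> withSofaNamesParameter(Map<String, String> cas, List<String> sofaNames) {
        Map<String, Integer> tokens = new LinkedHashMap<>();
        for (String view : sofaNames) {
            tokens.put(view, countTokens(cas.get(view)));
        }
        return tokens;
    }

    // Option 2: a view-agnostic tokenizer only reads its default view, so it is
    // added once per target view, each entry mapping the default view elsewhere.
    static Map<String, Integer> withOneMappedEntryPerView(Map<String, String> cas, List<String> targets) {
        Map<String, Integer> tokens = new LinkedHashMap<>();
        for (String target : targets) {
            Map<String, String> mapping = Map.of(NAME_DEFAULT_SOFA, target); // one pipeline entry
            String view = mapping.getOrDefault(NAME_DEFAULT_SOFA, NAME_DEFAULT_SOFA);
            tokens.put(view, countTokens(cas.get(view)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> cas = Map.of(
                NAME_DEFAULT_SOFA, "<TEI><text>hello world</text></TEI>",
                "textOnly", "hello world",
                "notes", "one two three");
        List<String> views = List.of("textOnly", "notes");
        System.out.println(withSofaNamesParameter(cas, views));    // {textOnly=2, notes=3}
        System.out.println(withOneMappedEntryPerView(cas, views)); // {textOnly=2, notes=3}
    }
}
```

The parameter keeps the pipeline shorter; the mapping keeps the tokenizer unaware of view names, which is what you need for components you cannot reconfigure.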
Now to the views:
The reader loads the TEI data into the default view (CAS.NAME_DEFAULT_SOFA).
The SimpleParserAE fetches the data from the default view and stores it into
SOFA_NAME_TEXT_ONLY.
The WhitespaceTokenizer operates on the SOFA_NAME_TEXT_ONLY (currently via
parameter).
The DictionaryAnnotator knows nothing about views - thus it operates on the
default view (CAS.NAME_DEFAULT_SOFA).
The consumer explicitly fetches the SOFA_NAME_TEXT_ONLY in its code and works
on that.
Now to the mappings. Currently you have this mapping:
    builder.add(preparationEngine);
    builder.add(whitespaceEngine);
    builder.add(dictionaryEngine,
            SimpleParserAE.SOFA_NAME_TEXT_ONLY, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
This means that view SOFA_NAME_TEXT_ONLY is renamed to SOFA_NAME_TEXT_ONLY for
the dictionaryEngine (so actually this has no effect at all). All other AEs
have no mappings.
The correct mapping for the dictionaryEngine should be
    builder.add(dictionaryEngine,
            CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
so the SOFA_NAME_TEXT_ONLY is supplied as the default view to the
dictionaryEngine.
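The difference between the two dictionaryEngine mappings can be reproduced with a toy resolver in plain Java; it does not use the uimaFIT AggregateBuilder. It assumes that a mapping goes from the component-side view name to the aggregate-side name, that unmapped names stay unchanged, and that "textOnly" is a hypothetical value for SOFA_NAME_TEXT_ONLY. Since the DictionaryAnnotator only ever asks for its default view, only a mapping keyed on CAS.NAME_DEFAULT_SOFA changes what it sees.

```java
import java.util.Map;

// Toy model (not the UIMA API) of how a component-local view name is resolved
// through a view mapping such as the one passed to builder.add(...).
final class DictionaryMappingSketch {
    static final String NAME_DEFAULT_SOFA = "_InitialView"; // value of CAS.NAME_DEFAULT_SOFA
    static final String TEXT_ONLY = "textOnly";             // hypothetical value of SOFA_NAME_TEXT_ONLY

    // Names without a mapping entry resolve to themselves.
    static String resolve(Map<String, String> componentToAggregate, String componentViewName) {
        return componentToAggregate.getOrDefault(componentViewName, componentViewName);
    }

    // A component without @SofaCapability (like the DictionaryAnnotator) only
    // receives its default view, so that is the only name that matters here.
    static String viewSeenByDictionary(Map<String, String> mapping) {
        return resolve(mapping, NAME_DEFAULT_SOFA);
    }

    public static void main(String[] args) {
        Map<String, String> current = Map.of(TEXT_ONLY, TEXT_ONLY);          // mapping from the commit
        Map<String, String> corrected = Map.of(NAME_DEFAULT_SOFA, TEXT_ONLY); // proposed mapping
        System.out.println(viewSeenByDictionary(current));   // _InitialView
        System.out.println(viewSeenByDictionary(corrected)); // textOnly
    }
}
```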
Similarly, it should be possible to remove the view parameter from
whitespaceEngine and the getView call from the consumer and use these mappings:
    builder.add(preparationEngine);
    builder.add(whitespaceEngine,
            CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
    builder.add(dictionaryEngine,
            CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
    builder.add(casConsumer,
            CAS.NAME_DEFAULT_SOFA, SimpleParserAE.SOFA_NAME_TEXT_ONLY);
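As a sanity check of these four builder.add calls, here is a plain-Java model (not uimaFIT code) that resolves which view each component receives as its default view. It assumes that each mapping goes from the component-side name to the aggregate-side name, that unmapped names stay unchanged, and that "textOnly" is a hypothetical value for SOFA_NAME_TEXT_ONLY; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not UIMA code): each pipeline step is a component name plus the
// component-to-aggregate view mapping it was added with.
final class PipelineMappingSketch {
    static final String NAME_DEFAULT_SOFA = "_InitialView"; // value of CAS.NAME_DEFAULT_SOFA
    static final String TEXT_ONLY = "textOnly";             // hypothetical value of SOFA_NAME_TEXT_ONLY

    static final class Step {
        final String component;
        final Map<String, String> mapping;

        Step(String component, Map<String, String> mapping) {
            this.component = component;
            this.mapping = mapping;
        }

        // Aggregate-level view this step receives as its default view.
        String defaultView() {
            return mapping.getOrDefault(NAME_DEFAULT_SOFA, NAME_DEFAULT_SOFA);
        }
    }

    // Mirrors the four builder.add(...) calls in the mail.
    static List<Step> pipeline() {
        Map<String, String> toTextOnly = Map.of(NAME_DEFAULT_SOFA, TEXT_ONLY);
        return List.of(
                new Step("preparationEngine", Map.of()),
                new Step("whitespaceEngine", toTextOnly),
                new Step("dictionaryEngine", toTextOnly),
                new Step("casConsumer", toTextOnly));
    }

    static Map<String, String> defaultViews() {
        Map<String, String> result = new LinkedHashMap<>();
        for (Step step : pipeline()) {
            result.put(step.component, step.defaultView());
        }
        return result;
    }

    public static void main(String[] args) {
        defaultViews().forEach((component, view) -> System.out.println(component + " -> " + view));
    }
}
```

Only the preparationEngine still sees the TEI data in the default view; it writes SOFA_NAME_TEXT_ONLY by fetching that view itself, which this model does not track.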
I didn't actually try to modify your code and run this, because your code uses
absolute paths.
Cheers,
-- Richard