Re: How to process structured input with UIMA?

Jörn Kottmann Wed, 02 Mar 2011 07:10:00 -0800

On 3/2/11 3:25 PM, Andreas Kahl wrote:

Anuj and Jan,


Thank you very much for your tips. I think I will try the annotation way:
Use a CollectionProcessingEngine to iterate over all the docs in my input XML.
Instantiate a CAS with the input XML as text.
Then run an annotator converting all XML tags into annotations (I think I am
going to set annotation.setBegin() and .setEnd() to something generic like 0).
Based on that I'm going to build up my pipeline.
I'll keep you posted as soon as I have some results.

The idea of an annotation is really that it is bound to a span of text. If you do not want that, then just use a type which is directly derived from Feature Structure.

Most text processing assumes that you have annotations which mark a piece of text; a component then retrieves that text, processes it and outputs annotations.

Let's say you want to use a tokenizer: it needs an annotation (e.g. a sentence) as input and might output token annotations within the input annotation span.
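To make the span idea concrete, here is a rough sketch in plain Java. It deliberately does not use the UIMA API: the Span class, the tokenize method and the sample document are hypothetical stand-ins, chosen only to show how a tokenizer reads the text covered by an input annotation and emits token annotations whose offsets fall inside that span.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanTokenizerSketch {

    // Hypothetical stand-in for an annotation: a type name plus begin/end
    // offsets into the document text. This is NOT a UIMA class.
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        // Returns the piece of the document that this span marks.
        public String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    // Whitespace tokenizer: looks only at the text inside the input span and
    // emits token spans whose offsets are relative to the whole document.
    public static List<Span> tokenize(String document, Span sentence) {
        List<Span> tokens = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        for (int i = sentence.begin; i < sentence.end; i++) {
            boolean isSpace = Character.isWhitespace(document.charAt(i));
            if (!isSpace && tokenStart < 0) {
                tokenStart = i;
            } else if (isSpace && tokenStart >= 0) {
                tokens.add(new Span("Token", tokenStart, i));
                tokenStart = -1;
            }
        }
        // Close a token that runs up to the end of the sentence span.
        if (tokenStart >= 0) {
            tokens.add(new Span("Token", tokenStart, sentence.end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The XML tags stay in the document text; the sentence span skips them.
        String document = "<s>UIMA binds annotations to text.</s>";
        Span sentence = new Span("Sentence", 3, 34);
        for (Span token : tokenize(document, sentence)) {
            System.out.println(token.type + " [" + token.begin + ", " + token.end + ") "
                    + token.coveredText(document));
        }
    }
}
```

If every annotation had begin and end set to 0, coveredText would always return an empty string and a component like this would have nothing to work on, which is why the offsets should point at the real text.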

Jörn
