Jörn, thanks for your hint. I will consider that, too. If those Feature Structure variables work well, they sound to me like a good alternative to 'misusing' the span-bound Annotations. I really have to practice a bit.
Best regards,
Andreas

-------- Original Message --------
> Date: Wed, 02 Mar 2011 16:09:25 +0100
> From: "Jörn Kottmann" <[email protected]>
> To: [email protected]
> Subject: Re: How to process structured input with UIMA?

> On 3/2/11 3:25 PM, Andreas Kahl wrote:
> > Anuj and Jan,
> >
> > Thank you very much for your tips. I think I will try the annotation way:
> > Use a CollectionProcessingEngine to iterate over all the docs in my input XML.
> > Instantiate a CAS with the input XML as text.
> > Then run an Annotator converting all XML tags into Annotations (I think I am going to set annotation.setBegin() and .setEnd() to something generic like 0).
> > Based on that I'm going to build up my pipeline.
> > I'll keep you posted as soon as I have some results.
> >
> The idea of an annotation is really that it is bound to a span of text. If you do not want that, then just use a type which is directly derived from Feature Structure.
>
> Most text processing assumes that you have annotations which mark a piece of text, then retrieve the text, process it and output annotations.
>
> Let's say you want to use a tokenizer: it needs an annotation (e.g. a sentence) as input and might output token annotations within the input annotation span.
>
> Jörn
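
To make the tokenizer example in the quoted message concrete, here is a minimal sketch in plain Java. It is deliberately not the UIMA API: the class SpanSketch, the Span type, and the tokenize method are hypothetical names made up for illustration, and the whitespace splitting is just an assumed stand-in for a real tokenizer. It only shows the contract described above: the tokenizer receives a sentence span, reads the text that span covers, and emits token spans that lie inside it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of span-bound annotations; this is NOT the UIMA API.
public class SpanSketch {

    // Hypothetical stand-in for an annotation: a type name plus [begin, end) offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        // Retrieve the text that this span marks in the document.
        String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    // Whitespace tokenizer: looks only at the text inside the sentence span and
    // returns token spans whose offsets refer to the whole document.
    static List<Span> tokenize(String document, Span sentence) {
        List<Span> tokens = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        for (int i = sentence.begin; i < sentence.end; i++) {
            boolean isSpace = Character.isWhitespace(document.charAt(i));
            if (!isSpace && tokenStart < 0) {
                tokenStart = i; // first character of a new token
            } else if (isSpace && tokenStart >= 0) {
                tokens.add(new Span("Token", tokenStart, i));
                tokenStart = -1;
            }
        }
        if (tokenStart >= 0) {
            // The last token runs up to the end of the sentence span.
            tokens.add(new Span("Token", tokenStart, sentence.end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String document = "UIMA marks text. Annotators add more.";
        Span sentence = new Span("Sentence", 0, 16); // covers the first sentence
        for (Span token : tokenize(document, sentence)) {
            System.out.println(token.begin + "-" + token.end + " " + token.coveredText(document));
        }
    }
}
```

A span with begin and end both set to 0 would cover no text, so a component written this way would have nothing to process. That is the practical reason for using a type without offsets when the data is not tied to a text span.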
