It really depends on the data in your CAS. As far as I know, there is typically
only one big annotation index: if you request an iterator for a specific type, a
filtered iterator is created internally and returned. The only thing that speeds
up iteration is the offsets. If the annotations you are looking for are more or
less evenly distributed throughout your text, it's probably faster to use a
single filtered iterator than to iterate separately for each type.
That is my understanding and experience so far. UIMA maintainers, please
correct me if I am wrong.
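To make the trade-off concrete, here is a minimal plain-Java sketch. It is not UIMA code: Span, singleFilteredPass, onePassPerType and the visited counter are hypothetical names made up for illustration, and it simply assumes the behaviour described above, namely that a per-type iterator is itself a filter over the one shared index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilteredIterationSketch {

    // Hypothetical stand-in for an annotation: a type name plus offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Number of index entries looked at by the most recent call.
    static int visited;

    // One pass over the shared index, keeping spans of any wanted type.
    // Results stay in index order.
    static List<Span> singleFilteredPass(List<Span> index, Set<String> wanted) {
        visited = 0;
        List<Span> out = new ArrayList<>();
        for (Span s : index) {
            visited++;
            if (wanted.contains(s.type)) {
                out.add(s);
            }
        }
        return out;
    }

    // One pass per type, modelling a separate typed iterator for each type.
    // The shared index is scanned once per type, and results are grouped by type.
    static List<Span> onePassPerType(List<Span> index, List<String> wanted) {
        visited = 0;
        List<Span> out = new ArrayList<>();
        for (String type : wanted) {
            for (Span s : index) {
                visited++;
                if (type.equals(s.type)) {
                    out.add(s);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(
            new Span("Sentence", 0, 11),
            new Span("Token", 0, 5),
            new Span("Heading", 0, 5),
            new Span("Token", 6, 11));

        List<Span> single = singleFilteredPass(
            index, new HashSet<>(Arrays.asList("Token", "Sentence")));
        System.out.println("single pass: " + single.size()
            + " matches, " + visited + " entries visited");

        List<Span> perType = onePassPerType(index, Arrays.asList("Token", "Sentence"));
        System.out.println("per type:    " + perType.size()
            + " matches, " + visited + " entries visited");
    }
}
```

Under that assumption, both variants find the same annotations, but the per-type variant looks at every index entry once per type and returns the results grouped by type rather than in document order.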
Cheers,
Richard
On 07.09.2011 at 15:16, Jörn Kottmann wrote:
> Isn't this slow? Because it then needs to iterate over every
> single AnnotationFS inside my CAS.
>
> Jörn
>
>
> On 9/7/11 3:06 PM, Richard Eckart de Castilho wrote:
>> Hi Jörn,
>>
>>> what is the best way to iterate over annotations which have
>>> different types?
>> you can use a filtered iterator - more or less like this:
>>
>> // UIMA imports; Token and Sentence come from your own type system
>> import org.apache.uima.cas.CAS;
>> import org.apache.uima.cas.ConstraintFactory;
>> import org.apache.uima.cas.FSIterator;
>> import org.apache.uima.cas.FSMatchConstraint;
>> import org.apache.uima.cas.FSTypeConstraint;
>> import org.apache.uima.cas.Type;
>> import org.apache.uima.jcas.tcas.Annotation;
>>
>> CAS cas = jcas.getCas();
>> ConstraintFactory cf = ConstraintFactory.instance();
>> FSIterator<Annotation> iterator = jcas.getAnnotationIndex().iterator();
>> Type tokenType = jcas.getCasType(Token.type);
>> Type sentenceType = jcas.getCasType(Sentence.type);
>>
>> // Restrict to Tokens
>> FSTypeConstraint typeConstraint1 = cf.createTypeConstraint();
>> typeConstraint1.add(tokenType);
>>
>> // Restrict to Sentences
>> FSTypeConstraint typeConstraint2 = cf.createTypeConstraint();
>> typeConstraint2.add(sentenceType);
>>
>> // Combine both constraints using "or"
>> FSMatchConstraint disjunction = cf.or(typeConstraint1, typeConstraint2);
>>
>> // Create and use the filtered iterator
>> FSIterator<Annotation> filteredIterator =
>>     cas.createFilteredIterator(iterator, disjunction);
>> while (filteredIterator.hasNext()) {
>>     System.out.println(filteredIterator.next().getCoveredText());
>> }
>>
>> Cheers,
>>
>> Richard
>>
>
--
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------