It really depends on the data in your CAS. As far as I know, there is typically 
only one big annotation index: if you request an iterator for a specific type, a 
filtered iterator is created internally and returned. The only thing that speeds 
up iteration is the offsets. If the annotations you are looking for are more or 
less evenly distributed throughout your text, it is probably faster to use a 
single filtered iterator than to iterate separately for each type.
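
To make the trade-off concrete, here is a minimal toy model in plain Java. It 
does not use the UIMA API: the Ann class, the sampleIndex() data and the 
"NamedEntity" type are made up for illustration, and it assumes (as described 
above) that a per-type iterator is just a filter over the one shared index. 
Treat it as a sketch of the reasoning, not of how UIMA is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilteredIterationSketch {

    /** Hypothetical stand-in for an annotation: a type name plus offsets. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    /** Number of index entries visited so far (for comparison only). */
    static int visited = 0;

    /** Made-up shared index, ordered by begin offset, longer spans first. */
    static List<Ann> sampleIndex() {
        return Arrays.asList(
                new Ann("Sentence", 0, 11),
                new Ann("Token", 0, 5),
                new Ann("NamedEntity", 6, 11),
                new Ann("Token", 6, 11));
    }

    /** One pass over the shared index, keeping only the wanted types. */
    static List<Ann> singlePass(List<Ann> index, Set<String> wanted) {
        List<Ann> out = new ArrayList<>();
        for (Ann a : index) {
            visited++;
            if (wanted.contains(a.type)) {
                out.add(a);
            }
        }
        return out;
    }

    /** One filtered pass per type; results are concatenated type by type. */
    static List<Ann> perType(List<Ann> index, List<String> types) {
        List<Ann> out = new ArrayList<>();
        for (String t : types) {
            out.addAll(singlePass(index, Collections.singleton(t)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Ann> index = sampleIndex();

        visited = 0;
        List<Ann> merged =
                singlePass(index, new HashSet<>(Arrays.asList("Token", "Sentence")));
        System.out.println("single pass: " + merged + ", visited " + visited);

        visited = 0;
        List<Ann> grouped = perType(index, Arrays.asList("Token", "Sentence"));
        System.out.println("per type:    " + grouped + ", visited " + visited);
    }
}
```

In this model the single pass visits each index entry once and returns the 
matches in index order, while the per-type variant visits every entry once per 
type and returns the results grouped by type, so they would still have to be 
merged if you need them in document order.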

That is my understanding and experience so far. UIMA maintainers, please 
correct me if I am wrong.

Cheers,

Richard

On 07.09.2011 at 15:16, Jörn Kottmann wrote:

> Isn't this slow? Because it then needs to iterate over every
> single AnnotationFS inside my CAS.
> 
> Jörn
> 
> 
> On 9/7/11 3:06 PM, Richard Eckart de Castilho wrote:
>> Hi Jörn,
>> 
>>> what is the best way to iterate over annotations which have
>>> different types?
>> you can use a filtered iterator - more or less like this:
>> 
>>              // Imports needed by this fragment (Token and Sentence are
>>              // your own JCas types):
>>              // import org.apache.uima.cas.CAS;
>>              // import org.apache.uima.cas.ConstraintFactory;
>>              // import org.apache.uima.cas.FSIterator;
>>              // import org.apache.uima.cas.FSMatchConstraint;
>>              // import org.apache.uima.cas.FSTypeConstraint;
>>              // import org.apache.uima.cas.Type;
>>              // import org.apache.uima.jcas.tcas.Annotation;
>> 
>>              CAS cas = jcas.getCas();
>>              ConstraintFactory cf = ConstraintFactory.instance();
>>              FSIterator<Annotation> iterator =
>>                      jcas.getAnnotationIndex().iterator();
>>              Type tokenType = jcas.getCasType(Token.type);
>>              Type sentenceType = jcas.getCasType(Sentence.type);
>> 
>>              // Restrict to Tokens
>>              FSTypeConstraint typeConstraint1 = cf.createTypeConstraint();
>>              typeConstraint1.add(tokenType);
>> 
>>              // Restrict to Sentences
>>              FSTypeConstraint typeConstraint2 = cf.createTypeConstraint();
>>              typeConstraint2.add(sentenceType);
>> 
>>              // Combine both constraints using "or"
>>              FSMatchConstraint disjunction = cf.or(typeConstraint1,
>>                      typeConstraint2);
>> 
>>              // Create and use the filtered iterator
>>              FSIterator<Annotation> filteredIterator =
>>                      cas.createFilteredIterator(iterator, disjunction);
>>              while (filteredIterator.hasNext()) {
>>                      System.out.println(filteredIterator.next().getCoveredText());
>>              }
>> 
>> Cheers,
>> 
>> Richard
>> 
> 

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 