Re: Iterators in CAS

Marshall Schor Fri, 12 Oct 2007 11:21:58 -0700

Here are some other approaches.

Currently, you have an approach where you iterate over [tokens] and for
each one, want to see if there is a containing [sentence].  If your
logic permits, you could iterate over [sentences], and for each,
subiterate over the [tokens]; each token found would, of course, be in
the span of the sentence you were on at that point.  This takes no more
storage, and would be faster than the method of searching using
findCoverFS.  But perhaps the program's logic doesn't permit this.
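A rough sketch of this first approach follows. It uses plain Java rather than the UIMA API so that it runs on its own. `Span` is a hypothetical stand-in for a Sentence or Token annotation, and both lists are assumed to be sorted by begin offset, the way an annotation index is. A single shared token cursor plays the role of the per-sentence subiterator.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SentenceTokenWalk {
    /** Hypothetical stand-in for an annotation: a labeled [begin, end) span. */
    record Span(String label, int begin, int end) {
        boolean covers(Span other) {
            return begin <= other.begin() && other.end() <= end;
        }
    }

    /**
     * Outer loop over sentences, inner loop over the tokens inside each one.
     * Assumes both lists are sorted by begin offset (as in an annotation index).
     * Returns token label -> label of the sentence that contains it.
     */
    static Map<String, String> containingSentences(List<Span> sentences, List<Span> tokens) {
        Map<String, String> result = new LinkedHashMap<>();
        int t = 0; // shared cursor: it only moves forward, so each token is visited once
        for (Span sentence : sentences) {
            // skip tokens that start before this sentence (e.g. between sentences)
            while (t < tokens.size() && tokens.get(t).begin() < sentence.begin()) {
                t++;
            }
            // the "subiterator": tokens that start inside this sentence's span
            while (t < tokens.size() && tokens.get(t).begin() < sentence.end()) {
                Span token = tokens.get(t);
                if (sentence.covers(token)) { // ignore tokens crossing the sentence end
                    result.put(token.label(), sentence.label());
                }
                t++;
            }
        }
        return result;
    }
}
```

The token cursor never moves backward, so one pass costs time proportional to the number of sentences plus tokens. In a real annotator, `tokenIndex.subiterator(sentence)` would take the place of the inner loop.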


Another approach would be to do the above, once, somewhere in your
annotator chain, and save the result as an additional field in the
[token] - a reference to the [sentence] annotation that contains it. 
Then it's just a matter of dereferencing that field to find the
containing sentence.  Of course, this takes an additional 4-byte slot
per token, to hold the back reference.
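The back-reference idea can be modeled the same way. This is a sketch, not UIMA code. `Sentence`, `Token` and the `sentence` field are hypothetical plain-Java stand-ins. In UIMA the field would instead be an extra feature declared on the token type in the type system, with the sentence type as its range. The linking pass runs once, and after that, finding a token's sentence is a single field read.

```java
import java.util.List;

class BackReferenceDemo {
    /** Hypothetical stand-in for a Sentence annotation. */
    static final class Sentence {
        final int begin, end;
        Sentence(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Hypothetical stand-in for a Token annotation with one extra back-reference slot. */
    static final class Token {
        final int begin, end;
        Sentence sentence; // the added field: null until link() has run
        Token(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /**
     * One-time linking pass, e.g. done by an annotator early in the chain.
     * Assumes both lists are sorted by begin offset.
     */
    static void link(List<Sentence> sentences, List<Token> tokens) {
        int t = 0;
        for (Sentence s : sentences) {
            // skip tokens that start before this sentence
            while (t < tokens.size() && tokens.get(t).begin < s.begin) t++;
            // link every token that lies entirely inside this sentence
            while (t < tokens.size() && tokens.get(t).begin < s.end) {
                Token tok = tokens.get(t++);
                if (tok.end <= s.end) tok.sentence = s;
            }
        }
    }
}
```

Once `link(...)` has run, `token.sentence` answers the question directly, at the cost of one reference slot per token.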

One last point: the code below does an "indexed" search over all
sentences looking for the containing sentence every time you want to
find it.  This is slower than the two methods above, although our
implementation is pretty fast (I think it has roughly log(n)
performance, n being the number of entries in the index).
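The following binary search for the covering sentence may help show where a log(n) cost comes from. It is an assumed model, not the actual findCoverFS code. It uses a hypothetical `Span` type and a random-access list of sentences sorted by begin. It also assumes that sentences never overlap, so only the last sentence starting at or before the token can cover it.

```java
import java.util.List;

class CoverSearch {
    /** Hypothetical stand-in for an annotation span [begin, end). */
    record Span(int begin, int end) {}

    /**
     * Returns the index of the sentence covering the token, or -1 if none does.
     * Assumes sentences are sorted by begin and do not overlap: O(log n) per lookup.
     */
    static int findCover(List<Span> sentences, Span token) {
        int lo = 0, hi = sentences.size() - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sentences.get(mid).begin() <= token.begin()) {
                candidate = mid; // starts early enough; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // the candidate covers the token only if it also ends at or after the token
        if (candidate >= 0 && token.end() <= sentences.get(candidate).end()) {
            return candidate;
        }
        return -1;
    }
}
```

Each lookup repeats the search. That is why caching the answer (the second approach) or walking sentences and tokens together (the first) comes out ahead when you need the sentence for every token.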

-Marshall Schor

Michael Baessler wrote:
> Hi Ekaterina,
>
> I had the similar problem when implementing the
> RegularExpressionAnnotator - how to find the covering annotation of a
> certain type for my current FS.
>
> The code is checked in to the SVN at:
> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/src/main/java/org/apache/uima/annotator/regex/impl/RegExAnnotator.java
>
>
> The method is called:
> findCoverFS(CAS aCAS, AnnotationFS annot, Type coverFsType)
>
> If this is exactly what you need, we can discuss moving this into
> the core framework API.
>
> Hope that helps.
>
> -- Michael
>
> Ekaterina Buyko wrote:
>> Hi all!
>>
>> In UIMA 2.1 it is possible to create a sub-iterator in order to
>> iterate over annotations which are within the begin-end span of the
>> selected type.
>>
>> For example:
>>
>> import org.apache.uima.cas.FSIterator;
>> import org.apache.uima.cas.text.AnnotationIndex;
>>
>> AnnotationIndex sentenceIndex = (AnnotationIndex) aJCas
>>         .getJFSIndexRepository().getAnnotationIndex(Sentence.type);
>> AnnotationIndex tokenIndex = (AnnotationIndex) aJCas
>>         .getJFSIndexRepository().getAnnotationIndex(Token.type);
>>
>> // iterate over Sentences
>> FSIterator sentenceIterator = sentenceIndex.iterator();
>> while (sentenceIterator.hasNext()) {
>>     Sentence sentence = (Sentence) sentenceIterator.next();
>>
>>     // iterate over the Tokens within the span of this sentence
>>     FSIterator tokenIterator = tokenIndex.subiterator(sentence);
>>     while (tokenIterator.hasNext()) {
>>         Token token = (Token) tokenIterator.next();
>>         // ... process token ...
>>     }
>> }
>>
>>
>> I would like to have more extended functionality. I need to know
>> the annotations that are in the begin-end span of the selected
>> annotation type. These annotations can overlap the span of the
>> selected type.
>>
>> For example, noun phrases: if I iterate over tokens, I would like to
>> know whether a token is inside a noun phrase or not. Right now I am
>> working with Hashtables, but I am looking for another solution.
>>
>> How could I solve this problem?
>>
>> Best regards
>>
>> Ekaterina
