Here are some other approaches. Your current approach iterates over [tokens] and, for each one, checks whether there is a containing [sentence]. If your logic permits, you could instead iterate over [sentences] and, for each, subiterate over the [tokens]; each token found would, of course, be in the span of the sentence you were on at that point. This takes no more storage, and would be faster than searching with findCoverFS. But perhaps the program's logic doesn't permit this.
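
To make the sentence-first idea concrete, here is a minimal sketch in plain Java rather than the UIMA API. Span, CoveringSpanSketch and mapTokensToSentences are hypothetical names used only for illustration: Span stands in for an annotation, and the two nested loops stand in for the sentence iterator and its token subiterator. The sketch assumes both lists are sorted by begin offset and that sentences do not overlap.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for an annotation: a span from begin to end.
// Hypothetical type for illustration only; it is not a UIMA class.
final class Span {
    final int begin;
    final int end;
    final String label;

    Span(int begin, int end, String label) {
        this.begin = begin;
        this.end = end;
        this.label = label;
    }
}

class CoveringSpanSketch {

    // Assumption: both lists are sorted by begin offset and sentences do not overlap.
    // Walks the sentences in order and, for each one, consumes the tokens inside it,
    // so every token is visited once instead of being searched for separately.
    static Map<Span, Span> mapTokensToSentences(List<Span> sentences, List<Span> tokens) {
        // Keys are compared by identity, since each Span object is one annotation.
        Map<Span, Span> containing = new IdentityHashMap<>();
        int t = 0;
        for (Span sentence : sentences) {
            // Skip tokens that start before this sentence; no sentence fully covers them.
            while (t < tokens.size() && tokens.get(t).begin < sentence.begin) {
                t++;
            }
            // Record every token that lies fully inside the sentence span.
            while (t < tokens.size() && tokens.get(t).end <= sentence.end) {
                containing.put(tokens.get(t), sentence);
                t++;
            }
        }
        return containing;
    }

    public static void main(String[] args) {
        List<Span> sentences = Arrays.asList(new Span(0, 10, "S1"), new Span(11, 20, "S2"));
        List<Span> tokens = Arrays.asList(
                new Span(0, 4, "t1"), new Span(5, 10, "t2"), new Span(11, 15, "t3"));

        Map<Span, Span> containing = mapTokensToSentences(sentences, tokens);
        for (Span token : tokens) {
            System.out.println(token.label + " -> " + containing.get(token).label);
        }
    }
}
```

A token that straddles a sentence boundary ends up with no entry in the map, so callers should treat a missing entry as "not inside any sentence".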

Another approach would be to do the above once, somewhere in your annotator chain, and save the result as an additional field in the [token]: a reference to the [sentence] annotation that contains it. Then finding the containing sentence is just a matter of dereferencing that field. Of course, this takes an additional 4-byte slot per token to hold the back reference.

One last point: the code below does an "indexed" search over all sentences, looking for the containing sentence, every time you want to find it. This is slower than the two methods above, although our implementation is pretty fast (I think it has roughly log(n) performance, n being the number of things in the index).

-Marshall Schor

Michael Baessler wrote:
> Hi Ekaterina,
>
> I had a similar problem when implementing the
> RegularExpressionAnnotator: how to find the covering annotation of a
> certain type for my current FS.
>
> The code is checked in to SVN at:
> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/src/main/java/org/apache/uima/annotator/regex/impl/RegExAnnotator.java
>
> The method is called:
> findCoverFS(CAS aCAS, AnnotationFS annot, Type coverFsType)
>
> If this is exactly what you need, we can discuss moving this to
> the core framework API.
>
> Hope that helps.
>
> -- Michael
>
> Ekaterina Buyko wrote:
>> Hi all!
>>
>> In UIMA 2.1 it is possible to create a sub-iterator in order to
>> iterate over annotations which are within the begin-end span of the
>> selected type.
>>
>> For example:
>>
>> AnnotationIndex sentenceIndex = (AnnotationIndex) aJCas
>>     .getJFSIndexRepository().getAnnotationIndex(Sentence.type);
>>
>> AnnotationIndex tokenIndex = (AnnotationIndex) aJCas
>>     .getJFSIndexRepository().getAnnotationIndex(Token.type);
>>
>> // iterate over Sentences
>> FSIterator sentenceIterator = sentenceIndex.iterator();
>> while (sentenceIterator.hasNext()) {
>>
>>     Sentence sentence = (Sentence) sentenceIterator.next();
>>
>>     // iterate over Tokens
>>     FSIterator tokenIterator = tokenIndex.subiterator(sentence);
>> }
>>
>> I would like to have more extended functionality. I need to know
>> the annotations which are in the span of begin-end of the selected
>> annotation type. These annotations can overlap the span of the
>> selected type.
>>
>> For example, noun phrases. If I iterate over tokens, I would like to
>> know whether this token is inside a noun phrase or not. Right now I am
>> working with Hashtables, but I am looking for another solution.
>>
>> How could I solve this problem?
>>
>> Best regards
>>
>> Ekaterina
