Hi Katja,

I'm afraid my answer comes too late, but that is precisely the nice thing, and it is what I wanted to show with the chunk of code I sent you. You iterate over all the annotations; once you detect an NP, all the tokens you encounter afterwards belong to that NP until you reach another one. The iterators in UIMA are ordered by position and also by the inclusion hierarchy, so if you iterate over the generic annotation iterator and test the class each element belongs to, you are sure to detect the NP (0,10) of your example first, and the following elements will be of type Token until you find another NP. If your NPs are not contiguous you can always check token.end <= np.end, but in any case the order is guaranteed.
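
To make the single pass concrete, here is a minimal sketch in plain Java. It does not use the UIMA API: Span and NpGrouping are hypothetical stand-ins for your annotation classes, and the input list is assumed to be already in the order the annotation iterator would deliver (an NP before the tokens it contains). The token.end <= np.end check covers the case where the NPs are not contiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotation; not a UIMA class.
class Span {
    final String type;  // "NP" or "Token"
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "(" + begin + "," + end + ")";
    }
}

class NpGrouping {
    // Assumption: 'ordered' is already sorted the way the annotation
    // iterator delivers it, so each NP comes before the tokens inside it.
    static Map<Span, List<Span>> tokensByNp(List<Span> ordered) {
        Map<Span, List<Span>> result = new LinkedHashMap<>();
        Span currentNp = null;
        for (Span a : ordered) {
            if (a.type.equals("NP")) {
                // A new NP starts; later tokens are tested against it.
                currentNp = a;
                result.put(a, new ArrayList<>());
            } else if (a.type.equals("Token")
                    && currentNp != null
                    && a.end <= currentNp.end) {
                // The token ends inside the current NP, so it belongs to it.
                result.get(currentNp).add(a);
            }
            // Tokens that end after the current NP are outside any NP.
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> ordered = Arrays.asList(
                new Span("NP", 0, 10),
                new Span("Token", 0, 5),
                new Span("Token", 6, 10),
                new Span("Token", 11, 14),   // between two NPs
                new Span("NP", 15, 20),
                new Span("Token", 15, 20));
        System.out.println(tokensByNp(ordered));
    }
}
```

In real code the type test would be the instanceof check from my earlier snippet, and the loop would run over the iterator of the annotation index instead of a list.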

Ekaterina Buyko wrote:
Hi Christian,

Thank you very much.

What I originally had in mind was a method in UIMA such as:
Sentence [] sentence = token.getOverlapAnnotation (Sentence.type);

But I still have some questions about your proposal:

Getting an iterator over all annotations is fine.
Do you know in what order the annotations are returned?

If I have, for example, these annotations (the numbers are the respective begin and end offsets):
NP np (0,10)
Token token1(0,5), token2(6, 10)

Then I get the index. How are they ordered?
np, token1, token2?

And what happens if they have the same span?
NP np (0,5)
Token token1(0,5)

With best regards

Katja



Christian Mauceri wrote:
Hi Ekaterina,

If I understood your question, it is possible, and it is even a nice feature of UIMA. I have more or less the same problem: I have two types of annotations, contexts and forms (sentences and tokens for you). So I have TAEs which mark contexts and forms, and then another TAE (a CAS consumer in my very simple case) which does the following:

       // A context
       TCollocation tc = null;
       // A form
       TForm f = null;

       // I first iterate over all the annotations
       Iterator annot = jcas.getJFSIndexRepository().getAnnotationIndex().iterator();
       while (annot.hasNext()) {
           Annotation a = (Annotation) annot.next();
           // then I test whether it is a context (TCollocation) or a form (TForm)
           if (a instanceof TCollocation) {
               tc = (TCollocation) a;
               //System.out.println(tc.getMatch());
           } else if (a instanceof TForm) {
               f = (TForm) a;
           }
       }

That's all. The nice thing is that the iterator respects the position order in the text and the inclusion hierarchy, so you are sure the current form belongs to the current context.
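
For what it is worth, here is how I picture that order, as a plain-Java sketch rather than UIMA code. As far as I know the built-in annotation index sorts by begin ascending, then by end descending (so the enclosing annotation comes first), and finally by the type priorities declared in the descriptor. Ann, AnnotationOrder and the TYPE_PRIORITY list are hypothetical names of mine; if no type priority is declared, I would not rely on the relative order of two annotations with exactly the same span.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an annotation; not a UIMA class.
class Ann {
    final String type;
    final int begin;
    final int end;

    Ann(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "(" + begin + "," + end + ")";
    }
}

class AnnotationOrder {
    // Assumed type priority list: earlier entries sort first on equal spans.
    static final List<String> TYPE_PRIORITY = Arrays.asList("NP", "Token");

    // Sketch of the assumed ordering: begin ascending, end descending,
    // then type priority as the tie-breaker.
    static final Comparator<Ann> ORDER = (x, y) -> {
        if (x.begin != y.begin) {
            return Integer.compare(x.begin, y.begin);
        }
        if (x.end != y.end) {
            return Integer.compare(y.end, x.end);  // longer span first
        }
        return Integer.compare(TYPE_PRIORITY.indexOf(x.type),
                               TYPE_PRIORITY.indexOf(y.type));
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<Ann> sorted(List<Ann> input) {
        List<Ann> copy = new ArrayList<>(input);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Ann> unordered = Arrays.asList(
                new Ann("Token", 0, 5),
                new Ann("NP", 0, 5),      // same span as the first token
                new Ann("Token", 6, 10),
                new Ann("NP", 0, 10));
        System.out.println(sorted(unordered));
    }
}
```

With this comparator the enclosing NP (0,10) sorts before everything it contains, and the NP (0,5) sorts before the Token (0,5) only because NP is listed first in the assumed priority list.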

I hope this is helpful and that I did not talk baloney; at least it works fine for me.

Regards.
Christian.


Ekaterina Buyko wrote:
Hi all!

In UIMA 2.1 it is possible to create a sub-iterator in order to iterate over annotations which are within the begin-end span of the selected type.

For example:

       AnnotationIndex sentenceIndex = (AnnotationIndex) aJCas
               .getJFSIndexRepository().getAnnotationIndex(Sentence.type);

       AnnotationIndex tokenIndex = (AnnotationIndex) aJCas
               .getJFSIndexRepository().getAnnotationIndex(Token.type);

       // iterate over Sentences
       FSIterator sentenceIterator = sentenceIndex.iterator();
       while (sentenceIterator.hasNext()) {

           Sentence sentence = (Sentence) sentenceIterator.next();

           // iterate over the Tokens within this Sentence
           FSIterator tokenIterator = tokenIndex.subiterator(sentence);
           while (tokenIterator.hasNext()) {
               Token token = (Token) tokenIterator.next();
               // ... process token ...
           }
       }


I would like to have more extended functionality. I need to know which annotations lie in the begin-end span of the selected annotation, where these annotations may also overlap that span rather than being fully contained in it.

Take noun phrases, for example: when I iterate over tokens, I would like to know whether the current token is inside a noun phrase or not. At the moment I am working with Hashtables, but I am looking for another solution.

How could I solve this problem?

Best regards

Ekaterina









--
Cordialement/Regards
Christian Mauceri
http://hermeneute.com/Christian
