Hi Peter,

I see one issue with your code: you're using the windowAnnotation
to position the iterator.  Instead, you should be using an
annotation of the type you're searching for.  This is because
annotations in the annotation index are sorted not only by begin and
end offset, but also by type.  If you don't specify explicit type
priorities, it doesn't mean that no type priority will be used.  It
just means that the type priority is arbitrary.  I think our
documentation is misleading in this respect; I was confused myself
when I just read it.

The net effect of this is that you can't reliably use an annotation of
type b to position an iterator when you're looking for annotations of
type a.  If you were just looking for annotations of a specific type,
you could change your code like this:

// Warning: untested
windowAnnotation = getCas().createAnnotation(type,
    windowAnnotation.getBegin(), windowAnnotation.getEnd());

However, you also appear to be interested in annotations of subtypes
of your input type.  Those might be ordered differently again.
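To make the ordering problem concrete, here is a toy model in plain Java (not UIMA code). It assumes the annotation index order is begin ascending, end descending, then type priority, and it models moveTo as a lower-bound search (first element not less than the key). The Ann record and its integer typeRank are hypothetical stand-ins; typeRank represents whatever arbitrary type order the framework picks when no explicit priorities are declared.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, not the UIMA API: a sorted list stands in for the annotation index.
public class TypeOrderDemo {

    // Hypothetical annotation; typeRank models an arbitrary implicit type priority.
    record Ann(String type, int begin, int end, int typeRank) {}

    // Assumed index order: begin ascending, end descending, then type rank.
    static final Comparator<Ann> INDEX_ORDER =
        Comparator.comparingInt(Ann::begin)
            .thenComparing(Comparator.comparingInt(Ann::end).reversed())
            .thenComparingInt(Ann::typeRank);

    // Lower bound: first position whose element is not less than key (like moveTo).
    static int moveTo(List<Ann> sorted, Ann key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (INDEX_ORDER.compare(sorted.get(mid), key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Small sorted index; the first two annotations share the span 10..20.
    static List<Ann> sampleIndex() {
        List<Ann> index = new ArrayList<>(List.of(
            new Ann("paragraph", 10, 20, 2),
            new Ann("answerText", 10, 20, 0),
            new Ann("answerText", 12, 18, 0)));
        index.sort(INDEX_ORDER);
        return index;
    }

    public static void main(String[] args) {
        List<Ann> index = sampleIndex();

        // A plain Annotation key (rank 1 here) lands after answerText 10..20,
        // so a forward scan from this position would miss it.
        Ann window = new Ann("Annotation", 10, 20, 1);
        Ann hit = index.get(moveTo(index, window));
        System.out.println(hit.type() + " " + hit.begin() + ".." + hit.end());

        // A key of the searched type with the same span lands on answerText 10..20.
        Ann sameType = new Ann("answerText", 10, 20, 0);
        hit = index.get(moveTo(index, sameType));
        System.out.println(hit.type() + " " + hit.begin() + ".." + hit.end());
    }
}
```

Running it prints "paragraph 10..20" and then "answerText 10..20". Subtypes would simply get ranks of their own in this model, which is why positioning with the input type alone still doesn't cover them.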

So what to do?

You can either use explicit type priorities (a PITA and not recommended),
or use an annotation for iterator positioning that's guaranteed to
position the iterator in front of the first annotation you're interested
in.  For example, you could do this:

AnnotationFS posFS = getCas().createAnnotation(type,
    windowAnnotation.getBegin(), windowAnnotation.getEnd() + 1);

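The type could be any annotation type, but note the increased end position.
Annotations with the same begin are sorted by decreasing end, so posFS
sorts ahead of every annotation that starts at the window's begin and ends
inside the window, regardless of type.  This will get your iterator
positioned before, but not necessarily on, the first desired annotation.
You would then use your original windowAnnotation to check that the FSs
you're iterating over are in range.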
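The end + 1 trick can be checked against the same kind of toy model (again plain Java, not the UIMA API; Ann, typeRank, inWindow and bruteForce are hypothetical names). The key gets the window's begin and end + 1, the scan starts at its lower bound, and the original window bounds decide what is kept and when to stop. A plain type-name comparison stands in for TypeSystem.subsumes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, not the UIMA API: a sorted list stands in for the annotation index.
public class WindowScanDemo {

    // Hypothetical annotation; typeRank models an arbitrary implicit type priority.
    record Ann(String type, int begin, int end, int typeRank) {}

    // Assumed index order: begin ascending, end descending, then type rank.
    static final Comparator<Ann> INDEX_ORDER =
        Comparator.comparingInt(Ann::begin)
            .thenComparing(Comparator.comparingInt(Ann::end).reversed())
            .thenComparingInt(Ann::typeRank);

    // Lower bound: first position whose element is not less than key (like moveTo).
    static int moveTo(List<Ann> sorted, Ann key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (INDEX_ORDER.compare(sorted.get(mid), key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Annotations of wantedType inside window, in index order.
    static List<Ann> inWindow(List<Ann> sorted, Ann window, String wantedType) {
        // Same begin, end + 1: the key sorts before every annotation that starts
        // at window.begin() and ends inside the window, whatever its type rank.
        Ann key = new Ann("anyType", window.begin(), window.end() + 1, 0);
        List<Ann> result = new ArrayList<>();
        for (int i = moveTo(sorted, key); i < sorted.size(); i++) {
            Ann a = sorted.get(i);
            if (a.begin() > window.end()) {
                break; // everything from here on starts after the window
            }
            // The original window decides membership; the key only positions.
            if (a.end() <= window.end() && a.type().equals(wantedType)) {
                result.add(a);
            }
        }
        return result;
    }

    // Brute force for comparison: filter the whole index.
    static List<Ann> bruteForce(List<Ann> sorted, Ann window, String wantedType) {
        List<Ann> result = new ArrayList<>();
        for (Ann a : sorted) {
            if (a.type().equals(wantedType)
                    && a.begin() >= window.begin() && a.end() <= window.end()) {
                result.add(a);
            }
        }
        return result;
    }

    static List<Ann> sampleIndex() {
        List<Ann> index = new ArrayList<>(List.of(
            new Ann("answerText", 5, 15, 0),    // starts before the window
            new Ann("answerText", 10, 25, 0),   // ends after the window
            new Ann("paragraph", 10, 20, 2),
            new Ann("answerText", 10, 20, 0),
            new Ann("answerText", 12, 18, 1),
            new Ann("answerText", 20, 20, 0),   // zero-length, at the window end
            new Ann("answerText", 21, 30, 0))); // starts after the window
        index.sort(INDEX_ORDER);
        return index;
    }

    public static void main(String[] args) {
        List<Ann> index = sampleIndex();
        Ann window = new Ann("Annotation", 10, 20, 1);
        List<Ann> hits = inWindow(index, window, "answerText");
        for (Ann a : hits) {
            System.out.println(a.begin() + ".." + a.end());
        }
        System.out.println(hits.equals(bruteForce(index, window, "answerText")));
    }
}
```

This prints 10..20, 12..18 and 20..20, then true. Unlike the brute-force filter, the scan only touches the annotations between the key and the window end, which is where the speedup over iterating the whole type index would come from.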

This trick is necessary because you simply can't know how types are
ordered in the absence of explicit type priorities.

It seems to me like we should really have a built-in API that does
this.  Most people don't care or even know about type priorities,
so the subiterator stuff is often useless.  Or we could add yet
another flag to the subiterator API, boolean ignoreTypeOrder.

--Thilo


Peter Klügl wrote:
> Hello,
> 
> I have a problem accessing annotations, especially annotations of a
> given type in a window represented by another annotation. I am using
> type priorities, but not in this concrete situation. Therefore, I am not
> using the subiterator to access the contained annotations, but several
> other methods like this one:
> 
> public List<AnnotationFS> getAnnotationsInWindow(
>        AnnotationFS windowAnnotation, Type type) {
>    List<AnnotationFS> result = new ArrayList<AnnotationFS>();
>    FSIterator completeIt = getCas().getAnnotationIndex().iterator();
>    if (getDocumentAnnotation().getEnd() < windowAnnotation.getEnd()) {
>        completeIt.moveToLast();
>    } else {
>        completeIt.moveTo(windowAnnotation);
>    }
>    while (completeIt.isValid()
>        && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation
>            .getBegin()) {
>        completeIt.moveToPrevious();
>    }
> 
>    if (completeIt.isValid()) {
>        completeIt.moveToNext();
>    } else {
>        completeIt.moveToFirst();
>    }
> 
>    while (completeIt.isValid()
>        && ((Annotation) completeIt.get()).getBegin() < windowAnnotation
>            .getBegin()) {
>        completeIt.moveToNext();
>    }
> 
>    while (completeIt.isValid()
>        && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation
>            .getBegin()) {
>        Annotation annotation = (Annotation) completeIt.get();
>        if (getCas().getTypeSystem().subsumes(type, annotation.getType())
>            && annotation.getEnd() <= windowAnnotation.getEnd()) {
>            result.add(annotation);
>        }
>        completeIt.moveToNext();
>    }
>    return result;
> }
> 
> I have already tried many variations of this (e.g., using the index of
> the given type, or a newly created frame with a higher type priority),
> but this method sometimes returns simply wrong annotations or no
> annotations at all. The only method that seems to work all the time is
> to iterate over all annotations of the given type and then filter out
> those that are not contained in the window.
> 
> My problem occurs in many different shapes. Here's one short example:
> 
> Arguments:
> 
> windowAnnotation:
> Annotation
>   sofa: _InitialView
>   begin: 10899
>   end: 10906
> Covered Text: {.8}Bar
> 
> type:
> de.uniwue.casetrain.Answers.answerText
> 
> 
> Information in the Eclipse debug view (Variables):
> SubAnnotations:
> [0]
> WordAnswer
>   sofa: _InitialView
>   begin: 10899
>   end: 10906
>   SimpleFeedback: <null>
>   PosFactor: 0.8
>   NegFactor: 0.0
>   InstanceIsUserAnswer: false
>   Text: <null>
>   IsRegularExpression: false
>   EditDistance: 0
> [1]
> WordAnswer
>   sofa: _InitialView
>   begin: 10886
>   end: 10892
>   SimpleFeedback: <null>
>   PosFactor: 1.0
>   NegFactor: 0.0
>   InstanceIsUserAnswer: false
>   Text: answerText
>      sofa: _InitialView
>      begin: 10889
>      end: 10892
>   IsRegularExpression: false
>   EditDistance: 0
> [2]
> paragraph
>   sofa: _InitialView
>   begin: 10899
>   end: 10906
> ...
> [20]
> answerText
>   sofa: _InitialView
>   begin: 10903
>   end: 10906
> 
> ...and so on....
> 
> In this example, the method above simply stops at the second
> annotation, since its start offset is smaller than that of the window.
> How can this be? Does the position in the index depend on the values of
> the features? Did I miss something here? In other examples, the method
> above just steps over the important annotations. Using the brute-force
> approach is not really a solution for me; it slows down my system by
> more than 50%.
> 
> I would really appreciate any hints to solve this problem, since this
> method/functionality is quite essential for my application. Of course, I
> would also welcome any advice about performance.
> 
> Best regards
> 
> Peter
