Hi Thilo,

First of all, thanks for the detailed answer.

I will try the approach with the increased end offset of the window, but I fear that it won't change the results (although it should give some speedup). I have already added the following code fragment to the methods in order to find the correct start position irrespective of the type order:

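// Step backwards until we are in front of everything that starts in the window
while (completeIt.isValid()
        && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation.getBegin()) {
    completeIt.moveToPrevious();
}

// Step back onto the first candidate, or restart if we ran off the front of the index
if (completeIt.isValid()) {
    completeIt.moveToNext();
} else {
    completeIt.moveToFirst();
}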

This should make the method independent of the order of types in the index. Besides that, I also tried
using the same type for the window annotation, and even a new window with an explicitly
defined type priority, which should (theoretically) give a correct starting position. But none
of those methods always returns the correct list of contained annotations. I must have
made a mistake somewhere... for example, see the debug information I
added to my last mail. How can it be possible that an annotation is
"disrespecting" the begin/end indexing strategy? Is it possible that changing a
feature value without removeFromIndex/addToIndex can confuse the index in this way? That
is my current guess.
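
A small illustration of that guess, using a plain java.util.TreeSet as a stand-in for a sorted index. This is an analogy only and not UIMA code: the class StaleKeySketch, the Span type and the offsets are all made up for the example, and the thread does not establish that this is what happens in the case above. It shows how changing a sort key while an element is still stored can leave the iteration order inconsistent with the keys, and how removing, modifying and re-adding avoids that.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class StaleKeySketch {

    /** Hypothetical element with a mutable sort key (stands in for an annotation's begin). */
    static final class Span {
        final String name;
        int begin;

        Span(String name, int begin) {
            this.name = name;
            this.begin = begin;
        }
    }

    // Sort by begin, with the name as a tie-breaker.
    static final Comparator<Span> BY_BEGIN = (a, b) ->
            a.begin != b.begin ? Integer.compare(a.begin, b.begin) : a.name.compareTo(b.name);

    static List<Integer> beginsInIterationOrder(TreeSet<Span> index) {
        List<Integer> begins = new ArrayList<>();
        for (Span s : index) {
            begins.add(s.begin);
        }
        return begins;
    }

    // Change the key while the element is still stored: the set is not re-sorted.
    static List<Integer> mutateInPlace() {
        TreeSet<Span> index = new TreeSet<>(BY_BEGIN);
        Span first = new Span("a", 10);
        index.add(first);
        index.add(new Span("b", 30));
        index.add(new Span("c", 50));
        first.begin = 70;
        return beginsInIterationOrder(index);   // [70, 30, 50]
    }

    // Remove, modify, re-add: the element ends up at its new sorted position.
    static List<Integer> removeModifyAdd() {
        TreeSet<Span> index = new TreeSet<>(BY_BEGIN);
        Span first = new Span("a", 10);
        index.add(first);
        index.add(new Span("b", 30));
        index.add(new Span("c", 50));
        index.remove(first);
        first.begin = 70;
        index.add(first);
        return beginsInIterationOrder(index);   // [30, 50, 70]
    }

    public static void main(String[] args) {
        System.out.println(mutateInPlace());
        System.out.println(removeModifyAdd());
    }
}
```

In the first variant an element with a larger begin is followed by one with a smaller begin, which is the same kind of symptom as in the debug output from my last mail.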

I would love such an ignoreTypeOrder flag.

I will try some things now and get back to you with the results.


Peter



Thilo Goetz wrote:
Hi Peter,

I see one issue with your code: you're using the windowAnnotation
to position the iterator.  Instead, you should be using an
annotation of the type you're searching for.  This is because
annotations are sorted by type as well.  If you don't specify
explicit type priorities, it doesn't mean that no type priority
will be used.  It just means that the type priority is arbitrary.
I think our documentation is misleading in this respect; I was
confused myself when I read it just now.

The net effect of this is that you can't reliably position an
iterator where you're looking for annotations of type a with an
annotation of type b.  If you were just looking for annotations
of a specific type, you could change your code like this:

// Warning: untested
windowAnnotation = getCas().createAnnotation(type, windowAnnotation.getBegin(),
windowAnnotation.getEnd());

However, you also appear to be interested in annotations of subtypes
of your input type.  Those might be ordered differently again.

So what to do?

You can either use explicit type priorities (a PITA and not recommended),
or you use an annotation for iterator positioning that's guaranteed to
position the iterator in front of the first annotation you're interested
in.  For example, you could do this:

AnnotationFS posFS = getCas().createAnnotation(type,
windowAnnotation.getBegin(), windowAnnotation.getEnd() + 1);

The type could be any annotation type, but note the increased end position.
This will get your iterator positioned before, but not necessarily on
the first desired annotation.  You would then use your original
windowAnnotation to check that the FSs you're iterating over are in
range.

This trick is necessary because you simply can't know how types are
ordered in the absence of explicit type priorities.
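
The effect of the increased end position can be sketched with a small plain-Java model. Everything in it is an assumption made for illustration: the class WindowPositioningSketch, the Ann type and the numeric type ranks are hypothetical, the ordering (begin ascending, then end descending, then type order) is modelled on the default annotation index, and position() only imitates what moveTo() does. It does not use the UIMA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class WindowPositioningSketch {

    /** Hypothetical stand-in for an annotation; typeRank models the arbitrary type order. */
    static final class Ann {
        final String type;
        final int typeRank;
        final int begin;
        final int end;

        Ann(String type, int typeRank, int begin, int end) {
            this.type = type;
            this.typeRank = typeRank;
            this.begin = begin;
            this.end = end;
        }
    }

    // Assumed index order: begin ascending, then end descending, then type rank.
    static final Comparator<Ann> ORDER = (a, b) -> {
        if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
        if (a.end != b.end) return Integer.compare(b.end, a.end);
        return Integer.compare(a.typeRank, b.typeRank);
    };

    // Model of moveTo(): index of the first entry that does not sort before the probe.
    static int position(List<Ann> sorted, Ann probe) {
        int i = 0;
        while (i < sorted.size() && ORDER.compare(sorted.get(i), probe) < 0) {
            i++;
        }
        return i;
    }

    // Start at the probe position, then use the real window for the range check.
    static List<String> typesInWindow(List<Ann> sorted, Ann probe, Ann window) {
        List<String> result = new ArrayList<>();
        for (int i = position(sorted, probe); i < sorted.size(); i++) {
            Ann a = sorted.get(i);
            if (a.begin > window.end) break;   // past the window, nothing more to find
            if (a.begin >= window.begin && a.end <= window.end) {
                result.add(a.type);
            }
        }
        return result;
    }

    // Sample data with offsets taken from the example in the original mail.
    static List<Ann> sampleIndex() {
        List<Ann> index = new ArrayList<>(Arrays.asList(
                new Ann("answerText", 3, 10903, 10906),
                new Ann("paragraph", 1, 10899, 10906),
                new Ann("WordAnswer", 0, 10886, 10892),
                new Ann("WordAnswer", 0, 10899, 10906)));
        index.sort(ORDER);
        return index;
    }

    public static void main(String[] args) {
        List<Ann> index = sampleIndex();
        Ann window = new Ann("Annotation", 2, 10899, 10906);
        Ann widened = new Ann("Annotation", 2, window.begin, window.end + 1);
        System.out.println(typesInWindow(index, window, window));   // [answerText]
        System.out.println(typesInWindow(index, widened, window));  // [WordAnswer, paragraph, answerText]
    }
}
```

With the unmodified window as the probe, the two annotations that share its span but have a lower type rank sort in front of it and are skipped. The widened probe sorts in front of every annotation that starts at the same offset and ends inside the window, whatever the type order happens to be.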

It seems to me like we should really have a built-in API that does
this.  Most people don't care or even know about type priorities,
so the subiterator stuff is often useless.  Or we could add yet
another flag to the subiterator API, boolean ignoreTypeOrder.

--Thilo


Peter Klügl wrote:
Hello,

I have a problem accessing annotations, especially annotations of a
given type in a window represented by another annotation. I am using
type priorities, but not in this concrete situation. Therefore I am not
using the subiterator to access the contained annotations, but several
other methods like this one:

public List<AnnotationFS> getAnnotationsInWindow(
        AnnotationFS windowAnnotation, Type type) {
    List<AnnotationFS> result = new ArrayList<AnnotationFS>();
    FSIterator completeIt = getCas().getAnnotationIndex().iterator();
    if (getDocumentAnnotation().getEnd() < windowAnnotation.getEnd()) {
        completeIt.moveToLast();
    } else {
        completeIt.moveTo(windowAnnotation);
    }
    while (completeIt.isValid()
            && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation.getBegin()) {
        completeIt.moveToPrevious();
    }

    if (completeIt.isValid()) {
        completeIt.moveToNext();
    } else {
        completeIt.moveToFirst();
    }

    while (completeIt.isValid()
            && ((Annotation) completeIt.get()).getBegin() < windowAnnotation.getBegin()) {
        completeIt.moveToNext();
    }

    while (completeIt.isValid()
            && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation.getBegin()) {
        Annotation annotation = (Annotation) completeIt.get();
        if (getCas().getTypeSystem().subsumes(type, annotation.getType())
                && annotation.getEnd() <= windowAnnotation.getEnd()) {
            result.add(annotation);
        }
        completeIt.moveToNext();
    }
    return result;
}

I have already tried many variations of this (e.g., with the index of the
given type, or using a newly created frame with a higher type priority), but
this method sometimes returns simply the wrong annotations or no annotations
at all. The only method that seems to work all the time is to iterate
over all annotations of the given type and then filter out those that are
not contained in the window.

My problem occurs in many different shapes. Here's one short example:

Arguments:

windowAnnotation:
Annotation
  sofa: _InitialView
  begin: 10899
  end: 10906
Covered Text: {.8}Bar

type:
de.uniwue.casetrain.Answers.answerText


Information in the eclipse debug view (Variables):
SubAnnotations:
[0]
WordAnswer
  sofa: _InitialView
  begin: 10899
  end: 10906
  SimpleFeedback: <null>
  PosFactor: 0.8
  NegFactor: 0.0
  InstanceIsUserAnswer: false
  Text: <null>
  IsRegularExpression: false
  EditDistance: 0
[1]
WordAnswer
  sofa: _InitialView
  begin: 10886
  end: 10892
  SimpleFeedback: <null>
  PosFactor: 1.0
  NegFactor: 0.0
  InstanceIsUserAnswer: false
  Text: answerText
     sofa: _InitialView
     begin: 10889
     end: 10892
  IsRegularExpression: false
  EditDistance: 0
[2]
paragraph
  sofa: _InitialView
  begin: 10899
  end: 10906
...
[20]
answerText
  sofa: _InitialView
  begin: 10903
  end: 10906

...and so on....

In this example the method above simply stops at the second annotation,
since its start offset is smaller than that of the window. How can this
be? Does the position in the index depend on the values of the
features? Did I miss something here? In other examples the method above
just steps over the important annotations. Using the brute force
approach is not really a solution for me; it slows down my system by
more than 50%.

I would really appreciate any hints to solve this problem, since this
method/functionality is quite essential for my application. Of course, I
would also welcome any advice about performance.

Best regards

Peter
