Hi Thilo,
First of all, thanks for the detailed answer.
I will try the approach with the increased end offset of the window, but
I fear that this won't change the results (although it should give some
speedup). I had already added the following code fragment to the methods in
order to get the correct start position irrespective of the type order:
    while (completeIt.isValid()
            && ((Annotation) completeIt.get()).getBegin()
                >= windowAnnotation.getBegin()) {
        completeIt.moveToPrevious();
    }
    if (completeIt.isValid()) {
        completeIt.moveToNext();
    } else {
        completeIt.moveToFirst();
    }
This should make the start position independent of how the types are ordered in the
index. Besides that, I also tried using the same type for the window annotation, and
even a new window with an explicitly defined type priority, which should (theoretically)
give a correct starting position. But none of those methods always returns the correct
list of contained annotations. I must have made a mistake somewhere... For an example,
see the debug information I added to my last mail. How can an annotation be
"disrespecting" the begin/end indexing strategy? Is it possible that changing a
feature value without removeFromIndex/addToIndex can confuse the index in this way? I
am already reduced to guessing.
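To make the suspicion about removeFromIndex/addToIndex concrete, here is a generic analogy in plain Java. It is only a sketch: it uses java.util.TreeSet and a hypothetical Span class, and it says nothing about how the UIMA index is actually implemented. It shows that any sorted collection can end up out of order when a sort key is changed while the element is still stored in it.

```java
import java.util.Comparator;
import java.util.TreeSet;

class StaleSortKeySketch {
    /** Hypothetical stand-in for an indexed annotation; 'begin' is the sort key. */
    static final class Span {
        int begin;
        Span(int begin) { this.begin = begin; }
    }

    /** Builds a set sorted by begin offset from the given spans. */
    static TreeSet<Span> newIndex(Span... spans) {
        TreeSet<Span> index =
            new TreeSet<>(Comparator.comparingInt((Span s) -> s.begin));
        for (Span s : spans) {
            index.add(s);
        }
        return index;
    }

    /** Changes the key while the element is still stored: the order goes stale. */
    static int firstBeginAfterInPlaceChange() {
        Span a = new Span(10);
        TreeSet<Span> index = newIndex(a, new Span(20), new Span(30));
        a.begin = 40;               // the set is not told about the change
        return index.first().begin; // 'a' still occupies the first slot
    }

    /** Removes, changes, and re-adds the element: the order stays consistent. */
    static int firstBeginAfterReindex() {
        Span a = new Span(10);
        TreeSet<Span> index = newIndex(a, new Span(20), new Span(30));
        index.remove(a);
        a.begin = 40;
        index.add(a);
        return index.first().begin;
    }

    public static void main(String[] args) {
        System.out.println(firstBeginAfterInPlaceChange()); // 40, order is broken
        System.out.println(firstBeginAfterReindex());       // 20, order is intact
    }
}
```

If the annotation index behaves similarly, an annotation whose begin or end was changed in place could show up at a position that no longer matches its offsets, which would look like the symptom described here.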
I would love such an ignoreTypeOrder flag.
I will try some things now and get back to you with the results.
Peter
Thilo Goetz wrote:
Hi Peter,
I see one issue with your code: you're using the windowAnnotation
to position the iterator. Instead, you should be using an
annotation of the type you're searching for. This is because
annotations are sorted by type as well. If you don't specify
explicit type priorities, it doesn't mean that no type priority
will be used. It just means that the type priority is arbitrary.
I think our documentation is misleading in this respect, I was
confused myself when I just read it.
The net effect of this is that you can't reliably position an
iterator where you're looking for annotations of type a with an
annotation of type b. If you were just looking for annotations
of a specific type, you could change your code like this:
    // Warning: untested
    windowAnnotation = getCas().createAnnotation(type,
            windowAnnotation.getBegin(), windowAnnotation.getEnd());
However, you also appear to be interested in annotations of subtypes
of your input type. Those might be ordered differently again.
So what to do?
You can either use explicit type priorities (a PITA and not recommended),
or you use an annotation for iterator positioning that's guaranteed to
position the iterator in front of the first annotation you're interested
in. For example, you could do this:
    AnnotationFS posFS = getCas().createAnnotation(type,
            windowAnnotation.getBegin(), windowAnnotation.getEnd() + 1);
The type could be any annotation type, but note the increased end position.
This will get your iterator positioned before, but not necessarily on
the first desired annotation. You would then use your original
windowAnnotation to check that the FSs you're iterating over are in
range.
This trick is necessary because you simply can't know how types are
ordered in the absence of explicit type priorities.
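The effect of the increased end position can be illustrated with a small self-contained model. This is a sketch under assumptions, not UIMA code: the Ann class, the typeRank field, the offsets, and the helper names are all hypothetical, and the index is modelled as a list sorted by begin ascending, end descending, then an arbitrary type rank. Subtypes are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

class EndPlusOneSketch {
    /** Hypothetical annotation; typeRank models the arbitrary type order. */
    static final class Ann {
        final String type; final int typeRank; final int begin; final int end;
        Ann(String type, int typeRank, int begin, int end) {
            this.type = type; this.typeRank = typeRank;
            this.begin = begin; this.end = end;
        }
    }

    /** Modelled index order: begin ascending, end descending, then type rank. */
    static int compare(Ann x, Ann y) {
        if (x.begin != y.begin) return Integer.compare(x.begin, y.begin);
        if (x.end != y.end) return Integer.compare(y.end, x.end);
        return Integer.compare(x.typeRank, y.typeRank);
    }

    /** Index of the first element that is not smaller than the key. */
    static int moveTo(List<Ann> sorted, Ann key) {
        int i = 0;
        while (i < sorted.size() && compare(sorted.get(i), key) < 0) i++;
        return i;
    }

    /** Scans forward from 'start' and keeps annotations of 'type' inside the window. */
    static List<Ann> collect(List<Ann> sorted, int start, Ann window, String type) {
        List<Ann> result = new ArrayList<>();
        for (int i = start; i < sorted.size() && sorted.get(i).begin <= window.end; i++) {
            Ann a = sorted.get(i);
            if (a.type.equals(type) && a.begin >= window.begin && a.end <= window.end) {
                result.add(a);
            }
        }
        return result;
    }

    /** Positions with the window itself: the result depends on the type rank. */
    static List<Ann> positionedByWindow(List<Ann> sorted, Ann window, String type) {
        return collect(sorted, moveTo(sorted, window), window, type);
    }

    /** Positions with end + 1: sorts before every contained annotation, whatever its type. */
    static List<Ann> positionedByEndPlusOne(List<Ann> sorted, Ann window, String type) {
        Ann pos = new Ann("pos", Integer.MAX_VALUE, window.begin, window.end + 1);
        return collect(sorted, moveTo(sorted, pos), window, type);
    }

    /** Illustrative data only; "WordAnswer" happens to rank before the window type. */
    static List<Ann> demoIndex(Ann window) {
        List<Ann> index = new ArrayList<>();
        index.add(new Ann("WordAnswer", 0, 3, 9));
        index.add(new Ann("WordAnswer", 0, 10, 17));
        index.add(window);
        index.add(new Ann("WordAnswer", 0, 14, 17));
        index.add(new Ann("WordAnswer", 0, 18, 20));
        index.sort(EndPlusOneSketch::compare);
        return index;
    }

    public static void main(String[] args) {
        Ann window = new Ann("answerText", 1, 10, 17);
        List<Ann> index = demoIndex(window);
        System.out.println(positionedByWindow(index, window, "WordAnswer").size());     // 1
        System.out.println(positionedByEndPlusOne(index, window, "WordAnswer").size()); // 2
    }
}
```

In this model, positioning with the window itself skips the WordAnswer that has the same span but sorts before the window's type. The positioning annotation with end + 1 finds it even though it is given the worst possible type rank, because the longer span decides the order before the type is ever compared.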
It seems to me like we should really have a built-in API that does
this. Most people don't care or even know about type priorities,
so the subiterator stuff is often useless. Or we could add yet
another flag to the subiterator API, boolean ignoreTypeOrder.
--Thilo
Peter Klügl wrote:
Hello,
I have a problem accessing annotations, especially annotations of a
given type in a window represented by another annotation. I am using
type priorities, but not in this particular situation. Therefore I am not
using the subiterator to access the contained annotations, but several
other methods like this one:
    public List<AnnotationFS> getAnnotationsInWindow(
            AnnotationFS windowAnnotation, Type type) {
        List<AnnotationFS> result = new ArrayList<AnnotationFS>();
        FSIterator completeIt = getCas().getAnnotationIndex().iterator();
        // Position the iterator at the window, or at the last annotation
        // if the window extends beyond the document.
        if (getDocumentAnnotation().getEnd() < windowAnnotation.getEnd()) {
            completeIt.moveToLast();
        } else {
            completeIt.moveTo(windowAnnotation);
        }
        // Step back past everything that begins at or after the window start.
        while (completeIt.isValid()
                && ((Annotation) completeIt.get()).getBegin()
                    >= windowAnnotation.getBegin()) {
            completeIt.moveToPrevious();
        }
        // Return to the first candidate.
        if (completeIt.isValid()) {
            completeIt.moveToNext();
        } else {
            completeIt.moveToFirst();
        }
        // Skip annotations that begin before the window.
        while (completeIt.isValid()
                && ((Annotation) completeIt.get()).getBegin()
                    < windowAnnotation.getBegin()) {
            completeIt.moveToNext();
        }
        // Collect annotations of the requested type (or a subtype) that
        // end inside the window.
        while (completeIt.isValid()
                && ((Annotation) completeIt.get()).getBegin()
                    >= windowAnnotation.getBegin()) {
            Annotation annotation = (Annotation) completeIt.get();
            if (getCas().getTypeSystem().subsumes(type, annotation.getType())
                    && annotation.getEnd() <= windowAnnotation.getEnd()) {
                result.add(annotation);
            }
            completeIt.moveToNext();
        }
        return result;
    }
I have already tried many variations of this (e.g., with the index of the
given type, or using a newly created frame with a higher type priority), but
this method sometimes returns simply wrong annotations or no annotations
at all. The only method that seems to work all the time is to iterate
over all annotations of the given type and then filter out those that are
not contained in the window.
My problem shows up in many different forms. Here's one short example:
Arguments:
windowAnnotation:
Annotation
sofa: _InitialView
begin: 10899
end: 10906
Covered Text: {.8}Bar
type:
de.uniwue.casetrain.Answers.answerText
Information in the Eclipse debug view (Variables):
SubAnnotations:
[0]
WordAnswer
sofa: _InitialView
begin: 10899
end: 10906
SimpleFeedback: <null>
PosFactor: 0.8
NegFactor: 0.0
InstanceIsUserAnswer: false
Text: <null>
IsRegularExpression: false
EditDistance: 0
[1]
WordAnswer
sofa: _InitialView
begin: 10886
end: 10892
SimpleFeedback: <null>
PosFactor: 1.0
NegFactor: 0.0
InstanceIsUserAnswer: false
Text: answerText
sofa: _InitialView
begin: 10889
end: 10892
IsRegularExpression: false
EditDistance: 0
[2]
paragraph
sofa: _InitialView
begin: 10899
end: 10906
...
[20]
answerText
sofa: _InitialView
begin: 10903
end: 10906
...and so on....
In this example the method above simply stops at the second annotation,
since its start offset is smaller than that of the window. How can this
be? Does the position in the index depend on the values of the
features? Did I miss something here? In other examples the method above
just steps over the important annotations. Using the brute-force
approach is not really a solution for me; it slows down my system by
more than 50%.
I would really appreciate any hints on solving this problem, since this
method/functionality is quite essential for my application. Of course, I
would also welcome any advice about performance.
Best regards
Peter