Peter Klügl wrote:
> Hi Thilo,
>
> first of all, thanks for the detailed answer.
>
> I will try the approach with the increased end offset of the window, but
> I fear that this won't change the results (but should result in some
> speedup). I already added the additional code fragment in the methods in
> order to get the correct start position irrespective of the type order:
>
> while (completeIt.isValid()
> && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation
> .getBegin()) {
> completeIt.moveToPrevious();
> }
>
> if (completeIt.isValid()) {
> completeIt.moveToNext();
> } else {
> completeIt.moveToFirst();
> }
>
> This should make the starting position independent of how the types are
> ordered in the index. Besides that, I also tried using the same type for
> the window annotation, and even a new window with an explicitly defined
> type priority, which should (theoretically) give a correct starting
> position. But none of those methods always returns the correct list of
> contained annotations. I must have made a mistake somewhere; see, for
> example, the debug information I added to my last mail. How can an
> annotation "disrespect" the begin/end ordering of the index? Is it
> possible that changing a feature value without
> removeFromIndex/addToIndex can confuse the index in this way? I am
> already starting to suspect as much.
Hi Peter,
yes, if you change the begin/end position of an annotation without
removeFromIndex/addToIndex, that will invalidate the whole index.
The index has no way of knowing that certain feature values changed,
and so can't do automatic reordering of the index. Indexes are
kept sorted, so you can't modify members without telling the
index about it. This is a speed/convenience trade-off, and we
opted for speed at the time...
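
To illustrate the effect without needing UIMA on the classpath, here is a
minimal sketch that uses java.util.TreeSet as a stand-in for a sorted
annotation index. Span, ORDER and all method names are made up for this
example and are not UIMA API; the real index is implemented differently,
but the failure mode should be similar in spirit.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Illustration only: TreeSet stands in for a sorted annotation index.
// Span, ORDER and all method names are hypothetical, not UIMA API.
final class StaleIndexDemo {

    /** Mutable begin/end pair, playing the role of an annotation. */
    static final class Span {
        int begin;
        int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Sort order: ascending begin, then descending end. */
    static final Comparator<Span> ORDER = (a, b) ->
        a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                           : Integer.compare(b.end, a.end);

    /** Builds a sorted set over the given spans. */
    static TreeSet<Span> indexOf(Span... spans) {
        TreeSet<Span> index = new TreeSet<>(ORDER);
        for (Span s : spans) index.add(s);
        return index;
    }

    /** True if iterating the set yields the spans in ORDER. */
    static boolean iteratesInOrder(TreeSet<Span> index) {
        Span prev = null;
        for (Span s : index) {
            if (prev != null && ORDER.compare(prev, s) > 0) return false;
            prev = s;
        }
        return true;
    }

    /** Changes begin/end while the span is still in the set; the set never notices. */
    static boolean orderAfterInPlaceEdit() {
        Span middle = new Span(10, 15);
        TreeSet<Span> index = indexOf(new Span(0, 5), middle, new Span(20, 25));
        middle.begin = 30;
        middle.end = 35;
        return iteratesInOrder(index);
    }

    /** Removes the span, changes it, then adds it back so it gets re-sorted. */
    static boolean orderAfterRemoveEditAdd() {
        Span middle = new Span(10, 15);
        TreeSet<Span> index = indexOf(new Span(0, 5), middle, new Span(20, 25));
        index.remove(middle);   // remove while the old sort key is still valid
        middle.begin = 30;
        middle.end = 35;
        index.add(middle);      // re-insert under the new sort key
        return iteratesInOrder(index);
    }

    public static void main(String[] args) {
        System.out.println("in-place edit keeps order:   " + orderAfterInPlaceEdit());
        System.out.println("remove/edit/add keeps order: " + orderAfterRemoveEditAdd());
    }
}
```

The second variant mirrors the remove, modify, add sequence described
above: the element has to leave the sorted structure before its sort key
changes, otherwise lookups and iteration work against a stale position.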
--Thilo
>
> I would love such an ignoreTypeOrder flag.
>
> I will try some things now and get back to you with the results.
>
>
> Peter
>
>
>
> Thilo Goetz wrote:
>> Hi Peter,
>>
>> I see one issue with your code: you're using the windowAnnotation
>> to position the iterator. Instead, you should be using an
>> annotation of the type you're searching for. This is because
>> annotations are sorted by type as well. If you don't specify
>> explicit type priorities, it doesn't mean that no type priority
>> will be used. It just means that the type priority is arbitrary.
>> I think our documentation is misleading in this respect; I was
>> confused myself when I just read it.
>>
>> The net effect of this is that you can't reliably position an
>> iterator where you're looking for annotations of type a with an
>> annotation of type b. If you were just looking for annotations
>> of a specific type, you could change your code like this:
>>
>> // Warning: untested
>> windowAnnotation = getCas().createAnnotation(type,
>> windowAnnotation.getBegin(),
>> windowAnnotation.getEnd());
>>
>> However, you also appear to be interested in annotations of subtypes
>> of your input type. Those might be ordered differently again.
>>
>> So what to do?
>>
>> You can either use explicit type priorities (a PITA and not recommended),
>> or you use an annotation for iterator positioning that's guaranteed to
>> position the iterator in front of the first annotation you're interested
>> in. For example, you could do this:
>>
>> AnnotationFS posFS = getCas().createAnnotation(type,
>> windowAnnotation.getBegin(), windowAnnotation.getEnd() + 1);
>>
>> The type could be any annotation type, but note the increased end
>> position.
>> This will get your iterator positioned before, but not necessarily on
>> the first desired annotation. You would then use your original
>> windowAnnotation to check that the FSs you're iterating over are in
>> range.
>>
>> This trick is necessary because you simply can't know how types are
>> ordered in the absence of explicit type priorities.
>>
>> It seems to me like we should really have a built-in API that does
>> this. Most people don't care or even know about type priorities,
>> so the subiterator stuff is often useless. Or we could add yet
>> another flag to the subiterator API, boolean ignoreTypeOrder.
>>
>> --Thilo
>>
>>
>> Peter Klügl wrote:
>>
>>> Hello,
>>>
>>> I have a problem accessing annotations, especially annotations of a
>>> given type in a window represented by another annotation. I am using
>>> type priorities but not in this concrete situation. Therefore I am not
>>> using the subiterator to access the contained annotations, but several
>>> other methods like this one:
>>>
>>> public List<AnnotationFS> getAnnotationsInWindow(
>>> AnnotationFS windowAnnotation, Type type) {
>>> List<AnnotationFS> result = new ArrayList<AnnotationFS>();
>>> FSIterator completeIt = getCas().getAnnotationIndex().iterator();
>>> if (getDocumentAnnotation().getEnd() < windowAnnotation.getEnd()) {
>>> completeIt.moveToLast();
>>> } else {
>>> completeIt.moveTo(windowAnnotation);
>>> }
>>> while (completeIt.isValid()
>>> && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation
>>> .getBegin()) {
>>> completeIt.moveToPrevious();
>>> }
>>>
>>> if (completeIt.isValid()) {
>>> completeIt.moveToNext();
>>> } else {
>>> completeIt.moveToFirst();
>>> }
>>>
>>> while (completeIt.isValid()
>>> && ((Annotation) completeIt.get()).getBegin() < windowAnnotation
>>> .getBegin()) {
>>> completeIt.moveToNext();
>>> }
>>>
>>> while (completeIt.isValid()
>>> && ((Annotation) completeIt.get()).getBegin() >= windowAnnotation
>>> .getBegin()) {
>>> Annotation annotation = (Annotation) completeIt.get();
>>> if (getCas().getTypeSystem().subsumes(type, annotation.getType())
>>> && annotation.getEnd() <= windowAnnotation.getEnd()) {
>>> result.add(annotation);
>>> }
>>> completeIt.moveToNext();
>>> }
>>> return result;
>>> }
>>>
>>> I have already tried many variations of this (e.g., with the index of
>>> the given type, or using a newly created frame with a higher type
>>> priority), but this method sometimes returns simply wrong annotations
>>> or no annotations at all. The only method that seems to work all the
>>> time is to iterate over all annotations of the given type and then
>>> filter out those that are not contained in the window.
>>>
>>> My problem occurs in many different shapes. Here's one short example:
>>>
>>> Arguments:
>>>
>>> windowAnnotation:
>>> Annotation
>>> sofa: _InitialView
>>> begin: 10899
>>> end: 10906
>>> Covered Text: {.8}Bar
>>>
>>> type:
>>> de.uniwue.casetrain.Answers.answerText
>>>
>>>
>>> Information in the eclipse debug view (Variables):
>>> SubAnnotations:
>>> [0]
>>> WordAnswer
>>> sofa: _InitialView
>>> begin: 10899
>>> end: 10906
>>> SimpleFeedback: <null>
>>> PosFactor: 0.8
>>> NegFactor: 0.0
>>> InstanceIsUserAnswer: false
>>> Text: <null>
>>> IsRegularExpression: false
>>> EditDistance: 0
>>> [1]
>>> WordAnswer
>>> sofa: _InitialView
>>> begin: 10886
>>> end: 10892
>>> SimpleFeedback: <null>
>>> PosFactor: 1.0
>>> NegFactor: 0.0
>>> InstanceIsUserAnswer: false
>>> Text: answerText
>>> sofa: _InitialView
>>> begin: 10889
>>> end: 10892
>>> IsRegularExpression: false
>>> EditDistance: 0
>>> [2]
>>> paragraph
>>> sofa: _InitialView
>>> begin: 10899
>>> end: 10906
>>> ...
>>> [20]
>>> answerText
>>> sofa: _InitialView
>>> begin: 10903
>>> end: 10906
>>>
>>> ...and so on....
>>>
>>> In this example the method above simply stops at the second annotation,
>>> since its start offset is smaller than that of the window. How can this
>>> be? Does the position in the index depend on the values of the
>>> features? Did I miss something here? In other examples the method above
>>> just steps over the important annotations. Using the brute force
>>> approach is not really a solution for me; it slows down my system by
>>> more than 50%.
>>>
>>> I would really appreciate any hints on solving this problem, since this
>>> method/functionality is quite essential for my application. Of course, I
>>> would also welcome any advice about performance.
>>>
>>> Best regards
>>>
>>> Peter
>>>