When I use a filtered FSIterator it's an order of magnitude slower than a
non-filtered iterator. Here's my code:
Create the iterator:
    private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
            throws CASException {
        FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
        FSTypeConstraint constraint =
                aJCas.getConstraintFactory().createTypeConstraint();
        constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
        constraint.add((new MeasurementAnnotation(aJCas)).getType());
        constraint.add((new ProgFactorTerm(aJCas)).getType());
        it = aJCas.createFilteredIterator(it, constraint);
        return it;
    }
Use the iterator:
    public void process(JCas aJCas) throws AnalysisEngineProcessException {
        ...
        // The following is done in a loop
        if (shouldSkip(dictTerm, skipIter))
            continue;
        ...
    }
Here's the method called:
    private boolean shouldSkip(G2DictTerm dictTerm,
            FSIterator<Annotation> skipIter) throws CASException {
        boolean shouldSkip = false;
        skipIter.moveToFirst();
        while (skipIter.hasNext()) {
            Annotation annotation = skipIter.next();
            if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
                shouldSkip = true;
                break;
            }
        }
        return shouldSkip;
    }
If I change the method, createConstrainedIterator(), to this (that is, no
constraints):
    private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
            throws CASException {
        FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
        return it;
    }
It runs literally 10 times faster. Profiling shows that all of the time is
spent in the skipIter.moveToFirst() call. I also tried creating a new filtered
iterator on every call to shouldSkip() instead of passing it in, but that
performs slightly worse still.
Given this performance I suppose I should probably use a non-filtered iterator
and just check for the types I'm interested in inside the loop.
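Roughly what I have in mind is sketched below. To keep it self-contained it
uses plain Java stand-ins rather than the real UIMA classes: Span, SKIP_TYPES
and overlaps() are made up for illustration, and the overlap test is only my
assumption of what UIMAUtils.annotationsOverlap() does. In the real code the
loop would walk the unfiltered annotation iterator and compare each
annotation's type instead of a type-name string.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SkipCheckSketch {
    // Hypothetical stand-in for a UIMA annotation: a type name plus offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Stand-in for the three types that were added to the type constraint.
    static final Set<String> SKIP_TYPES = new HashSet<>(Arrays.asList(
            "TitlePersonHonorificAnnotation",
            "MeasurementAnnotation",
            "ProgFactorTerm"));

    // Assumed overlap semantics (half-open ranges that intersect).
    static boolean overlaps(Span a, Span b) {
        return a.begin < b.end && b.begin < a.end;
    }

    // Walk every annotation and do the type check inside the loop,
    // instead of delegating the filtering to a constrained iterator.
    static boolean shouldSkip(Span dictTerm, List<Span> allAnnotations) {
        for (Span annotation : allAnnotations) {
            if (!SKIP_TYPES.contains(annotation.type)) {
                continue; // not one of the types we care about
            }
            if (overlaps(dictTerm, annotation)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Span> annotations = Arrays.asList(
                new Span("Token", 0, 50),
                new Span("MeasurementAnnotation", 15, 25));
        System.out.println(shouldSkip(new Span("G2DictTerm", 10, 20), annotations));
        System.out.println(shouldSkip(new Span("G2DictTerm", 30, 40), annotations));
    }
}
```

The set lookup per annotation is cheap, so the cost of this version should be
close to that of the plain unfiltered loop, though I have not measured it.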
Any other suggestions welcome.
Thanks,
Larry Kline