When I use a filtered FSIterator it's an order of magnitude slower than a 
non-filtered iterator.  Here's my code:

Create the iterator:
       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
              FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
              constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
              constraint.add((new MeasurementAnnotation(aJCas)).getType());
              constraint.add((new ProgFactorTerm(aJCas)).getType());
              it = aJCas.createFilteredIterator(it, constraint);
              return it;
       }
Use the iterator:
       public void process(JCas aJCas) throws AnalysisEngineProcessException {
              ...
              // The following is done in a loop
                           if (shouldSkip(dictTerm, skipIter))
                                  continue;
              ...
       }
Here's the method called:
       private boolean shouldSkip(G2DictTerm dictTerm, FSIterator<Annotation> skipIter) throws CASException {
              boolean shouldSkip = false;
              skipIter.moveToFirst();
              while (skipIter.hasNext()) {
                     Annotation annotation = skipIter.next();
                     if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
                           shouldSkip = true;
                           break;
                     }
              }
              return shouldSkip;
       }

If I change the method, createConstrainedIterator(), to this (that is, no 
constraints):
       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
              return it;
       }

It runs literally 10 times faster.  Profiling shows that all of the time is 
spent in the skipIter.moveToFirst() call.  I also tried creating a new 
filtered iterator on each call to shouldSkip() instead of passing it in, but 
that performs slightly worse still.

Given this performance I suppose I should probably use a non-filtered iterator 
and just check for the types I'm interested in inside the loop.
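
Roughly, that fallback would look like the sketch below. It is only an illustration of the type check inside the loop and is written with plain Java stand-ins so it runs on its own: Span, overlaps() and the type-name strings are hypothetical and are not UIMA classes, and the overlap rule shown is an assumption, not necessarily what UIMAUtils.annotationsOverlap does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SkipCheckSketch {

    /** Hypothetical stand-in for an annotation: a type name plus begin/end offsets. */
    static final class Span {
        final String typeName;
        final int begin;
        final int end;

        Span(String typeName, int begin, int end) {
            this.typeName = typeName;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Assumed overlap rule: the half-open ranges [begin, end) share at least one offset. */
    static boolean overlaps(Span a, Span b) {
        return a.begin < b.end && b.begin < a.end;
    }

    /**
     * Walks every annotation (the "non-filtered iterator") and does the type
     * check inside the loop instead of relying on a filtered iterator.
     */
    static boolean shouldSkip(Span dictTerm, Iterable<Span> allAnnotations, Set<String> skipTypes) {
        for (Span candidate : allAnnotations) {
            if (!skipTypes.contains(candidate.typeName)) {
                continue; // not one of the types of interest
            }
            if (overlaps(dictTerm, candidate)) {
                return true; // stop at the first overlapping match
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> skipTypes = new HashSet<>(Arrays.asList(
                "TitlePersonHonorificAnnotation", "MeasurementAnnotation", "ProgFactorTerm"));
        List<Span> annotations = Arrays.asList(
                new Span("Token", 0, 4),
                new Span("MeasurementAnnotation", 10, 15),
                new Span("ProgFactorTerm", 20, 30));

        // Overlaps the MeasurementAnnotation at 10-15, so it should be skipped.
        System.out.println(shouldSkip(new Span("G2DictTerm", 12, 18), annotations, skipTypes));
        // Overlaps only a Token, which is not a skip type, so it should be kept.
        System.out.println(shouldSkip(new Span("G2DictTerm", 0, 4), annotations, skipTypes));
    }
}
```

In the real annotator the set of names would presumably be replaced by a check against the three annotation types, and the loop would run over the plain iterator from getAnnotationIndex().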

Any other suggestions welcome.

Thanks,
Larry Kline

