Concurrent modification checks dominate index iteration time.
-------------------------------------------------------------
Key: UIMA-1364
URL: https://issues.apache.org/jira/browse/UIMA-1364
Project: UIMA
Issue Type: Improvement
Reporter: Branimir Lambov
Attachments: ConcurrentModificationPatch.txt
Iterating over the annotation index with even a moderate number of defined
types is dominated by the time spent checking individual indexes for concurrent
modification. This is due to the fact that concurrent modification checks are
done on all types being iterated over, even if the iteration only needs to
process a couple of iterators. In fact, checking all iterators for modification
has linear complexity in the number of subiterators used, while the actual
iteration can be implemented with logarithmic complexity using e.g. a binary
heap.
The UIMA documentation and JavaDoc do not state that the iterators should
always recognize concurrent modification (FSIterator JavaDoc states
"Implementations of this interface are not required to be fail-fast. That is,
if the iterator's collection is modified, the effects on the iterator are in
general undefined."). It thus makes sense to reduce the number of iterators
being tested for concurrent modification at each moveToNext() step.
The attached patch replaces the checkConcurrentModificationAll() call in
FSIndexRepositoryImpl.PointerIterator.moveToNext() with concurrent modification
checks on only the iterators being used by the step; as the iterator becomes
invalid it also checks all involved iterators for modification. By doing this
it should be able to catch almost all concurrent modification without the
excessive overhead.
In one of our performance tests iterating over the annotation index with 140
types defined is more than twice faster after the attached patch is applied.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.