[
https://issues.apache.org/jira/browse/UIMA-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Branimir Lambov updated UIMA-1364:
----------------------------------
Attachment: ConcurrentModificationPatch.txt
> Concurrent modification checks dominate index iteration time.
> -------------------------------------------------------------
>
> Key: UIMA-1364
> URL: https://issues.apache.org/jira/browse/UIMA-1364
> Project: UIMA
> Issue Type: Improvement
> Reporter: Branimir Lambov
> Attachments: ConcurrentModificationPatch.txt
>
>
> Iterating over the annotation index with even a moderate number of defined
> types is dominated by the time spent checking individual indexes for
> concurrent modification. This is because the concurrent modification checks
> are performed on the indexes of all types being iterated over, even when a
> step only needs to advance a couple of subiterators. Checking every
> subiterator costs time linear in the number of subiterators at each step,
> while the iteration step itself can be implemented in logarithmic time using,
> e.g., a binary heap.
> The UIMA documentation and JavaDoc do not state that the iterators should
> always recognize concurrent modification (FSIterator JavaDoc states
> "Implementations of this interface are not required to be fail-fast. That is,
> if the iterator's collection is modified, the effects on the iterator are in
> general undefined."). It thus makes sense to reduce the number of iterators
> being tested for concurrent modification at each moveToNext() step.
> The attached patch replaces the checkConcurrentModificationAll() call in
> FSIndexRepositoryImpl.PointerIterator.moveToNext() with concurrent
> modification checks on only the iterators used by that step; when the
> iterator becomes invalid, it also checks all involved iterators for
> modification. This should still catch almost all concurrent modifications
> without the excessive overhead.
> In one of our performance tests, iterating over the annotation index with 140
> types defined is more than twice as fast after the attached patch is applied.
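
The idea described above can be illustrated with a minimal, self-contained
sketch. This is not the attached patch and does not use any UIMA classes:
VersionedIndex, Cursor and MergedIterator are hypothetical names invented for
illustration, and plain sorted integers stand in for feature structures. It
assumes each per-type index keeps a modification counter, merges the
subiterators with a binary heap (java.util.PriorityQueue), and checks only the
subiterator advanced by a step, falling back to a full check once the merged
iterator is exhausted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for one per-type index: sorted ints plus a modification counter. */
class VersionedIndex {
    final List<Integer> items = new ArrayList<>();
    int modCount = 0; // incremented on every change to the index

    VersionedIndex(int... sorted) {
        for (int v : sorted) items.add(v);
    }

    void add(int v) {
        items.add(v);
        Collections.sort(items);
        modCount++;
    }
}

/** Subiterator over one index; remembers the modCount it was created against. */
class Cursor implements Comparable<Cursor> {
    final VersionedIndex index;
    final int expectedModCount;
    int pos = 0;

    Cursor(VersionedIndex index) {
        this.index = index;
        this.expectedModCount = index.modCount;
    }

    boolean isValid() { return pos < index.items.size(); }

    int current() { return index.items.get(pos); }

    /** Constant-time check for this one subiterator. */
    void checkModification() {
        if (index.modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
    }

    @Override
    public int compareTo(Cursor other) {
        return Integer.compare(current(), other.current());
    }
}

/** Merges the subiterators with a binary heap and checks only the one touched per step. */
class MergedIterator {
    private final PriorityQueue<Cursor> heap = new PriorityQueue<>();
    private final List<Cursor> all = new ArrayList<>();

    MergedIterator(List<VersionedIndex> indexes) {
        for (VersionedIndex ix : indexes) {
            Cursor c = new Cursor(ix);
            all.add(c);
            if (c.isValid()) heap.add(c);
        }
    }

    boolean isValid() { return !heap.isEmpty(); }

    int get() { return heap.peek().current(); }

    void moveToNext() {
        Cursor top = heap.poll();          // O(log n) in the number of subiterators
        if (top == null) return;           // already exhausted
        top.checkModification();           // check only the subiterator being advanced
        top.pos++;
        if (top.isValid()) heap.add(top);  // O(log n) re-insert
        if (heap.isEmpty()) {
            // Iterator just became invalid: check every subiterator once.
            for (Cursor c : all) c.checkModification();
        }
    }
}
```

With this layout each step costs O(log n) heap work plus one counter
comparison, instead of n counter comparisons. A modification to an index whose
subiterator is not advanced again goes unnoticed until the final full check,
which is the trade-off the description accepts, given that FSIterator is not
required to be fail-fast.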