Re: CASImpl.createFilteredIterator performance / Eclipse source plugin

Thilo Goetz Thu, 28 Jan 2010 02:45:11 -0800

Hi Peter,

On 1/27/2010 13:37, Peter Klügl wrote:
> Hello,
> 
> I have some performance issues with a current application. The profiler
> tells me that over 80% of the execution time is spent in about 200
> calls of the method CASImpl.createFilteredIterator(). That 80% is
> sometimes more than 1000 s for one AE.process(), even though a lot more
> moving on the index happens within the remaining 20%.
> 
> I can't investigate the cause of this performance hot spot any further,
> partly because I am missing the source plugin for the UIMA runtime plugin.
> The application is running within Eclipse. My first question: Is there
> an easy way to get/create a source plugin for the UIMA core/runtime?
> Ideally without using Maven? Any best practices for profiling UIMA in Eclipse?
> 
> My second question: Is that normal behavior, or can anyone give me a
> hint on how I could improve the performance?
> 
> Some details about how the method is used:
> The CAS contains about 40 pages of plain text with about 50 lines per
> page. Part of the text (maybe 3 pages) is annotated, and for each line of
> that segment the method createFilteredIterator() is called with some
> constraints on types and of course on the window of the iterator
> (that is, the line). I also tried to replace the filtered iterator with
> a window constraint by a filtered iterator over a subiterator of the
> annotation index, resulting in no real improvement in performance. The
> UIMA version is 2.2.2.
> 
> Looking forward to any hints or directions.
> 
> Peter
>


I'm surprised that creating a filtered iterator from a
subiterator did not improve performance.  A window
constraint in a filtered iterator will generally yield
bad performance because the filter is just that: a filter.
There is no intelligence behind it.  The time you see in
createFilteredIterator() is the iterator being advanced
to the first annotation that passes the filter.  This is
done by simply starting at the beginning and looking at
each annotation in turn.  A subiterator is smarter: it
uses binary search to find its starting position and
should be significantly faster.
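
To make the difference concrete, here is a minimal sketch in plain
Java.  It is a simplified model, not UIMA code: the class and method
names are hypothetical, annotations are reduced to their begin
offsets, and the index is assumed to be sorted by begin offset.  It
only contrasts a front-to-back filter scan with a binary search for
the start of the window.

```java
/** Simplified model of the two positioning strategies; this is not UIMA code. */
final class WindowScanSketch {

    /** Elements examined by the most recent call (kept only for comparison). */
    static int examined;

    /**
     * Filter-style positioning: test each annotation from the start of the
     * index until one begins inside [windowBegin, windowEnd).
     * Returns the position of the first match, or -1 if there is none.
     */
    static int firstByFilter(int[] begins, int windowBegin, int windowEnd) {
        examined = 0;
        for (int i = 0; i < begins.length; i++) {
            examined++;
            if (begins[i] >= windowBegin && begins[i] < windowEnd) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Subiterator-style positioning: binary search for the first begin
     * offset >= windowBegin.  Requires begins to be sorted ascending.
     * Returns that position if it lies inside the window, else -1.
     */
    static int firstByBinarySearch(int[] begins, int windowBegin, int windowEnd) {
        examined = 0;
        int lo = 0;
        int hi = begins.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            examined++;
            if (begins[mid] < windowBegin) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (lo < begins.length && begins[lo] < windowEnd) ? lo : -1;
    }

    public static void main(String[] args) {
        // Hypothetical index: 1000 annotations beginning at offsets 0, 10, 20, ...
        int[] begins = new int[1000];
        for (int i = 0; i < begins.length; i++) {
            begins[i] = i * 10;
        }
        int viaFilter = firstByFilter(begins, 9000, 9100);
        int filterCost = examined;
        int viaSearch = firstByBinarySearch(begins, 9000, 9100);
        int searchCost = examined;
        System.out.println("filter: position " + viaFilter + ", examined " + filterCost);
        System.out.println("binary search: position " + viaSearch + ", examined " + searchCost);
    }
}
```

In this model the filter scan has to look at every element in front of
the window, and at the whole index when nothing matches, while the
binary search needs a number of steps that grows only with the
logarithm of the index size.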

Do you have lots of different annotation types?  That's
also a performance killer, but we've made some improvements
here in 2.3.  You may wish to try the latest code from
trunk and see if it gives you any improvements.  The code
is stable, the release has been approved, it's only a
matter of days until it's generally available.

If your type organization allows it, you can also ask
the CAS for a more specific index/iterator, as opposed
to an iterator over all annotations.  Iterating over
annotations of a leaf type is generally much faster.
Restricting by type is another thing you don't want to do
in a filter if performance is critical.
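
As a rough illustration of why a type-specific index helps, here is a
second sketch in plain Java.  Again this is a simplified model and not
UIMA code: the Ann class, the "Line" and "Token" type names, and the
per-type map are all assumptions made for the example, and the model
ignores subtypes (it corresponds to asking for a leaf type).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of filtering versus a type-specific index; not UIMA code. */
final class TypeIndexSketch {

    /** Hypothetical annotation: a type name and a begin offset. */
    static final class Ann {
        final String type;
        final int begin;

        Ann(String type, int begin) {
            this.type = type;
            this.begin = begin;
        }
    }

    /**
     * Filtering: every annotation of every type is examined.
     * Matches are appended to out; the return value is the number examined.
     */
    static int collectByFilter(List<Ann> all, String type, List<Ann> out) {
        int examined = 0;
        for (Ann a : all) {
            examined++;
            if (a.type.equals(type)) {
                out.add(a);
            }
        }
        return examined;
    }

    /** Builds one list per type, preserving the original order. */
    static Map<String, List<Ann>> indexByType(List<Ann> all) {
        Map<String, List<Ann>> byType = new HashMap<>();
        for (Ann a : all) {
            byType.computeIfAbsent(a.type, k -> new ArrayList<>()).add(a);
        }
        return byType;
    }

    /**
     * Type-specific index: only annotations of the requested type are touched.
     * Matches are appended to out; the return value is the number touched.
     */
    static int collectFromIndex(Map<String, List<Ann>> byType, String type, List<Ann> out) {
        List<Ann> typed = byType.getOrDefault(type, new ArrayList<>());
        out.addAll(typed);
        return typed.size();
    }
}
```

Both methods return the same annotations in the same order, but the
filter pays for every annotation of every type on each call.  The
model builds its per-type lists in a separate pass; that pass is done
once and reused for all later lookups.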

So basically, try to get the iterator that goes into
createFilteredIterator() to be as small as possible
to begin with.  Anything you can achieve by starting
with a smaller collection, rather than by filtering,
should help.

--Thilo
