Hi Peter,

On 1/27/2010 13:37, Peter Klügl wrote:
> Hello,
>
> I have some performance issues with a current application. The profiler
> tells me that over 80% of the execution time was spent on the about 200
> calls of the method CASImpl.createFilteredIterator(). These 80% are
> sometimes more than 1000s for one AE.process() and there is a lot more
> moving on the index going on within those 20%.
>
> I can't investigate the cause for this performance hot spot any further,
> also because I am missing the source plugins for UIMA runtime plugin.
> The application is running within Eclipse. My first question: Is there
> an easy way to get/create a source plugin for the UIMA core/runtime? At
> best without using maven? Any best practices for profiling UIMA in Eclipse?
>
> My second question: Is that a normal behavior or can anyone give me a
> hint how I could increase the performance?
>
> Some exemplary information about the usage of the method:
> The CAS contains about 40 pages of plain text with about 50 lines per
> page. Part of the text (maybe 3 pages) is annotated and for each line of
> the segment the method createFilteredIterator() is called with some
> constraints about types and of course about the window of the iterator
> (that is the line). I also tried to replace the filtered iterator with
> a window constraint with a filtered iterator of a subiterator of the
> annotation index resulting in no real improvement of performance. The
> UIMA version is 2.2.2
>
> Looking forward to some hint or directions.
>
> Peter

I'm surprised that creating a filtered iterator from a subiterator did not improve performance. A window constraint in a filtered iterator will generally yield bad performance because the filter is just that: a filter. There is no intelligence behind it. What you see in createFilteredIterator() is the iterator being advanced to the first annotation that passes the filter. This is done by simply starting at the beginning and looking at each annotation in turn. A subiterator is smarter: it uses binary search to find its starting position and should be significantly faster.

Do you have lots of different annotation types? That's also a performance killer, but we've made some improvements here in 2.3. You may wish to try the latest code from trunk and see if it gives you any improvements. The code is stable and the release has been approved; it's only a matter of days until it's generally available.

If your type organization allows it, you can also ask the CAS for a more specific index/iterator, as opposed to an iterator over all annotations. Iterating over annotations of a leaf type is generally much faster. Restricting by type is another thing you don't want to do in a filter if performance is critical.

So basically, try to get the iterator that goes into createFilteredIterator() to be as small as possible to begin with. Anything you can do not by filtering, but instead by starting with a smaller collection, should help.

--Thilo
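
To make the filter-versus-subiterator point concrete, here is a minimal sketch in plain Java. It deliberately does not use the UIMA API and is not how UIMA implements either iterator: Span, WindowLookup, byFilter, bySearch and lines are hypothetical names invented for the illustration, and the only assumption is a list of spans sorted by begin offset. It just contrasts a scan that tests every element with a binary search to the window start followed by a short scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation: just begin/end offsets.
final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
}

final class WindowLookup {
    // Filter style: start at the beginning and test every span in turn.
    static List<Span> byFilter(List<Span> sorted, int winBegin, int winEnd) {
        List<Span> out = new ArrayList<>();
        for (Span s : sorted) {
            if (s.begin >= winBegin && s.end <= winEnd) out.add(s);
        }
        return out;
    }

    // Subiterator style: binary search for the first span with
    // begin >= winBegin, then scan only until spans start past the window.
    // Assumes the list is sorted by begin offset.
    static List<Span> bySearch(List<Span> sorted, int winBegin, int winEnd) {
        int lo = 0, hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < winBegin) lo = mid + 1; else hi = mid;
        }
        List<Span> out = new ArrayList<>();
        for (int i = lo; i < sorted.size() && sorted.get(i).begin <= winEnd; i++) {
            Span s = sorted.get(i);
            if (s.end <= winEnd) out.add(s);
        }
        return out;
    }

    // Demo data (hypothetical): n adjacent spans of equal width, like lines of text.
    static List<Span> lines(int n, int width) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add(new Span(i * width, (i + 1) * width));
        return out;
    }

    public static void main(String[] args) {
        List<Span> doc = lines(2000, 10);
        System.out.println(byFilter(doc, 500, 600).size());
        System.out.println(bySearch(doc, 500, 600).size());
    }
}
```

Both methods return the same spans for a window, but byFilter touches all 2000 spans on every call, while bySearch needs only a handful of comparisons plus the spans inside the window. Repeated once per line over a large CAS, that difference adds up, which is why starting from a smaller collection helps more than adding constraints to a filter.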
