Very nice, thank you so much :-)
> On 21 Oct 2015, at 18:26, Richard Eckart de Castilho <[email protected]> wrote:
>
> Hi,
>
> 1 uses the UIMA indexes which I believe use a binary search, so it should
> be something like O(log n).
>
> 2 is in principle O(n) but since it does a linear scan from the beginning
> and stops when no further annotations may be found, it practice O(n)
> should be the upper bound when called for annotations towards the end of
> the document.
>
> 3 is fastest for repeated use. It should be O(n) for creating
> the index and then uses hashmap lookups.
>
> So 1 and 3 are better than two.
>
> If you need speed and need coverage information a lot, 3 should be the best.
>
> 1 and 2 are more convenient for coding.
>
> If you use plain UIMA and have type priorities set up, then using an
> iterator over sentences and a subiterator over tokens is likely to
> be better than 3 because it doesn't need the initial scan that 3 does.
>
> I'm not aware that anybody did extensive performance comparisons here.
> Some are being done in
> org.apache.uima.fit.util.JCasUtilTest.testSelectCoverRandom()
> which compares 1 and 2. Here a few lines from the test output (mind to
> increase the
> ITERATIONS variable if you try):
>
> ...
> Speed up factor 5.50 [naive:11 optimized:2 diff:9]
> Speed up factor 6.67 [naive:20 optimized:3 diff:17]
> Speed up factor 4.00 [naive:16 optimized:4 diff:12]
> Speed up factor 2.50 [naive:30 optimized:12 diff:18]
> Speed up factor 7.00 [naive:35 optimized:5 diff:30]
> Speed up factor 5.63 [naive:45 optimized:8 diff:37]
> Speed up factor 7.78 [naive:70 optimized:9 diff:61]
> Speed up factor 8.09 [naive:89 optimized:11 diff:78]
> ...
>
> Cheers,
>
> -- Richard
>
>> On 21.10.2015, at 17:07, Erik Fäßler <[email protected]> wrote:
>>
>> Hi all,
>>
>> I’m wondering about the performance differences between
>>
>> 1) JCasUtil.selectCovered(JCas, Class<T>, AnnotationFS),
>> 2) JCasUtil.selectCovered(JCas, Class<T>, int, int) and
>> 3) JCasUtil.indexCovered(JCas, Class<T>, Class<S>)
>>
>> It is clear that 3) iterates once through the CAS and just returns a map.
>> Once this is done, map access is swift.
>>
>> The Javadoc of 2) states that it is slower than 1).
>> 3) states that it is preferable to 2).
>>
>> Questions:
>> Is 3) also preferable over 2) when there is only one covering annotation or
>> is the performance of 2) and 3) roughly equal then?
>> Main question: Is 3) also quicker than 1) if there are many covering
>> annotations?
>>
>> Use case: I want to iterate through all sentences in paragraphs. Normally, I
>> would use subiterators(), but the known type priority issue could be a
>> problem for me. Should I just use 1)? Or would I still benefit from 3) if I
>> have more than one paragraph?
>>
>> Thank you very much!
>>
>> Best,
>>
>> Erik
>