Thanks Richard,

I’ll fix the commit messages and open a regular PR later today, which will also 
include some extra variants that I added. You can then let me know what 
additional changes are needed.

Cheers
Mario


> On 6 Nov 2020, at 01.36, Richard Eckart de Castilho <r...@apache.org> wrote:
>
>
> Hi,
>
> thanks for the changes. Measuring at a higher resolution makes sense and
> measuring user time instead of system time also makes sense.
>
> I tweaked the benchmark a bit more, added the option to have a max duration 
> for each batch, and made a few other changes.
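>
> For reference, here is a minimal sketch of how per-thread user time can be 
> measured with the JDK's ThreadMXBean. The class and method names are made up 
> for this sketch; this is not necessarily how the benchmarking module does it:
>
> ```java
> import java.lang.management.ManagementFactory;
> import java.lang.management.ThreadMXBean;
>
> public class UserTimeTimer {
>     // User-mode CPU time consumed by `task` on the current thread, in
>     // nanoseconds; -1 if the JVM does not support thread CPU timing.
>     static long measureUserTimeNanos(Runnable task) {
>         ThreadMXBean bean = ManagementFactory.getThreadMXBean();
>         if (!bean.isCurrentThreadCpuTimeSupported()) {
>             return -1;
>         }
>         long before = bean.getCurrentThreadUserTime();
>         task.run();
>         return bean.getCurrentThreadUserTime() - before;
>     }
>
>     public static void main(String[] args) {
>         long t = measureUserTimeNanos(() -> {
>             double x = 0;
>             for (int i = 0; i < 1_000_000; i++) {
>                 x += Math.sqrt(i);
>             }
>         });
>         System.out.println("user time: " + t + " ns");
>     }
> }
> ```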
>
> It would be great if we could just do a PR with your changes, but the
> commit messages are supposed to have a specific format including a
> Jira issue number. So at least the commit messages would need to be
> rewritten or the PR would need to be squash-merged. The format of the
> commit messages should look as follows:
>
> ```
> [UIMA-6291] Improve uimaFIT benchmarking module
>
> - Remove unused dependency
> - ... other changes
> - ... other changes
> ```
>
>> On 4. Nov 2020, at 17:34, Mario Juric <mario.ju...@cactusglobal.com> wrote:
>>
>> Sorted by execution time:
>> 19308000ns / 7667000ns -- WARM-UP
>> 97388000ns / 3862000ns -- 
>> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), 
>> s.getEnd()).forEach(t -> {})
>> 114012000ns / 7585000ns -- 
>> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
>> 139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), 
>> s.getEnd()).forEach(t -> {})
>> 158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), 
>> s.getEnd()).forEach(t -> {})
>> 189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
>> 233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t 
>> -> {})
>> 737271000ns / 30204000ns -- 
>> JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, 
>> s)).forEach(t -> {})
>> 846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> 
>> coveredBy(t, s)).forEach(t -> {})
>
> So I didn't commit my changes anywhere yet either because I made them on top 
> of your branch for the moment ;)
>
> But here are a few runs with the setup I did. I tried to make it so that the 
> order of the methods is more stable,
> but as you can see they still switch around a bit.
>
> == Run 1
> AVG: 0.432ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> 
> {})
> AVG: 0.467ms (10 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.474ms (10 seconds total) -- 
> JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.475ms (10 seconds total) -- selectCovered(JCAS, Token.class, 
> s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.487ms (10 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), 
> s.getEnd()).forEach(t -> {})
> AVG: 0.492ms (10 seconds total) -- 
> JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.661ms (13 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, 
> s)).forEach(t -> {})
> AVG: 0.701ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> 
> coveredBy(t, s)).forEach(t -> {})
>
> == Run 2
> AVG: 0.429ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> 
> {})
> AVG: 0.463ms (10 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.465ms (10 seconds total) -- selectCovered(JCAS, Token.class, 
> s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.475ms (10 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), 
> s.getEnd()).forEach(t -> {})
> AVG: 0.477ms (10 seconds total) -- 
> JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.479ms (10 seconds total) -- 
> JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.654ms (13 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, 
> s)).forEach(t -> {})
> AVG: 0.700ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> 
> coveredBy(t, s)).forEach(t -> {})
>
> == Run 3
> AVG: 0.431ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> 
> {})
> AVG: 0.463ms (10 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.464ms (10 seconds total) -- selectCovered(JCAS, Token.class, 
> s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.471ms (10 seconds total) -- 
> JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.488ms (10 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), 
> s.getEnd()).forEach(t -> {})
> AVG: 0.492ms (10 seconds total) -- 
> JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.653ms (13 seconds total) -- 
> JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, 
> s)).forEach(t -> {})
> AVG: 0.698ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> 
> coveredBy(t, s)).forEach(t -> {})
>
> The choice of a specific approach really needs to take scalability into 
> account. E.g.:
>
> ======================================================
> JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> ======================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.004 | max:      4.441 | median:      0.011 | 
> cumulative:    211.239 | fail:    0]
> [    100/  10000 | min:      0.052 | max:      0.456 | median:      0.057 | 
> cumulative:    681.855 | fail:    0]
> [   1000/   2493 | min:      1.723 | max:      5.437 | median:      1.891 | 
> cumulative:   4927.217 | fail:    0 | time limit exceeded]
> [  10000/     37 | min:    127.892 | max:    146.936 | median:    135.230 | 
> cumulative:   5024.777 | fail:    0 | time limit exceeded]
>
> vs.
>
> ======================================================================
> JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
> ======================================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.005 | max:      2.201 | median:      0.007 | 
> cumulative:     90.482 | fail:    0]
> [    100/  10000 | min:      0.275 | max:      0.774 | median:      0.277 | 
> cumulative:   2858.870 | fail:    0]
> [   1000/    190 | min:     25.235 | max:     28.488 | median:     26.280 | 
> cumulative:   4996.646 | fail:    0 | time limit exceeded]
> [  10000/      2 | min:   2698.091 | max:   2705.892 | median:   2701.992 | 
> cumulative:   5403.983 | fail:    0 | time limit exceeded]
>
> The micro-average which is currently in the first column of the summary 
> report doesn't really reflect how the second approach here scales in 
> comparison to the first.
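>
> To put numbers on that: if the micro-average is simply the total time divided 
> by the total number of iterations (my assumption about how that first column 
> is computed), then plugging in the cumulative times and iteration counts from 
> the two tables above even puts the filter variant slightly ahead, although 
> its batch median at size 10000 is roughly 20x worse:
>
> ```java
> public class MicroAverageDemo {
>     // Micro-average: total time divided by total number of iterations.
>     static double microAverage(double[] batchTotalsMs, int[] batchIterations) {
>         double time = 0;
>         long iters = 0;
>         for (int i = 0; i < batchTotalsMs.length; i++) {
>             time += batchTotalsMs[i];
>             iters += batchIterations[i];
>         }
>         return time / iters;
>     }
>
>     public static void main(String[] args) {
>         // Cumulative times (ms) and iteration counts from the tables above.
>         double coveredBy = microAverage(
>                 new double[] { 211.239, 681.855, 4927.217, 5024.777 },
>                 new int[] { 10000, 10000, 2493, 37 });
>         double filter = microAverage(
>                 new double[] { 90.482, 2858.870, 4996.646, 5403.983 },
>                 new int[] { 10000, 10000, 190, 2 });
>         System.out.printf("coveredBy: %.3f ms/iter, filter: %.3f ms/iter%n",
>                 coveredBy, filter);
>     }
> }
> ```
>
> Both land below 0.5 ms per iteration, so the single number hides the scaling 
> difference entirely.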
>
> Btw. the time limit was configured to be ~5 secs for each batch (each row in 
> the summary).
>
> Considering the results for t-coveredBy-s, I am slightly surprised that 
> t-covering-s is, judging by these results, not much slower:
>
> =====================================================
> JCAS.select(Token.class).covering(s).forEach(t -> {})
> =====================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.004 | max:      3.467 | median:      0.009 | 
> cumulative:    167.222 | fail:    0]
> [    100/  10000 | min:      0.059 | max:      0.800 | median:      0.060 | 
> cumulative:    638.144 | fail:    0]
> [   1000/   2254 | min:      1.995 | max:     18.008 | median:      2.194 | 
> cumulative:   4978.578 | fail:    0 | time limit exceeded]
> [  10000/     33 | min:    150.032 | max:    158.695 | median:    155.340 | 
> cumulative:   5121.281 | fail:    0 | time limit exceeded]
>
> That might be because the generated CASes span at most 130 offset positions, 
> leading to a quite high density of annotations,
> and thus t-coveredBy-s may not be able to profit much from its ability to 
> seek fast and abort early. Probably the benchmark
> should be adjusted to expand the offset space as more annotations get added, 
> yielding a lower annotation density.
>
> ... or there is some measurement/calculation bug ...
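>
> As a sketch of that adjustment (made up for illustration; the class name and 
> constants are assumptions, not the benchmark's actual generator), the offset 
> space could grow linearly with the number of annotations so that the density 
> stays constant:
>
> ```java
> import java.util.Random;
>
> public class ConstantDensitySpans {
>     static final int OFFSETS_PER_ANNOTATION = 13; // assumed target density
>     static final int MAX_SPAN_LENGTH = 5;         // assumed max span length
>
>     // Generates n [begin, end) spans over an offset space that grows with n,
>     // so the annotation density is independent of n.
>     static int[][] generateSpans(int n, long seed) {
>         Random rnd = new Random(seed);
>         int offsetSpace = n * OFFSETS_PER_ANNOTATION;
>         int[][] spans = new int[n][2];
>         for (int i = 0; i < n; i++) {
>             int begin = rnd.nextInt(offsetSpace);
>             spans[i][0] = begin;
>             spans[i][1] = begin + 1 + rnd.nextInt(MAX_SPAN_LENGTH);
>         }
>         return spans;
>     }
> }
> ```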
>
> Cheers,
>
> -- Richard

