Hi Richard,

This is really useful, and it makes me much more positive about the performance of the select API than the previous benchmarks suggested. You do need to know which calling approach works best, though; for example, getting the annotation index first and calling select() on it performs better.
I added support for nanosecond CPU time to Benchmark and changed SelectBenchmark to use the user timer, which I think reflects the time actually spent in the select operations much more accurately. In this case, however, I don't think it changes the overall conclusion compared to the normal system clock version, although I haven't compared the results line by line.

You can find the branch here: https://github.com/mjunsilo/uima-uimafit/tree/mjuric/benchmark-cpu-time

Do I need to create a Jira issue for this? Let me know first whether there is any interest.

Cheers
Mario

On 4 Nov 2020, at 11.08, Richard Eckart de Castilho <r...@apache.org> wrote:

Hi all,

for those who are interested, here are a few results of benchmarking the access to annotations in the CAS using different approaches. These were done on a MacBook Pro (i7 2,2 GHz), basically under working conditions (many applications open, etc.).

Used versions:
- uimaj-core: 3.1.2-SNAPSHOT (commit 099a2e0a9)
- uimafit-core: 3.1.1-SNAPSHOT (commit 72895b5c8)

The benchmarks basically fill a CAS with random annotations (Sentence and Token type, but they do not behave like sentences/tokens usually would, i.e. they are generated at random positions and so may arbitrarily overlap with each other). All annotations start/end within a range of [0-130] and have a random length between 0 and 30. The benchmarks fill the CAS multiple times with increasing numbers of annotations and perform the selections repeatedly. If you want more details, check out the uimafit-benchmark module and run them yourself ;)

The first timing is the cumulative time spent by the benchmark. The second timing is the longest duration of a single execution.

As for insights:

* Don't look at the times in terms of absolute values; rather, consider how the time of one approach behaves relative to the time of another approach.
* I find it quite interesting that selects are slower when using JCAS.select(type) than when using JCAS.getAnnotationIndex(type).select(). I would expect both to run at the same speed.
* Contrary to previous benchmark results, we can see that (J)CAS.select() is typically faster than its uimaFIT counterpart, with a few interesting exceptions.
* Note that there is no CAS.select().overlapping() equivalent to JCasUtil.selectOverlapping (yet).

If you would like to see additional approaches measured, or if you have ideas for improving the informativeness or general setup of the benchmarks, let me know. For small changes, you could also just open a PR on GitHub against uimaFIT master.

Cheers,

-- Richard

GROUP: select
=========================

Sorted by execution time:
  1136ms /    2ms -- JCAS.select(Token.class).forEach(x -> {})
  1231ms /    3ms -- JCasUtil.select(JCAS, Token.class).forEach(x -> {})
  2679ms /    4ms -- JCAS.select(TOP.class).forEach(x -> {})
  2703ms /    4ms -- JCAS.select().forEach(x -> {})
  3803ms /    6ms -- JCasUtil.select(JCAS, TOP.class).forEach(x -> {})
  3997ms /   16ms -- JCasUtil.selectAll(JCAS).forEach(x -> {})

GROUP: select covered by
=========================

Sorted by execution time:
    84ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
   134ms /   11ms -- JCasUtil.selectCovered(Token.class, s).forEach(t -> {})
   159ms /   11ms -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
   836ms /   46ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
   842ms /   46ms -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

GROUP: select covering
=========================

Sorted by execution time:
    98ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().covering(s).forEach(t -> {})
   109ms /    6ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).select().covering(s).forEach(t -> {})
   157ms /    7ms -- CasUtil.selectCovering(tokenType, s).forEach(t -> {})
   170ms /   20ms -- JCasUtil.selectCovering(Token.class, s).forEach(t -> {})
   187ms /   14ms -- JCAS.select(Token.class).covering(s).forEach(t -> {})
   812ms /   47ms -- JCAS.select(Token.class).filter(t -> covering(t, s)).forEach(t -> {})
   862ms /   45ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).stream().filter(t -> covering(t, s)).forEach(t -> {})
  1039ms /   65ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> covering(t, s)).forEach(t -> {})

GROUP: select at
=========================

Sorted by execution time:
    31ms /    2ms -- JCAS.select(Token.class).at(s).forEach(t -> {})
    65ms /    4ms -- JCAS.select(Token.class).at(s.getBegin(), s.getEnd()).forEach(t -> {})
   109ms /   29ms -- JCasUtil.selectAt(CAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
   880ms /   41ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> colocated(t, s)).forEach(t -> {})
   936ms /   47ms -- JCAS.select(Token.class).filter(t -> colocated(t, s)).forEach(t -> {})

GROUP: select overlapping
=========================

Sorted by execution time:
   238ms /   34ms -- JCasUtil.selectOverlapping(JCAS, Token.class, s).forEach(t -> {})
   354ms /   22ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> overlapping(t, s)).forEach(t -> {})
   381ms /   24ms -- CAS.select(Token.class).filter(t -> overlapping(t, s)).forEach(t -> {})
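
For anyone who wants to reproduce the two timings reported in this thread (cumulative time and longest single execution) with either the wall clock or per-thread user CPU time, here is a minimal sketch. It is not the code from the uimafit-benchmark module or from the branch linked in this thread: the class TimingSketch, its Result type and the measure/now methods are hypothetical names made up for illustration. It assumes the JVM exposes thread CPU time through the standard ThreadMXBean and falls back to System.nanoTime() if it does not.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper for illustration; not the uimafit-benchmark Benchmark class. */
final class TimingSketch {

    /** Cumulative and longest single-run duration, both in nanoseconds. */
    static final class Result {
        final long totalNanos;
        final long maxNanos;

        Result(long totalNanos, long maxNanos) {
            this.totalNanos = totalNanos;
            this.maxNanos = maxNanos;
        }

        /** Formats like the result lines in this thread: "<total>ms / <max>ms -- <label>". */
        String format(String label) {
            return (totalNanos / 1_000_000) + "ms / " + (maxNanos / 1_000_000) + "ms -- " + label;
        }
    }

    /** Current user CPU time of this thread if available, otherwise the wall clock. */
    static long now(boolean userTime) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (userTime && bean.isCurrentThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled()) {
            return bean.getCurrentThreadUserTime();
        }
        return System.nanoTime();
    }

    /** Runs the task repeatedly and records the summed and the longest elapsed time. */
    static Result measure(int repetitions, boolean userTime, Runnable task) {
        long total = 0;
        long max = 0;
        for (int i = 0; i < repetitions; i++) {
            long start = now(userTime);
            task.run();
            long elapsed = now(userTime) - start;
            total += elapsed;
            max = Math.max(max, elapsed);
        }
        return new Result(total, max);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real benchmark would run a select operation here.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println(); // never true; only keeps the loop result in use
            }
        };
        System.out.println(measure(100, false, task).format("wall clock"));
        System.out.println(measure(100, true, task).format("user CPU time"));
    }
}
```

Note that user CPU time may have a coarser resolution than System.nanoTime() on some platforms, so very short runs can show up as 0ms in the second column.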