[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
clintropolis commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410183701 Ok, so I had to know, so I went ahead and did benchmarks if we do the other way and sort at persist time. no rollup opportunity: ``` Benchmark(rollup) (rowsPerSegment) (schema) Mode Cnt ScoreError Units IndexPersistBenchmark.persistV9 true 75000 basic avgt 25 499315.212 ± 154036.971 us/op IndexPersistBenchmark.persistV9 false 75000 basic avgt 25 449792.742 ± 28218.504 us/op IndexPersistBenchmark.persistV9 false (ordered) 75000 basic avgt 25 508051.563 ± 63033.662 us/op all size: [3038874] bytes. ``` moderate rollup opportunity: ``` Benchmark(rollup) (rowsPerSegment) (schema) Mode Cnt Score Error Units IndexPersistBenchmark.persistV9 true 75000 basic avgt 25 406840.576 ± 20732.769 us/op IndexPersistBenchmark.persistV9 false 75000 basic avgt 25 431725.214 ± 18793.693 us/op IndexPersistBenchmark.persistV9 false (ordered) 75000 basic avgt 25 494056.572 ± 34396.770 us/op rollup: size [2285574] bytes. no-rollup: size [2741399] bytes. ordered-no-rollup: size [2516639] bytes. ``` more rollup: ``` Benchmark(rollup) (rowsPerSegment) (schema) Mode Cnt Score Error Units IndexPersistBenchmark.persistV9 true 75000 basic avgt 25 338251.339 ± 22031.319 us/op IndexPersistBenchmark.persistV9 false 75000 basic avgt 25 443272.327 ± 25099.425 us/op IndexPersistBenchmark.persistV9 false (ordered) 75000 basic avgt 25 552234.263 ± 41889.207 us/op rollup: size [1755456] bytes. no-rollup: size [2741017] bytes. ordered-no-rollup: size [2346649] bytes. ``` I don't have strong feelings about the best way to do this, persist performance cost looks to be on the range of 15-20% slower here. Maybe better to sort at persist time to not risk impact to query performance? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
clintropolis commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410153475 I ran some additional benchmarks after realizing that the generated rows from previous benchmarks were rows with no opportunity for actual rollup to occur (all segments were approximately the same size for the numbers above). Here are timeseries benches with moderate rollup opportunity: ``` Benchmark(numSegments) (rollupSchema) (rowsPerSegment) (schemaAndQuery) Mode Cnt Score Error Units TimeseriesBenchmark.querySingleIncrementalIndex 1 no-rollup75 basic.A avgt 25 663840.128 ± 26363.127 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup75 basic.A avgt 25 679784.179 ± 81577.842 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 rollup75 basic.A avgt 25 62446.589 ± 2224.296 us/op no-rollup: size [22387432] bytes. ordered-no-rollup: size [18195470] bytes. rollup: size [2206430] bytes. ``` and heavy rollup potential: ``` Benchmark(numSegments) (rollupSchema) (rowsPerSegment) (schemaAndQuery) Mode Cnt Score Error Units TimeseriesBenchmark.querySingleIncrementalIndex 1 no-rollup75 basic.A avgt 25 653316.845 ± 31964.338 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup75 basic.A avgt 25 769623.711 ± 12299.182 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 rollup75 basic.A avgt 256545.777 ± 607.087 us/op no-rollup: size [22383561] bytes. ordered-no-rollup: size [16900327] bytes. rollup: size [237206] bytes. ``` and TopN: moderate rollup: ``` Benchmark (numSegments) (rollupSchema) (rowsPerSegment) (schemaAndQuery) (threshold) Mode Cnt Score Error Units TopNBenchmark.querySingleIncrementalIndex 1 no-rollup 75 basic.A 10 avgt 25 893805.325 ± 9592.710 us/op TopNBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup 75 basic.A 10 avgt 25 898036.822 ± 8052.554 us/op TopNBenchmark.querySingleIncrementalIndex 1 rollup 75 basic.A 10 avgt 25 86100.936 ± 2844.073 us/op no-rollup: size [22387432] bytes. ordered-no-rollup: size [18195470] bytes. rollup: size [2206430] bytes. ``` heavy rollup: ``` Benchmark (numSegments) (rollupSchema) (rowsPerSegment) (schemaAndQuery) (threshold) Mode Cnt Score Error Units TopNBenchmark.querySingleIncrementalIndex 1 no-rollup 75 basic.A 10 avgt 25 888967.034 ± 25098.293 us/op TopNBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup 75 basic.A 10 avgt 25 987568.305 ± 50955.718 us/op TopNBenchmark.querySingleIncrementalIndex 1 rollup 75 basic.A 10 avgt 258820.929 ± 699.516 us/op no-rollup: size [22383561] bytes. ordered-no-rollup: size [16900327] bytes. rollup: size [237206] bytes. ``` It would appear that performance difference is more notable when the `Deque` are deeper, at least for topN and timeseries, since previous benchmarks were basically comparing flat maps with the same number of keys and single element `Deque`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
clintropolis commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410053497 @gianm Cool, I'll go ahead and make a PR as soon as I get some tests finished up. I'm not certain the cause of slower group-by performance, group by v2 doesn't seem to have any special additional interaction with facts holder. Is this worth looking into further? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
clintropolis commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-409771369 I did some digging and the more straightforward solution to this problem is to just store the rows sorted in `PlainFactsHolder` similar to how `RollupFactsHolder` is currently doing, changing from `ConcurrentMap>` to `ConcurrentMap>` and using the `IncrementalIndexRowComparator`. I think the alternative is finding all calls to `FactsHolder.iterator` and `FactsHolder.keySet` used for persist and replacing those with a version that sorts `IncrementalIndexRow` with the comparator (unless I'm missing another simpler place where this could be done). I did some benchmarking and there does appear to be some cost to doing the sorting up front, I'm not sure on the range of what is an acceptable amount of overhead. ``` Benchmark (rollupOption) Mode Cnt Score Error Units IncrementalIndexRowTypeBenchmark.normalFloats rollup avgt 40 19.965 ± 1.780 us/op IncrementalIndexRowTypeBenchmark.normalFloats no-rollup avgt 40 10.596 ± 1.055 us/op IncrementalIndexRowTypeBenchmark.normalFloats ordered-no-rollup avgt 40 21.925 ± 1.323 us/op IncrementalIndexRowTypeBenchmark.normalLongs rollup avgt 40 19.108 ± 1.286 us/op IncrementalIndexRowTypeBenchmark.normalLongsno-rollup avgt 40 10.107 ± 1.042 us/op IncrementalIndexRowTypeBenchmark.normalLongsordered-no-rollup avgt 40 21.967 ± 1.406 us/op IncrementalIndexRowTypeBenchmark.normalStrings rollup avgt 40 20.489 ± 2.442 us/op IncrementalIndexRowTypeBenchmark.normalStrings no-rollup avgt 40 9.352 ± 0.105 us/op IncrementalIndexRowTypeBenchmark.normalStrings ordered-no-rollup avgt 40 20.986 ± 0.508 us/op ``` ``` Benchmark (numSegments) (rollupSchema) (rowsPerSegment)(schemaAndQuery) (threshold) Mode CntScore Error Units TopNBenchmark.querySingleIncrementalIndex 1 no-rollup 75 basic.A 10 avgt 25 950647.742 ± 20969.530 us/op TopNBenchmark.querySingleIncrementalIndex 1 no-rollup 75 basic.numericSort 10 avgt 25 230487.526 ± 29340.615 us/op TopNBenchmark.querySingleIncrementalIndex 1 no-rollup 75 basic.alphanumericSort 10 avgt 25 218782.138 ± 6203.484 us/op TopNBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup 75 basic.A 10 avgt 25 945842.924 ± 12074.015 us/op TopNBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup 75 basic.numericSort 10 avgt 25 222019.610 ± 3486.365 us/op TopNBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup 75 basic.alphanumericSort 10 avgt 25 223015.114 ± 3130.184 us/op TopNBenchmark.querySingleIncrementalIndex 1 rollup 75 basic.A 10 avgt 25 1347085.823 ± 12655.001 us/op TopNBenchmark.querySingleIncrementalIndex 1 rollup 75 basic.numericSort 10 avgt 25 204926.129 ± 4846.150 us/op TopNBenchmark.querySingleIncrementalIndex 1 rollup 75 basic.alphanumericSort 10 avgt 25 201050.213 ± 6559.034 us/op ``` ``` Benchmark(numSegments) (rollupSchema) (rowsPerSegment) (schemaAndQuery) Mode Cnt Score Error Units TimeseriesBenchmark.querySingleIncrementalIndex 1 no-rollup75 basic.A avgt 25 921919.453 ± 25357.840 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 no-rollup75 basic.timeFilterNumeric avgt 25 69240.969 ± 1403.393 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 no-rollup75 basic.timeFilterAlphanumeric avgt 25 152974.422 ± 2181.950 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 no-rollup75basic.timeFilterByInterval avgt 25 16752.936 ± 406.768 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup75 basic.A avgt 25 906129.041 ± 19497.575 us/op TimeseriesBenchmark.querySingleIncrementalIndex 1 ordered-no-rollup75 basic.timeFilterNumeric avgt 25 66989.537 ±