clintropolis edited a comment on issue #6066: Sorting rows when rollup is disabled
URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410153475
 
 
   I ran some additional benchmarks after realizing that the generated rows 
from previous benchmarks were rows with no opportunity for actual rollup to 
occur (all segments were approximately the same size for the numbers above).
   
   Here are timeseries benches
   
   with moderate rollup opportunity:
   ```
   Benchmark                                        (numSegments)     (rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt       Score       Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000           basic.A  avgt   25  663840.128 ± 26363.127  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000           basic.A  avgt   25  679784.179 ± 81577.842  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000           basic.A  avgt   25   62446.589 ±  2224.296  us/op
   
   no-rollup:          size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup:             size [2206430] bytes.
   ```
   
   and heavy rollup potential:
   ```
   Benchmark                                        (numSegments)     (rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt       Score       Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000           basic.A  avgt   25  653316.845 ± 31964.338  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000           basic.A  avgt   25  769623.711 ± 12299.182  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000           basic.A  avgt   25    6545.777 ±   607.087  us/op
   
   no-rollup:          size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup:             size [237206] bytes.
   ```
   
   and TopN:
   moderate rollup:
   ```
   Benchmark                                  (numSegments)     (rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt       Score      Error  Units
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000           basic.A           10  avgt   25  893805.325 ± 9592.710  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000           basic.A           10  avgt   25  898036.822 ± 8052.554  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000           basic.A           10  avgt   25   86100.936 ± 2844.073  us/op
   
   no-rollup:          size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup:             size [2206430] bytes.
   
   ```
   
   heavy rollup:
   ```
   Benchmark                                  (numSegments)     (rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt       Score       Error  Units
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000           basic.A           10  avgt   25  888967.034 ± 25098.293  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000           basic.A           10  avgt   25  987568.305 ± 50955.718  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000           basic.A           10  avgt   25    8820.929 ±   699.516  us/op
   
   no-rollup:          size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup:             size [237206] bytes.
   ```
   
   It would appear that the performance difference is more notable when the `Deque`s are deeper, at least for topN and timeseries, since the previous benchmarks were essentially comparing flat maps with the same number of keys and single-element `Deque`s.
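
   As a rough illustration of what "deeper" means here (a sketch with made-up names, not Druid's actual `IncrementalIndex` internals): with rollup disabled, rows sharing the same (timestamp, dims) key can't be merged into one aggregated row, so each key ends up mapping to a deque of row indices that a scan must fully walk.
   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.Map;
   import java.util.TreeMap;
   
   // Hypothetical sketch of a "facts" table: key -> row indices. With
   // rollup disabled, duplicate keys pile up as deeper deques; a scan
   // still visits every row, so deeper deques mean more work per key.
   public class FactsSketch {
       // TreeMap keeps keys time-sorted, loosely mimicking "ordered-no-rollup"
       static Map<String, Deque<Integer>> buildFacts(String[] keys) {
           Map<String, Deque<Integer>> facts = new TreeMap<>();
           for (int row = 0; row < keys.length; row++) {
               facts.computeIfAbsent(keys[row], k -> new ArrayDeque<>()).add(row);
           }
           return facts;
       }
   
       public static void main(String[] args) {
           // "t0|a" repeats: without rollup it becomes a depth-2 deque
           Map<String, Deque<Integer>> facts =
                   buildFacts(new String[]{"t0|a", "t0|a", "t0|b", "t1|a"});
           int totalRows = facts.values().stream().mapToInt(Deque::size).sum();
           System.out.println(facts.size() + " keys, " + totalRows + " rows");
       }
   }
   ```
   The earlier benchmarks had every row under a distinct key (depth-1 deques everywhere), which is why they looked like a flat-map comparison.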
   
   Size savings will likely vary quite wildly based on dimension order, and will correlate with how effective rollup would be if it were enabled at the default millisecond granularity. In this case, with a few low-cardinality dimensions and 1-10k events per timestamp, sizes were roughly 19-25% smaller.
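
   For reference, those savings can be recomputed directly from the segment sizes reported above (a trivial check, not part of the benchmark harness):
   ```java
   // Recompute the ordered-no-rollup size savings from the reported byte counts.
   public class SizeSavings {
       static double savingsPct(long noRollup, long orderedNoRollup) {
           return 100.0 * (noRollup - orderedNoRollup) / noRollup;
       }
   
       public static void main(String[] args) {
           // byte counts copied verbatim from the benchmark output above
           System.out.printf("moderate rollup: %.1f%%%n", savingsPct(22387432L, 18195470L));
           System.out.printf("heavy rollup:    %.1f%%%n", savingsPct(22383561L, 16900327L));
       }
   }
   ```
   This works out to about 18.7% for the moderate-rollup segments and 24.5% for the heavy-rollup ones.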
