[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-03 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410183701
 
 
   Ok, so I had to know, so I went ahead and did benchmarks if we do the other 
way and sort at persist time.
   
   no rollup opportunity:
   ```
   Benchmark(rollup)  (rowsPerSegment)  (schema)  Mode  
Cnt   ScoreError  Units
   IndexPersistBenchmark.persistV9  true 75000 basic  avgt  
 25  499315.212 ± 154036.971  us/op
   IndexPersistBenchmark.persistV9 false 75000 basic  avgt  
 25  449792.742 ± 28218.504  us/op
   IndexPersistBenchmark.persistV9 false (ordered)   75000 basic  avgt  
 25  508051.563 ±  63033.662  us/op
   
   all size: [3038874] bytes.
   ```
   moderate rollup opportunity:
   
   ```
   Benchmark(rollup)  (rowsPerSegment)  (schema)  Mode  
Cnt   Score   Error  Units
   IndexPersistBenchmark.persistV9  true 75000 basic  avgt  
 25  406840.576 ± 20732.769  us/op
   IndexPersistBenchmark.persistV9 false 75000 basic  avgt  
 25  431725.214 ± 18793.693  us/op
   IndexPersistBenchmark.persistV9 false (ordered)   75000 basic  avgt  
 25  494056.572 ± 34396.770  us/op
   
   rollup: size [2285574] bytes.
   no-rollup:  size [2741399] bytes.
   ordered-no-rollup:  size [2516639] bytes.
   ```
   more rollup:
   
   ```
   Benchmark(rollup)  (rowsPerSegment)  (schema)  Mode  
Cnt   Score   Error  Units
   IndexPersistBenchmark.persistV9  true 75000 basic  avgt  
 25  338251.339 ± 22031.319  us/op
   IndexPersistBenchmark.persistV9 false 75000 basic  avgt  
 25  443272.327 ± 25099.425  us/op
   IndexPersistBenchmark.persistV9 false (ordered)   75000 basic  avgt  
 25  552234.263 ± 41889.207  us/op
   
   rollup: size [1755456] bytes.
   no-rollup:  size [2741017] bytes.
   ordered-no-rollup:  size [2346649] bytes.
   ```
   
   I don't have strong feelings about the best way to do this, persist 
performance cost looks to be on the range of 15-20% slower here. Maybe better 
to sort at persist time to not risk impact to query performance?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-03 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410153475
 
 
   I ran some additional benchmarks after realizing that the generated rows 
from previous benchmarks were rows with no opportunity for actual rollup to 
occur (all segments were approximately the same size for the numbers above).
   
   Here are timeseries benches
   
   with moderate rollup opportunity:
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   Score   
Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25  663840.128 ± 
26363.127  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25  679784.179 ± 
81577.842  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1 
rollup75   basic.A  avgt   25   62446.589 ±  2224.296  
us/op
   
   no-rollup:  size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup: size [2206430] bytes.
   ```
   
   and heavy rollup potential:
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   Score   
Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25  653316.845 ± 
31964.338  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25  769623.711 ± 
12299.182  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1 
rollup75   basic.A  avgt   256545.777 ±   607.087  
us/op
   
   no-rollup:  size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup: size [237206] bytes.
   ```
   
   and TopN:
   moderate rollup:
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt   Score  
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.A   10  avgt   25  893805.325 ± 
9592.710  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.A   10  avgt   25  898036.822 ± 
8052.554  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.A   10  avgt   25   86100.936 ± 
2844.073  us/op
   
   no-rollup:  size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup: size [2206430] bytes.
   
   ```
   
   heavy rollup:
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt   Score   
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.A   10  avgt   25  888967.034 ± 
25098.293  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.A   10  avgt   25  987568.305 ± 
50955.718  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.A   10  avgt   258820.929 ±   
699.516  us/op
   
   no-rollup:  size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup: size [237206] bytes.
   ```
   
   It would appear that performance difference is more notable when the `Deque` 
are deeper, at least for topN and timeseries, since previous benchmarks were 
basically comparing flat maps with the same number of keys and single element 
`Deque`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-02 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410053497
 
 
   @gianm Cool, I'll go ahead and make a PR as soon as I get some tests 
finished up. I'm not certain the cause of slower group-by performance, group by 
v2 doesn't seem to have any special additional interaction with facts holder. 
Is this worth looking into further?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-01 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-409771369
 
 
   I did some digging and the more straightforward solution to this problem is 
to just store the rows sorted in `PlainFactsHolder` similar to how 
`RollupFactsHolder` is currently doing, changing from `ConcurrentMap>` to `ConcurrentMap>` and using the `IncrementalIndexRowComparator`. I 
think the alternative is finding all calls to `FactsHolder.iterator` and 
`FactsHolder.keySet` used for persist and replacing those with a version that 
sorts `IncrementalIndexRow` with the comparator (unless I'm missing another 
simpler place where this could be done). 
   
   I did some benchmarking and there does appear to be some cost to doing the 
sorting up front, I'm not sure on the range of what is an acceptable amount of 
overhead.
   
   ```
   Benchmark  (rollupOption)  Mode  Cnt 
  Score   Error  Units
   IncrementalIndexRowTypeBenchmark.normalFloats  rollup  avgt   40 
 19.965 ± 1.780  us/op
   IncrementalIndexRowTypeBenchmark.normalFloats   no-rollup  avgt   40 
 10.596 ± 1.055  us/op
   IncrementalIndexRowTypeBenchmark.normalFloats   ordered-no-rollup  avgt   40 
 21.925 ± 1.323  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs   rollup  avgt   40 
 19.108 ± 1.286  us/op
   IncrementalIndexRowTypeBenchmark.normalLongsno-rollup  avgt   40 
 10.107 ± 1.042  us/op
   IncrementalIndexRowTypeBenchmark.normalLongsordered-no-rollup  avgt   40 
 21.967 ± 1.406  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings rollup  avgt   40 
 20.489 ± 2.442  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings  no-rollup  avgt   40 
  9.352 ± 0.105  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings  ordered-no-rollup  avgt   40 
 20.986 ± 0.508  us/op
   ```
   
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)(schemaAndQuery)  (threshold)  Mode  CntScore   
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75 basic.A   10  avgt   25   950647.742 ± 
20969.530  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.numericSort   10  avgt   25   230487.526 ± 
29340.615  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75  basic.alphanumericSort   10  avgt   25   218782.138 ± 
 6203.484  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75 basic.A   10  avgt   25   945842.924 ± 
12074.015  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.numericSort   10  avgt   25   222019.610 ± 
 3486.365  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75  basic.alphanumericSort   10  avgt   25   223015.114 ± 
 3130.184  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75 basic.A   10  avgt   25  1347085.823 ± 
12655.001  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.numericSort   10  avgt   25   204926.129 ± 
 4846.150  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75  basic.alphanumericSort   10  avgt   25   201050.213 ± 
 6559.034  us/op
   ```
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   
 Score   Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25   
921919.453 ± 25357.840  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.timeFilterNumeric  avgt   25
69240.969 ±  1403.393  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75  basic.timeFilterAlphanumeric  avgt   25   
152974.422 ±  2181.950  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75basic.timeFilterByInterval  avgt   25
16752.936 ±   406.768  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25   
906129.041 ± 19497.575  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.timeFilterNumeric  avgt   25
66989.537 ±