[GitHub] [incubator-druid] clintropolis edited a comment on issue #8578: parallel broker merges on fork join pool

2019-11-05 Thread GitBox
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-549966959
 
 
   >@clintropolis thanks for all the benchmarks, I haven't had the opportunity 
to look at the new developments yet but get back to reviewing this week.
   
   Yeah, no problem, and thanks for asking the hard questions that made me collect 
them; the PR is in a better state because of them :metal:. The production part of 
the code hasn't really changed much in the last couple of weeks, other than a few 
lines to change the behavior of the parallelism computing method, and mostly 
changes to the default values.
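   To make the idea concrete, here is a hedged sketch of what a parallelism heuristic driven by `availableProcessors()` might look like. The class name, method name, and the 50% headroom factor are all illustrative, not the actual Druid computation:

   ```java
   // Hypothetical sketch: deriving a query's merge parallelism from the number
   // of available processors, capped by a configured per-query maximum.
   // The 0.5 headroom factor is an illustrative assumption, not Druid's code.
   public class ParallelismSketch {
       static int computeParallelism(int availableProcessors, int maxQueryParallelism) {
           // leave headroom so a single query's merge can't claim every core
           int suggested = Math.max(1, (int) Math.ceil(availableProcessors * 0.5));
           return Math.min(suggested, maxQueryParallelism);
       }

       public static void main(String[] args) {
           int processors = Runtime.getRuntime().availableProcessors();
           System.out.println(computeParallelism(processors, 4));
       }
   }
   ```

   The benchmark sensitivity discussed below comes from exactly this kind of dependence on `availableProcessors()`, which is why making it configurable matters.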
   
   >Given the sensitivity of performance to availableProcessors() returned 
value, it might be good to make that area a bit configurable if not already. I 
will hopefully offer more specific suggestion when reviewing again.
   
   This stuff should all be controllable via configs: 
`druid.processing.merge.pool.parallelism` controls the FJP size, and 
`druid.merge.pool.defaultMaxQueryParallelism` controls an individual query's 
maximum parallelism (the latter can also be set per query via the 
`parallelMergeParallelism` query context key).
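   For reference, these could be set in a broker's `runtime.properties` along these lines (the values below are purely illustrative, not recommendations):

   ```properties
   # fork-join pool size used for parallel merges (illustrative value)
   druid.processing.merge.pool.parallelism=16

   # default cap on an individual query's merge parallelism (illustrative value)
   druid.merge.pool.defaultMaxQueryParallelism=4
   ```

   with the per-query override supplied in the query context, e.g. `"context": {"parallelMergeParallelism": 2}`.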
   
   In a follow-up I think I will also add more information to the cluster tuning 
guide docs, since I now have a pretty good idea of how this implementation 
performs, which we can use to advise operators.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] clintropolis edited a comment on issue #8578: parallel broker merges on fork join pool

2019-11-05 Thread GitBox
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-54976
 
 
   ### simulated heavy load
   I collected another round of data using the same benchmarks as my 'more 
realistic worst case' comment, but this time plotting what happens when a large 
number of queries all start within a 500ms spread, which is probably a more 
typical heavy load than the large spike of simultaneous queries simulated in the 
last set of results.
   
   In this scenario, parallel merges outperform the same-threaded serial merges 
up to a much higher concurrency than in the concurrent spike model. This is at 
least partially because each individual thread can make a better estimate of 
utilization than is possible in the spike model.
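   For illustration, a staggered-arrival load of this shape could be sketched as follows. The class and method names are hypothetical and this is not the benchmark code, just the arrival pattern: start times spread uniformly across a 500ms window rather than a single simultaneous spike:

   ```java
   import java.util.Random;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   // Illustrative sketch: simulated 'queries' whose start times are spread
   // uniformly across a 500ms window instead of all starting at once.
   public class ArrivalSketch {
       // deterministic start offsets in [0, spreadMillis)
       static long[] startOffsets(int numQueries, long spreadMillis, long seed) {
           Random random = new Random(seed);
           long[] offsets = new long[numQueries];
           for (int i = 0; i < numQueries; i++) {
               offsets[i] = (long) (random.nextDouble() * spreadMillis);
           }
           return offsets;
       }

       public static void main(String[] args) throws Exception {
           long[] offsets = startOffsets(16, 500, 42);
           ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
           CountDownLatch done = new CountDownLatch(offsets.length);
           for (long offset : offsets) {
               // a real benchmark would kick off a merge here instead
               scheduler.schedule(done::countDown, offset, TimeUnit.MILLISECONDS);
           }
           done.await();
           scheduler.shutdown();
           System.out.println("all queries finished");
       }
   }
   ```

   The contrast with the spike model is just the scheduling: a spike would submit every task with a zero delay.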
   
   # 'small' sequences
   
![thread-groups-typical-distribution-small-500ms](https://user-images.githubusercontent.com/1577461/68194001-09a57c80-ff69-11e9-9325-d3e70b4853d2.gif)
   
   # 'moderately large' sequences
   
![thread-groups-typical-distribution-moderately-large-500ms](https://user-images.githubusercontent.com/1577461/68193956-f7c3d980-ff68-11e9-9d29-d3ee688d8cb6.gif)
   
   
   # overall average
   
![thread-groups-typical-distribution-average-500ms](https://user-images.githubusercontent.com/1577461/68194026-132ee480-ff69-11e9-9601-921a98188feb.gif)
   
   I think future work could focus on making the concurrent spike behavior 
degrade a bit more gracefully through a variety of means, but I find these 
results to be 'good enough' for now. 
   
   Anyone want to see any other scenarios?





[GitHub] [incubator-druid] clintropolis edited a comment on issue #8578: parallel broker merges on fork join pool

2019-11-04 Thread GitBox
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-549253205
 
 
   ### more realistic worst case
   I reworked the JMH thread-based benchmark to use thread groups, to examine 
what happens in a more realistic scenario, with the newly renamed 
`ParallelMergeCombiningSequenceThreadedBenchmark`. I find this benchmark a fair 
bit less scary than the previous 'worst case' benchmarks, which focused on a 
nearly impossible scenario because I really wanted to dig in and see where and 
how the wheels fell off.
   
   This benchmark models a more 'typical' heavy load, where the majority of 
queries have smaller result sets with shorter blocking times and a smaller 
subset have larger result sets with longer initial blocking times. By using 
thread groups we can look at performance for these 'classes' of queries as load 
increases. 
   
   This set was collected with a ratio of 1 'moderately large' query for every 
8 'small' queries, where 'moderately large' is defined as input sequence row 
counts of 50k-75k rows and blocking for 1-2.5 seconds before yielding results, 
and 'small' is defined as input sequence row counts of 500-10k and blocking for 
50-200ms. Keep in mind while reviewing the results that I collected data at a 
significantly higher level of parallelism than I would expect a 16-core machine 
to be realistically configured to handle: I would probably configure an m5.8xl 
with no more than 64 HTTP threads, but collected data points up to 128 
concurrent sequences being processed just to see where things went.
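   A hypothetical generator for that workload mix might look like the sketch below. The row-count and blocking-time ranges come from the description above; the class itself and the deterministic 1-in-9 selection are illustrative assumptions, not the benchmark's actual code:

   ```java
   import java.util.Random;

   // Illustrative workload mix: one 'moderately large' query for every eight
   // 'small' queries, using the ranges described in the comment above.
   public class WorkloadMix {
       // deterministic 1:8 ratio; a real benchmark might randomize this instead
       static boolean isModeratelyLarge(int queryIndex) {
           return queryIndex % 9 == 8;
       }

       // returns {rowCount, initialBlockMillis} for the given query
       static int[] querySpec(int queryIndex, Random random) {
           if (isModeratelyLarge(queryIndex)) {
               int rows = 50_000 + random.nextInt(25_001);      // 50k-75k rows
               int blockMillis = 1_000 + random.nextInt(1_501); // 1-2.5s blocking
               return new int[]{rows, blockMillis};
           }
           int rows = 500 + random.nextInt(9_501);              // 500-10k rows
           int blockMillis = 50 + random.nextInt(151);          // 50-200ms blocking
           return new int[]{rows, blockMillis};
       }

       public static void main(String[] args) {
           int large = 0;
           for (int i = 0; i < 90; i++) {
               if (isModeratelyLarge(i)) {
                   large++;
               }
           }
           System.out.println("moderately large queries out of 90: " + large);
           // → moderately large queries out of 90: 10
       }
   }
   ```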
   
   The first plot shows merge time (y axis) growth as concurrency (x axis) 
increases, animated to show the differences for a given number of input 
sequences (analogous to cluster size).
   
   
![thread-groups-typical-distribution-1-8-small](https://user-images.githubusercontent.com/1577461/68105759-6125e880-fe94-11e9-86a4-cae8fb52b92b.gif)
   
   Note that the x axis is the _total_ concurrency count, not the number of 
threads in this particular group. Also worth pointing out: the degradation of 
performance happens at a significantly higher level of concurrency than in the 
previous (unrealistic) worst case. In terms of characteristics, though, it does 
share some aspects with the previous plots, such as 8 input sequences being a 
lot more performant than, say, 64, and the parallel approach falling behind the 
same-threaded serial merge approach after a certain threshold.
   
   
   The larger 'queries' tell a similar tale:
   
   
![thread-groups-typical-distribution-1-8-moderately-large](https://user-images.githubusercontent.com/1577461/68106055-4142f480-fe95-11e9-897b-57c7cf8b4ace.gif)
   
   The differences here when the parallel merge sequence crosses the threshold 
look a fair bit less dramatic to me than for the 'small' sequences, but keep in 
mind that the 'big jump' in the small sequences only amounts to a few hundred 
milliseconds, so it's not quite as dramatic as it appears.
   
   The final plot shows the overall average between both groups:
   
   
![thread-groups-typical-distribution-1-8-average](https://user-images.githubusercontent.com/1577461/68105727-46ec0a80-fe94-11e9-9854-aaae9d8405c7.gif)
   
   I find this one a bit less useful than the other two plots, but included it 
anyway for completeness.
   





[GitHub] [incubator-druid] clintropolis edited a comment on issue #8578: parallel broker merges on fork join pool

2019-10-31 Thread GitBox
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-548611095
 
 
   >Thank you for the detailed benchmark results! It looks great but I wonder 
how the default configuration works under more realistic load. For example, it 
would be more realistic if there are like 80% of light queries and 20% of heavy 
queries that have a shorter delay and a larger delay, respectively.
   
   This sounds good. I think I went a bit hard in the benchmarks I have 
presented so far for this PR, in terms of targeting worst cases that aren't 
super realistic, which probably looks a lot scarier than a typical heavy load 
will appear in practice. The existing worst case benchmarks basically depict 
what happens if a bunch of queries with moderate to large result sets _all 
happen simultaneously_, and even more, _all simultaneously have work to do 
instead of some of them blocking waiting for input_, which should very rarely 
(if ever) happen in the real world.
   
   I will throw together another benchmark to plot out a more realistic heavy 
load case and see how that looks.




