GitHub user mayuehappy opened a pull request:

    https://github.com/apache/spark/pull/19233

    [SPARK-22008][Streaming] Spark Streaming Dynamic Allocation auto-fix
maxNumExecutors

    In Spark Streaming dynamic resource allocation (DRA), the metric used to
decide whether to add or remove executors is the ratio of batch processing time
to batch duration (R), and the parameter
"spark.streaming.dynamicAllocation.maxExecutors" sets the maximum number of
executors. Currently this does not work well with Spark Streaming, for several
reasons:
    (1) If the maximum number of executors we actually need is 10 but
"spark.streaming.dynamicAllocation.maxExecutors" is set to 15, we waste 5
executors.
    (2) If the number of topic partitions changes, the number of partitions of
the KafkaRDD, and hence the number of tasks in a stage, changes too, so the
maximum number of executors we need also changes. The maxExecutors cap should
therefore track the number of tasks.
    The goal of this JIRA is to adjust maxNumExecutors automatically: a
SparkListener fires when a stage is submitted, first computes the number of
executors the stage needs, and then updates maxNumExecutors.
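
    A minimal sketch in Scala of what such a listener could look like. This is
not the patch itself: deriving coresPerExecutor from "spark.executor.cores" and
the updateMaxExecutors hook are illustrative assumptions about how the cap
might be recomputed on stage submission.

    import org.apache.spark.SparkConf
    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted}

    // Sketch only: recompute the executor cap from each submitted stage's
    // task count. updateMaxExecutors is a hypothetical hook standing in for
    // however the patch pushes the new cap into the streaming allocator.
    class MaxExecutorListener(conf: SparkConf) extends SparkListener {
      // Assume one task per core, the usual Spark default.
      private val coresPerExecutor = conf.getInt("spark.executor.cores", 1)

      override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
        val numTasks = event.stageInfo.numTasks
        // Executors needed to run every task of the stage in parallel;
        // the stage can never use more than this.
        val needed = math.ceil(numTasks.toDouble / coresPerExecutor).toInt
        updateMaxExecutors(needed)
      }

      private def updateMaxExecutors(n: Int): Unit = {
        // e.g. raise or lower spark.streaming.dynamicAllocation.maxExecutors
      }
    }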

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mayuehappy/spark my-spark2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19233
    
----
commit b687b5897c7d8a11d5ace8ce2667774f3b6c4fd6
Author: mayue13 <[email protected]>
Date:   2017-09-14T07:07:58Z

    SparkStreaming DRA auto set maxNumExecutor

----

