t oo created SPARK-30610:
----------------------------

             Summary: Spark worker graceful shutdown
                 Key: SPARK-30610
                 URL: https://issues.apache.org/jira/browse/SPARK-30610
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 2.4.4
            Reporter: t oo


This is not about Spark Streaming, just regular batch jobs submitted with 
spark-submit that may, for example, read a large CSV (100+ GB) and then write 
it out as Parquet. In an autoscaling cluster it would be nice to be able to 
scale down (i.e. terminate) EC2 instances without slowing down active Spark 
applications.

For example:
1. Start a Spark cluster with 8 EC2 instances.
2. Submit 6 Spark apps.
3. One Spark app completes, so 5 apps are still running.
4. The cluster can now scale down by 1 EC2 instance (to save money), but we 
don't want the apps running on the soon-to-be-terminated instance to have to 
restart their CSV reads, RDD processing steps, etc. from the beginning on a 
different instance's executors. Instead there should be a 'graceful shutdown' 
command so that the 8th instance accepts no new work (i.e. no new executors 
are started on it) but finishes the executors already launched on it, then 
exits the worker process; after that the instance can be terminated. A rough 
sketch of such a hook is shown after this list.
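
As a rough illustration only (these are not existing Spark APIs; GracefulWorker, 
tryLaunchExecutor and decommission are hypothetical names), a worker-side 
decommission hook could look something like this, and stop-slave.sh could then 
trigger something like decommission() instead of killing the PID directly:

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch -- illustrative names, not existing Spark classes.
class GracefulWorker {

  // Once set, the worker refuses requests to launch new executors.
  private val decommissioning = new AtomicBoolean(false)

  // Executors currently running on this worker, keyed by executor id.
  private val runningExecutors = new ConcurrentHashMap[String, Thread]()

  // Called when the master asks to launch an executor; rejected while draining.
  def tryLaunchExecutor(id: String, work: Runnable): Boolean = {
    if (decommissioning.get()) return false
    val t = new Thread(new Runnable {
      def run(): Unit =
        try work.run() finally runningExecutors.remove(id)
    }, s"executor-$id")
    runningExecutors.put(id, t)
    t.start()
    true
  }

  // Stop accepting new executors, wait for the running ones to finish,
  // then exit the worker process so the EC2 instance can be terminated.
  def decommission(pollMillis: Long = 5000L): Unit = {
    decommissioning.set(true)
    while (!runningExecutors.isEmpty) {
      TimeUnit.MILLISECONDS.sleep(pollMillis)
    }
    sys.exit(0)
  }
}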


I thought stop-slave.sh could do this, but it looks like it just kills the worker PID.



