t oo created SPARK-30610:
----------------------------

             Summary: spark worker graceful shutdown
                 Key: SPARK-30610
                 URL: https://issues.apache.org/jira/browse/SPARK-30610
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 2.4.4
            Reporter: t oo
I am not talking about Spark Streaming! Just regular batch jobs using spark-submit that may read a large CSV (100+ GB) and then write it out as Parquet. In an autoscaling cluster it would be nice to be able to scale down (i.e. terminate) EC2 instances without slowing down active Spark applications. For example:

1. Start a Spark cluster with 8 EC2 instances.
2. Submit 6 Spark apps.
3. 1 Spark app completes, so 5 apps are still running.
4. The cluster can scale down by 1 EC2 instance (to save $), but we don't want the apps running on the soon-to-be-terminated instance to have to restart their CSV reads, RDD processing steps, etc. from the beginning on other instances' executors.

Instead, I want a 'graceful shutdown' command so that the 8th EC2 instance does not accept new spark-submit apps (i.e. no new executors are started on it) but finishes the ones that have already launched on it, then exits the worker pid. The EC2 instance can then be terminated. I thought stop-slave.sh could do this, but it looks like it just kills the pid.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
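The requested behavior amounts to a small state machine: a worker that is "draining" refuses new executors but keeps running the ones it already has, and the instance may only be terminated once those finish. A minimal sketch in Python to illustrate the idea (all class and method names here are hypothetical; Spark 2.4.4 has no such API, and stop-slave.sh kills the worker pid outright):

```python
class DrainableWorker:
    """Models the proposed 'graceful shutdown' for a standalone worker.

    Hypothetical illustration only: Spark's actual Worker is Scala code
    and exposes no drain/decommission command in 2.4.4.
    """

    def __init__(self):
        self.draining = False
        self.executors = set()  # ids of executors currently running here

    def launch_executor(self, executor_id):
        # A draining worker rejects new executors, so the master would
        # have to place them on other (non-draining) workers instead.
        if self.draining:
            return False
        self.executors.add(executor_id)
        return True

    def executor_finished(self, executor_id):
        self.executors.discard(executor_id)

    def drain(self):
        # Step 1 of graceful shutdown: stop accepting new work,
        # but do NOT kill in-flight executors.
        self.draining = True

    def can_terminate(self):
        # Step 2: the EC2 instance is safe to terminate only once the
        # worker is draining and every in-flight executor has finished.
        return self.draining and not self.executors
```

The key difference from stop-slave.sh is the `can_terminate` check: termination waits on in-flight executors instead of killing the pid immediately.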