vanzin commented on issue #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-505194298

> From my understanding that job will be failed to fetch the shuffle data and rerun the parent stages.

I think that the scheduler will just detect that the needed map outputs don't exist (instead of failing to fetch data), but the end result is the same: parent stages will be re-run. Note that by default the shuffle timeout is basically "infinite", so the situation you describe wouldn't happen.

> that it will not scale down aggressively if a long ETL job, with an early stage which is really big (say it uses 1000 executors)

Yeah, that's a little trickier. You could start tracking things based on which stage is currently active (and the shuffle it needs), instead of per-job, and that would make this particular problem go away, at the expense of being worse when failures happen and you end up needing the shuffle data from an earlier stage. But I think the current version is good enough to start playing with.
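For anyone who wants to play with this, here is a rough sketch of how an application might opt in once the feature lands. The config keys (`spark.dynamicAllocation.shuffleTracking.enabled` / `spark.dynamicAllocation.shuffleTracking.timeout`) are assumptions based on how the shipped feature ended up being exposed, not taken from this WIP diff, so the exact names and defaults may differ; also note dynamic allocation only actually takes effect on a real cluster manager (YARN, standalone, K8s), not in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleTrackingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-tracking-sketch")
      // Local master just so the snippet runs standalone; dynamic allocation is
      // silently ignored in local mode, so use a real cluster manager to see it.
      .setIfMissing("spark.master", "local[2]")
      .set("spark.dynamicAllocation.enabled", "true")
      // No external shuffle service: shuffle files stay on the executors themselves.
      .set("spark.shuffle.service.enabled", "false")
      // Assumed config name: let the allocation manager track which executors
      // currently hold shuffle data and avoid removing them.
      .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      // Assumed config name: bound how long executors holding shuffle data are kept.
      // The default is effectively infinite, which is why the premature fetch-failure
      // scenario quoted above shouldn't happen out of the box.
      .set("spark.dynamicAllocation.shuffleTracking.timeout", "30min")

    val sc = new SparkContext(conf)
    try {
      // Trivial job with one shuffle, just to exercise the tracking path.
      val counts = sc.parallelize(1 to 1000)
        .map(i => (i % 10, 1))
        .reduceByKey(_ + _)
        .collect()
      println(counts.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

With a bounded timeout like the one above, executors holding shuffle data from old stages would eventually be reclaimed, which is one way to mitigate the "early huge stage pins 1000 executors" case discussed here without moving to per-stage tracking.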
