vanzin commented on issue #24817: [WIP][SPARK-27963][core] Allow dynamic 
allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-505194298
 
 
   > From my understanding, that job will fail to fetch the shuffle data 
and rerun the parent stages. 
   
   I think the scheduler will just detect that the needed map outputs 
don't exist (instead of failing to fetch them), but the end result is the same: 
the parent stages will be re-run. 
   
   Note that by default the shuffle tracking timeout is effectively infinite, so 
the situation you describe wouldn't happen.
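   
   For reference, a minimal sketch of the configuration this implies. The 
property names (spark.dynamicAllocation.shuffleTracking.enabled and 
spark.dynamicAllocation.shuffleTracking.timeout) are assumptions based on what 
the feature eventually exposed, not necessarily what this WIP revision uses:
   
       import org.apache.spark.sql.SparkSession
   
       // Sketch: dynamic allocation backed by shuffle tracking instead of the
       // external shuffle service. Property names are assumed (see above).
       val spark = SparkSession.builder()
         .appName("shuffle-tracking-sketch")
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.shuffle.service.enabled", "false")
         .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
         // Effectively infinite by default; a finite value lets executors
         // holding shuffle data be released after that much idle time.
         .config("spark.dynamicAllocation.shuffleTracking.timeout", "30min")
         .getOrCreate()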
   
   > that it will not scale down aggressively if a long ETL job, with an early 
stage which is really big (say it uses 1000 executors)
   
   Yeah, that's a little trickier. You could start tracking things based on 
which stage is currently active (and the shuffles it needs) instead of per job; 
that would make this particular problem go away, at the expense of being worse 
when failures happen and you end up needing shuffle data from an earlier stage. 
But I think the current version is good enough to start playing with.
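   
   To make that tradeoff concrete, here is a rough sketch of the two tracking 
policies. All names (ExecInfo, neededPerJob, neededPerStage, idleExecutors) are 
hypothetical and do not match the PR's actual code:
   
       // An executor must be kept (not scaled down) while it hosts shuffle
       // data that the chosen policy still considers needed.
       case class ExecInfo(id: String, hostedShuffles: Set[Int])
   
       // Per-job: keep data for every shuffle of a live job. Safe on failure,
       // but a huge early stage keeps its executors pinned until the job ends.
       def neededPerJob(liveJobShuffles: Set[Int]): Set[Int] = liveJobShuffles
   
       // Per-stage: only keep data for shuffles read by the currently active
       // stage(s). Scales down sooner, but a later failure that needs an
       // earlier shuffle forces the parent stages to be recomputed.
       def neededPerStage(activeStageShuffles: Set[Int]): Set[Int] = activeStageShuffles
   
       // Executors hosting none of the needed shuffles are removal candidates.
       def idleExecutors(executors: Seq[ExecInfo], needed: Set[Int]): Seq[String] =
         executors.filter(_.hostedShuffles.intersect(needed).isEmpty).map(_.id)
   
   With per-stage tracking, the 1000 executors from the big early stage become 
removable as soon as no active stage reads that shuffle; with per-job tracking 
they stay pinned until the whole job finishes (or until the timeout expires).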
   
