bsidhom commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks 
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-487639909
 
 
   To give context: this change supports an HDFS shuffle implementation, which I 
have not yet had a chance to upload. That shuffle implementation could live 
outside of Spark itself, but it does need this configuration parameter to be 
added.
   
   I've been following along with SPARK-25299, but it looks like these 
scheduler changes have not yet been introduced. This is one component of 
SPARK-25299 that we need to figure out, and I think it makes sense to have it 
broken out as its own blocking issue. (In general, that JIRA should be 
decomposed into smaller-scoped issues that can be discussed and addressed 
individually.) 
   
   @attilapiros I'm aware of the issues you point out here. If the in-Spark 
"external" shuffle service is used, you need to manually set this new property 
to `false`. Setting it to `true` would be incorrect since the shuffle manager 
you're using in this case is not _really_ "external". Unfortunately, "external" 
in this case is very overloaded and I'd love to hear suggestions for clearer 
names.
   
   This approach requires users to understand the implications of the new 
configuration, but that burden falls only on those who wish to use HDFS shuffle 
(or some other external shuffle implementation).
   
   Adding a flag is the simplest way to allow the flexibility required to 
introduce an "external-to-Spark" shuffle manager without introducing core 
interface changes. While the flag could be configured incorrectly, that is 
unlikely, and many of our configurations already require understanding before 
they are changed from their default values.
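
   To make the intent of the flag concrete, here is a minimal sketch of the 
decision it would gate. This is not the code in the PR: `MapOutput`, 
`ExecutorLossSketch`, `mapsToResubmit`, and the `shuffleIsExternalToSpark` 
parameter are hypothetical names chosen for illustration, and the real 
scheduler logic has more cases than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one finished map task: which executor wrote its shuffle output.
final class MapOutput {
    final int mapId;
    final String executorId;

    MapOutput(int mapId, String executorId) {
        this.mapId = mapId;
        this.executorId = executorId;
    }
}

// Hypothetical sketch only; none of these names come from Spark or from the PR.
final class ExecutorLossSketch {
    /**
     * Returns the ids of map tasks whose output would have to be recomputed
     * after the given executor is lost.
     *
     * If shuffle data lives outside Spark (for example on HDFS), losing the
     * executor does not lose the data, so nothing is resubmitted.
     */
    static List<Integer> mapsToResubmit(
            List<MapOutput> outputs,
            String lostExecutorId,
            boolean shuffleIsExternalToSpark) {
        List<Integer> toResubmit = new ArrayList<>();
        if (shuffleIsExternalToSpark) {
            return toResubmit; // outputs are still readable, keep them registered
        }
        for (MapOutput output : outputs) {
            if (output.executorId.equals(lostExecutorId)) {
                toResubmit.add(output.mapId);
            }
        }
        return toResubmit;
    }
}
```

   With the flag left at `false`, every map output written by the lost 
executor is treated as gone; with it set to `true`, the list is empty and no 
tasks are resubmitted.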
   
   A better long-term approach could be something like: add a method to the 
`ShuffleManager` interface that allows the implementation to indicate whether 
it can serve blocks without an executor (and also remove the hard dependency on 
BlockManager since it isn't needed in this case).
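
   A rough sketch of what such a method might look like follows. Everything 
here is an assumption for illustration: `ShuffleManagerSketch` is a stand-in 
rather than Spark's actual `ShuffleManager`, and 
`servesBlocksWithoutExecutor` is a hypothetical method name that does not 
exist in Spark today.

```java
// Hypothetical stand-in for Spark's ShuffleManager; the real interface has
// other methods and currently has no hook like this one.
interface ShuffleManagerSketch {
    /**
     * True if shuffle blocks remain readable after the executor that wrote
     * them has gone away. Defaults to false, so existing implementations
     * keep today's behavior without any changes.
     */
    default boolean servesBlocksWithoutExecutor() {
        return false;
    }
}

// Hypothetical manager that writes shuffle data to HDFS.
final class HdfsShuffleManagerSketch implements ShuffleManagerSketch {
    @Override
    public boolean servesBlocksWithoutExecutor() {
        return true;
    }
}

// Hypothetical manager that keeps shuffle data on executor-local disk.
final class LocalShuffleManagerSketch implements ShuffleManagerSketch {
}

// Hypothetical scheduler-side check that replaces the user-facing flag.
final class SchedulerSketch {
    static boolean shouldResubmitOnExecutorLoss(ShuffleManagerSketch manager) {
        return !manager.servesBlocksWithoutExecutor();
    }
}
```

   The appeal of this shape is that the shuffle implementation, not the user, 
declares the capability, so it cannot be misconfigured. The cost is the core 
interface change that the flag avoids.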
