squito commented on issue #20640: [SPARK-19755][Mesos] Blacklist is always 
active for MesosCoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/20640#issuecomment-537238923
 
 
   My reluctance to merge this in the past is that without SPARK-24567, it 
kinda seems like a step backwards to me.  Before this change, if Mesos can't 
start a mesos-task (aka a spark-executor) on a node, we'd stop trying to place 
more executors there.
   
   The problem with the existing code is that the node is rejected 
indefinitely; I understand why you want to bring in the timeout.  But you're 
only bringing in the timeout for those cases where the executors start 
successfully but tasks still consistently fail, e.g. if there is one bad disk 
out of 10.  OTOH, if there is something else fundamentally wrong with the 
node that prevents any executor from starting (e.g. missing libs), then the 
behavior becomes worse.
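   
   To make the "rejected indefinitely" vs. "rejected with a timeout" 
distinction concrete for launch failures, here is a rough sketch.  This is not 
Spark's or Mesos's actual code; `NodeTracker`, `max_failures` and `timeout_s` 
are made-up names, and the failure threshold is just an assumption for 
illustration.

```python
# Hypothetical sketch only: none of these names exist in Spark or Mesos.
class NodeTracker:
    """Tracks executor-launch failures per node, with an optional expiry."""

    def __init__(self, max_failures=2, timeout_s=None):
        self.max_failures = max_failures  # failures before a node is rejected
        self.timeout_s = timeout_s        # None means "reject indefinitely"
        self.failures = {}                # node -> list of failure timestamps

    def record_launch_failure(self, node, now):
        """Remember that an executor failed to start on `node` at time `now`."""
        self.failures.setdefault(node, []).append(now)

    def is_rejected(self, node, now):
        """True if offers from `node` should currently be declined."""
        times = self.failures.get(node, [])
        if self.timeout_s is not None:
            # With a timeout, only failures that have not expired still count.
            times = [t for t in times if now - t < self.timeout_s]
        return len(times) >= self.max_failures
```

   With `timeout_s=None` a node that hits the threshold stays rejected forever; 
with a timeout it becomes eligible again once the old failures expire.  If 
launch failures aren't tracked at all, a node with e.g. missing libs is never 
rejected, which is the case I'm worried about.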
   
   I don't use Mesos at all myself, so I dunno whether Mesos somehow shields 
you from one type of problem, which would still make this worthwhile.  But 
without that confidence, I didn't feel great about merging this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

