Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/7431#issuecomment-122062974
So now that I tried the new code path (which works), I'm a little skeptical
that sending a message back to the driver is really needed. The driver already
removes the executor when the RPC connection is reset:
```
15/07/16 12:30:15 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 4, vanzin-st1-3.vpc.cloudera.com): ExecutorLostFailure (executor 3 lost)
15/07/16 12:30:15 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:36279] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
15/07/16 12:30:15 INFO DAGScheduler: Executor lost: 3 (epoch 0)
15/07/16 12:30:15 INFO BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
15/07/16 12:30:15 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, vanzin-st1-3.vpc.cloudera.com, 37469)
15/07/16 12:30:15 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
```
The new message ends up being a no-op:
```
15/07/16 12:30:18 ERROR YarnClientSchedulerBackend: Asked to remove non-existent executor 3
```
See `CoarseGrainedSchedulerBackend::DriverEndpoint::removeExecutor`.
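To make the no-op concrete, here's a toy standalone model of the two removal paths; the object and field names only echo the real class, this is not the actual Spark source:

```scala
import scala.collection.mutable

// Hypothetical model, NOT Spark source: the RPC-disconnect path and the
// explicit message race to remove the same executor from the driver's map.
object RemoveExecutorNoOp {
  private val executorDataMap = mutable.Map[String, String]()

  // Path 1: the RPC layer sees the connection reset and removes the executor.
  def onDisconnected(executorId: String): Unit =
    removeExecutor(executorId, reason = "remote RPC client disassociated")

  // Shared removal logic: a second call for the same executor finds nothing,
  // which is exactly the "Asked to remove non-existent executor" log above.
  def removeExecutor(executorId: String, reason: String): Unit =
    executorDataMap.remove(executorId) match {
      case Some(_) => println(s"Removed executor $executorId ($reason)")
      case None    => println(s"Asked to remove non-existent executor $executorId")
    }

  def main(args: Array[String]): Unit = {
    executorDataMap("3") = "vanzin-st1-3.vpc.cloudera.com:37469"
    onDisconnected("3")                               // connection reset wins the race
    removeExecutor("3", "explicit shutdown message")  // arrives later: no-op
  }
}
```

Whichever path runs first wins; the later one just logs the error above.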
So I'm a little confused about how this change is fixing anything. The bug
talks about "repeated re-execution of stages" - isn't that the correct way of
handling executor failures? You retry tasks or stages depending on what the
failure is.
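Roughly the distinction I mean, as a toy sketch; the case names and the limit of 4 (the default of `spark.task.maxFailures`) are simplifications, not a real API:

```scala
sealed trait Failure
case object TaskCrash   extends Failure // e.g. the task's executor died
case object FetchFailed extends Failure // shuffle output lived on a dead executor

// Retry individual tasks up to a limit; a fetch failure instead triggers
// re-execution of the stage that produced the missing shuffle output.
def handleFailure(f: Failure, attempts: Int, maxAttempts: Int = 4): String = f match {
  case TaskCrash if attempts < maxAttempts => "re-queue the task on another executor"
  case TaskCrash                           => "abort the stage after too many failures"
  case FetchFailed                         => "resubmit the parent map stage, then retry"
}
```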
Perhaps the real issue you ran into is something like #6750 instead?