GitHub user lianhuiwang opened a pull request:

    https://github.com/apache/spark/pull/4367

    [SPARK-5529][Core]Replace blockManager's timeoutChecking with executor's 
timeoutChecking

    The problem:
    A slave's blockManager times out and BlockManagerMasterActor removes it, 
    but the executor hosting that blockManager does not time out because its 
    akka heartbeat is still normal.
    Because a blockManager lives inside an executor, removing the blockManager 
    should also remove that executor.
    This matters especially when dynamicAllocation is enabled: 
    allocationManager listens for onBlockManagerRemoved and removes this 
    executor, but CoarseGrainedSchedulerBackend still has it in 
    executorDataMap.
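
The inconsistency can be illustrated with a toy model. This is a plain-Python sketch, not Spark code: the two dictionaries and the `expire` helper are hypothetical stand-ins for BlockManagerMasterActor's bookkeeping and the scheduler backend's executorDataMap, and the timestamps and timeout are arbitrary.

```python
# Toy model of the two independent liveness views (not Spark code).
# block_manager_last_seen stands in for BlockManagerMasterActor's records;
# executor_data_map stands in for CoarseGrainedSchedulerBackend's executorDataMap.

def expire(last_seen, now_ms, timeout_ms):
    """Remove and return the ids whose last heartbeat is older than timeout_ms."""
    dead = [k for k, t in last_seen.items() if now_ms - t > timeout_ms]
    for k in dead:
        del last_seen[k]
    return dead


block_manager_last_seen = {"exec-1": 0, "exec-2": 9_000}
executor_data_map = {"exec-1": "host-a", "exec-2": "host-b"}

# Only the block manager view is checked for timeouts, so only it shrinks.
removed = expire(block_manager_last_seen, now_ms=10_000, timeout_ms=5_000)

# Executors that lost their block manager but are still registered.
inconsistent = [e for e in removed if e in executor_data_map]
```

Here `exec-1` ends up in `inconsistent`: its block manager entry is gone while the scheduler-side map still lists it, which is the state described above.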
    
    So I think we can remove the timeout checking from BlockManagerMasterActor 
    and add executor timeout checking to HeartbeatReceiver.
    If an executor times out in HeartbeatReceiver:
    First, we tell TaskSchedulerImpl to call executorLost; TaskSchedulerImpl 
    tells dagScheduler executorLost, and dagScheduler then tells 
    blockManagerMaster to remove the BlockManager of this executor.
    Next, we tell CoarseGrainedSchedulerBackend to kill the timed-out executor 
    through the SparkContext.killExecutor API.
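
The two steps above can be sketched as follows. This is a minimal Python model under stated assumptions, not the patch itself: the class name `HeartbeatTracker`, its methods, and the two callbacks are hypothetical, and they only mirror the order described above (report executorLost first, then kill the executor).

```python
import time
from typing import Callable, Dict, List, Optional


def _now(now_ms: Optional[int]) -> int:
    """Use the supplied timestamp if given, else the monotonic clock in ms."""
    return int(time.monotonic() * 1000) if now_ms is None else now_ms


class HeartbeatTracker:
    """Hypothetical stand-in for executor timeout checking in HeartbeatReceiver."""

    def __init__(self, timeout_ms: int,
                 on_executor_lost: Callable[[str], None],
                 kill_executor: Callable[[str], None]) -> None:
        self.timeout_ms = timeout_ms
        self.on_executor_lost = on_executor_lost  # models TaskSchedulerImpl executorLost
        self.kill_executor = kill_executor        # models SparkContext.killExecutor
        self.last_seen_ms: Dict[str, int] = {}

    def heartbeat(self, executor_id: str, now_ms: Optional[int] = None) -> None:
        """Record the time of the latest heartbeat from an executor."""
        self.last_seen_ms[executor_id] = _now(now_ms)

    def expire_dead_executors(self, now_ms: Optional[int] = None) -> List[str]:
        """Drop executors whose last heartbeat is older than the timeout."""
        now = _now(now_ms)
        dead = [e for e, t in self.last_seen_ms.items()
                if now - t > self.timeout_ms]
        for executor_id in dead:
            del self.last_seen_ms[executor_id]
            # Step 1: scheduler side. In the description this cascades to
            # dagScheduler and then to removal of the executor's BlockManager.
            self.on_executor_lost(executor_id)
            # Step 2: ask the backend to kill the timed-out executor.
            self.kill_executor(executor_id)
        return dead


events: List[str] = []
tracker = HeartbeatTracker(
    timeout_ms=5_000,
    on_executor_lost=lambda e: events.append(f"executorLost:{e}"),
    kill_executor=lambda e: events.append(f"killExecutor:{e}"),
)
tracker.heartbeat("exec-1", now_ms=0)
tracker.heartbeat("exec-2", now_ms=9_000)
dead = tracker.expire_dead_executors(now_ms=10_000)
```

With a single timeout check driving both callbacks, the block manager view and the executor view cannot drift apart in the way described earlier.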
    In the future, if we remove akka and implement our own RPC, we only need to 
    replace akka; the timeout checking in HeartbeatReceiver can be kept for the 
    other RPC implementation.
    Maybe we should rename "spark.storage.blockManagerSlaveTimeoutMs" to 
    "spark.executor.slaveTimeoutMs" and 
    "spark.storage.blockManagerTimeoutIntervalMs" to 
    "spark.executor.timeoutIntervalMs"?
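
If the keys were renamed, existing deployments would still set the old names. One possible way to handle that, shown purely as an illustration, is to read the new key first and fall back to the old one. The helper `get_with_fallback` and the default value below are hypothetical; the key names are the ones quoted above, and the new names are only a proposal.

```python
from typing import Mapping


def get_with_fallback(conf: Mapping[str, str], new_key: str, old_key: str,
                      default: int) -> int:
    """Return conf[new_key], else conf[old_key], else default, as an int."""
    if new_key in conf:
        return int(conf[new_key])
    if old_key in conf:
        return int(conf[old_key])
    return default


# A deployment that still uses the old storage-scoped key.
conf = {"spark.storage.blockManagerSlaveTimeoutMs": "120000"}

timeout_ms = get_with_fallback(
    conf,
    new_key="spark.executor.slaveTimeoutMs",  # proposed name, not final
    old_key="spark.storage.blockManagerSlaveTimeoutMs",
    default=45_000,  # arbitrary illustrative default
)
```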
    @rxin @tdas @sryza @andrewor14 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lianhuiwang/spark SPARK-5529

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4367.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4367
    
----
commit aeb74b02a5521185c2cb571388b577a7af4e8da9
Author: lianhuiwang <[email protected]>
Date:   2015-02-04T12:27:33Z

    Replace blockManager's timeoutChecking of BlockManagerMasterActor with 
executor's timeoutChecking of HeartbeatReceiver

----


