[GitHub] spark issue #17750: [SPARK-4899][MESOS] Support for checkpointing on Coarse ...

lins05 Fri, 12 May 2017 09:32:00 -0700

Github user lins05 commented on the issue:

    https://github.com/apache/spark/pull/17750
  
    > Do you then think it would be a viable option to enable it by default on 
Coarse grained and have it not used in Fine-grained.
    
    SGTM, especially considering fine-grained mode is already deprecated. 
    
    > Could you expand on this a bit more, I assume we could maintain the state 
of the tasks similar to how driver state is maintained in 
MesosClusterScheduler, and accordingly update state at crucial points, like 
start.
    
    I don't think it's an easy task at all, because the Spark driver is not 
designed to recover from a crash. 
    
    The state in the MesosClusterScheduler is pretty simple. It's just a REST 
server that accepts requests from clients and launches Spark drivers on their 
behalf. It only needs to persist its Mesos framework ID, because it needs to 
re-register with the Mesos master under the same framework ID if it's restarted. 
In the current implementation, MesosClusterScheduler uses ZooKeeper as the 
persistent storage. Aside from that, the MesosClusterScheduler has no other 
stateful information.
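    
    To make the "persist the framework ID, reuse it on restart" idea concrete, 
here is a minimal sketch. It is not the MesosClusterScheduler code: the 
`FrameworkIdStore` class and the `register` function are hypothetical, a local 
JSON file stands in for ZooKeeper, and a random UUID stands in for the ID the 
Mesos master would assign on first registration.

```python
import json
import os
import tempfile
import uuid
from typing import Optional


class FrameworkIdStore:
    """Hypothetical stand-in for the ZooKeeper-backed store: one small JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Optional[str]:
        # Nothing persisted yet means this is the very first start.
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return json.load(f)["framework_id"]

    def save(self, framework_id: str) -> None:
        # Write to a temp file and rename, so a crash mid-write
        # cannot leave a half-written record behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"framework_id": framework_id}, f)
        os.replace(tmp, self.path)


def register(store: FrameworkIdStore) -> str:
    """Simulated registration: reuse the persisted ID if one exists."""
    framework_id = store.load()
    if framework_id is None:
        # Assumption: a random UUID simulates the ID assigned by the master.
        framework_id = str(uuid.uuid4())
        store.save(framework_id)
    return framework_id


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "framework_id.json")
    before_restart = register(FrameworkIdStore(path))
    # A "restarted" scheduler builds a fresh store object over the same path.
    after_restart = register(FrameworkIdStore(path))
    print(before_restart == after_restart)
```

    The point of the sketch is how little there is to persist: a single ID is 
enough for the restarted process to pick up where it left off.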
    
    The Spark driver is totally different, because it holds a lot of state: 
the job/stage/task info, executor info, and the catalog that holds temporary 
views, to name a few. All of that is kept in the driver's memory and is lost 
whenever the driver crashes. So it doesn't make sense to set 
`failover_timeout` at all, because the Spark driver doesn't support failover.



