Github user harishreedharan commented on the pull request:
https://github.com/apache/spark/pull/2024#issuecomment-52582448
I should have made this clearer. The idea is that for long-running processes like
streaming, you'd want the AM to come back up and reuse the same executors, so it
can recover the blocks still held in executor memory. Many streaming systems like
Flume cannot really replay data once it has been taken out, and even for those
that can, the data may expire before it is re-read, so some of it could be lost.
This patch is the first step in a series toward that goal.
The next step is to get a restarted AM to find the executors that are still
running. My current plan is to use HDFS to keep track of where the executors are
running, and then to communicate with them via Akka to fetch the list of blocks
each one holds; a rough sketch of that idea is below.
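Just to make the shape of that concrete, here is a hypothetical sketch, not a
real implementation: none of these names (`ExecutorRegistry`, `GetBlockIds`,
`RecoveredAM`, the registry path) exist in Spark, and it assumes classic Akka
actors (the Akka 2.3-era API Spark currently uses) plus the Hadoop FileSystem
client:

```scala
import java.nio.charset.StandardCharsets

import scala.concurrent.Await
import scala.concurrent.duration._

import akka.actor.{ActorSelection, ActorSystem}
import akka.pattern.ask
import akka.util.Timeout
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Message a recovered AM would send to each executor actor (hypothetical).
case object GetBlockIds

object ExecutorRegistry {
  // Hypothetical well-known location for the registry file.
  private val registryPath = new Path("/spark/am-registry/executors")

  // As each executor registers, append its Akka URL to the HDFS registry.
  def record(fs: FileSystem, executorAkkaUrl: String): Unit = {
    val out =
      if (fs.exists(registryPath)) fs.append(registryPath)
      else fs.create(registryPath)
    try out.write((executorAkkaUrl + "\n").getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }

  // After an AM restart, read the recorded executor URLs back from HDFS.
  def recover(fs: FileSystem): Seq[String] = {
    val in = fs.open(registryPath)
    try scala.io.Source.fromInputStream(in).getLines().toList
    finally in.close()
  }
}

object RecoveredAM {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val system = ActorSystem("recovered-am")
    implicit val timeout: Timeout = Timeout(30.seconds)

    // Ask each surviving executor which blocks it still holds in memory.
    for (url <- ExecutorRegistry.recover(fs)) {
      val executor: ActorSelection = system.actorSelection(url)
      val blockIds =
        Await.result((executor ? GetBlockIds).mapTo[Seq[String]], 30.seconds)
      println(s"$url reports ${blockIds.size} in-memory blocks")
    }
    system.shutdown()  // Akka 2.3-era shutdown call
  }
}
```

The details (registry format, failure handling, stale entries) are all open;
this is only meant to show the division of labor between HDFS for discovery
and Akka for the block-list query.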
I plan to expose this via SparkSubmit as the last step, once we have all of
the other pieces in place.
You are right, we should add this in cluster mode too - I will take a look
at updating it.