[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...

2015-05-18 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-103157190 I think this has timed out. Would you close this PR for now?

2015-05-18 Thread liyezhang556520
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/3825

2015-01-05 Thread liyezhang556520
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68684212 @JoshRosen, if we want to use the supervision mechanism, we need to add another actor level as the parent of the current Master actor. I don't know if that is

2015-01-05 Thread liyezhang556520
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68828355 @markhamstra, thanks for the reminder. I'll update this PR and try to introduce supervision.

2015-01-05 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68721665 @liyezhang556520 That's been done already in the DAGScheduler. If we need another level of supervision for Master or other actors, we should consider whether these

2015-01-04 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68648457 @markhamstra From that page: Depending on the nature of the work to be supervised and the nature of the failure, the supervisor has a choice of the following
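
The Akka supervision documentation lists four choices for a supervisor: resume the child (keeping its state), restart it (discarding its state), stop it, or escalate the failure. A minimal, library-free sketch of such a policy is a decider function from exception to directive. The directive names mirror Akka's, but the types are defined locally here, and the exception-to-directive mapping is a hypothetical example, not Spark's policy.

```scala
object Supervision {
  // Local model of the four supervisor choices described in the Akka docs.
  sealed trait Directive
  case object Resume   extends Directive // keep the child's accumulated state
  case object Restart  extends Directive // discard state, start a fresh instance
  case object Stop     extends Directive // terminate the child permanently
  case object Escalate extends Directive // fail the supervisor itself

  // Hypothetical policy for a Master-like actor. Which failures are safe
  // to resume from is an assumption made for illustration only.
  val decider: Throwable => Directive = {
    case _: IllegalArgumentException => Resume  // bad input, state untouched
    case _: IllegalStateException    => Restart // state may be inconsistent
    case _: InterruptedException     => Stop
    case _                           => Escalate
  }
}
```

Case order matters: the first matching case wins, so more specific exception types must come before the catch-all.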

2015-01-04 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68650323 @JoshRosen your thinking is that Master will be in good shape even though an exception has been thrown? If you can guarantee that, then resuming the actor while
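
Whether to resume or restart matters for a stateful actor like Master. The toy simulation below uses hypothetical names (`Registry`, `deliver`) and is not Spark code: a resumed actor keeps the workers registered before the failure, while a restarted one starts empty, which is the situation in which workers would need to reconnect.

```scala
object ResumeVsRestart {
  // Hypothetical stand-in for a stateful actor: it tracks registered workers.
  final class Registry {
    var workers: Set[String] = Set.empty
    def handle(msg: String): Unit =
      if (msg.isEmpty) throw new IllegalArgumentException("empty worker id")
      else workers += msg
  }

  // Simulated supervisor: on failure it either keeps the same instance
  // (resume) or replaces it with a fresh one (restart), then continues.
  def deliver(msgs: Seq[String], restartOnFailure: Boolean): Set[String] = {
    var actor = new Registry
    for (m <- msgs) {
      try actor.handle(m)
      catch {
        case _: IllegalArgumentException =>
          if (restartOnFailure) actor = new Registry
      }
    }
    actor.workers
  }
}
```

With the messages `w1`, `w2`, an empty id, and `w3`, resuming ends with all three workers registered, while restarting ends with only `w3`.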

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68429462 I'd rather use fewer Akka features than more, since this will make it easier to replace Akka with our own RPC layer in the future. Therefore, I'd much prefer to not

2014-12-31 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68445280 It doesn't seem to me that usage of the newer Akka persistence API is called for, but it does seem that wrapping the `receive` in a try-catch is trying to do the job

2014-12-30 Thread liyezhang556520
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68424069 Hi @JoshRosen, it would be a little weird to wrap the `receive` or `receiveWithLogging` method with try-catch blocks. This also conflicts with the fault

2014-12-30 Thread liyezhang556520
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68428197 I think a better way is to use Akka's [persistence](http://doc.akka.io/docs/akka/snapshot/scala/persistence.html) feature and recover the actor's state when
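
Persistence-based recovery rests on event sourcing: the actor records each state-changing event and, after a restart, rebuilds its state by replaying them. The sketch below uses a plain in-memory journal rather than the Akka Persistence API; all names are hypothetical, and a real journal would need durable storage to survive a process restart.

```scala
object EventSourcingSketch {
  // Hypothetical events a Master-like actor might record.
  sealed trait Event
  final case class WorkerRegistered(id: String) extends Event
  final case class WorkerRemoved(id: String) extends Event

  // Immutable state; applying an event yields the next state.
  final case class State(workers: Set[String] = Set.empty) {
    def updated(e: Event): State = e match {
      case WorkerRegistered(id) => copy(workers = workers + id)
      case WorkerRemoved(id)    => copy(workers = workers - id)
    }
  }

  // Append-only, in-memory journal standing in for durable storage.
  final class Journal {
    private val events = scala.collection.mutable.ArrayBuffer.empty[Event]
    def persist(e: Event): Unit = events += e
    // Recovery: fold every recorded event over the empty initial state.
    def replay(): State = events.foldLeft(State())(_ updated _)
  }
}
```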

2014-12-29 Thread liyezhang556520
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/3825 [SPARK-4991][CORE] Worker should reconnect to Master when Master actor restart This is a follow-up JIRA to [SPARK-4989](https://issues.apache.org/jira/browse/SPARK-4991). When the Master Akka
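
One way a worker could notice that the Master actor has restarted is for Master's messages to carry an identifier that changes with each incarnation; the worker re-registers whenever it sees a new one. This is a hypothetical protocol for illustration only (`MasterMessage`, `masterIncarnation`, and `WorkerState` are not Spark names), not the mechanism this PR implements.

```scala
object ReconnectSketch {
  // Hypothetical message from Master carrying an id that changes
  // every time the Master actor is restarted.
  final case class MasterMessage(masterIncarnation: Long)

  final class WorkerState {
    private var knownIncarnation: Option[Long] = None
    var registrations = 0

    // Register on first contact and again whenever the incarnation id
    // differs from the one this worker last registered with.
    def onMasterMessage(msg: MasterMessage): Unit =
      if (!knownIncarnation.contains(msg.masterIncarnation)) {
        registrations += 1 // stand-in for sending a registration message
        knownIncarnation = Some(msg.masterIncarnation)
      }
  }
}
```

Feeding incarnation ids 1, 1, 2, 2 triggers two registrations: the initial one and one after the simulated restart.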

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68241590 [Test build #24859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24859/consoleFull) for PR 3825 at commit

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68246470 [Test build #24859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24859/consoleFull) for PR 3825 at commit

2014-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68246474 Test PASSed. Refer to this link for build results (access rights to CI server needed):

2014-12-29 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68284859 Wouldn't it be better to ensure that actors like Master and DAGScheduler never die due to uncaught exceptions?

2014-12-29 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68286943 More specifically, I guess I'm suggesting that we wrap the `receive` and `receiveWithLogging` methods of our actors with try-catch blocks to log any exceptions
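
The try-catch wrapping suggested here can be sketched without Akka. In Akka's classic API an actor's `receive` is a `PartialFunction[Any, Unit]`, so a wrapper can delegate `isDefinedAt` and catch exceptions inside `apply`. This is a minimal sketch assuming only non-fatal exceptions should be logged and swallowed; `SafeReceive`, `logExceptions`, and `SafeReceiveDemo` are hypothetical names, not Spark's code.

```scala
import scala.util.control.NonFatal

object SafeReceive {
  // Same shape as Akka's Actor.Receive type alias.
  type Receive = PartialFunction[Any, Unit]

  // Hypothetical helper: wraps a message handler so that non-fatal
  // exceptions are passed to `log` instead of escaping the actor.
  def logExceptions(handler: Receive)(log: Throwable => Unit): Receive =
    new PartialFunction[Any, Unit] {
      def isDefinedAt(msg: Any): Boolean = handler.isDefinedAt(msg)
      def apply(msg: Any): Unit =
        try handler(msg)
        catch { case NonFatal(e) => log(e) }
    }
}

object SafeReceiveDemo {
  // Feeds messages through a wrapped handler; returns the number handled
  // successfully and the messages of the exceptions that were logged.
  def run(msgs: Seq[Any]): (Int, List[String]) = {
    var handled = 0
    val logged = scala.collection.mutable.ListBuffer.empty[String]
    val handler: SafeReceive.Receive = {
      case "boom"    => throw new IllegalStateException("bad message")
      case _: String => handled += 1
    }
    val safe = SafeReceive.logExceptions(handler)(e => logged += e.getMessage)
    // Messages the handler does not accept are skipped here
    // (Akka would route them to `unhandled` instead).
    msgs.filter(safe.isDefinedAt).foreach(safe)
    (handled, logged.toList)
  }
}
```

Matching on `NonFatal` lets errors such as `OutOfMemoryError` still propagate. Catching and continuing inside the actor has the same effect as a supervisor choosing to resume: the actor keeps whatever state it had when the exception was thrown.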