[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2608 [SPARK-1860] Worker better app cleanup First contribution to the project, so I apologize for any significant errors. This PR addresses [SPARK-1860]. The application directories are now

[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
GitHub user mccheah reopened a pull request: https://github.com/apache/spark/pull/2608 [SPARK-1860] Worker better app cleanup First contribution to the project, so I apologize for any significant errors. This PR addresses [SPARK-1860]. The application directories are now

[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/2608 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-09-30 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2609 [SPARK-1860] More conservative app directory cleanup. First contribution to the project, so I apologize for any significant errors. This PR addresses [SPARK-1860]. The application directories
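The "more conservative" cleanup described here can be pictured as a TTL check on each application directory's modification time. The sketch below is a hypothetical plain-Java illustration of that idea (class and method names are invented), not the actual Worker code:

```java
import java.io.File;

// Hypothetical sketch of TTL-based app-directory cleanup; not Spark's Worker code.
public class AppDirCleaner {
    // True if the directory has not been modified within the TTL window.
    public static boolean isExpired(File dir, long ttlMillis, long nowMillis) {
        return nowMillis - dir.lastModified() > ttlMillis;
    }

    // Removes application directories whose last modification exceeds the TTL,
    // returning how many were deleted.
    public static int cleanOldAppDirs(File workDir, long ttlMillis) {
        int removed = 0;
        File[] appDirs = workDir.listFiles();
        if (appDirs == null) return removed;
        long now = System.currentTimeMillis();
        for (File appDir : appDirs) {
            if (appDir.isDirectory() && isExpired(appDir, ttlMillis, now)) {
                deleteRecursively(appDir);
                removed++;
            }
        }
        return removed;
    }

    private static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        f.delete();
    }
}
```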

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18316820 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -22,6 +22,7 @@ import java.text.SimpleDateFormat import java.util.Date

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-02 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18365112 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -233,8 +244,15 @@ private[spark] class Worker( } else

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-02 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18365510 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala --- @@ -174,7 +168,7 @@ private[spark] class ExecutorRunner

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-03 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18407971 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -191,6 +194,8 @@ private[spark] class Worker( changeMaster

[GitHub] spark pull request: SPARK-3794 [CORE] Building spark core fails du...

2014-10-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2662#issuecomment-58056000 Sorry about that. I think Jenkins should be catching these kinds of build failures though. Jenkins should attempt to build the project against multiple versions

[GitHub] spark pull request: SPARK-3794 [CORE] Building spark core fails du...

2014-10-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2662#issuecomment-58057718 Fair enough. The bottom line is that we could be more explicit about this. Perhaps something in the documentation?

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2828 [SPARK-3736] Workers reconnect when disassociated from the master. Before, if the master node is killed and restarted, the worker nodes would not attempt to reconnect to the Master. Therefore
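The reconnect behavior this PR describes amounts to retrying registration a bounded number of times after a disassociation event. A minimal sketch with hypothetical names, not the actual Worker implementation:

```java
import java.util.function.Supplier;

// Illustrative sketch of bounded re-registration retries after a master
// disassociation; names are hypothetical, not the actual Worker code.
public class ReconnectPolicy {
    // Tries attemptRegistration up to maxAttempts times and reports how many
    // attempts were needed, or -1 if every attempt failed.
    public static int retryRegistration(Supplier<Boolean> attemptRegistration, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attemptRegistration.get()) {
                return attempt;
            }
            // A real worker would sleep (with backoff) between attempts here.
        }
        return -1;
    }
}
```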

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59408043 One remark is that there are no automated tests in this commit for now. I was unsuccessful in setting up TestKit to emulate a worker and master sending messages

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18977288 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -341,7 +341,11 @@ private[spark] class Master( case Some

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18978981 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18986188 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18986702 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18986941 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18988742 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59803881 @JoshRosen agreed with @ash211, this is really good. Are there any actual comments on the PR, or can it be merged? =)

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59824518 The PR doesn't seem to be related to the unit tests that failed. How shall we tackle this issue?

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-28 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2984 Shading the Jetty dependency. Jetty is a common dependency across projects. Spark is sensitive to the version of Jetty that is used, so version conflicts should be avoided. Shading Jetty in a similar
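Shading of this kind is usually done with the maven-shade-plugin's class relocation feature. The fragment below is illustrative only; the relocation prefix and plugin placement are assumptions, not necessarily this PR's configuration:

```xml
<!-- Illustrative maven-shade-plugin relocation; not necessarily this PR's config. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Rewrites Jetty classes into a private namespace so a user's Jetty
             version cannot collide with the copy Spark ships. -->
        <pattern>org.eclipse.jetty</pattern>
        <shadedPattern>org.spark-project.jetty</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```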

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-28 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-60848427 Broken for now, investigating

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-28 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-60849409 False alarm. This should be okay for review.

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-28 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-60850543 @vanzin Is there a slated timeline for spark.files.userClassPathFirst to be done?

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-28 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-60859256 @JoshRosen any comments? I know you've participated in the dependency discussion in the past.

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-29 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-60968644 It's sounding like Spark's dependency tree is so large that we eventually want a solution that prevents any collision whatsoever; a holistic solution, if you

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-30 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-61133142 Any update on this? @JoshRosen @pwendell

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-31 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-61301013 Requesting an update?

[GitHub] spark pull request: Shading the Jetty dependency.

2014-10-31 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-61339002 Except we also need this to get into 1.2. Can we get this bumped up to be merged in for that release?

[GitHub] spark pull request: Shading the Jetty dependency.

2014-11-03 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-61582083 Jenkins, test this please

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62029607 @ash211 please take a look at this as well. Going to test now.

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62057981 Working on it. Will let you know shortly.

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62070813 I'm pretty sure this doesn't work when Spark is built with Maven. I'm going to try with sbt, but this is what I've found so far. I used make-distribution.sh

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62080459 This is not the way that we pull in the Spark dependency. We launch our server as a standalone application, and specify that the spark core jar is a library

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-07 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62195376 @pwendell I tried to run a mvn compile and it broke trying to compile Spark-SQL. Can you verify you're getting the same behavior? I'm taking your changes and playing

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-07 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62201800 I also built the project with sbt/sbt clean compile assembly, and when starting the master, I got the following stack trace: :43:14 ERROR ActorSystemImpl

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-07 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62210767 It passed mvn compile after clearing the caches once, but now it's failing on make-distribution in packaging GraphX. Do I need to clear the local caches before each

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-07 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62211101 Scratch that, it failed on streaming, not GraphX

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-11-11 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62608315 Any update on this?

[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...

2014-11-14 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/3275 [SPARK-4349] Checking if parallel collection partition is serializable Before, the DAGScheduler would determine if a task is serializable by doing a dry-run serialization of the first task
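The dry-run serialization check mentioned here can be illustrated with plain Java serialization. This is only a sketch of the idea; the real code path uses Spark's configured closure serializer, not raw `ObjectOutputStream`:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

// Sketch of a "dry-run serialization" check: attempt to serialize the object
// up front and report failure early, instead of failing later at task launch.
public class SerializationCheck {
    public static boolean isSerializable(Object task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return true;
        } catch (Exception e) {
            // NotSerializableException (and friends) land here.
            return false;
        }
    }
}
```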

[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...

2014-11-14 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3275#issuecomment-63149863 Please consider the design issues that I think this bug uncovers before providing comment on the PR. From what I understand, the original design was to catch

[GitHub] spark pull request: Shading the Jetty dependency.

2014-11-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2984#issuecomment-63396867 That is what this PR is trying to address, but it is superseded by another PR now. https://github.com/apache/spark/issues/3130

[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...

2014-11-26 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3275#issuecomment-64712768 Hi @pwendell or anyone, is there an update on this?

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-12-01 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-65126080 Update on this?

[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...

2014-12-04 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/3275

[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...

2014-12-04 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3275#issuecomment-65709183 We want a more generic fix than this. I'll push something new which will be completely different, addressing the issue further down in the stack.

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-12-05 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-65878359 Totally makes sense. I don't think I have enough context in the Spark world as a whole to suggest a holistic build design, but I agree that this is where the disconnect

[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/3638 [SPARK-4737] Task set manager properly handles serialization errors In addressing [SPARK-4737], the handling of serialization errors should not be the DAGScheduler's responsibility. The task set

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-12-08 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-66189849 Wanted to follow up on this - the priority of getting this done was just increased for us.

[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-09 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66357595 Anyone have any comment on this?

[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-09 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66366316 This is ready for further review.

[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-12 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66823756 Hi, it would be appreciated if someone could give this patch some love. Thanks!

[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/3638#discussion_r21938291 --- Diff: core/src/test/scala/org/apache/spark/SharedSparkContext.scala --- @@ -30,7 +30,7 @@ trait SharedSparkContext extends BeforeAndAfterAll { self

[GitHub] spark pull request: Shading the Jetty dependency.

2014-12-31 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/2984

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-23 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23478673 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-23 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71253931 I'm also concerned about the performance ramifications of this. We need to run performance benchmarks. However, the only critical path affected by this is tasks

[GitHub] spark pull request: [SPARK-4808] Remove Spillable minimum threshol...

2015-01-26 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3656#issuecomment-71526176 Seeing some problems that this PR could address, so reviving this thread. @lawlerd the configurable count would help because if it is known that the individual

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23560945 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71530343 @vanzin that's pretty much what I went with. The actor will receive the message, and commit permission requests are farmed off to a thread pool.
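The pattern described, an actor thread that hands commit-permission requests to a pool with first-writer-wins authorization, can be sketched as follows. All names are hypothetical; this is not the actual OutputCommitCoordinator code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative commit coordinator: the message-receiving thread never blocks,
// because permission checks run on a worker pool.
public class CommitCoordinator {
    private final Map<String, Long> authorized = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // First-writer-wins: only one task attempt per (stage, partition) may commit.
    public boolean canCommit(int stage, int partition, long attemptId) {
        String key = stage + ":" + partition;
        Long winner = authorized.putIfAbsent(key, attemptId);
        return winner == null || winner == attemptId;
    }

    // The "actor" farms the request off to the pool and returns immediately.
    public Future<Boolean> handleAskPermission(int stage, int partition, long attemptId) {
        return pool.submit(() -> canCommit(stage, partition, attemptId));
    }

    public void shutdown() { pool.shutdown(); }
}
```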

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23567732 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-04 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r24072691 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -596,7 +597,9 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72557914 @pwendell do we need this for Spark 1.3.0? Is the feature merge deadline already past? I'm uncertain of what my bandwidth will be like, but if it needs to be sped up I

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72435710 Suggestions make sense. I'm currently on a business trip, so it might be a bit of time before I can get back to this.

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-02-05 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-73025607 I think your latest comment is correct, @JoshRosen . We shouldn't hit an infinite loop because the failing authorized committers will eventually cause the task set

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23106734 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23108057 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23108302 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2015-01-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-70321240 Looks like we're on the same page. However I believe this still raises the question of how to best do the shading itself. It looks like the short-term solution

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70710069 Instead of having every task require a call back to the driver or master, can the master broadcast to the executor that a task is being speculated and any executor

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70723749 Did you think of any corner cases that you might have missed? In terms of correctness, this seems okay (although the Jenkins build indicates there are some issues). Have

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23326963 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70891705 That's correct. Definitely a work-in-progress, so if there's another security model you'd recommend, I'm all ears! -Matt Cheah

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23317321 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23320905 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23270381 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -113,7 +115,7 @@ class DAGScheduler( private val failedEpoch = new

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23255988 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,24 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-21 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70959680 The linked pull request takes your ideas, makes them compatible with master, and adds unit tests. Feel free to take a look.

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-21 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/4155 [SPARK-4879] Use the Spark driver to authorize Hadoop commits. This is a version of https://github.com/apache/spark/pull/4066/ which is up to date with master and has unit tests

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/4106 [SPARK-5158] [core] [security] Spark standalone mode can authenticate against a Kerberos-secured Hadoop cluster Previously, Kerberos secured Hadoop clusters could only be accessed by Spark running

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553364 Suggestions on how to unit test this are welcome. This should not be merged until it is unit-tested.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553855 One other caveat I forgot to mention (the commit message should be updated and this reflected in the docs): user proxying needs to be enabled. Basically, the user

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-22 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71116954 Looks like the tests timed out. This change is probably a large performance bottleneck, as communication back to the driver on every commit task is expensive

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-22 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71130249 Actually it just looks like one test is hanging, so likely something not being shut down properly.

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23265225 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -113,7 +115,7 @@ class DAGScheduler( private val failedEpoch = new

[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...

2015-02-18 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74827293 If every single object is large, though, then after we've spilled the 32nd object there would still be an OOM before we check for spilling again, right? I
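[Editor's note: the concern above can be made concrete with a plain-Scala sketch, assuming an element-count check interval like Spark's spillable collections use. All names and numbers here are illustrative: if memory is only tested every `checkInterval` elements, up to that many large objects can accumulate between checks.]

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative spill-check sketch: memory is tested only periodically,
// so very large objects can overshoot the threshold between checks.
class SpillableBufferSketch(memoryThresholdBytes: Long, checkInterval: Int = 32) {
  private val buffer = ArrayBuffer.empty[AnyRef]
  private var elementsRead = 0L
  private var estimatedBytes = 0L
  var spills = 0

  def insert(obj: AnyRef, sizeBytes: Long): Unit = {
    buffer += obj
    elementsRead += 1
    estimatedBytes += sizeBytes
    // The threshold is only consulted every `checkInterval` insertions.
    if (elementsRead % checkInterval == 0 && estimatedBytes >= memoryThresholdBytes) spill()
  }

  private def spill(): Unit = { buffer.clear(); estimatedBytes = 0; spills += 1 }
}
```

With a 100-byte threshold and a check interval of 4, four 30-byte objects trigger one spill only after the buffer has already reached 120 bytes, illustrating the overshoot being debated.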

[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...

2015-02-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74558171 @andrewor14 what do you think about the comments from @mingyukim?

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74732535 There is a case where map-side-combine is actually not the right thing to do in some of my workflows. map-side-combine makes sense if the overall amount of data
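[Editor's note: the tradeoff described above can be sketched in plain Scala. This is an illustration of the idea, not the RDD code path: per-partition map-side combining replaces many (key, value) records with one (key, combined) record per key before the shuffle, which only helps when the combined value is smaller than its inputs.]

```scala
// Plain-Scala sketch of map-side (per-partition) combining; illustrative only.
object MapSideCombineSketch {
  def localCombine[K, V, C](records: Seq[(K, V)],
                            createCombiner: V => C,
                            mergeValue: (C, V) => C): Map[K, C] =
    records.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(c) => acc.updated(k, mergeValue(c, v))
        case None    => acc.updated(k, createCombiner(v))
      }
    }
}
```

For a sum, a thousand records per key collapse to a single number, so the shuffle shrinks. When the combiner is list concatenation (groupByKey-style), the combined value still carries every input element, so map-side combining only adds work before the shuffle, which is the case the comment above describes.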

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-16 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/4634 [SPARK-5843] Allowing map-side combine to be specified in Java. Specifically, when calling JavaPairRDD.combineByKey(), there is a new five-parameter method that exposes the map-side-combine

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74752839 We want to take advantage of the distributed reduce functionality of combineByKey when computing the other aggregation metrics as well. Is this not lost if we do a map

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74756966 You lose the parallelism that's inherent in computing the reduce as a parallel operation, as opposed to computing it on a list in a single task. For more context

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-74762686 Able to come back to this now!

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74580824 groupBy and reduceByKey in the Scala API are actually just convenience methods that call through to combineByKey with parameters that make sense. Given that, perhaps
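[Editor's note: the delegation pattern described above can be sketched in plain Scala as a hedged illustration, not Spark's RDD code. A local `combineByKey`-style helper is defined, and the convenience operations are thin wrappers that pick its parameters.]

```scala
object CombineByKeyWrappers {
  // Minimal combineByKey-style aggregation over a local collection (illustrative).
  def combineByKey[K, V, C](data: Seq[(K, V)],
                            createCombiner: V => C,
                            mergeValue: (C, V) => C): Map[K, C] =
    data.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc + (k -> acc.get(k).map(mergeValue(_, v)).getOrElse(createCombiner(v)))
    }

  // reduceByKey: identity combiner creation, the reduce function for merging.
  def reduceByKey[K, V](data: Seq[(K, V)], func: (V, V) => V): Map[K, V] =
    combineByKey[K, V, V](data, v => v, func)

  // groupByKey: start a one-element list, append on merge.
  def groupByKey[K, V](data: Seq[(K, V)]): Map[K, List[V]] =
    combineByKey[K, V, List[V]](data, v => List(v), (c, v) => c :+ v)
}
```

Exposing the underlying five-parameter `combineByKey` in the Java API, as this PR does, gives Java callers the same building block these wrappers are built from.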

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4634#discussion_r24786524 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java --- @@ -25,17 +25,17 @@ import java.util.concurrent.*; import

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74579047 Can we also allow map-side-combine to be specified without specifying the partitioner? In general, since Java doesn't offer the luxury of default values

[GitHub] spark pull request: [SPARK-4808] Removing minimum number of elemen...

2015-02-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-75170336 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-17 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r24860380 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -193,17 +193,21 @@ class HadoopRDD[K, V]( override def getPartitions: Array

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23581966 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-27 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23634059 --- Diff: core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-27 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71743261 Jenkins, retest this please
