[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195512430 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala --- @@ -0,0 +1,141

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195512808 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala --- @@ -0,0

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-14 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21366 > @mccheah could you add a design doc for future reference and so that new contributors can understand better the rationale behind this. There is some description in the JIRA ticket but not eno

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195513995 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala --- @@ -56,17 +58,44

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195542619 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala --- @@ -0,0

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195561927 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala --- @@ -0,0

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195566124 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala --- @@ -0,0 +1,88

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195567079 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala --- @@ -0,0

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-14 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195567414 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala --- @@ -0,0

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-14 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21366 Ok, addressed comments. The latest patch also makes it so that the subscribers run in a thread pool instead of just on a single thread. We have two subscribers so now they can run concurrently, if
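
The move from a single thread to a thread pool described in this comment can be illustrated with a minimal sketch. This is not the code from the patch: `notify_subscribers`, `allocator`, and `lifecycle_manager` are hypothetical names, and the snapshot is assumed to be a plain dictionary of pod name to state.

```python
from concurrent.futures import ThreadPoolExecutor


def notify_subscribers(subscribers, snapshot, max_workers=2):
    """Deliver one snapshot to every subscriber, each on its own pool thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(subscriber, snapshot) for subscriber in subscribers]
        # result() blocks until the subscriber finishes and re-raises its exception, if any.
        return [future.result() for future in futures]


# Two hypothetical subscribers standing in for the two mentioned in the comment.
def allocator(snapshot):
    return ("allocator", len(snapshot))


def lifecycle_manager(snapshot):
    return ("lifecycle", sorted(snapshot))


results = notify_subscribers(
    [allocator, lifecycle_manager],
    {"exec-1": "Running", "exec-2": "Pending"},
)
```

With a pool of two workers, a slow subscriber no longer delays the other one; results are still collected in subscriber order.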

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-14 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21366 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-14 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21366 Ok, I'm merging to master. Thanks everyone for contributing to review - @foxish, @liyinan926 , @skonto , @dvogelbacher, @erikerlandson. As discussed earlier, I will post a design document fo

[GitHub] spark issue #21551: Fix issue in 'docker-image-tool.sh'

2018-06-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21551 @rxin for SA that this is going into branch-2.3. Should be fine - we can ship this if/when we cut 2.3.2. Merging.

[GitHub] spark issue #21551: Fix issue in 'docker-image-tool.sh'

2018-06-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21551 Missed this before merging, but @fabriziocucci for future reference please put the ticket number along with `[K8S]` in the PR description. Sorry I didn't catch this before me

[GitHub] spark pull request #21584: [SPARK-24433][K8S][WIP] Initial R Bindings for Sp...

2018-06-18 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21584#discussion_r196237071 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile --- @@ -0,0 +1,29 @@ +# +# Licensed to the Apache

[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'

2018-06-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21551 The PR is already merged so it's too late now. Next time, please create a ticket if one does not exist and put it in the PR descri

[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'

2018-06-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21551 asfgit is showing the commit as pushed above.

[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'

2018-06-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21551 Hm, I thought I did. I don't quite know what's going on.

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...

2018-06-18 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r196257515 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -154,6 +154,24 @@ private[spark] object Config

[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...

2018-06-19 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21555 This change makes it such that using the tool forces building and pushing both Python and non-Python, but, what if the user wants to only build one to save time? I can imagine that being the case

[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...

2018-06-20 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21555 Yup feel free to merge and follow up separately.

[GitHub] spark issue #21508: [SPARK-24488] [SQL] Fix issue when generator is aliased ...

2018-06-25 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21508 @gatorsmile @hvanhovell, I'm working with @bkrieger and we need this patch soon. May we please get a sign off or else any suggested changes

[GitHub] spark pull request #21511: [SPARK-24491][Kubernetes] Configuration support f...

2018-06-27 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21511#discussion_r198659423 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -104,6 +104,20 @@ private[spark] object Config

[GitHub] spark pull request #21660: [SPARK-24683][K8S] Fix k8s no resource

2018-06-28 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/21660 [SPARK-24683][K8S] Fix k8s no resource ## What changes were proposed in this pull request? Make SparkSubmit pass in the main class even if `SparkLauncher.NO_RESOURCE` is the primary

[GitHub] spark issue #21660: [SPARK-24683][K8S] Fix k8s no resource

2018-06-28 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21660 @ifilonenko

[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code

2018-06-29 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21462#discussion_r199228479 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala --- @@ -46,8 +46,6

[GitHub] spark pull request #21660: [SPARK-24683][K8S] Fix k8s no resource

2018-07-02 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21660#discussion_r199562616 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala --- @@ -21,17

[GitHub] spark pull request #21660: [SPARK-24683][K8S] Fix k8s no resource

2018-07-02 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21660#discussion_r199596517 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala --- @@ -21,17

[GitHub] spark issue #21743: [SPARK-24767][Launcher] Propagate MDC to spark-submit th...

2018-07-10 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21743 @vanzin would it be possible for you to please take a look at this? Thanks.

[GitHub] spark issue #21743: [SPARK-24767][Launcher] Propagate MDC to spark-submit th...

2018-07-10 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21743 test this please

[GitHub] spark pull request: [SPARK-5697] Configurable registration retry i...

2015-04-15 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/4481 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2015-01-12 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-69662311 Hi, I was wondering if this is still being updated?

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23106734 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23108057 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23108302 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2015-01-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-70321240 Looks like we're on the same page. However I believe this still raises the question of how to best do the shading itself. It looks like the short-term solution

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/4106 [SPARK-5158] [core] [security] Spark standalone mode can authenticate against a Kerberos-secured Hadoop cluster Previously, Kerberos secured Hadoop clusters could only be accessed by Spark running

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553364 Suggestions to unit test are welcome. This should not be merged until it is unit-tested.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553855 One other caveat I forgot to mention, and the commit message should be updated and this reflected in the docs: User proxying needs to be enabled. Basically, the user

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70710069 Instead of having every task require a call back to the driver or master, can the master broadcast to the executor that a task is being speculated and any executor with

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70723749 Did you think of any corner cases that you might have missed? In terms of correctness, this seems okay (although the Jenkins build indicates there are some issues). Have

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23255988 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,10 +107,24 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23265225 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -113,7 +115,7 @@ class DAGScheduler( private val failedEpoch = new

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-20 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23270381 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -113,7 +115,7 @@ class DAGScheduler( private val failedEpoch = new

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23320905 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23317321 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70891705 That's correct. Definitely a work-in-progress so if there's another security model you'd recommend I'm all ears! -Matt Cheah From: Tom Graves

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23326963 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-21 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/4155 [SPARK-4879] Use the Spark driver to authorize Hadoop commits. This is a version of https://github.com/apache/spark/pull/4066/ which is up to date with master and has unit tests
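
The idea of having the driver authorize commits can be sketched as follows. This is an illustration under stated assumptions, not the implementation in the pull request: the `CommitCoordinator` class, its `can_commit` method, and the "first attempt to ask wins" policy are all hypothetical and chosen only to show how a central arbiter keeps two attempts of the same task from both committing.

```python
import threading


class CommitCoordinator:
    """Hypothetical driver-side arbiter: the first attempt to ask for a partition wins."""

    def __init__(self):
        self._lock = threading.Lock()
        # (stage, partition) -> attempt id that is allowed to commit
        self._winners = {}

    def can_commit(self, stage, partition, attempt):
        with self._lock:
            # setdefault records this attempt only if no winner exists yet.
            winner = self._winners.setdefault((stage, partition), attempt)
            return winner == attempt


coordinator = CommitCoordinator()
first = coordinator.can_commit(stage=0, partition=3, attempt=0)
speculative = coordinator.can_commit(stage=0, partition=3, attempt=1)
retry_same = coordinator.can_commit(stage=0, partition=3, attempt=0)
```

Here the original attempt is authorized, a later speculative attempt for the same partition is refused, and repeating the request from the winning attempt is still answered positively.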

[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...

2015-01-21 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70959680 The linked pull request takes your ideas, makes them compatible with master, and adds unit tests. Feel free to take a look.

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-22 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71116954 Looks like the tests timed out. This change is probably a large performance bottleneck, as communication back to the driver on every commit task is expensive?

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-22 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71130249 Actually it just looks like one test is hanging, so likely something not being shut down properly.

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-22 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71155039 Lots of comments to address, thanks for the detailed feedback @JoshRosen! One hanging cause is that the actor can be stopped multiple times in some of the YARN tests, so

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-22 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71155895 Pushed to try to make Jenkins pass. If Jenkins passes I'll handle the comments so far.

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-23 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71253931 I'm also concerned about the performance ramifications of this. We need to run performance benchmarks. However, the only critical path that is affected by this are

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-23 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23478673 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-25 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23507946 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-25 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71399368 @vanzin I can attempt to make OutputCommitCoordinator more multithreaded as you suggest. Do we have an example somewhere of Spark executors calling back to the driver

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-25 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23508902 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-25 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23509086 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-25 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23513157 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4808] Remove Spillable minimum threshol...

2015-01-26 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3656#issuecomment-71526176 Seeing some problems that this PR could address so reviving this thread. @lawlerd the configurable count would help because if it is known that the individual

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71530343 @vanzin that's pretty much what I went with. The actor will receive the message and for commit permission requests they're farmed off to a thread pool.

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23560945 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23567732 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23573888 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-26 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23581966 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-27 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71707718 This is a work in progress. In particular, the OutputCommitCoordinatorSuite isn't quite testing the right thing now. I don't exactly know how to test the full

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-27 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23634059 --- Diff: core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-27 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71743261 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-71893889 Regarding the tests, I note that this is currently NOT testing the right thing. It was, until I added extra logic down in the Executors, which is now bypassed by the way

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23712915 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -63,7 +63,7 @@ class DAGScheduler( mapOutputTracker

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23743012 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,258 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-3996]: Shade Jetty in Spark deliv...

2015-01-29 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4252#issuecomment-72126451 @ash211 Thanks for doing this. My colleagues and I will test this appropriately when we find the bandwidth!

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-29 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4155#issuecomment-72141862 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2608 [SPARK-1860] Worker better app cleanup First contribution to the project, so apologize for any significant errors. This PR addresses [SPARK-1860]. The application directories are now

[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
GitHub user mccheah reopened a pull request: https://github.com/apache/spark/pull/2608 [SPARK-1860] Worker better app cleanup First contribution to the project, so apologize for any significant errors. This PR addresses [SPARK-1860]. The application directories are now

[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/2608

[GitHub] spark pull request: [SPARK-1860] Worker better app cleanup

2014-09-30 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/2608

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-09-30 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2609 [SPARK-1860] More conservative app directory cleanup. First contribution to the project, so apologize for any significant errors. This PR addresses [SPARK-1860]. The application directories

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-01 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18316820 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -22,6 +22,7 @@ import java.text.SimpleDateFormat import java.util.Date

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-02 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18365112 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -233,8 +244,15 @@ private[spark] class Worker( } else

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-02 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18365510 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala --- @@ -174,7 +168,7 @@ private[spark] class ExecutorRunner

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

2014-10-03 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2609#discussion_r18407971 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -191,6 +194,8 @@ private[spark] class Worker( changeMaster

[GitHub] spark pull request: SPARK-3794 [CORE] Building spark core fails du...

2014-10-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2662#issuecomment-58056000 Sorry about that. I think Jenkins should be catching these kinds of build failures though. Jenkins should attempt to build the project against multiple versions of

[GitHub] spark pull request: SPARK-3794 [CORE] Building spark core fails du...

2014-10-06 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2662#issuecomment-58057718 Fair enough. The bottom line is that we could be more explicit about this. Perhaps something in the documentation? --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/2828 [SPARK-3736] Workers reconnect when disassociated from the master. Previously, if the master node was killed and restarted, the worker nodes would not attempt to reconnect to the Master. Therefore

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59408043 One remark is that there are no automated tests in this commit for now. I was unsuccessful in setting up TestKit to emulate a worker and master sending messages

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18977288 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -341,7 +341,11 @@ private[spark] class Master( case Some

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18978981 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18986188 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18986702 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18986941 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18988742 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -362,9 +372,19 @@ private[spark] class Worker

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74732535 There is a case where map-side combine is actually not the right thing to do in some of my workflows. Map-side combine makes sense if the overall amount of data is

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74752839 We want to take advantage of the distributed reduce functionality of combineByKey when computing the other aggregation metrics as well. Is this not lost if we do a map

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74756966 You lose the parallelism that's inherent in computing the reduce as a parallel operation, as opposed to computing it on a list in a single task. For more co

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-74762686 Able to come back to this now!

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-17 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r24860380 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -193,17 +193,21 @@ class HadoopRDD[K, V]( override def getPartitions: Array

[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...

2015-02-18 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74827293 If every single object is large though, then in that case after we've spilled the 32nd object, there would still be an OOM before we check for spilling again, rig
