[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-02 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1486 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enab

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57593579 w00t!

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-02 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57592942 Okay - gonna merge this. Glad it's in good shape now. Thanks @cmccabe for the contribution.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57592632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57592627 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21177/consoleFull) for PR 1486 at commit [`338d4f8`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57588340 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21177/consoleFull) for PR 1486 at commit [`338d4f8`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57588159 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57531253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-01 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57529171 I just rebased on master and re-pushed. It looks like this merge conflict was caused by another change to the MimaExcludes file, just like the previous merge conflict.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57351505 @cmccabe if you look at the message here it is saying that it doesn't merge cleanly.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57246819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57246815 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21001/consoleFull) for PR 1486 at commit [`dfab423`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57242587 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57242585 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21004/consoleFull) for PR 1486 at commit [`f99cb60`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57242523 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21004/consoleFull) for PR 1486 at commit [`f99cb60`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57241921 Rebasing on master.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57237832 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21001/consoleFull) for PR 1486 at commit [`dfab423`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57237531 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57235482 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57235474 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20996/consoleFull) for PR 1486 at commit [`dfab423`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57225796 @cmccabe you'll need to up-merge this. I guess something changed over the weekend.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57225023 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20996/consoleFull) for PR 1486 at commit [`dfab423`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-27 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57068065 Hm this exclusion might not work in the case that a class is changed to an interface. Maybe just also add the specific recommended exclusion here: ``` Probl

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57039931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57039928 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20896/consoleFull) for PR 1486 at commit [`a9b70b0`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57038176 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20896/consoleFull) for PR 1486 at commit [`a9b70b0`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57038092 Thanks, being able to run ./dev/mima helps a lot. This latest one should work with mima.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57026822 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r18110514 --- Diff: project/MimaExcludes.scala --- @@ -39,7 +39,10 @@ object MimaExcludes { MimaBuild.excludeSparkPackage("graphx") ) ++

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57012765 This code has a compile error now. You can run this locally with `./dev/mima`.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56873516 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56873514 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20817/consoleFull) for PR 1486 at commit [`c6390f3`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56873419 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20817/consoleFull) for PR 1486 at commit [`c6390f3`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-25 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56873120 I have pushed a new version that updated the MimaExcludes.scala file with ProblemFilters.excludeSparkClass("org.apache.spark.scheduler.TaskLocation")... hopefully that wi

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56769831 @cmccabe this is still failing the MIMA checks: [error] * declaration of class org.apache.spark.scheduler.TaskLocation has changed to interface org.apache.spark

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56763276 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56763272 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20770/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56763240 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/148/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56759037 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20770/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56759058 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/148/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56758490 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-24 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56743271 Yes, let's file a follow-up JIRA to discuss a design that can take into account any kind of different replica location. This patch doesn't expose any new APIs-- it's all

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-23 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56586230 Basically my feeling is not to block user-submitted patches on someone making a broader re-design if they are fairly isolated and only change internal APIs.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-23 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56585915 I am totally 100% in support of adding a general mechanism for this and exposing it as a public API based on URIs. And pushing this general thing into the TaskSetMangae

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56506066 @pwendell This is not Hadoop RDD specific functionality - it is a general requirement which can be leveraged by any RDD in Spark - and Hadoop RDD currently happens to hav

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-23 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56484097 @mridulm the proposal here was to avoid proposing a generalized/public API for these and instead do something simple/internal for the case of Hadoop RDD. The underscore

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56480392 Are we proposing to introduce HDFS caching tags/idioms directly into TaskSetManager in this PR? That does not look right. We need to generalize this so that any RDD c
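
For readers skimming this thread, the idea in the PR title (preferring replicas that are already held in the HDFS cache) can be illustrated with a rough sketch. This is not the code from the pull request: `ReplicaLocation`, its `cached` flag, `order_by_preference`, and the host names are all hypothetical, chosen only to show cached replicas being ranked ahead of uncached ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicaLocation:
    """Hypothetical record: one host that holds a replica of a block."""
    host: str
    cached: bool  # assumption: True when the replica sits in the HDFS cache


def order_by_preference(replicas: List[ReplicaLocation]) -> List[str]:
    """Return host names with cached replicas first.

    sorted() is stable, so replicas with the same cached flag keep
    their original relative order.
    """
    ranked = sorted(replicas, key=lambda r: not r.cached)
    return [r.host for r in ranked]


# Illustrative input only; these host names are made up.
replicas = [
    ReplicaLocation("host-a", cached=False),
    ReplicaLocation("host-b", cached=True),
    ReplicaLocation("host-c", cached=False),
]
preferred = order_by_preference(replicas)
```

A scheduler could then try the hosts in `preferred` in order. Whether such a ranking should live in a general location abstraction or stay internal to one RDD type is the question the comments in this thread debate.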

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56470414 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20678/

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56470411 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20678/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56467045 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20678/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56465510 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20674/consoleFull) for PR 1486 at commit [`8f9c5d6`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56465520 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20674/

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56463517 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20674/consoleFull) for PR 1486 at commit [`8f9c5d6`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17886353 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17886029 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter c

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17886024 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter c

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17885924 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17885878 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter c

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17885653 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter c

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17881709 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter c

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56135114 I only had a few minor comments about documentation while trying to do a quick read-through of this patch. No substantive comments.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17769069 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17769055 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17769053 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768989 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPa

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768962 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768928 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-5619 Yes, this appears to be an issue with our checker and adding an exclusion is fine for now. The class is private. Just had really minor comments and I can address

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768506 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768479 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768467 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPa

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56125277 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20563/consoleFull) for PR 1486 at commit [`d1f9fe3`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56120182 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20563/consoleFull) for PR 1486 at commit [`d1f9fe3`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55966988 The "unit test failure" mentioned here seems to be coming from the binary compatibility checker. The text of the error is: [error] * class org.apache.spark.schedul

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55966108 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20487/consoleFull) for PR 1486 at commit [`b95ccd7`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55957034 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20487/consoleFull) for PR 1486 at commit [`b95ccd7`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17691734 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,8 +181,24 @@ private[spark] class TaskSetManager( }

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17691691 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,8 +181,24 @@ private[spark] class TaskSetManager( }

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-17 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17691660 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,35 @@ package org.apache.spark.scheduler * of preference w

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17614113 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55771271 On the visibility stuff, understood. I actually forgot the "old API" is still supported in newer versions of Hadoop. Otherwise, you could put this all in the new hadoop

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17612925 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,8 +181,24 @@ private[spark] class TaskSetManager( }

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17612896 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,8 +181,24 @@ private[spark] class TaskSetManager( }

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17612851 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,35 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55687848 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20359/consoleFull) for PR 1486 at commit [`0d10adb`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55683946 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20359/consoleFull) for PR 1486 at commit [`0d10adb`](https://github.com/ap

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55683469 I can see why you'd like to reduce visibility, but I don't think it's possible here. In HadoopRDD, three new things are exposed with visibility private[spark]. They ar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17579270 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17578280 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference w

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17578110 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17578061 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17578078 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17578041 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPar

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17577975 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -208,8 +208,10 @@ abstract class RDD[T: ClassTag]( } /** - * Get

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17577951 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference w

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17577914 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference w

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread cmccabe
Github user cmccabe commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17577892 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -248,10 +250,22 @@ class HadoopRDD[K, V]( new HadoopMapPartitionsWithSplitRD

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55638524 Added a few more comments after thinking about this some more. As it stands, the current factoring opens up a bunch of things at `private[spark]` visibility. We always tr

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17561092 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17560907 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPa
