[GitHub] spark pull request: SPARK-2526: Simplify options in make-distribut...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1445#issuecomment-49202151 QA tests have started for PR 1445. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16738/consoleFull

[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...

2014-07-16 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/1431 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49202256 SGTM.

[GitHub] spark pull request: [SPARK-2525][SQL] Remove as many compilation w...

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1444

[GitHub] spark pull request: [SPARK-2525][SQL] Remove as many compilation w...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1444#issuecomment-49202444 I've merged this in master and branch-1.0.

[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1430#issuecomment-49202491 Merging in master and branch-1.0.

[GitHub] spark pull request: [SPARK-2154] Schedule next Driver when one com...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1405#issuecomment-49202648 Looks good to me -- we can put it in both 1.0.2 and 0.9.2 as well.

[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1430

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49202752 Also cc @mateiz

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15016112 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -353,6 +350,14 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49203419 @mateiz I think it would mean mostly cloning `ALS.scala`, as the `Rating` object is woven throughout. Probably some large chunks could be refactored and shared. Is that

[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1430#issuecomment-49203889 Actually only master. There is a conflict in branch-1.0 that I will leave intact for now.

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49203999 Merging in master. Thanks.

[GitHub] spark pull request: SPARK-1097: Do not introduce deadlock while fi...

2014-07-16 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1409#issuecomment-49204042 Jenkins, test this please.

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1435

[GitHub] spark pull request: SPARK-1097: Do not introduce deadlock while fi...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1409#issuecomment-49204801 QA tests have started for PR 1409. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16740/consoleFull

[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1432#issuecomment-49204855 Merging in master and branch-1.0. Thanks.

[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1432

[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49205341 It doesn't bring much benefit right now, but what we are doing here is creating patterns in NullPropagation to specify the semantics of each individual expression ... not

[GitHub] spark pull request: [SPARK-2033] Automatically cleanup checkpoint

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/855#issuecomment-49205391 QA results for PR 855:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-1477]: Add the lifecycle interface

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/991#issuecomment-49205438 QA tests have started for PR 991. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16741/consoleFull

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49205621 I'm curious about one thing here: doesn't this change mean that we might wait longer before launching a no-prefs or speculative task, due to delay scheduling? This is

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49205693 Looks like Kay pointed out the same issue while I was typing this.

[GitHub] spark pull request: [SQL] Add HiveDecimal HiveVarchar support in...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49205734 Unit test actually failed.

[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-49206560 QA results for PR 929:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2525][SQL] Remove as many compilation w...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1444#issuecomment-49206483 QA results for PR 1444:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49206624 By the way, as far as I can tell, findSpeculativeTask also takes a locality level as an argument, and doesn't return tasks that are farther away than that. Do we need to

[GitHub] spark pull request: [SPARK-2522] set default broadcast factory to ...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1437#issuecomment-49206767 Merging in master!

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15017783 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49206975 BTW even in that case we could solve it by having more levels, basically pass in a different level to request speculative tasks after we requested non-speculative ones,

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49207097 My rationale for suggesting noPrefs tasks after NODE_LOCAL was to ensure that noPref tasks do not preempt locality for NODE_LOCAL tasks (they can't for PROCESS_LOCAL)

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15017975 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...

2014-07-16 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1362#discussion_r15018046 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1107,7 +1106,6 @@ class DAGScheduler( case shufDep:

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r15018072 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala --- @@ -344,21 +344,52 @@ private[sql] class StringColumnStats

[GitHub] spark pull request: fix compile error of streaming project

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/153#issuecomment-49207773 Merging this in master. Thanks.

[GitHub] spark pull request: fix compile error of streaming project

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/153#issuecomment-49207755 I think I've seen this happening once in a while, but can't exactly reproduce after clean. Anyway it's better to explicitly define the return type for public methods, even

[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1362#issuecomment-49208638 This looks good to me modulo a few comments! One other thing though, you should add a unit test for this functionality. Create a test that would result in a very long

[GitHub] spark pull request: [SPARK-2317] Improve task logging.

2014-07-16 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1259#issuecomment-49208726 LGTM

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-16 Thread tsudukim
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/1384#issuecomment-49207731 @pwendell I agree that there is much room for improvement in the handling of stageId and attemptId. It might be better to break these problems into some sub-tasks. I

[GitHub] spark pull request: fix compile error of streaming project

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/153

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-16 Thread tsudukim
Github user tsudukim commented on a diff in the pull request: https://github.com/apache/spark/pull/1384#discussion_r15019018 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -478,6 +479,7 @@ private[spark] object JsonProtocol { def

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-16 Thread tsudukim
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/1384#issuecomment-49209319 @rxin OK. After that, I think I can make this patch better.

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1384#issuecomment-49208410 Let's hold off merging this one until we merge #1262. Then it will be easier to index the information based on stage + attempt.

[GitHub] spark pull request: [SPARK-2317] Improve task logging.

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1259#issuecomment-49209789 Thanks. Merging this. We can fix the serialization error logging in a separate PR.

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-16 Thread tsudukim
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/1384#issuecomment-49209857 @rxin in #1262, can I expect the key of the stagedata in JobProgressListener to become stageId + attemptId instead of stageId only?
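A minimal sketch of the keying question, assuming a heavily simplified listener: `StageKey`, `StageRecord`, and `StageTable` are hypothetical names, not the JobProgressListener API. Keyed by stage ID alone, a retried attempt overwrites the earlier attempt's entry; keyed by (stageId, attemptId), each attempt keeps its own entry.

```scala
import scala.collection.mutable

// Hypothetical composite key: one entry per (stage, attempt) pair.
final case class StageKey(stageId: Int, attemptId: Int)

// Hypothetical per-attempt record; a real listener tracks far more state.
final case class StageRecord(numTasks: Int, failed: Boolean)

class StageTable {
  private val byStage = mutable.HashMap.empty[Int, StageRecord]
  private val byStageAttempt = mutable.HashMap.empty[StageKey, StageRecord]

  def record(stageId: Int, attemptId: Int, rec: StageRecord): Unit = {
    byStage(stageId) = rec                              // a retry replaces the earlier attempt
    byStageAttempt(StageKey(stageId, attemptId)) = rec  // each attempt keeps its own entry
  }

  // Number of entries that survive for a stage under each keying scheme.
  def entriesByStage(stageId: Int): Int = byStage.keys.count(_ == stageId)
  def entriesByStageAttempt(stageId: Int): Int = byStageAttempt.keys.count(_.stageId == stageId)

  def latest(stageId: Int): Option[StageRecord] = byStage.get(stageId)
  def attempt(stageId: Int, attemptId: Int): Option[StageRecord] =
    byStageAttempt.get(StageKey(stageId, attemptId))
}
```

Recording a failed attempt 0 and a successful attempt 1 of the same stage leaves one entry under the stage-only key (the retry) but two under the composite key, so the failed attempt stays visible.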

[GitHub] spark pull request: [SPARK-2317] Improve task logging.

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1259

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1238#discussion_r15019589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -26,6 +26,28 @@ import

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15019606 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15019630 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/PearsonCorrelation.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15019612 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15019642 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15019620 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49210382 I'm not sure I agree with that. This is a pretty niche optimization not something fundamental about the expressions that is required for correct evaluation (and the

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-49210466 Jenkins, test this please. I think I have addressed the latest round of review comments; the biggest changes are: - Remove statistics

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49210568 Ah, I see, so you're saying it's worth waiting 3 seconds for node-local ones to be able to go instead of launching no-prefs tasks. That does make sense. We just have to
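The trade-off in this thread can be sketched as a toy delay-scheduling model. Everything here is an assumption for illustration, not TaskSetManager code: the `Level` objects, `PendingTask`, and `DelayScheduling` are hypothetical, the per-level wait is the 3 seconds mentioned above, the allowed level relaxes by one step per full wait period without a launch, and no-prefs tasks sit right after NODE_LOCAL as suggested earlier in the thread.

```scala
// Toy locality levels, ordered from most to least local (hypothetical).
sealed abstract class Level(val rank: Int)
case object ProcessLocal extends Level(0)
case object NodeLocal extends Level(1)
case object NoPref extends Level(2) // no-prefs tasks placed right after NODE_LOCAL
case object RackLocal extends Level(3)
case object AnyLevel extends Level(4)

final case class PendingTask(id: Int, level: Level)

object DelayScheduling {
  val levels: Vector[Level] = Vector(ProcessLocal, NodeLocal, NoPref, RackLocal, AnyLevel)

  // Each full wait period with no launch relaxes the constraint by one level.
  def allowedLevel(msSinceLastLaunch: Long, waitMs: Long): Level = {
    val idx = math.min(msSinceLastLaunch / waitMs, (levels.size - 1).toLong).toInt
    levels(idx)
  }

  // Launch the most local pending task whose level is within the allowed bound.
  def pick(pending: Seq[PendingTask], allowed: Level): Option[PendingTask] =
    pending.filter(_.level.rank <= allowed.rank).sortBy(_.level.rank).headOption
}
```

With a node-local and a no-prefs task pending, nothing launches before 3 seconds, and then the node-local task goes first. Because this toy model does not skip levels that have no tasks, a TaskSet holding only no-prefs tasks would still wait two periods, which is the kind of extra delay raised in the earlier question about no-prefs and speculative tasks.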

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15019889 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49210917 BTW this could also be a place to use the dreaded Scala @specialized annotation to template the code for Ints vs Longs, though as far as I know that's being deprecated by
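As a rough illustration of the `@specialized` idea, assuming IDs only need to be stored and hashed into blocks: `IdBlock` and `IdBlocks.blockFor` are hypothetical helpers, not MLlib code. With `@specialized(Int, Long)`, the Scala 2 compiler emits primitive Int and Long variants alongside the generic version, so one source definition serves both ID types without boxing.

```scala
// Hypothetical container for one block of user or item IDs. The annotation
// asks scalac (Scala 2) to generate Int- and Long-specialized subclasses.
class IdBlock[@specialized(Int, Long) ID](val ids: Array[ID]) {
  def size: Int = ids.length
  def apply(i: Int): ID = ids(i)
}

object IdBlocks {
  // Hypothetical partitioner: assign an ID to one of numBlocks blocks.
  // `##` is Scala's hash, which agrees for Int and Long values in Int range;
  // floorMod keeps the result non-negative for negative hashes.
  def blockFor[@specialized(Int, Long) ID](id: ID, numBlocks: Int): Int =
    java.lang.Math.floorMod(id.##, numBlocks)
}
```

For example, `IdBlocks.blockFor(5, 2)` and `IdBlocks.blockFor(5L, 2)` land in the same block, so Int and Long IDs with the same value would be partitioned identically.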

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49210831 Yeah, that's what I meant, we can clone it at first but we might be able to share code later (at least the math code we run on each block, or stuff like that). But let's

[GitHub] spark pull request: SPARK-2098: All Spark processes should support...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1256#issuecomment-49211180 QA results for PR 1256:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/1238#discussion_r15020066 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -26,6 +26,28 @@ import

[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1022#issuecomment-49211292 QA results for PR 1022:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15020198 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49211502 Hey @mridulm I usually won't merge any code in the scheduler unless @markhamstra or @kayhousterhout has looked at it and signed off, since they are the most active

[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49211681 That is exactly the argument I made when the folding logic was added. :) I suggested that we add `deterministic` instead and then have a rule that folds things that are
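A toy version of the `deterministic` proposal. The comment's exact folding rule is cut off, so the rule here is an assumption: fold an expression when it is deterministic and all of its children are already literals. `Expr`, `Lit`, `Add`, `RandInt`, and `ConstantFold` are hypothetical stand-ins, not Catalyst classes.

```scala
// Toy expression tree (hypothetical, not Catalyst).
sealed trait Expr {
  def children: Seq[Expr]
  // True if evaluating the expression twice always yields the same value.
  def deterministic: Boolean
  def eval(): Any
}

final case class Lit(value: Any) extends Expr {
  val children: Seq[Expr] = Nil
  val deterministic = true
  def eval(): Any = value
}

final case class Add(l: Expr, r: Expr) extends Expr {
  val children: Seq[Expr] = Seq(l, r)
  def deterministic: Boolean = l.deterministic && r.deterministic
  def eval(): Any = l.eval().asInstanceOf[Int] + r.eval().asInstanceOf[Int]
}

// A non-deterministic leaf: must never be folded into a constant.
final case class RandInt() extends Expr {
  val children: Seq[Expr] = Nil
  val deterministic = false
  def eval(): Any = scala.util.Random.nextInt()
}

object ConstantFold {
  // Bottom-up: fold children first, then fold this node if the rule allows it.
  def fold(e: Expr): Expr = {
    val folded = e match {
      case Add(l, r) => Add(fold(l), fold(r))
      case other     => other
    }
    val foldable = folded.deterministic &&
      !folded.isInstanceOf[Lit] &&
      folded.children.forall(_.isInstanceOf[Lit])
    if (foldable) Lit(folded.eval()) else folded
  }
}
```

Folding `Add(Add(Lit(1), Lit(2)), RandInt())` yields `Add(Lit(3), RandInt())`: the literal subtree collapses, while the non-deterministic leaf and its parent are left alone. Under this rule, foldability follows from determinism rather than being declared separately per expression.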

[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49211732 I would be supportive of changing it to match my original proposal...

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15020402 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15020400 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15020475 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmansCorrelation.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the

[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1022#issuecomment-49213268 QA results for PR 1022:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r15020827 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala --- @@ -344,21 +344,52 @@ private[sql] class StringColumnStats extends

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49214821 Thanks, but that is fine; I merged it in after I resolved my local hardware issues today, so I did not need to impose on you to merge after all! On 17-Jul-2014

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49215361 I am not sure if it is ok to wait ... This is something I never considered from the beginning when I added process_local ... Maybe it is ok! If it is not, then we

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-16 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r15022756 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: replace println to log4j

2014-07-16 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1372#issuecomment-49217835 There have been further comments regarding this. It would be great if you could address them as well.

[GitHub] spark pull request: SPARK-1097: Do not introduce deadlock while fi...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1409#issuecomment-49218047 QA results for PR 1409:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-1477]: Add the lifecycle interface

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/991#issuecomment-49218315 QA results for PR 991:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait Lifecycle

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49218616 @mridulm what I meant was that it would be good in the future if you try to have Mark or Kay look at patches in the scheduler code before you merge them.

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-49219047 Jenkins, test this please.

[GitHub] spark pull request: [MLlib] SPARK-1536: multiclass classification ...

2014-07-16 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-49219179 @manishamde It looks like the MIMA issue may not be fixed right away, so let's use the exceptions in MimaExcludes.scala for now. Could you please do that? I believe

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-49219540 QA tests have started for PR 1238. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16743/consoleFull

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1238#discussion_r15024063 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -26,6 +26,28 @@ import

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-49219898 QA results for PR 1238:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1188#issuecomment-49220170 QA tests have started for PR 1188. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16744/consoleFull

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1188#issuecomment-49220283 QA results for PR 1188:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49220362 Ah, that's true, it won't be as common now. Anyway I'd be okay with any solution as long as TaskSets with only no-prefs tasks, or only process-local + no-prefs, don't get

[GitHub] spark pull request: [SPARK-2411] Add a history-not-found page to s...

2014-07-16 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1336#discussion_r15024426 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/ui/HistoryNotFoundPage.scala --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-16 Thread nevillelyh
Github user nevillelyh commented on the pull request: https://github.com/apache/spark/pull/1188#issuecomment-49220820 Sorry missed one. Fixed the style warning.

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49221370 @mengxr ScalaTest 2.x has a tolerance feature, but it uses absolute error, not relative error. For large numbers, the absolute error may not be meaningful. With `===`, it

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49221715 Same here. Kay, any thoughts? On 17-Jul-2014 1:44 am, Matei Zaharia notificati...@github.com wrote: Ah, that's true, it won't be as common now. Anyway I'd be

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49221957 `almostEquals` reads better than `~===`. The feature we like is having the values in comparison in the error message but not the name :)

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-49222667 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49222983 I learned `almostEquals` from the Boost library. Anyway, in this case, how do we distinguish the one that throws with the message from the one that just returns true/false?
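The relative-versus-absolute tolerance point in this thread can be illustrated with a small, language-agnostic sketch. It is written in Python rather than Scala only so it is self-contained; the names `almost_equals` and `assert_almost_equals` and the default tolerance `1e-8` are illustrative assumptions, not the API proposed in PR 1425. One way to separate the two variants dbtsai asks about is a plain boolean check plus an asserting wrapper that puts both compared values in the failure message, as mengxr suggests.

```python
import math


def almost_equals(x: float, y: float, rel_tol: float = 1e-8) -> bool:
    """Boolean check: do x and y agree within a *relative* tolerance?

    The allowed error is scaled by the larger magnitude, so one rel_tol is
    meaningful for both tiny and huge values, unlike a fixed absolute epsilon.
    (Hypothetical helper; the default rel_tol is an arbitrary choice.)
    """
    if x == y:
        return True  # exact match, including 0.0 == 0.0 and inf == inf
    if math.isinf(x) or math.isinf(y):
        return False  # an infinity is never "close" to a different value
    scale = max(abs(x), abs(y))
    return abs(x - y) <= rel_tol * scale  # any NaN makes this comparison False


def assert_almost_equals(x: float, y: float, rel_tol: float = 1e-8) -> None:
    """Asserting variant: the failure message carries both compared values."""
    if not almost_equals(x, y, rel_tol):
        raise AssertionError(
            f"{x!r} did not equal {y!r} within relative tolerance {rel_tol!r}"
        )


# Large values: an absolute epsilon of 1e-8 rejects these,
# but relative to 1e12 a difference of 1.0 is tiny.
print(almost_equals(1e12, 1e12 + 1))   # True
print(abs(1e12 - (1e12 + 1)) <= 1e-8)  # False
# Small values: an absolute epsilon of 1e-8 would accept these,
# even though they differ by a factor of two.
print(almost_equals(1e-12, 2e-12))     # False
```

A purely relative check rejects every nonzero value compared against 0.0, so tests that compare results near zero usually also need an absolute floor; Python's standard `math.isclose` applies the same scaled rule with an optional `abs_tol` for that reason.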

[GitHub] spark pull request: [SPARK-2393][SQL] Cost estimation optimization...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-49223292 QA tests have started for PR 1238. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16745/consoleFull

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/1393

[GitHub] spark pull request: Added t2 instance types

2014-07-16 Thread 24601
GitHub user 24601 opened a pull request: https://github.com/apache/spark/pull/1446 Added t2 instance types
New t2 instance types require HVM AMIs; the bail-out assumption of pvm causes failures when using t2 instance types.

[GitHub] spark pull request: Added t2 instance types

2014-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1446#issuecomment-49225163 Can one of the admins verify this patch?

[GitHub] spark pull request: SPARK-1097: Do not introduce deadlock while fi...

2014-07-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1409#issuecomment-49227548 Okay I'm going to merge this into master and 1.0. We can cut a new patch release shortly for this.

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-16 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1447 SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions
This also removes an unnecessary tuple creation in cogroup.

[GitHub] spark pull request: Added t2 instance types

2014-07-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1446#discussion_r15027927 --- Diff: ec2/spark_ec2.py --- @@ -240,7 +240,10 @@ def get_spark_ami(opts): r3.xlarge: hvm, r3.2xlarge: hvm,
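The failure mode described in PR 1446, an unlisted instance type silently falling back to a pvm AMI, can be sketched as a dictionary lookup with a default. This assumes the table in ec2/spark_ec2.py is a plain dict keyed by instance type, as the diff excerpt `r3.xlarge: hvm, r3.2xlarge: hvm` suggests; the names `INSTANCE_VIRTUALIZATION` and `virtualization_type` are hypothetical, and the t2 entries illustrate the idea rather than reproduce the actual patch.

```python
# Hypothetical sketch: EC2 instance type -> AMI virtualization type.
# Only the r3 entries appear in the diff excerpt; the t2 entries are
# illustrative assumptions, not the contents of the real patch.
INSTANCE_VIRTUALIZATION = {
    "r3.xlarge": "hvm",
    "r3.2xlarge": "hvm",
    # t2 instances can only boot HVM AMIs, so they must be listed;
    # otherwise the "pvm" fallback below picks an AMI they cannot run.
    "t2.micro": "hvm",
    "t2.small": "hvm",
    "t2.medium": "hvm",
}


def virtualization_type(instance_type: str) -> str:
    """Return "hvm" or "pvm"; unknown instance types fall back to "pvm"."""
    return INSTANCE_VIRTUALIZATION.get(instance_type, "pvm")


print(virtualization_type("t2.micro"))  # hvm
print(virtualization_type("m1.large"))  # pvm (fallback)
```

The design risk is the silent default: any new HVM-only family stays broken until someone adds it to the table, which is exactly what this PR does for t2.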

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49227891 Hmm... that makes sense. Actually, we can easily get the locality situation of the taskSet through myLocalityLevels, which is calculated when a new
