[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1498 [SPARK-2521] Broadcast RDD object (instead of sending it along with every task) This is a resubmission of #1452. It was reverted because it broke the build. Currently (as of Spark

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-49539237 QA tests have started for PR 1498. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16863/consoleFull

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1473#issuecomment-49539328 I filed a JIRA: https://issues.apache.org/jira/browse/SPARK-2598 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15147889 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread mateiz
GitHub user mateiz opened a pull request: https://github.com/apache/spark/pull/1499 (WIP) SPARK-2045 Sort-based shuffle This adds a new ShuffleManager based on sorting, as described in https://issues.apache.org/jira/browse/SPARK-2045. The bulk of the code is in an ExternalSorter

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49539437 QA tests have started for PR 1499. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16864/consoleFull

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15147904 --- Diff: core/src/main/scala/org/apache/spark/util/collection/SizeTrackingBuffer.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1500 [SPARK-2598] RangePartitioner's binary search does not use the given Ordering We should fix this in branch-1.0 as well. You can merge this pull request into a Git repository by running: $ git

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1473#issuecomment-49539660 @dorx can you close this PR? #1500 includes the change here.

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1500#issuecomment-49539744 QA tests have started for PR 1500. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16866/consoleFull

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15147996 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry:

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry:

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148004 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry:

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148009 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry:

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry:

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-20 Thread dorx
Github user dorx closed the pull request at: https://github.com/apache/spark/pull/1473

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15148079 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15148086 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2552][MLLIB] stabilize logistic functio...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1493#issuecomment-49540207 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1492#issuecomment-49540212 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1465#discussion_r15148111 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1463,12 +1463,13 @@ object SparkContext extends Logging { // Regular

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1465#discussion_r15148112 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1477,7 +1478,8 @@ object SparkContext extends Logging { def localCpuCount

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1465#discussion_r15148114 --- Diff: docs/configuration.md --- @@ -599,6 +599,15 @@ Apart from these, the following properties are also available, and may be useful td

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1492#issuecomment-49540263 QA tests have started for PR 1492. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16868/consoleFull

[GitHub] spark pull request: [SPARK-2552][MLLIB] stabilize logistic functio...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1493#issuecomment-49540259 QA tests have started for PR 1493. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16867/consoleFull

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1460#issuecomment-49540306 It looks like pushing a new rebased commit hid my comments, but click on them above to make sure you see them.

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49540356 Merging this in master.

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1371#discussion_r15148140 --- Diff: python/pyspark/rdd.py --- @@ -48,6 +48,35 @@ __all__ = [RDD] +# TODO: for Python 3.3+, PYTHONHASHSEED should be reset to disable

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1447

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1371#discussion_r15148144 --- Diff: python/pyspark/rdd.py --- @@ -48,6 +48,35 @@ __all__ = [RDD] +# TODO: for Python 3.3+, PYTHONHASHSEED should be reset to disable

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49540402 Hey @davies apart from the small comments above, please add a test in `tests.py`. Jobs similar to the ones Matt posted would be great. Otherwise this might break again in

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49540404 Thanks for submitting this. I think we can still stack overflow in serialization, but I agree it's better to do this non-recursively.

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49540416 Actually it's late. I will review this tomorrow.

[GitHub] spark pull request: [WIP][SPARK-2595:]The driver run garbage colle...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-49540472 I've talked to many JVM developers (engineers who work on the JVM) and while System.gc is advisory in the spec, it is actually a pretty reliable way of triggering GC.

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49541073 QA results for PR 1499: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): trait

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49541476 QA results for PR 1499: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): trait

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1500#issuecomment-49541588 QA results for PR 1500: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1492#issuecomment-49542166 QA results for PR 1492: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2552][MLLIB] stabilize logistic functio...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1493#issuecomment-49542158 QA results for PR 1493: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-20 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49542251 How about -Dspark.app.name=blah? In the JVM and Hadoop, the -D flag is used to represent conf properties.

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-20 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49542435 -D feels more natural indeed; I would expect those args to be passed through to the JVM as-is. Because that's a way to set these env properties too right? In fact,

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49542704 Thanks, @rxin! I've made these changes.

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49542753 QA tests have started for PR 1497. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16869/consoleFull

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49544751 QA results for PR 1497: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1501#issuecomment-49546564 cc @tgravescs

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread witgo
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/1501 [YARN]In some cases, pages display incorrect in WebUI The issue is caused by #1112. You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1501#issuecomment-49546613 QA tests have started for PR 1501. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16870/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49548923 QA tests have started for PR 1399. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16871/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49548926 QA results for PR 1399: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1501#issuecomment-49549622 QA results for PR 1501: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1501#issuecomment-49549682 Jenkins, test this please.

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1501#issuecomment-49549743 QA tests have started for PR 1501. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16872/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49549992 QA tests have started for PR 1399. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16873/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49550014 QA results for PR 1399: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49550656 QA tests have started for PR 1399. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16874/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49550677 QA results for PR 1399: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49551156 QA tests have started for PR 1399. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16875/consoleFull

[GitHub] spark pull request: [YARN]In some cases, pages display incorrect i...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1501#issuecomment-49552474 QA results for PR 1501: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2269 Refactor mesos scheduler resourceOf...

2014-07-20 Thread tnachen
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/1487#issuecomment-49553051 Do I need to specify someone to verify this patch?

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-20 Thread kanzhang
Github user kanzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/1338#discussion_r15150914 --- Diff: docs/programming-guide.md --- @@ -403,31 +403,30 @@ PySpark SequenceFile support loads an RDD within Java, and pickles the resulting

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49553961 QA results for PR 1399: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-20 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/1338#issuecomment-49554126 @MLnick I'm thinking of removing the tests and programming guide entry for custom classes (JavaBeans). It seems to be a feature of Pyrolite and I can't think of any

[GitHub] spark pull request: SPARK-2564. ShuffleReadMetrics.totalBlocksRead...

2014-07-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1474#issuecomment-49554233 SPARK-2571 broke this. Upmerged.

[GitHub] spark pull request: SPARK-2564. ShuffleReadMetrics.totalBlocksRead...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1474#issuecomment-49554270 QA tests have started for PR 1474. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16876/consoleFull

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1500#issuecomment-49554380 Merged in master and branch-1.0.

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1500

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-49554989 Updated patch sets the min registered executors ratio to 0.8 for YARN.

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-49555111 QA tests have started for PR 634. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16877/consoleFull

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49555343 QA tests have started for PR 1499. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16878/consoleFull

[GitHub] spark pull request: SPARK-2564. ShuffleReadMetrics.totalBlocksRead...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1474#issuecomment-49557082 QA results for PR 1474: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49557641 QA tests have started for PR 1313. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16879/consoleFull

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1492#issuecomment-49557834 Merging in master.

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1492

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-49558003 QA results for PR 634: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds no public classes. For more information see test

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49558215 QA results for PR 1499: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): trait

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49558233 QA tests have started for PR 1313. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16880/consoleFull

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-20 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49558463 Hi @kayousterhout @mateiz @mridulm @lirui-intel, thanks for the comments. I just updated the patch; here is the basic idea of the current PR: 1. added

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread aarondav
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/1502 SPARK-2047: Introduce an in-mem Sorter, and use it to reduce mem usage ### Why and what? Currently, the AppendOnlyMap performs an in-place sort by converting its array of [key, value, key,

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49558941 QA tests have started for PR 1502. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16881/consoleFull

[GitHub] spark pull request: SPARK-1691: Support quoted arguments inside of...

2014-07-20 Thread koertkuipers
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/609#issuecomment-49559153 On the command line I can get this to work now, but it's still way beyond my bash skills to use exec spark-submit inside a script with multiple Java options. This is

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49559170 Cool. What about P^3 sort? :)

[GitHub] spark pull request: SPARK-2564. ShuffleReadMetrics.totalBlocksRead...

2014-07-20 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1474#issuecomment-49560491 LGTM, merging into master.

[GitHub] spark pull request: SPARK-2564. ShuffleReadMetrics.totalBlocksRead...

2014-07-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1474

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1500#issuecomment-49560613 LGTM, FWIW

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15152201 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15152216 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49560844 QA results for PR 1313: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-20 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49560919 OK, please ignore the results in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16879/consoleFull,

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15152231 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15152282 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1338#discussion_r15152359 --- Diff: docs/programming-guide.md --- @@ -403,31 +403,30 @@ PySpark SequenceFile support loads an RDD within Java, and pickles the resulting

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49561728 QA results for PR 1502: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): * This trait

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49561754 QA tests have started for PR 1499. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16882/consoleFull

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49562265 QA tests have started for PR 1502. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16883/consoleFull

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49562833 @Matei, our tests only run in local mode, but this issue can only be reproduced in a multi-node cluster. Do we still need it? On Sun, Jul 20, 2014 at 1:26

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49563398 Even in local mode, we launch multiple Python processes, one per core. Just set the master to local[4] or something like that. Some of our other tests do that.
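A rough illustration of why the hash of None matters once several Python worker processes are involved: in CPython versions where hash(None) is derived from the object's memory address, two processes may disagree on it, so a key of None could land in different partitions. The sketch below is not the code from the pull request; `stable_hash` and `partition_for` are hypothetical names, and mapping None to 0 is an assumption made for the example.

```python
def stable_hash(key):
    """Hypothetical helper: a hash that does not vary between processes for None.

    Assumption: None maps to the fixed value 0 instead of the interpreter's
    hash(None). Tuples are combined recursively so that a None nested inside
    a tuple key is handled the same way.
    """
    if key is None:
        return 0
    if isinstance(key, tuple):
        h = 0x345678
        for item in key:
            h = ((h * 1000003) ^ stable_hash(item)) & 0xFFFFFFFFFFFFFFFF
        return h
    # Falls back to the built-in hash. This is stable for ints; str hashes
    # can still vary between processes unless PYTHONHASHSEED is fixed.
    return hash(key)


def partition_for(key, num_partitions):
    """Hypothetical partition function: pick a partition index for a key."""
    return stable_hash(key) % num_partitions
```

With this sketch, `partition_for(None, 4)` gives the same index in every process, which is the property a test running under a master such as local[4] would need to check.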

[GitHub] spark pull request: SPARK-2282: Reuse Socket for sending accumulat...

2014-07-20 Thread aarondav
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/1503 SPARK-2282: Reuse Socket for sending accumulator updates to Pyspark Prior to this change, every PySpark task completion opened a new socket to the accumulator server, passed its updates through,
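The idea of reusing one socket for repeated updates, rather than opening a connection per update, can be sketched with the Python standard library alone. This is an assumption-laden illustration and not the code from the pull request: `UpdateServer`, `UpdateHandler`, `ReusingClient`, and the one-integer-per-line protocol with an "ok" acknowledgement are all invented for the example.

```python
import socket
import socketserver
import threading


class UpdateHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: reads one integer per line and acknowledges it."""

    def handle(self):
        self.server.connections += 1      # one increment per accepted connection
        for line in self.rfile:           # loop ends when the client closes
            self.server.total += int(line)
            self.wfile.write(b"ok\n")     # acknowledge after applying the update


class UpdateServer(socketserver.ThreadingTCPServer):
    """Hypothetical stand-in for a server that accumulates updates."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr):
        super().__init__(addr, UpdateHandler)
        self.connections = 0
        self.total = 0


class ReusingClient:
    """Opens the socket lazily on first use and keeps it for later updates."""

    def __init__(self, host, port):
        self.host, self.port = host, port
        self.sock = None
        self.reader = None

    def send_update(self, value):
        if self.sock is None:             # connect only once
            self.sock = socket.create_connection((self.host, self.port))
            self.reader = self.sock.makefile("rb")
        self.sock.sendall(b"%d\n" % value)
        return self.reader.readline().strip()

    def close(self):
        if self.sock is not None:
            self.reader.close()
            self.sock.close()
            self.sock = None


def demo():
    """Send three updates over one connection; return (connections, total)."""
    server = UpdateServer(("127.0.0.1", 0))   # port 0: let the OS pick a port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    client = ReusingClient(*server.server_address)
    for value in (1, 2, 3):
        client.send_update(value)
    client.close()
    result = (server.connections, server.total)
    server.shutdown()
    server.server_close()
    return result
```

Because each update waits for its acknowledgement, the server has applied all three values by the time `demo()` reads the counters, and only one connection has been accepted.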

[GitHub] spark pull request: SPARK-2282: Reuse Socket for sending accumulat...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1503#issuecomment-49563729 QA tests have started for PR 1503. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16884/consoleFull

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49564210 QA results for PR 1499: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): trait

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49564530 QA results for PR 1502: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): * This trait
