[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50257258 Since all process-local tasks are also node-local, rack-local and any, we will incur the node-local delay as well.

[GitHub] spark pull request: [SPARK-2361][MLLIB] Use broadcast instead of s...

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1427 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enab

[GitHub] spark pull request: [SPARK-2361][MLLIB] Use broadcast instead of s...

2014-07-26 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1427#issuecomment-50256712 We can revert this if the broadcast change gets in.

[GitHub] spark pull request: [SPARK-2361][MLLIB] Use broadcast instead of s...

2014-07-26 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1427#issuecomment-50256708 Merged in master. Thanks.

[GitHub] spark pull request: SPARK-2680: Lower spark.shuffle.memoryFraction...

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1593

[GitHub] spark pull request: SPARK-2680: Lower spark.shuffle.memoryFraction...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1593#issuecomment-50256579 Alright, merged.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50256511 I thought that we can skip locality levels in the waiting phase if we have no tasks for them -- that's why we have `computeValidLocalityLevels`. In the case you mentioned,

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1607#issuecomment-50256373 QA results for PR 1607: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15437455 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -18,22 +18,53 @@ package org.apache.spark.sql.execution im

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15437450 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala --- @@ -47,23 +47,29 @@ case class Generate( } } - over

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1607#issuecomment-50255801 QA tests have started for PR 1607. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17238/consoleFull

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1607#discussion_r15437397 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -110,42 +110,56 @@ class ExternalAppendOnlyMap[K, V, C](

[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-26 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50255139 Hi @mateiz, I think ignoring bad dirs is needed in a production cluster. In production, there is a good chance of disk failures. I always love the idea that we could

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1601#issuecomment-50254990 QA results for PR 1601: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1608#discussion_r15437185 --- Diff: external/hbase/pom.xml --- @@ -0,0 +1,217 @@ +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" --- E

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1608#issuecomment-50254637 QA results for PR 1608: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): @serializable class HBaseCo

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1608#issuecomment-50254635 QA tests have started for PR 1608. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17237/consoleFull

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1608#discussion_r15437173 --- Diff: external/hbase/pom.xml --- @@ -0,0 +1,217 @@ +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + x

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1608#discussion_r15437166 --- Diff: external/hbase/pom.xml --- @@ -0,0 +1,217 @@ +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + x

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-07-26 Thread tmalaska
GitHub user tmalaska opened a pull request: https://github.com/apache/spark/pull/1608 Spark-2447 : Spark on HBase Add common solution for sending upsert actions to HBase (put, deletes, and increment). This is the first pull request: mainly to test the review process, but

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1607#discussion_r15437121 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -110,42 +110,56 @@ class ExternalAppendOnlyMap[K, V, C]

[GitHub] spark pull request: SPARK-2680: Lower spark.shuffle.memoryFraction...

2014-07-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1593#issuecomment-50254298 LGTM - seems fine to lower this.

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1607#issuecomment-50254001 QA results for PR 1607: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253893 QA results for PR 1313: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253725 QA results for PR 1313: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1601#issuecomment-50253534 QA tests have started for PR 1601. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17236/consoleFull

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread cfregly
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50253442 this PR worked for @srosenthal , btw.

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread cfregly
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50253422 also, can someone address the questions i have here regarding the ec2 scripts and other peripheral aspects of this PR: https://issues.apache.org/jira/browse/SPARK-1981?f

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread cfregly
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50253400 @mateiz - this is a completely brand-new, from-scratch implementation. parviz's old code was actually a Scala port of the Java-based Kinesis sample app

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1607#issuecomment-50253396 QA tests have started for PR 1607. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17235/consoleFull

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread mateiz
GitHub user mateiz opened a pull request: https://github.com/apache/spark/pull/1607 SPARK-2684: Update ExternalAppendOnlyMap to take an iterator as input This will decrease object allocation from the "update" closure used in map.changeValue. You can merge this pull request into a G

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253330 QA tests have started for PR 1313. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17234/consoleFull

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15436880 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,6 +771,8 @@ private[spark] class TaskSetManager( /**

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253135 QA tests have started for PR 1313. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17233/consoleFull

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253029 Hi @mateiz, thanks for the comments. If we just add a NO_PREF level, it can avoid the unnecessary waiting when we only have no-pref tasks; however,

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1606#issuecomment-50252948 QA results for PR 1606: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: PEP8 compliance

2014-07-26 Thread bigsnarfdude
Github user bigsnarfdude closed the pull request at: https://github.com/apache/spark/pull/1540

[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1082#discussion_r15436718 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala --- @@ -559,6 +559,19 @@ class JavaSparkContext(val sc: SparkContext) extends

[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1082#issuecomment-50252530 What's the use-case for just retrieving only the ids of the persistent RDDs? If you want to check whether a particular RDD has been persisted, you can use the `getStor

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1605

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1605#issuecomment-50252365 Thanks Josh; merged this.

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1606#discussion_r15436589 --- Diff: python/pyspark/context.py --- @@ -249,17 +258,14 @@ def defaultMinPartitions(self): """ return self._jsc.sc().defaultMinP

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1606#issuecomment-50252309 QA tests have started for PR 1606. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17232/consoleFull

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/1606 [SPARK-1550] [PySpark] Allow SparkContext creation after failed attempts This addresses a PySpark issue where a failed attempt to construct SparkContext would prevent any future SparkContext creat

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1605#issuecomment-50251333 QA results for PR 1605: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2547]:The clustering documentaion examp...

2014-07-26 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1590#issuecomment-50250978 Thank you, merged it.

[GitHub] spark pull request: [SPARK-2547]:The clustering documentaion examp...

2014-07-26 Thread yu-iskw
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/1590

[GitHub] spark pull request: [SPARK-2547]:The clustering documentaion examp...

2014-07-26 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1590#issuecomment-50250935 Thank you, merged it.

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250903 QA results for PR 1358: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1605#issuecomment-50250570 QA tests have started for PR 1605. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17231/consoleFull

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250554 So I don't quite understand, how can multiple executors be launched for the same Spark application on the same node right now? I thought we always reuse our executor acros

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1358#discussion_r15436238 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -250,7 +252,7 @@ private[spark] class MesosScheduler

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/1605 [SPARK-2601] [PySpark] Fix Py4J error when transforming pickleFiles Similar to SPARK-1034, the problem was that Py4J didn’t cope well with the fake ClassTags used in the Java API. It doesn’t

[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50250447 When did this come up? I'm actually not sure this is a good behavior, because doing this means that a user might completely miss a misconfigured directory. With the curren

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50250374 QA tests have started for PR 1434. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17230/consoleFull

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50250292 Chris, is this new code from scratch, or is it based on Parviz's old pull request?

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250276 BTW you can run sbt scalastyle to check these style things locally

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50250282 Jenkins, this is ok to test

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15436180 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -275,18 +286,48 @@ class DAGScheduler( case shufDep: Shuffl

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15436181 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -275,18 +286,48 @@ class DAGScheduler( case shufDep: Shuffl

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15436179 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -275,18 +286,48 @@ class DAGScheduler( case shufDep: Shuffl

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250242 BTW the Jenkins failure is due to a code style issue: an if block without braces. Jenkins, this is ok to test

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250223 Hey @viirya, instead of modifying the PageRank example, what do you think of leaving it as-is until we have automatic checkpointing of long lineage chains? I think that wi

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250213 QA results for PR 1418: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250198 QA tests have started for PR 1358. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17229/consoleFull

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250201 QA tests have started for PR 1418. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17228/consoleFull

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15436168 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPageRank.scala --- @@ -51,6 +55,11 @@ object SparkPageRank { urls.map(url => (url

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-50250159 @sryza mind adding this in Python as well? I think you need to add a `def __str__(self)` on the LinearModel class in mllib/regression.py.
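A minimal sketch of what the `__str__` suggestion in this comment could look like. The `LinearModel` class here is a simplified stand-in written for illustration, not the actual class in mllib/regression.py; the attribute names `_coeff` and `_intercept` and the output format are assumptions.

```python
class LinearModel(object):
    """Simplified, hypothetical stand-in for a linear model holding
    a weight vector and an intercept (not the real MLlib class)."""

    def __init__(self, weights, intercept):
        # Assumed attribute names, chosen for this sketch only.
        self._coeff = list(weights)
        self._intercept = float(intercept)

    @property
    def weights(self):
        return self._coeff

    @property
    def intercept(self):
        return self._intercept

    def __str__(self):
        # Human-readable summary, so print(model) shows the parameters
        # instead of the default "<LinearModel object at 0x...>".
        return "(weights=%s, intercept=%s)" % (self._coeff, self._intercept)


model = LinearModel([0.5, -1.0], 2.0)
print(str(model))  # prints "(weights=[0.5, -1.0], intercept=2.0)"
```

Defining `__str__` on the base class means any subclass would pick up the same printable form without further changes; the exact string layout is a free choice.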

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250175 Jenkins, test this please

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15436152 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250085 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2361][MLLIB] Use broadcast instead of s...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1427#issuecomment-50250066 Do you guys want to merge this until we can see whether the RDD change goes into 1.1? Or wait for that? It does seem like a useful fix.

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1604

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50249875 Oh sorry, I didn't see you wanted Tom to take a look at it too. Would be good to get his feedback. I just looked at the patch...

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50249860 Looks good, merged this.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50249494 Hey Nan, sorry for the delay in getting to this. IMO this design is still too complicated -- we are passing so much state to resourceOffer and it's not super clear how the

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15436026 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -387,18 +403,29 @@ private[spark] class TaskSetManager( def resourc

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15436031 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,6 +771,8 @@ private[spark] class TaskSetManager( /** *

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-50248847 Also, can you delete `[WIP]` from the PR title?

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-50248767 @chenghao-intel Sorry for my late reply. Other than those minor comment and format issues, it looks good to me.

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15435911 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Soft

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15435898 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -241,4 +251,37 @@ private[hive] object HadoopTableReader { val buf

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15435896 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -46,7 +51,8 @@ private[hive] sealed trait TableReader { * data wareho

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15435889 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala --- @@ -114,6 +77,7 @@ case class HiveTableScan( val columnI

[GitHub] spark pull request: [SPARK-2547]:The clustering documentaion examp...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1590#issuecomment-50248537 I've merged this. Thanks! Do you mind closing this pull request, since it doesn't look like GitHub will do it automatically?

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50247796 QA results for PR 1604: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/993#issuecomment-50247741 Hey @rxin, thanks for the careful review! I think I've addressed most of your comments. Regarding the GeneratedAggregate code, I'm happy to sit down and explain in mor

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/993#issuecomment-50247318 QA tests have started for PR 993. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17227/consoleFull

[GitHub] spark pull request: PEP8 compliance

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1540#issuecomment-50247285 @bigsnarfdude Do you mind closing this pull request? Thanks!

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435630 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Softwa

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435619 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -51,8 +82,46 @@ abstract class SparkPlan extends QueryPlan[SparkPlan]

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435615 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -18,22 +18,53 @@ package org.apache.spark.sql.execution

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435611 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Softwa

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435577 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala --- @@ -47,23 +47,29 @@ case class Generate( } } -

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435573 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratedEvaluationSuite.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435560 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435534 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435527 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50246561 QA tests have started for PR 1604. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17226/consoleFull

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1601#discussion_r15435519 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -357,16 +357,52 @@ class SQLContext(@transient val sparkContext: SparkContext)
