[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49126648 QA results for PR 1313: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-07-16 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1244#issuecomment-49126716 We only want to do this if the driver shares the same directory structure as the executors. This is an assumption that is incorrect in many deployment settings.

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-49126877 That's a good point. Changing it for YARN seems like the right thing, and 80% sounds reasonable to me. Another thing is the wait time. Previously it was 6

[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...

2014-07-16 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1431 [SPARK-2517] Removed some compiler type erasure warnings. Also took the chance to rename some variables to avoid unintentional shadowing. You can merge this pull request into a Git repository by

[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49127255 QA tests have started for PR 1431. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16716/consoleFull

[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1430#issuecomment-49127540 The code looks good to me. I am wondering whether we could merge ConstantFolding and NullPropagation and make it `transformExpressionsUp`, since they kind of rely

[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...

2014-07-16 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1432 [SPARK-2518][SQL] Fix foldability of Substring expression. This is a follow-up of #1428. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SQL] Cleaned up ConstantFolding slightly.

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1430#issuecomment-49128331 See the discussion at #1428. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-2317] Improve task logging.

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1259#issuecomment-49128371 QA results for PR 1259: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1432#issuecomment-49128436 LGTM

[GitHub] spark pull request: SPARK-1215: Clustering: Index out of bounds er...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1407#issuecomment-49130099 QA results for PR 1407: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49131064 Besides breaking the API, I'm also worried about two things: 1. The increase in storage. We had some discussion before v1.0 about whether we should switch to long

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-16 Thread cfregly
GitHub user cfregly opened a pull request: https://github.com/apache/spark/pull/1434 [SPARK-1981] Add AWS Kinesis streaming support You can merge this pull request into a Git repository by running: $ git pull https://github.com/cfregly/spark master Alternatively you can

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-49132892 Can one of the admins verify this patch?

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1435 SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical... ... aggregation code You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-07-16 Thread javadba
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/194#issuecomment-49133737 Hi, the referenced PR Spark-1416 includes the following comment by @MLnick: But looking at the HBase PR you referenced, I don't see the value of having that

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49133777 QA tests have started for PR 1435. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16719/consoleFull

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49133829 Nice. LGTM.

[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-07-16 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49133859 @tgravescs updated according to your comments and rebased to current HEAD of master branch. Thanks for following up on this PR.

[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49134307 QA results for PR 1431: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1432#issuecomment-49134887 LGTM

[GitHub] spark pull request: Add HiveDecimal HiveVarchar support in unwra...

2014-07-16 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/1436 Add HiveDecimal HiveVarchar support in unwrapping data You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark unwrapdata

[GitHub] spark pull request: Add HiveDecimal HiveVarchar support in unwra...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49135679 QA tests have started for PR 1436. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16720/consoleFull

[GitHub] spark pull request: Tightening visibility for various Broadcast re...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1438#issuecomment-49137498 QA tests have started for PR 1438. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16723/consoleFull

[GitHub] spark pull request: [SQL] Add HiveDecimal HiveVarchar support in...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49137154 QA tests have started for PR 1436. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16722/consoleFull

[GitHub] spark pull request: [SPARK-2523] [SQL] Hadoop table scan

2014-07-16 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/1439 [SPARK-2523] [SQL] Hadoop table scan In HiveTableScan.scala, ObjectInspector was created for all of the partition based records, which probably causes ClassCastException if the object

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49137261 @dbtsai The assertions with `===` were all tested to work, but I agree it is more robust to allow numerical errors. One downside of this change is that `===` reports the
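
The point above about allowing numerical error instead of asserting exact equality with `===` can be sketched roughly as follows. This is a minimal illustration, not the helper introduced by the pull request: the name `approxEqual` and both tolerance defaults are assumptions made for the example.

```scala
object ApproxEqual {
  // Hypothetical helper (not the API from PR 1425): two doubles are treated as
  // equal when they differ by at most an absolute tolerance, or by at most a
  // relative tolerance scaled by the larger magnitude.
  def approxEqual(a: Double, b: Double,
                  relTol: Double = 1e-9, absTol: Double = 1e-12): Boolean = {
    if (a == b) {
      true // covers exact matches, including equal infinities
    } else {
      val diff = math.abs(a - b)
      diff <= absTol || diff <= relTol * math.max(math.abs(a), math.abs(b))
    }
  }
}
```

With a helper of this shape, a test would assert `ApproxEqual.approxEqual(actual, expected)` where it previously asserted `actual === expected`; the trade-off raised in the comment (how a failed assertion gets reported) still applies.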

[GitHub] spark pull request: [SPARK-2523] [SQL] Hadoop table scan bug fixin...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49138252 QA tests have started for PR 1439. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16724/consoleFull

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1408#issuecomment-49138529 @yhuai @concretevitamin @rxin I've created another PR for this follow-up; we can discuss this more at: https://github.com/apache/spark/pull/1439

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49138992 `ObjectInspector` is not required by `Row` in Catalyst any more (not like in Shark), and it is tightly coupled with Deserializer the raw data, so I moved the

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1399#discussion_r14987086 --- Diff: sbin/start-thriftserver.sh --- @@ -0,0 +1,24 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49141155 QA results for PR 1435: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r14988046 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -20,8 +20,20 @@ package

[GitHub] spark pull request: [SQL] Add HiveDecimal HiveVarchar support in...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1436#issuecomment-49143482 QA results for PR 1436: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: fix compile error of streaming project

2014-07-16 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/153#issuecomment-49144416 Seems harmless as it only makes the return type of the method explicit. I can't see why it would be specific to building with one version of Hadoop though. Maybe it isn't?

[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...

2014-07-16 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/1441 SPARK-2452, create a new valid for each instead of using lineId. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1

[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49146776 QA tests have started for PR 1441. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16726/consoleFull

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49147325 QA results for PR 1439: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49148347 QA tests have started for PR 1441. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16727/consoleFull

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49149396 Let me close this PR for now. I will fork or wrap as necessary. Keep it in mind, and maybe in a 2.x release this can be revisited. (Matei I ran into more problems with

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49153027 @mridulm @lirui-intel I separated the noPref tasks and those with the unavailable preference; this will treat the tasks with the unavailable preference as non-local

[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-07-16 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1244#issuecomment-49153988 @andrewor14 yeah, I agree with you, I just thought in somewhere (document in the earlier versions? I cannot find it now), the user has to set this env variable? so I

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49154451 QA results for PR 1440: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-16 Thread chutium
Github user chutium commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-49154812 hi, it is really very useful for us, i tried this implementation from @willb , in spark-shell, i still got java.lang.UnsupportedOperationException by Query Plan, i made

[GitHub] spark pull request: SPARK-2407: Added Parse of SQL SUBSTR()

2014-07-16 Thread chutium
GitHub user chutium opened a pull request: https://github.com/apache/spark/pull/1442 SPARK-2407: Added Parse of SQL SUBSTR() follow-up of #1359 You can merge this pull request into a Git repository by running: $ git pull https://github.com/chutium/spark master Alternatively

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49155058 Can one of the admins verify this patch?

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-49155240 Awesome! Please submit a pull request with that addition. On Jul 16, 2014 7:53 AM, Teng Qiu notificati...@github.com wrote: hi, it is really very useful for

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49155576 Hmm, just realized `Timestamp.toString` normalizes date and time according to current timezone and makes almost all timestamp related tests timezone sensitive.
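
The timezone sensitivity described above can be reproduced outside of Spark. The following is a small sketch under stated assumptions: the object and method names are hypothetical, and the two zone IDs used in the usage note are chosen only for illustration. It renders one `java.sql.Timestamp` under two different JVM default time zones.

```scala
import java.sql.Timestamp
import java.util.TimeZone

object TimestampZoneSketch {
  // Hypothetical helper: formats `ts` with Timestamp.toString while the JVM
  // default time zone is temporarily set to `zoneId`, then restores the old one.
  def render(ts: Timestamp, zoneId: String): String = {
    val saved = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
      ts.toString
    } finally {
      TimeZone.setDefault(saved)
    }
  }
}
```

Calling `TimestampZoneSketch.render(new Timestamp(0L), "UTC")` and then the same call with `"Asia/Tokyo"` yields two different strings for the same instant, which is why golden answers containing formatted timestamps depend on the machine that generated them.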

[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49155771 QA results for PR 1441: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r14993202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -249,3 +263,7 @@ case class Cast(child: Expression,

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49156553 Confirmed that the following test cases are timezone sensitive and blacklisted them (by first remove all timestamp related golden answers, run them in my local

[GitHub] spark pull request: SPARK-2452, create a new valid for each instea...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1441#issuecomment-49156902 QA results for PR 1441: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r14993659 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -249,3 +263,7 @@ case class Cast(child: Expression,

[GitHub] spark pull request: discarded exceeded completedDrivers

2014-07-16 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-49157266 Is documentation for the newly introduced spark.deploy.retainedDrivers missing?

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49157381 QA tests have started for PR 1440. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16728/consoleFull

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49158865 QA tests have started for PR 1440. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16729/consoleFull

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-16 Thread chutium
Github user chutium commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-49159677 PR submitted #1442

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49159849 That was my thought as well, @egraldlo. Thanks for submitting this, @chutium!

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread egraldlo
Github user egraldlo commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49160379 thx @willb, maybe protected val SUBSTRING = Keyword(SUBSTRING) as well, but this will cause code redundancy.

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49160982 @egraldlo, couldn't it be `(SUBSTR | SUBSTRING) ~ // ... ` in that case?
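
The `(SUBSTR | SUBSTRING) ~ ...` idea suggested above, accepting either keyword in a single rule instead of duplicating the rule, can be sketched with Scala parser combinators. This is an illustrative stand-alone grammar, not Spark's SQL parser: it assumes `RegexParsers` (from the scala-parser-combinators module on Scala 2.11 and later), and the `Substring` case class, the regex-based keywords, and the argument shapes are all made up for the example.

```scala
import scala.util.parsing.combinator.RegexParsers

object SubstrSketch extends RegexParsers {
  // Hypothetical AST node for the example only.
  case class Substring(column: String, pos: Int, len: Int)

  // Case-insensitive keywords. The trailing \b keeps SUBSTR from matching the
  // first six letters of SUBSTRING, since `|` does not retry the second
  // alternative once the first has succeeded.
  private val SUBSTR: Parser[String] = """(?i)SUBSTR\b""".r
  private val SUBSTRING: Parser[String] = """(?i)SUBSTRING\b""".r
  private val ident: Parser[String] = """[A-Za-z_][A-Za-z0-9_]*""".r
  private val int: Parser[Int] = """-?\d+""".r ^^ (_.toInt)

  // One rule handles both spellings of the function name.
  val substr: Parser[Substring] =
    (SUBSTR | SUBSTRING) ~> "(" ~> ident ~ ("," ~> int) ~ ("," ~> int) <~ ")" ^^ {
      case col ~ pos ~ len => Substring(col, pos, len)
    }
}
```

Usage: `SubstrSketch.parseAll(SubstrSketch.substr, "SUBSTRING(name, 1, 3)")` and the same call with `"substr(name, 1, 3)"` both succeed and produce the same `Substring` value.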

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread egraldlo
Github user egraldlo commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49161743 fine, that's great!

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49168910 QA results for PR 1440: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49171137 QA results for PR 1440: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: discarded exceeded completedDrivers

2014-07-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-49173609 @CodingCat thanks, I have created a JIRA issue: https://issues.apache.org/jira/browse/SPARK-2524

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49174657 Hmmm... not sure that I would go so far as to call it nice. This does make the code slightly more difficult to read and understand, so can we hope that you've got

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49175346 Could you elaborate on when we will see an exception?

[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...

2014-07-16 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1443 [SPARK-2524] missing document about spark.deploy.retainedDrivers https://issues.apache.org/jira/browse/SPARK-2524 The configuration on spark.deploy.retainedDrivers is undocumented but

[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1443#issuecomment-49176249 QA tests have started for PR 1443. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16730/consoleFull

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r15001961 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala --- @@ -344,21 +344,52 @@ private[sql] class StringColumnStats extends

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r15002400 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala --- @@ -93,6 +93,10 @@ class HiveCompatibilitySuite

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-16 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49184593 Another example of this problem is the PageRank example bundled in Spark. At this time, since the problem of Java serializer still exists, to avoid causing

[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49188774 You are right that this rule does more than null propagation now. I'm not sure what a better name would be. `DegenerateExpressionSimplification`? Regarding

[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1443#issuecomment-49191235 QA results for PR 1443: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2033] Automatically cleanup checkpoint

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/855#issuecomment-49192571 QA tests have started for PR 855. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16731/consoleFull

[GitHub] spark pull request: [SPARK-2517] Removed some compiler type erasur...

2014-07-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1431#issuecomment-49193038 How about we close this one and merge #1444?

[GitHub] spark pull request: [SPARK-2525][SQL] Remove as many compilation w...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1444#issuecomment-49193205 QA tests have started for PR 1444. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16732/consoleFull

[GitHub] spark pull request: Improve ALS algorithm resource usage

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-49193225 QA tests have started for PR 929. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16733/consoleFull

[GitHub] spark pull request: [SPARK-2119][SQL] Improved Parquet performance...

2014-07-16 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1370#issuecomment-49193554 Thanks! I've merged this into master.

[GitHub] spark pull request: [SPARK-2119][SQL] Improved Parquet performance...

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1370

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49194701 I'm going off of @mateiz's report on SPARK-2048 that we found [this] to be much slower than accessing fields directly.
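
The two styles being contrasted in this thread, destructuring a `Tuple2` with a pattern match versus reading `_1` and `_2` directly, look roughly like this. It is a minimal sketch with made-up names and a toy aggregation, not the aggregation code touched by the pull request; it only shows that both forms compute the same result, and makes no measurement of the performance difference reported on SPARK-2048.

```scala
object Tuple2AccessSketch {
  // Sums values per key, destructuring each pair with a pattern match.
  def sumByKeyMatch(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }

  // Same computation, reading the tuple fields directly instead of matching.
  def sumByKeyFields(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { (acc, kv) =>
      acc.updated(kv._1, acc.getOrElse(kv._1, 0) + kv._2)
    }
}
```

The field-access version is a little harder to read, which is the readability cost raised earlier in the thread; the comments here suggest reserving it for performance-critical paths.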

[GitHub] spark pull request: SPARK-2519. Eliminate pattern-matching on Tupl...

2014-07-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1435#issuecomment-49195389 Got it. Thanks. That also helps to put some bound (for now) on where we will make such performance optimizations.

[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...

2014-07-16 Thread witgo
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/1022

[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...

2014-07-16 Thread witgo
GitHub user witgo reopened a pull request: https://github.com/apache/spark/pull/1022 SPARK-1719: spark.*.extraLibraryPath isn't applied on yarn Fix: spark.executor.extraLibraryPath isn't applied on yarn You can merge this pull request into a Git repository by running: $ git

[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1022#issuecomment-49196995 QA tests have started for PR 1022. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16735/consoleFull

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15013544 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala --- @@ -81,9 +82,8 @@ class LogisticRegressionSuite

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-16 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15013611 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -241,4 +252,37 @@ private[hive] object HadoopTableReader {

[GitHub] spark pull request: SPARK-2098: All Spark processes should support...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1256#issuecomment-49197007 QA tests have started for PR 1256. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16734/consoleFull

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15013786 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -20,8 +20,20 @@ package

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1212

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-16 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/1338#issuecomment-49198901 Great - I will review in more detail after that. Would be great to get this merged before 1.1 freeze so PySpark I/O for inputformat and outputformat is in for the

[GitHub] spark pull request: SPARK-1719: spark.*.extraLibraryPath isn't app...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1022#issuecomment-49199676 QA tests have started for PR 1022. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16737/consoleFull

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49200080 I just noticed that pendingTasksWithNotReadyPrefs is no longer being used. It is getting updated but never actually queried ... Do we need to maintain it?

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49200501 Thanks @lirui-intel merged finally :-)

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49200595 @mridulm this is exactly what the PR is doing here? no? yes, it seems that pendingTasksWithNotReadyPrefs is redundant

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49201615 If pendingTasksWithNotReady is never used, why was it added?

[GitHub] spark pull request: Tightening visibility for various Broadcast re...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1438#issuecomment-4920 Merging this in master.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49201718 oh, just a mistake, I'm removing it

[GitHub] spark pull request: SPARK-2526: Simplify options in make-distribut...

2014-07-16 Thread pwendell
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/1445 SPARK-2526: Simplify options in make-distribution.sh Right now we have a bunch of parallel logic in make-distribution.sh that's just extra work to maintain. We should just pass through

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49201924 Sean, I'd still be okay with adding a LongALS class if you see benefit for it in some use cases. Let's just see how it works in comparison.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15015383 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -351,6 +354,14 @@ private[spark] class TaskSetManager(
