[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/18620 I'm not understanding why `sorted` is slower than `sortBy` - `sortBy` uses `sorted` in its implementation: ```scala def sortBy[B](f: A => B)(implicit ord: Ordering[B]): Repr = sorted(ord

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18655 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18655 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18637 **[Test build #79701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79701/testReport)** for PR 18637 at commit

[GitHub] spark pull request #18635: [SPARK-21415] Triage scapegoat warnings, part 1

2017-07-18 Thread srowen
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/18635 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127903537 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/18620 That would make sense. There must be something else going on. Overall, I don't think it is compelling enough evidence to make the `poll` change. (Though as mentioned it's not a huge deal so if

[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18513 **[Test build #79699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79699/testReport)** for PR 18513 at commit

[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18632 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if

[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18513 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79699/ Test PASSed. ---

[GitHub] spark pull request #18669: tfidf-new edit

2017-07-18 Thread chlyzzo
Github user chlyzzo closed the pull request at: https://github.com/apache/spark/pull/18669 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectori...

2017-07-18 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r127913117 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -132,6 +135,61 @@ private[sql] object ArrowConverters {

[GitHub] spark issue #18665: [SPARK-21446] [SQL] Fix setAutoCommit never executed

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18665 **[Test build #3844 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3844/testReport)** for PR 18665 at commit

[GitHub] spark pull request #15471: [SPARK-17919] Make timeout to RBackend configurab...

2017-07-18 Thread QCTW
Github user QCTW commented on a diff in the pull request: https://github.com/apache/spark/pull/15471#discussion_r127771891 --- Diff: R/pkg/R/backend.R --- @@ -108,13 +108,27 @@ invokeJava <- function(isStatic, objId, methodName, ...) { conn <- get(".sparkRCon", .sparkREnv)

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127897413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...

2017-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18656 No. I meant if there's a CodegenFallback expression, wholestage codegen will not be enabled. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread xuanyuanking
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/18654 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18633 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] spark issue #18669: tfidf-new edit

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18669 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #18635: [SPARK-21415] Triage scapegoat warnings, part 1

2017-07-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18635 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark pull request #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMM...

2017-07-18 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18637#discussion_r127903284 --- Diff: mllib/pom.xml --- @@ -139,8 +133,38 @@ + target/scala-${scala.binary.version}/classes

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127918499 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18639: [SPARK-21408][core] Better default number of RPC ...

2017-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18639#discussion_r127898848 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala --- @@ -33,7 +33,7 @@ import org.apache.spark.util.ThreadUtils /**

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127901508 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18667: Fix the simpleString used in error messages

2017-07-18 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18667#discussion_r127903964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/LongType.scala --- @@ -43,7 +43,7 @@ class LongType private() extends IntegralType {

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 I am ok to close this. Thanks @MLnick --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18632#discussion_r127904688 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -51,6 +51,7 @@ public

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18620 My benchmarks locally said poll() is a little faster on moderately large collections, like 100 elements in the queue. I'm really neutral. If it affords a little help, that's great. It's a natural

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Thanks @srowen , my test also said pq.poll is a little faster on some cases. One possible benefit here is if we provide pq.poll, user's first choice may use pq.poll, not pq.toArray.sorted, which

[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18513 **[Test build #79699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79699/testReport)** for PR 18513 at commit

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127903005 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18667: Fix the simpleString used in error messages

2017-07-18 Thread fxbonnet
Github user fxbonnet commented on a diff in the pull request: https://github.com/apache/spark/pull/18667#discussion_r127905854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/LongType.scala --- @@ -43,7 +43,7 @@ class LongType private() extends IntegralType {

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127907280 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127909294 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18632 **[Test build #79702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79702/testReport)** for PR 18632 at commit

[GitHub] spark issue #18468: [SPARK-20873][SQL] Creat CachedBatchColumnVector to abst...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79703/testReport)** for PR 18468 at commit

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79695/ Test FAILed. ---

[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...

2017-07-18 Thread DonnyZone
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18656 Yeah, CodegenFallback just provide a fallback mode. However, in such case, SortMergeJoinExec passes incomplete row as input to hiveUDF that implements CodegenFallback. --- If your project

[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12646 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18655 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79696/ Test FAILed. ---

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79695/testReport)** for PR 18654 at commit

[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12646 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79697/ Test FAILed. ---

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127896217 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #79697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79697/testReport)** for PR 12646 at commit

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18655 **[Test build #79696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79696/testReport)** for PR 18655 at commit

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 I also very confused about this. You can change https://github.com/apache/spark/pull/18624 to sorted and test. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127897096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18655 **[Test build #79698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79698/testReport)** for PR 18655 at commit

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127898565 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18655 @BryanCutler I'd like to share the motivation of refactoring `ArrowConverters` and `ColumnWriter`. For `ColumnWriter`, at first I'd like to support complex types like `ArrayType` and

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 My micro benchmark (write a program only test pq.toArray.sorted and pq.Array.sortBy and pq.poll), not find significant performance difference. Only in the Spark job, there is big difference.

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79700/testReport)** for PR 18654 at commit

[GitHub] spark pull request #18669: tfidf-new edit

2017-07-18 Thread chlyzzo
GitHub user chlyzzo opened a pull request: https://github.com/apache/spark/pull/18669 tfidf-new edit ## What changes were proposed in this pull request? i add a TfIdf.scala,it can compute docs tfidf's vector. i hava a case that is compute docs similarity,so i use the spark

[GitHub] spark issue #18669: tfidf-new edit

2017-07-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18669 @chlyzzo close this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

2017-07-18 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/18632#discussion_r127907518 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -51,6 +51,7 @@ public

[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

2017-07-18 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/18632#discussion_r127908258 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -51,6 +51,7 @@ public

[GitHub] spark issue #18669: tfidf-new edit

2017-07-18 Thread chlyzzo
Github user chlyzzo commented on the issue: https://github.com/apache/spark/pull/18669 closed, - 原始邮件 - 发件人:Sean Owen 收件人:apache/spark 抄送人:chlyzzo , Mention

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 Hi @srowen @MLnick @jkbradley @mengxr @yanboliang Is this change acceptable? if it is acceptable, I will update ALS ML code following this method. Also update Test Suite, which are too simple,

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127891910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...

2017-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18656 Will CodegenFallback be used in wholestage codegen? I think it's not supported. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127894313 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-18 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127888746 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,43 @@ +/* + *

[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893543 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #79697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79697/testReport)** for PR 12646 at commit

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127892847 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-07-18 Thread yaooqinn
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/18668 [SPARK-21451][SQL]get `spark.hadoop.*` properties from sysProps to hiveconf ## What changes were proposed in this pull request? get `spark.hadoop.*` properties from sysProps to

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127895586 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...

2017-07-18 Thread DonnyZone
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18656 Hi, @cloud-fan, @vanzin , could you help to take a look? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893995 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127895248 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127895399 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127895419 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79695/testReport)** for PR 18654 at commit

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Hi @MLnick , @srowen . My test showing: pq.poll is not significantly faster than pq.toArray.sortBy, but significantly faster than pq.toArray.sorted. Seems not each pq.toArray.sorted (such as

[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18655 **[Test build #79696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79696/testReport)** for PR 18655 at commit

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893174 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-18 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18555 @gatorsmile Could you please review this code again? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127894772 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18468: [SPARK-20873][SQL] Creat CachedBatchColumnVector ...

2017-07-18 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r127962023 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,421 @@ +/* + * Licensed to

[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...

2017-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18656 I think the check for `SortMergeJoinExec` in `insertInputAdapter` should be corrected to: private def insertInputAdapter(plan: SparkPlan): SparkPlan = plan match { case p if

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127965749 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127965550 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer(

[GitHub] spark pull request #18468: [SPARK-20873][SQL] Creat CachedBatchColumnVector ...

2017-07-18 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r127969028 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,421 @@ +/* + * Licensed to

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79704/ Test PASSed. ---

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit pr...

2017-07-18 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18641#discussion_r127982825 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -273,12 +274,26 @@ case class

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79704/testReport)** for PR 18654 at commit

[GitHub] spark issue #18670: [SPARK-21455][CORE]RpcFailure should be call on RpcRespo...

2017-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18670 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if

[GitHub] spark issue #18670: [SPARK-21455][CORE]RpcFailure should be call on RpcRespo...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18670 **[Test build #79706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79706/testReport)** for PR 18670 at commit

[GitHub] spark pull request #18670: [SPARK-21455][CORE]RpcFailure should be call on R...

2017-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18670#discussion_r127988207 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -624,7 +624,9 @@ abstract class RpcEnvSuite extends SparkFunSuite with

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79700/ Test FAILed. ---

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-18 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127934107 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -598,8 +598,23 @@ class LogisticRegression

[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-07-18 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/18305 @sethah IMO we should back out the test-related bc var explicit destroy code as it complicates things. I hear that this _may_ help catch bugs... but frankly I'm not convinced. Because the

[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79702/ Test PASSed. ---

[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18632 **[Test build #79702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79702/testReport)** for PR 18632 at commit

[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18632 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

  1   2   3   4   5   >