[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-17 Thread GitBox
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r456439649 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala ## @@ -54,45 +53,112 @@ case class

[GitHub] [spark] AmplabJenkins commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #27690: URL: https://github.com/apache/spark/pull/27690#issuecomment-660105047 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #27690: URL: https://github.com/apache/spark/pull/27690#issuecomment-660105047 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] tgravescs commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
tgravescs commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660114456 test this please This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660017444 **[Test build #126048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126048/testReport)** for PR 29104 at commit

[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660147728 **[Test build #126048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126048/testReport)** for PR 29104 at commit

[GitHub] [spark] maropu commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
maropu commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660152286 @gatorsmile Do we need to support the case: a configuration with spaces? This is an automated message from the

[GitHub] [spark] SparkQA commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
SparkQA commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660152612 **[Test build #126055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126055/testReport)** for PR 29146 at commit

[GitHub] [spark] SparkQA commented on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
SparkQA commented on pull request #29143: URL: https://github.com/apache/spark/pull/29143#issuecomment-660152659 **[Test build #126056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126056/testReport)** for PR 29143 at commit

[GitHub] [spark] cloud-fan commented on a change in pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #29143: URL: https://github.com/apache/spark/pull/29143#discussion_r456494782 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/First.scala ## @@ -120,3 +120,10 @@ case class

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r456503592 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2665,6 +2665,16 @@ object SQLConf { .checkValue(_

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660176457 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660176457 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] revans2 commented on a change in pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-17 Thread GitBox
revans2 commented on a change in pull request #29067: URL: https://github.com/apache/spark/pull/29067#discussion_r456521706 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -19,84 +19,301 @@ package

[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660176791 > I like this idea! But I'm a bit worried about adding a new physical join node for it. I need some time to think about if we can reuse the existing broadcast hash join.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660176469 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660191949 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] stczwd commented on a change in pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions Catalog APIs on DataSourceV2

2020-07-17 Thread GitBox
stczwd commented on a change in pull request #28617: URL: https://github.com/apache/spark/pull/28617#discussion_r456536362 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TablePartition.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660191949 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] revans2 commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-17 Thread GitBox
revans2 commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-660198312 I think I have addressed all of the outstanding review comments. Please take another look and see if there are more changes needed.

[GitHub] [spark] attilapiros commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled

2020-07-17 Thread GitBox
attilapiros commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r456399850 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java ## @@ -61,4 +63,17 @@ public MetricSet

[GitHub] [spark] attilapiros commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled

2020-07-17 Thread GitBox
attilapiros commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r456393576 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -1391,10 +1391,12 @@ package object config {

[GitHub] [spark] AmplabJenkins commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660223741 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660223741 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] SparkQA removed a comment on pull request #29145: [WIP][SPARK-32346][SQL] Support filters pushdown in Avro datasource

2020-07-17 Thread GitBox
SparkQA removed a comment on pull request #29145: URL: https://github.com/apache/spark/pull/29145#issuecomment-660205458 **[Test build #126063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126063/testReport)** for PR 29145 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29145: [WIP][SPARK-32346][SQL] Support filters pushdown in Avro datasource

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29145: URL: https://github.com/apache/spark/pull/29145#issuecomment-660224489 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA removed a comment on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
SparkQA removed a comment on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660152612 **[Test build #126055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126055/testReport)** for PR 29146 at commit

[GitHub] [spark] SparkQA commented on pull request #29145: [WIP][SPARK-32346][SQL] Support filters pushdown in Avro datasource

2020-07-17 Thread GitBox
SparkQA commented on pull request #29145: URL: https://github.com/apache/spark/pull/29145#issuecomment-660224233 **[Test build #126063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126063/testReport)** for PR 29145 at commit

[GitHub] [spark] srowen commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-17 Thread GitBox
srowen commented on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-660224576 Merged to master This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] srowen closed pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-17 Thread GitBox
srowen closed pull request #29128: URL: https://github.com/apache/spark/pull/29128 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660116404 **[Test build #126054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126054/testReport)** for PR 28708 at commit

[GitHub] [spark] cloud-fan commented on a change in pull request #29127: [SPARK-32327][SQL] Introduce UnresolvedTableOrPermanentView for commands that support a table and permanent view, but not a tem

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #29127: URL: https://github.com/apache/spark/pull/29127#discussion_r456470950 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala ## @@ -42,14 +42,23 @@ case class

[GitHub] [spark] cloud-fan commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456479893 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class

[GitHub] [spark] emkornfield commented on a change in pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions Catalog APIs on DataSourceV2

2020-07-17 Thread GitBox
emkornfield commented on a change in pull request #28617: URL: https://github.com/apache/spark/pull/28617#discussion_r456518431 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TablePartition.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the

[GitHub] [spark] emkornfield commented on a change in pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions Catalog APIs on DataSourceV2

2020-07-17 Thread GitBox
emkornfield commented on a change in pull request #28617: URL: https://github.com/apache/spark/pull/28617#discussion_r456518921 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TablePartition.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions

2020-07-17 Thread GitBox
dongjoon-hyun commented on a change in pull request #29133: URL: https://github.com/apache/spark/pull/29133#discussion_r456519069 ## File path: project/SparkBuild.scala ## @@ -1027,6 +1027,11 @@ object TestSettings { }.getOrElse(Nil): _*), // Show full stack trace

[GitHub] [spark] SparkQA commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
SparkQA commented on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660191253 **[Test build #126059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126059/testReport)** for PR 29135 at commit

[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660210258 **[Test build #126064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126064/testReport)** for PR 29104 at commit

[GitHub] [spark] cloud-fan commented on pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-07-17 Thread GitBox
cloud-fan commented on pull request #28852: URL: https://github.com/apache/spark/pull/28852#issuecomment-660113363 thanks, merging to master! This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] GuoPhilipse commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-17 Thread GitBox
GuoPhilipse commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r456464691 ## File path: docs/sql-ref-syntax-qry-select-lateral-view.md ## @@ -0,0 +1,130 @@ +--- +layout: global +title: LATERAL VIEW Clause +displayTitle:

[GitHub] [spark] c21 commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
c21 commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456484701 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class ShuffledHashJoinExec(

[GitHub] [spark] maropu opened a new pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
maropu opened a new pull request #29146: URL: https://github.com/apache/spark/pull/29146 ### What changes were proposed in this pull request? This PR modified the parser code to handle invalid usages of a SET command. For example: ``` SET spark.sql.ansi.enabled true

[GitHub] [spark] dongjoon-hyun closed pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-17 Thread GitBox
dongjoon-hyun closed pull request #29141: URL: https://github.com/apache/spark/pull/29141 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] revans2 commented on a change in pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-17 Thread GitBox
revans2 commented on a change in pull request #29067: URL: https://github.com/apache/spark/pull/29067#discussion_r456509822 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -19,84 +19,301 @@ package

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29143: URL: https://github.com/apache/spark/pull/29143#issuecomment-660170031 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29143: URL: https://github.com/apache/spark/pull/29143#issuecomment-660170031 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-07-17 Thread GitBox
SparkQA commented on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-660180150 **[Test build #126051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126051/testReport)** for PR 28885 at commit

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
dongjoon-hyun commented on a change in pull request #29089: URL: https://github.com/apache/spark/pull/29089#discussion_r456524842 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -969,11 +969,11 @@ object CombineFilters

[GitHub] [spark] srowen commented on a change in pull request #29139: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-17 Thread GitBox
srowen commented on a change in pull request #29139: URL: https://github.com/apache/spark/pull/29139#discussion_r456537560 ## File path: docs/ml-guide.md ## @@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin # Dependencies

[GitHub] [spark] SparkQA commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
SparkQA commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-660195822 **[Test build #126060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126060/testReport)** for PR 29089 at commit

[GitHub] [spark] srowen opened a new pull request #29147: [SPARK-29292][YARN][K8S][MESOS] Fix Scala 2.13 compilation for remaining modules

2020-07-17 Thread GitBox
srowen opened a new pull request #29147: URL: https://github.com/apache/spark/pull/29147 ### What changes were proposed in this pull request? See again the related PRs like https://github.com/apache/spark/pull/28971 This completes fixing compilation for 2.13 for all but `repl`,

[GitHub] [spark] cloud-fan closed pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-07-17 Thread GitBox
cloud-fan closed pull request #28852: URL: https://github.com/apache/spark/pull/28852 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] cloud-fan commented on pull request #29127: [SPARK-32327][SQL] Introduce UnresolvedTableOrPermanentView for commands that support a table and permanent view, but not a temporary view

2020-07-17 Thread GitBox
cloud-fan commented on pull request #29127: URL: https://github.com/apache/spark/pull/29127#issuecomment-660137141 At first glance, I thought the PR description was confusing, because we can specify properties of temp view but can't display it. Then I realized that we just ignore the

[GitHub] [spark] c21 commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
c21 commented on pull request #29130: URL: https://github.com/apache/spark/pull/29130#issuecomment-660137442 @cloud-fan and @imback82 - I was not aware of https://github.com/apache/spark/pull/28676 before making this PR. After checking https://github.com/apache/spark/pull/28676, TLDR is I

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660148392 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] cloud-fan commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456495926 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660153281 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] maropu commented on a change in pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
maropu commented on a change in pull request #29143: URL: https://github.com/apache/spark/pull/29143#discussion_r456495797 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/First.scala ## @@ -120,3 +120,10 @@ case class First(child:

[GitHub] [spark] AmplabJenkins commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660153281 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] c21 commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
c21 commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456502678 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class ShuffledHashJoinExec(

[GitHub] [spark] emkornfield commented on a change in pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions Catalog APIs on DataSourceV2

2020-07-17 Thread GitBox
emkornfield commented on a change in pull request #28617: URL: https://github.com/apache/spark/pull/28617#discussion_r456516830 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsPartitions.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the

[GitHub] [spark] c21 commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
c21 commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456526858 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class ShuffledHashJoinExec(

[GitHub] [spark] cloud-fan commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
cloud-fan commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660182641 > we can always change buildSide into a HashSet, and streamedSide just need to lookup in the HashSet, then the calculation will be optimized into M*log(N). Taking the

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29130: URL: https://github.com/apache/spark/pull/29130#issuecomment-660183205 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660182741 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660182731 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660182731 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29130: URL: https://github.com/apache/spark/pull/29130#issuecomment-660183205 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] beliefer commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-17 Thread GitBox
beliefer commented on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-660187801 retest this please This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] SparkQA commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
SparkQA commented on pull request #29130: URL: https://github.com/apache/spark/pull/29130#issuecomment-660186821 **[Test build #126058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126058/testReport)** for PR 29130 at commit

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
dongjoon-hyun commented on a change in pull request #29089: URL: https://github.com/apache/spark/pull/29089#discussion_r456531088 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -969,11 +969,11 @@ object CombineFilters

[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660200660 @cloud-fan And FYI. Inside BroadcastHashJoinExec, the HashedRelation is already filtered out with NULL value, so if we want to deal with this issue inside BHJ-Exec, might need

[GitHub] [spark] SparkQA commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-17 Thread GitBox
SparkQA commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-660200667 **[Test build #126062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126062/testReport)** for PR 29067 at commit

[GitHub] [spark] SparkQA commented on pull request #29147: [SPARK-29292][YARN][K8S][MESOS] Fix Scala 2.13 compilation for remaining modules

2020-07-17 Thread GitBox
SparkQA commented on pull request #29147: URL: https://github.com/apache/spark/pull/29147#issuecomment-660200663 **[Test build #126061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126061/testReport)** for PR 29147 at commit

[GitHub] [spark] SparkQA commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command

2020-07-17 Thread GitBox
SparkQA commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-660223481 **[Test build #126055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126055/testReport)** for PR 29146 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29147: [SPARK-29292][YARN][K8S][MESOS] Fix Scala 2.13 compilation for remaining modules

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29147: URL: https://github.com/apache/spark/pull/29147#issuecomment-660222658 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29147: [SPARK-29292][YARN][K8S][MESOS] Fix Scala 2.13 compilation for remaining modules

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29147: URL: https://github.com/apache/spark/pull/29147#issuecomment-660222658 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] cloud-fan commented on a change in pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #29143: URL: https://github.com/apache/spark/pull/29143#discussion_r456466819 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ## @@ -458,6 +458,7 @@ abstract class SparkStrategies

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660134481 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-660134481 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
cloud-fan commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456513361 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class

[GitHub] [spark] SparkQA commented on pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
SparkQA commented on pull request #29143: URL: https://github.com/apache/spark/pull/29143#issuecomment-660169437 **[Test build #126057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126057/testReport)** for PR 29143 at commit

[GitHub] [spark] emkornfield commented on a change in pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions Catalog APIs on DataSourceV2

2020-07-17 Thread GitBox
emkornfield commented on a change in pull request #28617: URL: https://github.com/apache/spark/pull/28617#discussion_r456517861 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsPartitions.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the

[GitHub] [spark] wangyum commented on a change in pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-17 Thread GitBox
wangyum commented on a change in pull request #29101: URL: https://github.com/apache/spark/pull/29101#discussion_r456516944 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ## @@ -201,126 +201,51 @@ trait PredicateHelper

[GitHub] [spark] emkornfield commented on a change in pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions Catalog APIs on DataSourceV2

2020-07-17 Thread GitBox
emkornfield commented on a change in pull request #28617: URL: https://github.com/apache/spark/pull/28617#discussion_r456518177 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsPartitions.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-660180874 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-660180874 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA removed a comment on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-07-17 Thread GitBox
SparkQA removed a comment on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-660037934 **[Test build #126051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126051/testReport)** for PR 28885 at commit

[GitHub] [spark] aokolnychyi commented on a change in pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
aokolnychyi commented on a change in pull request #29089: URL: https://github.com/apache/spark/pull/29089#discussion_r456537732 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -969,11 +969,11 @@ object CombineFilters

[GitHub] [spark] aokolnychyi commented on a change in pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
aokolnychyi commented on a change in pull request #29089: URL: https://github.com/apache/spark/pull/29089#discussion_r456537845 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -969,11 +969,11 @@ object CombineFilters

[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-17 Thread GitBox
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-660195980 > null > > we can always change buildSide into a HashSet, and streamedSide just needs to look up in the HashSet, then the calculation will be optimized into

[GitHub] [spark] AmplabJenkins commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
AmplabJenkins commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-660196508 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-17 Thread GitBox
AmplabJenkins removed a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-660196508 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] warrenzhu25 commented on pull request #28972: [SPARK-30794][CORE] Stage Level scheduling: Add ability to set off heap memory

2020-07-17 Thread GitBox
warrenzhu25 commented on pull request #28972: URL: https://github.com/apache/spark/pull/28972#issuecomment-660205284 > **[Test build #125902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125902/testReport)** for PR 28972 at commit

[GitHub] [spark] SparkQA commented on pull request #29145: [WIP][SPARK-32346][SQL] Support filters pushdown in Avro datasource

2020-07-17 Thread GitBox
SparkQA commented on pull request #29145: URL: https://github.com/apache/spark/pull/29145#issuecomment-660205458 **[Test build #126063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126063/testReport)** for PR 29145 at commit

[GitHub] [spark] SparkQA commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-07-17 Thread GitBox
SparkQA commented on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-660215131 **[Test build #126065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126065/testReport)** for PR 29020 at commit

[GitHub] [spark] maropu commented on a change in pull request #29143: [SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates

2020-07-17 Thread GitBox
maropu commented on a change in pull request #29143: URL: https://github.com/apache/spark/pull/29143#discussion_r456468861 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ## @@ -458,6 +458,7 @@ abstract class SparkStrategies extends

[GitHub] [spark] c21 commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-17 Thread GitBox
c21 commented on a change in pull request #29130: URL: https://github.com/apache/spark/pull/29130#discussion_r456484701 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -47,6 +47,18 @@ case class ShuffledHashJoinExec(
