[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717652874 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34947/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717652279 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] maropu commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
maropu commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717652557 Thanks for the review, @HyukjinKwon

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717652288 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717652279

[GitHub] [spark] SparkQA commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox
SparkQA commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717652255 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34946/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717651054

[GitHub] [spark] AmplabJenkins commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717651054

[GitHub] [spark] SparkQA commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
SparkQA commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717651048 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34945/

[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox
beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r513138536 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -151,10 +173,93 @@ final class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717649406 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717649431

[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717649401 Merged build finished. Test FAILed.

[GitHub] [spark] AmplabJenkins commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717649401

[GitHub] [spark] SparkQA removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717631172 **[Test build #130344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130344/testReport)** for PR 26935 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717649431

[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717649201 **[Test build #130344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130344/testReport)** for PR 26935 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717553031 **[Test build #130341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130341/testReport)** for PR 30162 at commit

[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
SparkQA commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717648777 **[Test build #130341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130341/testReport)** for PR 30162 at commit

[GitHub] [spark] SparkQA commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox
SparkQA commented on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717648150 **[Test build #130347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130347/testReport)** for PR 29677 at commit

[GitHub] [spark] zhengruifeng commented on pull request #30009: [SPARK-32907][ML] adaptively blockify instances - LinearSVC

2020-10-27 Thread GitBox
zhengruifeng commented on pull request #30009: URL: https://github.com/apache/spark/pull/30009#issuecomment-717647271 also ping @srowen

[GitHub] [spark] sarutak commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

2020-10-27 Thread GitBox
sarutak commented on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717646553 cc: @cloud-fan

[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox
SparkQA commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717645869 **[Test build #130346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130346/testReport)** for PR 30166 at commit

[GitHub] [spark] sarutak opened a new pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13

2020-10-27 Thread GitBox
sarutak opened a new pull request #30166: URL: https://github.com/apache/spark/pull/30166 ### What changes were proposed in this pull request? This PR renames some part of `Seq` in `PostgresIntegrationSuite` to `scala.collection.Seq`. When I run `docker-integration-test`, I

[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717645292 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34947/

[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox
beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r513133285 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala ## @@ -57,8 +57,12 @@ import

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30079: URL: https://github.com/apache/spark/pull/30079#issuecomment-717643342

[GitHub] [spark] SparkQA removed a comment on pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #30079: URL: https://github.com/apache/spark/pull/30079#issuecomment-717542037 **[Test build #130340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130340/testReport)** for PR 30079 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30079: URL: https://github.com/apache/spark/pull/30079#issuecomment-717643342

[GitHub] [spark] SparkQA commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox
SparkQA commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717643361 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34946/

[GitHub] [spark] SparkQA commented on pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier

2020-10-27 Thread GitBox
SparkQA commented on pull request #30079: URL: https://github.com/apache/spark/pull/30079#issuecomment-717642722 **[Test build #130340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130340/testReport)** for PR 30079 at commit

[GitHub] [spark] SparkQA commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
SparkQA commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717642756 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34945/

[GitHub] [spark] HyukjinKwon commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
HyukjinKwon commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717641861 Looks fine

[GitHub] [spark] viirya edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
viirya edited a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717630563 I don't have the numbers exactly for StateStore, but I remember we have some numbers for switching shuffle compression codec. @dbtsai Do you remember where the numbers

[GitHub] [spark] SparkQA removed a comment on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717633440 **[Test build #130345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130345/testReport)** for PR 30165 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717638813

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717638813

[GitHub] [spark] SparkQA commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
SparkQA commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717638657 **[Test build #130345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130345/testReport)** for PR 30165 at commit

[GitHub] [spark] HeartSaVioR commented on pull request #30151: [WIP][SPARK-33223][SS][UI]Structured Streaming Web UI state information

2020-10-27 Thread GitBox
HeartSaVioR commented on pull request #30151: URL: https://github.com/apache/spark/pull/30151#issuecomment-717635077 > How do you exactly mean separate? I meant having "accumulated" graphs across multiple state stores vs having graphs per state store. If you use stream-stream join,

[GitHub] [spark] SparkQA commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
SparkQA commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717633440 **[Test build #130345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130345/testReport)** for PR 30165 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717631496 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717631493

[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717631481 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34944/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717631493 Merged build finished. Test FAILed.

[GitHub] [spark] maropu commented on a change in pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-27 Thread GitBox
maropu commented on a change in pull request #30095: URL: https://github.com/apache/spark/pull/30095#discussion_r513119837 ## File path: docs/sql-ref-syntax-qry-select.md ## @@ -85,6 +85,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } *

[GitHub] [spark] maropu commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
maropu commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717631405 cc: @huaxingao @HyukjinKwon @gatorsmile

[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-10-27 Thread GitBox
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717631172 **[Test build #130344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130344/testReport)** for PR 26935 at commit

[GitHub] [spark] SparkQA commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog

2020-10-27 Thread GitBox
SparkQA commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717631159 **[Test build #130343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130343/testReport)** for PR 30147 at commit

[GitHub] [spark] maropu opened a new pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents

2020-10-27 Thread GitBox
maropu opened a new pull request #30165: URL: https://github.com/apache/spark/pull/30165 ### What changes were proposed in this pull request? This PR intends to add a dedicated page for SQL-on-file in SQL documents. This comes from the comment:

[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
viirya commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717630958 > That said, you'll need to either 1) make it as a configuration but prevent the value to be changed after the query starts (like we do in state store formats) or 2) add the

[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
viirya commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717630563 I don't have the numbers exactly for StateStore, but I remember we have some numbers for switching shuffle compression codec. @dbtsai Do you remember where the numbers are?

[GitHub] [spark] viirya commented on pull request #29729: [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API

2020-10-27 Thread GitBox
viirya commented on pull request #29729: URL: https://github.com/apache/spark/pull/29729#issuecomment-717629594 Because I don't handle the operation of the Kafka cluster, I don't have a clear idea how hard it is to change from group.id-based authorization to topic-based authorization from Kafka infra

[GitHub] [spark] github-actions[bot] closed pull request #28731: [SPARK-31909][SQL] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-10-27 Thread GitBox
github-actions[bot] closed pull request #28731: URL: https://github.com/apache/spark/pull/28731

[GitHub] [spark] github-actions[bot] closed pull request #28256: [SPARK-31483][PySpark] Use SPARK_PYTHON or 'python' to run find_spark_home.py

2020-10-27 Thread GitBox
github-actions[bot] closed pull request #28256: URL: https://github.com/apache/spark/pull/28256

[GitHub] [spark] HeartSaVioR edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
HeartSaVioR edited a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717626419 Before that, do you have numbers based on some experiments? It'd be nice if we get some numbers to determine how much it will help.

[GitHub] [spark] HyukjinKwon commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema

2020-10-27 Thread GitBox
HyukjinKwon commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717627342 Looks fine

[GitHub] [spark] HyukjinKwon commented on a change in pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less

2020-10-27 Thread GitBox
HyukjinKwon commented on a change in pull request #30156: URL: https://github.com/apache/spark/pull/30156#discussion_r513116060 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2748,6 +2748,16 @@ object SQLConf {

[GitHub] [spark] HyukjinKwon commented on a change in pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less

2020-10-27 Thread GitBox
HyukjinKwon commented on a change in pull request #30156: URL: https://github.com/apache/spark/pull/30156#discussion_r513115971 ## File path: docs/sql-migration-guide.md ## @@ -49,6 +49,8 @@ license: | - In Spark 3.1, we remove the built-in Hive 1.2. You need to migrate

[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
HeartSaVioR commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717626419 Before that, do you have numbers based on some experiments? It'd be nice if we get some numbers to determine how it will help.

[GitHub] [spark] HyukjinKwon commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
HyukjinKwon commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717626162 cc @HeartSaVioR and @xuanyuanking FYI

[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
HeartSaVioR commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717625880 At least we shouldn't rely on the configuration while reading the file - this isn't the same as others, e.g. event log compression. For event log compression, the file has a

[GitHub] [spark] HyukjinKwon closed pull request #30159: [SPARK-33258][R][SQL] Add asc_nulls_* and desc_nulls_* methods to SparkR

2020-10-27 Thread GitBox
HyukjinKwon closed pull request #30159: URL: https://github.com/apache/spark/pull/30159

[GitHub] [spark] HyukjinKwon commented on pull request #30159: [SPARK-33258][R][SQL] Add asc_nulls_* and desc_nulls_* methods to SparkR

2020-10-27 Thread GitBox
HyukjinKwon commented on pull request #30159: URL: https://github.com/apache/spark/pull/30159#issuecomment-717625634 Thanks for working on this, @zero323. Merged to master.

[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717624005 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34944/

[GitHub] [spark] HeartSaVioR commented on pull request #29729: [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API

2020-10-27 Thread GitBox
HeartSaVioR commented on pull request #29729: URL: https://github.com/apache/spark/pull/29729#issuecomment-717622023 @zsxwing @viirya @xuanyuanking Could you please go through reviewing again? If there are no further comments in a couple of days, I'll merge this in.

[GitHub] [spark] Victsm commented on a change in pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging

2020-10-27 Thread GitBox
Victsm commented on a change in pull request #30164: URL: https://github.com/apache/spark/pull/30164#discussion_r513103688 ## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ## @@ -1252,6 +1254,28 @@ private[spark] class DAGScheduler(

[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

2020-10-27 Thread GitBox
viirya commented on a change in pull request #30093: URL: https://github.com/apache/spark/pull/30093#discussion_r513102275 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantSorts.scala ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache

[GitHub] [spark] maropu commented on a change in pull request #30157: [DO-NOT-MERGE][SPARK-33228][SQL][2.4] Don't uncache data when replacing a view having the same logical plan

2020-10-27 Thread GitBox
maropu commented on a change in pull request #30157: URL: https://github.com/apache/spark/pull/30157#discussion_r513101527 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala ## @@ -90,7 +90,7 @@ abstract class UserDefinedType[UserType

[GitHub] [spark] maropu closed pull request #30157: [DO-NOT-MERGE][SPARK-33228][SQL][2.4] Don't uncache data when replacing a view having the same logical plan

2020-10-27 Thread GitBox
maropu closed pull request #30157: URL: https://github.com/apache/spark/pull/30157

[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
viirya commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717609418 > How about we follow `spark.eventLog.compression.codec`, > > ``` > The codec to compress logged events. If this is not given, spark.io.compression.codec will be used.

[GitHub] [spark] maropu commented on pull request #28923: [SPARK-32090][SQL] Improve UserDefinedType.equal() to make it be symmetrical

2020-10-27 Thread GitBox
maropu commented on pull request #28923: URL: https://github.com/apache/spark/pull/28923#issuecomment-717609168 I've backported this into branch-3.0/2.4. For the reason, please see: https://github.com/apache/spark/pull/30157/files#r512494277

[GitHub] [spark] viirya commented on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
viirya commented on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717608596 Good catch and thanks for the fix!

[GitHub] [spark] viirya commented on a change in pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
viirya commented on a change in pull request #30160: URL: https://github.com/apache/spark/pull/30160#discussion_r513099620 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ## @@ -71,7 +71,13 @@ object

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30146: [SPARK-33241][SQL] Dynamic pruning on data column

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30146: URL: https://github.com/apache/spark/pull/30146#issuecomment-717606921

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30018: [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30018: URL: https://github.com/apache/spark/pull/30018#issuecomment-717601699

[GitHub] [spark] SparkQA removed a comment on pull request #30018: [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #30018: URL: https://github.com/apache/spark/pull/30018#issuecomment-717474049 **[Test build #130339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130339/testReport)** for PR 30018 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #30146: [SPARK-33241][SQL] Dynamic pruning on data column

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #30146: URL: https://github.com/apache/spark/pull/30146#issuecomment-717474027 **[Test build #130338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130338/testReport)** for PR 30146 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #30146: [SPARK-33241][SQL] Dynamic pruning on data column

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30146: URL: https://github.com/apache/spark/pull/30146#issuecomment-717606921 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717606710 **[Test build #130342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130342/testReport)** for PR 30139 at commit

[GitHub] [spark] SparkQA commented on pull request #30146: [SPARK-33241][SQL] Dynamic pruning on data column

2020-10-27 Thread GitBox
SparkQA commented on pull request #30146: URL: https://github.com/apache/spark/pull/30146#issuecomment-717606226 **[Test build #130338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130338/testReport)** for PR 30146 at commit

[GitHub] [spark] maropu commented on a change in pull request #30157: [DO-NOT-MERGE][SPARK-33228][SQL][2.4] Don't uncache data when replacing a view having the same logical plan

2020-10-27 Thread GitBox
maropu commented on a change in pull request #30157: URL: https://github.com/apache/spark/pull/30157#discussion_r513097084 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala ## @@ -90,7 +90,7 @@ abstract class UserDefinedType[UserType

[GitHub] [spark] maropu commented on pull request #30161: [SPARK-33246][SQL][DOCS] Correct documentation for null semantics of "NULL AND False"

2020-10-27 Thread GitBox
maropu commented on pull request #30161: URL: https://github.com/apache/spark/pull/30161#issuecomment-717604909 NOTE: I added your jira account in the contributor list. Thanks for the first contribution, @stwhit ! This is

[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox
AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717605006 retest this please This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] maropu commented on pull request #30161: [SPARK-33246][SQL][DOCS] Correct documentation for null semantics of "NULL AND False"

2020-10-27 Thread GitBox
maropu commented on pull request #30161: URL: https://github.com/apache/spark/pull/30161#issuecomment-717602994 Thanks! Merged to master/3.0. This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] maropu closed pull request #30161: [SPARK-33246][SQL][DOCS] Correct documentation for null semantics of "NULL AND False"

2020-10-27 Thread GitBox
maropu closed pull request #30161: URL: https://github.com/apache/spark/pull/30161 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] maropu commented on pull request #30161: [SPARK-33246][Docs] Correct documentation for null semantics of "NULL AND False"

2020-10-27 Thread GitBox
maropu commented on pull request #30161: URL: https://github.com/apache/spark/pull/30161#issuecomment-717602110 Nice catch! LGTM This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] AmplabJenkins commented on pull request #30018: [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30018: URL: https://github.com/apache/spark/pull/30018#issuecomment-717601699 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #30018: [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier

2020-10-27 Thread GitBox
SparkQA commented on pull request #30018: URL: https://github.com/apache/spark/pull/30018#issuecomment-717601024 **[Test build #130339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130339/testReport)** for PR 30018 at commit

[GitHub] [spark] ankurdave commented on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
ankurdave commented on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717600863 @maropu Updated the PR description: > The fix is to call `.toIndexedSeq` on `ordering` before applying the modifications. This causes the modifications to occur eagerly
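
For readers following the `.toIndexedSeq` remark quoted above, here is a minimal sketch of the laziness it refers to. It is not the Spark patch itself: `StreamLazinessSketch`, `countEvaluations`, and the integer `ordering` are hypothetical stand-ins, and the only assumption about the real code is what the quoted description says, namely that the sort ordering may be a `Stream`.

```scala
object StreamLazinessSketch {
  // Hypothetical helper, not part of Spark: reports how many elements
  // have been transformed by the time `map` returns.
  def countEvaluations(eager: Boolean): Int = {
    var evaluated = 0
    // Stand-in for a sort ordering that happens to be a Stream.
    val ordering: Stream[Int] = Stream(1, 2, 3)
    // Converting to an IndexedSeq first makes the following map strict.
    val source: Seq[Int] = if (eager) ordering.toIndexedSeq else ordering
    val mapped = source.map { x => evaluated += 1; x * 10 }
    evaluated
  }

  def main(args: Array[String]): Unit = {
    // Lazy case: only the head of the Stream has been mapped so far.
    println(countEvaluations(eager = false))
    // Eager case: all three elements have been mapped.
    println(countEvaluations(eager = true))
  }
}
```

When the source is a `Stream`, `map` applies the function to the head only and defers the tail until it is traversed, so any side effect in the function runs later than the caller might expect. Calling `.toIndexedSeq` first applies it to every element immediately.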

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717597724 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
AmplabJenkins commented on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717597724 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA removed a comment on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
SparkQA removed a comment on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717469765 **[Test build #130337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130337/testReport)** for PR 30160 at commit

[GitHub] [spark] SparkQA commented on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
SparkQA commented on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717597002 **[Test build #130337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130337/testReport)** for PR 30160 at commit

[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

2020-10-27 Thread GitBox
maropu commented on a change in pull request #30093: URL: https://github.com/apache/spark/pull/30093#discussion_r513086916 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantSorts.scala ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache

[GitHub] [spark] maropu commented on pull request #30160: [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

2020-10-27 Thread GitBox
maropu commented on pull request #30160: URL: https://github.com/apache/spark/pull/30160#issuecomment-717591078 Nice catch! > The fix is to check if ordering is a Stream and force the modifications to happen immediately if so. The statement above in the PR description looks

[GitHub] [spark] Victsm commented on a change in pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-10-27 Thread GitBox
Victsm commented on a change in pull request #30163: URL: https://github.com/apache/spark/pull/30163#discussion_r513080596 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java ## @@ -158,6 +158,42 @@ public void

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for mergi

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-717585508 Can one of the admins verify this patch? This is an automated message from the Apache Git

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30163: URL: https://github.com/apache/spark/pull/30163#issuecomment-717582688 Can one of the admins verify this patch? This is an automated message from the Apache Git

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717584258 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

2020-10-27 Thread GitBox
AmplabJenkins removed a comment on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-717577247 This is an automated message from the Apache Git Service. To respond to the message, please log on
