[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804854322 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ## @@ -64,13 +64,23 @@ private long

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35493: [SPARK-38192][CORE][TESTS] Fix potential resource leak in `kvstore.RocksDBSuite`

2022-02-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #35493: URL: https://github.com/apache/spark/pull/35493#discussion_r804843513 ## File path: common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java ## @@ -330,15 +330,16 @@ private int countKeys(Class

[GitHub] [spark] dongjoon-hyun commented on pull request #35493: [SPARK-38192][CORE][TESTS] Fix potential resource leak in `kvstore.RocksDBSuite`

2022-02-11 Thread GitBox
dongjoon-hyun commented on pull request #35493: URL: https://github.com/apache/spark/pull/35493#issuecomment-1036420045 Thank you, @LuciferYang ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] AmplabJenkins commented on pull request #35484: [SPARK-38181][SS] Update comments in KafkaDataConsumer.scala

2022-02-11 Thread GitBox
AmplabJenkins commented on pull request #35484: URL: https://github.com/apache/spark/pull/35484#issuecomment-1036341186 Can one of the admins verify this patch?

[GitHub] [spark] dongjoon-hyun commented on pull request #35477: [SPARK-38175][CORE][SQL][SS][DSTREAM][MESOS][WEBUI] Clean up unused parameters in private methods signature

2022-02-11 Thread GitBox
dongjoon-hyun commented on pull request #35477: URL: https://github.com/apache/spark/pull/35477#issuecomment-1036535323 Could you re-trigger GitHub Action?

[GitHub] [spark] dongjoon-hyun commented on pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

2022-02-11 Thread GitBox
dongjoon-hyun commented on pull request #35490: URL: https://github.com/apache/spark/pull/35490#issuecomment-1036428549 Could you make a backporting PR for `branch-3.1` please, @ulysses-you ? ```scala scala> val emptyAgg = Map.empty[String, String] emptyAgg:

[GitHub] [spark] dongjoon-hyun removed a comment on pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

2022-02-11 Thread GitBox
dongjoon-hyun removed a comment on pull request #35490: URL: https://github.com/apache/spark/pull/35490#issuecomment-1036428549 Could you make a backporting PR for `branch-3.1` please, @ulysses-you ? ```scala scala> val emptyAgg = Map.empty[String, String] emptyAgg:

[GitHub] [spark] AmplabJenkins commented on pull request #35476: [SPARK-38173][SQL] Quoted column cannot be recognized correctly when quotedRegexColumnNa…

2022-02-11 Thread GitBox
AmplabJenkins commented on pull request #35476: URL: https://github.com/apache/spark/pull/35476#issuecomment-1036472752 Can one of the admins verify this patch?

[GitHub] [spark] allisonwang-db commented on pull request #35469: [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates

2022-02-11 Thread GitBox
allisonwang-db commented on pull request #35469: URL: https://github.com/apache/spark/pull/35469#issuecomment-1036490564 cc @cloud-fan

[GitHub] [spark] allisonwang-db commented on pull request #35486: [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates

2022-02-11 Thread GitBox
allisonwang-db commented on pull request #35486: URL: https://github.com/apache/spark/pull/35486#issuecomment-1036490353 cc @cloud-fan

[GitHub] [spark] amaliujia commented on pull request #35352: [SPARK-38063][SQL] Support split_part Function

2022-02-11 Thread GitBox
amaliujia commented on pull request #35352: URL: https://github.com/apache/spark/pull/35352#issuecomment-1036524551 Ok I have also addressed some specification review feedback. Now this PR is ready for code review again.

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35477: [SPARK-38175][CORE][SQL][SS][DSTREAM][MESOS][WEBUI] Clean up unused parameters in private methods signature

2022-02-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #35477: URL: https://github.com/apache/spark/pull/35477#discussion_r804936182 ## File path: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala ## @@ -340,8 +340,7 @@ private[ui] class

[GitHub] [spark] dongjoon-hyun commented on pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

2022-02-11 Thread GitBox
dongjoon-hyun commented on pull request #35490: URL: https://github.com/apache/spark/pull/35490#issuecomment-1036442096 The code looks good but `branch-3.2` seems not to have this issue. The UT passes without the patch in `branch-3.2`. Do you know what difference causes the issue at

[GitHub] [spark] allisonwang-db commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING clause should throw data mismatch error

2022-02-11 Thread GitBox
allisonwang-db commented on a change in pull request #35404: URL: https://github.com/apache/spark/pull/35404#discussion_r805015914 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -4249,7 +4250,30 @@ object

[GitHub] [spark] allisonwang-db commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING clause should throw data mismatch error

2022-02-11 Thread GitBox
allisonwang-db commented on a change in pull request #35404: URL: https://github.com/apache/spark/pull/35404#discussion_r805016815 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ## @@ -645,4 +645,9 @@ case class

[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING clause should throw data mismatch error

2022-02-11 Thread GitBox
amaliujia commented on a change in pull request #35404: URL: https://github.com/apache/spark/pull/35404#discussion_r805023663 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ## @@ -645,4 +645,9 @@ case class

[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING clause should throw data mismatch error

2022-02-11 Thread GitBox
amaliujia commented on a change in pull request #35404: URL: https://github.com/apache/spark/pull/35404#discussion_r805023507 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -4249,7 +4250,30 @@ object ApplyCharTypePadding

[GitHub] [spark] itholic commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

2022-02-11 Thread GitBox
itholic commented on pull request #35488: URL: https://github.com/apache/spark/pull/35488#issuecomment-1036683889 Thanks for the review! Let me update the PR description after some tests.

[GitHub] [spark] parthchandra commented on a change in pull request #35262: [SPARK-37974][SQL] Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

2022-02-11 Thread GitBox
parthchandra commented on a change in pull request #35262: URL: https://github.com/apache/spark/pull/35262#discussion_r805072581 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedDeltaLengthByteArrayReader.java ## @@ -0,0 +1,103

[GitHub] [spark] ulysses-you commented on pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

2022-02-11 Thread GitBox
ulysses-you commented on pull request #35490: URL: https://github.com/apache/spark/pull/35490#issuecomment-1036912931 Thank you, @dongjoon-hyun. The rule that changes a group-only aggregate with limit 1 into a project landed in branch-3.3, see

[GitHub] [spark] Yikun commented on a change in pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun commented on a change in pull request #35422: URL: https://github.com/apache/spark/pull/35422#discussion_r805128127 ## File path: resource-managers/kubernetes/core/pom.xml ## @@ -29,8 +29,25 @@ Spark Project Kubernetes kubernetes +**/*Volcano*.scala

[GitHub] [spark] Yikun opened a new pull request #35496: [SPARK-37145][K8S][FOLLOWUP] Add note for KubernetesDriverCustomFeatureConfigStep

2022-02-11 Thread GitBox
Yikun opened a new pull request #35496: URL: https://github.com/apache/spark/pull/35496 ### What changes were proposed in this pull request? Add note for developer to show how to use `KubernetesDriverCustomFeatureConfigStep` and `KubernetesExecutorCustomFeatureConfigStep`. ###

[GitHub] [spark] Yikun commented on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun commented on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1037039764 @dongjoon-hyun @william-wang @martin-g Thanks for all the help. It also works and passed all integration tests on my arm64 env. Env info: ```

[GitHub] [spark] weixiuli commented on pull request #35492: [SPARK-38191][CORE] The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol.

2022-02-11 Thread GitBox
weixiuli commented on pull request #35492: URL: https://github.com/apache/spark/pull/35492#issuecomment-1036975951 The stagingDir will be called many times in commitJob, especially in traversing partitionPaths when the dynamicPartitionOverwrite is true.

[GitHub] [spark] github-actions[bot] closed pull request #34457: [SPARK-37178][ML] Add Target Encoding to ml.feature

2022-02-11 Thread GitBox
github-actions[bot] closed pull request #34457: URL: https://github.com/apache/spark/pull/34457

[GitHub] [spark] github-actions[bot] closed pull request #34468: [SPARK-37194][SQL] Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition

2022-02-11 Thread GitBox
github-actions[bot] closed pull request #34468: URL: https://github.com/apache/spark/pull/34468

[GitHub] [spark] sunchao commented on a change in pull request #35262: [SPARK-37974][SQL] Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35262: URL: https://github.com/apache/spark/pull/35262#discussion_r805092409 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedDeltaLengthByteArrayReader.java ## @@ -0,0 +1,103 @@

[GitHub] [spark] sunchao commented on a change in pull request #35262: [SPARK-37974][SQL] Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35262: URL: https://github.com/apache/spark/pull/35262#discussion_r805092339 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedDeltaLengthByteArrayReader.java ## @@ -0,0 +1,103 @@

[GitHub] [spark] LuciferYang commented on a change in pull request #35477: [SPARK-38175][CORE][SQL][SS][DSTREAM][MESOS][WEBUI] Clean up unused parameters in private methods signature

2022-02-11 Thread GitBox
LuciferYang commented on a change in pull request #35477: URL: https://github.com/apache/spark/pull/35477#discussion_r805099303 ## File path: core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala ## @@ -32,23 +32,18 @@ class

[GitHub] [spark] LuciferYang commented on a change in pull request #35493: [SPARK-38192][CORE][TESTS] Fix potential resource leak in `kvstore.RocksDBSuite`

2022-02-11 Thread GitBox
LuciferYang commented on a change in pull request #35493: URL: https://github.com/apache/spark/pull/35493#discussion_r805095956 ## File path: common/kvstore/src/test/java/org/apache/spark/util/kvstore/RocksDBSuite.java ## @@ -330,15 +330,16 @@ private int countKeys(Class

[GitHub] [spark] dongjoon-hyun commented on pull request #35493: [SPARK-38192][CORE][TESTS] Use `try-with-resources` in `Level/RocksDBSuite.java`

2022-02-11 Thread GitBox
dongjoon-hyun commented on pull request #35493: URL: https://github.com/apache/spark/pull/35493#issuecomment-1036918282 Thank you for the update. Please bring the PR description up to date too.

[GitHub] [spark] LuciferYang commented on pull request #35493: [SPARK-38192][CORE][TESTS] Use `try-with-resources` in `Level/RocksDBSuite.java`

2022-02-11 Thread GitBox
LuciferYang commented on pull request #35493: URL: https://github.com/apache/spark/pull/35493#issuecomment-1036938739 done

[GitHub] [spark] beliefer opened a new pull request #35494: [SPARK-37960][SQL][FOLLOWUP] Refactor framework so as JDBC dialect could compile expression by self way

2022-02-11 Thread GitBox
beliefer opened a new pull request #35494: URL: https://github.com/apache/spark/pull/35494 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/35248 provides a new framework to represent catalyst expressions in DS V2 APIs. Because the framework

[GitHub] [spark] weixiuli removed a comment on pull request #35492: [SPARK-38191][CORE] The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol.

2022-02-11 Thread GitBox
weixiuli removed a comment on pull request #35492: URL: https://github.com/apache/spark/pull/35492#issuecomment-1036975951 The stagingDir will be called many times in commitJob, especially in traversing partitionPaths when the dynamicPartitionOverwrite is true.

[GitHub] [spark] weixiuli commented on a change in pull request #35492: [SPARK-38191][CORE] The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol.

2022-02-11 Thread GitBox
weixiuli commented on a change in pull request #35492: URL: https://github.com/apache/spark/pull/35492#discussion_r805112929 ## File path: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ## @@ -104,7 +104,7 @@ class

[GitHub] [spark] Yikun opened a new pull request #35495: [SPARK-36061][K8S] Add `volcano` built-in module and feature step

2022-02-11 Thread GitBox
Yikun opened a new pull request #35495: URL: https://github.com/apache/spark/pull/35495 ### What changes were proposed in this pull request? This patch added volcano feature step to help user integrate spark with Volcano Scheduler. - Add a VolcanoFeatureStep, it can be used in driver

[GitHub] [spark] beliefer commented on pull request #35494: [SPARK-37960][SQL][FOLLOWUP] Refactor framework so as JDBC dialect could compile expression by self way

2022-02-11 Thread GitBox
beliefer commented on pull request #35494: URL: https://github.com/apache/spark/pull/35494#issuecomment-1037039127 ping @huaxingao cc @cloud-fan

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1037039764 @dongjoon-hyun @william-wang @martin-g Thanks for all the help. It also works and passed all integration tests on my arm64 env. Env info: ```

[GitHub] [spark] amaliujia commented on a change in pull request #35404: [SPARK-38118][SQL] Func(wrong data type) in the HAVING clause should throw data mismatch error

2022-02-11 Thread GitBox
amaliujia commented on a change in pull request #35404: URL: https://github.com/apache/spark/pull/35404#discussion_r804911364 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -4281,6 +4281,44 @@ class SQLQuerySuite extends QueryTest with

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35477: [SPARK-38175][CORE][SQL][SS][DSTREAM][MESOS][WEBUI] Clean up unused parameters in private methods signature

2022-02-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #35477: URL: https://github.com/apache/spark/pull/35477#discussion_r804936754 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamOptions.scala ## @@ -33,9 +33,9 @@ class

[GitHub] [spark] AmplabJenkins commented on pull request #35472: [SPARK-38170][TESTS][SQL] Make ThriftServerWithSparkContextInHttpSuite pass with ANSI enabled

2022-02-11 Thread GitBox
AmplabJenkins commented on pull request #35472: URL: https://github.com/apache/spark/pull/35472#issuecomment-1036534510 Can one of the admins verify this patch?

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35491: [SPARK-38186][SQL] Improve the README of Spark docs

2022-02-11 Thread GitBox
xinrong-databricks commented on a change in pull request #35491: URL: https://github.com/apache/spark/pull/35491#discussion_r804423977 ## File path: docs/README.md ## @@ -48,23 +48,9 @@ $ bundle install Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may

[GitHub] [spark] AngersZhuuuu commented on pull request #35476: [SPARK-38173][SQL] Quoted column cannot be recognized correctly when quotedRegexColumnNa…

2022-02-11 Thread GitBox
AngersZhuuuu commented on pull request #35476: URL: https://github.com/apache/spark/pull/35476#issuecomment-1035965380 ping @cloud-fan

[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804428530 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ## @@ -64,13 +64,23 @@ private long

[GitHub] [spark] cloud-fan commented on pull request #35335: [SPARK-38036][SQL][TESTS] Refactor `VersionsSuite` to `HiveClientSuite` and make it a subclass of `HiveVersionSuite`

2022-02-11 Thread GitBox
cloud-fan commented on pull request #35335: URL: https://github.com/apache/spark/pull/35335#issuecomment-1035960449 merging to master!

[GitHub] [spark] zsxwing commented on a change in pull request #35462: [WIP][SPARK-31709][SQL][FOLLOWUP] Treat table location as absolute when the first letter of its path is slash in create/alter tab

2022-02-11 Thread GitBox
zsxwing commented on a change in pull request #35462: URL: https://github.com/apache/spark/pull/35462#discussion_r804424593 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ## @@ -388,6 +388,8 @@ class SessionCatalog(

[GitHub] [spark] xinrong-databricks commented on pull request #35491: [SPARK-38186][SQL] Improve the README of Spark docs

2022-02-11 Thread GitBox
xinrong-databricks commented on pull request #35491: URL: https://github.com/apache/spark/pull/35491#issuecomment-1035960695 Thank you @gengliangwang ! That's super helpful for future onboarding, and the page looks more clear now!

[GitHub] [spark] cloud-fan closed pull request #35335: [SPARK-38036][SQL][TESTS] Refactor `VersionsSuite` to `HiveClientSuite` and make it a subclass of `HiveVersionSuite`

2022-02-11 Thread GitBox
cloud-fan closed pull request #35335: URL: https://github.com/apache/spark/pull/35335

[GitHub] [spark] gengliangwang commented on a change in pull request #35491: [SPARK-38186][SQL] Improve the README of Spark docs

2022-02-11 Thread GitBox
gengliangwang commented on a change in pull request #35491: URL: https://github.com/apache/spark/pull/35491#discussion_r804425540 ## File path: docs/README.md ## @@ -48,23 +48,9 @@ $ bundle install Note: If you are on a system with both Ruby 1.9 and Ruby 2.0 you may need to

[GitHub] [spark] weixiuli edited a comment on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

2022-02-11 Thread GitBox
weixiuli edited a comment on pull request #35425: URL: https://github.com/apache/spark/pull/35425#issuecomment-1035958085 @cloud-fan I agree with you. BTW, disabling the timeout for all broadcast stages in AQE is not reasonable, right? Should we disable the timeout for the broadcast stage that is

[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804425518 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ## @@ -661,21 +671,22 @@ public final int

[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804426890 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ## @@ -661,21 +671,22 @@ public final int

[GitHub] [spark] william-wang commented on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
william-wang commented on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036193524 > Ur, it seems to fail. @dongjoon-hyun The issue is fixed :)

[GitHub] [spark] cloud-fan commented on pull request #35290: [SPARK-37865][SQL][3.0]Fix union bug when the first child of union has duplicate columns

2022-02-11 Thread GitBox
cloud-fan commented on pull request #35290: URL: https://github.com/apache/spark/pull/35290#issuecomment-1036198728 > Spark will check if the name could be mapped to multiple attr ids, if not(like our cases: a#1, a#1), it won't fail because Spark thinks that two columns are totally the

[GitHub] [spark] cloud-fan closed pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

2022-02-11 Thread GitBox
cloud-fan closed pull request #35490: URL: https://github.com/apache/spark/pull/35490

[GitHub] [spark] cloud-fan commented on pull request #35490: [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

2022-02-11 Thread GitBox
cloud-fan commented on pull request #35490: URL: https://github.com/apache/spark/pull/35490#issuecomment-1036201736 thanks, merging to master/3.2!

[GitHub] [spark] srowen commented on a change in pull request #34846: [SPARK-37593][CORE] Optimize HeapMemoryAllocator to avoid memory waste when using G1GC

2022-02-11 Thread GitBox
srowen commented on a change in pull request #34846: URL: https://github.com/apache/spark/pull/34846#discussion_r804655594 ## File path: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ## @@ -254,10 +259,16 @@ private[spark] abstract class MemoryManager(

[GitHub] [spark] martin-g commented on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
martin-g commented on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036228443 > The issue is fixed :) Confirmed! Works here!

[GitHub] [spark] cloud-fan commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

2022-02-11 Thread GitBox
cloud-fan commented on pull request #35425: URL: https://github.com/apache/spark/pull/35425#issuecomment-1035975940 I don't think AQE makes a big difference here. The normal broadcast can also hit this problem if there are many concurrent queries. The broadcast timeout is not a proper

[GitHub] [spark] viirya commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
viirya commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804446180 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ## @@ -64,13 +64,23 @@ private long

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35415: [SPARK-37507][SQL] Add a new SQL function to_binary

2022-02-11 Thread GitBox
HyukjinKwon commented on a change in pull request #35415: URL: https://github.com/apache/spark/pull/35415#discussion_r804493331 ## File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md ## @@ -1,6 +1,6 @@ ## Summary - - Number of queries: 378 + -

[GitHub] [spark] HyukjinKwon commented on pull request #35489: [SPARK-38184][SQL][DOCS] Fix malformatted ExpressionDescription of decode

2022-02-11 Thread GitBox
HyukjinKwon commented on pull request #35489: URL: https://github.com/apache/spark/pull/35489#issuecomment-1036032729 Merged to master and branch-3.2.

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036036259 The failed test is unrelated: `- job with fetch failure *** FAILED *** (51 milliseconds)`, rerun.

[GitHub] [spark] Yikun commented on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun commented on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036036259 The failed test is unrelated: `- job with fetch failure *** FAILED *** (51 milliseconds)`, recheck.

[GitHub] [spark] bjornjorgensen commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

2022-02-11 Thread GitBox
bjornjorgensen commented on pull request #35488: URL: https://github.com/apache/spark/pull/35488#issuecomment-1036045494 SQL ANSI mode 'spark.sql.ansi.enabled' is set to True. This is an experimental config. For more information spark.apache.org/docs/latest/sql-ref-ansi-compliance.html

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm, will address soon, cc @william-wang https://github.com/volcano-sh/volcano/issues/2010

[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
sunchao commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804439406 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ## @@ -661,21 +671,22 @@ public final int

[GitHub] [spark] LuciferYang commented on pull request #35335: [SPARK-38036][SQL][TESTS] Refactor `VersionsSuite` to `HiveClientSuite` and make it a subclass of `HiveVersionSuite`

2022-02-11 Thread GitBox
LuciferYang commented on pull request #35335: URL: https://github.com/apache/spark/pull/35335#issuecomment-1036014531 thanks @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on pull request #35491: [SPARK-38186][SQL] Improve the README of Spark docs

2022-02-11 Thread GitBox
HyukjinKwon commented on pull request #35491: URL: https://github.com/apache/spark/pull/35491#issuecomment-1036032007 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon closed pull request #35489: [SPARK-38184][SQL][DOCS] Fix malformatted ExpressionDescription of decode

2022-02-11 Thread GitBox
HyukjinKwon closed pull request #35489: URL: https://github.com/apache/spark/pull/35489 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] HyukjinKwon closed pull request #35491: [SPARK-38186][SQL] Improve the README of Spark docs

2022-02-11 Thread GitBox
HyukjinKwon closed pull request #35491: URL: https://github.com/apache/spark/pull/35491 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

2022-02-11 Thread GitBox
HyukjinKwon commented on a change in pull request #35488: URL: https://github.com/apache/spark/pull/35488#discussion_r804497038 ## File path: python/pyspark/pandas/utils.py ## @@ -467,11 +467,16 @@ def is_testing() -> bool: def default_session() -> SparkSession: spark

[GitHub] [spark] Yikun commented on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun commented on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm, will address soon, cc @william-wang https://github.com/volcano-sh/volcano/issues/2010 > Do

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm, will address soon, cc @william-wang https://github.com/volcano-sh/volcano/issues/2010

[GitHub] [spark] HyukjinKwon commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

2022-02-11 Thread GitBox
HyukjinKwon commented on pull request #35488: URL: https://github.com/apache/spark/pull/35488#issuecomment-1036040353 The configuration is off by default, so it should probably be fine, but it would be great to list a few examples of the unexpected behaviour in the PR description as

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, will address soon by cc @william-wang

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, will address soon by cc @william-wang

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, but will address soon by cc @william-wang

[GitHub] [spark] cloud-fan commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-02-11 Thread GitBox
cloud-fan commented on a change in pull request #35483: URL: https://github.com/apache/spark/pull/35483#discussion_r804437630 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ## @@ -661,21 +671,22 @@ public final int

[GitHub] [spark] chasingegg commented on pull request #35290: [SPARK-37865][SQL][3.0]Fix union bug when the first child of union has duplicate columns

2022-02-11 Thread GitBox
chasingegg commented on pull request #35290: URL: https://github.com/apache/spark/pull/35290#issuecomment-1035997195 > > My point is when you do select a from (select a, a union select a, b), it will not fail but produce unexpected result. > > Are you sure this won't fail? I thought

[GitHub] [spark] bjornjorgensen commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

2022-02-11 Thread GitBox
bjornjorgensen commented on pull request #35488: URL: https://github.com/apache/spark/pull/35488#issuecomment-1036034626 "This can cause the unexpected behavior" But what is the unexpected behavior? If this is something big, then we need some documentation for this. -- This is

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, but will address soon by cc @william-wang

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, but will address soon by cc @william-wang

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, but will address soon by cc @william-wang

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on arm deploy, but will address soon by cc @william-wang

[GitHub] [spark] steven-aerts commented on a change in pull request #35426: [SPARK-38130][SQL] Remove array_sort orderable entries check

2022-02-11 Thread GitBox
steven-aerts commented on a change in pull request #35426: URL: https://github.com/apache/spark/pull/35426#discussion_r804596191 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ## @@ -388,18 +388,13 @@ case class

[GitHub] [spark] LuciferYang commented on pull request #35477: [SPARK-38175][SQL] Clean up unused parameters of private methods

2022-02-11 Thread GitBox
LuciferYang commented on pull request #35477: URL: https://github.com/apache/spark/pull/35477#issuecomment-1036156572 There are still some cases to be checked -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] weixiuli opened a new pull request #35492: [SPARK-38191][CORE] The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol.

2022-02-11 Thread GitBox
weixiuli opened a new pull request #35492: URL: https://github.com/apache/spark/pull/35492 ### What changes were proposed in this pull request? Replace the stagingDir method with a stagingDir constant in HadoopMapReduceCommitProtocol. ### Why are the changes needed?

[GitHub] [spark] xinrong-databricks commented on pull request #35415: [SPARK-37507][SQL] Add a new SQL function to_binary

2022-02-11 Thread GitBox
xinrong-databricks commented on pull request #35415: URL: https://github.com/apache/spark/pull/35415#issuecomment-1036185494 Sorry I rebased the PR so the commit history looks confusing, but only two commits below are new:

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35415: [SPARK-37507][SQL] Add a new SQL function to_binary

2022-02-11 Thread GitBox
xinrong-databricks commented on a change in pull request #35415: URL: https://github.com/apache/spark/pull/35415#discussion_r804623336 ## File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md ## @@ -1,6 +1,6 @@ ## Summary - - Number of queries: 378

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35415: [SPARK-37507][SQL] Add a new SQL function to_binary

2022-02-11 Thread GitBox
xinrong-databricks commented on a change in pull request #35415: URL: https://github.com/apache/spark/pull/35415#discussion_r803360535 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -2538,6 +2538,76 @@ case

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
HyukjinKwon commented on a change in pull request #35422: URL: https://github.com/apache/spark/pull/35422#discussion_r804569603 ## File path: resource-managers/kubernetes/core/pom.xml ## @@ -29,8 +29,25 @@ Spark Project Kubernetes kubernetes +**/*Volcano*.scala

[GitHub] [spark] Yikun edited a comment on pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
Yikun edited a comment on pull request #35422: URL: https://github.com/apache/spark/pull/35422#issuecomment-1036035298 > Ur, it seems to fail. @dongjoon-hyun yep, there are some failures on volcano arm-deploy yaml, but will address soon by cc @william-wang

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35422: [SPARK-36061][K8S] Add `volcano` module and feature step

2022-02-11 Thread GitBox
HyukjinKwon commented on a change in pull request #35422: URL: https://github.com/apache/spark/pull/35422#discussion_r804539032 ## File path: resource-managers/kubernetes/core/pom.xml ## @@ -29,8 +29,25 @@ Spark Project Kubernetes kubernetes +**/*Volcano*.scala
