[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
HeartSaVioR commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1024522973 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-16 Thread GitBox
amaliujia commented on code in PR #38638: URL: https://github.com/apache/spark/pull/38638#discussion_r1024541016 ## python/pyspark/sql/connect/dataframe.py: ## @@ -667,12 +668,70 @@ def schema(self) -> StructType: else: return self._schema -def

[GitHub] [spark] HeartSaVioR closed pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
HeartSaVioR closed pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling URL: https://github.com/apache/spark/pull/38384 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] huskysun opened a new pull request, #38679: [SPARK-40671][Kubernetes] Add configurability to customize labels of driver service object

2022-11-16 Thread GitBox
huskysun opened a new pull request, #38679: URL: https://github.com/apache/spark/pull/38679 ### What changes were proposed in this pull request? This PR adds configurability to customize driver service object labels when running Spark on k8s. The new config is

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38618: [SPARK-41108][SPARK-41005][CONNECT][FOLLOW-UP] Deduplicate ArrowConverters codes

2022-11-16 Thread GitBox
HyukjinKwon commented on code in PR #38618: URL: https://github.com/apache/spark/pull/38618#discussion_r1024604820 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -71,158 +71,146 @@ private[sql] class ArrowBatchStreamWriter( }

[GitHub] [spark] hvanhovell commented on a diff in pull request #38618: [SPARK-41108][SPARK-41005][CONNECT][FOLLOW-UP] Deduplicate ArrowConverters codes

2022-11-16 Thread GitBox
hvanhovell commented on code in PR #38618: URL: https://github.com/apache/spark/pull/38618#discussion_r1024455483 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -71,158 +71,146 @@ private[sql] class ArrowBatchStreamWriter( }

[GitHub] [spark] HeartSaVioR commented on pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-16 Thread GitBox
HeartSaVioR commented on PR #38503: URL: https://github.com/apache/spark/pull/38503#issuecomment-1317644092 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] rangadi commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
rangadi commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1024614312 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] asfgit closed pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
asfgit closed pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission URL: https://github.com/apache/spark/pull/38441 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] mridulm commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-16 Thread GitBox
mridulm commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1024857218 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,18 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-16 Thread GitBox
mridulm commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1318228713 The test failure looks unrelated, can you retrigger the tests @gaoyajun02 ... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] MaxGekk opened a new pull request, #38685: [WIP][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-16 Thread GitBox
MaxGekk opened a new pull request, #38685: URL: https://github.com/apache/spark/pull/38685 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] mridulm commented on pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-16 Thread GitBox
mridulm commented on PR #38441: URL: https://github.com/apache/spark/pull/38441#issuecomment-1318221059 Merged to master. Thanks for working on this @warrenzhu25 ! Thanks for the reviews @dongjoon-hyun, @Ngone51 :-) -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] mridulm commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-16 Thread GitBox
mridulm commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1318223807 Would be better for @tgravescs to take a look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] mridulm commented on pull request #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-16 Thread GitBox
mridulm commented on PR #38668: URL: https://github.com/apache/spark/pull/38668#issuecomment-1318223332 +CC @dongjoon-hyun, @holdenk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on pull request #38630: [SPARK-41115][CONNECT] Add ClientType to proto to indicate which client sends a request

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38630: URL: https://github.com/apache/spark/pull/38630#issuecomment-1317843773 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-16 Thread GitBox
zhengruifeng commented on PR #38666: URL: https://github.com/apache/spark/pull/38666#issuecomment-1317892003 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-16 Thread GitBox
zhengruifeng closed pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests URL: https://github.com/apache/spark/pull/38666 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dongjoon-hyun commented on pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
dongjoon-hyun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317949502 Oh, what I meant was this commit addressing my comment was a minor documentation fix. Your PR is not trivial at all. :) -

[GitHub] [spark] LuciferYang commented on pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap

2022-11-16 Thread GitBox
LuciferYang commented on PR #38610: URL: https://github.com/apache/spark/pull/38610#issuecomment-1317963555 GA passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc-gen-grpc-

2022-11-16 Thread GitBox
LuciferYang commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1317964560 Any other changes? @HyukjinKwon @grundprinzip @amaliujia Thanks ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] amaliujia commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc-gen-grpc-ja

2022-11-16 Thread GitBox
amaliujia commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1318054981 Checking with @grundprinzip to see if there are more comments? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] Yaohua628 closed pull request #38663: [SPARK-41143][SQL] Add named argument syntax support for table-valued function

2022-11-16 Thread GitBox
Yaohua628 closed pull request #38663: [SPARK-41143][SQL] Add named argument syntax support for table-valued function URL: https://github.com/apache/spark/pull/38663 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] wankunde commented on pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-16 Thread GitBox
wankunde commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1318136683 > Seems like it became slower after your PR (?) Sorry for the mistake. I have updated the benchmark result; after this PR, `Query with LikeAny simplification` should be the same

[GitHub] [spark] MaxGekk commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-16 Thread GitBox
MaxGekk commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1024817626 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,8 +1059,8 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] cloud-fan commented on pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2022-11-16 Thread GitBox
cloud-fan commented on PR #38005: URL: https://github.com/apache/spark/pull/38005#issuecomment-1317931495 We should probably enrich the PR description to talk about the general approach. e.g. we add a virtual column to indicate the operation (delete, update, insert) -- This is an

[GitHub] [spark] huskysun commented on pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
huskysun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317947106 Oh thanks @dongjoon-hyun. It added a new spark config as code change, but yeah that's rather trivial. Thanks for the quick merge! So this change will be in `3.4.0`, right? Do you know

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
zhengruifeng commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024720949 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] wankunde opened a new pull request, #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-16 Thread GitBox
wankunde opened a new pull request, #38682: URL: https://github.com/apache/spark/pull/38682 ### What changes were proposed in this pull request? We can improve multi like by reordering the match expressions. ### Why are the changes needed? Local benchmark

[GitHub] [spark] zhengruifeng commented on pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-16 Thread GitBox
zhengruifeng commented on PR #38681: URL: https://github.com/apache/spark/pull/38681#issuecomment-1318050418 also cc @cloud-fan @amaliujia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] amaliujia commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
amaliujia commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024749559 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] HyukjinKwon closed pull request #38673: [SPARK-41149][PYTHON] Fix `SparkSession.builder.config` to support bool

2022-11-16 Thread GitBox
HyukjinKwon closed pull request #38673: [SPARK-41149][PYTHON] Fix `SparkSession.builder.config` to support bool URL: https://github.com/apache/spark/pull/38673 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #38673: [SPARK-41149][PYTHON] Fix `SparkSession.builder.config` to support bool

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38673: URL: https://github.com/apache/spark/pull/38673#issuecomment-1318128005 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-16 Thread GitBox
HyukjinKwon commented on code in PR #38681: URL: https://github.com/apache/spark/pull/38681#discussion_r1024787984 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectServiceSuite.scala: ## @@ -55,4 +65,38 @@ class SparkConnectServiceSuite

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-16 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1318133748 cc: @HeartSaVioR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] wangyum closed pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
wangyum closed pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters URL: https://github.com/apache/spark/pull/38511 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] cloud-fan opened a new pull request, #38684: [SPARK-41017][SQL][FOLLOWUP] Respect the original Filter operator order

2022-11-16 Thread GitBox
cloud-fan opened a new pull request, #38684: URL: https://github.com/apache/spark/pull/38684 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/38511 to fix a mistake: we should respect the original `Filter`

[GitHub] [spark] zhengruifeng commented on pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-16 Thread GitBox
zhengruifeng commented on PR #38666: URL: https://github.com/apache/spark/pull/38666#issuecomment-1317892984 oh, I forgot to mention that we should have a SPARK-X title -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] amaliujia commented on pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-16 Thread GitBox
amaliujia commented on PR #38666: URL: https://github.com/apache/spark/pull/38666#issuecomment-1317894265 @zhengruifeng I confirmed that for small doc change there is no need for a JIRA (that is why I didn't create one). -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] dongjoon-hyun commented on pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
dongjoon-hyun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317950780 During the holiday season (Thanksgiving + Christmas), we cannot make a new release. I guess the ongoing 3.2.3 RC0 vote will be the last release this year (if there is no urgent

[GitHub] [spark] LuciferYang commented on pull request #38671: [SPARK-41158][SQL][TESTS] Use `checkError()` to check `DATATYPE_MISMATCH` in `DataFrameFunctionsSuite`

2022-11-16 Thread GitBox
LuciferYang commented on PR #38671: URL: https://github.com/apache/spark/pull/38671#issuecomment-1317963020 Thanks @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] amaliujia commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
amaliujia commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024759018 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] HyukjinKwon commented on pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1318132557 Seems like it became slower after your PR (?) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] Yaohua628 opened a new pull request, #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-16 Thread GitBox
Yaohua628 opened a new pull request, #38683: URL: https://github.com/apache/spark/pull/38683 ### What changes were proposed in this pull request? In FileSourceStrategy, we add an Alias node to wrap the file metadata fields (e.g. file_name, file_size) in a NamedStruct

[GitHub] [spark] MaxGekk commented on a diff in pull request #38650: [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`

2022-11-16 Thread GitBox
MaxGekk commented on code in PR #38650: URL: https://github.com/apache/spark/pull/38650#discussion_r1024814862 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -369,7 +369,7 @@ class DataSourceV2Strategy(session:

[GitHub] [spark] cloud-fan commented on pull request #38684: [SPARK-41017][SQL][FOLLOWUP] Respect the original Filter operator order

2022-11-16 Thread GitBox
cloud-fan commented on PR #38684: URL: https://github.com/apache/spark/pull/38684#issuecomment-1318184753 cc @viirya @wangyum @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] github-actions[bot] closed pull request #34367: [SPARK-37099][SQL] Introduce a rank-based filter to optimize top-k computation

2022-11-16 Thread GitBox
github-actions[bot] closed pull request #34367: [SPARK-37099][SQL] Introduce a rank-based filter to optimize top-k computation URL: https://github.com/apache/spark/pull/34367 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] cloud-fan commented on pull request #38595: [SPARK-41090][SQL] Throw Exception for `db_name.view_name` when creating temp view by Dataset API

2022-11-16 Thread GitBox
cloud-fan commented on PR #38595: URL: https://github.com/apache/spark/pull/38595#issuecomment-1317932109 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan closed pull request #38595: [SPARK-41090][SQL] Throw Exception for `db_name.view_name` when creating temp view by Dataset API

2022-11-16 Thread GitBox
cloud-fan closed pull request #38595: [SPARK-41090][SQL] Throw Exception for `db_name.view_name` when creating temp view by Dataset API URL: https://github.com/apache/spark/pull/38595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon commented on pull request #38677: [SPARK-41150][PYTHON][DOCS] Document debugging with PySpark memory profiler

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38677: URL: https://github.com/apache/spark/pull/38677#issuecomment-1317938038 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #38677: [SPARK-41150][PYTHON][DOCS] Document debugging with PySpark memory profiler

2022-11-16 Thread GitBox
HyukjinKwon closed pull request #38677: [SPARK-41150][PYTHON][DOCS] Document debugging with PySpark memory profiler URL: https://github.com/apache/spark/pull/38677 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
dongjoon-hyun commented on code in PR #38679: URL: https://github.com/apache/spark/pull/38679#discussion_r1024675718 ## docs/running-on-kubernetes.md: ## @@ -856,6 +856,17 @@ See the [configuration page](configuration.html) for information on Spark config 2.3.0 + +

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38673: [SPARK-41149][PYTHON] Fix `SparkSession.builder.config` to support bool

2022-11-16 Thread GitBox
HyukjinKwon commented on code in PR #38673: URL: https://github.com/apache/spark/pull/38673#discussion_r1024678857 ## python/pyspark/sql/session.py: ## @@ -256,8 +256,12 @@ def config( self._options[k] = v elif map is not None:

[GitHub] [spark] rangadi opened a new pull request, #38680: [SPARK-40657][FOLLOWUP]Minor: Add clarifying comment in ProtobufUtils

2022-11-16 Thread GitBox
rangadi opened a new pull request, #38680: URL: https://github.com/apache/spark/pull/38680 ### What changes were proposed in this pull request? This is a follow-up to address a couple of comments in #38384. It fixes a comment and adds an explanation of why we don't use

[GitHub] [spark] rangadi commented on pull request #38680: [SPARK-40657][FOLLOWUP]Minor: Add clarifying comment in ProtobufUtils

2022-11-16 Thread GitBox
rangadi commented on PR #38680: URL: https://github.com/apache/spark/pull/38680#issuecomment-1317967322 @HeartSaVioR PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] huskysun commented on pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
huskysun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317967794 Ah I see. Thanks for the clarification! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-16 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1024711225 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,8 +1059,8 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] aokolnychyi commented on pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2022-11-16 Thread GitBox
aokolnychyi commented on PR #38005: URL: https://github.com/apache/spark/pull/38005#issuecomment-1318009704 @cloud-fan, sounds good. Will do by the end of this week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
zhengruifeng commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024719459 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] cloud-fan commented on pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-16 Thread GitBox
cloud-fan commented on PR #38558: URL: https://github.com/apache/spark/pull/38558#issuecomment-1318015274 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan closed pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-16 Thread GitBox
cloud-fan closed pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache URL: https://github.com/apache/spark/pull/38558 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
zhengruifeng commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024753970 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] pan3793 commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-16 Thread GitBox
pan3793 commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1024775551 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,6 +60,7 @@ import

[GitHub] [spark] wangyum commented on pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-16 Thread GitBox
wangyum commented on PR #38511: URL: https://github.com/apache/spark/pull/38511#issuecomment-1318162616 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] MaxGekk commented on a diff in pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-11-16 Thread GitBox
MaxGekk commented on code in PR #38664: URL: https://github.com/apache/spark/pull/38664#discussion_r1024815444 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -146,7 +146,7 @@ object FunctionRegistryBase {

[GitHub] [spark] dongjoon-hyun commented on pull request #38679: [SPARK-40671][Kubernetes] Add configurability to customize labels of driver service object

2022-11-16 Thread GitBox
dongjoon-hyun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317939397 Hi, @huskysun . Your PR already passed the test here. - https://github.com/huskysun/spark/runs/9537511421 -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] rangadi commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-16 Thread GitBox
rangadi commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1024692244 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] LuciferYang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-16 Thread GitBox
LuciferYang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1024721571 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def

[GitHub] [spark] LuciferYang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-16 Thread GitBox
LuciferYang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1024721724 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def

[GitHub] [spark] zhengruifeng commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
zhengruifeng commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1318017962 You may reformat the Scala code with `./build/mvn -Pscala-2.12 scalafmt:format -Dscalafmt.skip=false -Dscalafmt.validateOnly=false -Dscalafmt.changedOnly=false -pl

[GitHub] [spark] HyukjinKwon closed pull request #38630: [SPARK-41115][CONNECT] Add ClientType to proto to indicate which client sends a request

2022-11-16 Thread GitBox
HyukjinKwon closed pull request #38630: [SPARK-41115][CONNECT] Add ClientType to proto to indicate which client sends a request URL: https://github.com/apache/spark/pull/38630

[GitHub] [spark] HyukjinKwon commented on pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38635: URL: https://github.com/apache/spark/pull/38635#issuecomment-1317897259 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-16 Thread GitBox
HyukjinKwon commented on PR #38635: URL: https://github.com/apache/spark/pull/38635#issuecomment-1317897684 @bersprockets it has a conflict with branch-3.3. Feel free to create a backport PR if you feel this is needed.

[GitHub] [spark] HyukjinKwon closed pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-16 Thread GitBox
HyukjinKwon closed pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null` URL: https://github.com/apache/spark/pull/38635

[GitHub] [spark] huskysun commented on pull request #38679: [SPARK-40671][Kubernetes] Add configurability to customize labels of driver service object

2022-11-16 Thread GitBox
huskysun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317937460 @dongjoon-hyun Hi Dongjoon, could you please take a look at this and give it an "ok to test" (I'm following the steps

[GitHub] [spark] dongjoon-hyun commented on pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
dongjoon-hyun commented on PR #38679: URL: https://github.com/apache/spark/pull/38679#issuecomment-1317944729 Since it's a minor documentation change, I updated your PR and merged to master. Thank you, @huskysun .

[GitHub] [spark] dongjoon-hyun closed pull request #38679: [SPARK-40671][K8S] Support driver service labels

2022-11-16 Thread GitBox
dongjoon-hyun closed pull request #38679: [SPARK-40671][K8S] Support driver service labels URL: https://github.com/apache/spark/pull/38679

[GitHub] [spark] hvanhovell opened a new pull request, #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-16 Thread GitBox
hvanhovell opened a new pull request, #38681: URL: https://github.com/apache/spark/pull/38681 ### What changes were proposed in this pull request? Two changes: 1. Make sure connect's arrow result path properly deals with errors, and avoids hangs. 2. Fix a common source of

[GitHub] [spark] amaliujia commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-16 Thread GitBox
amaliujia commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1318074158 You can also run the Scala lint locally: `./dev/lint-scala`

[GitHub] [spark] viirya commented on a diff in pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-16 Thread GitBox
viirya commented on code in PR #38669: URL: https://github.com/apache/spark/pull/38669#discussion_r1023640393 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/SchemaColumnConvertNotSupportedException.java: ## @@ -54,7 +54,8 @@ public

[GitHub] [spark] viirya commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-16 Thread GitBox
viirya commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1316591989 Thanks @wangyum too! The previous run passed all tests. Only the Java linter failed. I will merge this once the Java linter passes.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-16 Thread GitBox
cloud-fan commented on code in PR #38464: URL: https://github.com/apache/spark/pull/38464#discussion_r1023678622 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala: ## @@ -65,7 +70,7 @@ case class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-16 Thread GitBox
cloud-fan commented on code in PR #38464: URL: https://github.com/apache/spark/pull/38464#discussion_r1023684019 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala: ## @@ -77,6 +82,24 @@ case class

[GitHub] [spark] cloud-fan commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-16 Thread GitBox
cloud-fan commented on PR #38464: URL: https://github.com/apache/spark/pull/38464#issuecomment-1316611596 I agree with using bloom filters, as the size estimation can be wrong and the build side can be so large that `InSubquery` can't work. However, this PR contains another optimization

[GitHub] [spark] cloud-fan commented on pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages

2022-11-16 Thread GitBox
cloud-fan commented on PR #38605: URL: https://github.com/apache/spark/pull/38605#issuecomment-1316616610 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages

2022-11-16 Thread GitBox
cloud-fan closed pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages URL: https://github.com/apache/spark/pull/38605

[GitHub] [spark] LuciferYang opened a new pull request, #38671: [SPARK-41158][SQL][TESTS] Use `checkError()` to check `DATATYPE_MISMATCH` in `DataFrameFunctionsSuite`

2022-11-16 Thread GitBox
LuciferYang opened a new pull request, #38671: URL: https://github.com/apache/spark/pull/38671 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###
