Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
ueshin commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1572787315 ## python/pyspark/sql/dataframe.py: ## @@ -139,51 +123,29 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): created via using the constructor. """

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-19 Thread via GitHub
ericm-db commented on PR #45991: URL: https://github.com/apache/spark/pull/45991#issuecomment-2067115942 @HeartSaVioR PTAL, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572833180 ## .github/workflows/build_and_test.yml: ## @@ -26,4 +26,20 @@ jobs: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with:

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572834233 ## .gitignore: ## @@ -16,3 +16,30 @@ build dependencies.lock **/dependencies.lock gradle/wrapper/gradle-wrapper.jar + +# Compiled source #

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572840033 ## config/checkstyle/checkstyle.xml: ## @@ -0,0 +1,195 @@ + + +https://checkstyle.org/dtds/configuration_1_3.dtd"> + + + + Review Comment:

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-19 Thread via GitHub
steveloughran commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2067069380 So both those bindings hand off to PathOutputCommitterFactory(), which looks for a committer from the config key mapreduce.outputcommitter.factory.class
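
As a rough illustration of that lookup, here is a minimal configuration sketch for selecting the S3A magic committer from PySpark, assuming the hadoop-aws and spark-hadoop-cloud modules are on the classpath. The per-scheme key mapreduce.outputcommitter.factory.scheme.s3a is the usual specialization of the generic factory key mentioned above; exact settings depend on the Spark/Hadoop versions in use.

    # Hedged sketch: route s3a output through the committer factory so the
    # "magic" committer is picked up. Class/config names follow the Spark
    # cloud-integration docs; verify them against your build.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.committer.name", "magic")
        .config("spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a",
                "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
        .config("spark.sql.sources.commitProtocolClass",
                "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
        .config("spark.sql.parquet.output.committer.class",
                "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
        .getOrCreate()
    )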

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
sahnib commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572618909 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572837826 ## build.gradle: ## @@ -1,3 +1,16 @@ +buildscript { + repositories { +maven { + url = uri("https://plugins.gradle.org/m2/") +} + } +

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572836874 ## build-tools/helm/spark-kubernetes-operator/values.yaml: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or

[PR] [SPARK-47920] add doc for python streaming data source API [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 opened a new pull request, #46139: URL: https://github.com/apache/spark/pull/46139 ### What changes were proposed in this pull request? add doc for python streaming data source API ### Why are the changes needed? Add a user guide to help users develop

Re: [PR] [SPARK-47921][CONNECT] Fix ExecuteJobTag creation in ExecuteHolder [spark]

2024-04-19 Thread via GitHub
allisonwang-db commented on PR #46140: URL: https://github.com/apache/spark/pull/46140#issuecomment-2067273091 cc @jasonli-db @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45265][SQL] Supporting Hive 4.0 Metastore [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #45801: [SPARK-45265][SQL] Supporting Hive 4.0 Metastore URL: https://github.com/apache/spark/pull/45801 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [WIP][SPARK-47907] Put bang under a config [spark]

2024-04-19 Thread via GitHub
srielau opened a new pull request, #46138: URL: https://github.com/apache/spark/pull/46138 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572914583 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572915019 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

[PR] [SPARK-47921][CONNECT] Fix ExecuteJobTag creation in ExecuteHolder [spark]

2024-04-19 Thread via GitHub
allisonwang-db opened a new pull request, #46140: URL: https://github.com/apache/spark/pull/46140 ### What changes were proposed in this pull request? This PR fixes a bug in the ExecuteJobTag creation in ExecuteHolder. The sessionId and userId are reversed. ### Why are
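
Purely for illustration, a hypothetical Python sketch of the class of bug described (two identifiers passed in swapped positions when the tag is built); the real ExecuteJobTag is Scala code in the Spark Connect server and its exact format is not reproduced here.

    # Hypothetical sketch of a swapped-argument bug like the one described.
    def execute_job_tag(user_id: str, session_id: str, operation_id: str) -> str:
        return f"User_{user_id}_Session_{session_id}_Operation_{operation_id}"

    user_id, session_id, operation_id = "alice", "sess-123", "op-456"
    buggy = execute_job_tag(session_id, user_id, operation_id)  # ids swapped in the tag
    fixed = execute_job_tag(user_id, session_id, operation_id)  # arguments in the right order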

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1572995277 ## python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py: ## @@ -63,8 +63,13 @@ def main(infile: IO, outfile: IO) -> None: spark =

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1572997327 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -66,6 +66,30 @@ def func(df, _): q =

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1572997060 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -66,6 +66,30 @@ def func(df, _): q =

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
ericm-db commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1573000267 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -66,6 +66,30 @@ def func(df, _): q =

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572833649 ## .gitignore: ## @@ -16,3 +16,30 @@ build dependencies.lock **/dependencies.lock gradle/wrapper/gradle-wrapper.jar + +# Compiled source #

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572836113 ## build-tools/helm/spark-kubernetes-operator/values.yaml: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE [spark]

2024-04-19 Thread via GitHub
vladimirg-db commented on code in PR #46082: URL: https://github.com/apache/spark/pull/46082#discussion_r1572665195 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -359,10 +414,97 @@ public boolean startsWith(final UTF8String prefix) {

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on PR #46129: URL: https://github.com/apache/spark/pull/46129#issuecomment-2067140476 Will fix up the tests soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
allisonwang-db commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572908487 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

[PR] Try parse json [spark]

2024-04-19 Thread via GitHub
harshmotw-db opened a new pull request, #46141: URL: https://github.com/apache/spark/pull/46141 ### What changes were proposed in this pull request? This pull request implements the `try_parse_json` expression, which runs `parse_json` on string expressions to extract variants. However, if
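
A small usage sketch of the intended behaviour, assuming `try_parse_json` is exposed as a SQL expression that yields NULL where `parse_json` would raise; the sample data is made up.

    # Hedged sketch: try_parse_json vs parse_json on a malformed JSON string.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([('{"a": 1}',), ("not json",)], ["s"])

    # parse_json(s) would fail at execution time on the second row;
    # try_parse_json(s) is expected to return NULL for it instead.
    df.selectExpr("s", "try_parse_json(s) AS v").show(truncate=False)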

Re: [PR] [SPARK-45709][BUILD] Deploy packages when all packages are built [spark]

2024-04-19 Thread via GitHub
github-actions[bot] commented on PR #43561: URL: https://github.com/apache/spark/pull/43561#issuecomment-2067417258 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573092072 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-04-19 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1573096494 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -897,6 +900,85 @@ class AdaptiveQueryExecSuite }

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573100690 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573100752 ## python/pyspark/sql/worker/plan_data_source_read.py: ## @@ -51,6 +52,71 @@ ) +def records_to_arrow_batches( +output_iter: Iterator[Tuple], +

Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-19 Thread via GitHub
srielau commented on PR #46138: URL: https://github.com/apache/spark/pull/46138#issuecomment-2067511478 @cloud-fan @gengliangwang This is ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #46142: URL: https://github.com/apache/spark/pull/46142 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573085009 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): Review

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573085074 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573094548 ## python/pyspark/sql/utils.py: ## @@ -302,6 +302,33 @@ def wrapped(*args: Any, **kwargs: Any) -> Any: return cast(FuncT, wrapped) +def

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573094799 ## python/pyspark/sql/connect/session.py: ## @@ -325,7 +325,7 @@ def active(cls) -> "SparkSession": active.__doc__ = PySparkSession.active.__doc__ -

[PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-19 Thread via GitHub
panbingkun opened a new pull request, #46146: URL: https://github.com/apache/spark/pull/46146 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on PR #46145: URL: https://github.com/apache/spark/pull/46145#issuecomment-2067526908 Thank you, @HyukjinKwon . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #46145: [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` URL: https://github.com/apache/spark/pull/46145 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #46144: [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` URL: https://github.com/apache/spark/pull/46144 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2024-04-19 Thread via GitHub
github-actions[bot] commented on PR #41518: URL: https://github.com/apache/spark/pull/41518#issuecomment-2067417265 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47922][SQL] Implement the try_parse_json expression [spark]

2024-04-19 Thread via GitHub
harshmotw-db commented on PR #46141: URL: https://github.com/apache/spark/pull/46141#issuecomment-2067417422 cc @chenhao-db @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-19 Thread via GitHub
CTCC1 commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1573145872 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column: repeat.__doc__

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573095413 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1974 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573095371 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1974 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-04-19 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1573095590 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -897,6 +900,85 @@ class AdaptiveQueryExecSuite }

Re: [PR] [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #46142: [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 URL: https://github.com/apache/spark/pull/46142 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on PR #46142: URL: https://github.com/apache/spark/pull/46142#issuecomment-2067525074 Thank you, @HyukjinKwon ! Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on PR #46144: URL: https://github.com/apache/spark/pull/46144#issuecomment-2067525455 Thank you, @HyukjinKwon . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [WIP] test java toUpperCase & toLowerCase [spark]

2024-04-19 Thread via GitHub
panbingkun opened a new pull request, #46147: URL: https://github.com/apache/spark/pull/46147 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-19 Thread via GitHub
liucao-dd commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1573144905 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column:

Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]

2024-04-19 Thread via GitHub
harshmotw-db commented on code in PR #46122: URL: https://github.com/apache/spark/pull/46122#discussion_r1573031748 ## python/pyspark/sql/types.py: ## @@ -1521,6 +1521,19 @@ def toPython(self) -> Any: """ return VariantUtils.to_python(self.value,

Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]

2024-04-19 Thread via GitHub
harshmotw-db commented on code in PR #46122: URL: https://github.com/apache/spark/pull/46122#discussion_r1573032141 ## python/pyspark/sql/types.py: ## @@ -1521,6 +1521,19 @@ def toPython(self) -> Any: """ return VariantUtils.to_python(self.value,

[PR] [WIP][SPARK-47672][SQL] Avoid double eval from filter pushDown w/ projection pushdown [spark]

2024-04-19 Thread via GitHub
holdenk opened a new pull request, #46143: URL: https://github.com/apache/spark/pull/46143 ### What changes were proposed in this pull request? Changes the filter pushDown optimizer to not push down past projections of the same element if we reasonably expect that computing that
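
An illustrative sketch of the double-evaluation concern, using a made-up Python UDF to stand in for an expensive projected expression.

    # Hedged sketch: a filter on a projected expression. If the Filter is pushed
    # below the Project, the predicate is rewritten against the child as
    # expensive(id) > 0, so the UDF can end up evaluated twice per row
    # (once in the Filter, once in the Project).
    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()
    spark.udf.register("expensive", lambda x: x * 2, LongType())  # stand-in for costly work
    spark.range(10).createOrReplaceTempView("t")

    q = spark.sql("SELECT e FROM (SELECT expensive(id) AS e FROM t) sub WHERE e > 0")
    q.explain(extended=True)  # inspect where the Filter lands relative to the Project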

[PR] [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #46144: URL: https://github.com/apache/spark/pull/46144 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]

2024-04-19 Thread via GitHub
gene-db commented on code in PR #46122: URL: https://github.com/apache/spark/pull/46122#discussion_r1573029280 ## python/pyspark/sql/types.py: ## @@ -1521,6 +1521,19 @@ def toPython(self) -> Any: """ return VariantUtils.to_python(self.value, self.metadata) +

[PR] [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #46145: URL: https://github.com/apache/spark/pull/46145 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-19 Thread via GitHub
panbingkun commented on PR #46146: URL: https://github.com/apache/spark/pull/46146#issuecomment-2067487411 At present, we are only testing the `rocksdbjni` `9.x` series in advance -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47902][SQL] Making Compute Current Time* expressions foldable [spark]

2024-04-19 Thread via GitHub
dbatomic commented on code in PR #46120: URL: https://github.com/apache/spark/pull/46120#discussion_r1571993071 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantFoldingSuite.scala: ## @@ -437,6 +437,21 @@ class ConstantFoldingSuite extends PlanTest

[PR] [WIP] Move `src/test/java/test/*` to `src/test/java/*` [spark]

2024-04-19 Thread via GitHub
panbingkun opened a new pull request, #46134: URL: https://github.com/apache/spark/pull/46134 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-47914][SQL] Do not display the splits parameter in Range [spark]

2024-04-19 Thread via GitHub
guixiaowen opened a new pull request, #46136: URL: https://github.com/apache/spark/pull/46136 ### What changes were proposed in this pull request? [SQL] explain extended select * from range(0, 4); Before this PR, the splits parameter is also displayed in the logical execution plan
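
A quick way to inspect the plan in question from PySpark; how the Range operator is rendered (including whether a splits argument shows up) depends on the Spark version.

    # Print the parsed/analyzed/optimized/physical plans for the range query.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sql("SELECT * FROM range(0, 4)").explain(extended=True)
    # Before the change described above, the logical plan printed the Range
    # operator together with its splits parameter, e.g. something like
    # Range (0, 4, step=1, splits=None).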

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1571894582 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -425,6 +421,74 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-46632][SQL] EquivalentExpressions addExprTree should allow all type of expressions [spark]

2024-04-19 Thread via GitHub
zml1206 commented on code in PR #45894: URL: https://github.com/apache/spark/pull/45894#discussion_r1571913735 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -193,7 +193,9 @@ class EquivalentExpressions( if

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1571923683 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule

Re: [PR] [DRAFT][SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
uros-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1572003918 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -161,4 +162,40 @@ class CollationExpressionSuite extends

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1572006465 ## python/pyspark/sql/connect/dataframe.py: ## @@ -2306,7 +2183,7 @@ def _test() -> None: ) (failure_count, test_count) = doctest.testmod( -

[PR] Fix subexpression elimination when equivalent ternary expressions have different children [spark]

2024-04-19 Thread via GitHub
zml1206 opened a new pull request, #46135: URL: https://github.com/apache/spark/pull/46135 ### What changes were proposed in this pull request? Remove unexpected exception thrown in `EquivalentExpressions.updateExprInMap()`. Equivalent expressions may contain different

Re: [PR] [SPARK-47911][SQL] Introduces a universal BinaryFormatter to make binary output consistent [spark]

2024-04-19 Thread via GitHub
yaooqinn commented on code in PR #46133: URL: https://github.com/apache/spark/pull/46133#discussion_r1572038458 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ToStringBase.scala: ## @@ -414,3 +413,24 @@ trait ToStringBase { self: UnaryExpression with

Re: [PR] [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children [spark]

2024-04-19 Thread via GitHub
zml1206 commented on PR #46135: URL: https://github.com/apache/spark/pull/46135#issuecomment-2066175301 cc @peter-toth @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47906][PYTHON][DOCS] Fix docstring and type hint of `hll_union_agg` [spark]

2024-04-19 Thread via GitHub
zhengruifeng commented on PR #46128: URL: https://github.com/apache/spark/pull/46128#issuecomment-2066182638 thanks @HyukjinKwon merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47906][PYTHON][DOCS] Fix docstring and type hint of `hll_union_agg` [spark]

2024-04-19 Thread via GitHub
zhengruifeng closed pull request #46128: [SPARK-47906][PYTHON][DOCS] Fix docstring and type hint of `hll_union_agg` URL: https://github.com/apache/spark/pull/46128 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-19 Thread via GitHub
LuciferYang commented on code in PR #46123: URL: https://github.com/apache/spark/pull/46123#discussion_r1571874614 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2485,6 +2485,30 @@ class PlanGenerationTestSuite

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on PR #46041: URL: https://github.com/apache/spark/pull/46041#issuecomment-2065856516 @uros-db addressed the latest requested fixes accordingly. please re-review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[PR] [SPARK-47912][SQL] Infer serde class from format classes [spark]

2024-04-19 Thread via GitHub
wForget opened a new pull request, #46132: URL: https://github.com/apache/spark/pull/46132 ### What changes were proposed in this pull request? Infer serde class from format classes. ### Why are the changes needed? File format of insert overwrite dir does not

[PR] [SPARK-47911][SQL] Introduces a universal BinaryFormatter to make binary output consistent [spark]

2024-04-19 Thread via GitHub
yaooqinn opened a new pull request, #46133: URL: https://github.com/apache/spark/pull/46133 ### What changes were proposed in this pull request? This PR introduces a universal BinaryFormatter to make binary output consistent across all clients, such as `beeline`,
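
For context, a small PySpark snippet showing where binary rendering surfaces; how the bytes appear (hex list, UTF-8 text, or a Java-style toString) has varied across clients, which is the inconsistency the change above targets.

    # Hedged sketch: produce a binary column and look at how it is rendered.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.sql("SELECT cast('Spark' AS BINARY) AS b")
    df.show()            # DataFrame.show typically renders binary as hex bytes, e.g. [53 70 61 72 6B]
    print(df.first().b)  # the Python driver sees a bytearray: bytearray(b'Spark')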

Re: [PR] [DRAFT][SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
mihailom-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1571957856 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -52,6 +52,11 @@ object CollationTypeCasts extends

Re: [PR] [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token [spark]

2024-04-19 Thread via GitHub
yaooqinn commented on PR #46115: URL: https://github.com/apache/spark/pull/46115#issuecomment-2065824066 Thank you @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
zhengruifeng commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1571874148 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1952 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1571887028 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -323,10 +323,6 @@ class CollationStringExpressionsSuite

Re: [PR] [DRAFT][SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
mihailom-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1571964646 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -161,4 +162,40 @@ class CollationExpressionSuite

Re: [PR] [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException [spark]

2024-04-19 Thread via GitHub
yaooqinn closed pull request #46028: [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException URL: https://github.com/apache/spark/pull/46028 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on PR #46040: URL: https://github.com/apache/spark/pull/46040#issuecomment-2065859992 @uros-db I have made the suggested changes. please re-review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1571898280 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1952 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1571898438 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -425,6 +425,54 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException [spark]

2024-04-19 Thread via GitHub
yaooqinn commented on PR #46028: URL: https://github.com/apache/spark/pull/46028#issuecomment-2066017780 Merged to master. Thank you @pan3793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Spark collation 47413 3 [spark]

2024-04-19 Thread via GitHub
GideonPotok closed pull request #45986: Spark collation 47413 3 URL: https://github.com/apache/spark/pull/45986 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-19 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1572265083 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -27,18 +27,22 @@ import

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-19 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1572264833 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -813,6 +823,28 @@ class SparkSession private[sql] ( * Set to false to

Re: [PR] [SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
nikolamand-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1572342093 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationRegexpExpressionSuite.scala: ## @@ -0,0 +1,170 @@ +/* + * Licensed to the

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-19 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1572263903 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -813,6 +823,28 @@ class SparkSession private[sql] ( * Set to false to

Re: [PR] [SPARK-47297][SQL] Add collations support to split regex expression [spark]

2024-04-19 Thread via GitHub
nikolamand-db closed pull request #45856: [SPARK-47297][SQL] Add collations support to split regex expression URL: https://github.com/apache/spark/pull/45856 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47297][SQL] Add collations support to split regex expression [spark]

2024-04-19 Thread via GitHub
nikolamand-db commented on PR #45856: URL: https://github.com/apache/spark/pull/45856#issuecomment-2066549783 Closing as we have a new approach for all regex functions https://github.com/apache/spark/pull/46077. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1572135059 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends

[PR] [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1 [spark]

2024-04-19 Thread via GitHub
bjornjorgensen opened a new pull request, #46137: URL: https://github.com/apache/spark/pull/46137 ### What changes were proposed in this pull request? Upgrade `kubernetes-client` from 6.12.0 to 6.12.1 ### Why are the changes needed? [Release
