Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
ueshin commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1572787315 ## python/pyspark/sql/dataframe.py: ## @@ -139,51 +123,29 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): created via using the constructor. """

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-19 Thread via GitHub
ericm-db commented on PR #45991: URL: https://github.com/apache/spark/pull/45991#issuecomment-2067115942 @HeartSaVioR PTAL, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572833180 ## .github/workflows/build_and_test.yml: ## @@ -26,4 +26,20 @@ jobs: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with:

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572834233 ## .gitignore: ## @@ -16,3 +16,30 @@ build dependencies.lock **/dependencies.lock gradle/wrapper/gradle-wrapper.jar + +# Compiled source #

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572840033 ## config/checkstyle/checkstyle.xml: ## @@ -0,0 +1,195 @@ + + +https://checkstyle.org/dtds/configuration_1_3.dtd"> + + + + Review Comment:

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-19 Thread via GitHub
steveloughran commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2067069380 So both those bindings hand off to PathOutputCommitterFactory(), which looks for a committer from the config key mapreduce.outputcommitter.factory.class
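
As a rough illustration of that lookup, here is a minimal configuration sketch for selecting the S3A magic committer from PySpark, assuming the hadoop-aws and spark-hadoop-cloud modules are on the classpath. The per-scheme key mapreduce.outputcommitter.factory.scheme.s3a is the usual specialization of the generic factory key mentioned above; exact settings depend on the Spark/Hadoop versions in use.

    # Hedged sketch: route s3a output through the committer factory so the
    # "magic" committer is picked up. Class/config names follow the Spark
    # cloud-integration docs; verify them against your build.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.committer.name", "magic")
        .config("spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a",
                "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
        .config("spark.sql.sources.commitProtocolClass",
                "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
        .config("spark.sql.parquet.output.committer.class",
                "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
        .getOrCreate()
    )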

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
sahnib commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572618909 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572837826 ## build.gradle: ## @@ -1,3 +1,16 @@ +buildscript { + repositories { +maven { + url = uri("https://plugins.gradle.org/m2/") +} + } +

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572836874 ## build-tools/helm/spark-kubernetes-operator/values.yaml: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or

[PR] [SPARK-47920] add doc for python streaming data source API [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 opened a new pull request, #46139: URL: https://github.com/apache/spark/pull/46139 ### What changes were proposed in this pull request? add doc for python streaming data source API ### Why are the changes needed? Add a user guide to help users develop

Re: [PR] [SPARK-47921][CONNECT] Fix ExecuteJobTag creation in ExecuteHolder [spark]

2024-04-19 Thread via GitHub
allisonwang-db commented on PR #46140: URL: https://github.com/apache/spark/pull/46140#issuecomment-2067273091 cc @jasonli-db @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45265][SQL] Supporting Hive 4.0 Metastore [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #45801: [SPARK-45265][SQL] Supporting Hive 4.0 Metastore URL: https://github.com/apache/spark/pull/45801 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [WIP][SPARK-47907] Put bang under a config [spark]

2024-04-19 Thread via GitHub
srielau opened a new pull request, #46138: URL: https://github.com/apache/spark/pull/46138 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572914583 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572915019 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

[PR] [SPARK-47921][CONNECT] Fix ExecuteJobTag creation in ExecuteHolder [spark]

2024-04-19 Thread via GitHub
allisonwang-db opened a new pull request, #46140: URL: https://github.com/apache/spark/pull/46140 ### What changes were proposed in this pull request? This PR fixes a bug in the ExecuteJobTag creation in ExecuteHolder. The sessionId and userId are reversed. ### Why are
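
Purely for illustration, a hypothetical Python sketch of the class of bug described (two identifiers passed in swapped positions when the tag is built); the real ExecuteJobTag is Scala code in the Spark Connect server and its exact format is not reproduced here.

    # Hypothetical sketch of a swapped-argument bug like the one described.
    def execute_job_tag(user_id: str, session_id: str, operation_id: str) -> str:
        return f"User_{user_id}_Session_{session_id}_Operation_{operation_id}"

    user_id, session_id, operation_id = "alice", "sess-123", "op-456"
    buggy = execute_job_tag(session_id, user_id, operation_id)  # ids swapped in the tag
    fixed = execute_job_tag(user_id, session_id, operation_id)  # arguments in the right order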

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1572995277 ## python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py: ## @@ -63,8 +63,13 @@ def main(infile: IO, outfile: IO) -> None: spark =

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1572997327 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -66,6 +66,30 @@ def func(df, _): q =

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1572997060 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -66,6 +66,30 @@ def func(df, _): q =

Re: [PR] [WIP] Testing that error is propagated to user upon deserialization [spark]

2024-04-19 Thread via GitHub
ericm-db commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1573000267 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -66,6 +66,30 @@ def func(df, _): q =

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572833649 ## .gitignore: ## @@ -16,3 +16,30 @@ build dependencies.lock **/dependencies.lock gradle/wrapper/gradle-wrapper.jar + +# Compiled source #

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1572836113 ## build-tools/helm/spark-kubernetes-operator/values.yaml: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE [spark]

2024-04-19 Thread via GitHub
vladimirg-db commented on code in PR #46082: URL: https://github.com/apache/spark/pull/46082#discussion_r1572665195 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -359,10 +414,97 @@ public boolean startsWith(final UTF8String prefix) {

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on PR #46129: URL: https://github.com/apache/spark/pull/46129#issuecomment-2067140476 Will fix up the tests soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
allisonwang-db commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1572908487 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

[PR] Try parse json [spark]

2024-04-19 Thread via GitHub
harshmotw-db opened a new pull request, #46141: URL: https://github.com/apache/spark/pull/46141 ### What changes were proposed in this pull request? This pull request implements the `try_parse_json` expression, which runs `parse_json` on string expressions to extract variants. However, if
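
A small usage sketch of the intended behaviour, assuming `try_parse_json` is exposed as a SQL expression that yields NULL where `parse_json` would raise; the sample data is made up.

    # Hedged sketch: try_parse_json vs parse_json on a malformed JSON string.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([('{"a": 1}',), ("not json",)], ["s"])

    # parse_json(s) would fail at execution time on the second row;
    # try_parse_json(s) is expected to return NULL for it instead.
    df.selectExpr("s", "try_parse_json(s) AS v").show(truncate=False)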

Re: [PR] [SPARK-45709][BUILD] Deploy packages when all packages are built [spark]

2024-04-19 Thread via GitHub
github-actions[bot] commented on PR #43561: URL: https://github.com/apache/spark/pull/43561#issuecomment-2067417258 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573092072 ## python/pyspark/sql/datasource.py: ## @@ -183,11 +186,40 @@ def streamWriter(self, schema: StructType, overwrite: bool) -> "DataSourceStream

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-04-19 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1573096494 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -897,6 +900,85 @@ class AdaptiveQueryExecSuite }

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573100690 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573100752 ## python/pyspark/sql/worker/plan_data_source_read.py: ## @@ -51,6 +52,71 @@ ) +def records_to_arrow_batches( +output_iter: Iterator[Tuple], +

Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-19 Thread via GitHub
srielau commented on PR #46138: URL: https://github.com/apache/spark/pull/46138#issuecomment-2067511478 @cloud-fan @gengliangwang This is ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #46142: URL: https://github.com/apache/spark/pull/46142 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573085009 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): Review

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1573085074 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573094548 ## python/pyspark/sql/utils.py: ## @@ -302,6 +302,33 @@ def wrapped(*args: Any, **kwargs: Any) -> Any: return cast(FuncT, wrapped) +def

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573094799 ## python/pyspark/sql/connect/session.py: ## @@ -325,7 +325,7 @@ def active(cls) -> "SparkSession": active.__doc__ = PySparkSession.active.__doc__ -

[PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-19 Thread via GitHub
panbingkun opened a new pull request, #46146: URL: https://github.com/apache/spark/pull/46146 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on PR #46145: URL: https://github.com/apache/spark/pull/46145#issuecomment-2067526908 Thank you, @HyukjinKwon . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #46145: [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` URL: https://github.com/apache/spark/pull/46145 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #46144: [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` URL: https://github.com/apache/spark/pull/46144 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2024-04-19 Thread via GitHub
github-actions[bot] commented on PR #41518: URL: https://github.com/apache/spark/pull/41518#issuecomment-2067417265 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47922][SQL] Implement the try_parse_json expression [spark]

2024-04-19 Thread via GitHub
harshmotw-db commented on PR #46141: URL: https://github.com/apache/spark/pull/46141#issuecomment-2067417422 cc @chenhao-db @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-19 Thread via GitHub
CTCC1 commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1573145872 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column: repeat.__doc__

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573095413 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1974 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1573095371 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1974 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-04-19 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1573095590 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -897,6 +900,85 @@ class AdaptiveQueryExecSuite }

Re: [PR] [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun closed pull request #46142: [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 URL: https://github.com/apache/spark/pull/46142 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on PR #46142: URL: https://github.com/apache/spark/pull/46142#issuecomment-2067525074 Thank you, @HyukjinKwon ! Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun commented on PR #46144: URL: https://github.com/apache/spark/pull/46144#issuecomment-2067525455 Thank you, @HyukjinKwon . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [WIP] test java toUpperCase & toLowerCase [spark]

2024-04-19 Thread via GitHub
panbingkun opened a new pull request, #46147: URL: https://github.com/apache/spark/pull/46147 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-19 Thread via GitHub
liucao-dd commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1573144905 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column:

Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]

2024-04-19 Thread via GitHub
harshmotw-db commented on code in PR #46122: URL: https://github.com/apache/spark/pull/46122#discussion_r1573031748 ## python/pyspark/sql/types.py: ## @@ -1521,6 +1521,19 @@ def toPython(self) -> Any: """ return VariantUtils.to_python(self.value,

Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]

2024-04-19 Thread via GitHub
harshmotw-db commented on code in PR #46122: URL: https://github.com/apache/spark/pull/46122#discussion_r1573032141 ## python/pyspark/sql/types.py: ## @@ -1521,6 +1521,19 @@ def toPython(self) -> Any: """ return VariantUtils.to_python(self.value,

[PR] [WIP][SPARK-47672][SQL] Avoid double eval from filter pushDown w/ projection pushdown [spark]

2024-04-19 Thread via GitHub
holdenk opened a new pull request, #46143: URL: https://github.com/apache/spark/pull/46143 ### What changes were proposed in this pull request? Changes the filter pushDown optimizer to not push down past projections of the same element if we reasonably expect that computing that
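
An illustrative sketch of the double-evaluation concern, using a made-up Python UDF to stand in for an expensive projected expression.

    # Hedged sketch: a filter on a projected expression. If the Filter is pushed
    # below the Project, the predicate is rewritten against the child as
    # expensive(id) > 0, so the UDF can end up evaluated twice per row
    # (once in the Filter, once in the Project).
    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()
    spark.udf.register("expensive", lambda x: x * 2, LongType())  # stand-in for costly work
    spark.range(10).createOrReplaceTempView("t")

    q = spark.sql("SELECT e FROM (SELECT expensive(id) AS e FROM t) sub WHERE e > 0")
    q.explain(extended=True)  # inspect where the Filter lands relative to the Project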

[PR] [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #46144: URL: https://github.com/apache/spark/pull/46144 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]

2024-04-19 Thread via GitHub
gene-db commented on code in PR #46122: URL: https://github.com/apache/spark/pull/46122#discussion_r1573029280 ## python/pyspark/sql/types.py: ## @@ -1521,6 +1521,19 @@ def toPython(self) -> Any: """ return VariantUtils.to_python(self.value, self.metadata) +

[PR] [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` [spark]

2024-04-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #46145: URL: https://github.com/apache/spark/pull/46145 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-19 Thread via GitHub
panbingkun commented on PR #46146: URL: https://github.com/apache/spark/pull/46146#issuecomment-2067487411 At present, we are only testing the `rocksdbjni` `9.x` series in advance -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47902][SQL] Making Compute Current Time* expressions foldable [spark]

2024-04-19 Thread via GitHub
dbatomic commented on code in PR #46120: URL: https://github.com/apache/spark/pull/46120#discussion_r1571993071 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantFoldingSuite.scala: ## @@ -437,6 +437,21 @@ class ConstantFoldingSuite extends PlanTest

[PR] [WIP] Move `src/test/java/test/*` to `src/test/java/*` [spark]

2024-04-19 Thread via GitHub
panbingkun opened a new pull request, #46134: URL: https://github.com/apache/spark/pull/46134 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-47914][SQL] Do not display the splits parameter in Range [spark]

2024-04-19 Thread via GitHub
guixiaowen opened a new pull request, #46136: URL: https://github.com/apache/spark/pull/46136 ### What changes were proposed in this pull request? [SQL] explain extended select * from range(0, 4); Before this PR, the splits parameter is also displayed in the logical execution plan
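
A quick way to inspect the plan in question from PySpark; how the Range operator is rendered (including whether a splits argument shows up) depends on the Spark version.

    # Print the parsed/analyzed/optimized/physical plans for the range query.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sql("SELECT * FROM range(0, 4)").explain(extended=True)
    # Before the change described above, the logical plan printed the Range
    # operator together with its splits parameter, e.g. something like
    # Range (0, 4, step=1, splits=None).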

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1571894582 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -425,6 +421,74 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-46632][SQL] EquivalentExpressions addExprTree should allow all type of expressions [spark]

2024-04-19 Thread via GitHub
zml1206 commented on code in PR #45894: URL: https://github.com/apache/spark/pull/45894#discussion_r1571913735 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -193,7 +193,9 @@ class EquivalentExpressions( if

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1571923683 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule

Re: [PR] [DRAFT][SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
uros-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1572003918 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -161,4 +162,40 @@ class CollationExpressionSuite extends

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1572006465 ## python/pyspark/sql/connect/dataframe.py: ## @@ -2306,7 +2183,7 @@ def _test() -> None: ) (failure_count, test_count) = doctest.testmod( -

[PR] Fix subexpression elimination when equivalent ternary expressions have different children [spark]

2024-04-19 Thread via GitHub
zml1206 opened a new pull request, #46135: URL: https://github.com/apache/spark/pull/46135 ### What changes were proposed in this pull request? Remove unexpected exception thrown in `EquivalentExpressions.updateExprInMap()`. Equivalent expressions may contain different

Re: [PR] [SPARK-47911][SQL] Introduces a universal BinaryFormatter to make binary output consistent [spark]

2024-04-19 Thread via GitHub
yaooqinn commented on code in PR #46133: URL: https://github.com/apache/spark/pull/46133#discussion_r1572038458 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ToStringBase.scala: ## @@ -414,3 +413,24 @@ trait ToStringBase { self: UnaryExpression with

Re: [PR] [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children [spark]

2024-04-19 Thread via GitHub
zml1206 commented on PR #46135: URL: https://github.com/apache/spark/pull/46135#issuecomment-2066175301 cc @peter-toth @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47906][PYTHON][DOCS] Fix docstring and type hint of `hll_union_agg` [spark]

2024-04-19 Thread via GitHub
zhengruifeng commented on PR #46128: URL: https://github.com/apache/spark/pull/46128#issuecomment-2066182638 thanks @HyukjinKwon merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47906][PYTHON][DOCS] Fix docstring and type hint of `hll_union_agg` [spark]

2024-04-19 Thread via GitHub
zhengruifeng closed pull request #46128: [SPARK-47906][PYTHON][DOCS] Fix docstring and type hint of `hll_union_agg` URL: https://github.com/apache/spark/pull/46128 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-19 Thread via GitHub
LuciferYang commented on code in PR #46123: URL: https://github.com/apache/spark/pull/46123#discussion_r1571874614 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2485,6 +2485,30 @@ class PlanGenerationTestSuite

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on PR #46041: URL: https://github.com/apache/spark/pull/46041#issuecomment-2065856516 @uros-db addressed the latest requested fixes accordingly. please re-review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[PR] [SPARK-47912][SQL] Infer serde class from format classes [spark]

2024-04-19 Thread via GitHub
wForget opened a new pull request, #46132: URL: https://github.com/apache/spark/pull/46132 ### What changes were proposed in this pull request? Infer serde class from format classes. ### Why are the changes needed? File format of insert overwrite dir does not

[PR] [SPARK-47911][SQL] Introduces a universal BinaryFormatter to make binary output consistent [spark]

2024-04-19 Thread via GitHub
yaooqinn opened a new pull request, #46133: URL: https://github.com/apache/spark/pull/46133 ### What changes were proposed in this pull request? This PR introduces a universal BinaryFormatter to make binary output consistent across all clients, such as `beeline`,
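
For context, a small PySpark snippet showing where binary rendering surfaces; how the bytes appear (hex list, UTF-8 text, or a Java-style toString) has varied across clients, which is the inconsistency the change above targets.

    # Hedged sketch: produce a binary column and look at how it is rendered.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.sql("SELECT cast('Spark' AS BINARY) AS b")
    df.show()            # DataFrame.show typically renders binary as hex bytes, e.g. [53 70 61 72 6B]
    print(df.first().b)  # the Python driver sees a bytearray: bytearray(b'Spark')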

Re: [PR] [DRAFT][SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
mihailom-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1571957856 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -52,6 +52,11 @@ object CollationTypeCasts extends

Re: [PR] [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token [spark]

2024-04-19 Thread via GitHub
yaooqinn commented on PR #46115: URL: https://github.com/apache/spark/pull/46115#issuecomment-2065824066 Thank you @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
zhengruifeng commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1571874148 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1952 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1571887028 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -323,10 +323,6 @@ class CollationStringExpressionsSuite

Re: [PR] [DRAFT][SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
mihailom-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1571964646 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -161,4 +162,40 @@ class CollationExpressionSuite

Re: [PR] [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException [spark]

2024-04-19 Thread via GitHub
yaooqinn closed pull request #46028: [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException URL: https://github.com/apache/spark/pull/46028 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on PR #46040: URL: https://github.com/apache/spark/pull/46040#issuecomment-2065859992 @uros-db I have made the suggested changes. please re-review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-19 Thread via GitHub
HyukjinKwon commented on code in PR #46129: URL: https://github.com/apache/spark/pull/46129#discussion_r1571898280 ## python/pyspark/sql/classic/dataframe.py: ## @@ -0,0 +1,1952 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1571898438 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -425,6 +425,54 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException [spark]

2024-04-19 Thread via GitHub
yaooqinn commented on PR #46028: URL: https://github.com/apache/spark/pull/46028#issuecomment-2066017780 Merged to master. Thank you @pan3793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Spark collation 47413 3 [spark]

2024-04-19 Thread via GitHub
GideonPotok closed pull request #45986: Spark collation 47413 3 URL: https://github.com/apache/spark/pull/45986 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-19 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1572265083 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -27,18 +27,22 @@ import

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-19 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1572264833 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -813,6 +823,28 @@ class SparkSession private[sql] ( * Set to false to

Re: [PR] [SPARK-47414][SQL] Lowercase collation support for regexp expressions [spark]

2024-04-19 Thread via GitHub
nikolamand-db commented on code in PR #46077: URL: https://github.com/apache/spark/pull/46077#discussion_r1572342093 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationRegexpExpressionSuite.scala: ## @@ -0,0 +1,170 @@ +/* + * Licensed to the

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-19 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1572263903 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -813,6 +823,28 @@ class SparkSession private[sql] ( * Set to false to

Re: [PR] [SPARK-47297][SQL] Add collations support to split regex expression [spark]

2024-04-19 Thread via GitHub
nikolamand-db closed pull request #45856: [SPARK-47297][SQL] Add collations support to split regex expression URL: https://github.com/apache/spark/pull/45856 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47297][SQL] Add collations support to split regex expression [spark]

2024-04-19 Thread via GitHub
nikolamand-db commented on PR #45856: URL: https://github.com/apache/spark/pull/45856#issuecomment-2066549783 Closing as we have a new approach for all regex functions https://github.com/apache/spark/pull/46077. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-19 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1572135059 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -54,7 +54,7 @@ object CollationTypeCasts extends

[PR] [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1 [spark]

2024-04-19 Thread via GitHub
bjornjorgensen opened a new pull request, #46137: URL: https://github.com/apache/spark/pull/46137 ### What changes were proposed in this pull request? Upgrade `kubernetes-client` from 6.12.0 to 6.12.1 ### Why are the changes needed? [Release
