Re: [PR] [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45915: URL: https://github.com/apache/spark/pull/45915#issuecomment-2041332540 cc @itholic @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
itholic commented on PR #45915: URL: https://github.com/apache/spark/pull/45915#issuecomment-2041342058 LGTM when CI pass -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45915: URL: https://github.com/apache/spark/pull/45915#issuecomment-2041387530 Merged to master.

Re: [PR] [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45915: [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect URL: https://github.com/apache/spark/pull/45915

Re: [PR] [SPARK-47753][PYTHON][CONNECT][TESTS] Make pyspark.testing compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45916: [SPARK-47753][PYTHON][CONNECT][TESTS] Make pyspark.testing compatible with pyspark-connect URL: https://github.com/apache/spark/pull/45916

Re: [PR] [DO-NOT-MERGE][SPARK-47725][INFRA] Set up the CI for pyspark-connect package [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on code in PR #45870: URL: https://github.com/apache/spark/pull/45870#discussion_r1554825945 ## python/pyspark/sql/tests/connect/test_connect_session.py: ## @@ -248,7 +250,7 @@ def setUp(self) -> None: .config("integer", 1)

[PR] [SPARK-47755][CONNECT] Pivot should fail when the number of distinct values is too large [spark]

2024-04-07 Thread via GitHub
zhengruifeng opened a new pull request, #45918: URL: https://github.com/apache/spark/pull/45918 ### What changes were proposed in this pull request? `Pivot` should fail when the number of distinct values is too large ### Why are the changes needed? Following check is missing

Re: [PR] [SPARK-37018][SQL] Spark SQL should support create function with Aggregator [spark]

2024-04-07 Thread via GitHub
IceMimosa commented on PR #34352: URL: https://github.com/apache/spark/pull/34352#issuecomment-2041333639 @beliefer Hi, is there any progress on this?

[PR] [SPARK-47753][PYTHON][TESTS] Make pyspark.testing compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45916: URL: https://github.com/apache/spark/pull/45916 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.testing` compatible with `pyspark-connect` by using noop context manager `contextlib.nullcontext`
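
For readers unfamiliar with the no-op context manager named above, the following is a minimal sketch of the general pattern, not the code from the PR. The names `jvm_only_setup`, `optional_setup`, `run`, and the `has_jvm` flag are hypothetical and exist only for illustration; the only real API used is `contextlib.nullcontext` from the Python standard library.

```python
import contextlib

events = []  # records enter/exit calls so the behaviour is observable


@contextlib.contextmanager
def jvm_only_setup():
    # Hypothetical stand-in for setup that is only possible with a JVM-backed session.
    events.append("enter")
    try:
        yield "resource"
    finally:
        events.append("exit")


def optional_setup(has_jvm: bool):
    # When the JVM-backed path is unavailable, fall back to a context manager
    # that does nothing, so callers can keep a single `with` statement.
    return jvm_only_setup() if has_jvm else contextlib.nullcontext()


def run(has_jvm: bool):
    with optional_setup(has_jvm) as res:
        return res
```

With `has_jvm=False` the `with` block still runs, but no setup or teardown happens and the bound value is `None`; with `has_jvm=True` the real context manager is entered and exited as usual.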

Re: [PR] [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the tables after tests finished [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45913: URL: https://github.com/apache/spark/pull/45913#issuecomment-2041357058 Merged to master.

Re: [PR] [SPARK-47751][PYTHON][CONNECT] Make pyspark.worker_utils compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45914: URL: https://github.com/apache/spark/pull/45914#issuecomment-2041380167 Merged to master.

Re: [PR] [SPARK-47751][PYTHON][CONNECT] Make pyspark.worker_utils compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45914: [SPARK-47751][PYTHON][CONNECT] Make pyspark.worker_utils compatible with pyspark-connect URL: https://github.com/apache/spark/pull/45914

Re: [PR] [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the tables after tests finished [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45913: URL: https://github.com/apache/spark/pull/45913#issuecomment-2041329151 cc @HeartSaVioR FYI

Re: [PR] [SPARK-47737][PYTHON] Bump PyArrow to 10.0.0 [spark]

2024-04-07 Thread via GitHub
itholic commented on code in PR #45892: URL: https://github.com/apache/spark/pull/45892#discussion_r1554831706 ## dev/create-release/spark-rm/Dockerfile: ## @@ -37,7 +37,7 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true # These arguments are just for reuse and not really meant to be

Re: [PR] [SPARK-47753][PYTHON][CONNECT][TESTS] Make pyspark.testing compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45916: URL: https://github.com/apache/spark/pull/45916#issuecomment-2041386906 Merged to master.

[PR] [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45915: URL: https://github.com/apache/spark/pull/45915 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.pandas` compatible with `pyspark-connect`. ### Why are the changes needed? In order for

Re: [PR] [SPARK-47413][SQL] [WIP - DONT REVIEW] - add support to substr/left/right for collations [spark]

2024-04-07 Thread via GitHub
uros-db commented on code in PR #45738: URL: https://github.com/apache/spark/pull/45738#discussion_r1554838890 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -86,6 +89,43 @@ class CollationStringExpressionsSuite extends QueryTest

[PR] [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the tables after tests finished [spark]

2024-04-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45913: URL: https://github.com/apache/spark/pull/45913 ### What changes were proposed in this pull request? This PR proposes to drop the tables after tests finished. ### Why are the changes needed? - To clean up resources properly.

[PR] [SPARK-47754][SQL] Postgres: Support reading multidimensional arrays [spark]

2024-04-07 Thread via GitHub
yaooqinn opened a new pull request, #45917: URL: https://github.com/apache/spark/pull/45917 ### What changes were proposed in this pull request? Because the ResultSetMetadata cannot distinguish a single-dimensional array from multidimensional arrays. Thus, we always read

[PR] [SPARK-47751][PYTHON] Make pyspark.worker_utils compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45914: URL: https://github.com/apache/spark/pull/45914 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.worker_utils` compatible with `pyspark-connect`. ### Why are the changes needed? In

Re: [PR] [SPARK-47737][PYTHON] Bump PyArrow to 10.0.1 [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on code in PR #45892: URL: https://github.com/apache/spark/pull/45892#discussion_r1554839130 ## python/docs/source/getting_started/install.rst: ## @@ -157,7 +157,7 @@ PackageSupported version Note ==

Re: [PR] [SPARK-47737][PYTHON] Bump PyArrow to 10.0.1 [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on code in PR #45892: URL: https://github.com/apache/spark/pull/45892#discussion_r1554839162 ## python/docs/source/getting_started/install.rst: ## @@ -157,7 +157,7 @@ PackageSupported version Note ==

Re: [PR] [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the tables after tests finished [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45913: [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the tables after tests finished URL: https://github.com/apache/spark/pull/45913

Re: [PR] [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
zhengruifeng commented on code in PR #45915: URL: https://github.com/apache/spark/pull/45915#discussion_r1554840399 ## python/pyspark/pandas/plot/core.py: ## @@ -22,8 +22,6 @@ from pandas.core.base import PandasObject from pandas.core.dtypes.inference import is_integer

Re: [PR] [SPARK-47753][PYTHON][CONNECT][TESTS] Make pyspark.testing compatible with pyspark-connect [spark]

2024-04-07 Thread via GitHub
09306677806 commented on PR #45916: URL: https://github.com/apache/spark/pull/45916#issuecomment-2041407341 Is everything done correctly and entered into the teacher's account? ShahrzadMahro On Sunday, April 7, 2024, at 09:54, Hyukjin Kwon ***@***.***> wrote: > What

Re: [PR] Operator 1.0.0-alpha [spark-kubernetes-operator]

2024-04-07 Thread via GitHub
vakarisbk commented on code in PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#discussion_r1554963392 ## build-tools/helm/spark-kubernetes-operator/templates/spark-operator.yaml: ## @@ -0,0 +1,211 @@ +# +# Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47744] Add support for negative-valued bytes in range encoder [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on PR #45906: URL: https://github.com/apache/spark/pull/45906#issuecomment-2041604953 Thanks! Merging to master.

Re: [PR] [SPARK-47744] Add support for negative-valued bytes in range encoder [spark]

2024-04-07 Thread via GitHub
HeartSaVioR closed pull request #45906: [SPARK-47744] Add support for negative-valued bytes in range encoder URL: https://github.com/apache/spark/pull/45906

Re: [PR] [SPARK-47411][SQL] Support INSTR & FIND_IN_SET functions to work with collated strings [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45643: URL: https://github.com/apache/spark/pull/45643#discussion_r1555173817 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -549,6 +549,51 @@ public int findInSet(UTF8String match) { return 0; }

Re: [PR] [SPARK-47586][SQL] Hive module: Migrate logError with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
itholic commented on PR #45876: URL: https://github.com/apache/spark/pull/45876#issuecomment-2041778287 @panbingkun sure! Thanks for the notice :-)

[PR] [MINOR][PYTHON][CONNECT][TESTS] Enable `MapInPandasParityTests.test_dataframes_with_incompatible_types` [spark]

2024-04-07 Thread via GitHub
zhengruifeng opened a new pull request, #45922: URL: https://github.com/apache/spark/pull/45922 ### What changes were proposed in this pull request? Enable `MapInPandasParityTests.test_dataframes_with_incompatible_types` ### Why are the changes needed? for test coverage

Re: [PR] [SPARK-47750][DOCS][SQL] Postgres: Document Mapping Spark SQL Data Types to PostgreSQL [spark]

2024-04-07 Thread via GitHub
yaooqinn commented on PR #45912: URL: https://github.com/apache/spark/pull/45912#issuecomment-2041801277 cc @dongjoon-hyun @cloud-fan thanks

[PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
itholic opened a new pull request, #45923: URL: https://github.com/apache/spark/pull/45923 ### What changes were proposed in this pull request? This PR proposes to migrate `logWarning` with variables of the Hive-thriftserver module to the structured logging framework. ### Why are the changes needed?

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45805: URL: https://github.com/apache/spark/pull/45805#discussion_r1555179535 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -403,3 +404,85 @@ object VariantGetExpressionBuilder

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45805: URL: https://github.com/apache/spark/pull/45805#discussion_r1555180077 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -403,3 +404,85 @@ object VariantGetExpressionBuilder

Re: [PR] [SPARK-47682][SQL] Support cast from variant. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45807: URL: https://github.com/apache/spark/pull/45807#discussion_r1555185070 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1198,6 +1208,20 @@ case class Cast( case _ if from == NullType =>

Re: [PR] [SPARK-47657][SQL] Implement collation filter push down support per file source [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #45782: URL: https://github.com/apache/spark/pull/45782#issuecomment-2041831683 thanks, merging to master!

Re: [PR] [SPARK-47657][SQL] Implement collation filter push down support per file source [spark]

2024-04-07 Thread via GitHub
cloud-fan closed pull request #45782: [SPARK-47657][SQL] Implement collation filter push down support per file source URL: https://github.com/apache/spark/pull/45782

Re: [PR] [SPARK-45959][SQL] Improving performance when addition of 1 column at a time causes increase in the LogicalPlan tree depth [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #43854: URL: https://github.com/apache/spark/pull/43854#issuecomment-2041841906 What's the target use case of this improvement? A super long SQL statement or a super long DataFrame transformation chain?

Re: [PR] [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the tables after tests finished [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on PR #45913: URL: https://github.com/apache/spark/pull/45913#issuecomment-2041622482 Late +1. Thanks @HyukjinKwon

Re: [PR] [SPARK-47755][CONNECT] Pivot should fail when the number of distinct values is too large [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45918: [SPARK-47755][CONNECT] Pivot should fail when the number of distinct values is too large URL: https://github.com/apache/spark/pull/45918

Re: [PR] [SPARK-47541][SQL] Collated strings in complex types supporting operations reverse, array_join, concat, map [spark]

2024-04-07 Thread via GitHub
cloud-fan closed pull request #45693: [SPARK-47541][SQL] Collated strings in complex types supporting operations reverse, array_join, concat, map URL: https://github.com/apache/spark/pull/45693

Re: [PR] [SPARK-47737][PYTHON] Bump PyArrow to 10.0.1 [spark]

2024-04-07 Thread via GitHub
itholic commented on code in PR #45892: URL: https://github.com/apache/spark/pull/45892#discussion_r1555177884 ## python/docs/source/getting_started/install.rst: ## @@ -157,7 +157,7 @@ PackageSupported version Note ==

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45805: URL: https://github.com/apache/spark/pull/45805#discussion_r1555178869 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -822,6 +822,8 @@ object FunctionRegistry {

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45805: URL: https://github.com/apache/spark/pull/45805#discussion_r1555180608 ## sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala: ## @@ -298,4 +300,26 @@ class VariantSuite extends QueryTest with SharedSparkSession { }

Re: [PR] [SPARK-47558][SS] State TTL support for ValueState [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on PR #45674: URL: https://github.com/apache/spark/pull/45674#issuecomment-2041792044 Thanks! Merging to master.

Re: [PR] [SPARK-47681][SQL] Add schema_of_variant expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45806: URL: https://github.com/apache/spark/pull/45806#discussion_r1555183761 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -403,3 +405,134 @@ object

Re: [PR] [SPARK-47681][SQL] Add schema_of_variant expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45806: URL: https://github.com/apache/spark/pull/45806#discussion_r1555184296 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -403,3 +405,134 @@ object

Re: [PR] [SPARK-47706][BUILD] Bump json4s 4.0.7 [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #45838: URL: https://github.com/apache/spark/pull/45838#issuecomment-2041836659 Do the json4s release notes mention anything about backward compatibility? We have persistent event logs in JSON format that may need to be consumed by a new version of Spark history

Re: [PR] [DO-NOT-MERGE][SPARK-47725][INFRA] Set up the CI for pyspark-connect package [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on code in PR #45870: URL: https://github.com/apache/spark/pull/45870#discussion_r1554957196 ## python/pyspark/sql/tests/test_catalog.py: ## @@ -73,6 +73,9 @@ def test_list_tables(self): spark.sql("CREATE DATABASE some_db") with

Re: [PR] [DO-NOT-MERGE][SPARK-47725][INFRA] Set up the CI for pyspark-connect package [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on code in PR #45870: URL: https://github.com/apache/spark/pull/45870#discussion_r1554957126 ## python/pyspark/sql/tests/test_catalog.py: ## @@ -73,6 +73,9 @@ def test_list_tables(self): spark.sql("CREATE DATABASE some_db") with

Re: [PR] [DO-NOT-MERGE][SPARK-47725][INFRA] Set up the CI for pyspark-connect package [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on code in PR #45870: URL: https://github.com/apache/spark/pull/45870#discussion_r1554957268 ## python/pyspark/sql/tests/test_catalog.py: ## @@ -73,6 +73,9 @@ def test_list_tables(self): spark.sql("CREATE DATABASE some_db") with

Re: [PR] [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on PR #45400: URL: https://github.com/apache/spark/pull/45400#issuecomment-2041625471 (BTW sorry for visiting this so late. Was very busy with other stuff.)

[PR] [MINOR][PYTHON][SS][TESTS] Drop the tables after being used at `test_streaming_foreach_batch` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45920: URL: https://github.com/apache/spark/pull/45920 ### What changes were proposed in this pull request? This PR proposes to drop the tables after tests finished. ### Why are the changes needed? - To clean up resources properly.

Re: [PR] [DO-NOT-REVIEW][DRAFT] Spark 45637 multiple state test [spark]

2024-04-07 Thread via GitHub
github-actions[bot] commented on PR #44076: URL: https://github.com/apache/spark/pull/44076#issuecomment-2041662026 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [MINOR][PYTHON][SS][TESTS] Drop the tables after being used at `test_streaming_foreach_batch` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45920: URL: https://github.com/apache/spark/pull/45920#issuecomment-2041662495 cc @zhengruifeng

Re: [PR] [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents [spark]

2024-04-07 Thread via GitHub
HeartSaVioR closed pull request #45400: [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents URL: https://github.com/apache/spark/pull/45400

Re: [PR] [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on PR #45400: URL: https://github.com/apache/spark/pull/45400#issuecomment-2041698954 Thanks! Merging to master/3.5.

Re: [PR] [SPARK-47586][SQL] Hive module: Migrate logError with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
panbingkun commented on PR #45876: URL: https://github.com/apache/spark/pull/45876#issuecomment-204177 @itholic Can you retrigger the workflow again? I believe it can turn green this time, haha

Re: [PR] [MINOR][PYTHON][SS][TESTS] Drop the tables after being used at `test_streaming_foreach_batch` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45920: URL: https://github.com/apache/spark/pull/45920#issuecomment-2041732960 Merged to master.

Re: [PR] [MINOR][PYTHON][SS][TESTS] Drop the tables after being used at `test_streaming_foreach_batch` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45920: [MINOR][PYTHON][SS][TESTS] Drop the tables after being used at `test_streaming_foreach_batch` URL: https://github.com/apache/spark/pull/45920

Re: [PR] [SPARK-47541][SQL] Collated strings in complex types supporting operations reverse, array_join, concat, map [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #45693: URL: https://github.com/apache/spark/pull/45693#issuecomment-2041779179 thanks, merging to master!

[PR] [WIP][SPARK-47758][BUILD] Upgrade commons-collections4 to 4.5.0-M1 [spark]

2024-04-07 Thread via GitHub
panbingkun opened a new pull request, #45921: URL: https://github.com/apache/spark/pull/45921 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041828693 @itholic what if we don't use thread local? IIUC, PySpark calls JVM methods to build the column instances at the end. On the JVM side, we wrap code with `withOrigin` to capture the

Re: [PR] [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents [spark]

2024-04-07 Thread via GitHub
panbingkun commented on PR #45400: URL: https://github.com/apache/spark/pull/45400#issuecomment-2041683267 > +1 > > @panbingkun Could you please remove the versions.json in pyspark docs as well to be not confused? Done Thank you `very much` for taking the time to review

Re: [PR] [MINOR][TESTS] Deduplicate test cases `test_parse_datatype_string` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45919: URL: https://github.com/apache/spark/pull/45919#issuecomment-2041732439 Merged to master.

Re: [PR] [MINOR][TESTS] Deduplicate test cases `test_parse_datatype_string` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45919: [MINOR][TESTS] Deduplicate test cases `test_parse_datatype_string` URL: https://github.com/apache/spark/pull/45919

Re: [PR] [SPARK-47681][SQL] Add schema_of_variant expression. [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45806: URL: https://github.com/apache/spark/pull/45806#discussion_r1555182484 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -403,3 +405,134 @@ object

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
itholic commented on PR #45923: URL: https://github.com/apache/spark/pull/45923#issuecomment-2041811506 cc @gengliangwang as you're working on logError for Hive-thriftserver IIRC. Also cc @panbingkun @pan3793

Re: [PR] [SPARK-47050][SQL] Collect and publish partition level metrics [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on code in PR #45123: URL: https://github.com/apache/spark/pull/45123#discussion_r1555211158 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/PartitionMetricsCollector.java: ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47706][BUILD] Bump json4s 4.0.7 [spark]

2024-04-07 Thread via GitHub
pan3793 commented on PR #45838: URL: https://github.com/apache/spark/pull/45838#issuecomment-2041867141 @cloud-fan It seems json4s does not have release notes. The API binary incompatibility does not affect JSON serialization/deserialization; the case of parsing event logs generated by the previous Spark

Re: [PR] [SPARK-47413][SQL] [WIP - DONT REVIEW] - add support to substr/left/right for collations [spark]

2024-04-07 Thread via GitHub
uros-db commented on code in PR #45738: URL: https://github.com/apache/spark/pull/45738#discussion_r1555227674 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -86,6 +89,43 @@ class CollationStringExpressionsSuite extends QueryTest

Re: [PR] [SPARK-47681][SQL] Add schema_of_variant expression. [spark]

2024-04-07 Thread via GitHub
chenhao-db commented on code in PR #45806: URL: https://github.com/apache/spark/pull/45806#discussion_r1555250846 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -403,3 +405,134 @@ object

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
panbingkun commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1555255130 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala: ## @@ -218,7 +232,9 @@ private[thriftserver]

Re: [PR] [WIP][SQL][CONNECT] Fix a self-join case [spark]

2024-04-07 Thread via GitHub
zhengruifeng closed pull request #45831: [WIP][SQL][CONNECT] Fix a self-join case URL: https://github.com/apache/spark/pull/45831

Re: [PR] [SPARK-47706][BUILD] Bump json4s 4.0.7 [spark]

2024-04-07 Thread via GitHub
pan3793 commented on code in PR #45838: URL: https://github.com/apache/spark/pull/45838#discussion_r1555230740 ## project/MimaExcludes.scala: ## @@ -90,7 +90,21 @@ object MimaExcludes {

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
panbingkun commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1555252729 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala: ## @@ -231,14 +232,15 @@ private[hive] object

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
panbingkun commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1555253496 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala: ## @@ -231,14 +232,15 @@ private[hive] object

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
panbingkun commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1555253322 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala: ## @@ -231,14 +232,15 @@ private[hive] object

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-07 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041845268 > My idea: we add new Column creation methods for PySpark, which take Python call site information. I'm not 100% sure it will work, but it sounds worth trying. Let me

Re: [PR] [SPARK-47050][SQL] Collect and publish partition level metrics [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #45123: URL: https://github.com/apache/spark/pull/45123#issuecomment-2041852145 Can we split this PR into two? IIUC the DS v1 change can benefit file source tables immediately if `spark.sql.statistics.size.autoUpdate.enabled` is enabled. For the DS v2 part, do we

Re: [PR] [SPARK-47754][SQL] Postgres: Support reading multidimensional arrays [spark]

2024-04-07 Thread via GitHub
yaooqinn commented on PR #45917: URL: https://github.com/apache/spark/pull/45917#issuecomment-2041850988 cc @cloud-fan @dongjoon-hyun thanks

Re: [PR] [SPARK-47750][DOCS][SQL] Postgres: Document Mapping Spark SQL Data Types to PostgreSQL [spark]

2024-04-07 Thread via GitHub
yaooqinn commented on PR #45912: URL: https://github.com/apache/spark/pull/45912#issuecomment-2041852022 Merged to master. Thank you @cloud-fan

Re: [PR] [SPARK-47750][DOCS][SQL] Postgres: Document Mapping Spark SQL Data Types to PostgreSQL [spark]

2024-04-07 Thread via GitHub
yaooqinn closed pull request #45912: [SPARK-47750][DOCS][SQL] Postgres: Document Mapping Spark SQL Data Types to PostgreSQL URL: https://github.com/apache/spark/pull/45912

Re: [PR] [SPARK-47746] Implement ordinal-based range encoding in the RocksDBStateEncoder [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on code in PR #45905: URL: https://github.com/apache/spark/pull/45905#discussion_r1555208836 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -229,16 +229,19 @@ class PrefixKeyScanStateEncoder( *

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-07 Thread via GitHub
pan3793 commented on PR #45923: URL: https://github.com/apache/spark/pull/45923#issuecomment-2041889112 LGTM. Actually, there is a lot of Java code copied from Hive in this module; I'm not sure whether we want to do some refactoring to adapt it to the structured logging framework.

Re: [PR] [SPARK-47746] Implement ordinal-based range encoding in the RocksDBStateEncoder [spark]

2024-04-07 Thread via GitHub
neilramaswamy commented on code in PR #45905: URL: https://github.com/apache/spark/pull/45905#discussion_r1555241781 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala: ## @@ -447,6 +492,97 @@ class RocksDBStateStoreSuite

Re: [PR] [SPARK-47682][SQL] Support cast from variant. [spark]

2024-04-07 Thread via GitHub
chenhao-db commented on code in PR #45807: URL: https://github.com/apache/spark/pull/45807#discussion_r1555249662 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1198,6 +1208,20 @@ case class Cast( case _ if from == NullType =>

Re: [PR] [SPARK-47734][PYTHON][TESTS][3.4] Fix flaky DataFrame.writeStream doctest by stopping streaming query [spark]

2024-04-07 Thread via GitHub
HeartSaVioR commented on PR #45908: URL: https://github.com/apache/spark/pull/45908#issuecomment-2041623314 Thanks! Merging to 3.4.

[PR] [MINOR][TESTS] Deduplicate test cases `test_parse_datatype_string` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45919: URL: https://github.com/apache/spark/pull/45919 ### What changes were proposed in this pull request? This PR proposes to skip `test_parse_datatype_string`. ### Why are the changes needed? It does not test anything related to

Re: [PR] [SPARK-47734][PYTHON][TESTS][3.4] Fix flaky DataFrame.writeStream doctest by stopping streaming query [spark]

2024-04-07 Thread via GitHub
HyukjinKwon closed pull request #45908: [SPARK-47734][PYTHON][TESTS][3.4] Fix flaky DataFrame.writeStream doctest by stopping streaming query URL: https://github.com/apache/spark/pull/45908

Re: [PR] [MINOR][TESTS] Deduplicate test cases `test_parse_datatype_string` [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45919: URL: https://github.com/apache/spark/pull/45919#issuecomment-2041652349 cc @zhengruifeng

Re: [PR] [SPARK-47755][CONNECT] Pivot should fail when the number of distinct values is too large [spark]

2024-04-07 Thread via GitHub
HyukjinKwon commented on PR #45918: URL: https://github.com/apache/spark/pull/45918#issuecomment-2041652839 Merged to master.

Re: [PR] [SPARK-47737][PYTHON] Bump PyArrow to 10.0.1 [spark]

2024-04-07 Thread via GitHub
itholic commented on code in PR #45892: URL: https://github.com/apache/spark/pull/45892#discussion_r1555177315 ## python/docs/source/getting_started/install.rst: ## @@ -157,7 +157,7 @@ Package Supported version Note ==

Re: [PR] [SPARK-47558][SS] State TTL support for ValueState [spark]

2024-04-07 Thread via GitHub
HeartSaVioR closed pull request #45674: [SPARK-47558][SS] State TTL support for ValueState URL: https://github.com/apache/spark/pull/45674

Re: [PR] [SPARK-47413][SQL] [WIP - DONT REVIEW] - add support to substr/left/right for collations [spark]

2024-04-07 Thread via GitHub
GideonPotok commented on code in PR #45738: URL: https://github.com/apache/spark/pull/45738#discussion_r1555192610 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -86,6 +89,43 @@ class CollationStringExpressionsSuite extends
