Re: [PR] [SHUFFLE] [WIP] Prototype: store shuffle file on external storage like S3 [spark]

2024-04-17 Thread via GitHub
pspoerri commented on PR #34864: URL: https://github.com/apache/spark/pull/34864#issuecomment-2060527524 @steveloughran How do I call the Hue APIs from Spark? Can you point me to a package? I agree with you that using the Hadoop APIs is not ideal performance-wise, but they are great

[PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db opened a new pull request, #46097: URL: https://github.com/apache/spark/pull/46097 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How
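The template body above is truncated, but the fix concerns making `startsWith` and `endsWith` behave correctly under ICU collations. As a rough, hypothetical sketch (function names are illustrative; real ICU collation handles locale rules and accents far beyond simple case folding), a case-insensitive collation's prefix/suffix check can be approximated with `str.casefold`:

```python
def ci_startswith(s: str, prefix: str) -> bool:
    # Approximate a case-insensitive (UTF8_LCASE-style) collation by
    # comparing casefolded strings; real ICU collation-aware matching is
    # locale-sensitive and cannot, in general, be reduced to a simple
    # casefold-and-compare like this.
    return s.casefold().startswith(prefix.casefold())

def ci_endswith(s: str, suffix: str) -> bool:
    return s.casefold().endswith(suffix.casefold())
```

Note that casefolding can change string length (e.g. "ß" folds to "ss"), which is exactly the kind of edge case a collation-aware `startsWith`/`endsWith` implementation has to get right.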

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568360558 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -63,83 +68,122 @@ object LogKey extends Enumeration { val CSV_SCHEMA_FIELD_NAME =

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46071: [SPARK-47867][SQL] Support variant in JSON scan. URL: https://github.com/apache/spark/pull/46071 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568379883 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -157,18 +164,6 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46071: URL: https://github.com/apache/spark/pull/46071#issuecomment-2060603814 thanks, merging to master!

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568379265 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -101,6 +101,9 @@ public void testContains() throws SparkException {

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568393760 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils {

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46017: URL: https://github.com/apache/spark/pull/46017#issuecomment-2060634388 thanks, merging to master!

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-17 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568413626 ## python/pyspark/sql/column.py: ## @@ -175,46 +175,13 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]], "Column"

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568413994 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils {

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568487322 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568353748 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils {

[PR] [WIP][SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-17 Thread via GitHub
xi-db opened a new pull request, #46098: URL: https://github.com/apache/spark/pull/46098 ### What changes were proposed in this pull request? In [the previous PR](https://github.com/apache/spark/pull/46012), we cache plans in AnalyzePlan requests. We're also enabling it for
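The linked description is cut off, but the idea is to cache analyzed plans so that repeated Analyze requests over the same plan skip re-analysis. A minimal, hypothetical LRU-cache sketch (the names `PlanCache` and `get_or_analyze` are illustrative, not the PR's actual API):

```python
from collections import OrderedDict

class PlanCache:
    """Tiny LRU cache sketch: map a plan's key to its analyzed result,
    evicting the least recently used entry once past `capacity`."""

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get_or_analyze(self, plan_key, analyze):
        if plan_key in self._cache:
            self._cache.move_to_end(plan_key)   # mark as recently used
            return self._cache[plan_key]
        result = analyze(plan_key)              # cache miss: do the work
        self._cache[plan_key] = result
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)     # evict the LRU entry
        return result
```

The point of the design is that a repeated Analyze request becomes a dictionary lookup instead of a full re-analysis pass.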

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568485463 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568481821 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568270967 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala: ## @@ -179,7 +180,7 @@ class

Re: [PR] [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging [spark]

2024-04-17 Thread via GitHub
HyukjinKwon closed pull request #46094: [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging URL: https://github.com/apache/spark/pull/46094

[PR] [SPARK-43025][SQL] Eliminate Union if filters have the same child plan [spark]

2024-04-17 Thread via GitHub
beliefer opened a new pull request, #40661: URL: https://github.com/apache/spark/pull/40661 ### What changes were proposed in this pull request? There are a lot of SQL with union multiple subquery with filter in user scenarios. Take an example, **q1** ``` SELECT ss_item_sk,
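The quoted SQL example is truncated, but the optimization is: when the branches of a union filter the same child plan, the separate scans can be merged into a single scan with the predicates combined. A hypothetical Python sketch of the before/after plan shapes (function names are illustrative, not Spark's rule):

```python
def union_of_filters(rows, predicates):
    """Naive plan: scan `rows` once per predicate and concatenate,
    like UNION ALL over several filtered subqueries of the same table."""
    out = []
    for p in predicates:
        out.extend(r for r in rows if p(r))
    return out

def merged_filter(rows, predicates):
    """Rewritten plan: one scan with the predicates OR'ed together.
    Caveat: equivalent to the UNION ALL form only when the predicates
    are mutually exclusive; overlapping predicates would produce
    duplicates in the naive plan that the merged scan does not."""
    return [r for r in rows if any(p(r) for p in predicates)]
```

The win is reading the shared child once instead of once per branch.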

[PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
zhengruifeng opened a new pull request, #46095: URL: https://github.com/apache/spark/pull/46095 ### What changes were proposed in this pull request? Make CollectTailExec execute lazily ### Why are the changes needed? In Spark Connect, `dataframe.tail` is based on
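The description is truncated, but the change defers `CollectTailExec`'s work so that `dataframe.tail` does not trigger computation eagerly. Independently of Spark's internals, the streaming-tail idea itself can be sketched in plain Python with a bounded deque (illustrative only; the PR is about when `doExecute` runs, not this algorithm):

```python
from collections import deque
from typing import Iterable, Iterator, List

def tail(it: Iterable, n: int) -> List:
    # A deque with maxlen keeps only the last n items while streaming,
    # so the whole input never has to be materialized at once.
    return list(deque(it, maxlen=n))

def numbers() -> Iterator[int]:
    # A large lazy input; nothing is buffered beyond the deque window.
    yield from range(1_000_000)
```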

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on PR #46096: URL: https://github.com/apache/spark/pull/46096#issuecomment-2060538992 cc @HyukjinKwon @ueshin @zhengruifeng @xinrong-meng @allisonwang-db

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568352067 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala: ## @@ -274,7 +275,8 @@ class

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568305287 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -100,6 +100,90 @@ abstract class CollationBenchmarkBase

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2060677068 The vote passed. - https://lists.apache.org/thread/4cbkpvc3vr3b6k0wp6lgsw37spdpnqrc Merged to master for Apache Spark 4.0.0.
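For context on what flipping this default means in practice: under ANSI mode, invalid operations such as division by zero raise errors instead of silently returning NULL. A small behavioral sketch (modeling SQL NULL as `None`; the function name and error text are illustrative, not Spark's API):

```python
def divide(a, b, ansi_enabled: bool):
    """Sketch of the ANSI-mode behavior change for division by zero:
    with ANSI mode on, the query fails; with it off, the result is
    NULL (modeled here as None)."""
    if b == 0:
        if ansi_enabled:
            raise ArithmeticError("DIVIDE_BY_ZERO")
        return None
    return a / b
```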

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568444549 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46094: URL: https://github.com/apache/spark/pull/46094#issuecomment-2060471004 Merged to master.

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on PR #46096: URL: https://github.com/apache/spark/pull/46096#issuecomment-2060645900 Added more dependencies and updated the screen capture in the PR description accordingly.

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568399026 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,92 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568400477 ## python/pyspark/sql/column.py: ## @@ -175,46 +175,13 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]],

Re: [PR] [SPARK-44444][SQL] Use ANSI SQL mode by default [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun closed pull request #46013: [SPARK-44444][SQL] Use ANSI SQL mode by default URL: https://github.com/apache/spark/pull/46013

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568479784 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568479342 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568326394 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -354,9 +355,13 @@ private[sql] class

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46071: URL: https://github.com/apache/spark/pull/46071#discussion_r1568380607 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -766,6 +769,17 @@ class DataFrameReader private[sql](sparkSession: SparkSession)
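The diff context above is truncated, but the feature lets a JSON scan keep each record whole in a single variant column instead of projecting records onto a fixed schema. A rough Python analogy (the function name is illustrative; Spark's variant is a binary format, modeled here as parsed Python objects):

```python
import json

def scan_json_as_variant(lines):
    """Sketch of 'variant in JSON scan': rather than parsing each record
    into fixed columns, keep the whole parsed value in one
    semi-structured column, so heterogeneous records coexist."""
    return [{"v": json.loads(line)} for line in lines]
```

The payoff is that objects, arrays, and scalars can all land in the same column, which a fixed schema cannot express.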

Re: [PR] [SPARK-46935][DOCS] Consolidate error documentation [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #44971: URL: https://github.com/apache/spark/pull/44971#discussion_r1568388171 ## docs/util/build-error-docs.py: ## @@ -0,0 +1,151 @@ +""" +Generate a unified page of documentation for all error conditions. +""" +import json +import os +import

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46011: [SPARK-47821][SQL] Implement is_variant_null expression URL: https://github.com/apache/spark/pull/46011

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46017: [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type URL: https://github.com/apache/spark/pull/46017

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568485937 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

[PR] [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values [spark]

2024-04-17 Thread via GitHub
yaooqinn opened a new pull request, #46102: URL: https://github.com/apache/spark/pull/46102 ### What changes were proposed in this pull request? This PR added tests and doc for Postgres special numeric values. Postgres supports special numeric values "NaN",
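Python's `float` happens to accept the same special spellings Postgres uses for its floating-point columns, which makes the mapping easy to illustrate (the parser function is hypothetical, not from the PR):

```python
import math

def parse_pg_numeric(text: str) -> float:
    # Postgres accepts the case-insensitive literals 'NaN', 'Infinity'
    # (or 'inf'), and '-Infinity' for float4/float8 columns; Python's
    # float() parses the same spellings.
    return float(text)
```

One classic pitfall: in IEEE semantics NaN compares unequal to itself (Postgres, by contrast, treats NaN as equal to NaN for ordering purposes), so round-tripping these values needs `isnan` checks rather than equality, which is exactly why dedicated tests help.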

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568311255 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils {

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on PR #45946: URL: https://github.com/apache/spark/pull/45946#issuecomment-2060579504 Yeah, forgot to block it. Will create a follow-up to add that.

[PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun opened a new pull request, #46099: URL: https://github.com/apache/spark/pull/46099 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568486468 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568406504 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -157,18 +164,6 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568413581 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -99,7 +99,10 @@ public static boolean execLowercase(final

Re: [PR] [SPARK-44444][SQL] Use ANSI SQL mode by default [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2060686852 Since ANSI is on by default, shall we remove the daily ANSI test job?

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568438791 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -99,7 +99,10 @@ public static boolean execLowercase(final UTF8String l,

[PR] [SPARK-47885][PYTHON][CONNECT] Make pyspark.resource compatible with pyspark-connect [spark]

2024-04-17 Thread via GitHub
HyukjinKwon opened a new pull request, #46100: URL: https://github.com/apache/spark/pull/46100 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.resource` compatible with `pyspark-connect`. ### Why are the changes needed? In order

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568486652 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on PR #46076: URL: https://github.com/apache/spark/pull/46076#issuecomment-2060508366 if this PR is no longer related to https://issues.apache.org/jira/browse/SPARK-47416, please delete the tag in the PR title

[PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic opened a new pull request, #46096: URL: https://github.com/apache/spark/pull/46096 ### What changes were proposed in this pull request? This PR proposes to enhance "Installation" page to cover all installable options for PySpark pip installation. ### Why are the

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568344903 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,92 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
panbingkun commented on PR #46057: URL: https://github.com/apache/spark/pull/46057#issuecomment-2060575614 cc @gengliangwang

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568388242 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils {

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46011: URL: https://github.com/apache/spark/pull/46011#issuecomment-2060623468 thanks, merging to master!

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46099: URL: https://github.com/apache/spark/pull/46099#issuecomment-2060694544 Could you review this, @HyukjinKwon?

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46096: URL: https://github.com/apache/spark/pull/46096#issuecomment-2060705481 cc @zhengruifeng and @WeichenXu123 would you mind reviewing this please?

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568484563 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568485173 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568483558 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

[PR] [SPARK-47883][SQL] Make CollectTailExec.doExecute lazy with RowQueue [spark]

2024-04-17 Thread via GitHub
zhengruifeng opened a new pull request, #46101: URL: https://github.com/apache/spark/pull/46101 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46097: URL: https://github.com/apache/spark/pull/46097#discussion_r1568525373 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -101,6 +101,9 @@ public void testContains() throws SparkException {

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568616448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -47,6 +47,14 @@ object DataTypeUtils {

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568564309 ## python/pyspark/errors/utils.py: ## @@ -119,3 +127,74 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568638560 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -130,6 +215,9 @@ object CollationBenchmark extends

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568638593 ## sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure AMD EPYC

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false
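The diff itself is truncated, but the PR's approach, persisting a plain string type where the catalog cannot represent collations, can be sketched with a hypothetical type-name mapping (the "string collate X" spelling below is illustrative, not Spark's serialized schema form):

```python
def strip_collation(data_type: str) -> str:
    """Hypothetical sketch: map a collated string type down to the plain
    'string' type before handing the schema to a catalog (such as the
    Hive metastore) that does not understand collations."""
    if data_type.lower().startswith("string collate "):
        return "string"
    return data_type
```

The design trade-off is that the metastore stays compatible, while the collation itself has to be tracked elsewhere (e.g. in table or column metadata) rather than in the stored type name.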

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568665548 ## core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala: ## @@ -125,23 +128,26 @@ class SparkThrowableSuite extends SparkFunSuite { s"Error classes

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568664744 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -564,6 +564,31 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568613066 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2320,9 +2328,9 @@ case class Levenshtein( case class

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46099: URL: https://github.com/apache/spark/pull/46099#issuecomment-2060916645 Merged to master.

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568599038 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2320,9 +2328,9 @@ case class Levenshtein( case class

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568660077 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1698,10 +1703,10 @@ case class FormatString(children:

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568627540 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -36,18 +36,19 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568713831 ## common/utils/src/main/resources/error/README.md: ## @@ -1,77 +1,132 @@ -# Guidelines +# Guidelines for Throwing User-Facing Errors -To throw a standardized

Re: [PR] [SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46104: URL: https://github.com/apache/spark/pull/46104#discussion_r1568653835 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -137,6 +140,105 @@ public static boolean execICU(final UTF8String

Re: [PR] [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable` [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on PR #46052: URL: https://github.com/apache/spark/pull/46052#issuecomment-2061108750 friendly ping @dongjoon-hyun, I would like to confirm if you have any other suggestions for this pr, thanks ~

Re: [PR] [SPARK-47885][PYTHON][CONNECT] Make pyspark.resource compatible with pyspark-connect [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46100: URL: https://github.com/apache/spark/pull/46100#issuecomment-2061126018 Merged to master.

[PR] [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto` [spark]

2024-04-17 Thread via GitHub
LuciferYang opened a new pull request, #46106: URL: https://github.com/apache/spark/pull/46106 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568431472 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression,

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568617954 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568584880 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression,

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568635378 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568645699 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false }

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568629092 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -36,18 +36,19 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568622375 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568644785 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568642600 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46097: URL: https://github.com/apache/spark/pull/46097#issuecomment-2061049646 thanks, merging to master!

Re: [PR] [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46097: [SPARK-47863][SQL] Fix startsWith & endsWith collation-aware implementation for ICU URL: https://github.com/apache/spark/pull/46097
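The merged fix concerns collation-aware `startsWith`/`endsWith`, which in Spark's `CollationSupport` is implemented over ICU's collation machinery (e.g. `StringSearch`) because code-point prefixes do not line up under full collation rules. As a much-simplified sketch of the idea for a case-insensitive collation only (not the ICU implementation):

```python
def starts_with_lcase(s: str, prefix: str) -> bool:
    # Simplified collation-aware prefix check: compare under lowercase
    # folding, roughly UTF8_LCASE-style semantics. Full ICU collations
    # cannot be reduced to a folding like this in general.
    return s.lower().startswith(prefix.lower())
```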

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568662387 ## common/utils/src/main/resources/error/README.md: ## @@ -1,77 +1,132 @@ -# Guidelines +# Guidelines for Throwing User-Facing Errors -To throw a standardized

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-17 Thread via GitHub
GideonPotok commented on PR #46040: URL: https://github.com/apache/spark/pull/46040#issuecomment-2061094985 @uros-db please re-review this one too.

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568727039 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1573,8 +1573,10 @@ case class StringLPad(str: Expression,
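For context on the `StringLPad` expression being changed here, a minimal sketch of LPAD-style semantics at the code-point level (illustrative only; Spark's actual behavior, including edge cases like an empty pad string, is defined by its own implementation):

```python
def lpad(s: str, length: int, pad: str) -> str:
    # Left-pad `s` with repetitions of `pad` up to `length` characters;
    # strings already at or beyond `length` are truncated.
    if len(s) >= length:
        return s[:length]
    need = length - len(s)
    fill = (pad * need)[:need] if pad else ""
    return fill + s
```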

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47883][SQL] Make CollectTailExec.doExecute lazy with RowQueue [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46101: URL: https://github.com/apache/spark/pull/46101#discussion_r1568725499 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -118,18 +118,52 @@ case class CollectLimitExec(limit: Int = -1, child: SparkPlan,
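The point of making `CollectTailExec.doExecute` lazy with a row queue is to avoid materializing an entire partition just to keep its last `limit` rows. A language-agnostic sketch of the underlying idea using a bounded buffer (a simplified analogy, not Spark's `RowQueue`):

```python
from collections import deque

def collect_tail(rows, limit: int):
    # Stream through the input once, keeping only the last `limit`
    # rows; the bounded deque evicts older rows automatically, so
    # memory stays O(limit) instead of O(partition size).
    buf = deque(maxlen=limit)
    for row in rows:
        buf.append(row)
    return list(buf)
```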
