[PR] [FOLLOWUP][SPARK-47765] Disable SET COLLATION when collations are disabled [spark]

2024-04-17 Thread via GitHub
mihailom-db opened a new pull request, #46103: URL: https://github.com/apache/spark/pull/46103 ### What changes were proposed in this pull request? Disable SET COLLATION when collations are disabled. ### Why are the changes needed? We do not want users to use syntax that is

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568631300 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568630997 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Core

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568631652 ## sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt: ## @@ -2,26 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568639177 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
HyukjinKwon closed pull request #46099: [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job URL: https://github.com/apache/spark/pull/46099 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568666443 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/EncoderUtils.scala: ## @@ -77,6 +77,7 @@ object EncoderUtils { case _: DecimalType =>

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568665456 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -564,6 +564,31 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568667247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression,

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568699474 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1573,8 +1573,10 @@ case class StringLPad(str: Expression, len:

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568735348 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -307,6 +307,74 @@ class CollationStringExpressionsSuite }) } +

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
vladimirg-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568733978 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -130,6 +215,9 @@ object CollationBenchmark extends

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568488642 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568490894 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568600521 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568490051 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568607999 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -47,6 +47,14 @@ object DataTypeUtils {

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568607199 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false }

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568662387 ## common/utils/src/main/resources/error/README.md: ## @@ -1,77 +1,132 @@ -# Guidelines +# Guidelines for Throwing User-Facing Errors -To throw a standardized

Re: [PR] [SPARK-46810][DOCS][FOLLOWUP] Make some reference file links clickable [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #46105: URL: https://github.com/apache/spark/pull/46105#discussion_r1568731008 ## common/utils/src/main/resources/error/README.md: ## @@ -41,7 +41,7 @@ Unfortunately, we have historically used the term "error class" inconsistently t Fixing

Re: [PR] [SPARK-46810][DOCS][FOLLOWUP] Make some reference file links clickable [spark]

2024-04-17 Thread via GitHub
panbingkun commented on PR #46105: URL: https://github.com/apache/spark/pull/46105#issuecomment-2061100926 Before: https://github.com/apache/spark/assets/15246973/7e621daa-4771-4429-a8be-e7bc56504f71 After:

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568662503 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2320,9 +2328,9 @@ case class Levenshtein( case class

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568735881 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -307,6 +307,74 @@ class CollationStringExpressionsSuite }) } +

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568736703 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -307,6 +307,74 @@ class CollationStringExpressionsSuite }) } +

Re: [PR] [SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-17 Thread via GitHub
zhengruifeng commented on code in PR #46098: URL: https://github.com/apache/spark/pull/46098#discussion_r1568753536 ## python/pyspark/sql/tests/connect/test_parity_udf_profiler.py: ## @@ -35,6 +49,7 @@ def action(df): with

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
nikolamand-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568732221 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/EncoderUtils.scala: ## @@ -77,6 +77,7 @@ object EncoderUtils { case _: DecimalType

Re: [PR] [SPARK-46810][DOCS][FOLLOWUP] Make some reference file links clickable [spark]

2024-04-17 Thread via GitHub
panbingkun commented on PR #46105: URL: https://github.com/apache/spark/pull/46105#issuecomment-2061102459 cc @nchammas @cloud-fan @MaxGekk

Re: [PR] [SPARK-47883][SQL] Make CollectTailExec.doExecute lazy with RowQueue [spark]

2024-04-17 Thread via GitHub
LuciferYang commented on code in PR #46101: URL: https://github.com/apache/spark/pull/46101#discussion_r1568624495 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -118,18 +118,52 @@ case class CollectLimitExec(limit: Int = -1, child: SparkPlan,

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568652210 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -804,21 +804,26 @@ case class Overlay(input: Expression,

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568652156 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) =>

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
GideonPotok commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568687405 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1573,8 +1573,10 @@ case class StringLPad(str: Expression,

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [WIP][SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-17 Thread via GitHub
vicennial commented on PR #46098: URL: https://github.com/apache/spark/pull/46098#issuecomment-2061077133 cc @ueshin @zhengruifeng

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1568746724 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -212,6 +212,49 @@ class CollationStringExpressionsSuite }) } +

Re: [PR] [SPARK-47763][CONNECT][TESTS] Enable local-cluster tests with pyspark-connect package [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on PR #46090: URL: https://github.com/apache/spark/pull/46090#issuecomment-2061124665 Merged to master.

[PR] [SPARK-46810][DOCS][FOLLOWUP] Make file links clickable [spark]

2024-04-17 Thread via GitHub
panbingkun opened a new pull request, #46105: URL: https://github.com/apache/spark/pull/46105 ### What changes were proposed in this pull request? This PR follows up https://github.com/apache/spark/pull/44902 to make some `reference files` clickable. ### Why are the changes

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1568490593 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-17 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568617954 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568627121 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -85,19 +86,103 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
panbingkun commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568665548 ## core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala: ## @@ -125,23 +128,26 @@ class SparkThrowableSuite extends SparkFunSuite { s"Error classes

Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46041: URL: https://github.com/apache/spark/pull/46041#discussion_r1568701957 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1573,8 +1573,10 @@ case class StringLPad(str: Expression, len:

Re: [PR] [SPARK-47885][PYTHON][CONNECT] Make pyspark.resource compatible with pyspark-connect [spark]

2024-04-17 Thread via GitHub
HyukjinKwon closed pull request #46100: [SPARK-47885][PYTHON][CONNECT] Make pyspark.resource compatible with pyspark-connect URL: https://github.com/apache/spark/pull/46100

Re: [PR] [SPARK-47763][CONNECT][TESTS] Enable local-cluster tests with pyspark-connect package [spark]

2024-04-17 Thread via GitHub
HyukjinKwon closed pull request #46090: [SPARK-47763][CONNECT][TESTS] Enable local-cluster tests with pyspark-connect package URL: https://github.com/apache/spark/pull/46090

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
dbatomic commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568832269 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -48,6 +48,10 @@ object CollationTypeCasts extends

Re: [PR] [SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode [spark]

2024-04-17 Thread via GitHub
bozhang2820 commented on code in PR #45930: URL: https://github.com/apache/spark/pull/45930#discussion_r1568946086 ## core/src/main/scala/org/apache/spark/shuffle/MigratableResolver.scala: ## @@ -35,6 +35,11 @@ trait MigratableResolver { */ def getStoredShuffles():

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-17 Thread via GitHub
nchammas commented on code in PR #44902: URL: https://github.com/apache/spark/pull/44902#discussion_r1568947251 ## common/utils/src/main/resources/error/README.md: ## @@ -1,77 +1,132 @@ -# Guidelines +# Guidelines for Throwing User-Facing Errors -To throw a standardized

Re: [PR] [SPARK-46810][DOCS][FOLLOWUP] Make some reference file links clickable [spark]

2024-04-17 Thread via GitHub
nchammas commented on code in PR #46105: URL: https://github.com/apache/spark/pull/46105#discussion_r1568948639 ## common/utils/src/main/resources/error/README.md: ## @@ -41,7 +41,7 @@ Unfortunately, we have historically used the term "error class" inconsistently t Fixing

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on code in PR #46003: URL: https://github.com/apache/spark/pull/46003#discussion_r1568809481 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -48,6 +48,10 @@ object CollationTypeCasts extends

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-17 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1568822574 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -212,6 +212,49 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto` [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on code in PR #46106: URL: https://github.com/apache/spark/pull/46106#discussion_r1568965955 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -23,7 +23,6 @@ import "google/protobuf/any.proto"; import

Re: [PR] [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on code in PR #46093: URL: https://github.com/apache/spark/pull/46093#discussion_r1568974565 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -875,21 +875,20 @@ object JdbcUtils extends Logging with

Re: [PR] [SPARK-44444][SQL] Use ANSI SQL mode by default [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2061488478 To @cloud-fan, the ANSI GitHub Action job is already migrated to the NON-ANSI GitHub Action CI via #46099.

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46099: URL: https://github.com/apache/spark/pull/46099#issuecomment-2061488942 cc @cloud-fan

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46078: [SPARK-47416][SQL] Add new functions to CollationBenchmark URL: https://github.com/apache/spark/pull/46078

Re: [PR] [SPARK-47765][SQL][FOLLOWUP] Disable SET COLLATION when collations are disabled [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46103: URL: https://github.com/apache/spark/pull/46103#issuecomment-2061373126 thanks, merging to master!

Re: [PR] [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun closed pull request #46102: [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values URL: https://github.com/apache/spark/pull/46102

Re: [PR] [SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness [spark]

2024-04-17 Thread via GitHub
mihailom-db commented on code in PR #46104: URL: https://github.com/apache/spark/pull/46104#discussion_r1568845708 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -261,6 +261,156 @@ public void testEndsWith() throws SparkException

Re: [PR] [SPARK-46935][DOCS] Consolidate error documentation [spark]

2024-04-17 Thread via GitHub
nchammas commented on code in PR #44971: URL: https://github.com/apache/spark/pull/44971#discussion_r1568942970 ## docs/util/build-error-docs.py: ## @@ -0,0 +1,151 @@ +""" +Generate a unified page of documentation for all error conditions. +""" +import json +import os +import

Re: [PR] [SPARK-47884][INFRA] Switch ANSI SQL CI job to NON-ANSI SQL CI job [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #46099: URL: https://github.com/apache/spark/pull/46099#issuecomment-2061391552 Thank you, @HyukjinKwon and @yaooqinn !

Re: [PR] [SPARK-47765][SQL][FOLLOWUP] Disable SET COLLATION when collations are disabled [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46103: [SPARK-47765][SQL][FOLLOWUP] Disable SET COLLATION when collations are disabled URL: https://github.com/apache/spark/pull/46103

Re: [PR] [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on code in PR #46093: URL: https://github.com/apache/spark/pull/46093#discussion_r1568970785 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -2182,4 +2183,13 @@ class JDBCSuite extends QueryTest with SharedSparkSession {

Re: [PR] [SPARK-46935][DOCS] Consolidate error documentation [spark]

2024-04-17 Thread via GitHub
nchammas commented on PR #44971: URL: https://github.com/apache/spark/pull/44971#issuecomment-2061538407 This is ready for review. The only two things I'll call out specifically for additional attention are this open TODO: > Figure out what, if anything, we will do about the links

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46078: URL: https://github.com/apache/spark/pull/46078#issuecomment-2061538706 thanks, merging to master!

Re: [PR] [SPARK-47726][DOC] Document push-based shuffle metrics [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun closed pull request #45872: [SPARK-47726][DOC] Document push-based shuffle metrics URL: https://github.com/apache/spark/pull/45872

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2061794084 Oh, I must be clear. Unlike the `spark.hadoop.fs.s3a.*` configuration, the following are applied to all filesystems globally. We cannot do that for non-S3A filesystems. That's the

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569214252 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569213581 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -137,6 +137,7 @@ class ListStateImplWithTTL[S]( /** Remove

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569182582 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569192544 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569195018 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TTLState.scala: ## @@ -163,6 +172,115 @@ abstract class SingleKeyTTLStateImpl( } }

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569194799 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TTLState.scala: ## @@ -59,23 +75,6 @@ trait TTLState { * @return number of values cleaned

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-17 Thread via GitHub
hvanhovell commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1569265956 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -198,6 +206,29 @@ private[sql] class SparkResult[T](

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
cloud-fan closed pull request #46003: [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck URL: https://github.com/apache/spark/pull/46003

Re: [PR] [SPARK-47360][SQL] Collation support: Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck [spark]

2024-04-17 Thread via GitHub
cloud-fan commented on PR #46003: URL: https://github.com/apache/spark/pull/46003#issuecomment-2061763385 thanks, merging to master!

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-17 Thread via GitHub
steveloughran commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2061780507 What code is doing the instanceof check? It's in Parquet, correct? Unless it's been told to save a summary, it shouldn't care...

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569182254 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -137,6 +137,7 @@ class ListStateImplWithTTL[S]( /**

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569223183 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala: ## @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-17 Thread via GitHub
uros-db commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1569220712 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -212,6 +212,119 @@ class CollationStringExpressionsSuite }) } +

Re: [PR] [SPARK-47726][DOC] Document push-based shuffle metrics [spark]

2024-04-17 Thread via GitHub
LucaCanali commented on PR #45872: URL: https://github.com/apache/spark/pull/45872#issuecomment-2061895510 Thank you @dongjoon-hyun

[PR] [SPARK-47889] Setup gradle as build tool for operator repository [spark-kubernetes-operator]

2024-04-17 Thread via GitHub
jiangzho opened a new pull request, #4: URL: https://github.com/apache/spark-kubernetes-operator/pull/4 This is a breakdown from #2: set up [gradle](https://gradle.org/) as the build tool for the operator

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569180634 ## sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala: ## @@ -108,6 +108,28 @@ private[sql] trait StatefulProcessorHandle extends

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569209877 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TTLState.scala: ## @@ -59,23 +75,6 @@ trait TTLState { * @return number of values cleaned up.

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569241785 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala: ## @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution [spark]

2024-04-17 Thread via GitHub
gengliangwang commented on PR #45748: URL: https://github.com/apache/spark/pull/45748#issuecomment-2061886298 Thanks, merging to master

Re: [PR] [SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
gengliangwang closed pull request #46057: [SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework URL: https://github.com/apache/spark/pull/46057

Re: [PR] [SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-17 Thread via GitHub
gengliangwang commented on PR #46057: URL: https://github.com/apache/spark/pull/46057#issuecomment-2062001672 @panbingkun Thanks for the good work! Merging to master

Re: [PR] [SPARK-47726][DOC] Document push-based shuffle metrics [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #45872: URL: https://github.com/apache/spark/pull/45872#issuecomment-2061724477 Sorry for making you wait. It seems that SPARK-42203 will not happen. Let's proceed with this first.

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569193194 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StateTypesEncoderUtils.scala: ## @@ -192,6 +195,28 @@ class CompositeKeyStateEncoder[GK, K,

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-17 Thread via GitHub
anishshri-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1569224035 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala: ## @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution [spark]

2024-04-17 Thread via GitHub
gengliangwang closed pull request #45748: [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution URL: https://github.com/apache/spark/pull/45748

Re: [PR] [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution [spark]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on PR #45748: URL: https://github.com/apache/spark/pull/45748#issuecomment-2061909354 cc @huaxingao, @RussellSpitzer

Re: [PR] [SPARK-47889] Setup gradle as build tool for operator repository [spark-kubernetes-operator]

2024-04-17 Thread via GitHub
dongjoon-hyun commented on code in PR #4: URL: https://github.com/apache/spark-kubernetes-operator/pull/4#discussion_r1569544943 ## gradle/wrapper/gradle-wrapper.properties: ## @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more

[PR] [SPARK-47867][FOLLOWUP] Fix variant parsing in JacksonParser. [spark]

2024-04-17 Thread via GitHub
chenhao-db opened a new pull request, #46107: URL: https://github.com/apache/spark/pull/46107 ### What changes were proposed in this pull request? This PR fixes an issue introduced in https://github.com/apache/spark/pull/46071. When parsing a JSON object as a map or struct, the

[PR] [SPARK-47891][PYTHON][DOCS] Improve docstring of mapInPandas [spark]

2024-04-17 Thread via GitHub
xinrong-meng opened a new pull request, #46108: URL: https://github.com/apache/spark/pull/46108 ### What changes were proposed in this pull request? Improve docstring of mapInPandas - "using a Python native function that takes and outputs a pandas DataFrame" is confusing because the

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
itholic commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1569710412 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_. Dependencies

Re: [PR] [SPARK-46408][SQL] Support date_sub on V2ExpressionBuilder [spark]

2024-04-17 Thread via GitHub
github-actions[bot] commented on PR #44357: URL: https://github.com/apache/spark/pull/44357#issuecomment-2062764367 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

2024-04-17 Thread via GitHub
HyukjinKwon commented on code in PR #46096: URL: https://github.com/apache/spark/pull/46096#discussion_r1569712898 ## python/docs/source/getting_started/install.rst: ## @@ -165,16 +168,109 @@ To install PySpark from source, refer to |building_spark|_. Dependencies
