Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559438477 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47775][SQL] Support remaining scalar types in the variant spec. [spark]

2024-04-10 Thread via GitHub
chenhao-db commented on code in PR #45945: URL: https://github.com/apache/spark/pull/45945#discussion_r1559545416 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -248,9 +253,10 @@ case object VariantGet {

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45816: URL: https://github.com/apache/spark/pull/45816#discussion_r1559623641 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -424,21 +424,16 @@ public UTF8String toUpperCase() { if (numBytes == 0) {

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-04-10 Thread via GitHub
mridulm commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1559712537 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1014,6 +1014,15 @@ package object config { .timeConf(TimeUnit.NANOSECONDS)

Re: [PR] make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
nkarpov commented on PR #18: URL: https://github.com/apache/spark-connect-go/pull/18#issuecomment-2048055764  nice I like it. I got tripped up bouncing between readme & quick start as a newbie. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559866658 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559866863 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software

Re: [PR] make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
grundprinzip commented on code in PR #18: URL: https://github.com/apache/spark-connect-go/pull/18#discussion_r1559883957 ## README.md: ## @@ -13,33 +12,42 @@ project reserves the right to withdraw and abandon the development of this project if it is not sustainable. ##

Re: [PR] make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
MrPowers commented on code in PR #18: URL: https://github.com/apache/spark-connect-go/pull/18#discussion_r1560029954 ## README.md: ## @@ -13,33 +12,42 @@ project reserves the right to withdraw and abandon the development of this project if it is not sustainable. ## Getting

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559872175 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithListStateTTLSuite.scala: ## @@ -0,0 +1,365 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559870514 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithListStateTTLSuite.scala: ## @@ -0,0 +1,365 @@ +/* + * Licensed to the Apache Software

[PR] [SPARK-47802] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
srielau opened a new pull request, #45987: URL: https://github.com/apache/spark/pull/45987 ### What changes were proposed in this pull request? We will revert the meaning of `(*)` back to its original Spark 3.x behavior. ### Why are the changes needed? There are

[PR] Spark collation 47413 3 [spark]

2024-04-10 Thread via GitHub
GideonPotok opened a new pull request, #45986: URL: https://github.com/apache/spark/pull/45986 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1559997742 ## docs/job-scheduling.md: ## @@ -53,7 +53,11 @@ Resource allocation can be configured as follows, based on the cluster type: on the cluster

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559855781 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559856458 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559857448 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImplWithTTL.scala: ## @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software

Re: [PR] [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test [spark]

2024-04-10 Thread via GitHub
xinrong-meng commented on PR #45980: URL: https://github.com/apache/spark/pull/45980#issuecomment-2048143854 LGTM thanks!

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559874230 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithListStateTTLSuite.scala: ## @@ -0,0 +1,365 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45932: URL: https://github.com/apache/spark/pull/45932#discussion_r1559873586 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/ValueStateSuite.scala: ## @@ -356,7 +357,8 @@ class ValueStateSuite extends

Re: [PR] [SPARK-47799][BUILD] Preserve parameter information when using SBT package jar [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45983: URL: https://github.com/apache/spark/pull/45983#discussion_r1559881657 ## project/SparkBuild.scala: ## @@ -311,6 +311,7 @@ object SparkBuild extends PomBuild { (Compile / javacOptions) ++= Seq( "-encoding",

Re: [PR] [SPARK-47799][BUILD] Preserve parameter information when using SBT package jar [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45983: URL: https://github.com/apache/spark/pull/45983#discussion_r1559891702 ## project/SparkBuild.scala: ## @@ -311,6 +311,7 @@ object SparkBuild extends PomBuild { (Compile / javacOptions) ++= Seq( "-encoding",

Re: [PR] [SPARK-47617][SQL] Add TPC-DS testing infrastructure for collations [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45739: URL: https://github.com/apache/spark/pull/45739#discussion_r1559852667 ## sql/core/src/test/scala/org/apache/spark/sql/TPCDSCollationQueryTestSuite.scala: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on PR #45910: URL: https://github.com/apache/spark/pull/45910#issuecomment-2048533652 Thanks, merging to master

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45978: URL: https://github.com/apache/spark/pull/45978#issuecomment-2048651969 Can you please fill the PR description?

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560293229 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47800][SQL] Create new method for identifier to tableIdentifier conversion [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45985: URL: https://github.com/apache/spark/pull/45985#discussion_r1560306930 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -118,12 +117,8 @@ class DataSourceV2Strategy(session:

Re: [PR] [SPARK-47733][SS] Add custom metrics for transformWithState operator part of query progress [spark]

2024-04-10 Thread via GitHub
sahnib commented on code in PR #45937: URL: https://github.com/apache/spark/pull/45937#discussion_r1560074588 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -112,6 +116,10 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560115778 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -100,6 +101,7 @@ object LogKey extends Enumeration { val OFFSETS = Value val

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45826: [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark URL: https://github.com/apache/spark/pull/45826

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2048636068 @gene-db seems like the JIRA ID is wrong. Do we have a dedicated JIRA for this?

Re: [PR] [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` [spark]

2024-04-10 Thread via GitHub
zhengruifeng commented on PR #45614: URL: https://github.com/apache/spark/pull/45614#issuecomment-2048647268 thanks @HyukjinKwon and @xinrong-meng

Re: [PR] [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45992: URL: https://github.com/apache/spark/pull/45992#issuecomment-2048663695 Merged to master.

Re: [PR] [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45992: [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository URL: https://github.com/apache/spark/pull/45992

[PR] [SPARK-47803][SQL] Support cast to variant. [spark]

2024-04-10 Thread via GitHub
chenhao-db opened a new pull request, #45989: URL: https://github.com/apache/spark/pull/45989 ### What changes were proposed in this pull request? This PR allows casting another type into the variant type. The changes can be divided into two major parts: - The `VariantBuilder`
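[Editor's illustration] The preview above mentions a `VariantBuilder` that encodes values into the variant binary format. As a rough intuition for what "casting to variant" involves, here is a toy tagged-binary encoder in Python. This is an illustrative sketch only: the type tags, byte layout, and function names below are invented for this example and do NOT reflect Spark's actual variant specification.

```python
import struct

# Hypothetical type tags for this toy encoding (not Spark's variant spec).
TYPE_NULL, TYPE_LONG, TYPE_DOUBLE, TYPE_STRING = 0, 1, 2, 3

def to_variant(value):
    """Encode a scalar as a 1-byte type tag followed by a payload."""
    if value is None:
        return bytes([TYPE_NULL])
    if isinstance(value, bool):
        # A real spec would give booleans their own tag; fold into long here.
        return bytes([TYPE_LONG]) + struct.pack("<q", int(value))
    if isinstance(value, int):
        return bytes([TYPE_LONG]) + struct.pack("<q", value)
    if isinstance(value, float):
        return bytes([TYPE_DOUBLE]) + struct.pack("<d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return bytes([TYPE_STRING]) + struct.pack("<i", len(raw)) + raw
    raise TypeError(f"unsupported type: {type(value).__name__}")

def from_variant(buf):
    """Decode the toy encoding back into a Python value."""
    tag = buf[0]
    if tag == TYPE_NULL:
        return None
    if tag == TYPE_LONG:
        return struct.unpack("<q", buf[1:9])[0]
    if tag == TYPE_DOUBLE:
        return struct.unpack("<d", buf[1:9])[0]
    if tag == TYPE_STRING:
        (n,) = struct.unpack("<i", buf[1:5])
        return buf[5:5 + n].decode("utf-8")
    raise ValueError(f"unknown tag: {tag}")
```

The point of a self-describing encoding like this is that a single binary column can hold values of mixed types, with each value carrying its own type tag.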

[PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-10 Thread via GitHub
anchovYu opened a new pull request, #45990: URL: https://github.com/apache/spark/pull/45990 ### What changes were proposed in this pull request? This PR adds a debug log for Dataframe cache that uses SQL conf to turn on. It logs necessary information on * cache hit during cache

[PR] [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository [spark]

2024-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #45992: URL: https://github.com/apache/spark/pull/45992 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/45870 that skips the run in forked repository. ### Why are the changes

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560187499 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -100,6 +101,7 @@ object LogKey extends Enumeration { val OFFSETS = Value val

Re: [PR] [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test [spark]

2024-04-10 Thread via GitHub
zhengruifeng commented on PR #45980: URL: https://github.com/apache/spark/pull/45980#issuecomment-2048642692 thanks @xinrong-meng merged to master

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
gene-db commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2048671561 > @gene-db seems like the JIRA ID is wrong. Do we have a dedicated JIRA for this? hrmmm, I don't think there was a dedicated jira for this. Looks like I will have to create a jira

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560294470 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2048784678 thanks, merging to master!

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45377: [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors URL: https://github.com/apache/spark/pull/45377

[PR] [SPARK-47174][CONNECT][SS][1/2] Server side SparkConnectListenerBusListener for Client side streaming query listener [spark]

2024-04-10 Thread via GitHub
WweiL opened a new pull request, #45988: URL: https://github.com/apache/spark/pull/45988 ### What changes were proposed in this pull request? Server side `SparkConnectListenerBusListener` implementation for the client side listener. There would only be one such listener for

Re: [PR] [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang closed pull request #45910: [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework URL: https://github.com/apache/spark/pull/45910

[PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-10 Thread via GitHub
ericm-db opened a new pull request, #45991: URL: https://github.com/apache/spark/pull/45991 ### What changes were proposed in this pull request? This PR adds support for expiring state based on TTL for MapState. Using this functionality, Spark users can specify a TTL Mode for
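[Editor's illustration] To make the idea of TTL-based state expiry concrete, here is a minimal Python sketch of a map state whose entries expire after a fixed time-to-live. This mirrors the concept only; the class and method names are invented for this example and Spark's actual `MapState`/TTL API differs.

```python
import time

class TTLMapState:
    """Toy map state with per-entry TTL expiry (illustrative only)."""

    def __init__(self, ttl_ms, clock=time.monotonic):
        self.ttl_s = ttl_ms / 1000.0
        self.clock = clock           # injectable clock makes expiry testable
        self._data = {}              # key -> (value, expiry timestamp)

    def put(self, key, value):
        # Stamp each entry with its expiration time at write.
        self._data[key] = (value, self.clock() + self.ttl_s)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self.clock() >= expiry:   # lazily drop expired entries on read
            del self._data[key]
            return None
        return value

    def evict_expired(self):
        # Eager cleanup pass, e.g. run at the end of a micro-batch.
        now = self.clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)
```

Note the two complementary mechanisms: reads lazily filter out expired entries, while a periodic eviction pass bounds how much dead state accumulates.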

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2048634802 Merged to master.

Re: [PR] [SPARK-47672][SQL] Avoid double eval from filter pushDown [spark]

2024-04-10 Thread via GitHub
holdenk commented on PR #45802: URL: https://github.com/apache/spark/pull/45802#issuecomment-2048641757 Another possible solution would be to also break up the projection and move the part of the projection which is used in the filter down with the filter unless the only thing the
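[Editor's illustration] The double-evaluation problem discussed here can be shown with a toy example outside Spark: when a filter on an expensive expression is pushed below the projection that also computes it, the expression runs once in the filter and again in the projection. Computing it once and filtering on the result avoids the duplicate work. The functions below are invented for this sketch.

```python
calls = {"n": 0}

def expensive(x):
    # Stand-in for a costly expression; counts how often it is evaluated.
    calls["n"] += 1
    return x * x

rows = [1, 2, 3, 4]

# Filter pushed below the projection: the expression is evaluated in the
# filter AND again in the projection for every surviving row.
calls["n"] = 0
pushed = [expensive(r) for r in rows if expensive(r) > 4]
double_evals = calls["n"]

# Evaluate once, then filter on the computed value: one call per row.
calls["n"] = 0
projected = [v for v in (expensive(r) for r in rows) if v > 4]
single_evals = calls["n"]

assert pushed == projected == [9, 16]
```

Here the pushed-down plan performs 6 evaluations (4 in the filter plus 2 in the projection) versus 4 for the compute-then-filter plan, which is the shape of the fix being discussed: keep the shared computation below the filter so it runs only once.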

Re: [PR] [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test [spark]

2024-04-10 Thread via GitHub
zhengruifeng closed pull request #45980: [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test URL: https://github.com/apache/spark/pull/45980

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560262338 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47733][SS] Add custom metrics for transformWithState operator part of query progress [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on PR #45937: URL: https://github.com/apache/spark/pull/45937#issuecomment-2048792306 > Should we also add some metrics around ttl? (Like keys deleted from state on ttl expiry?) Done, added

Re: [PR] [SPARK-47733][SS] Add custom metrics for transformWithState operator part of query progress [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45937: URL: https://github.com/apache/spark/pull/45937#discussion_r1560336821 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -112,6 +116,10 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] Update the decimal mapping remarks with JDBC data sources [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun closed pull request #45984: [SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] Update the decimal mapping remarks with JDBC data sources URL: https://github.com/apache/spark/pull/45984

Re: [PR] [SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] Update the decimal mapping remarks with JDBC data sources [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on PR #45984: URL: https://github.com/apache/spark/pull/45984#issuecomment-2048435975 Merged to master.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 commented on PR #45977: URL: https://github.com/apache/spark/pull/45977#issuecomment-2048549899 @allisonwang-db @HyukjinKwon @HeartSaVioR PTAL, thanks!

Re: [PR] [SPARK-47802] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
srielau commented on PR #45987: URL: https://github.com/apache/spark/pull/45987#issuecomment-2048618906 @cloud-fan @gengliangwang This is ready.

Re: [PR] [MINOR] Make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #18: URL: https://github.com/apache/spark-connect-go/pull/18#issuecomment-2048625272 Merged to master.

Re: [PR] [MINOR] Make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #18: [MINOR] Make readme easier to follow URL: https://github.com/apache/spark-connect-go/pull/18

Re: [PR] [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45614: [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` URL: https://github.com/apache/spark/pull/45614

Re: [PR] [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45614: URL: https://github.com/apache/spark/pull/45614#issuecomment-2048624804 Merged to master.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560215436 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560219006 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47802][SQL] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45987: URL: https://github.com/apache/spark/pull/45987#issuecomment-2048732729 thanks, merging to master!

Re: [PR] [SPARK-47802][SQL] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45987: [SPARK-47802][SQL] Revert (*) from meaning struct(*) back to meaning * URL: https://github.com/apache/spark/pull/45987

Re: [PR] [SPARK-47704][SQL] JSON parsing fails with "java.lang.ClassCastException" when spark.sql.json.enablePartialResults is enabled [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45833: [SPARK-47704][SQL] JSON parsing fails with "java.lang.ClassCastException" when spark.sql.json.enablePartialResults is enabled URL: https://github.com/apache/spark/pull/45833

Re: [PR] [SPARK-47704][SQL] JSON parsing fails with "java.lang.ClassCastException" when spark.sql.json.enablePartialResults is enabled [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45833: URL: https://github.com/apache/spark/pull/45833#issuecomment-2048792451 Merged to master and branch-3.5.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2048796376 Thanks @cloud-fan @ueshin @HyukjinKwon @xinrong-meng for the review!

Re: [PR] [WIP][SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45927: URL: https://github.com/apache/spark/pull/45927#discussion_r1560341095 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -229,8 +230,8 @@ private[hive] class HiveClientImpl( case e:

Re: [PR] [WIP][SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45927: URL: https://github.com/apache/spark/pull/45927#discussion_r1560342791 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala: ## @@ -104,21 +105,28 @@ private[hive] class HiveMetastoreCatalog(sparkSession:

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45946: URL: https://github.com/apache/spark/pull/45946#discussion_r1560342794 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1062,4 +1062,11 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on PR #45927: URL: https://github.com/apache/spark/pull/45927#issuecomment-2048804834 cc @gengliangwang

Re: [PR] [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45993: URL: https://github.com/apache/spark/pull/45993#issuecomment-2048825061 Merged to master, branch-3.5 and branch-3.4.

Re: [PR] [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45993: [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` URL: https://github.com/apache/spark/pull/45993

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375469 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375353 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375676 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -209,7 +209,12 @@ class V2ExpressionBuilder(e: Expression,

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-04-10 Thread via GitHub
TakawaAkirayo commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1560378130 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1014,6 +1014,15 @@ package object config { .timeConf(TimeUnit.NANOSECONDS)

[PR] [SPARK-47809][SQL][TEST] `checkExceptionInExpression` should check error for each codegen mode [spark]

2024-04-10 Thread via GitHub
cloud-fan opened a new pull request, #45997: URL: https://github.com/apache/spark/pull/45997 ### What changes were proposed in this pull request? There is a bug in the test util `checkExceptionInExpression`. It may fail to catch bugs when codegen and non-codegen have
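[Editor's illustration] The failure mode being fixed, a test util that can pass when only one evaluation mode raises the expected error, can be sketched outside Spark as follows. The helper below checks the error under every mode, so a mode that silently succeeds is flagged instead of being masked by the mode that fails correctly. All names here are invented for this sketch; Spark's `checkExceptionInExpression` is a Scala test utility.

```python
def check_exception_in_all_modes(expr, modes, expected_msg):
    """Assert that expr raises an error containing expected_msg in EVERY mode."""
    for mode in modes:
        try:
            expr(mode)
        except Exception as e:
            assert expected_msg in str(e), f"mode {mode!r}: wrong error: {e}"
        else:
            raise AssertionError(f"mode {mode!r}: no error was raised")

# A toy "expression" that fails only in one of its two modes: the kind of
# codegen/non-codegen divergence a per-mode check is meant to catch.
def buggy_divide(mode):
    if mode == "interpreted":
        raise ZeroDivisionError("division by zero")
    return 1  # the "codegen" path silently returns a value: a bug

try:
    check_exception_in_all_modes(
        buggy_divide, ["interpreted", "codegen"], "division by zero")
    caught_inconsistency = False
except AssertionError:
    caught_inconsistency = True
```

A check that stopped at the first mode to raise the expected error would have passed here; iterating over all modes surfaces the divergence.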

[PR] [SPARK-46812][CONNECT][FOLLOW-UP] Make `handleCreateResourceProfileCommand` private [spark]

2024-04-10 Thread via GitHub
zhengruifeng opened a new pull request, #45998: URL: https://github.com/apache/spark/pull/45998 ### What changes were proposed in this pull request? Make `handleCreateResourceProfileCommand` private ### Why are the changes needed? it should not be exposed to users

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560425964 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
uros-db commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048923852 @GideonPotok nice work, thanks! Heads up though: we will soon be finishing some code refactoring related to collation-aware string expression support

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560453598 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -172,19 +183,31 @@ public Collation( } /** - * Auxiliary

Re: [PR] [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang closed pull request #45927: [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework URL: https://github.com/apache/spark/pull/45927 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560460023 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -37,7 +37,10 @@ import org.apache.spark.util.SparkClassUtils * The values of the MDC

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
uros-db commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048962946 @GideonPotok You are correct, this refactor should not greatly affect your current PR in particular - I expect you'll only need to refactor testing a bit (shouldn't be too much work)

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1560363139 ## docs/job-scheduling.md: ## @@ -53,7 +53,11 @@ Resource allocation can be configured as follows, based on the cluster type: on the cluster

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2048841083 The `org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison` optimizer may also fold predicates. -- This is an automated message from the Apache Git Service. To respond to the
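The kind of folding mentioned in the comment above can be modeled in a few lines. This is a toy model, not Spark's `SimplifyBinaryComparison` rule: the expression types and the `simplify` function are invented for illustration. It shows why such a rule matters for predicate pushdown — a comparison of the same non-nullable attribute with itself collapses to a literal before it ever reaches the data source.

```java
public class FoldComparison {
    interface Expr {}
    record Attr(String name, boolean nullable) implements Expr {}
    record EqualTo(Expr left, Expr right) implements Expr {}
    record GreaterThan(Expr left, Expr right) implements Expr {}
    record BoolLit(boolean value) implements Expr {}

    static Expr simplify(Expr e) {
        // a = a  =>  true, and a > a  =>  false — but only when `a` cannot
        // be null, because in SQL semantics NULL = NULL yields NULL, not true.
        if (e instanceof EqualTo eq
                && eq.left() instanceof Attr a
                && !a.nullable()
                && eq.left().equals(eq.right())) {
            return new BoolLit(true);
        }
        if (e instanceof GreaterThan gt
                && gt.left() instanceof Attr a
                && !a.nullable()
                && gt.left().equals(gt.right())) {
            return new BoolLit(false);
        }
        return e;
    }
}
```

Note the nullability guard: it is the reason a real optimizer cannot fold these comparisons unconditionally.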

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375211 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [MINOR][DOCS] Make the link of spark properties with YARN more accurate [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun closed pull request #45994: [MINOR][DOCS] Make the link of spark properties with YARN more accurate URL: https://github.com/apache/spark/pull/45994 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47609][SQL] Making CacheLookup more optimal to minimize cache miss [spark]

2024-04-10 Thread via GitHub
anchovYu commented on PR #45935: URL: https://github.com/apache/spark/pull/45935#issuecomment-2048878434 Hi @ahshahid , thanks for the proposal and the PR. However, the current Dataframe cache design has a lot of design flaws; I would worry that improving the cache hit rate in this case

Re: [PR] [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values [spark]

2024-04-10 Thread via GitHub
yaooqinn commented on PR #45981: URL: https://github.com/apache/spark/pull/45981#issuecomment-2048890101 Merged to master. Thank you @dongjoon-hyun @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values [spark]

2024-04-10 Thread via GitHub
yaooqinn closed pull request #45981: [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values URL: https://github.com/apache/spark/pull/45981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560429054 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +108,10 @@ trait Logging { val context = new

Re: [PR] [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on PR #45947: URL: https://github.com/apache/spark/pull/45947#issuecomment-2048920978 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang closed pull request #45947: [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework URL: https://github.com/apache/spark/pull/45947 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560454235 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on PR #45927: URL: https://github.com/apache/spark/pull/45927#issuecomment-2048956649 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560458632 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -187,8 +187,9 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1560352992 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1515,62 @@ public boolean semanticEquals(final UTF8String other, int
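The `levenshteinDistance` change under review above concerns making the edit-distance computation collation-aware. As a hedged sketch of the idea for a lowercase collation such as UTF8_BINARY_LCASE, the standard dynamic-programming Levenshtein algorithm can simply compare characters case-insensitively. This operates on `java.lang.String`, not Spark's `UTF8String`, and ignores the multi-byte and Unicode-casing edge cases the real implementation must handle.

```java
public class LcaseLevenshtein {
    static int distance(String s, String t) {
        // Lowercase both inputs up front so every comparison below is
        // case-insensitive, which is the essence of the LCASE collation.
        String a = s.toLowerCase();
        String b = t.toLowerCase();
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;  // all deletions
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;  // all insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                dp[i][j] = Math.min(
                    Math.min(dp[i - 1][j] + 1,      // delete from a
                             dp[i][j - 1] + 1),     // insert into a
                    dp[i - 1][j - 1] + cost);       // substitute (or match)
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

Under this collation, `distance("Spark", "SPARK")` is 0 even though the strings differ byte-for-byte, which is exactly the behavioral difference a collation-aware Levenshtein introduces.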

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1560358232 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
