Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560470002 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +108,10 @@ trait Logging { val context = new

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560461520 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +108,10 @@ trait Logging { val context = new java.util.HashMap[String,

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560460548 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala: ## @@ -218,7 +232,9 @@ private[thriftserver] class

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560460023 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -37,7 +37,10 @@ import org.apache.spark.util.SparkClassUtils * The values of the MDC

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
uros-db commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048962946 @GideonPotok You are correct, this refactor should not greatly affect your current PR in particular - I expect you'll only need to refactor testing a bit (shouldn't be too much work)

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560458632 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -187,8 +187,9 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560455923 ## common/utils/src/test/scala/org/apache/spark/util/MDCSuite.scala: ## @@ -41,6 +41,21 @@ class MDCSuite assert(log.context === Map("exit_code" ->

Re: [PR] [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang closed pull request #45927: [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework URL: https://github.com/apache/spark/pull/45927 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560453598 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -172,19 +183,31 @@ public Collation( } /** - * Auxiliary

Re: [PR] [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on PR #45927: URL: https://github.com/apache/spark/pull/45927#issuecomment-2048956649 Thanks, merging to master

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560454235 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560452406 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560451327 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -172,19 +183,31 @@ public Collation( } /** - * Auxiliary

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048944931 PS: Do you think changes, such as these, which are only to implementations of `inputTypes` and `replacement`, which do not rely on calling UTFString or CollationFactory, will need to

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560444898 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala: ## @@ -218,7 +232,9 @@ private[thriftserver]

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048941620 @uros-db No problem at all. If I understand your refactor correctly, my changes will basically either stay in the same place or move to the new

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1560434354 ## docs/job-scheduling.md: ## @@ -92,6 +96,8 @@ In standalone mode, simply start your workers with `spark.shuffle.service.enable In YARN mode, follow the

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
uros-db commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048923852 @GideonPotok nice work, thanks! Heads up though: we will soon be finishing some code refactoring related to collation-aware string expression support

Re: [PR] [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang closed pull request #45947: [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework URL: https://github.com/apache/spark/pull/45947

Re: [PR] [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on PR #45947: URL: https://github.com/apache/spark/pull/45947#issuecomment-2048920978 Thanks, merging to master

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048919774 @uros-db this is ready for review.

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560429054 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +108,10 @@ trait Logging { val context = new

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560425964 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

[PR] [SPARK-46812][CONNECT][FOLLOW-UP] Make `handleCreateResourceProfileCommand` private [spark]

2024-04-10 Thread via GitHub
zhengruifeng opened a new pull request, #45998: URL: https://github.com/apache/spark/pull/45998 ### What changes were proposed in this pull request? Make `handleCreateResourceProfileCommand` private ### Why are the changes needed? it should not be exposed to users

Re: [PR] [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values [spark]

2024-04-10 Thread via GitHub
yaooqinn commented on PR #45981: URL: https://github.com/apache/spark/pull/45981#issuecomment-2048890101 Merged to master. Thank you @dongjoon-hyun @HyukjinKwon

Re: [PR] [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values [spark]

2024-04-10 Thread via GitHub
yaooqinn closed pull request #45981: [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values URL: https://github.com/apache/spark/pull/45981

Re: [PR] [SPARK-47609][SQL] Making CacheLookup more optimal to minimize cache miss [spark]

2024-04-10 Thread via GitHub
anchovYu commented on PR #45935: URL: https://github.com/apache/spark/pull/45935#issuecomment-2048878434 Hi @ahshahid , thanks for the proposal and the PR. However, the current Dataframe cache design has a lot of design flaws, I would worry that improving the cache hit rate in this case

Re: [PR] [SPARK-47809][SQL][TEST] `checkExceptionInExpression` should check error for each codegen mode [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45997: URL: https://github.com/apache/spark/pull/45997#discussion_r1560396328 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala: ## @@ -170,18 +170,15 @@ trait ExpressionEvalHelper extends

Re: [PR] [SPARK-47809][SQL][TEST] `checkExceptionInExpression` should check error for each codegen mode [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45997: URL: https://github.com/apache/spark/pull/45997#issuecomment-2048874386 cc @HyukjinKwon @dongjoon-hyun

[PR] [SPARK-47809][SQL][TEST] `checkExceptionInExpression` should check error for each codegen mode [spark]

2024-04-10 Thread via GitHub
cloud-fan opened a new pull request, #45997: URL: https://github.com/apache/spark/pull/45997 ### What changes were proposed in this pull request? There is a bug in the test util `checkExceptionInExpression`. It may fail to catch bugs when codegen and non-codegen have

Re: [PR] [MINOR][DOCS] Make the link of spark properties with YARN more accurate [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun closed pull request #45994: [MINOR][DOCS] Make the link of spark properties with YARN more accurate URL: https://github.com/apache/spark/pull/45994

Re: [PR] [SPARK-47808][PYTHON][ML][TESTS] Make pyspark.ml.connect tests running without optional dependencies [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45996: URL: https://github.com/apache/spark/pull/45996#issuecomment-2048865644 cc @zhengruifeng @WeichenXu123

[PR] [SPARK-47807][PYTHON][ML] Make pyspark.ml compatible with pyspark-connect [spark]

2024-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #45995: URL: https://github.com/apache/spark/pull/45995 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.ml` compatible with `pyspark-connect`. ### Why are the changes needed? In order for

Re: [PR] [SPARK-47807][PYTHON][ML] Make pyspark.ml compatible with pyspark-connect [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45995: URL: https://github.com/apache/spark/pull/45995#issuecomment-2048857160 cc @WeichenXu123

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-04-10 Thread via GitHub
TakawaAkirayo commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1560378130 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1014,6 +1014,15 @@ package object config { .timeConf(TimeUnit.NANOSECONDS)

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560377503 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -389,6 +394,16 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560376969 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -389,6 +394,16 @@ class V2ExpressionBuilder(e: Expression,

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560376882 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560376615 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -209,7 +209,12 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375676 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -209,7 +209,12 @@ class V2ExpressionBuilder(e: Expression,

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375469 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375353 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560375211 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2048842691 With hindsight, we shouldn't create the v2 `Predicate` API in the first place, and should just use the v2 `Expression` API. The `Predicate` trait in catalyst is not useful as well.

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2048841083 The `org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison` optimizer may also fold predicates.
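The kind of folding wForget refers to can be illustrated with a toy sketch. The helper below is hypothetical, not Spark's actual `SimplifyBinaryComparison` implementation: it folds a binary comparison whose two sides are the same expression into a constant, which is roughly what that Catalyst rule does for deterministic, non-nullable inputs.

```python
# Toy sketch (not Spark code) of folding a binary comparison whose two
# sides are the same deterministic, non-nullable expression.
def fold_comparison(left, op, right):
    """Return True/False when 'left op right' folds to a constant, else None."""
    if left != right:
        return None  # sides differ: no fold applies
    if op in ("=", "<=", ">="):
        return True   # a = a, a <= a, a >= a are always true
    if op in ("<", ">"):
        return False  # a < a, a > a are always false
    return None

print(fold_comparison("col_a", "=", "col_a"))  # True
print(fold_comparison("col_a", "<", "col_a"))  # False
print(fold_comparison("col_a", "=", "col_b"))  # None (left unfolded)
```

This is why a v2 `Predicate` wrapper built from such an expression may disappear before pushdown: the optimizer can replace the comparison with a literal first.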

[PR] [MINOR][DOCS] Make the link of spark properties with YARN more accurate [spark]

2024-04-10 Thread via GitHub
beliefer opened a new pull request, #45994: URL: https://github.com/apache/spark/pull/45994 ### What changes were proposed in this pull request? This PR propose to make the link of spark properties with YARN more accurate. ### Why are the changes needed? Currently, the link

Re: [PR] [SPARK-47001][SQL] Pushdown verification in optimizer [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45146: [SPARK-47001][SQL] Pushdown verification in optimizer URL: https://github.com/apache/spark/pull/45146

Re: [PR] [SPARK-47001][SQL] Pushdown verification in optimizer [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45146: URL: https://github.com/apache/spark/pull/45146#issuecomment-2048834064 The GA failure is unrelated, I'm merging this to master, thanks!

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1560363139 ## docs/job-scheduling.md: ## @@ -53,7 +53,11 @@ Resource allocation can be configured as follows, based on the cluster type: on the cluster

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2048826987 > Do we support scheduling jobs across applications? It's odd to me. This section is about scheduling across applications. `Scheduling Within an Application` section is related

Re: [PR] [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45993: [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` URL: https://github.com/apache/spark/pull/45993

Re: [PR] [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45993: URL: https://github.com/apache/spark/pull/45993#issuecomment-2048825061 Merged to master, branch-3.5 and branch-3.4.

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1560358232 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -89,6 +89,73 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1560355740 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1636,14 +1699,13 @@ public int levenshteinDistance(UTF8String other, int

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1560352992 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1515,62 @@ public boolean semanticEquals(final UTF8String other, int

Re: [PR] [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on PR #45927: URL: https://github.com/apache/spark/pull/45927#issuecomment-2048804834 cc @gengliangwang

[PR] [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #45993: URL: https://github.com/apache/spark/pull/45993 ### What changes were proposed in this pull request? This PR fixes the documentation of `spark.sql.execution.arrow.maxRecordsPerBatch` to clarify the relation between

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45946: URL: https://github.com/apache/spark/pull/45946#discussion_r1560342794 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1062,4 +1062,11 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45927: URL: https://github.com/apache/spark/pull/45927#discussion_r1560342791 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala: ## @@ -104,21 +105,28 @@ private[hive] class HiveMetastoreCatalog(sparkSession:

Re: [PR] [WIP][SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45927: URL: https://github.com/apache/spark/pull/45927#discussion_r1560341095 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -229,8 +230,8 @@ private[hive] class HiveClientImpl( case e:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2048796376 Thanks @cloud-fan @ueshin @HyukjinKwon @xinrong-meng for the review!

Re: [PR] [SPARK-47704][SQL] JSON parsing fails with "java.lang.ClassCastException" when spark.sql.json.enablePartialResults is enabled [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45833: [SPARK-47704][SQL] JSON parsing fails with "java.lang.ClassCastException" when spark.sql.json.enablePartialResults is enabled URL: https://github.com/apache/spark/pull/45833

Re: [PR] [SPARK-47704][SQL] JSON parsing fails with "java.lang.ClassCastException" when spark.sql.json.enablePartialResults is enabled [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45833: URL: https://github.com/apache/spark/pull/45833#issuecomment-2048792451 Merged to master and branch-3.5.

Re: [PR] [SPARK-47733][SS] Add custom metrics for transformWithState operator part of query progress [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on PR #45937: URL: https://github.com/apache/spark/pull/45937#issuecomment-2048792306 > Should we also add some metrics around ttl? (Like keys deleted from state on ttl expiry?) Done, added

Re: [PR] [SPARK-47733][SS] Add custom metrics for transformWithState operator part of query progress [spark]

2024-04-10 Thread via GitHub
anishshri-db commented on code in PR #45937: URL: https://github.com/apache/spark/pull/45937#discussion_r1560336821 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -112,6 +116,10 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45377: [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors URL: https://github.com/apache/spark/pull/45377

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2048784678 thanks, merging to master!

Re: [PR] [SPARK-47800][SQL] Create new method for identifier to tableIdentifier conversion [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45985: URL: https://github.com/apache/spark/pull/45985#discussion_r1560306930 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -118,12 +117,8 @@ class DataSourceV2Strategy(session:

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560294470 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560293229 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47802][SQL] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45987: [SPARK-47802][SQL] Revert (*) from meaning struct(*) back to meaning * URL: https://github.com/apache/spark/pull/45987

Re: [PR] [SPARK-47802][SQL] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45987: URL: https://github.com/apache/spark/pull/45987#issuecomment-2048732729 thanks, merging to master!

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560262338 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
gene-db commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2048671561 > @gene-db seems like the JIRA ID is wrong. Do we have a dedicated JIRA for this? hrmmm, I don't think there was a dedicated jira for this. Looks like I will have to create a jira

Re: [PR] [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45992: [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository URL: https://github.com/apache/spark/pull/45992

Re: [PR] [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45992: URL: https://github.com/apache/spark/pull/45992#issuecomment-2048663695 Merged to master.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560219006 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1560215436 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45978: URL: https://github.com/apache/spark/pull/45978#issuecomment-2048651969 Can you please fill the PR description?

Re: [PR] [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` [spark]

2024-04-10 Thread via GitHub
zhengruifeng commented on PR #45614: URL: https://github.com/apache/spark/pull/45614#issuecomment-2048647268 thanks @HyukjinKwon and @xinrong-meng

Re: [PR] [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test [spark]

2024-04-10 Thread via GitHub
zhengruifeng commented on PR #45980: URL: https://github.com/apache/spark/pull/45980#issuecomment-2048642692 thanks @xinrong-meng merged to master

Re: [PR] [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test [spark]

2024-04-10 Thread via GitHub
zhengruifeng closed pull request #45980: [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test URL: https://github.com/apache/spark/pull/45980

Re: [PR] [SPARK-47672][SQL] Avoid double eval from filter pushDown [spark]

2024-04-10 Thread via GitHub
holdenk commented on PR #45802: URL: https://github.com/apache/spark/pull/45802#issuecomment-2048641757 Another possible solution would be to also break up the projection and move the part of the projection which is used in the filter down with the filter unless the only thing the
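The double evaluation SPARK-47672 is about can be sketched in plain Python (illustrative only, not Spark's optimizer code; `expensive` stands in for any costly projected expression that a filter also references):

```python
# Count how many times the projected expression is evaluated.
calls = {"n": 0}

def expensive(x):
    calls["n"] += 1
    return x * x

rows = [1, 2, 3]

# Plan A: compute the projection once, then filter on its result.
projected = [(x, expensive(x)) for x in rows]
result_a = [y for (_, y) in projected if y > 1]
once = calls["n"]  # one evaluation per row

# Plan B: a filter pushed below the projection re-evaluates the same
# expression in both the predicate and the projection.
calls["n"] = 0
result_b = [expensive(x) for x in rows if expensive(x) > 1]
twice = calls["n"]  # rows passing the filter are evaluated twice

print(result_a == result_b, once, twice)  # True 3 5
```

Both plans return the same rows, but the pushed-down variant pays for `expensive` twice on every row that survives the filter, which is why the PR (and the alternative holdenk suggests) tries to keep the shared sub-expression evaluated only once.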

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2048636068 @gene-db seems like the JIRA ID is wrong. Do we have a dedicated JIRA for this?

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45826: [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark URL: https://github.com/apache/spark/pull/45826

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2048634802 Merged to master.

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560187499 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -100,6 +101,7 @@ object LogKey extends Enumeration { val OFFSETS = Value val

Re: [PR] [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #45614: [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` URL: https://github.com/apache/spark/pull/45614

Re: [PR] [MINOR] Make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #18: URL: https://github.com/apache/spark-connect-go/pull/18#issuecomment-2048625272 Merged to master.

Re: [PR] [MINOR] Make readme easier to follow [spark-connect-go]

2024-04-10 Thread via GitHub
HyukjinKwon closed pull request #18: [MINOR] Make readme easier to follow URL: https://github.com/apache/spark-connect-go/pull/18

Re: [PR] [SPARK-41811][PYTHON][CONNECT] Implement `SQLStringFormatter` with `WithRelations` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45614: URL: https://github.com/apache/spark/pull/45614#issuecomment-2048624804 Merged to master.

Re: [PR] [SPARK-47802] Revert (*) from meaning struct(*) back to meaning * [spark]

2024-04-10 Thread via GitHub
srielau commented on PR #45987: URL: https://github.com/apache/spark/pull/45987#issuecomment-2048618906 @cloud-fan @gengliangwang This is ready.

[PR] [SPARK-47725][INFRA][FOLLOW-UP] Do not run scheduled job in forked repository [spark]

2024-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #45992: URL: https://github.com/apache/spark/pull/45992 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/45870 that skips the run in forked repository. ### Why are the changes

[PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-10 Thread via GitHub
ericm-db opened a new pull request, #45991: URL: https://github.com/apache/spark/pull/45991 ### What changes were proposed in this pull request? This PR adds support for expiring state based on TTL for MapState. Using this functionality, Spark users can specify a TTL Mode for
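The per-entry expiration semantics SPARK-47805 proposes for MapState can be sketched in plain Python with a manual clock (this is an illustrative model only, not the Spark API; `TtlMapState` and its methods are hypothetical names):

```python
import time

class TtlMapState:
    """Map state whose entries expire a fixed TTL after being written."""

    def __init__(self, ttl_ms, clock=lambda: time.time() * 1000):
        self._ttl_ms = ttl_ms
        self._clock = clock
        self._data = {}  # key -> (value, expiration timestamp in ms)

    def put(self, key, value):
        # Each write stamps the entry with its own expiration time.
        self._data[key] = (value, self._clock() + self._ttl_ms)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it lazily on access.
            del self._data[key]
            return None
        return value

# Deterministic demo with a manual clock instead of wall time.
now = [0]
state = TtlMapState(ttl_ms=100, clock=lambda: now[0])
state.put("k", "v")
print(state.get("k"))  # v
now[0] = 150           # advance past the TTL
print(state.get("k"))  # None
```

Expiring lazily on read keeps `put` cheap; a real implementation would also need a background or timer-driven sweep so untouched expired entries do not accumulate.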

[PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-10 Thread via GitHub
anchovYu opened a new pull request, #45990: URL: https://github.com/apache/spark/pull/45990 ### What changes were proposed in this pull request? This PR adds a debug log for Dataframe cache that uses SQL conf to turn on. It logs necessary information on * cache hit during cache

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 commented on PR #45977: URL: https://github.com/apache/spark/pull/45977#issuecomment-2048549899 @allisonwang-db @HyukjinKwon @HeartSaVioR PTAL, thanks!

[PR] [SPARK-47803][SQL] Support cast to variant. [spark]

2024-04-10 Thread via GitHub
chenhao-db opened a new pull request, #45989: URL: https://github.com/apache/spark/pull/45989 ### What changes were proposed in this pull request? This PR allows casting another type into the variant type. The changes can be divided into two major parts: - The `VariantBuilder`
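The core idea of casting into a variant type is to wrap a typed value in a self-describing representation, roughly what a variant builder does when encoding mixed types. A hypothetical plain-Python model using `(type tag, value)` pairs (not the actual Spark `VariantBuilder` API or its binary encoding):

```python
def to_variant(value):
    """Recursively wrap a Python value in a (type tag, value) pair."""
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return ("boolean", value)
    if isinstance(value, int):
        return ("long", value)
    if isinstance(value, float):
        return ("double", value)
    if isinstance(value, str):
        return ("string", value)
    if isinstance(value, list):
        return ("array", [to_variant(v) for v in value])
    if isinstance(value, dict):
        return ("object", {k: to_variant(v) for k, v in value.items()})
    raise TypeError(f"cannot cast {type(value).__name__} to variant")

print(to_variant([1, "a", {"flag": True}]))
```

Nested arrays and objects recurse so every leaf carries its own tag, which is what lets a variant column hold differently shaped values row by row.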

Re: [PR] [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang closed pull request #45910: [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework URL: https://github.com/apache/spark/pull/45910

Re: [PR] [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on PR #45910: URL: https://github.com/apache/spark/pull/45910#issuecomment-2048533652 Thanks, merging to master

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560115778 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -100,6 +101,7 @@ object LogKey extends Enumeration { val OFFSETS = Value val
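The `LogKey` enumeration under review is the heart of the structured logging migration: instead of interpolating variables into a message string, each log call carries an explicit key/value context (MDC-style) keyed by a shared enum, so downstream tooling can parse logs reliably. A hedged sketch of the pattern in Python (the names `LogKey`, `log_warn`, and the rendering format are illustrative, not Spark's actual framework):

```python
from enum import Enum

class LogKey(Enum):
    OFFSETS = "offsets"
    SESSION_ID = "session_id"

def log_warn(message, context):
    # Render the MDC deterministically next to the message; a real
    # framework would also expose the pairs to the logging backend.
    rendered = " ".join(
        f"{k.value}={v}"
        for k, v in sorted(context.items(), key=lambda kv: kv[0].value)
    )
    return f"WARN {message} [{rendered}]"

line = log_warn("Session closed unexpectedly",
                {LogKey.SESSION_ID: "abc123", LogKey.OFFSETS: "0-42"})
print(line)  # WARN Session closed unexpectedly [offsets=0-42 session_id=abc123]
```

Centralizing the keys in one enum, as the PR does, keeps field names consistent across modules, which is exactly what review comments like the one above are policing.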
