Re: [PR] [SPARK-47001][SQL] Pushdown verification in optimizer [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45146: URL: https://github.com/apache/spark/pull/45146#issuecomment-2048834064 The GA failure is unrelated, I'm merging this to master, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2048842691 In hindsight, we shouldn't have created the v2 `Predicate` API in the first place, and should have just used the v2 `Expression` API. The `Predicate` trait in catalyst isn't useful either.

Re: [PR] [SPARK-47807][PYTHON][ML] Make pyspark.ml compatible with pyspark-connect [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45995: URL: https://github.com/apache/spark/pull/45995#issuecomment-2048857160 cc @WeichenXu123

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560444898 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala: ## @@ -218,7 +232,9 @@ private[thriftserver]

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048941620 @uros-db No problem at all. If I understand your refactor correctly, my changes will basically either stay in the same place or move to the new

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560451327 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -172,19 +183,31 @@ public Collation( } /** - * Auxiliary

Re: [PR] [SPARK-47410][SQL] Refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1560452406 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45923: URL: https://github.com/apache/spark/pull/45923#discussion_r1560460548 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala: ## @@ -218,7 +232,9 @@ private[thriftserver] class

Re: [PR] [SPARK-47792][CORE] Make the value of MDC support `null` and disallow `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560461520 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +108,10 @@ trait Logging { val context = new java.util.HashMap[String,

Re: [PR] [SPARK-47792][CORE] Make the value of MDC support `null` and disallow `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560470002 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +108,10 @@ trait Logging { val context = new
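The SPARK-47792 discussion above concerns the MDC context map built in `Logging.scala` (`val context = new java.util.HashMap[String, ...]`). A minimal sketch of the idea, in plain Java with illustrative names (not Spark's actual `Logging` API): a `null` MDC value should be tolerated and rendered as the string "null" instead of failing on conversion.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the SPARK-47792 idea: populate a logging context
// map while tolerating null values. The helper name, the parallel-array
// signature, and the "null" rendering are assumptions for this sketch.
public class MdcSketch {
    public static Map<String, String> buildContext(String[] keys, Object[] values) {
        Map<String, String> context = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Tolerate nulls rather than calling toString() on them.
            context.put(keys[i], values[i] == null ? "null" : values[i].toString());
        }
        return context;
    }
}
```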

[PR] [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch` [spark]

2024-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #45993: URL: https://github.com/apache/spark/pull/45993 ### What changes were proposed in this pull request? This PR fixes the documentation of `spark.sql.execution.arrow.maxRecordsPerBatch` to clarify the relation between

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1560355740 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1636,14 +1699,13 @@ public int levenshteinDistance(UTF8String other, int
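The diff above touches `UTF8String.levenshteinDistance`. For reference, the classic two-row dynamic-programming edit distance can be sketched as follows; this version works on plain Java `String`s for clarity, whereas the real implementation operates on UTF-8 bytes and supports a distance threshold.

```java
// Two-row dynamic-programming Levenshtein distance (a sketch, not
// Spark's UTF8String implementation).
public class LevenshteinSketch {
    public static int distance(String a, String b) {
        if (a.isEmpty()) return b.length();
        if (b.isEmpty()) return a.length();
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // Minimum of insertion, deletion, and substitution.
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```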

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2048826987 > Do we support scheduling jobs across applications? It's odd to me. This section is about scheduling across applications; the `Scheduling Within an Application` section is related

Re: [PR] [SPARK-47001][SQL] Pushdown verification in optimizer [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45146: [SPARK-47001][SQL] Pushdown verification in optimizer URL: https://github.com/apache/spark/pull/45146

[PR] [MINOR][DOCS] Make the link of spark properties with YARN more accurate [spark]

2024-04-10 Thread via GitHub
beliefer opened a new pull request, #45994: URL: https://github.com/apache/spark/pull/45994 ### What changes were proposed in this pull request? This PR proposes to make the link to Spark properties for YARN more accurate. ### Why are the changes needed? Currently, the link

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560376615 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -209,7 +209,12 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560376969 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -389,6 +394,16 @@ class V2ExpressionBuilder(e: Expression,

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560377503 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -389,6 +394,16 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions [spark]

2024-04-10 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1560376882 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,41 @@ class DataSourceV2Suite extends QueryTest with
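The SPARK-47463 thread above is about `V2ExpressionBuilder` translating a boolean-valued `If` (CASE WHEN) expression: it should be wrapped in a v2 `Predicate` node rather than emitted as a plain scalar expression, so that connectors which accept only predicates can consume it. A toy model of that idea (class and method names are illustrative, not Spark's actual API, though in the real v2 API `Predicate` does extend `GeneralScalarExpression`):

```java
// Toy model of wrapping a boolean-valued CASE WHEN as a v2 Predicate.
public class V2BuilderSketch {
    interface V2Expr {}
    static class GeneralScalar implements V2Expr {
        final String name; final V2Expr[] children;
        GeneralScalar(String name, V2Expr[] children) { this.name = name; this.children = children; }
    }
    // Mirrors the real API's shape: Predicate is a scalar expression subtype.
    static class V2Predicate extends GeneralScalar {
        V2Predicate(String name, V2Expr[] children) { super(name, children); }
    }
    // Wrap CASE WHEN as a predicate only when its result type is boolean.
    static V2Expr buildIf(V2Expr cond, V2Expr trueVal, V2Expr falseVal, boolean resultIsBoolean) {
        V2Expr[] children = {cond, trueVal, falseVal};
        return resultIsBoolean
            ? new V2Predicate("CASE_WHEN", children)
            : new GeneralScalar("CASE_WHEN", children);
    }
}
```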

[PR] [SPARK-47807][PYTHON][ML] Make pyspark.ml compatible with pyspark-connect [spark]

2024-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #45995: URL: https://github.com/apache/spark/pull/45995 ### What changes were proposed in this pull request? This PR proposes to make `pyspark.ml` compatible with `pyspark-connect`. ### Why are the changes needed? In order for

Re: [PR] [SPARK-47808][PYTHON][ML][TESTS] Make pyspark.ml.connect tests running without optional dependencies [spark]

2024-04-10 Thread via GitHub
HyukjinKwon commented on PR #45996: URL: https://github.com/apache/spark/pull/45996#issuecomment-2048865644 cc @zhengruifeng @WeichenXu123

Re: [PR] [SPARK-47809][SQL][TEST] `checkExceptionInExpression` should check error for each codegen mode [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45997: URL: https://github.com/apache/spark/pull/45997#discussion_r1560396328 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala: ## @@ -170,18 +170,15 @@ trait ExpressionEvalHelper extends
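The SPARK-47809 change to `ExpressionEvalHelper` makes the expected-error check run once per codegen mode, so a regression in one evaluation path (codegen vs. interpreted) is not masked by the other. A sketch of that testing pattern; the mode names and helper signature here are illustrative, not Spark's actual API.

```java
import java.util.function.Consumer;

// Run the expected-exception check under every evaluation mode, failing
// if any single mode does not raise the expected error.
public class EvalHelperSketch {
    static final String[] MODES = {"CODEGEN_ONLY", "NO_CODEGEN"};

    static void checkExceptionInExpression(Consumer<String> eval, String expectedMsg) {
        for (String mode : MODES) {
            boolean matched = false;
            try {
                eval.accept(mode);
            } catch (RuntimeException e) {
                matched = e.getMessage() != null && e.getMessage().contains(expectedMsg);
            }
            if (!matched) {
                throw new AssertionError(
                    "expected failure containing '" + expectedMsg + "' in mode " + mode);
            }
        }
    }
}
```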

Re: [PR] [SPARK-47809][SQL][TEST] `checkExceptionInExpression` should check error for each codegen mode [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45997: URL: https://github.com/apache/spark/pull/45997#issuecomment-2048874386 cc @HyukjinKwon @dongjoon-hyun

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048919774 @uros-db this is ready for review.

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1560434354 ## docs/job-scheduling.md: ## @@ -92,6 +96,8 @@ In standalone mode, simply start your workers with `spark.shuffle.service.enable In YARN mode, follow the

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2048944931 PS: Do you think changes such as these, which only touch the implementations of `inputTypes` and `replacement` and do not rely on calling UTF8String or CollationFactory, will need to

Re: [PR] [SPARK-47792][CORE] Make the value of MDC support `null` and disallow `MessageWithContext` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45975: URL: https://github.com/apache/spark/pull/45975#discussion_r1560455923 ## common/utils/src/test/scala/org/apache/spark/util/MDCSuite.scala: ## @@ -41,6 +41,21 @@ class MDCSuite assert(log.context === Map("exit_code" ->
