Re: [PR] [SPARK-47706][BUILD] Bump json4s 4.0.7 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun closed pull request #45838: [SPARK-47706][BUILD] Bump json4s 4.0.7 URL: https://github.com/apache/spark/pull/45838 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on PR #45947: URL: https://github.com/apache/spark/pull/45947#issuecomment-2046697763 +1, LGTM

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1558922175 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala: ## @@ -58,9 +59,12 @@ private[hive] class

[PR] [SPARK-47791][SQL] Truncate exceed decimals with scale first instead of precision from JDBC datasource [spark]

2024-04-10 Thread via GitHub
yaooqinn opened a new pull request, #45976: URL: https://github.com/apache/spark/pull/45976 ### What changes were proposed in this pull request? This PR is kind of a follow-up of SPARK-45905 but for JDBC datasource readings, which truncates exceed decimals with scale

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558919561 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite:

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1558919622 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala: ## @@ -58,9 +59,12 @@ private[hive] class

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` [spark]

2024-04-10 Thread via GitHub
panbingkun commented on PR #45975: URL: https://github.com/apache/spark/pull/45975#issuecomment-2046600800 cc @gengliangwang

[PR] [SPARK-47792][CORE] Make the value of MDC can support `null` [spark]

2024-04-10 Thread via GitHub
panbingkun opened a new pull request, #45975: URL: https://github.com/apache/spark/pull/45975 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47706][BUILD] Bump json4s 4.0.7 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on PR #45838: URL: https://github.com/apache/spark/pull/45838#issuecomment-2046598213 Thank you, @pan3793 and all. Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47736][SQL] Add support for AbstractArrayType [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45891: URL: https://github.com/apache/spark/pull/45891#discussion_r1558908333 ## sql/api/src/main/scala/org/apache/spark/sql/types/ArrayType.scala: ## @@ -43,6 +43,23 @@ object ArrayType extends AbstractDataType { override private[spark]

Re: [PR] [SPARK-47775][SQL] Support remaining scalar types in the variant spec. [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45945: URL: https://github.com/apache/spark/pull/45945#discussion_r1558932654 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala: ## @@ -248,9 +253,10 @@ case object VariantGet {

Re: [PR] [SPARK-47586][SQL] Hive module: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45876: URL: https://github.com/apache/spark/pull/45876#discussion_r1558957480 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -686,17 +687,19 @@ private[hive] class HiveClientImpl( } catch {

[PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db opened a new pull request, #45978: URL: https://github.com/apache/spark/pull/45978 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47790][BUILD] Upgrade `commons-io` to 2.16.1 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on PR #45973: URL: https://github.com/apache/spark/pull/45973#issuecomment-2046704824 Merged to master. Thank you, @yaooqinn !

Re: [PR] [SPARK-47790][BUILD] Upgrade `commons-io` to 2.16.1 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun closed pull request #45973: [SPARK-47790][BUILD] Upgrade `commons-io` to 2.16.1 URL: https://github.com/apache/spark/pull/45973

Re: [PR] [SPARK-47736][SQL] Add support for AbstractArrayType [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45891: URL: https://github.com/apache/spark/pull/45891#discussion_r155890 ## sql/api/src/main/scala/org/apache/spark/sql/internal/types/AbstractArrayType.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-10 Thread via GitHub
chaoqin-li1123 opened a new pull request, #45977: URL: https://github.com/apache/spark/pull/45977 ### What changes were proposed in this pull request? SimpleDataSourceStreamReader is a simplified version of the DataSourceStreamReader interface. There are 3 functions that

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1558920991 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetCatalogsOperation.scala: ## @@ -40,7 +41,7 @@ private[hive] class

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558921090 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -171,6 +171,26 @@ class Column(val expr: Expression) extends Logging { Column.fn(name, this,

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
gengliangwang commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1558920135 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala: ## @@ -126,7 +126,9 @@ private[hive] class

Re: [PR] [SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on PR #45974: URL: https://github.com/apache/spark/pull/45974#issuecomment-2046667248 Thank you, @yaooqinn . This will unblock the previous `commons-compress` PR (on branch-3.5).

Re: [PR] [SPARK-47586][SQL] Hive module: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45876: URL: https://github.com/apache/spark/pull/45876#discussion_r1558957480 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -686,17 +687,19 @@ private[hive] class HiveClientImpl( } catch {

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1558992976 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala: ## @@ -58,9 +59,12 @@ private[hive] class

Re: [PR] [SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun closed pull request #45974: [SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1 URL: https://github.com/apache/spark/pull/45974

[PR] [SPARK-47797] Skip deleting pod from k8s if the pod does not exists [spark]

2024-04-10 Thread via GitHub
leesf opened a new pull request, #45979: URL: https://github.com/apache/spark/pull/45979 ### What changes were proposed in this pull request? Skip deleting pod from k8s if the pod does not exists ### Why are the changes needed? Do not send to many requests to k8s api

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1559115572 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite: String

[PR] [MINOR][PYTHON][TESTS] Enable `test_udf_cache` parity test [spark]

2024-04-10 Thread via GitHub
zhengruifeng opened a new pull request, #45980: URL: https://github.com/apache/spark/pull/45980 ### What changes were proposed in this pull request? Enable `test_udf_cache` parity test ### Why are the changes needed? test coverage ### Does this PR introduce _any_

Re: [PR] [SPARK-47797][K8S] Skip deleting pod from k8s if the pod does not exists [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on PR #45979: URL: https://github.com/apache/spark/pull/45979#issuecomment-2047033990 BTW, thank you for making a PR, @leesf .

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-10 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1559250767 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Observation.scala: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45946: URL: https://github.com/apache/spark/pull/45946#discussion_r1559288999 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1062,4 +1062,10 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
stefankandic commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559327591 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -89,6 +89,73 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
nikolamand-db commented on code in PR #45816: URL: https://github.com/apache/spark/pull/45816#discussion_r1559330336 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -447,28 +442,50 @@ private UTF8String toUpperCaseSlow() { return

Re: [PR] [SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1 [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on PR #45974: URL: https://github.com/apache/spark/pull/45974#issuecomment-2046913255 Merged to branch-3.5.

Re: [PR] [SPARK-47709][BUILD] Upgrade tink to 1.13.0 [spark]

2024-04-10 Thread via GitHub
LuciferYang commented on PR #45843: URL: https://github.com/apache/spark/pull/45843#issuecomment-2046948064 Thanks @dongjoon-hyun ~

Re: [PR] [SPARK-47795][DOC] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1559190807 ## docs/job-scheduling.md: ## @@ -92,6 +96,8 @@ In standalone mode, simply start your workers with `spark.shuffle.service.enable In YARN mode, follow the

Re: [PR] [SPARK-47795][DOC] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2047129492 > Is this a required change, @beliefer? I think we should add these document so as follows the other cluster mangers.

Re: [PR] [SPARK-47795][DOC] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2047130102 cc @yaooqinn @LuciferYang

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-10 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1559262517 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Observation.scala: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-04-10 Thread via GitHub
TakawaAkirayo commented on PR #45367: URL: https://github.com/apache/spark/pull/45367#issuecomment-2047258464 Hi @LuciferYang @beliefer Just a follow up on this, what's the next action items? Any other reviewers need be involved?

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559299218 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2046941212 > I don't think `CSVLegacyTimeParserSuite` is related to you, but it would probably be a very good idea to setup Maven so that you can run/debug all tests locally in general

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1559135258 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite: String

[PR] [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values [spark]

2024-04-10 Thread via GitHub
yaooqinn opened a new pull request, #45981: URL: https://github.com/apache/spark/pull/45981 ### What changes were proposed in this pull request? When parsing/reading a decimal column/field from json, jdbc, etc., if the column or field contains some values exceeding the

Re: [PR] [SPARK-47617][SQL] Add TPC-DS testing infrastructure for collations [spark]

2024-04-10 Thread via GitHub
nikolamand-db commented on code in PR #45739: URL: https://github.com/apache/spark/pull/45739#discussion_r1559228727 ## sql/core/src/test/scala/org/apache/spark/sql/TPCDSCollationQueryTestSuite.scala: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-10 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1559249905 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -363,6 +363,8 @@ object

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
dbatomic commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559290838 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559302476 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47797] Skip deleting pod from k8s if the pod does not exists [spark]

2024-04-10 Thread via GitHub
leesf commented on PR #45979: URL: https://github.com/apache/spark/pull/45979#issuecomment-2046981740 cc @HyukjinKwon would you please help to review this PR. Thanks

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
mihailom-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559297852 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2213,8 +2213,8 @@ case class Levenshtein( }

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
panbingkun commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1558978194 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala: ## @@ -126,7 +126,9 @@ private[hive] class

Re: [PR] [SPARK-47617][SQL] Add TPC-DS testing infrastructure for collations [spark]

2024-04-10 Thread via GitHub
stefankandic commented on PR #45739: URL: https://github.com/apache/spark/pull/45739#issuecomment-2046858666 @nikolamand-db could we also run this test on delta in runtime?

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
uros-db commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2046875237 I don't think `CSVLegacyTimeParserSuite` is related to you, but it would probably be a very good idea to setup Maven so that you can run/debug all tests locally in general

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1559114967 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -171,6 +171,26 @@ class Column(val expr: Expression) extends Logging { Column.fn(name, this,

Re: [PR] [SPARK-47791][SQL] Truncate exceed decimals with scale first instead of precision from JDBC datasource [spark]

2024-04-10 Thread via GitHub
yaooqinn commented on PR #45976: URL: https://github.com/apache/spark/pull/45976#issuecomment-2046981192 Merged to master. Thank you @dongjoon-hyun

Re: [PR] [SPARK-47791][SQL] Truncate exceed decimals with scale first instead of precision from JDBC datasource [spark]

2024-04-10 Thread via GitHub
yaooqinn closed pull request #45976: [SPARK-47791][SQL] Truncate exceed decimals with scale first instead of precision from JDBC datasource URL: https://github.com/apache/spark/pull/45976

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
nikolamand-db commented on PR #45963: URL: https://github.com/apache/spark/pull/45963#issuecomment-2047209932 Please review collation team @dbatomic @stefankandic @uros-db @mihailom-db @stevomitric.

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559298235 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
dbatomic commented on code in PR #45816: URL: https://github.com/apache/spark/pull/45816#discussion_r1559134534 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -447,28 +442,50 @@ private UTF8String toUpperCaseSlow() { return

Re: [PR] [SPARK-47795][DOC] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1559159692 ## docs/job-scheduling.md: ## @@ -92,6 +96,8 @@ In standalone mode, simply start your workers with `spark.shuffle.service.enable In YARN mode, follow the

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45926: URL: https://github.com/apache/spark/pull/45926#discussion_r1559179777 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala: ## @@ -58,9 +59,12 @@ private[hive] class

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559304749 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2047328959 @uros-db I got the file-writing tests to work locally when I simply `export SPARK_HOME=/Users/gideon/repos/spark` prior to running my maven tests. More importantly, All GHA

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559304159 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
stefankandic commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559333644 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other,

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559298235 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
stefankandic commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559333644 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other,

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
dbatomic commented on code in PR #45816: URL: https://github.com/apache/spark/pull/45816#discussion_r1559131029 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -447,28 +442,50 @@ private UTF8String toUpperCaseSlow() { return

Re: [PR] [SPARK-47797] Skip deleting pod from k8s if the pod does not exists [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45979: URL: https://github.com/apache/spark/pull/45979#discussion_r1559145766 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala: ## @@ -166,15 +166,19 @@

Re: [PR] [SPARK-47795][DOC] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
dongjoon-hyun commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1559154695 ## docs/job-scheduling.md: ## @@ -53,7 +53,11 @@ Resource allocation can be configured as follows, based on the cluster type: on the cluster

Re: [PR] [SPARK-47795][DOC] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
beliefer commented on code in PR #45982: URL: https://github.com/apache/spark/pull/45982#discussion_r1559169283 ## docs/job-scheduling.md: ## @@ -53,7 +53,11 @@ Resource allocation can be configured as follows, based on the cluster type: on the cluster

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-10 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1559250767 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Observation.scala: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47545][CONNECT] Dataset `observe` support for the Scala client [spark]

2024-04-10 Thread via GitHub
xupefei commented on code in PR #45701: URL: https://github.com/apache/spark/pull/45701#discussion_r1559266569 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -3397,7 +3488,11 @@ class Dataset[T] private[sql] (

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559311139 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -89,6 +89,73 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-10 Thread via GitHub
GideonPotok commented on PR #45738: URL: https://github.com/apache/spark/pull/45738#issuecomment-2046822456 > did you try to run this suite locally and investigate any potential issues? @uros-db I would love to. I need to fix my setup though, first -- there is an issue I have been

Re: [PR] [SPARK-47617][SQL] Add TPC-DS testing infrastructure for collations [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45739: URL: https://github.com/apache/spark/pull/45739#discussion_r1559086524 ## sql/core/src/test/scala/org/apache/spark/sql/TPCDSCollationQueryTestSuite.scala: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47586][SQL] Hive module: Migrate logError with variables to structured logging framework [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45876: URL: https://github.com/apache/spark/pull/45876#discussion_r1559174377 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -686,17 +687,19 @@ private[hive] class HiveClientImpl( } catch {

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
stefankandic commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559326485 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other,

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
nikolamand-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559340404 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other,

Re: [PR] [SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
dbatomic commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559427040 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559441689 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45805: [SPARK-47680][SQL] Add variant_explode expression. URL: https://github.com/apache/spark/pull/45805

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45805: URL: https://github.com/apache/spark/pull/45805#issuecomment-2047758862 thanks, merging to master!

Re: [PR] [SPARK-47692][SQL] Addition of priority flag to StringType [spark]

2024-04-10 Thread via GitHub
srielau commented on code in PR #45819: URL: https://github.com/apache/spark/pull/45819#discussion_r1559602846 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -645,6 +646,34 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47798][SQL] Enrich the error message for the reading failures of decimal values [spark]

2024-04-10 Thread via GitHub
yaooqinn commented on PR #45981: URL: https://github.com/apache/spark/pull/45981#issuecomment-2047430690 Thank you @dongjoon-hyun

[PR] [SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] Update the decimal mapping remarks with JDBC data sources [spark]

2024-04-10 Thread via GitHub
yaooqinn opened a new pull request, #45984: URL: https://github.com/apache/spark/pull/45984 ### What changes were proposed in this pull request? Followup of SPARK-47781, SPARK-47791 and SPARK-47798, to update the decimal mapping remarks with JDBC data sources ### Why

Re: [PR] [SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559445525 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47680][SQL] Add variant_explode expression. [spark]

2024-04-10 Thread via GitHub
srielau commented on code in PR #45805: URL: https://github.com/apache/spark/pull/45805#discussion_r1559582943 ## sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala: ## @@ -298,4 +300,28 @@ class VariantSuite extends QueryTest with SharedSparkSession { }

Re: [PR] [SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
dbatomic commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559428506 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub
nikolamand-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1559431415 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -89,6 +89,73 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-47775][SQL] Support remaining scalar types in the variant spec. [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45945: URL: https://github.com/apache/spark/pull/45945#issuecomment-2047765218 thanks, merging to master!

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
nikolamand-db commented on code in PR #45816: URL: https://github.com/apache/spark/pull/45816#discussion_r1559642438 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -424,21 +424,16 @@ public UTF8String toUpperCase() { if (numBytes == 0)

Re: [PR] [DRAFT][SPARK-47410][SQL] refactor UTF8String and CollationFactory [spark]

2024-04-10 Thread via GitHub
uros-db commented on code in PR #45978: URL: https://github.com/apache/spark/pull/45978#discussion_r1559355890 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationStringExpressions.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47775][SQL] Support remaining scalar types in the variant spec. [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45945: [SPARK-47775][SQL] Support remaining scalar types in the variant spec. URL: https://github.com/apache/spark/pull/45945

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45816: [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation URL: https://github.com/apache/spark/pull/45816

Re: [PR] [SPARK-47693][SQL] Add optimization for lowercase comparison of UTF8String used in UTF8_BINARY_LCASE collation [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45816: URL: https://github.com/apache/spark/pull/45816#issuecomment-2047847538 thanks, merging to master!

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
yaooqinn commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2047412733 I didn't even notice this page or section. The dropdown from the Navi Bar is enough for me. Do we support scheduling jobs across applications? It's odd to me.

Re: [PR] [SPARK-47795][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-10 Thread via GitHub
yaooqinn commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2047422813 Nit: Use K8s instead of K8S, the former is the official abbreviation
