Re: [PR] [SPARK-48017] Add Spark application submission worker for operator [spark-kubernetes-operator]

2024-05-14 Thread via GitHub
dongjoon-hyun commented on code in PR #10: URL: https://github.com/apache/spark-kubernetes-operator/pull/10#discussion_r1600381515 ## gradle.properties: ## @@ -18,17 +18,23 @@ group=org.apache.spark.k8s.operator version=0.1.0 +# Caution: fabric8 version should be aligned

Re: [PR] [SPARK-46350][SS] Fix state removal for stream-stream join with one watermark and one time-interval condition [spark]

2024-05-14 Thread via GitHub
sahnib commented on code in PR #44323: URL: https://github.com/apache/spark/pull/44323#discussion_r1600366874 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -219,10 +222,41 @@ object

Re: [PR] [SPARK-48017] Add Spark application submission worker for operator [spark-kubernetes-operator]

2024-05-14 Thread via GitHub
dongjoon-hyun commented on PR #10: URL: https://github.com/apache/spark-kubernetes-operator/pull/10#issuecomment-2110695911 Thank you for updating. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
GideonPotok commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600041876 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## Review Comment: 0. Note, by the way that because we are relying on supportsBinaryEquality,

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1600292083 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [SPARK-19426][SQL] Custom coalescer for Dataset [spark]

2024-05-14 Thread via GitHub
SubhamSinghal commented on PR #46541: URL: https://github.com/apache/spark/pull/46541#issuecomment-2110596696 @hvanhovell Coalesce does not enforce uniform data distribution across partitions. We would like to pass custom size based coalescer to have more uniform data distribution. This

Re: [PR] [SPARK-48263] Collate function support for non UTF8_BINARY strings [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #46574: [SPARK-48263] Collate function support for non UTF8_BINARY strings URL: https://github.com/apache/spark/pull/46574

Re: [PR] [SPARK-48263] Collate function support for non UTF8_BINARY strings [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46574: URL: https://github.com/apache/spark/pull/46574#issuecomment-2110557374 thanks, merging to master!

Re: [PR] [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #46577: [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite URL: https://github.com/apache/spark/pull/46577

Re: [PR] [SPARK-47301][SQL][TESTS][FOLLOWUP] Remove workaround for ParquetIOSuite [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46577: URL: https://github.com/apache/spark/pull/46577#issuecomment-2110553464 thanks, merging to master!

Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46437: URL: https://github.com/apache/spark/pull/46437#issuecomment-2110549856 merged to master/3.5/3.4!

Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #46437: [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects URL: https://github.com/apache/spark/pull/46437

Re: [PR] [SPARK-48233][SQL][STREAMING] Tests for streaming on columns with non-default collations [spark]

2024-05-14 Thread via GitHub
jose-torres commented on code in PR #46247: URL: https://github.com/apache/spark/pull/46247#discussion_r1600227231 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala: ## @@ -1364,6 +1364,35 @@ class StreamingQuerySuite extends StreamTest with

Re: [PR] [SPARK-48233][SQL][STREAMING] Tests for streaming on columns with non-default collations [spark]

2024-05-14 Thread via GitHub
dbatomic commented on PR #46247: URL: https://github.com/apache/spark/pull/46247#issuecomment-2110406204 > Let's make clear the scope of tests we are adding here. I see the PR title is about "stateless" but you are also aware that deduplication is "stateful". While I agree that we probably

Re: [PR] [SPARK-48233][SQL][STREAMING] Tests for non-stateful streaming on columns with collations [spark]

2024-05-14 Thread via GitHub
dbatomic commented on code in PR #46247: URL: https://github.com/apache/spark/pull/46247#discussion_r1600128613 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala: ## @@ -1364,6 +1364,35 @@ class StreamingQuerySuite extends StreamTest with

Re: [PR] [SPARK-48233][SQL][STREAMING] Tests for non-stateful streaming on columns with collations [spark]

2024-05-14 Thread via GitHub
dbatomic commented on code in PR #46247: URL: https://github.com/apache/spark/pull/46247#discussion_r1600126986 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -484,6 +486,52 @@ class StreamingDeduplicationSuite extends

Re: [PR] [SPARK-48233][SQL][STREAMING] Tests for non-stateful streaming on columns with collations [spark]

2024-05-14 Thread via GitHub
dbatomic commented on code in PR #46247: URL: https://github.com/apache/spark/pull/46247#discussion_r1600124892 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -484,6 +486,52 @@ class StreamingDeduplicationSuite extends

Re: [PR] [SPARK-19426][SQL] Custom coalescer for Dataset [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on PR #46541: URL: https://github.com/apache/spark/pull/46541#issuecomment-2110340780 Can you walk me through the actual use case for this? Coalesce - historically - is incredibly hard to use for most end user, so before adding this I'd like to understand why.

Re: [PR] [SPARK-48271][SQL] Support char/varchar in RowEncoder [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46575: URL: https://github.com/apache/spark/pull/46575#discussion_r1600102948 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/AgnosticEncoder.scala: ## @@ -209,7 +209,8 @@ object AgnosticEncoders { // Nullable leaf

[PR] [SPARK-48273] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-14 Thread via GitHub
nikolamand-db opened a new pull request, #46580: URL: https://github.com/apache/spark/pull/46580 ### What changes were proposed in this pull request? `PlanWithUnresolvedIdentifier` is rewritten later in analysis which causes rules like `SubstituteUnresolvedOrdinals`

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-14 Thread via GitHub
panbingkun commented on code in PR #46288: URL: https://github.com/apache/spark/pull/46288#discussion_r1600020913 ## project/SparkBuild.scala: ## @@ -266,7 +266,7 @@ object SparkBuild extends PomBuild { .orElse(sys.props.get("java.home").map { p => new

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub

[PR] [Only Test] Test unidocGenjavadocVersion 0.19 [spark]

2024-05-14 Thread via GitHub
panbingkun opened a new pull request, #46579: URL: https://github.com/apache/spark/pull/46579 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] Collation Expr walker [spark]

2024-05-14 Thread via GitHub
dbatomic opened a new pull request, #46578: URL: https://github.com/apache/spark/pull/46578 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599971309 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -1358,6 +1359,37 @@ def test_verify_col_name(self):

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599969786 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -844,6 +844,11 @@ def test_union_classmethod_usage(self): def test_isinstance_dataframe(self):

Re: [PR] [SPARK-47579][CORE][PART2] Migrate logInfo with variables to structured logging framework [spark]

2024-05-14 Thread via GitHub
zeotuan commented on code in PR #46494: URL: https://github.com/apache/spark/pull/46494#discussion_r1599968517 ## core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala: ## @@ -117,10 +117,11 @@ private[deploy] class

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599965345 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -484,3 +485,9 @@ message CreateResourceProfileCommandResult { // (Required)

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599960560 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -315,6 +315,12 @@ object SparkConnectService

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599956202 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -199,6 +200,17 @@ message AnalyzePlanRequest { // (Required) The logical plan to

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599953711 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -106,7 +106,7 @@ case class SessionHolder(userId: String,

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599948324 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -199,6 +200,17 @@ message AnalyzePlanRequest { // (Required) The logical plan to

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-14 Thread via GitHub
LuciferYang commented on code in PR #46288: URL: https://github.com/apache/spark/pull/46288#discussion_r1599902760 ## project/SparkBuild.scala: ## @@ -266,7 +266,7 @@ object SparkBuild extends PomBuild { .orElse(sys.props.get("java.home").map { p => new

Re: [PR] [SPARK-48251][BUILD] Disable `maven local cache` on GA's step `MIMA test` [spark]

2024-05-14 Thread via GitHub
panbingkun commented on code in PR #46551: URL: https://github.com/apache/spark/pull/46551#discussion_r1599874698 ## project/SparkBuild.scala: ## @@ -257,6 +257,7 @@ object SparkBuild extends PomBuild { val noLintOnCompile = sys.env.contains("NOLINT_ON_COMPILE") &&

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599873181 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -102,20 +102,30 @@ public void testContains() throws SparkException {

Re: [PR] [SPARK-48263] Collate function support for non UTF8_BINARY strings [spark]

2024-05-14 Thread via GitHub
uros-db commented on PR #46574: URL: https://github.com/apache/spark/pull/46574#issuecomment-2109979646 btw, I'd say that this PR _does_ introduce some user-facing changes - so I'd update the PR description to reflect this with more details

Re: [PR] [SPARK-47301][SQL][TESTS] Remove workaround for ParquetIOSuite [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #46577: URL: https://github.com/apache/spark/pull/46577#issuecomment-2109940241 cc @cloud-fan @yaooqinn

Re: [PR] [SPARK-48272][SQL][PYTHON][CONNECT] Add function `timestamp_diff` [spark]

2024-05-14 Thread via GitHub
zhengruifeng commented on code in PR #46576: URL: https://github.com/apache/spark/pull/46576#discussion_r1599831662 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1823,6 +1823,11 @@ class SparkConnectPlanner(

[PR] [SPARK-48272][SQL][PYTHON][CONNECT] Add function `timestamp_diff` [spark]

2024-05-14 Thread via GitHub
zhengruifeng opened a new pull request, #46576: URL: https://github.com/apache/spark/pull/46576 ### What changes were proposed in this pull request? Add function `timestamp_diff`, by reusing existing proto

Re: [PR] [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #45403: URL: https://github.com/apache/spark/pull/45403#issuecomment-2109888017 > +1 @cloud-fan > > Since the LOCs have been moved to `ParquetIOWithoutOutputCommitCoordinationSuite`, we need a followup for reverting Let me to do it.

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599772157 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [SPARK-48271][SQL] Support char/varchar in RowEncoder [spark]

2024-05-14 Thread via GitHub
yaooqinn commented on PR #46575: URL: https://github.com/apache/spark/pull/46575#issuecomment-2109814895 > Not really. RowEncoder is a private API. I guess it allows char and varchar in UDF APIs

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599736471 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599689228 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,27 @@ * Utility class for collation-aware UTF8String

Re: [PR] [SPARK-48263] Collate function support for non UTF8_BINARY strings [spark]

2024-05-14 Thread via GitHub
nebojsa-db commented on PR #46574: URL: https://github.com/apache/spark/pull/46574#issuecomment-2109753354 Please take a look :) @uros-db @stefankandic @nikolamand-db

Re: [PR] [SPARK-46741][SQL] Cache Table with CTE won't work [spark]

2024-05-14 Thread via GitHub
AngersZh commented on PR #44767: URL: https://github.com/apache/spark/pull/44767#issuecomment-2109748573 @cloud-fan Changed follow the discussion

Re: [PR] [SPARK-48263] Collate function support for non UTF8_BINARY strings [spark]

2024-05-14 Thread via GitHub
nebojsa-db commented on code in PR #46574: URL: https://github.com/apache/spark/pull/46574#discussion_r1599714895 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -71,6 +71,13 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-48263] Collate function support for non UTF8_BINARY strings [spark]

2024-05-14 Thread via GitHub
nebojsa-db commented on code in PR #46574: URL: https://github.com/apache/spark/pull/46574#discussion_r1599705816 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -57,14 +57,14 @@ object CollateExpressionBuilder extends

Re: [PR] [SPARK-48215][SQL] Extending support for collated strings on date_format expression [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46561: URL: https://github.com/apache/spark/pull/46561#discussion_r1599701817 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -959,6 +959,37 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #46523: [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec URL: https://github.com/apache/spark/pull/46523

Re: [PR] [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46523: URL: https://github.com/apache/spark/pull/46523#issuecomment-2109727034 thanks, merging to master!

Re: [PR] [SPARK-48215][SQL] Extending support for collated strings on date_format expression [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46561: URL: https://github.com/apache/spark/pull/46561#discussion_r1599697850 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1275,6 +1275,38 @@ class CollationSQLExpressionsSuite }) } +

Re: [PR] [SPARK-48215][SQL] Extending support for collated strings on date_format expression [spark]

2024-05-14 Thread via GitHub
nebojsa-db commented on code in PR #46561: URL: https://github.com/apache/spark/pull/46561#discussion_r1599695088 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1275,6 +1275,38 @@ class CollationSQLExpressionsSuite }) } +

Re: [PR] [SPARK-48271][SQL] Support char/varchar in RowEncoder [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46575: URL: https://github.com/apache/spark/pull/46575#issuecomment-2109719920 cc @gengliangwang @HyukjinKwon

[PR] [SPARK-48271][SQL] Support char/varchar in RowEncoder [spark]

2024-05-14 Thread via GitHub
cloud-fan opened a new pull request, #46575: URL: https://github.com/apache/spark/pull/46575 ### What changes were proposed in this pull request? Today we can't create `RowEncoder` with char/varchar data type, because we believe this can't happen. Spark will turn char/varchar

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599693830 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,27 @@ * Utility class for collation-aware UTF8String

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2109705338 > https://github.com/com-lihaoyi/Ammonite/releases/tag/3.0.0-M2 > > 3.0.0-M2 released ~ @panbingkun Thanks~ ❤️ Updated.

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-14 Thread via GitHub
LuciferYang commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2109686654 https://github.com/com-lihaoyi/Ammonite/releases/tag/3.0.0-M2 3.0.0-M2 released ~ @panbingkun

Re: [PR] [SPARK-48263] Collate function not working when default collation config set [spark]

2024-05-14 Thread via GitHub
mihailom-db commented on code in PR #46574: URL: https://github.com/apache/spark/pull/46574#discussion_r1599646376 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -71,6 +71,13 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-14 Thread via GitHub
nebojsa-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1599656074 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -1509,12 +1515,62 @@ public boolean semanticEquals(final UTF8String other, int

Re: [PR] [SPARK-48263] Collate function not working when default collation config set [spark]

2024-05-14 Thread via GitHub
mihailom-db commented on code in PR #46574: URL: https://github.com/apache/spark/pull/46574#discussion_r1599651700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -57,14 +57,14 @@ object CollateExpressionBuilder extends

[PR] [SPARK-48263] Collate function not working when default collation config set [spark]

2024-05-14 Thread via GitHub
nebojsa-db opened a new pull request, #46574: URL: https://github.com/apache/spark/pull/46574 ### What changes were proposed in this pull request? collate("xx", "") does not work when there is a config for default collation set which configures non UTF8_BINARY collation as default.

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
HyukjinKwon commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599621151 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,6 +137,14 @@ def __init__( # by __repr__ and _repr_html_ while eager evaluation opens.

Re: [PR] [SPARK-48251][BUILD] Disable `maven local cache` on GA's step `MIMA test` [spark]

2024-05-14 Thread via GitHub
LuciferYang commented on code in PR #46551: URL: https://github.com/apache/spark/pull/46551#discussion_r1599618314 ## project/SparkBuild.scala: ## @@ -257,6 +257,7 @@ object SparkBuild extends PomBuild { val noLintOnCompile = sys.env.contains("NOLINT_ON_COMPILE") &&

Re: [PR] [DO-NOT-MERGE][SPARK-48258][PYTHON][CONNECT] Checkpoint and localCheckpoint in Spark Connect [spark]

2024-05-14 Thread via GitHub
zhengruifeng commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1599610542 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,6 +137,14 @@ def __init__( # by __repr__ and _repr_html_ while eager evaluation opens.

Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2109554457 Ready for it, @gengliangwang @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2109554338 > Or how about having these modules depend on the `common/utils` module? `common/utils` doesn't seem to be a heavyweight module, in this way, the existing cases can be fixed.

Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2109543568 sh dev/lint-java ``` Using `mvn` from path: /Users/panbingkun/Developer/infra/maven/maven/bin/mvn -e Checkstyle checks failed at following occurrences: [ERROR]

Re: [PR] [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates [spark]

2024-05-14 Thread via GitHub
zml1206 commented on PR #44975: URL: https://github.com/apache/spark/pull/44975#issuecomment-2109527145 Thank you all for review. @cloud-fan @kelvinjian-db @beliefer

Re: [PR] [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #44975: [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates URL: https://github.com/apache/spark/pull/44975

Re: [PR] [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #44975: URL: https://github.com/apache/spark/pull/44975#issuecomment-2109516986 thanks, merging to master!

[PR] [SPARK-48269][DOCS][TESTS] DB2: Document Mapping Spark SQL Data Types from DB2 and add tests [spark]

2024-05-14 Thread via GitHub
yaooqinn opened a new pull request, #46572: URL: https://github.com/apache/spark/pull/46572 ### What changes were proposed in this pull request? Document Mapping Spark SQL Data Types from DB2 and add tests ### Why are the changes needed? improvement for docs

Re: [PR] [SPARK-47599][MLLIB] MLLib: Migrate logWarn with variables to structured logging framework [spark]

2024-05-14 Thread via GitHub
panbingkun commented on PR #46527: URL: https://github.com/apache/spark/pull/46527#issuecomment-2109471092 > Let's continue on this one :) Ready for it!

Re: [PR] [SPARK-47972][SQL] Restrict CAST expression for collations [spark]

2024-05-14 Thread via GitHub
mihailom-db commented on code in PR #46474: URL: https://github.com/apache/spark/pull/46474#discussion_r1599507122 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -851,6 +852,30 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [SPARK-47972][SQL] Restrict CAST expression for collations [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46474: URL: https://github.com/apache/spark/pull/46474#discussion_r1599500946 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -851,6 +852,30 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [DO-NOT-MERGE][SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir [spark]

2024-05-14 Thread via GitHub
mridulm commented on code in PR #46571: URL: https://github.com/apache/spark/pull/46571#discussion_r1599499818 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -373,7 +373,7 @@ class SparkContext(config: SparkConf) extends Logging { private[spark] def

Re: [PR] [SPARK-47599][MLLIB] MLLib: Migrate logWarn with variables to structured logging framework [spark]

2024-05-14 Thread via GitHub
panbingkun commented on code in PR #46527: URL: https://github.com/apache/spark/pull/46527#discussion_r1599491825 ## mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala: ## @@ -129,9 +130,9 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override

Re: [PR] [SPARK-48215][SQL] Extending support for collated strings on date_format expression [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46561: URL: https://github.com/apache/spark/pull/46561#discussion_r1599471162 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1275,6 +1275,38 @@ class CollationSQLExpressionsSuite }) } +

Re: [PR] [SPARK-48215][SQL] Extending support for collated strings on date_format expression [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46561: URL: https://github.com/apache/spark/pull/46561#discussion_r1599470164 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1275,6 +1275,38 @@ class CollationSQLExpressionsSuite }) } +

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599455730 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,27 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48267][SS] Regression e2e test with SPARK-47305 [spark]

2024-05-14 Thread via GitHub
HeartSaVioR commented on PR #46569: URL: https://github.com/apache/spark/pull/46569#issuecomment-2109409434 Also cherry-picked to 3.5, as it's a clean cherry-pick.

Re: [PR] [SPARK-48267][SS] Regression e2e test with SPARK-47305 [spark]

2024-05-14 Thread via GitHub
HeartSaVioR closed pull request #46569: [SPARK-48267][SS] Regression e2e test with SPARK-47305 URL: https://github.com/apache/spark/pull/46569

Re: [PR] [SPARK-47599][MLLIB] MLLib: Migrate logWarn with variables to structured logging framework [spark]

2024-05-14 Thread via GitHub
panbingkun commented on code in PR #46527: URL: https://github.com/apache/spark/pull/46527#discussion_r1599456251 ## mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala: ## @@ -179,8 +179,8 @@ class LinearSVC @Since("2.2.0") ( maxBlockSizeInMB)

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599447728 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -102,20 +102,30 @@ public void testContains() throws SparkException {

Re: [PR] [SPARK-48267][SS] Regression e2e test with SPARK-47305 [spark]

2024-05-14 Thread via GitHub
HeartSaVioR commented on PR #46569: URL: https://github.com/apache/spark/pull/46569#issuecomment-2109390060 Thanks @viirya for quick reviewing! Merging to master.

Re: [PR] [SPARK-47599][MLLIB] MLLib: Migrate logWarn with variables to structured logging framework [spark]

2024-05-14 Thread via GitHub
panbingkun commented on code in PR #46527: URL: https://github.com/apache/spark/pull/46527#discussion_r1599449547 ## mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala: ## @@ -451,8 +451,8 @@ class KMeans @Since("1.5.0") ( private def trainWithBlock(dataset:

Re: [PR] [SPARK-48157][SQL] Add collation support for CSV expressions [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #46504: [SPARK-48157][SQL] Add collation support for CSV expressions URL: https://github.com/apache/spark/pull/46504

Re: [PR] [SPARK-48157][SQL] Add collation support for CSV expressions [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46504: URL: https://github.com/apache/spark/pull/46504#issuecomment-2109363247 thanks, merging to master!

Re: [PR] [SPARK-48229][SQL] Add collation support for inputFile expressions [spark]

2024-05-14 Thread via GitHub
cloud-fan closed pull request #46503: [SPARK-48229][SQL] Add collation support for inputFile expressions URL: https://github.com/apache/spark/pull/46503

Re: [PR] [SPARK-48229][SQL] Add collation support for inputFile expressions [spark]

2024-05-14 Thread via GitHub
cloud-fan commented on PR #46503: URL: https://github.com/apache/spark/pull/46503#issuecomment-2109358482 thanks, merging to master!

Re: [PR] [SPARK-48265][SQL] Infer window group limit batch should do constant folding [spark]

2024-05-13 Thread via GitHub
beliefer commented on PR #46568: URL: https://github.com/apache/spark/pull/46568#issuecomment-2109342800 LGTM later.

Re: [PR] [SPARK-48251][BUILD] Disable `maven local cache` on GA's step `MIMA test` [spark]

2024-05-13 Thread via GitHub
panbingkun commented on code in PR #46551: URL: https://github.com/apache/spark/pull/46551#discussion_r1599416657 ## project/SparkBuild.scala: ## @@ -273,10 +273,9 @@ object SparkBuild extends PomBuild { // Google Mirror of Maven Central, placed first so that it's used

[PR] [SPARK-46707][SQL][FOLLOWUP] Push down throwable predicate through aggregates [spark]

2024-05-13 Thread via GitHub
zml1206 opened a new pull request, #44975: URL: https://github.com/apache/spark/pull/44975 ### What changes were proposed in this pull request? Push down throwable predicate through aggregates and add ut for "can't push down nondeterministic filter through aggregate". ### Why are

Re: [PR] [SPARK-48266][CONNECT] Move package object `org.apache.spark.sql.connect.dsl` to test directory [spark]

2024-05-13 Thread via GitHub
LuciferYang commented on PR #46567: URL: https://github.com/apache/spark/pull/46567#issuecomment-2109338837 Thanks @HyukjinKwon @zhengruifeng @amaliujia

Re: [PR] [SPARK-48265][SQL] Infer window group limit batch should do constant folding [spark]

2024-05-13 Thread via GitHub
cloud-fan closed pull request #46568: [SPARK-48265][SQL] Infer window group limit batch should do constant folding URL: https://github.com/apache/spark/pull/46568

Re: [PR] [SPARK-48265][SQL] Infer window group limit batch should do constant folding [spark]

2024-05-13 Thread via GitHub
cloud-fan commented on PR #46568: URL: https://github.com/apache/spark/pull/46568#issuecomment-2109331575 thanks, merging to master/3.5!
