[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199562650 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] panbingkun opened a new pull request, #41241: [SPARK-43597][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0017

2023-05-19 Thread via GitHub
panbingkun opened a new pull request, #41241: URL: https://github.com/apache/spark/pull/41241 ### What changes were proposed in this pull request? The PR aims to assign a name to the error class _LEGACY_ERROR_TEMP_0017. ### Why are the changes needed? The changes improve the

[GitHub] [spark] wangyum commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-19 Thread via GitHub
wangyum commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1555428658 @Kimahriman Do you have a way to reproduce? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199516457 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -164,9 +194,34 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199516321 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala: ## @@ -177,6 +185,33 @@ class RocksDBStateStoreSuite

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199516278 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -280,34 +342,34 @@ class RocksDBFileManager( val

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515952 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515925 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -205,19 +229,39 @@ class RocksDBFileManager(

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515899 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515734 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -129,17 +140,36 @@ class RocksDB( * Note that this will copy

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515576 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -56,6 +56,15 @@ class RocksDB( hadoopConf: Configuration =

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199513814 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -280,34 +342,34 @@ class RocksDBFileManager( val

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser

2023-05-19 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1199511465 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,42 @@ class AstBuilder extends

[GitHub] [spark] github-actions[bot] commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

2023-05-19 Thread via GitHub
github-actions[bot] commented on PR #39825: URL: https://github.com/apache/spark/pull/39825#issuecomment-1555391404 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #39861: [WIP][SPARK-42291] Enable dropping of columns for non V2 tables

2023-05-19 Thread via GitHub
github-actions[bot] closed pull request #39861: [WIP][SPARK-42291] Enable dropping of columns for non V2 tables URL: https://github.com/apache/spark/pull/39861

[GitHub] [spark] github-actions[bot] closed pull request #39838: [SPARK-42270][SQL] Improve sort merge join stability with large stream side

2023-05-19 Thread via GitHub
github-actions[bot] closed pull request #39838: [SPARK-42270][SQL] Improve sort merge join stability with large stream side URL: https://github.com/apache/spark/pull/39838

[GitHub] [spark] ueshin opened a new pull request, #41240: [SPARK-43545][SQL][PYTHON] Support nested timestamp type

2023-05-19 Thread via GitHub
ueshin opened a new pull request, #41240: URL: https://github.com/apache/spark/pull/41240 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

[GitHub] [spark] rangadi commented on pull request #40959: [SPARK-43511][CONNECT][SS]Implemented MapGroupsWithState and FlatMapGroupsWithState APIs for Spark Connect

2023-05-19 Thread via GitHub
rangadi commented on PR #40959: URL: https://github.com/apache/spark/pull/40959#issuecomment-1555381692 > Not tested yet, will perform the test when I'm back. Is this tested yet? Could you update the PR description?

[GitHub] [spark] ericm-db commented on pull request #41205: [SPARK-43542] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming query with same

2023-05-19 Thread via GitHub
ericm-db commented on PR #41205: URL: https://github.com/apache/spark/pull/41205#issuecomment-1555324959 Thanks for the review! I've made the changes, and I think it's ready to merge now @MaxGekk @HeartSaVioR

[GitHub] [spark] xinrong-meng commented on pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread via GitHub
xinrong-meng commented on PR #41147: URL: https://github.com/apache/spark/pull/41147#issuecomment-1555303500 Please feel free to leave comments, if any; I'll address them in follow-ups.

[GitHub] [spark] xinrong-meng commented on pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread via GitHub
xinrong-meng commented on PR #41147: URL: https://github.com/apache/spark/pull/41147#issuecomment-1555302733 Merged to master, thank you!

[GitHub] [spark] xinrong-meng closed pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread via GitHub
xinrong-meng closed pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF URL: https://github.com/apache/spark/pull/41147

[GitHub] [spark] MaxGekk closed pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003

2023-05-19 Thread via GitHub
MaxGekk closed pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003 URL: https://github.com/apache/spark/pull/41200

[GitHub] [spark] dtenedor commented on pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-19 Thread via GitHub
dtenedor commented on PR #41007: URL: https://github.com/apache/spark/pull/41007#issuecomment-1555198846 > @dtenedor There are still 30 multipartIdentifier usages that do NOT support IDENTIFIER() notation. So we would trade mechanical churn in the grammar for code changes in

[GitHub] [spark] MaxGekk commented on pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003

2023-05-19 Thread via GitHub
MaxGekk commented on PR #41200: URL: https://github.com/apache/spark/pull/41200#issuecomment-1555176941 +1, LGTM. Merging to master. Thank you, @panbingkun.

[GitHub] [spark] Kimahriman commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-19 Thread via GitHub
Kimahriman commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1555080586 Actually hit a new issue related to this after finally being able to test out 3.4 from the Delta release. Because of the bump to slf4j 2, it seems `log4j-slf4j2-impl` doesn't get

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1555078951 Thank you, @anigos.

[GitHub] [spark] kaiubiferreira closed pull request #41239: Span array function

2023-05-19 Thread via GitHub
kaiubiferreira closed pull request #41239: Span array function URL: https://github.com/apache/spark/pull/41239

[GitHub] [spark] kaiubiferreira opened a new pull request, #41239: Span array function

2023-05-19 Thread via GitHub
kaiubiferreira opened a new pull request, #41239: URL: https://github.com/apache/spark/pull/41239 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] Fokko opened a new pull request, #41238: [SPARK-43594][SQL] Add LocalDateTime to anyToMicros

2023-05-19 Thread via GitHub
Fokko opened a new pull request, #41238: URL: https://github.com/apache/spark/pull/41238 ### What changes were proposed in this pull request? Small change to `anyToMicros` to also accept `LocalDateTime`, which is returned when working with `TIMESTAMP_NTZ`. This simplifies

[GitHub] [spark] dongjoon-hyun commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1555032827 Thank you, @panbingkun, @LuciferYang, @zhenlineo! Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins URL: https://github.com/apache/spark/pull/41226

[GitHub] [spark] shuwang21 commented on pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-19 Thread via GitHub
shuwang21 commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1555014603 > > Do you think when `spark.network.crypto.saslFallback=true` and L95 from `AuthRpcHandler.java`. > > ``` > > saslHandler = new SaslRpcHandler(conf, channel, null,

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199195571 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -123,6 +125,7 @@ class RocksDBFileManager(

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199194346 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] otterc commented on pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-19 Thread via GitHub
otterc commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1554906573 > Do you think when `spark.network.crypto.saslFallback=true` and L95 from `AuthRpcHandler.java`. > > ``` > saslHandler = new SaslRpcHandler(conf, channel, null, secretKeyHolder);

[GitHub] [spark] tgravescs commented on a diff in pull request #41173: [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers

2023-05-19 Thread via GitHub
tgravescs commented on code in PR #41173: URL: https://github.com/apache/spark/pull/41173#discussion_r1199162022 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: ## @@ -780,7 +771,7 @@ private[yarn] class YarnAllocator(

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #41211: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0

2023-05-19 Thread via GitHub
bjornjorgensen commented on code in PR #41211: URL: https://github.com/apache/spark/pull/41211#discussion_r1198937746 ## python/pyspark/pandas/tests/data_type_ops/test_date_ops.py: ## @@ -61,6 +63,10 @@ def test_add(self): for psser in self.pssers:

[GitHub] [spark] anigos commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
anigos commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554814355 This was a small but much-needed fix, as the old message confused developers. Thanks @dongjoon-hyun.

[GitHub] [spark] zhenlineo commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
zhenlineo commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1554804591 I checked locally. MiMa 1.1.2 can find errors about missing private classes e.g. `private[sql] object Dataset` ``` object org.apache.spark.sql.Dataset does not have a

[GitHub] [spark] dongjoon-hyun closed pull request #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString` URL: https://github.com/apache/spark/pull/41234

[GitHub] [spark] dongjoon-hyun commented on pull request #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41234: URL: https://github.com/apache/spark/pull/41234#issuecomment-1554761464 Thank you again, @LuciferYang! Merged to branch-3.3.

[GitHub] [spark] srowen closed pull request #40398: [MINOR][DOCS] Update `translate` docblock

2023-05-19 Thread via GitHub
srowen closed pull request #40398: [MINOR][DOCS] Update `translate` docblock URL: https://github.com/apache/spark/pull/40398

[GitHub] [spark] wankunde opened a new pull request, #41237: [SPARK-43593][SQL] Support the minimum number of range shuffle partitions

2023-05-19 Thread via GitHub
wankunde opened a new pull request, #41237: URL: https://github.com/apache/spark/pull/41237 ### What changes were proposed in this pull request? If there are few distinct values in the RangePartitioner, there will be very few partitions that could be very large. We can

[GitHub] [spark] srowen commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-19 Thread via GitHub
srowen commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1554550499 Seems reasonable then. Let's just get the tests to run again.

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198915674 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -130,6 +134,23 @@ case class LambdaFunction(

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198914316 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -130,6 +134,23 @@ case class LambdaFunction(

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198880920 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -149,9 +149,13 @@ class EquivalentExpressions( //

[GitHub] [spark] panbingkun commented on a diff in pull request #41214: [SPARK-43549][SQL] Convert `_LEGACY_ERROR_TEMP_0036` to INVALID_SQL_SYNTAX

2023-05-19 Thread via GitHub
panbingkun commented on code in PR #41214: URL: https://github.com/apache/spark/pull/41214#discussion_r1198690104 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -407,8 +407,8 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198873238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends

[GitHub] [spark] panbingkun opened a new pull request, #41236: [SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013

2023-05-19 Thread via GitHub
panbingkun opened a new pull request, #41236: URL: https://github.com/apache/spark/pull/41236 ### What changes were proposed in this pull request? The PR aims to assign a name to the error class _LEGACY_ERROR_TEMP_0013. ### Why are the changes needed? The changes improve the

[GitHub] [spark] pan3793 commented on pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2023-05-19 Thread via GitHub
pan3793 commented on PR #37483: URL: https://github.com/apache/spark/pull/37483#issuecomment-1554421469 cc @yaooqinn @cloud-fan

[GitHub] [spark] pan3793 commented on pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2023-05-19 Thread via GitHub
pan3793 commented on PR #37483: URL: https://github.com/apache/spark/pull/37483#issuecomment-1554409912 IMO we need to partially backport this patch to branch-3.3. The base64 function behavior has changed since SPARK-37820 (3.3.0), which causes some queries, e.g. `select unbase64("abcs==")`,

[GitHub] [spark] LuciferYang commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-19 Thread via GitHub
LuciferYang commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1198789118 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -45,12 +53,36 @@ object functions { messageName: String,

[GitHub] [spark] LuciferYang commented on pull request #40925: [SPARK-43246][BUILD] Ignore `privateClasses` and `privateMembers` from connect mima check as default

2023-05-19 Thread via GitHub
LuciferYang commented on PR #40925: URL: https://github.com/apache/spark/pull/40925#issuecomment-1554342893 > Can you make sure we don't exclude too many cases? Will double check this later

[GitHub] [spark] LuciferYang opened a new pull request, #41235: [SPARK-43590][CONNECT] Make `connect-jvm-client-mima-check` to support mima check with `protobuf` module

2023-05-19 Thread via GitHub
LuciferYang opened a new pull request, #41235: URL: https://github.com/apache/spark/pull/41235 ### What changes were proposed in this pull request? This PR makes `connect-jvm-client-mima-check` support MiMa checks between the `connect-client-jvm` and `protobuf` modules. ### Why

[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

2023-05-19 Thread via GitHub
beliefer commented on PR #40782: URL: https://github.com/apache/spark/pull/40782#issuecomment-1554336985 @ueshin @hvanhovell Recently, https://github.com/apache/spark/pull/41064 added the rowCount statistics to `LocalRelation`. In this PR, @ueshin also suggested adding the row count as

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
HeartSaVioR commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198779061 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
HeartSaVioR commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198581431 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -56,6 +56,15 @@ class RocksDB( hadoopConf: Configuration = new

[GitHub] [spark] LuciferYang commented on pull request #41233: [SPARK-42958][CONNECT][FOLLOWUP] Correct the parameter passed to `checkMiMaCompatibilityWithAvroModule` to `avroJar`

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41233: URL: https://github.com/apache/spark/pull/41233#issuecomment-1554318417 cc @HyukjinKwon FYI

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #41234: URL: https://github.com/apache/spark/pull/41234 ### What changes were proposed in this pull request? This is a backport of #41232. This PR aims to fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

[GitHub] [spark] LuciferYang opened a new pull request, #41233: [SPARK-42958][CONNECT][FOLLOWUP] Correct the parameter passed to `checkMiMaCompatibilityWithAvroModule` as `avroJar`

2023-05-19 Thread via GitHub
LuciferYang opened a new pull request, #41233: URL: https://github.com/apache/spark/pull/41233 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554293573 Merged to master/3.4

[GitHub] [spark] dongjoon-hyun closed pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString` URL: https://github.com/apache/spark/pull/41232

[GitHub] [spark] LuciferYang commented on pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41231: URL: https://github.com/apache/spark/pull/41231#issuecomment-1554277884 Thanks @dongjoon-hyun @yaooqinn @panbingkun

[GitHub] [spark] dongjoon-hyun commented on pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41231: URL: https://github.com/apache/spark/pull/41231#issuecomment-1554275933 Thank you all! Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] dongjoon-hyun closed pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5 URL: https://github.com/apache/spark/pull/41231 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] beliefer commented on pull request #41212: [SPARK-43573][BUILD] Make SparkBuilder could config the heap size of test JVM.

2023-05-19 Thread via GitHub
beliefer commented on PR #41212: URL: https://github.com/apache/spark/pull/41212#issuecomment-1554271802 > Do you think you can make this new test environment variable works for both Maven and SBT, @beliefer ? AFAIK, `SparkBuilder` only used for SBT. -- This is an automated

[GitHub] [spark] peter-toth commented on pull request #41119: [SPARK-42551][SQL] Support more subexpression elimination cases

2023-05-19 Thread via GitHub
peter-toth commented on PR #41119: URL: https://github.com/apache/spark/pull/41119#issuecomment-1554266843 > Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks Hi @wankunde, thanks for pinging me. I can take a look at this PR sometime next week... -- This is an

[GitHub] [spark] panbingkun commented on a diff in pull request #41214: [SPARK-43549][SQL] Convert `_LEGACY_ERROR_TEMP_0036` to INVALID_SQL_SYNTAX

2023-05-19 Thread via GitHub
panbingkun commented on code in PR #41214: URL: https://github.com/apache/spark/pull/41214#discussion_r1198690104 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -407,8 +407,8 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1554208491 Oh, +1 for @LuciferYang 's comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] LuciferYang commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1554205847 cc@zhenlineo I remember you mentioned a bug in mima 1.1.1: `where the MiMa will not be able to check the class methods if the object is marked private`, so Spark have been using

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41176: [SPARK-43516] [ML] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator

2023-05-19 Thread via GitHub
WeichenXu123 commented on code in PR #41176: URL: https://github.com/apache/spark/pull/41176#discussion_r1198664831 ## python/pyspark/mlv2/feature.py: ## @@ -0,0 +1,127 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] MaxGekk closed pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206]

2023-05-19 Thread via GitHub
MaxGekk closed pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206] URL: https://github.com/apache/spark/pull/41020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] MaxGekk commented on pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206]

2023-05-19 Thread via GitHub
MaxGekk commented on PR #41020: URL: https://github.com/apache/spark/pull/41020#issuecomment-1554182276 +1, LGTM. Merging to master. Thank you, @imback82. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554166101 Thank you so much, @LuciferYang ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhengruifeng commented on pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
zhengruifeng commented on PR #41188: URL: https://github.com/apache/spark/pull/41188#issuecomment-1554162004 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
zhengruifeng closed pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization URL: https://github.com/apache/spark/pull/41188 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198625757 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -362,6 +423,7 @@ class RocksDBFileManager( }

[GitHub] [spark] MaxGekk commented on pull request #41205: [WIP] [SPARK-43542] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming query with

2023-05-19 Thread via GitHub
MaxGekk commented on PR #41205: URL: https://github.com/apache/spark/pull/41205#issuecomment-1554129823 @ericm-db Could you allow GitHub actions in your fork and re-trigger GAs, please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] MaxGekk commented on a diff in pull request #41205: [WIP] [SPARK-43542] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming

2023-05-19 Thread via GitHub
MaxGekk commented on code in PR #41205: URL: https://github.com/apache/spark/pull/41205#discussion_r1198618103 ## core/src/main/resources/error/error-classes.json: ## @@ -202,6 +202,13 @@ "Another instance of this query was just started by a concurrent session." ]

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198620563 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -334,25 +405,59 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198619715 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -286,44 +343,58 @@ class RocksDB( */ def commit(): Long =

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198619375 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -164,9 +194,34 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] justaparth commented on pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
justaparth commented on PR #41188: URL: https://github.com/apache/spark/pull/41188#issuecomment-1554108357 cc @HyukjinKwon would you mind taking a look and merging this one? thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] panbingkun commented on pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003

2023-05-19 Thread via GitHub
panbingkun commented on PR #41200: URL: https://github.com/apache/spark/pull/41200#issuecomment-1554099340 > @panbingkun Could you wrap `op` by `toSQLStmt()` at: > >

[GitHub] [spark] mrmadira commented on pull request #39474: [SPARK-41958][CORE] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-05-19 Thread via GitHub
mrmadira commented on PR #39474: URL: https://github.com/apache/spark/pull/39474#issuecomment-1554098087 Hi - Is it possible to get a backporting to Spark 3.3 for this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] Hisoka-X commented on pull request #41156: [SPARK-40129][SQL] Fix Decimal multiply can produce the wrong answer

2023-05-19 Thread via GitHub
Hisoka-X commented on PR #41156: URL: https://github.com/apache/spark/pull/41156#issuecomment-1554096090 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554094150 Could you review this PR, @LuciferYang ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] rangadi commented on pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
rangadi commented on PR #41188: URL: https://github.com/apache/spark/pull/41188#issuecomment-1554081799 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #41232: URL: https://github.com/apache/spark/pull/41232 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun commented on pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41229: URL: https://github.com/apache/spark/pull/41229#issuecomment-1554075901 Merged to master/3.4/3.3 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] MaxGekk commented on a diff in pull request #41214: [SPARK-43549][SQL] Convert `_LEGACY_ERROR_TEMP_0036` to INVALID_SQL_SYNTAX

2023-05-19 Thread via GitHub
MaxGekk commented on code in PR #41214: URL: https://github.com/apache/spark/pull/41214#discussion_r1198595578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -407,8 +407,8 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] dongjoon-hyun closed pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM URL: https://github.com/apache/spark/pull/41229 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use
