[GitHub] [spark] yaooqinn commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-19 Thread via GitHub
yaooqinn commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1554062154 The Hadoop configurations can be propagated after https://github.com/apache/spark/pull/27735. And putting and locating extra configuration files in SPARK_HOME/conf is also a suggested

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #41232: URL: https://github.com/apache/spark/pull/41232 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] zhengruifeng closed pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
zhengruifeng closed pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization URL: https://github.com/apache/spark/pull/41188 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] zhengruifeng commented on pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
zhengruifeng commented on PR #41188: URL: https://github.com/apache/spark/pull/41188#issuecomment-1554162004 merged to master

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198576125 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -149,10 +179,10 @@ class RocksDB( } else {

[GitHub] [spark] rangadi commented on pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
rangadi commented on PR #41188: URL: https://github.com/apache/spark/pull/41188#issuecomment-1554081799 Thank you!

[GitHub] [spark] panbingkun commented on pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003

2023-05-19 Thread via GitHub
panbingkun commented on PR #41200: URL: https://github.com/apache/spark/pull/41200#issuecomment-1554099340 > @panbingkun Could you wrap `op` by `toSQLStmt()` at: > >

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198619375 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -164,9 +194,34 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198619715 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -286,44 +343,58 @@ class RocksDB( */ def commit(): Long =

[GitHub] [spark] MaxGekk commented on a diff in pull request #41205: [WIP] [SPARK-43542] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming

2023-05-19 Thread via GitHub
MaxGekk commented on code in PR #41205: URL: https://github.com/apache/spark/pull/41205#discussion_r1198618103 ## core/src/main/resources/error/error-classes.json: ## @@ -202,6 +202,13 @@ "Another instance of this query was just started by a concurrent session." ]

[GitHub] [spark] MaxGekk closed pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206]

2023-05-19 Thread via GitHub
MaxGekk closed pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206] URL: https://github.com/apache/spark/pull/41020

[GitHub] [spark] dongjoon-hyun commented on pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41229: URL: https://github.com/apache/spark/pull/41229#issuecomment-1554052891 Thank you, @LuciferYang ! :)

[GitHub] [spark] LuciferYang commented on pull request #41227: [CONNECT] Refactor test case in UserDefinedFunctionE2ETestSuite to reproduce NPE

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41227: URL: https://github.com/apache/spark/pull/41227#issuecomment-1554067227 Expected NPE occurred: ``` Warning: Unable to serialize throwable of type io.grpc.StatusRuntimeException for TestFailed(Ordinal(0, 271),INTERNAL: Job aborted due to stage

[GitHub] [spark] dongjoon-hyun closed pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM URL: https://github.com/apache/spark/pull/41229

[GitHub] [spark] justaparth commented on pull request #41188: [SPARK-43361][PROTOBUF] update documentation for errors related to enum serialization

2023-05-19 Thread via GitHub
justaparth commented on PR #41188: URL: https://github.com/apache/spark/pull/41188#issuecomment-1554108357 cc @HyukjinKwon would you mind taking a look and merging this one? thanks

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198620563 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -334,25 +405,59 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] dongjoon-hyun commented on pull request #41229: [SPARK-43587][CORE][TESTS] Run `HealthTrackerIntegrationSuite` in a dedicated JVM

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41229: URL: https://github.com/apache/spark/pull/41229#issuecomment-1554075901 Merged to master/3.4/3.3

[GitHub] [spark] MaxGekk commented on a diff in pull request #41214: [SPARK-43549][SQL] Convert `_LEGACY_ERROR_TEMP_0036` to INVALID_SQL_SYNTAX

2023-05-19 Thread via GitHub
MaxGekk commented on code in PR #41214: URL: https://github.com/apache/spark/pull/41214#discussion_r1198595578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -407,8 +407,8 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554094150 Could you review this PR, @LuciferYang ?

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554166101 Thank you so much, @LuciferYang !

[GitHub] [spark] panbingkun commented on a diff in pull request #41214: [SPARK-43549][SQL] Convert `_LEGACY_ERROR_TEMP_0036` to INVALID_SQL_SYNTAX

2023-05-19 Thread via GitHub
panbingkun commented on code in PR #41214: URL: https://github.com/apache/spark/pull/41214#discussion_r1198690104 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -407,8 +407,8 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198577882 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala: ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-19 Thread via GitHub
cloud-fan commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1198584893 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -368,7 +369,7 @@ class AstBuilder extends

[GitHub] [spark] wankunde commented on pull request #41119: [SPARK-42551][SQL] Support more subexpression elimination cases

2023-05-19 Thread via GitHub
wankunde commented on PR #41119: URL: https://github.com/apache/spark/pull/41119#issuecomment-1554063667 Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks

[GitHub] [spark] mrmadira commented on pull request #39474: [SPARK-41958][CORE] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-05-19 Thread via GitHub
mrmadira commented on PR #39474: URL: https://github.com/apache/spark/pull/39474#issuecomment-1554098087 Hi - Is it possible to get a backporting to Spark 3.3 for this?

[GitHub] [spark] Hisoka-X commented on pull request #41156: [SPARK-40129][SQL] Fix Decimal multiply can produce the wrong answer

2023-05-19 Thread via GitHub
Hisoka-X commented on PR #41156: URL: https://github.com/apache/spark/pull/41156#issuecomment-1554096090 cc @cloud-fan

[GitHub] [spark] MaxGekk commented on pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206]

2023-05-19 Thread via GitHub
MaxGekk commented on PR #41020: URL: https://github.com/apache/spark/pull/41020#issuecomment-1554182276 +1, LGTM. Merging to master. Thank you, @imback82.

[GitHub] [spark] dongjoon-hyun commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1554208491 Oh, +1 for @LuciferYang 's comment.

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-19 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1198579470 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -368,7 +369,7 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef]

[GitHub] [spark] MaxGekk commented on pull request #41205: [WIP] [SPARK-43542] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming query with

2023-05-19 Thread via GitHub
MaxGekk commented on PR #41205: URL: https://github.com/apache/spark/pull/41205#issuecomment-1554129823 @ericm-db Could you allow GitHub actions in your fork and re-trigger GAs, please.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41176: [SPARK-43516] [ML] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator

2023-05-19 Thread via GitHub
WeichenXu123 commented on code in PR #41176: URL: https://github.com/apache/spark/pull/41176#discussion_r1198664831 ## python/pyspark/mlv2/feature.py: ## @@ -0,0 +1,127 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] LuciferYang commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1554205847 cc@zhenlineo I remember you mentioned a bug in mima 1.1.1: `where the MiMa will not be able to check the class methods if the object is marked private`, so Spark have been using

[GitHub] [spark] rangadi commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-19 Thread via GitHub
rangadi commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1198582814 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -45,12 +53,36 @@ object functions { messageName: String,

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198625757 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -362,6 +423,7 @@ class RocksDBFileManager( }

[GitHub] [spark] zhenlineo commented on pull request #41226: [SPARK-43584][BUILD] Update `sbt-assembly`, `sbt-revolver`, `sbt-mima-plugin` plugins

2023-05-19 Thread via GitHub
zhenlineo commented on PR #41226: URL: https://github.com/apache/spark/pull/41226#issuecomment-1554804591 I checked locally. MiMa 1.1.2 can find errors about missing private classes e.g. `private[sql] object Dataset` ``` object org.apache.spark.sql.Dataset does not have a

[GitHub] [spark] anigos commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
anigos commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554814355 This was small but much needed as it confuses developers. Thanks @dongjoon-hyun .

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #41211: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0

2023-05-19 Thread via GitHub
bjornjorgensen commented on code in PR #41211: URL: https://github.com/apache/spark/pull/41211#discussion_r1198937746 ## python/pyspark/pandas/tests/data_type_ops/test_date_ops.py: ## @@ -61,6 +63,10 @@ def test_add(self): for psser in self.pssers:

[GitHub] [spark] dongjoon-hyun commented on pull request #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41234: URL: https://github.com/apache/spark/pull/41234#issuecomment-1554761464 Thank you again, @LuciferYang ! Merged to branch-3.3.

[GitHub] [spark] dongjoon-hyun closed pull request #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString` URL: https://github.com/apache/spark/pull/41234

[GitHub] [spark] tgravescs commented on a diff in pull request #41173: [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers

2023-05-19 Thread via GitHub
tgravescs commented on code in PR #41173: URL: https://github.com/apache/spark/pull/41173#discussion_r1199162022 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: ## @@ -780,7 +771,7 @@ private[yarn] class YarnAllocator(

[GitHub] [spark] otterc commented on pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-19 Thread via GitHub
otterc commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1554906573 > Do you think when `spark.network.crypto.saslFallback=true` and L95 from `AuthRpcHandler.java`. > > ``` > saslHandler = new SaslRpcHandler(conf, channel, null, secretKeyHolder);

[GitHub] [spark] wankunde opened a new pull request, #41237: [SPARK-43593][SQL] Support the minimum number of range shuffle partitions

2023-05-19 Thread via GitHub
wankunde opened a new pull request, #41237: URL: https://github.com/apache/spark/pull/41237 ### What changes were proposed in this pull request? If there are few distinct values in the RangePartitioner, there will be very few partitions that could be very large. We can

[GitHub] [spark] srowen closed pull request #40398: [MINOR][DOCS] Update `translate` docblock

2023-05-19 Thread via GitHub
srowen closed pull request #40398: [MINOR][DOCS] Update `translate` docblock URL: https://github.com/apache/spark/pull/40398

[GitHub] [spark] LuciferYang commented on pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41231: URL: https://github.com/apache/spark/pull/41231#issuecomment-1554277884 Thanks @dongjoon-hyun @yaooqinn @panbingkun

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198873238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends

[GitHub] [spark] panbingkun opened a new pull request, #41236: [SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013

2023-05-19 Thread via GitHub
panbingkun opened a new pull request, #41236: URL: https://github.com/apache/spark/pull/41236 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_0013. ### Why are the changes needed? The changes improve the

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198914316 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -130,6 +134,23 @@ case class LambdaFunction(

[GitHub] [spark] dongjoon-hyun commented on pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41232: URL: https://github.com/apache/spark/pull/41232#issuecomment-1554293573 Merged to master/3.4

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
HeartSaVioR commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198779061 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] pan3793 commented on pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2023-05-19 Thread via GitHub
pan3793 commented on PR #37483: URL: https://github.com/apache/spark/pull/37483#issuecomment-1554421469 cc @yaooqinn @cloud-fan

[GitHub] [spark] peter-toth commented on pull request #41119: [SPARK-42551][SQL] Support more subexpression elimination cases

2023-05-19 Thread via GitHub
peter-toth commented on PR #41119: URL: https://github.com/apache/spark/pull/41119#issuecomment-1554266843 > Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks Hi @wankunde, thanks for pinging me. I can take a look at this PR sometime next week...

[GitHub] [spark] dongjoon-hyun commented on pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-19 Thread via GitHub
dongjoon-hyun commented on PR #41231: URL: https://github.com/apache/spark/pull/41231#issuecomment-1554275933 Thank you all! Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] LuciferYang opened a new pull request, #41233: [SPARK-42958][CONNECT][FOLLOWUP] Correct the parameter passed to `checkMiMaCompatibilityWithAvroModule` as `avroJar`

2023-05-19 Thread via GitHub
LuciferYang opened a new pull request, #41233: URL: https://github.com/apache/spark/pull/41233 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] LuciferYang commented on pull request #41233: [SPARK-42958][CONNECT][FOLLOWUP] Correct the parameter passed to `checkMiMaCompatibilityWithAvroModule` to `avroJar`

2023-05-19 Thread via GitHub
LuciferYang commented on PR #41233: URL: https://github.com/apache/spark/pull/41233#issuecomment-1554318417 cc @HyukjinKwon FYI

[GitHub] [spark] LuciferYang opened a new pull request, #41235: [SPARK-43590][CONNECT] Make `connect-jvm-client-mima-check` to support mima check with `protobuf` module

2023-05-19 Thread via GitHub
LuciferYang opened a new pull request, #41235: URL: https://github.com/apache/spark/pull/41235 ### What changes were proposed in this pull request? This pr make `connect-jvm-client-mima-check` to support mima check between `connect-client-jvm` and `protobuf` module. ### Why

[GitHub] [spark] dongjoon-hyun closed pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5 URL: https://github.com/apache/spark/pull/41231

[GitHub] [spark] pan3793 commented on pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2023-05-19 Thread via GitHub
pan3793 commented on PR #37483: URL: https://github.com/apache/spark/pull/37483#issuecomment-1554409912 IMO we need to partially backport this patch to branch-3.3. The base64 function behavior changed since SPARK-37820 (3.3.0), causes some queries, e.g. `select unbase64("abcs==")`,

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198880920 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -149,9 +149,13 @@ class EquivalentExpressions( //

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
HeartSaVioR commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198581431 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -56,6 +56,15 @@ class RocksDB( hadoopConf: Configuration = new

[GitHub] [spark] LuciferYang commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-19 Thread via GitHub
LuciferYang commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1198789118 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -45,12 +53,36 @@ object functions { messageName: String,

[GitHub] [spark] dongjoon-hyun closed pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun closed pull request #41232: [SPARK-43589][SQL] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString` URL: https://github.com/apache/spark/pull/41232

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41234: [SPARK-43589][SQL][3.3] Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

2023-05-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #41234: URL: https://github.com/apache/spark/pull/41234 ### What changes were proposed in this pull request? This is a backporting of #41232 This PR aims to fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`

[GitHub] [spark] srowen commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-19 Thread via GitHub
srowen commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1554550499 Seems reasonable then. Let's just get the tests to run again.

[GitHub] [spark] beliefer commented on pull request #41212: [SPARK-43573][BUILD] Make SparkBuilder could config the heap size of test JVM.

2023-05-19 Thread via GitHub
beliefer commented on PR #41212: URL: https://github.com/apache/spark/pull/41212#issuecomment-1554271802 > Do you think you can make this new test environment variable works for both Maven and SBT, @beliefer ? AFAIK, `SparkBuilder` only used for SBT.

[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

2023-05-19 Thread via GitHub
beliefer commented on PR #40782: URL: https://github.com/apache/spark/pull/40782#issuecomment-1554336985 @ueshin @hvanhovell Recently, https://github.com/apache/spark/pull/41064 added the rowCount statistics to `LocalRelation`. In this PR, @ueshin also suggested to add the row count as

[GitHub] [spark] LuciferYang commented on pull request #40925: [SPARK-43246][BUILD] Ignore `privateClasses` and `privateMembers` from connect mima check as default

2023-05-19 Thread via GitHub
LuciferYang commented on PR #40925: URL: https://github.com/apache/spark/pull/40925#issuecomment-1554342893 > Can you make sure we don't exclude too many cases? Will double check this later

[GitHub] [spark] Kimahriman commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-19 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198915674 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -130,6 +134,23 @@ case class LambdaFunction(

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199562650 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] kaiubiferreira opened a new pull request, #41239: Span array function

2023-05-19 Thread via GitHub
kaiubiferreira opened a new pull request, #41239: URL: https://github.com/apache/spark/pull/41239 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] kaiubiferreira closed pull request #41239: Span array function

2023-05-19 Thread via GitHub
kaiubiferreira closed pull request #41239: Span array function URL: https://github.com/apache/spark/pull/41239

[GitHub] [spark] MaxGekk closed pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003

2023-05-19 Thread via GitHub
MaxGekk closed pull request #41200: [SPARK-43539][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0003 URL: https://github.com/apache/spark/pull/41200

[GitHub] [spark] xinrong-meng closed pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread via GitHub
xinrong-meng closed pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF URL: https://github.com/apache/spark/pull/41147

[GitHub] [spark] xinrong-meng commented on pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread via GitHub
xinrong-meng commented on PR #41147: URL: https://github.com/apache/spark/pull/41147#issuecomment-1555302733 Merged to master, thank you!

[GitHub] [spark] ueshin opened a new pull request, #41240: [SPARK-43545][SQL][PYTHON] Support nested timestamp type

2023-05-19 Thread via GitHub
ueshin opened a new pull request, #41240: URL: https://github.com/apache/spark/pull/41240 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

[GitHub] [spark] github-actions[bot] closed pull request #39838: [SPARK-42270][SQL] Improve sort merge join stability with large stream side

2023-05-19 Thread via GitHub
github-actions[bot] closed pull request #39838: [SPARK-42270][SQL] Improve sort merge join stability with large stream side URL: https://github.com/apache/spark/pull/39838

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser

2023-05-19 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1199511465 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,42 @@ class AstBuilder extends

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199516321 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala: ## @@ -177,6 +185,33 @@ class RocksDBStateStoreSuite

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199516457 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -164,9 +194,34 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] panbingkun opened a new pull request, #41241: [SPARK-43597][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0017

2023-05-19 Thread via GitHub
panbingkun opened a new pull request, #41241: URL: https://github.com/apache/spark/pull/41241 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_0017. ### Why are the changes needed? The changes improve the

[GitHub] [spark] xinrong-meng commented on pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-19 Thread via GitHub
xinrong-meng commented on PR #41147: URL: https://github.com/apache/spark/pull/41147#issuecomment-1555303500 Please feel free to leave comments, if any; I'll address them in follow-ups.

[GitHub] [spark] ericm-db commented on pull request #41205: [SPARK-43542] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming query with same

2023-05-19 Thread via GitHub
ericm-db commented on PR #41205: URL: https://github.com/apache/spark/pull/41205#issuecomment-1555324959 Thanks for the review! I've made the changes, and I think it's ready to merge now @MaxGekk @HeartSaVioR

[GitHub] [spark] rangadi commented on pull request #40959: [SPARK-43511][CONNECT][SS]Implemented MapGroupsWithState and FlatMapGroupsWithState APIs for Spark Connect

2023-05-19 Thread via GitHub
rangadi commented on PR #40959: URL: https://github.com/apache/spark/pull/40959#issuecomment-1555381692 > Not tested yet, will perform the test when I'm back. Is this tested yet? Could you update the PR description?

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199513814 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -280,34 +342,34 @@ class RocksDBFileManager( val

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199516278 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -280,34 +342,34 @@ class RocksDBFileManager( val

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515952 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] github-actions[bot] closed pull request #39861: [WIP][SPARK-42291] Enable dropping of columns for non V2 tables

2023-05-19 Thread via GitHub
github-actions[bot] closed pull request #39861: [WIP][SPARK-42291] Enable dropping of columns for non V2 tables URL: https://github.com/apache/spark/pull/39861

[GitHub] [spark] github-actions[bot] commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

2023-05-19 Thread via GitHub
github-actions[bot] commented on PR #39825: URL: https://github.com/apache/spark/pull/39825#issuecomment-1555391404 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515734 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -129,17 +140,36 @@ class RocksDB( * Note that this will copy

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515925 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -205,19 +229,39 @@ class RocksDBFileManager(

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515576 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -56,6 +56,15 @@ class RocksDB( hadoopConf: Configuration =

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199515899 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private

[GitHub] [spark] wangyum commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-19 Thread via GitHub
wangyum commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1555428658 @Kimahriman Do you have a way to reproduce?

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1199194346 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -134,6 +137,27 @@ class RocksDBFileManager( private
