[GitHub] [spark] otterc commented on pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
otterc commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1553952178 > I'm wondering if this can unblock [SPARK-36744](https://issues.apache.org/jira/browse/SPARK-36744) (Support IO encryption for push-based shuffle). If not, do you happen to know what are

[GitHub] [spark] manuzhang commented on pull request #41173: [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers

2023-05-18 Thread via GitHub
manuzhang commented on PR #41173: URL: https://github.com/apache/spark/pull/41173#issuecomment-1553978308 @tgravescs we have seen it quite often when YARN queues were full and the containers were **immediately preempted after launch**. I've updated the PR keeping track of containers

[GitHub] [spark] dongjoon-hyun commented on pull request #41229: [SPARK-43587][CORE][TESTS] Run HealthTrackerIntegrationSuite in a dedicated JVM

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41229: URL: https://github.com/apache/spark/pull/41229#issuecomment-1553998773 Could you review this please, @HyukjinKwon? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] anishshri-db commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-18 Thread via GitHub
anishshri-db commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198557839 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala: ## @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] otterc opened a new pull request, #41225: [SPARK-43583] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
otterc opened a new pull request, #41225: URL: https://github.com/apache/spark/pull/41225 ### What changes were proposed in this pull request? This change fixes a bug with push-based shuffle when encryption is enabled on the server. The meta requests for push-merged blocks fail with NPE:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41176: [SPARK-43516] [ML] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator

2023-05-18 Thread via GitHub
zhengruifeng commented on code in PR #41176: URL: https://github.com/apache/spark/pull/41176#discussion_r1198449915 ## python/pyspark/mlv2/base.py: ## @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

[GitHub] [spark] wangyum commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
wangyum commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553945799 OK. so -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] wangyum closed pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
wangyum closed pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9 URL: https://github.com/apache/spark/pull/41219 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198528181 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -130,6 +134,23 @@ case class LambdaFunction( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198529114 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends

[GitHub] [spark] wangyum commented on pull request #41213: [SPARK-43572][SQL][TEST] Add a test for scrollable result set through thrift server

2023-05-18 Thread via GitHub
wangyum commented on PR #41213: URL: https://github.com/apache/spark/pull/41213#issuecomment-1553995933 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] srielau commented on pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-18 Thread via GitHub
srielau commented on PR #41007: URL: https://github.com/apache/spark/pull/41007#issuecomment-1553790687 @dtenedor There are still 30 multipartIdentifier usages that do NOT support IDENTIFIER() notation. So we would trade mechanical churn in the grammar for code changes in AstBuilder

[GitHub] [spark] cloud-fan commented on a diff in pull request #41216: [SPARK-43383][SQL][FOLLOWUP] LocalRelation should not report row count in tests

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #41216: URL: https://github.com/apache/spark/pull/41216#discussion_r1198414077 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala: ## @@ -78,9 +79,18 @@ case class LocalRelation( } } -

[GitHub] [spark] dongjoon-hyun commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553857614 Well, `cyclonedx-maven-plugin` is not a test case. Also, it's used in the release process, not at runtime. > I'd like to find a failed test case to add to

[GitHub] [spark] dongjoon-hyun commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553861234 Maybe, you want to hide `cyclonedx-maven-plugin` under a profile? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] panbingkun opened a new pull request, #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
panbingkun opened a new pull request, #41224: URL: https://github.com/apache/spark/pull/41224 ### What changes were proposed in this pull request? This PR aims to upgrade `sbt-pom-reader` from 2.2.0 to 2.4.0. ### Why are the changes needed? Since v2.3.0, organization has moved

[GitHub] [spark] LuciferYang commented on pull request #41218: [SPARK-43576][CORE] Remove unused declarations from Core module

2023-05-18 Thread via GitHub
LuciferYang commented on PR #41218: URL: https://github.com/apache/spark/pull/41218#issuecomment-1553924881 > IIRC Currently, we have only added compilation checks on `-Ywarn-unused:imports` for Scala 2.12 (the behavior of Scala 2.13 is slightly different).

[GitHub] [spark] advancedxy commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-18 Thread via GitHub
advancedxy commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1198495651 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala: ## @@ -26,14 +26,14 @@ import

[GitHub] [spark] dongjoon-hyun closed pull request #41216: [SPARK-43383][SQL][FOLLOWUP] LocalRelation should not report row count in tests

2023-05-18 Thread via GitHub
dongjoon-hyun closed pull request #41216: [SPARK-43383][SQL][FOLLOWUP] LocalRelation should not report row count in tests URL: https://github.com/apache/spark/pull/41216 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun commented on pull request #41216: [SPARK-43383][SQL][FOLLOWUP] LocalRelation should not report row count in tests

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41216: URL: https://github.com/apache/spark/pull/41216#issuecomment-1553938639 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #41178: [SPARK-43519][BUILD][SQL] Bump Parquet to 1.13.1

2023-05-18 Thread via GitHub
dongjoon-hyun closed pull request #41178: [SPARK-43519][BUILD][SQL] Bump Parquet to 1.13.1 URL: https://github.com/apache/spark/pull/41178 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang opened a new pull request, #41227: [WIP] Refactor test case in UserDefinedFunctionE2ETestSuite

2023-05-18 Thread via GitHub
LuciferYang opened a new pull request, #41227: URL: https://github.com/apache/spark/pull/41227 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] otterc commented on a diff in pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
otterc commented on code in PR #41225: URL: https://github.com/apache/spark/pull/41225#discussion_r1198505545 ## common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java: ## @@ -139,8 +139,4 @@ protected boolean doAuthChallenge( return true;

[GitHub] [spark] anishshri-db commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-18 Thread via GitHub
anishshri-db commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198555761 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -286,44 +343,58 @@ class RocksDB( */ def commit(): Long = {

[GitHub] [spark] anishshri-db commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-18 Thread via GitHub
anishshri-db commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198555415 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -164,9 +194,34 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] dongjoon-hyun commented on pull request #41229: [SPARK-43587][CORE][TESTS] Run HealthTrackerIntegrationSuite in a dedicated JVM

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41229: URL: https://github.com/apache/spark/pull/41229#issuecomment-1554045519 Could you review this, @LuciferYang ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] srielau commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-18 Thread via GitHub
srielau commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1198398925 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -368,7 +369,7 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef]

[GitHub] [spark] dongjoon-hyun commented on pull request #41221: [SPARK-43541][SQL][3.3] Propagate all `Project` tags in resolving of expressions and missing columns

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41221: URL: https://github.com/apache/spark/pull/41221#issuecomment-1553845924 Merged to branch-3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #41221: [SPARK-43541][SQL][3.3] Propagate all `Project` tags in resolving of expressions and missing columns

2023-05-18 Thread via GitHub
dongjoon-hyun closed pull request #41221: [SPARK-43541][SQL][3.3] Propagate all `Project` tags in resolving of expressions and missing columns URL: https://github.com/apache/spark/pull/41221 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41223: [SPARK-43581][BUILD][K8S] Upgrade `kubernetes-client` to 6.6.2

2023-05-18 Thread via GitHub
dongjoon-hyun opened a new pull request, #41223: URL: https://github.com/apache/spark/pull/41223 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1198426047 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -368,7 +369,7 @@ class AstBuilder extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41223: [SPARK-43581][BUILD][K8S] Upgrade `kubernetes-client` to 6.6.2

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41223: URL: https://github.com/apache/spark/pull/41223#issuecomment-1553926407 Thank you, @HyukjinKwon ! K8s (Unit and Integration) tests passed at the first commit and dependency test passed at the second commit already. Let me merge this. -- This is

[GitHub] [spark] otterc commented on pull request #41225: [SPARK-43583] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
otterc commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1553937512 @mridulm @shuwang21 @zhouyejoe Please review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] cloud-fan commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198526911 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -130,6 +134,23 @@ case class LambdaFunction( override

[GitHub] [spark] yaooqinn commented on pull request #41213: [SPARK-43572][SQL][TEST] Add a test for scrollable result set through thrift server

2023-05-18 Thread via GitHub
yaooqinn commented on PR #41213: URL: https://github.com/apache/spark/pull/41213#issuecomment-1554019805 thanks @wangyum -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41218: [SPARK-43576][CORE] Remove unused declarations from Core module

2023-05-18 Thread via GitHub
HyukjinKwon commented on code in PR #41218: URL: https://github.com/apache/spark/pull/41218#discussion_r1198402967 ## core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala: ## @@ -128,8 +128,6 @@ class HadoopRDD[K, V]( protected val jobConfCacheKey: String =

[GitHub] [spark] github-actions[bot] commented on pull request #39838: [SPARK-42270][SQL] Improve sort merge join stability with large stream side

2023-05-18 Thread via GitHub
github-actions[bot] commented on PR #39838: URL: https://github.com/apache/spark/pull/39838#issuecomment-1553839218 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #39515: [SPARK-38743][SQL][TEST] Test the error class: MISSING_STATIC_PARTITION_COLUMN

2023-05-18 Thread via GitHub
github-actions[bot] closed pull request #39515: [SPARK-38743][SQL][TEST] Test the error class: MISSING_STATIC_PARTITION_COLUMN URL: https://github.com/apache/spark/pull/39515 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] github-actions[bot] closed pull request #38875: [SPARK-40988][SQL][TEST] Test case for insert partition should verify value

2023-05-18 Thread via GitHub
github-actions[bot] closed pull request #38875: [SPARK-40988][SQL][TEST] Test case for insert partition should verify value URL: https://github.com/apache/spark/pull/38875 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] github-actions[bot] commented on pull request #39861: [WIP][SPARK-42291] Enable dropping of columns for non V2 tables

2023-05-18 Thread via GitHub
github-actions[bot] commented on PR #39861: URL: https://github.com/apache/spark/pull/39861#issuecomment-1553839163 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] dongjoon-hyun commented on pull request #41178: [SPARK-43519][BUILD][SQL] Bump Parquet 1.13.1

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41178: URL: https://github.com/apache/spark/pull/41178#issuecomment-1553912457 +1 for @wangyum 's comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] panbingkun commented on pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
panbingkun commented on PR #41224: URL: https://github.com/apache/spark/pull/41224#issuecomment-1553915757 > Oh, could you take a look at the build error? This looks like a breaking change. > > ``` > [error] /home/runner/work/spark/spark/project/SparkBuild.scala:34:21: object sbt

[GitHub] [spark] dongjoon-hyun closed pull request #41223: [SPARK-43581][BUILD][K8S] Upgrade `kubernetes-client` to 6.6.2

2023-05-18 Thread via GitHub
dongjoon-hyun closed pull request #41223: [SPARK-43581][BUILD][K8S] Upgrade `kubernetes-client` to 6.6.2 URL: https://github.com/apache/spark/pull/41223 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] panbingkun opened a new pull request, #41228: [WIP] Upgrade maven plugins

2023-05-18 Thread via GitHub
panbingkun opened a new pull request, #41228: URL: https://github.com/apache/spark/pull/41228 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] dongjoon-hyun closed pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
dongjoon-hyun closed pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0 URL: https://github.com/apache/spark/pull/41224 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41224: URL: https://github.com/apache/spark/pull/41224#issuecomment-1554000199 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun commented on a diff in pull request #41218: [SPARK-43576][CORE] Remove unused declarations from Core module

2023-05-18 Thread via GitHub
panbingkun commented on code in PR #41218: URL: https://github.com/apache/spark/pull/41218#discussion_r1198432812 ## core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala: ## @@ -128,8 +128,6 @@ class HadoopRDD[K, V]( protected val jobConfCacheKey: String =

[GitHub] [spark] dongjoon-hyun commented on pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41224: URL: https://github.com/apache/spark/pull/41224#issuecomment-1553928960 Thank you for update! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun commented on a diff in pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
panbingkun commented on code in PR #41224: URL: https://github.com/apache/spark/pull/41224#discussion_r1198493858 ## project/SparkBuild.scala: ## @@ -31,7 +31,7 @@ import sbt.Keys._ import sbt.librarymanagement.{ VersionNumber, SemanticSelector } import

[GitHub] [spark] panbingkun opened a new pull request, #41226: [SPARK-43584][BUILD] Update some sbt plugins

2023-05-18 Thread via GitHub
panbingkun opened a new pull request, #41226: URL: https://github.com/apache/spark/pull/41226 ### What changes were proposed in this pull request? This PR aims to update some sbt plugins to their newest versions, including: - - - - ### Why are the changes needed? Routine

[GitHub] [spark] dongjoon-hyun commented on pull request #41178: [SPARK-43519][BUILD][SQL] Bump Parquet to 1.13.1

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41178: URL: https://github.com/apache/spark/pull/41178#issuecomment-1553948802 Thank you for updating. Merged to master for Apache Spark 3.5.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41176: [SPARK-43516] [ML] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator

2023-05-18 Thread via GitHub
WeichenXu123 commented on code in PR #41176: URL: https://github.com/apache/spark/pull/41176#discussion_r1198515956 ## python/pyspark/mlv2/tests/test_feature.py: ## @@ -0,0 +1,103 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] LuciferYang opened a new pull request, #41230: [SPARK-43586][SQL] Use the smaller value of `Range.numElements` and `Range.numSlices` as `numSlices` of `RangeExec`

2023-05-18 Thread via GitHub
LuciferYang opened a new pull request, #41230: URL: https://github.com/apache/spark/pull/41230 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] shuwang21 commented on pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
shuwang21 commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1554049148 LGTM. Thanks for your efforts! Do you think when `spark.network.crypto.saslFallback=true` and L95 from `AuthRpcHandler.java`. ``` saslHandler = new SaslRpcHandler(conf,

[GitHub] [spark] wangyum commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
wangyum commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553807202 I'd like to find a failed test case to add to https://github.com/Homebrew/homebrew-core/pull/131189. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on pull request #41222: [SPARK-43580][PYTHON][TESTS] Add `https://dlcdn.apache.org/` to default_sites of get_preferred_mirrors

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41222: URL: https://github.com/apache/spark/pull/41222#issuecomment-1553804886 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun closed pull request #41222: [SPARK-43580][PYTHON][TESTS] Add `https://dlcdn.apache.org/` to default_sites of get_preferred_mirrors

2023-05-18 Thread via GitHub
dongjoon-hyun closed pull request #41222: [SPARK-43580][PYTHON][TESTS] Add `https://dlcdn.apache.org/` to default_sites of get_preferred_mirrors URL: https://github.com/apache/spark/pull/41222 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] huanliwang-db commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-18 Thread via GitHub
huanliwang-db commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198393224 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -149,10 +179,10 @@ class RocksDB( } else {

[GitHub] [spark] wangyum commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
wangyum commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553862383 2.7.9 also contains other bug fixes and improvements: https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.9

[GitHub] [spark] chaoqin-li1123 commented on pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-18 Thread via GitHub
chaoqin-li1123 commented on PR #41099: URL: https://github.com/apache/spark/pull/41099#issuecomment-1553989670 Hi @HyukjinKwon, the tests I added in this PR cause the SQL test CI to time out; is it possible to move the streaming tests to a separate test runner?

[GitHub] [spark] dongjoon-hyun commented on pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41225: URL: https://github.com/apache/spark/pull/41225#issuecomment-1554001410 Thank you for sharing the rich context, @otterc . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] panbingkun commented on pull request #41231: [SPARK-43588][BUILD] Upgrade ASM to 9.5

2023-05-18 Thread via GitHub
panbingkun commented on PR #41231: URL: https://github.com/apache/spark/pull/41231#issuecomment-1554050789 +1, LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HyukjinKwon commented on pull request #41218: [SPARK-43576][CORE] Remove unused declarations from Core module

2023-05-18 Thread via GitHub
HyukjinKwon commented on PR #41218: URL: https://github.com/apache/spark/pull/41218#issuecomment-1553807658 @panbingkun how did you find them? FWIW, I thought we added a compilation feature to disallow unused variables IIRC, @LuciferYang . -- This is an automated message from the

[GitHub] [spark] Stove-hust commented on a diff in pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-05-18 Thread via GitHub
Stove-hust commented on code in PR #40412: URL: https://github.com/apache/spark/pull/40412#discussion_r1198448075 ## core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala: ## @@ -273,7 +273,7 @@ private[spark] class DiskBlockManager(

[GitHub] [spark] panbingkun commented on pull request #41218: [SPARK-43576][CORE] Remove unused declarations from Core module

2023-05-18 Thread via GitHub
panbingkun commented on PR #41218: URL: https://github.com/apache/spark/pull/41218#issuecomment-1553870280 > @panbingkun how did you find them? > > FWIW, I thought we added a compilation feature to disallow unused variables IIRC, @LuciferYang . @HyukjinKwon 1.When I

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41224: [SPARK-43582][BUILD] Upgrade `sbt-pom-reader` to 2.4.0

2023-05-18 Thread via GitHub
dongjoon-hyun commented on code in PR #41224: URL: https://github.com/apache/spark/pull/41224#discussion_r1198474441 ## project/plugins.sbt: ## @@ -43,6 +43,6 @@ libraryDependencies += "org.ow2.asm" % "asm-commons" % "9.4" addSbtPlugin("com.simplytyped" % "sbt-antlr4" %

[GitHub] [spark] dongjoon-hyun commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553903460 Do you think Apache Spark's SBOM artifacts are affected by one of them? > 2.7.9 also contains other bug fixes and improvements: -- This is an automated message from the Apache

[GitHub] [spark] pan3793 commented on pull request #41178: [SPARK-43519][BUILD][SQL] Bump Parquet 1.13.1

2023-05-18 Thread via GitHub
pan3793 commented on PR #41178: URL: https://github.com/apache/spark/pull/41178#issuecomment-1553917876 @wangyum @dongjoon-hyun, the PR description is updated, please take a look again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41225: [SPARK-43583][CORE] get MergedBlockedMetaReqHandler from the delegate instead of the SaslRpcHandler instance

2023-05-18 Thread via GitHub
dongjoon-hyun commented on code in PR #41225: URL: https://github.com/apache/spark/pull/41225#discussion_r1198502341 ## common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java: ## @@ -139,8 +139,4 @@ protected boolean doAuthChallenge( return

[GitHub] [spark] dongjoon-hyun commented on pull request #41219: [SPARK-43577][BUILD] Upgrade cyclonedx-maven-plugin to 2.7.9

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41219: URL: https://github.com/apache/spark/pull/41219#issuecomment-1553947929 Thank you for closing. We can reopen this when `master` branch is ready for `Java 21` and we are able to verify this patch. -- This is an automated message from the Apache Git

[GitHub] [spark] cloud-fan commented on a diff in pull request #34558: [SPARK-37019][SQL] Add codegen support to array higher-order functions

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1198524171 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -149,9 +149,13 @@ class EquivalentExpressions( //

[GitHub] [spark] gengliangwang commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser

2023-05-18 Thread via GitHub
gengliangwang commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1198544619 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,42 @@ class AstBuilder extends

[GitHub] [spark] wangyum closed pull request #41213: [SPARK-43572][SQL][TEST] Add a test for scrollable result set through thrift server

2023-05-18 Thread via GitHub
wangyum closed pull request #41213: [SPARK-43572][SQL][TEST] Add a test for scrollable result set through thrift server URL: https://github.com/apache/spark/pull/41213 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41229: [SPARK-43587][CORE][TESTS] Run HealthTrackerIntegrationSuite in a dedicated JVM

2023-05-18 Thread via GitHub
dongjoon-hyun opened a new pull request, #41229: URL: https://github.com/apache/spark/pull/41229 ### What changes were proposed in this pull request? This PR aims to run `HealthTrackerIntegrationSuite` in a dedicated JVM. ### Why are the changes needed? -

[GitHub] [spark] anishshri-db commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-18 Thread via GitHub
anishshri-db commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1198556470 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -334,25 +405,59 @@ class RocksDB( loadedVersion = -1 //

[GitHub] [spark] pan3793 commented on a diff in pull request #41062: [SPARK-43313][SQL][FOLLOWUP] Improvement for DSv2 API SupportsCustomSchemaWrite

2023-05-18 Thread via GitHub
pan3793 commented on code in PR #41062: URL: https://github.com/apache/spark/pull/41062#discussion_r1197682260 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -18,21 +18,22 @@ package

[GitHub] [spark] pan3793 opened a new pull request, #41217: [SPARK-43575][BUILD][SS] Exclude duplicated classes from kafka assembly jar

2023-05-18 Thread via GitHub
pan3793 opened a new pull request, #41217: URL: https://github.com/apache/spark/pull/41217 ### What changes were proposed in this pull request? Exclude `javax.activation:activation:jar:1.1.1` and `org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.20.0` from

[GitHub] [spark] panbingkun opened a new pull request, #41214: [SPARK-43549][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0036

2023-05-18 Thread via GitHub
panbingkun opened a new pull request, #41214: URL: https://github.com/apache/spark/pull/41214 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_0036. ### Why are the changes needed? The changes improve the

[GitHub] [spark] cloud-fan commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-18 Thread via GitHub
cloud-fan commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1197666951 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon opened a new pull request, #41215: [SPARK-43574][PYTHON] Support to set Python executable in workers during runtime

2023-05-18 Thread via GitHub
HyukjinKwon opened a new pull request, #41215: URL: https://github.com/apache/spark/pull/41215 ### What changes were proposed in this pull request? This PR proposes a new configuration `spark.sql.execution.pyspark.python` that sets the Python executable on worker nodes. Note

[GitHub] [spark] pan3793 commented on pull request #41217: [SPARK-43575][BUILD][SS] Exclude duplicated classes from kafka assembly jar

2023-05-18 Thread via GitHub
pan3793 commented on PR #41217: URL: https://github.com/apache/spark/pull/41217#issuecomment-1552936070 cc @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] beliefer commented on pull request #41212: [SPARK-43573][BUILD] Make SparkBuilder could config the heap size of test JVM.

2023-05-18 Thread via GitHub
beliefer commented on PR #41212: URL: https://github.com/apache/spark/pull/41212#issuecomment-1552886306 ping @dongjoon-hyun @Yikun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun opened a new pull request, #41218: [SPARK-43576][CORE] Remove unused declarations from Core module

2023-05-18 Thread via GitHub
panbingkun opened a new pull request, #41218: URL: https://github.com/apache/spark/pull/41218 ### What changes were proposed in this pull request? The pr aims to remove unused declarations from `Core` module ### Why are the changes needed? Make code clean. ### Does this

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-18 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197607669 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -414,6 +414,9 @@ private[spark] class SparkSubmit extends Logging { // directory too.

[GitHub] [spark] cloud-fan opened a new pull request, #41216: WIP

2023-05-18 Thread via GitHub
cloud-fan opened a new pull request, #41216: URL: https://github.com/apache/spark/pull/41216 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] itholic commented on pull request #40658: [WIP][SPARK-43024][PS] Upgrade pandas to 2.0.0

2023-05-18 Thread via GitHub
itholic commented on PR #40658: URL: https://github.com/apache/spark/pull/40658#issuecomment-1552517228 I have just opened a [new PR](https://github.com/apache/spark/pull/41211) that does not include any behavior changes related to the pandas API on Spark. Since there are already

[GitHub] [spark] itholic opened a new pull request, #41211: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0

2023-05-18 Thread via GitHub
itholic opened a new pull request, #41211: URL: https://github.com/apache/spark/pull/41211 ### What changes were proposed in this pull request? This PR proposes to upgrade pandas to 2.0.0. ### Why are the changes needed? To support latest pandas. ### Does this PR

[GitHub] [spark] cloud-fan closed pull request #41187: [SPARK-43522][SQL] Fix creating struct column name with index of array

2023-05-18 Thread via GitHub
cloud-fan closed pull request #41187: [SPARK-43522][SQL] Fix creating struct column name with index of array URL: https://github.com/apache/spark/pull/41187 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-18 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197526307 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -414,6 +414,9 @@ private[spark] class SparkSubmit extends Logging { // directory too.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41211: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0

2023-05-18 Thread via GitHub
dongjoon-hyun commented on code in PR #41211: URL: https://github.com/apache/spark/pull/41211#discussion_r1197526532 ## python/pyspark/pandas/tests/data_type_ops/test_date_ops.py: ## @@ -61,6 +63,10 @@ def test_add(self): for psser in self.pssers:

[GitHub] [spark] advancedxy commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-18 Thread via GitHub
advancedxy commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197525305 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -414,6 +414,9 @@ private[spark] class SparkSubmit extends Logging { // directory

[GitHub] [spark] yaooqinn opened a new pull request, #41213: [SPARK-43572][SQL][TEST] Add a test for scrollable result set through thrift server

2023-05-18 Thread via GitHub
yaooqinn opened a new pull request, #41213: URL: https://github.com/apache/spark/pull/41213 ### What changes were proposed in this pull request? Add a new test for scrollable result set support, which is not yet covered through the JDBC APIs ### Why are the changes

[GitHub] [spark] advancedxy commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-18 Thread via GitHub
advancedxy commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197555100 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -414,6 +414,9 @@ private[spark] class SparkSubmit extends Logging { // directory

[GitHub] [spark] dongjoon-hyun commented on pull request #41202: [SPARK-43413][SQL][FOLLOWUP] Show a directional message in ListQuery nullability assertion

2023-05-18 Thread via GitHub
dongjoon-hyun commented on PR #41202: URL: https://github.com/apache/spark/pull/41202#issuecomment-1552472727 Merged to master~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on pull request #41187: [SPARK-43522][SQL] Fix creating struct column name with index of array

2023-05-18 Thread via GitHub
cloud-fan commented on PR #41187: URL: https://github.com/apache/spark/pull/41187#issuecomment-1552522063 thanks, merging to master/3.4! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] itholic commented on pull request #40658: [SPARK-XXXXX][PS] Matching the behavior of pandas API on Spark to pandas 2.0.0

2023-05-18 Thread via GitHub
itholic commented on PR #40658: URL: https://github.com/apache/spark/pull/40658#issuecomment-1552522173 Let me close this PR for now; I will revisit it when we are ready for the next Apache Spark major release. -- This is an automated message from the Apache Git Service. To respond to the
