[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643575373 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643575296 **[Test build #123964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123964/testReport)** for PR 28708 at commit [`da1db47`](https://github.com/apache/spark/commit/da1db4740778b3f12df88e4c28aa0602ff15417e).
[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643575373
[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643560584 **[Test build #123960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123960/testReport)** for PR 28708 at commit [`fe34308`](https://github.com/apache/spark/commit/fe34308ae700540559d50094e817f58cb681b402).
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643575263 **[Test build #123960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123960/testReport)** for PR 28708 at commit [`fe34308`](https://github.com/apache/spark/commit/fe34308ae700540559d50094e817f58cb681b402). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ShuffleBlockInfo(shuffleId: Int, mapId: Long)`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
AmplabJenkins removed a comment on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643573682 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123957/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
AmplabJenkins removed a comment on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643573679 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
SparkQA removed a comment on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643555188 **[Test build #123957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123957/testReport)** for PR 28817 at commit [`ea8efc7`](https://github.com/apache/spark/commit/ea8efc7781c9e1c387efe03f6720b8c80c7f9482).
[GitHub] [spark] AmplabJenkins commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
AmplabJenkins commented on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643573679
[GitHub] [spark] SparkQA commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
SparkQA commented on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643573565 **[Test build #123957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123957/testReport)** for PR 28817 at commit [`ea8efc7`](https://github.com/apache/spark/commit/ea8efc7781c9e1c387efe03f6720b8c80c7f9482). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
AmplabJenkins commented on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-643572368
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
AmplabJenkins removed a comment on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-643572368
[GitHub] [spark] SparkQA removed a comment on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
SparkQA removed a comment on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-643549980 **[Test build #123953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123953/testReport)** for PR 28641 at commit [`681d9e5`](https://github.com/apache/spark/commit/681d9e5d14800a171b97e6d73a02ae6f14eba19d).
[GitHub] [spark] SparkQA commented on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
SparkQA commented on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-643572136 **[Test build #123953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123953/testReport)** for PR 28641 at commit [`681d9e5`](https://github.com/apache/spark/commit/681d9e5d14800a171b97e6d73a02ae6f14eba19d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r439710508

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
## @@ -165,6 +166,26 @@ class WholeStageCodegenSuite extends QueryTest with SharedSparkSession
     }
   }

+  test("SPARK-: Avoid spill in partial aggregation " +
+    "when spark.sql.aggregate.spill.partialaggregate.disabled is set") {
+    withSQLConf((SQLConf.SPILL_PARTIAL_AGGREGATE_DISABLED.key, "true"),

Review comment: @maropu I figured out a few more improvements that can be made to the generated code. I will test them and also add the benchmark numbers. Adding the **WIP** tag to the title.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643571011 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123956/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643571010 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins commented on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643571010
[GitHub] [spark] SparkQA removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA removed a comment on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643555190 **[Test build #123956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123956/testReport)** for PR 28818 at commit [`2ff94ec`](https://github.com/apache/spark/commit/2ff94ece8eb9b33524fd6705bfcaca6427855450).
[GitHub] [spark] SparkQA commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA commented on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643570926 **[Test build #123956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123956/testReport)** for PR 28818 at commit [`2ff94ec`](https://github.com/apache/spark/commit/2ff94ece8eb9b33524fd6705bfcaca6427855450). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643570837 **[Test build #123963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123963/testReport)** for PR 28123 at commit [`cf5b835`](https://github.com/apache/spark/commit/cf5b83535c0998a91a3ad2e232516a4a219fff92).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643570278
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643570278
[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643570195 retest this please
[GitHub] [spark] sarutak commented on pull request #28806: [SPARK-31967][UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading time regression
sarutak commented on pull request #28806: URL: https://github.com/apache/spark/pull/28806#issuecomment-643569783 > This downgraded solution is better. In my local setup, the perf issue can't be reproduced, and by default the infinite timeline redrawing won't happen unless the "Enable zooming" checkbox is checked. I believe this is a better solution. Downgrading `vis.js` is better, though infinite redrawing can actually happen even if `Enable zooming` is not checked: https://github.com/apache/spark/pull/28806#issuecomment-643116427 . It seems to depend on the browser.
[GitHub] [spark] sarutak commented on pull request #28752: [SPARK-31642][FOLLOWUP] Fix Sorting for duration column and make Status column sortable
sarutak commented on pull request #28752: URL: https://github.com/apache/spark/pull/28752#issuecomment-643569254 > LGTM. Do you all have an opinion on backporting to branch-3.0? I think I would. Yeah, it would be better. What do you think, @iRakson?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643568937 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123962/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643568935
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643568935 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
SparkQA removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643564457 **[Test build #123962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123962/testReport)** for PR 28123 at commit [`cf5b835`](https://github.com/apache/spark/commit/cf5b83535c0998a91a3ad2e232516a4a219fff92).
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643568905 **[Test build #123962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123962/testReport)** for PR 28123 at commit [`cf5b835`](https://github.com/apache/spark/commit/cf5b83535c0998a91a3ad2e232516a4a219fff92). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] agrawaldevesh commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
agrawaldevesh commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643566981

> Although I'm a little fuzzy on what you mean by "eager" (if you mean as soon as the migrations are completed then yes)

Thank you for confirming! By *eager*, I specifically mean _somehow_ triggering, as soon as possible, the `DAGScheduler#handleExecutorLost(_, workerLost = true)` code path so that it can clear out the shuffle map files. This is more about avoiding fetch failures caused by decommissioning than about recouping resources.

One way that this is triggered today is by `CoarseGrainedSchedulerBackend.DriverEndpoint#onDisconnected`, but I don't really know if there is a timeout at play here. Unfortunately, the `workerLost = true` bit is set only in a few cases, so we might have to add some code (or do some testing) to achieve this. I think https://issues.apache.org/jira/browse/SPARK-31197 is meant for this?
[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643566829
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643566824 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28582/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643566829
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-643565961 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123952/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-643565959
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
AmplabJenkins removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-643565959 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
SparkQA removed a comment on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-643547384 **[Test build #123952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123952/testReport)** for PR 28769 at commit [`2b8586b`](https://github.com/apache/spark/commit/2b8586b0cc660aea1a2cd63d12b202b3c9256219).
[GitHub] [spark] SparkQA commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
SparkQA commented on pull request #28769: URL: https://github.com/apache/spark/pull/28769#issuecomment-643565763 **[Test build #123952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123952/testReport)** for PR 28769 at commit [`2b8586b`](https://github.com/apache/spark/commit/2b8586b0cc660aea1a2cd63d12b202b3c9256219). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643564647 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28582/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643564557
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643564557
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r439706623 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoin.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.joins.SortMergeJoinExec +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule coalesces one side of the `SortMergeJoin` if the following conditions are met: + * - Two bucketed tables are joined. + * - The larger bucket number is divisible by the smaller bucket number. + * - COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED is set to true. + * - The ratio of the number of buckets is less than the value set in + * COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO. 
+ */ +case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan] { + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. +if (large % small == 0 && + large / small <= conf.getConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO)) { + Some(small) +} else { + None +} + } + + private def updateNumCoalescedBuckets(plan: SparkPlan, numCoalescedBuckets: Int): SparkPlan = { +plan.transformUp { + case f: FileSourceScanExec => +f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets)) +} + } + + def apply(plan: SparkPlan): SparkPlan = { +if (!conf.getConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED)) { + return plan +} + +plan transform { + case ExtractSortMergeJoinWithBuckets(smj, numLeftBuckets, numRightBuckets) +if numLeftBuckets != numRightBuckets => +mayCoalesce(numLeftBuckets, numRightBuckets, conf).map { numCoalescedBuckets => + if (numCoalescedBuckets != numLeftBuckets) { +smj.copy(left = updateNumCoalescedBuckets(smj.left, numCoalescedBuckets)) + } else { +smj.copy(right = updateNumCoalescedBuckets(smj.right, numCoalescedBuckets)) + } +}.getOrElse(smj) + case other => other +} + } +} + +/** + * An extractor that extracts `SortMergeJoinExec` where both sides of the join have the bucketed + * tables and are consisted of only the scan operation. + */ +object ExtractSortMergeJoinWithBuckets { + private def isScanOperation(plan: SparkPlan): Boolean = plan match { +case f: FilterExec => isScanOperation(f.child) +case p: ProjectExec => isScanOperation(p.child) Review comment: Would this be a cleaner approach? 
Or would handling buckets in one place (`FileSourceScanExec`) be cleaner? What do you think, @cloud-fan / @maropu?
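The divisibility condition in the quoted rule can be illustrated with a small standalone sketch. It is not the PR's code: `bucketId` uses a non-negative modulo as a stand-in for the real hash-to-bucket mapping, and `maxRatio` is a hypothetical parameter replacing the `SQLConf` lookup. The point is that when the larger bucket count is a multiple of the smaller one, reducing a large-table bucket id modulo the smaller count gives the same bucket the row would have had in the smaller layout.

```scala
object BucketCoalescingSketch {
  // Non-negative modulo; an assumed stand-in for the real hash-to-bucket mapping.
  def bucketId(hash: Int, numBuckets: Int): Int = Math.floorMod(hash, numBuckets)

  // Same shape as the check quoted in the diff, with the ratio limit passed in
  // directly (maxRatio is hypothetical and replaces the SQLConf lookup).
  def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxRatio: Int): Option[Int] = {
    require(numBuckets1 != numBuckets2, "bucket counts must differ")
    val small = math.min(numBuckets1, numBuckets2)
    val large = math.max(numBuckets1, numBuckets2)
    if (large % small == 0 && large / small <= maxRatio) Some(small) else None
  }

  // True when folding a `large`-bucket id into `small` buckets agrees with
  // bucketing the original hash into `small` buckets, for every given hash.
  def coalescingPreservesBuckets(hashes: Seq[Int], small: Int, large: Int): Boolean =
    hashes.forall(h => bucketId(h, large) % small == bucketId(h, small))
}
```

With 8 and 4 buckets the folded ids always agree; with 8 and 3 they do not (hash 8 lands in bucket 0 of 8, which folds to 0, but belongs in bucket 2 of 3), which is why the rule refuses to coalesce in that case.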
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643564457 **[Test build #123962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123962/testReport)** for PR 28123 at commit [`cf5b835`](https://github.com/apache/spark/commit/cf5b83535c0998a91a3ad2e232516a4a219fff92).
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r439706576 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -165,6 +166,7 @@ case class FileSourceScanExec( requiredSchema: StructType, partitionFilters: Seq[Expression], optionalBucketSet: Option[BitSet], +optionalNumCoalescedBuckets: Option[Int], Review comment: Yes, good idea. @cloud-fan/@maropu Do you want me to include this in this PR or do it as a follow-up, since this PR is already approved? I am fine with either one. Thanks @viirya!
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r439706394 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoin.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.joins.SortMergeJoinExec +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule coalesces one side of the `SortMergeJoin` if the following conditions are met: + * - Two bucketed tables are joined. Review comment: Since `SortMergeJoinExec` is created only for the equi-join case, I don't think we need to check it in this rule. I can update the PR description to remove `equality conditions` if it causes confusion.
[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643564022 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28819: [SPARK-31980][SQL]Function sequence() fails if start and end of range are equal dates
AmplabJenkins removed a comment on pull request #28819: URL: https://github.com/apache/spark/pull/28819#issuecomment-643563705 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28819: [SPARK-31980][SQL]Function sequence() fails if start and end of range are equal dates
AmplabJenkins commented on pull request #28819: URL: https://github.com/apache/spark/pull/28819#issuecomment-643563821 Can one of the admins verify this patch?
[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
moomindani commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r439706256 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -97,7 +99,34 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { options = Map.empty) } - protected def getExternalTmpPath( + // Mostly copied from Context.java#getMRTmpPath of Hive 2.3. + // Visible for testing. + private[execution] def getMRTmpPath( + hadoopConf: Configuration, + sessionScratchDir: String, + scratchDir: String): Path = { + +// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1', +// which is ruled by 'hive.exec.scratchdir' including file system. +// This is the same as Spark's #oldVersionExternalTempPath. +// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090. +// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir' +// Here it uses session_path unless it's emtpy, otherwise uses scratchDir. +val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir +val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath) Review comment: Several HDFS scratch directories are created when SessionState starts. If the session scratch directory is created in the path specified in `_hive.hdfs.session.path`, that directory should be used. If it is not specified, we just use scratchDir.
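The fallback described in this review comment can be sketched on its own, without Hadoop or Hive on the classpath. The object and method names below are hypothetical, and the `Option` wrapper (which also treats a `null` session directory as "not specified") is an assumption of this sketch; the quoted diff only checks for an empty string.

```scala
object ScratchDirSketch {
  // Prefer the per-session scratch directory when one was specified;
  // otherwise fall back to the general scratch directory.
  // Treating null like "" is an assumption of this sketch, not the PR's behavior.
  def chooseScratchDir(sessionScratchDir: String, scratchDir: String): String =
    Option(sessionScratchDir).filter(_.nonEmpty).getOrElse(scratchDir)
}
```

For example, `chooseScratchDir("", "/tmp/hive")` yields `/tmp/hive`, while a non-empty session path is returned unchanged.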
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins removed a comment on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643563711
[GitHub] [spark] AmplabJenkins commented on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins commented on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643563711
[GitHub] [spark] AmplabJenkins commented on pull request #28819: [SPARK-31980][SQL]Function sequence() fails if start and end of range are equal dates
AmplabJenkins commented on pull request #28819: URL: https://github.com/apache/spark/pull/28819#issuecomment-643563705 Can one of the admins verify this patch?
[GitHub] [spark] TJX2014 opened a new pull request #28819: [SPARK-31980][SQL]Function sequence() fails if start and end of range are equal dates
TJX2014 opened a new pull request #28819: URL: https://github.com/apache/spark/pull/28819 ### What changes were proposed in this pull request? Handle the case where start equals end the same way as the greater-than case in `org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl#eval`, for the interval units `day`, `month`, and `year`. ### Why are the changes needed? A bug occurs when `sequence` is given equal start and end dates, which causes the `while` loop to run forever. ### Does this PR introduce _any_ user-facing change? Yes. Before this PR, users get a `java.lang.ArrayIndexOutOfBoundsException` when evaluating the following: `sql("select sequence(cast('2011-03-01' as date), cast('2011-03-01' as date), interval 1 year)").show(false) ` ### How was this patch tested? Unit test.
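The boundary case in this PR description can be illustrated with a minimal standalone sketch. It is not Spark's `TemporalSequenceImpl`: it uses `java.time.LocalDate`, handles only ascending ranges with a positive month step, and the names `DateSequenceSketch` and `monthlySequence` are hypothetical. The loop condition is an inclusive comparison, so equal start and end dates produce exactly one element and the loop terminates.

```scala
import java.time.LocalDate
import scala.collection.mutable.ArrayBuffer

object DateSequenceSketch {
  // Generates start, start + step, ... up to and including `end`.
  // Ascending ranges only; descending ranges are out of scope for this sketch.
  def monthlySequence(start: LocalDate, end: LocalDate, stepMonths: Int): Seq[LocalDate] = {
    require(stepMonths > 0, "step must be positive")
    require(!start.isAfter(end), "start must not be after end")
    val out = ArrayBuffer.empty[LocalDate]
    var i = 0
    var current = start
    // `!isAfter` means "<=", so start == end still yields one element.
    while (!current.isAfter(end)) {
      out += current
      i += 1
      // Recompute from `start` so month-end clamping does not accumulate.
      current = start.plusMonths(i.toLong * stepMonths)
    }
    out.toSeq
  }
}
```

Calling it with `2011-03-01` for both bounds and a 12-month step mirrors the query in the description and returns a single date.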
[GitHub] [spark] SparkQA commented on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
SparkQA commented on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643563372 **[Test build #123944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123944/testReport)** for PR 28807 at commit [`17b8f2c`](https://github.com/apache/spark/commit/17b8f2cdfd7ef6e8658ce26791dc445ee99d41e3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
SparkQA removed a comment on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643502351 **[Test build #123944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123944/testReport)** for PR 28807 at commit [`17b8f2c`](https://github.com/apache/spark/commit/17b8f2cdfd7ef6e8658ce26791dc445ee99d41e3).
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643562280 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28580/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643562282
[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643562282
[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
moomindani commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r439705159 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -124,11 +153,24 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging") val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive") +logDebug(s"path '${path.toString}', staging dir '$stagingDir', " + + s"scratch dir '$scratchDir' are used") if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) { oldVersionExternalTempPath(path, hadoopConf, scratchDir) } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) { Review comment: This is because old versions do not support copying data between different types of DFS.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-643561834
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-643561834
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-643561728 **[Test build #123961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123961/testReport)** for PR 28593 at commit [`e29409d`](https://github.com/apache/spark/commit/e29409d9f358082fe999e079bc2a38b32bcc89c9).
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643560584 **[Test build #123960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123960/testReport)** for PR 28708 at commit [`fe34308`](https://github.com/apache/spark/commit/fe34308ae700540559d50094e817f58cb681b402).
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643560550 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28580/
[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643559589 Although I'm a little fuzzy on what you mean by "eager" (if you mean as soon as the migrations are completed then yes)
[GitHub] [spark] maropu commented on a change in pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
maropu commented on a change in pull request #28807: URL: https://github.com/apache/spark/pull/28807#discussion_r439702584 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala ## @@ -388,12 +391,24 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { val reservedKeywordsInAnsiMode = allCandidateKeywords -- nonReservedKeywordsInAnsiMode test("check # of reserved keywords") { -val numReservedKeywords = 78 +val numReservedKeywords = 75 assert(reservedKeywordsInAnsiMode.size == numReservedKeywords, s"The expected number of reserved keywords is $numReservedKeywords, but " + s"${reservedKeywordsInAnsiMode.size} found.") } + test("should follow reserved keywords in SQL:2016") { +withTempDir { dir => + val tmpFile = new File(dir, "tmp") + val is = Thread.currentThread().getContextClassLoader +.getResourceAsStream("ansi-sql-2016-reserved-keywords.txt") + Files.copy(is, tmpFile.toPath) + val reservedKeywordsInSql2016 = Files.readAllLines(tmpFile.toPath) +.asScala.filterNot(_.startsWith("--")).map(_.trim).toSet + assert(((reservedKeywordsInAnsiMode -- Set("!")) -- reservedKeywordsInSql2016).isEmpty) Review comment: I noticed that, since `NOT` is reserved, `!` is reserved too (`NOT: 'NOT' | '!';`): ``` scala> sql("SET spark.sql.ansi.enabled=false") scala> spark.sql("create table r2 (! int);") res2: org.apache.spark.sql.DataFrame = [] scala> sql("SET spark.sql.ansi.enabled=true") scala> spark.sql("create table r2 (! int);") org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input '!'(line 1, pos 17) == SQL == create table r2 (! int); -^^^ ``` @cloud-fan Is this expected? FYI: It seems PostgreSQL cannot accept `!` in column names: ``` postgres=# create table r2 (! int); 2020-06-13 11:30:24.495 JST [36406] ERROR: syntax error at or near "!" at character 18 2020-06-13 11:30:24.495 JST [36406] STATEMENT: create table r2 (! int); ERROR: syntax error at or near "!" 
LINE 1: create table r2 (! int); ```
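The set-difference check in the quoted test can be sketched without the test harness or the resource file. The names below are hypothetical, and unlike the quoted test this sketch trims each line before filtering out `--` comment lines and also drops blank lines; both choices are assumptions made for the illustration. The `allowed` set plays the role of `Set("!")` in the test, that is, tokens that are reserved only because they alias a reserved keyword.

```scala
object ReservedKeywordCheckSketch {
  // Parses a keyword listing: one keyword per line, "--" lines are comments.
  def parseKeywords(lines: Seq[String]): Set[String] =
    lines.map(_.trim).filter(_.nonEmpty).filterNot(_.startsWith("--")).toSet

  // Keywords reserved by the implementation but absent from the standard's
  // list, after removing an explicit allow-list such as the "!" alias of NOT.
  def extraReserved(
      reserved: Set[String],
      standard: Set[String],
      allowed: Set[String]): Set[String] =
    (reserved -- allowed) -- standard
}
```

An empty result from `extraReserved` corresponds to the assertion in the test passing; a non-empty result names the keywords that would need either to become non-reserved or to be added to the allow-list.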
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643557580 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123945/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643557577 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643557577
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins removed a comment on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643557413
[GitHub] [spark] SparkQA removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
SparkQA removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643502389 **[Test build #123945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123945/testReport)** for PR 28123 at commit [`cf5b835`](https://github.com/apache/spark/commit/cf5b83535c0998a91a3ad2e232516a4a219fff92).
[GitHub] [spark] AmplabJenkins commented on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins commented on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643557413
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643557363 **[Test build #123945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123945/testReport)** for PR 28123 at commit [`cf5b835`](https://github.com/apache/spark/commit/cf5b83535c0998a91a3ad2e232516a4a219fff92).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
SparkQA commented on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643557325 **[Test build #123959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123959/testReport)** for PR 28807 at commit [`086a2ba`](https://github.com/apache/spark/commit/086a2ba8dc96d7aa913940b0b553f29496099e6a).
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643555962 **[Test build #123958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123958/testReport)** for PR 28708 at commit [`0ea927d`](https://github.com/apache/spark/commit/0ea927d9148089a3799b2e94a35589795967e819).
[GitHub] [spark] AmplabJenkins commented on pull request #28801: [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties
AmplabJenkins commented on pull request #28801: URL: https://github.com/apache/spark/pull/28801#issuecomment-643555627
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28801: [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties
AmplabJenkins removed a comment on pull request #28801: URL: https://github.com/apache/spark/pull/28801#issuecomment-643555627
[GitHub] [spark] SparkQA removed a comment on pull request #28801: [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties
SparkQA removed a comment on pull request #28801: URL: https://github.com/apache/spark/pull/28801#issuecomment-643520932 **[Test build #123947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123947/testReport)** for PR 28801 at commit [`2c77bbb`](https://github.com/apache/spark/commit/2c773ea11c299bc73b80aed925cc5b0b3b92).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643555300
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
AmplabJenkins removed a comment on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643555289
[GitHub] [spark] AmplabJenkins commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins commented on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643555300
[GitHub] [spark] SparkQA commented on pull request #28801: [SPARK-31970][CORE] Make MDC configuration step be consistent between setLocalProperty and log4j.properties
SparkQA commented on pull request #28801: URL: https://github.com/apache/spark/pull/28801#issuecomment-643555328 **[Test build #123947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123947/testReport)** for PR 28801 at commit [`2c77bbb`](https://github.com/apache/spark/commit/2c773ea11c299bc73b80aed925cc5b0b3b92).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
AmplabJenkins commented on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643555289
[GitHub] [spark] SparkQA commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
SparkQA commented on pull request #28817: URL: https://github.com/apache/spark/pull/28817#issuecomment-643555188 **[Test build #123957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123957/testReport)** for PR 28817 at commit [`ea8efc7`](https://github.com/apache/spark/commit/ea8efc7781c9e1c387efe03f6720b8c80c7f9482).
[GitHub] [spark] SparkQA commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA commented on pull request #28818: URL: https://github.com/apache/spark/pull/28818#issuecomment-643555190 **[Test build #123956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123956/testReport)** for PR 28818 at commit [`2ff94ec`](https://github.com/apache/spark/commit/2ff94ece8eb9b33524fd6705bfcaca6427855450).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28814: [SPARK-31968][SQL]Duplicate partition columns check when writing data
AmplabJenkins removed a comment on pull request #28814: URL: https://github.com/apache/spark/pull/28814#issuecomment-643554396
[GitHub] [spark] holdenk opened a new pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
holdenk opened a new pull request #28818: URL: https://github.com/apache/spark/pull/28818

This is WIP since it is on top of SPARK-31197 (which itself is WIP on top of SPARK-20629) and should probably have more testing.

### What changes were proposed in this pull request?
If graceful decommissioning is enabled, Spark's dynamic scaling uses this instead of directly killing executors.

### Why are the changes needed?
When scaling down Spark we should avoid triggering recomputes as much as possible.

### Does this PR introduce _any_ user-facing change?
Hopefully their jobs run faster. It also enables experimental shuffle-service-free decommissioning when graceful decommissioning is enabled.

### How was this patch tested?
For now I've extended the ExecutorAllocationManagerSuite to cover this. I'll also add a more integration-style test on K8s.
[GitHub] [spark] agrawaldevesh commented on a change in pull request #27636: [SPARK-30873][CORE][YARN]Handling Node Decommissioning for Yarn cluster manger in Spark
agrawaldevesh commented on a change in pull request #27636: URL: https://github.com/apache/spark/pull/27636#discussion_r439683319

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ##

@@ -436,6 +449,72 @@ private[yarn] class YarnAllocator(
       logDebug("Finished processing %d completed containers. Current running executor count: %d."
         .format(completedContainers.size, getNumExecutorsRunning))
     }
+
+    // If the flags is enabled than GRACEFUL_DECOMMISSION_ENABLE

Review comment:

This comment is superfluous. than -> then

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ##

@@ -436,6 +449,72 @@ private[yarn] class YarnAllocator(
       logDebug("Finished processing %d completed containers. Current running executor count: %d."
         .format(completedContainers.size, getNumExecutorsRunning))
     }
+
+    // If the flags is enabled than GRACEFUL_DECOMMISSION_ENABLE
+    // than handling the Node loss scenario using the decommission tracker.
+    if (sparkConf.get(GRACEFUL_DECOMMISSION_ENABLE)) {
+      processGracefulDecommission(allocateResponse)
+    }
+  }
+
+  // Helper method to get NodeState of the Yarn.
+  def getYarnNodeState(state: YarnNodeState): NodeState.Value = {
+    // In hadoop-2.7 there is no support for node state DECOMMISSIONING
+    // In Hadoop-2.8, hadoop3.1 and later version of spark there is a support
+    // to node state DECOMMISSIONING.
+    // Inorder to build the spark using hadoop2 and hadoop3, not
+    // using YarnNodeState for the node state DECOMMISSIONING here and
+    // and for other state we are matching the YarnNodeState and assigning
+    // the node state at spark end
+    if (state.toString.equals(NodeState.DECOMMISSIONING.toString)) {
+      NodeState.DECOMMISSIONING
+    } else {
+      state match {
+        case YarnNodeState.RUNNING => NodeState.RUNNING
+        case YarnNodeState.DECOMMISSIONED => NodeState.DECOMMISSIONED
+        case YarnNodeState.LOST => NodeState.LOST
+        case YarnNodeState.UNHEALTHY => NodeState.LOST
+        case _ => NodeState.OTHER
+      }
+    }
+  }
+
+  def processGracefulDecommission(allocateResponse: AllocateResponse): Unit = {
+    // Create a consolidated node decommission info report.
+    val nodeInfos = new HashMap[String, NodeInfo]
+
+    // node with updated information.
+    val getUpdatedNodes = allocateResponse.getUpdatedNodes()
+    if (getUpdatedNodes != null) {
+      val updatedNodes = getUpdatedNodes.asScala
+      for (x <- updatedNodes) {
+        if (x.getNodeState.toString.equals(NodeState.DECOMMISSIONING.toString)) {
+          // In hadoop 2.7 there is no support getDecommissioningTimeout whereas
+          // In hadoop 3.1 and later version of hadoop there is support
+          // of getDecommissioningTimeout So the method call made using reflection
+          // to update the value nodeTerminationTime and for lower version of hadoop2.7
+          // use the config spark.graceful.decommission.node.timeout which is specific to cloud
+          var nodeTerminationTime = clock.getTimeMillis() + nodeLossInterval * 1000
+          try {
+            val decommiossioningTimeout = x.getClass.getMethod(

Review comment:

typo

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ##

@@ -436,6 +449,72 @@ private[yarn] class YarnAllocator(
       logDebug("Finished processing %d completed containers. Current running executor count: %d."
         .format(completedContainers.size, getNumExecutorsRunning))
     }
+
+    // If the flags is enabled than GRACEFUL_DECOMMISSION_ENABLE
+    // than handling the Node loss scenario using the decommission tracker.
+    if (sparkConf.get(GRACEFUL_DECOMMISSION_ENABLE)) {
+      processGracefulDecommission(allocateResponse)
+    }
+  }
+
+  // Helper method to get NodeState of the Yarn.
+  def getYarnNodeState(state: YarnNodeState): NodeState.Value = {
+    // In hadoop-2.7 there is no support for node state DECOMMISSIONING
+    // In Hadoop-2.8, hadoop3.1 and later version of spark there is a support
+    // to node state DECOMMISSIONING.
+    // Inorder to build the spark using hadoop2 and hadoop3, not
+    // using YarnNodeState for the node state DECOMMISSIONING here and
+    // and for other state we are matching the YarnNodeState and assigning
+    // the node state at spark end
+    if (state.toString.equals(NodeState.DECOMMISSIONING.toString)) {
+      NodeState.DECOMMISSIONING
+    } else {
+      state match {
+        case YarnNodeState.RUNNING => NodeState.RUNNING
+        case YarnNodeState.DECOMMISSIONED => NodeState.DECOMMISSIONED
+        case YarnNodeState.LOST => NodeState.LOST
+        case YarnNodeState.UNHEALTHY => NodeState.LOST
+        case _ => NodeState.OTHER
+      }
+    }
+  }
+
+  def
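The comments in the quoted hunk describe calling `getDecommissioningTimeout` through reflection so that the same code builds against Hadoop versions that lack the method, falling back to a configured timeout otherwise. The sketch below illustrates that pattern in isolation. `OldNodeReport`, `NewNodeReport`, and `terminationTimeMillis` are hypothetical stand-ins, not YARN or Spark classes, and the assumption that the timeout is expressed in seconds is taken from the `* 1000` in the hunk rather than verified against the YARN API.

```scala
import scala.util.Try

object ReflectiveTimeoutSketch {
  // Hypothetical node report from an older API without the timeout accessor.
  class OldNodeReport

  // Hypothetical node report from a newer API that exposes the timeout (seconds).
  class NewNodeReport {
    def getDecommissioningTimeout: Integer = 120
  }

  // Looks the accessor up by name at runtime. If the method is missing,
  // or returns null, the configured fallback (in seconds) is used instead.
  def terminationTimeMillis(report: AnyRef, nowMillis: Long, fallbackSecs: Long): Long = {
    val timeoutSecs: Long = Try {
      val method = report.getClass.getMethod("getDecommissioningTimeout")
      method.invoke(report).asInstanceOf[Integer].longValue()
    }.getOrElse(fallbackSecs)
    nowMillis + timeoutSecs * 1000L
  }
}
```

Wrapping the lookup and the invocation in a single `Try` keeps the fallback path identical for a missing method and a null result, which may or may not be the behavior the PR wants.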
[GitHub] [spark] holdenk opened a new pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629
holdenk opened a new pull request #28817: URL: https://github.com/apache/spark/pull/28817

### What changes were proposed in this pull request?
Exit the executor when it has been asked to decommission and there is nothing left for it to do.

### Why are the changes needed?
If we want to use decommissioning in Spark's own scale down we should terminate the executor once finished.

### Does this PR introduce _any_ user-facing change?
The decommissioned executors will exit at the end of decommissioning. This is sort of a user-facing change; however, decommissioning hasn't been in any releases yet.

### How was this patch tested?
I changed the unit test to not send the executor exit message and still wait on the executor exited message.
[GitHub] [spark] AmplabJenkins commented on pull request #28814: [SPARK-31968][SQL]Duplicate partition columns check when writing data
AmplabJenkins commented on pull request #28814: URL: https://github.com/apache/spark/pull/28814#issuecomment-643554396
[GitHub] [spark] SparkQA commented on pull request #28814: [SPARK-31968][SQL]Duplicate partition columns check when writing data
SparkQA commented on pull request #28814: URL: https://github.com/apache/spark/pull/28814#issuecomment-643554228 **[Test build #123955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123955/testReport)** for PR 28814 at commit [`4f0bc9f`](https://github.com/apache/spark/commit/4f0bc9f5ca2702e5daa4bad54e3158011d5041a2).
[GitHub] [spark] shanyu commented on pull request #27843: [SPARK-31029] Avoid using global execution context in driver main thread for YarnSchedulerBackend
shanyu commented on pull request #27843: URL: https://github.com/apache/spark/pull/27843#issuecomment-643554083 Hi @tgravescs are we good to merge this PR?
[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r439700057

## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala ##

@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.ExecutorService
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.shuffle.MigratableResolver
+import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Class to handle block manager decommissioning retries.
+ * It creates a Thread to retry offloading all RDD cache and Shuffle blocks
+ */
+private[storage] class BlockManagerDecommissioner(
+    conf: SparkConf,
+    bm: BlockManager) extends Logging {
+
+  private val maxReplicationFailuresForDecommission =
+    conf.get(config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK)
+
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+    @volatile var running = true
+    override def run(): Unit = {
+      var migrating: Option[(Int, Long)] = None
+      logInfo(s"Starting migration thread for ${peer}")
+      // Once a block fails to transfer to an executor stop trying to transfer more blocks
+      try {
+        while (running && !Thread.interrupted()) {
+          val migrating = Option(shufflesToMigrate.poll())
+          migrating match {
+            case None =>
+              logInfo("Nothing to migrate")
+              // Nothing to do right now, but maybe a transfer will fail or a new block
+              // will finish being committed.
+              val SLEEP_TIME_SECS = 1
+              Thread.sleep(SLEEP_TIME_SECS * 1000L)
+            case Some((shuffleId, mapId)) =>
+              logInfo(s"Trying to migrate shuffle ${shuffleId},${mapId} to ${peer}")
+              val blocks =
+                bm.migratableResolver.getMigrationBlocks(shuffleId, mapId)
+              logInfo(s"Got migration sub-blocks ${blocks}")
+              blocks.foreach { case (blockId, buffer) =>
+                logInfo(s"Migrating sub-block ${blockId}")
+                bm.blockTransferService.uploadBlockSync(
+                  peer.host,
+                  peer.port,
+                  peer.executorId,
+                  blockId,
+                  buffer,
+                  StorageLevel.DISK_ONLY,
+                  null)// class tag, we don't need for shuffle
+                logInfo(s"Migrated sub block ${blockId}")
+              }
+              logInfo(s"Migrated ${shuffleId},${mapId} to ${peer}")
+          }
+        }
+        // This catch is intentionally outside of the while running block.
+        // if we encounter errors migrating to an executor we want to stop.
+      } catch {
+        case e: Exception =>
+          migrating match {
+            case Some(shuffleMap) =>
+              logError(s"Error ${e} during migration, adding ${shuffleMap} back to migration queue")
+              shufflesToMigrate.add(shuffleMap)
+            case None =>
+              logError(s"Error ${e} while waiting for block to migrate")
+          }
+      }
+    }
+  }
+
+  // Shuffles which are either in queue for migrations or migrated
+  private val migratingShuffles = mutable.HashSet[(Int, Long)]()
+
+  // Shuffles which are queued for migration
+  private[storage] val shufflesToMigrate =
+    new java.util.concurrent.ConcurrentLinkedQueue[(Int, Long)]()
+
+  @volatile private var stopped = false
+
+  private val migrationPeers =
+    mutable.HashMap[BlockManagerId, (ShuffleMigrationRunnable, ExecutorService)]()
+
+  private lazy val blockMigrationExecutor =
+    ThreadUtils.newDaemonSingleThreadExecutor("block-manager-decommission")
+
+  private val blockMigrationRunnable = new Runnable {
+    val sleepInterval = conf.get(config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+
+    override def run(): Unit = {
+      var failures = 0
+
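The migration thread in the quoted hunk follows a poll, upload, re-queue-on-failure pattern: take an entry from a concurrent queue, push it to the peer, and on the first error put the in-flight entry back and stop. The sketch below isolates that pattern; `MigrationRetrySketch`, `drain`, and the `upload` callback are hypothetical names, not the PR's API, and the sleep-and-retry behavior on an empty queue is left out. Note that in the quoted hunk the inner `val migrating` shadows the outer `var migrating`, so the catch handler there would always see `None`; this sketch assigns to a single variable declared outside the loop instead.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object MigrationRetrySketch {
  type ShuffleMap = (Int, Long) // (shuffleId, mapId), as in the quoted hunk

  // Drains the queue, calling `upload` for each entry. On the first failure
  // the in-flight entry is added back to the queue and draining stops.
  // Returns the number of entries uploaded successfully.
  def drain(queue: ConcurrentLinkedQueue[ShuffleMap], upload: ShuffleMap => Unit): Int = {
    var inFlight: Option[ShuffleMap] = None
    var migrated = 0
    try {
      var more = true
      while (more) {
        // Assign to the outer variable so the catch handler can see it.
        inFlight = Option(queue.poll())
        inFlight match {
          case Some(entry) =>
            upload(entry)
            migrated += 1
          case None =>
            more = false
        }
      }
    } catch {
      case _: Exception =>
        inFlight.foreach(entry => queue.add(entry))
    }
    migrated
  }
}
```

Because the failed entry is re-added at the tail of the queue, another peer's worker can pick it up later, which matches the intent stated in the hunk's log message.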
[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-643553994 Yeah, I think supporting multiple ways of reducing the number of fetch failures makes sense here. I think migration is certainly a "best-case" scenario and we can't count on migrating everything in overcommit environments.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins removed a comment on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643553741 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123946/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins removed a comment on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643553739 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
AmplabJenkins commented on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643553739
[GitHub] [spark] SparkQA removed a comment on pull request #28807: [WIP][SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
SparkQA removed a comment on pull request #28807: URL: https://github.com/apache/spark/pull/28807#issuecomment-643506277 **[Test build #123946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123946/testReport)** for PR 28807 at commit [`f8cbbb1`](https://github.com/apache/spark/commit/f8cbbb1b9f76f491224decec1d6ff45c30fb94f6).