[GitHub] [spark] SparkQA commented on pull request #29138: [SPARK-32338] [SQL] Overload slice to accept Column for start and length
SparkQA commented on pull request #29138: URL: https://github.com/apache/spark/pull/29138#issuecomment-659906327 **[Test build #126014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126014/testReport)** for PR 29138 at commit [`8ee58cd`](https://github.com/apache/spark/commit/8ee58cdf024b02ea4e62f1b744e164efea4bb520).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29138: [SPARK-32338] [SQL] Overload slice to accept Column for start and length
SparkQA removed a comment on pull request #29138: URL: https://github.com/apache/spark/pull/29138#issuecomment-659779238 **[Test build #126014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126014/testReport)** for PR 29138 at commit [`8ee58cd`](https://github.com/apache/spark/commit/8ee58cdf024b02ea4e62f1b744e164efea4bb520).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions
dongjoon-hyun commented on a change in pull request #29133: URL: https://github.com/apache/spark/pull/29133#discussion_r456254309 ## File path: project/SparkBuild.scala ## @@ -1027,6 +1027,11 @@ object TestSettings { }.getOrElse(Nil): _*), // Show full stack trace and duration in test cases. testOptions in Test += Tests.Argument("-oDF"), +// Show only the failed test cases in github action to make the log more readable. +testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, + sys.env.get("GITHUB_ACTIONS").map { _ => +Seq("-eNCXEHLOPQMDSF") Review comment: It seems that you explicitly enabled all available standard error options except `W` and `U`. If so, could you describe why you chose to exclude those two, please? ``` W - without color U - unformatted mode ```
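For readers following along, the setting under discussion appends extra ScalaTest reporter arguments only when the `GITHUB_ACTIONS` environment variable is present. A sketch of the pattern as it appears in the diff above (the per-letter meanings are ScalaTest reporter configuration characters; the exact behavior of each letter should be checked against the ScalaTest Runner docs):

```scala
// sbt build definition fragment (not standalone code): append ScalaTest
// arguments to the standard-error reporter only when running on GitHub Actions.
testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest,
  sys.env.get("GITHUB_ACTIONS").map { _ =>
    // "-e" selects the standard-error reporter; the trailing letters are its
    // configuration characters (e.g. N/C/X/... suppress event categories,
    // while D and F add test durations and full stack traces).
    Seq("-eNCXEHLOPQMDSF")
  }.getOrElse(Nil): _*)
```

When `GITHUB_ACTIONS` is unset, `getOrElse(Nil)` expands to no extra arguments, so local builds keep the default reporter output.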
[GitHub] [spark] viirya commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
viirya commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r456254750 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -97,7 +99,42 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { options = Map.empty) } - protected def getExternalTmpPath( + // Mostly copied from Context.java#getMRTmpPath of Hive 2.3. + // Visible for testing. + private[execution] def getNonBlobTmpPath( + hadoopConf: Configuration, + sessionScratchDir: String, + scratchDir: String): Path = { + +// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1', +// which is ruled by 'hive.exec.scratchdir' including file system. +// This is the same as Spark's #oldVersionExternalTempPath. +// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090. +// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir' +// Here it uses session_path unless it's emtpy, otherwise uses scratchDir. +val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir +val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath) +logDebug(s"MR scratch dir '$mrScratchDir/-mr-1' is used") +val path = new Path(mrScratchDir, "-mr-1") +val scheme = Option(path.toUri.getScheme).getOrElse("") +if (scheme.equals("file")) { + logWarning("Temporary data will be written into a local file system " + +"(scheme: '$scheme', path: '$mrScratchDir'). If your Spark is not in local mode, " + Review comment: Use `s""` for string interpolation.
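The issue viirya flags: the quoted line lacks the `s` prefix, so `$scheme` and `$mrScratchDir` would be emitted literally rather than interpolated. A minimal illustration with hypothetical values (not the actual PR code):

```scala
// In Scala, only string literals prefixed with `s` perform interpolation;
// a plain literal keeps "$name" placeholders verbatim.
val scheme = "file"
val mrScratchDir = "/tmp/hive-scratch"

val plain = "(scheme: '$scheme', path: '$mrScratchDir')"   // placeholders kept as-is
val interp = s"(scheme: '$scheme', path: '$mrScratchDir')" // values substituted

println(plain)  // (scheme: '$scheme', path: '$mrScratchDir')
println(interp) // (scheme: 'file', path: '/tmp/hive-scratch')
```

This is why the log warning in the diff would print the literal text `$scheme` instead of the actual scheme until the `s` prefix is added.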
[GitHub] [spark] huaxingao commented on pull request #29139: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
huaxingao commented on pull request #29139: URL: https://github.com/apache/spark/pull/29139#issuecomment-659905157 cc @srowen @viirya @zhengruifeng I scanned through. I think the doc is well written and the information is useful. Here are the screen captures:
before change: https://user-images.githubusercontent.com/13592258/87757151-01bbfe80-c7bf-11ea-8159-00629a076f25.png
after change: https://user-images.githubusercontent.com/13592258/87757162-05e81c00-c7bf-11ea-917a-8a244a9b7b6b.png
https://user-images.githubusercontent.com/13592258/87757166-097ba300-c7bf-11ea-9278-80d3fd11a468.png
https://user-images.githubusercontent.com/13592258/87757172-0b456680-c7bf-11ea-968c-2b1b29c186ed.png
https://user-images.githubusercontent.com/13592258/87757175-0d0f2a00-c7bf-11ea-8a37-4a38c7c30e1f.png
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions
dongjoon-hyun commented on a change in pull request #29133: URL: https://github.com/apache/spark/pull/29133#discussion_r456254721 ## File path: project/SparkBuild.scala ## @@ -1027,6 +1027,11 @@ object TestSettings { }.getOrElse(Nil): _*), // Show full stack trace and duration in test cases. testOptions in Test += Tests.Argument("-oDF"), +// Show only the failed test cases in github action to make the log more readable. +testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, + sys.env.get("GITHUB_ACTIONS").map { _ => +Seq("-eNCXEHLOPQMDSF") Review comment: Also, could you add the link into the comment below at line 1030, since this is non-trivial?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-659899832
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins removed a comment on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-659899973
[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins commented on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-659899973
[GitHub] [spark] AmplabJenkins commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #29135: URL: https://github.com/apache/spark/pull/29135#issuecomment-659899832
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-659895646 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126010/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-659895637 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.
AmplabJenkins removed a comment on pull request #29142: URL: https://github.com/apache/spark/pull/29142#issuecomment-659877788
[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-659895637
[GitHub] [spark] SparkQA commented on pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.
SparkQA commented on pull request #29142: URL: https://github.com/apache/spark/pull/29142#issuecomment-659895843 **[Test build #126034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126034/testReport)** for PR 29142 at commit [`e544ca3`](https://github.com/apache/spark/commit/e544ca3649ed6c31abdbd46eab9937adde1025b9).
[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
SparkQA removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-659773807 **[Test build #126010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126010/testReport)** for PR 29104 at commit [`e44c516`](https://github.com/apache/spark/commit/e44c5163f0874804944b58cab324abbc7451f97a).
[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
SparkQA commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-659894625 **[Test build #126010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126010/testReport)** for PR 29104 at commit [`e44c516`](https://github.com/apache/spark/commit/e44c5163f0874804944b58cab324abbc7451f97a).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659870428 **[Test build #126031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126031/testReport)** for PR 29117 at commit [`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659893058
[GitHub] [spark] SparkQA commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
SparkQA commented on pull request #27690: URL: https://github.com/apache/spark/pull/27690#issuecomment-659893200 **[Test build #126033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126033/testReport)** for PR 27690 at commit [`dd243c2`](https://github.com/apache/spark/commit/dd243c213e5c366f9e1c765cf503e42e39d5b6d6).
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659893058
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659892894 **[Test build #126031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126031/testReport)** for PR 29117 at commit [`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] holdenk commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
holdenk commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r456238717 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java ## @@ -61,4 +63,17 @@ public MetricSet shuffleMetrics() { // Return an empty MetricSet by default. return () -> Collections.emptyMap(); } + + /** + * Request the local disk directories, which are specified by DiskBlockManager, for the executors + * from the external shuffle service (when this is a ExternalBlockStoreClient) or BlockManager + * (when this is a NettyBlockTransferService). Note there's only one executor when this is a + * NettyBlockTransferService because we ask one specific executor at a time. Review comment: Can you clarify the last sentence here? ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -1391,10 +1391,12 @@ package object config { private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED = ConfigBuilder("spark.shuffle.readHostLocalDisk") - .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and external " + -s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " + -"blocks requested from those block managers which are running on the same host are read " + -"from the disk directly instead of being fetched as remote blocks over the network.") + .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and 1) external " + +s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled or 2) ${DYN_ALLOCATION_ENABLED.key}" + +s" is disabled), shuffle blocks requested from those block managers which are running on " + Review comment: Why does dynamic allocation need to be disabled?
[GitHub] [spark] holdenk commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled
holdenk commented on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-659882158 Personally, I'd save locality changes for a follow-up PR. Making changes in core is pretty hard, so as long as we have a JIRA and it's a good incremental chunk of work, keeping it smaller for review (and potential revert if something goes wrong) is better. (Of course there are situations where that isn't possible, but I think changing locality calculations would be strictly additive.)
[GitHub] [spark] cloud-fan commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
cloud-fan commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659879374 thanks, merging to master!
[GitHub] [spark] Ngone51 commented on a change in pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
Ngone51 commented on a change in pull request #29014: URL: https://github.com/apache/spark/pull/29014#discussion_r456236145 ## File path: core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +/** + * Provides more detail when an executor is being decommissioned. + * @param message Human readable reason for why the decommissioning is happening. + * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in other places) is + * being decommissioned too. Used to infer if the shuffle data might + * be lost if external shuffle service is enabled. + */ +private[spark] +case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean) { Review comment: Ok, never mind. I saw there's a committer's approval in #29032. Just rebasing this PR later should be fine :)
[GitHub] [spark] cloud-fan closed pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
cloud-fan closed pull request #29015: URL: https://github.com/apache/spark/pull/29015
[GitHub] [spark] AmplabJenkins commented on pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.
AmplabJenkins commented on pull request #29142: URL: https://github.com/apache/spark/pull/29142#issuecomment-659877788
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
AmplabJenkins removed a comment on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659874570
[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-659875108 Addressed all comments except one: I am still keeping the two ratio configs (SMJ and SHJ) separate. Let me know if I need to change this. cc @maropu and @viirya, thanks.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
AmplabJenkins removed a comment on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-659874311
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
AmplabJenkins removed a comment on pull request #27690: URL: https://github.com/apache/spark/pull/27690#issuecomment-659874439
[GitHub] [spark] SparkQA removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
SparkQA removed a comment on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659810256 **[Test build #126017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126017/testReport)** for PR 29128 at commit [`5f7fe1b`](https://github.com/apache/spark/commit/5f7fe1bd4d9673d52151320f3a4193c313683736).
[GitHub] [spark] AmplabJenkins commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
AmplabJenkins commented on pull request #27690: URL: https://github.com/apache/spark/pull/27690#issuecomment-659874439
[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r456233046 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala ## @@ -103,46 +119,69 @@ class CoalesceBucketsInSortMergeJoinSuite extends SQLTestUtils with SharedSparkS } test("bucket coalescing - basic") { -withSQLConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key -> "true") { +withSQLConf(SQLConf.COALESCE_BUCKETS_IN_JOIN_ENABLED.key -> "true") { + run(JoinSetting( +RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = sortMergeJoin)) + run(JoinSetting( +RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = shuffledHashJoin, +shjBuildSide = Some(BuildLeft))) + // Coalescing bucket should not happen when the target is on shuffled hash join Review comment: @imback82 - yes, extracting this to a new test - `bucket coalescing shouldn't be applied to shuffled hash join build side`.
[GitHub] [spark] ulysses-you opened a new pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.
ulysses-you opened a new pull request #29142: URL: https://github.com/apache/spark/pull/29142 ### What changes were proposed in this pull request? Add more aggregate functions and make these cases support sort elimination. ### Why are the changes needed? Make `EliminateSorts` match more cases. ### Does this PR introduce _any_ user-facing change? Yes, when a case matches, users will see a different execution plan. ### How was this patch tested? Not needed.
[GitHub] [spark] AmplabJenkins commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
AmplabJenkins commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-659874311
[GitHub] [spark] cloud-fan commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
cloud-fan commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r456233068 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ## @@ -135,7 +136,16 @@ class SessionCatalog( private val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = { Review comment: ah, that's a good point. We should probably investigate how to design the data source API so that sources that don't need to infer schema can skip this cache. It's hard to use the JDBC data source, as we need to run REFRESH TABLE (or wait for the TTL after this PR) once the table is changed outside of Spark (which is common for the JDBC source).
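The TTL behavior being discussed can be illustrated with a minimal sketch. The class and method names below are hypothetical, chosen only to show the expire-after-write idea (the real `tableRelationCache` in the quoted diff is a Guava `Cache`, which handles this internally): an entry written into the cache stops being returned once its TTL elapses, forcing the caller to re-load the metadata.

```scala
import scala.collection.mutable

// Hypothetical stand-in for cached relation metadata.
case class RelationMeta(schemaDdl: String)

// Entries expire a fixed interval after they are written. An expired `get`
// returns None, forcing the caller to re-load (e.g. re-infer the schema).
// The clock is injectable so expiry can be tested deterministically.
class TtlCache[K, V](ttlMillis: Long, clock: () => Long = () => System.currentTimeMillis()) {
  private val entries = mutable.Map.empty[K, (V, Long)]

  def put(key: K, value: V): Unit =
    entries(key) = (value, clock() + ttlMillis)

  def get(key: K): Option[V] = entries.get(key) match {
    case Some((v, expiresAt)) if clock() < expiresAt => Some(v)
    case Some(_) => entries.remove(key); None // evict lazily on read
    case None => None
  }
}
```

With a TTL configured this way, a table changed outside of Spark (the JDBC case above) would be picked up once the cached entry expires, without an explicit REFRESH TABLE.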
[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
AmplabJenkins commented on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659874570
[GitHub] [spark] SparkQA commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
SparkQA commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-659874701 **[Test build #126032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126032/testReport)** for PR 29079 at commit [`d620940`](https://github.com/apache/spark/commit/d6209407731bbed2602c1d6a05c7c50982561faf).
[GitHub] [spark] SparkQA commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
SparkQA commented on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659873878 **[Test build #126017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126017/testReport)** for PR 29128 at commit [`5f7fe1b`](https://github.com/apache/spark/commit/5f7fe1bd4d9673d52151320f3a4193c313683736). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r456232826 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala ## @@ -178,7 +235,16 @@ class CoalesceBucketsInSortMergeJoinSuite extends SQLTestUtils with SharedSparkS rightKeys = rCols.reverse, leftRelation = lRel, rightRelation = RelationSetting(rCols, 8, Some(4)), -isSortMergeJoin = true)) +joinOperator = sortMergeJoin, +shjBuildSide = None)) + + run(JoinSetting( +leftKeys = lCols.reverse, +rightKeys = rCols.reverse, +leftRelation = lRel, +rightRelation = RelationSetting(rCols, 8, Some(4)), +joinOperator = shuffledHashJoin, +shjBuildSide = Some(BuildLeft))) Review comment: @imback82 - updated.
[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r456232773 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight} +import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.joins.{BaseJoinExec, ShuffledHashJoinExec, SortMergeJoinExec} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule coalesces one side of the `SortMergeJoin` and `ShuffledHashJoin` + * if the following conditions are met: + * - Two bucketed tables are joined. + * - Join keys match with output partition expressions on their respective sides. 
+ * - The larger bucket number is divisible by the smaller bucket number. + * - COALESCE_BUCKETS_IN_JOIN_ENABLED is set to true. + * - The ratio of the number of buckets is less than the value set in + * COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO (`SortMergeJoin`) or, + * COALESCE_BUCKETS_IN_SHUFFLED_HASH_JOIN_MAX_BUCKET_RATIO (`ShuffledHashJoin`). + */ +case class CoalesceBucketsInJoin(conf: SQLConf) extends Rule[SparkPlan] { + private def updateNumCoalescedBuckets( + join: BaseJoinExec, + numLeftBuckets: Int, + numRightBucket: Int, + numCoalescedBuckets: Int): BaseJoinExec = { +if (numCoalescedBuckets != numLeftBuckets) { + val leftCoalescedChild = join.left transformUp { +case f: FileSourceScanExec => + f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets)) + } + join match { +case j: SortMergeJoinExec => j.copy(left = leftCoalescedChild) +case j: ShuffledHashJoinExec => j.copy(left = leftCoalescedChild) + } +} else { + val rightCoalescedChild = join.right transformUp { +case f: FileSourceScanExec => + f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets)) + } + join match { +case j: SortMergeJoinExec => j.copy(right = rightCoalescedChild) +case j: ShuffledHashJoinExec => j.copy(right = rightCoalescedChild) + } +} + } + + private def isCoalesceSHJStreamSide( + join: ShuffledHashJoinExec, + numLeftBuckets: Int, + numRightBucket: Int, + numCoalescedBuckets: Int): Boolean = { +if (numCoalescedBuckets == numLeftBuckets) { + join.buildSide != BuildRight +} else { + join.buildSide != BuildLeft +} + } + + def apply(plan: SparkPlan): SparkPlan = { +if (!conf.coalesceBucketsInJoinEnabled) { + return plan +} + +plan transform { + case ExtractJoinWithBuckets(join, numLeftBuckets, numRightBuckets) => +val bucketRatio = math.max(numLeftBuckets, numRightBuckets) / + math.min(numLeftBuckets, numRightBuckets) +val numCoalescedBuckets = math.min(numLeftBuckets, numRightBuckets) +join match { + case j: SortMergeJoinExec +if bucketRatio <= 
conf.coalesceBucketsInSortMergeJoinMaxBucketRatio => +updateNumCoalescedBuckets(j, numLeftBuckets, numRightBuckets, numCoalescedBuckets) + case j: ShuffledHashJoinExec +// Only coalesce the buckets for shuffled hash join stream side, +// to avoid OOM for build side. +if bucketRatio <= conf.coalesceBucketsInShuffledHashJoinMaxBucketRatio && + isCoalesceSHJStreamSide(j, numLeftBuckets, numRightBuckets, numCoalescedBuckets) => +updateNumCoalescedBuckets(j, numLeftBuckets, numRightBuckets, numCoalescedBuckets) + case other => other +} + case other => other +} + } +} + +/** + * An extractor that extracts `SortMergeJoinE
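The bucket-count conditions listed in the rule's doc comment can be condensed into a small decision function. This is only a sketch with illustrative names, not Spark's actual API: it returns the coalesced bucket count when the larger side's bucket number is divisible by the smaller side's and the ratio stays within the configured maximum for the join type.

```scala
// Decide whether one side of the join can be coalesced, and to how many
// buckets. Returns None when the counts are equal (nothing to coalesce),
// when divisibility fails, or when the ratio exceeds the configured maximum.
def coalescedBuckets(numLeftBuckets: Int, numRightBuckets: Int, maxBucketRatio: Int): Option[Int] = {
  val small = math.min(numLeftBuckets, numRightBuckets)
  val large = math.max(numLeftBuckets, numRightBuckets)
  if (small == large) None // same bucket number on both sides
  else if (large % small == 0 && large / small <= maxBucketRatio) Some(small)
  else None
}
```

For a shuffled hash join the real rule adds one more check, visible in the quoted code: only the stream side may be coalesced, so coalescing is rejected when the side that would shrink is the build side.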
[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r456232703 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala ## @@ -103,46 +119,69 @@ class CoalesceBucketsInSortMergeJoinSuite extends SQLTestUtils with SharedSparkS } test("bucket coalescing - basic") { -withSQLConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key -> "true") { +withSQLConf(SQLConf.COALESCE_BUCKETS_IN_JOIN_ENABLED.key -> "true") { + run(JoinSetting( +RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = sortMergeJoin)) + run(JoinSetting( +RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = shuffledHashJoin, +shjBuildSide = Some(BuildLeft))) + // Coalescing bucket should not happen when the target is on shuffled hash join + // build side. run(JoinSetting( -RelationSetting(4, None), RelationSetting(8, Some(4)), isSortMergeJoin = true)) +RelationSetting(4, None), RelationSetting(8, None), joinOperator = shuffledHashJoin, +shjBuildSide = Some(BuildRight))) } -withSQLConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key -> "false") { - run(JoinSetting(RelationSetting(4, None), RelationSetting(8, None), isSortMergeJoin = true)) +withSQLConf(SQLConf.COALESCE_BUCKETS_IN_JOIN_ENABLED.key -> "false") { + run(JoinSetting( +RelationSetting(4, None), RelationSetting(8, None), joinOperator = broadcastHashJoin)) Review comment: @cloud-fan - updated with extra test for SMJ.
[GitHub] [spark] zsxwing commented on pull request #29131: [SPARK-32321][SS] Remove KAFKA-7703 workaround
zsxwing commented on pull request #29131: URL: https://github.com/apache/spark/pull/29131#issuecomment-659873692 Thanks for raising the PR. Could you clarify what the cost of keeping this is? I believe KAFKA-7703 has been fixed since you have verified it using my reproduction code. However, I'd be more conservative. Although I did report KAFKA-7703, I didn't have any evidence that this was exactly the issue we hit in production, or that it was the only possible issue. Unfortunately, there were not enough logs to prove it. What I do know is that the workaround we patched in Spark prevented the Kafka consumer from reporting incorrect offsets, but it could hide other potential unknown issues. Currently there is no Spark release using Kafka 2.5.0, so I don't feel confident that there are no other unknown issues causing the same incorrect offset problem. If the cost of keeping this workaround is minor, can we wait until a Spark release using Kafka 2.5.0 has been out for a while? Once there is a Spark release available and people start to use it, I can look at our internal logs to see whether the warning log in `fetchLatestOffsets` is really gone, which would be evidence that KAFKA-7703 was likely the only issue.
[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r456232535 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight} +import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan} +import org.apache.spark.sql.execution.joins.{BaseJoinExec, ShuffledHashJoinExec, SortMergeJoinExec} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule coalesces one side of the `SortMergeJoin` and `ShuffledHashJoin` + * if the following conditions are met: + * - Two bucketed tables are joined. + * - Join keys match with output partition expressions on their respective sides. 
+ * - The larger bucket number is divisible by the smaller bucket number. + * - COALESCE_BUCKETS_IN_JOIN_ENABLED is set to true. + * - The ratio of the number of buckets is less than the value set in + * COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO (`SortMergeJoin`) or, + * COALESCE_BUCKETS_IN_SHUFFLED_HASH_JOIN_MAX_BUCKET_RATIO (`ShuffledHashJoin`). + */ +case class CoalesceBucketsInJoin(conf: SQLConf) extends Rule[SparkPlan] { + private def updateNumCoalescedBuckets( + join: BaseJoinExec, + numLeftBuckets: Int, + numRightBucket: Int, + numCoalescedBuckets: Int): BaseJoinExec = { +if (numCoalescedBuckets != numLeftBuckets) { + val leftCoalescedChild = join.left transformUp { +case f: FileSourceScanExec => + f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets)) + } Review comment: @maropu - sure. updated.
[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable
c21 commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r456232607 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala ## @@ -19,17 +19,21 @@ package org.apache.spark.sql.execution.bucketing import org.apache.spark.sql.catalyst.catalog.BucketSpec import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} -import org.apache.spark.sql.catalyst.optimizer.BuildLeft +import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight, BuildSide} import org.apache.spark.sql.catalyst.plans.Inner import org.apache.spark.sql.execution.{BinaryExecNode, FileSourceScanExec, SparkPlan} import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, InMemoryFileIndex, PartitionSpec} import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat -import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, SortMergeJoinExec} +import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, ShuffledHashJoinExec, SortMergeJoinExec} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} import org.apache.spark.sql.types.{IntegerType, StructType} -class CoalesceBucketsInSortMergeJoinSuite extends SQLTestUtils with SharedSparkSession { +class CoalesceBucketsInJoinSuite extends SQLTestUtils with SharedSparkSession { + private val sortMergeJoin = "sortMergeJoin" Review comment: @cloud-fan - sure. updated.
[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
moomindani commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r456232227 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -97,12 +99,46 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { options = Map.empty) } - protected def getExternalTmpPath( + // Mostly copied from Context.java#getMRTmpPath of Hive 2.3. + // Visible for testing. + private[execution] def getNonBlobTmpPath( + hadoopConf: Configuration, + sessionScratchDir: String, + scratchDir: String): Path = { + +// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1', +// which is ruled by 'hive.exec.scratchdir' including file system. +// This is the same as Spark's #oldVersionExternalTempPath. +// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090. +// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir' +// Here it uses session_path unless it's empty, otherwise uses scratchDir. +val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir +val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath) +logDebug(s"MR scratch dir '$mrScratchDir/-mr-1' is used") +val path = new Path(mrScratchDir, "-mr-1") +val scheme = Option(path.toUri.getScheme).getOrElse("") +if (scheme.equals("file")) { + logWarning(s"Temporary data will be written into a local file system " + +s"(scheme: '$scheme', path: '$mrScratchDir'). If your Spark is not in local mode, " + +s"you might need to configure 'hive.exec.scratchdir' " + +s"to use accessible file system (e.g. HDFS path) from any executors in the cluster.") Review comment: Removed the `s` at the head. BTW, there is a lot of existing code that includes it, but I left it as is.
[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
moomindani commented on a change in pull request #27690: URL: https://github.com/apache/spark/pull/27690#discussion_r456231877 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala ## @@ -97,12 +99,46 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand { options = Map.empty) } - protected def getExternalTmpPath( + // Mostly copied from Context.java#getMRTmpPath of Hive 2.3. + // Visible for testing. + private[execution] def getNonBlobTmpPath( + hadoopConf: Configuration, + sessionScratchDir: String, + scratchDir: String): Path = { + +// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1', +// which is ruled by 'hive.exec.scratchdir' including file system. +// This is the same as Spark's #oldVersionExternalTempPath. +// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090. +// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir' +// Here it uses session_path unless it's empty, otherwise uses scratchDir. +val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir +val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath) +logDebug(s"MR scratch dir '$mrScratchDir/-mr-1' is used") +val path = new Path(mrScratchDir, "-mr-1") +val scheme = Option(path.toUri.getScheme).getOrElse("") +if (scheme.equals("file")) { + logWarning(s"Temporary data will be written into a local file system " + +s"(scheme: '$scheme', path: '$mrScratchDir'). If your Spark is not in local mode, " + +s"you might need to configure 'hive.exec.scratchdir' " + +s"to use accessible file system (e.g. 
HDFS path) from any executors in the cluster.") +} +path + } + + private def supportSchemeToUseNonBlobStore(path: Path): Boolean = { +path != null && { + val supportedBlobSchemes = SQLConf.get.supportedSchemesToUseNonBlobstore + val scheme = Option(path.toUri.getScheme).getOrElse("") + Utils.stringToSeq(supportedBlobSchemes).contains(scheme.toLowerCase(Locale.ROOT)) +} + } + + def getExternalTmpPath( sparkSession: SparkSession, hadoopConf: Configuration, path: Path): Path = { import org.apache.spark.sql.hive.client.hive._ - Review comment: Thanks for pointing it out. Reverted.
[GitHub] [spark] cloud-fan commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
cloud-fan commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659872039 LGTM. It's a much simpler and more robust solution!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659870258
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659870428 **[Test build #126031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126031/testReport)** for PR 29117 at commit [`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659848866 **[Test build #126028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126028/testReport)** for PR 29117 at commit [`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659870258
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659870049 **[Test build #126028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126028/testReport)** for PR 29117 at commit [`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] venkata91 edited a comment on pull request #28994: [SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics.
venkata91 edited a comment on pull request #28994: URL: https://github.com/apache/spark/pull/28994#issuecomment-659869847 This is an interesting idea and a good start. Considering the runTime of a task alone might not be useful in many cases. Thanks!
[GitHub] [spark] venkata91 commented on pull request #28994: [SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics.
venkata91 commented on pull request #28994: URL: https://github.com/apache/spark/pull/28994#issuecomment-659869847 This is an interesting idea and a good start. Considering the runTime of a task alone might not be useful in many cases.
[GitHub] [spark] venkata91 commented on a change in pull request #28994: [SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics.
venkata91 commented on a change in pull request #28994: URL: https://github.com/apache/spark/pull/28994#discussion_r456228668

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala

```
@@ -1125,6 +1142,78 @@ private[spark] class TaskSetManager(
   def executorAdded(): Unit = {
     recomputeLocality()
   }
+
+  /**
+   * A class for checking inefficient tasks to be speculated, the inefficient tasks come from
+   * the tasks which may be speculated by the previous strategy.
+   */
+  private class InefficientTask {
+    private var taskData: Map[Long, TaskData] = null
+    private var successTaskProgress = 0.0
+    private val checkInefficientTask = speculationTaskMinDuration > 0
+
+    if (checkInefficientTask) {
+      val appStatusStore = sched.sc.statusTracker.getAppStatusStore
+      if (appStatusStore != null) {
+        successTaskProgress =
+          computeSuccessTaskProgress(taskSet.stageId, taskSet.stageAttemptId, appStatusStore)
+        val stageData = appStatusStore.stageAttempt(taskSet.stageId, taskSet.stageAttemptId, true)
+        if (stageData != null) {
+          taskData = stageData._1.tasks.orNull
+        }
+      }
+    }
+
+    private def computeSuccessTaskProgress(stageId: Int, stageAttemptId: Int,
+        appStatusStore: AppStatusStore): Double = {
+      var sumInputRecords, sumShuffleReadRecords, sumExecutorRunTime = 0.0
+      appStatusStore.taskList(stageId, stageAttemptId, Int.MaxValue).filter {
+        _.status == "SUCCESS"
+      }.map(_.taskMetrics).filter(_.isDefined).map(_.get).foreach { task =>
+        if (task.inputMetrics != null) {
+          sumInputRecords += task.inputMetrics.recordsRead
+        }
```

Review comment: How about recordsWritten? Should that also be considered w.r.t. progress? Same w.r.t. shuffleRecordsWritten?
## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala

```
@@ -1125,6 +1142,78 @@ private[spark] class TaskSetManager(
   def executorAdded(): Unit = {
     recomputeLocality()
   }
+
+  /**
+   * A class for checking inefficient tasks to be speculated, the inefficient tasks come from
+   * the tasks which may be speculated by the previous strategy.
+   */
+  private class InefficientTask {
+    private var taskData: Map[Long, TaskData] = null
+    private var successTaskProgress = 0.0
+    private val checkInefficientTask = speculationTaskMinDuration > 0
+
+    if (checkInefficientTask) {
+      val appStatusStore = sched.sc.statusTracker.getAppStatusStore
+      if (appStatusStore != null) {
+        successTaskProgress =
+          computeSuccessTaskProgress(taskSet.stageId, taskSet.stageAttemptId, appStatusStore)
+        val stageData = appStatusStore.stageAttempt(taskSet.stageId, taskSet.stageAttemptId, true)
+        if (stageData != null) {
+          taskData = stageData._1.tasks.orNull
+        }
+      }
+    }
+
+    private def computeSuccessTaskProgress(stageId: Int, stageAttemptId: Int,
+        appStatusStore: AppStatusStore): Double = {
+      var sumInputRecords, sumShuffleReadRecords, sumExecutorRunTime = 0.0
+      appStatusStore.taskList(stageId, stageAttemptId, Int.MaxValue).filter {
+        _.status == "SUCCESS"
+      }.map(_.taskMetrics).filter(_.isDefined).map(_.get).foreach { task =>
+        if (task.inputMetrics != null) {
+          sumInputRecords += task.inputMetrics.recordsRead
+        }
+        if (task.shuffleReadMetrics != null) {
+          sumShuffleReadRecords += task.shuffleReadMetrics.recordsRead
+        }
+        sumExecutorRunTime += task.executorRunTime
+      }
+      if (sumExecutorRunTime > 0) {
+        (sumInputRecords + sumShuffleReadRecords) / (sumExecutorRunTime / 1000.0)
+      } else 0
+    }
+
+    def maySpeculateTask(tid: Long, runtimeMs: Long, taskInfo: TaskInfo): Boolean = {
+      // note: 1) only check inefficient tasks when 'SPECULATION_TASK_DURATION_THRESHOLD' > 0.
+      // 2) some tasks may have neither input records nor shuffleRead records, so
+      // the 'successTaskProgress' may be zero all the time, this case we should not consider,
+      // eg: some spark-sql like that 'msck repair table' or 'drop table' and so on.
+      if (!checkInefficientTask || successTaskProgress <= 0) {
+        true
+      } else if (runtimeMs < speculationTaskMinDuration) {
+        false
+      } else if (taskData != null && taskData.contains(tid) && taskData(tid) != null &&
+          taskData(tid).taskMetrics.isDefined) {
+        val taskMetrics = taskData(tid).taskMetrics.get
+        val currentTaskProgressRate = (taskMetrics.inputMetrics.recordsRead +
```

Review comment: Would it make sense to add taskProgress as part of taskMetrics, so that it can also be shown in the Spark UI? Although taskProgress for tasks which don't involve input/output/shuffle records would be hard to measure?

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala #
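The progress-rate heuristic under review boils down to simple arithmetic: a task's rate is records processed per second of executor run time, and a task is a speculation candidate only when its rate falls well below the average rate of already-successful tasks. A minimal standalone sketch of that arithmetic, in Java; the names `progressRate`, `isInefficient`, and the `0.5` threshold factor are illustrative assumptions, not Spark's actual API or defaults:

```java
public class ProgressRate {
    /** Records processed per second of executor run time; 0 when there is no run time yet. */
    static double progressRate(long inputRecords, long shuffleReadRecords, long runTimeMs) {
        if (runTimeMs <= 0) return 0.0;
        return (inputRecords + shuffleReadRecords) / (runTimeMs / 1000.0);
    }

    /** A task counts as "inefficient" when its rate is below a fraction of the average successful rate. */
    static boolean isInefficient(double taskRate, double avgSuccessRate, double factor) {
        return avgSuccessRate > 0 && taskRate < avgSuccessRate * factor;
    }

    public static void main(String[] args) {
        double avg = progressRate(900_000, 100_000, 2_000);  // 1,000,000 records over 2s = 500000.0/s
        double slow = progressRate(50_000, 0, 2_000);        // 25000.0/s
        System.out.println(isInefficient(slow, avg, 0.5));   // true: far below half the average rate
    }
}
```

Note that the review comment's caveat maps directly onto the `avgSuccessRate > 0` guard: when no successful task has read any input or shuffle records, the average rate stays zero and the heuristic cannot distinguish slow tasks from record-free ones.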
[GitHub] [spark] SparkQA removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join
SparkQA removed a comment on pull request #29120: URL: https://github.com/apache/spark/pull/29120#issuecomment-659819077 **[Test build #126021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126021/testReport)** for PR 29120 at commit [`e56f5d4`](https://github.com/apache/spark/commit/e56f5d4936fc8105d672fea5fe8ae441b7de0f2b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join
AmplabJenkins removed a comment on pull request #29120: URL: https://github.com/apache/spark/pull/29120#issuecomment-659862630 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126021/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value
AmplabJenkins removed a comment on pull request #29141: URL: https://github.com/apache/spark/pull/29141#issuecomment-659852368
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join
AmplabJenkins removed a comment on pull request #29120: URL: https://github.com/apache/spark/pull/29120#issuecomment-659862619 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value
SparkQA commented on pull request #29141: URL: https://github.com/apache/spark/pull/29141#issuecomment-659862760 **[Test build #126030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126030/testReport)** for PR 29141 at commit [`3210002`](https://github.com/apache/spark/commit/321000236e5571545912af0db1c02a3fa06f1a9a).
[GitHub] [spark] AmplabJenkins commented on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join
AmplabJenkins commented on pull request #29120: URL: https://github.com/apache/spark/pull/29120#issuecomment-659862619
[GitHub] [spark] SparkQA commented on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join
SparkQA commented on pull request #29120: URL: https://github.com/apache/spark/pull/29120#issuecomment-659862399 **[Test build #126021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126021/testReport)** for PR 29120 at commit [`e56f5d4`](https://github.com/apache/spark/commit/e56f5d4936fc8105d672fea5fe8ae441b7de0f2b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] gengliangwang edited a comment on pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions
gengliangwang edited a comment on pull request #29133: URL: https://github.com/apache/spark/pull/29133#issuecomment-659821378 @HyukjinKwon I have updated the PR description. Meanwhile, I created a PR on my repo to see what the test failure log will look like: https://github.com/gengliangwang/spark/pull/6 Here is an example of failed log output: https://github.com/gengliangwang/spark/pull/6/checks?check_run_id=880362871
[GitHub] [spark] dongjoon-hyun commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value
dongjoon-hyun commented on pull request #29141: URL: https://github.com/apache/spark/pull/29141#issuecomment-659859923 Thank you, @cloud-fan!
[GitHub] [spark] xuanyuanking edited a comment on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group
xuanyuanking edited a comment on pull request #28977: URL: https://github.com/apache/spark/pull/28977#issuecomment-659327525 Summary for separating all `hive.execution` suites:

| Test | Worker | Scala test time |
| - | - | - |
| https://github.com/apache/spark/pull/28977#issuecomment-659309943 | worker-03 | s |
| https://github.com/apache/spark/pull/28977#issuecomment-659486466 | worker-04 | 8403s |
[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659856381 **[Test build #126029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126029/testReport)** for PR 29032 at commit [`9356fac`](https://github.com/apache/spark/commit/9356facb887328a2e781f46dc533f41eb6751392).
[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659856207
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659856207
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659856015
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659856015
[GitHub] [spark] SparkQA removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
SparkQA removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659746319 **[Test build #126007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126007/testReport)** for PR 28840 at commit [`94fa132`](https://github.com/apache/spark/commit/94fa132ca4d58f631cc7666e25b126bc28c7f34e).
[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
SparkQA commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659855587 **[Test build #126007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126007/testReport)** for PR 28840 at commit [`94fa132`](https://github.com/apache/spark/commit/94fa132ca4d58f631cc7666e25b126bc28c7f34e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659855366 retest this please
[GitHub] [spark] viirya commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
viirya commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r456219067

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala

```
@@ -135,7 +136,16 @@ class SessionCatalog(
   private val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
```

Review comment: Hmm, I think this cache is still useful for avoiding inferring the schema again. That is also an expensive operation.
[GitHub] [spark] AmplabJenkins commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value
AmplabJenkins commented on pull request #29141: URL: https://github.com/apache/spark/pull/29141#issuecomment-659852368
[GitHub] [spark] cloud-fan commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value
cloud-fan commented on pull request #29141: URL: https://github.com/apache/spark/pull/29141#issuecomment-659852057 cc @dongjoon-hyun
[GitHub] [spark] cloud-fan opened a new pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value
cloud-fan opened a new pull request #29141: URL: https://github.com/apache/spark/pull/29141 backport https://github.com/apache/spark/pull/29125
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659850454
[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659850454
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659849344
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659849344
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659848866 **[Test build #126028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126028/testReport)** for PR 29117 at commit [`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
[GitHub] [spark] cloud-fan closed pull request #29140: [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation
cloud-fan closed pull request #29140: URL: https://github.com/apache/spark/pull/29140
[GitHub] [spark] cloud-fan commented on pull request #29140: [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation
cloud-fan commented on pull request #29140: URL: https://github.com/apache/spark/pull/29140#issuecomment-659848160 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
cloud-fan commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659846826 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
AmplabJenkins removed a comment on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659845800
[GitHub] [spark] AmplabJenkins commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
AmplabJenkins commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659845800
[GitHub] [spark] cloud-fan commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
cloud-fan commented on a change in pull request #28852: URL: https://github.com/apache/spark/pull/28852#discussion_r456214684

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala

```
@@ -135,7 +136,16 @@ class SessionCatalog(
   private val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
```

Review comment: For external data sources, it's common that data are changed outside of Spark. I think it's more important to make sure we get the latest data in a new query. Maybe we should disable this relation cache by default.
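The TTL semantics being debated for the relation cache can be illustrated independently of Guava's `CacheBuilder`: an entry is remembered with its write time and treated as absent (and evicted) once it is older than the configured TTL, which bounds how stale a cached plan can get. A minimal hand-rolled sketch in Java with an injectable clock for determinism; `TtlCache`, its method names, and the key/value strings are hypothetical, not the PR's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writeTimeMs;
        Entry(V value, long writeTimeMs) { this.value = value; this.writeTimeMs = writeTimeMs; }
    }

    private final long ttlMillis;
    private final LongSupplier clock; // injectable for tests; System::currentTimeMillis in practice
    private final Map<K, Entry<V>> map = new HashMap<>();

    public TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the cached value, or null if absent or expired; expired entries are evicted. */
    public V getIfPresent(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (clock.getAsLong() - e.writeTimeMs >= ttlMillis) {
            map.remove(key);
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TtlCache<String, String> cache = new TtlCache<>(1000, () -> now[0]);
        cache.put("db.table", "cachedPlan");
        System.out.println(cache.getIfPresent("db.table")); // cachedPlan
        now[0] = 1500; // advance past the 1s TTL
        System.out.println(cache.getIfPresent("db.table")); // null: the entry has expired
    }
}
```

Setting the TTL to zero would make every lookup miss, which is effectively the "disable the relation cache by default" option raised in the comment; a short positive TTL is the middle ground between staleness and re-inferring schemas on every query.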
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659844495 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126025/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
SparkQA commented on pull request #28852: URL: https://github.com/apache/spark/pull/28852#issuecomment-659844682 **[Test build #126027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126027/testReport)** for PR 28852 at commit [`3e761dc`](https://github.com/apache/spark/commit/3e761dcd790b9c30e5cee7bffe916dfc2c82b7a5).
[GitHub] [spark] SparkQA removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
SparkQA removed a comment on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659773878 **[Test build #126012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126012/testReport)** for PR 29015 at commit [`31b231e`](https://github.com/apache/spark/commit/31b231e1b0a984ebdfc408beedaadeec6881ddff).
[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
SparkQA commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659844348 **[Test build #126012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126012/testReport)** for PR 29015 at commit [`31b231e`](https://github.com/apache/spark/commit/31b231e1b0a984ebdfc408beedaadeec6881ddff). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class DecommissionWorkersOnHosts(hostnames: Seq[String])`
[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659834294 **[Test build #126025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126025/testReport)** for PR 29117 at commit [`f6207b0`](https://github.com/apache/spark/commit/f6207b038a67b575b65ed4adba1c407b6a0d0ecd).
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659844490
[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-65984 **[Test build #126025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126025/testReport)** for PR 29117 at commit [`f6207b0`](https://github.com/apache/spark/commit/f6207b038a67b575b65ed4adba1c407b6a0d0ecd). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659844490 Merged build finished. Test FAILed.