[GitHub] [spark] SparkQA commented on pull request #29138: [SPARK-32338] [SQL] Overload slice to accept Column for start and length

2020-07-16 Thread GitBox


SparkQA commented on pull request #29138:
URL: https://github.com/apache/spark/pull/29138#issuecomment-659906327


   **[Test build #126014 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126014/testReport)**
 for PR 29138 at commit 
[`8ee58cd`](https://github.com/apache/spark/commit/8ee58cdf024b02ea4e62f1b744e164efea4bb520).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29138: [SPARK-32338] [SQL] Overload slice to accept Column for start and length

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29138:
URL: https://github.com/apache/spark/pull/29138#issuecomment-659779238


   **[Test build #126014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126014/testReport)**
 for PR 29138 at commit 
[`8ee58cd`](https://github.com/apache/spark/commit/8ee58cdf024b02ea4e62f1b744e164efea4bb520).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions

2020-07-16 Thread GitBox


dongjoon-hyun commented on a change in pull request #29133:
URL: https://github.com/apache/spark/pull/29133#discussion_r456254309



##
File path: project/SparkBuild.scala
##
@@ -1027,6 +1027,11 @@ object TestSettings {
   }.getOrElse(Nil): _*),
 // Show full stack trace and duration in test cases.
 testOptions in Test += Tests.Argument("-oDF"),
+// Show only the failed test cases in github action to make the log more 
readable.
+testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest,
+  sys.env.get("GITHUB_ACTIONS").map { _ =>
+Seq("-eNCXEHLOPQMDSF")

Review comment:
   It seems that you explicitly enabled all available standard error 
options except `W` and `U`. If so, could you describe why you chose these 
and excluded those two, please?
   ```
   W - without color
   U - unformatted mode
   ```
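
For readers following the diff above: the extra ScalaTest argument is added only when the `GITHUB_ACTIONS` environment variable is set. A minimal sketch of that pattern follows. `ReporterArgsSketch`, `reporterArgs`, and the explicit `env` map are hypothetical names introduced here so the logic can run outside sbt, and the `getOrElse(Nil)` fallback is an assumption, because the quoted diff is cut off before that point.

```scala
object ReporterArgsSketch {
  // Hypothetical helper: returns the extra ScalaTest argument only when
  // GITHUB_ACTIONS is present in `env`; otherwise returns no arguments.
  def reporterArgs(env: Map[String, String]): Seq[String] =
    env.get("GITHUB_ACTIONS")
      .map(_ => Seq("-eNCXEHLOPQMDSF")) // flag string copied from the diff
      .getOrElse(Nil)                   // assumed fallback: add nothing

  def main(args: Array[String]): Unit = {
    println(reporterArgs(Map("GITHUB_ACTIONS" -> "true")))
    println(reporterArgs(Map.empty))
  }
}
```

Passing the environment in as a map, rather than reading `sys.env` directly, is only a convenience for trying both branches.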








[GitHub] [spark] viirya commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-16 Thread GitBox


viirya commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r456254750



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,7 +99,42 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+  hadoopConf: Configuration,
+  sessionScratchDir: String,
+  scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is 
HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's emtpy, otherwise uses scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else 
scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), 
hadoopConf, sessionPath)
+logDebug(s"MR scratch dir '$mrScratchDir/-mr-1' is used")
+val path = new Path(mrScratchDir, "-mr-1")
+val scheme = Option(path.toUri.getScheme).getOrElse("")
+if (scheme.equals("file")) {
+  logWarning("Temporary data will be written into a local file system " +
+"(scheme: '$scheme', path: '$mrScratchDir'). If your Spark is not in 
local mode, " +

Review comment:
   Use `s""` for string interpolation here; without the `s` prefix, `$scheme` and `$mrScratchDir` are logged literally.
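
To illustrate the point of this review comment, here is a small sketch of how a Scala string literal behaves with and without the `s` prefix. `InterpolationSketch` and `messages` are hypothetical names used only for this illustration; the message text is shortened from the diff.

```scala
object InterpolationSketch {
  // Returns the same message text built without and with the `s` interpolator.
  def messages(scheme: String): (String, String) = {
    val plain = "(scheme: '$scheme')"         // no prefix: '$scheme' stays literal
    val interpolated = s"(scheme: '$scheme')" // `s` prefix: the value is substituted
    (plain, interpolated)
  }

  def main(args: Array[String]): Unit = {
    val (plain, interpolated) = messages("file")
    println(plain)        // prints (scheme: '$scheme')
    println(interpolated) // prints (scheme: 'file')
  }
}
```

When a message is concatenated from several literals, each piece that contains a `$name` placeholder needs its own `s` prefix.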








[GitHub] [spark] huaxingao commented on pull request #29139: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-16 Thread GitBox


huaxingao commented on pull request #29139:
URL: https://github.com/apache/spark/pull/29139#issuecomment-659905157


   cc @srowen @viirya @zhengruifeng 
   
   I scanned through. I think the doc is well written and the information is 
useful. 
   
   Here are the screen captures:
   
   before change:
   https://user-images.githubusercontent.com/13592258/87757151-01bbfe80-c7bf-11ea-8159-00629a076f25.png
   
   after change:
   https://user-images.githubusercontent.com/13592258/87757162-05e81c00-c7bf-11ea-917a-8a244a9b7b6b.png
   https://user-images.githubusercontent.com/13592258/87757166-097ba300-c7bf-11ea-9278-80d3fd11a468.png
   https://user-images.githubusercontent.com/13592258/87757172-0b456680-c7bf-11ea-968c-2b1b29c186ed.png
   https://user-images.githubusercontent.com/13592258/87757175-0d0f2a00-c7bf-11ea-8a37-4a38c7c30e1f.png
   






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions

2020-07-16 Thread GitBox


dongjoon-hyun commented on a change in pull request #29133:
URL: https://github.com/apache/spark/pull/29133#discussion_r456254721



##
File path: project/SparkBuild.scala
##
@@ -1027,6 +1027,11 @@ object TestSettings {
   }.getOrElse(Nil): _*),
 // Show full stack trace and duration in test cases.
 testOptions in Test += Tests.Argument("-oDF"),
+// Show only the failed test cases in github action to make the log more 
readable.
+testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest,
+  sys.env.get("GITHUB_ACTIONS").map { _ =>
+Seq("-eNCXEHLOPQMDSF")

Review comment:
   Also, could you add the link to the code comment at line 1030 below, 
since this is non-trivial?








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions

2020-07-16 Thread GitBox


dongjoon-hyun commented on a change in pull request #29133:
URL: https://github.com/apache/spark/pull/29133#discussion_r456254309



##
File path: project/SparkBuild.scala
##
@@ -1027,6 +1027,11 @@ object TestSettings {
   }.getOrElse(Nil): _*),
 // Show full stack trace and duration in test cases.
 testOptions in Test += Tests.Argument("-oDF"),
+// Show only the failed test cases in github action to make the log more 
readable.
+testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest,
+  sys.env.get("GITHUB_ACTIONS").map { _ =>
+Seq("-eNCXEHLOPQMDSF")

Review comment:
   It seems that you explicitly enabled all available options except `W` 
and `U`. If so, could you describe why you chose these and excluded 
those two, please?
   ```
   W - without color
   U - unformatted mode
   ```








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-659899832










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-659899973










[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-659899973










[GitHub] [spark] AmplabJenkins commented on pull request #29135: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29135:
URL: https://github.com/apache/spark/pull/29135#issuecomment-659899832










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659895646


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126010/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659895637


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29142:
URL: https://github.com/apache/spark/pull/29142#issuecomment-659877788










[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659895637










[GitHub] [spark] SparkQA commented on pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.

2020-07-16 Thread GitBox


SparkQA commented on pull request #29142:
URL: https://github.com/apache/spark/pull/29142#issuecomment-659895843


   **[Test build #126034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126034/testReport)**
 for PR 29142 at commit 
[`e544ca3`](https://github.com/apache/spark/commit/e544ca3649ed6c31abdbd46eab9937adde1025b9).






[GitHub] [spark] SparkQA removed a comment on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659773807


   **[Test build #126010 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126010/testReport)**
 for PR 29104 at commit 
[`e44c516`](https://github.com/apache/spark/commit/e44c5163f0874804944b58cab324abbc7451f97a).






[GitHub] [spark] SparkQA commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-16 Thread GitBox


SparkQA commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659894625


   **[Test build #126010 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126010/testReport)**
 for PR 29104 at commit 
[`e44c516`](https://github.com/apache/spark/commit/e44c5163f0874804944b58cab324abbc7451f97a).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659870428


   **[Test build #126031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126031/testReport)**
 for PR 29117 at commit 
[`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659893058










[GitHub] [spark] SparkQA commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-16 Thread GitBox


SparkQA commented on pull request #27690:
URL: https://github.com/apache/spark/pull/27690#issuecomment-659893200


   **[Test build #126033 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126033/testReport)**
 for PR 27690 at commit 
[`dd243c2`](https://github.com/apache/spark/commit/dd243c213e5c366f9e1c765cf503e42e39d5b6d6).






[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659893058










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659892894


   **[Test build #126031 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126031/testReport)**
 for PR 29117 at commit 
[`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] holdenk commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled

2020-07-16 Thread GitBox


holdenk commented on a change in pull request #28911:
URL: https://github.com/apache/spark/pull/28911#discussion_r456238717



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##
@@ -61,4 +63,17 @@ public MetricSet shuffleMetrics() {
 // Return an empty MetricSet by default.
 return () -> Collections.emptyMap();
   }
+
+  /**
+   * Request the local disk directories, which are specified by 
DiskBlockManager, for the executors
+   * from the external shuffle service (when this is a 
ExternalBlockStoreClient) or BlockManager
+   * (when this is a NettyBlockTransferService). Note there's only one 
executor when this is a
+   * NettyBlockTransferService because we ask one specific executor at a time.

Review comment:
   Can you clarify the last sentence here?

##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1391,10 +1391,12 @@ package object config {
 
   private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
 ConfigBuilder("spark.shuffle.readHostLocalDisk")
-  .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is 
disabled and external " +
-s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " +
-"blocks requested from those block managers which are running on the 
same host are read " +
-"from the disk directly instead of being fetched as remote blocks over 
the network.")
+  .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is 
disabled and 1) external " +
+s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled or 2) 
${DYN_ALLOCATION_ENABLED.key}" +
+s" is disabled), shuffle blocks requested from those block managers 
which are running on " +

Review comment:
   Why does dynamic allocation need to be disabled?
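
For context on this question, the condition spelled out in the revised doc string can be read as a single boolean predicate. The sketch below is only one reading of that doc text, not Spark's actual implementation; `HostLocalReadSketch`, `hostLocalReadActive`, and its parameter names are hypothetical.

```scala
object HostLocalReadSketch {
  // Hypothetical predicate mirroring the doc string: the feature is on, the
  // old fetch protocol is off, and either the external shuffle service is
  // enabled or dynamic allocation is disabled.
  def hostLocalReadActive(
      readHostLocalDisk: Boolean,
      useOldFetchProtocol: Boolean,
      shuffleServiceEnabled: Boolean,
      dynAllocationEnabled: Boolean): Boolean =
    readHostLocalDisk && !useOldFetchProtocol &&
      (shuffleServiceEnabled || !dynAllocationEnabled)

  def main(args: Array[String]): Unit = {
    // No external shuffle service, dynamic allocation disabled.
    println(hostLocalReadActive(true, false, false, false))
    // No external shuffle service, dynamic allocation enabled.
    println(hostLocalReadActive(true, false, false, true))
  }
}
```

Under this reading, dynamic allocation matters only in the case where the external shuffle service is disabled.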








[GitHub] [spark] holdenk commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled

2020-07-16 Thread GitBox


holdenk commented on pull request #28911:
URL: https://github.com/apache/spark/pull/28911#issuecomment-659882158


   Personally, I'd save locality changes for a follow-up PR. Making changes in 
core is pretty hard, so as long as we have a JIRA and this is a good incremental 
chunk of work, keeping it smaller for review (and for a potential revert if 
something goes wrong) is better. Of course there are situations where that 
isn't possible, but I think changing locality calculations would be strictly 
additive.






[GitHub] [spark] cloud-fan commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-16 Thread GitBox


cloud-fan commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659879374


   thanks, merging to master!






[GitHub] [spark] Ngone51 commented on a change in pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-16 Thread GitBox


Ngone51 commented on a change in pull request #29014:
URL: https://github.com/apache/spark/pull/29014#discussion_r456236145



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
##
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+/**
+ * Provides more detail when an executor is being decommissioned.
+ * @param message Human readable reason for why the decommissioning is 
happening.
+ * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in 
other places) is
+ * being decommissioned too. Used to infer if the 
shuffle data might
+ * be lost if external shuffle service is enabled.
+ */
+private[spark]
+case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: 
Boolean) {

Review comment:
   OK, never mind. I saw there's a committer's approval in #29032. Just 
rebasing this PR later should be fine :)








[GitHub] [spark] cloud-fan closed pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-16 Thread GitBox


cloud-fan closed pull request #29015:
URL: https://github.com/apache/spark/pull/29015


   






[GitHub] [spark] AmplabJenkins commented on pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29142:
URL: https://github.com/apache/spark/pull/29142#issuecomment-659877788










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659874570










[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-659875108


   Addressed all comments except one: I am still keeping the two ratio 
configs separate (SMJ and SHJ). Let me know if I need to change this. 
cc @maropu and @viirya, thanks.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-659874311










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #27690:
URL: https://github.com/apache/spark/pull/27690#issuecomment-659874439










[GitHub] [spark] SparkQA removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659810256


   **[Test build #126017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126017/testReport)**
 for PR 29128 at commit 
[`5f7fe1b`](https://github.com/apache/spark/commit/5f7fe1bd4d9673d52151320f3a4193c313683736).






[GitHub] [spark] AmplabJenkins commented on pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #27690:
URL: https://github.com/apache/spark/pull/27690#issuecomment-659874439










[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456233046



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
##
@@ -103,46 +119,69 @@ class CoalesceBucketsInSortMergeJoinSuite extends 
SQLTestUtils with SharedSparkS
   }
 
   test("bucket coalescing - basic") {
-withSQLConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key -> 
"true") {
+withSQLConf(SQLConf.COALESCE_BUCKETS_IN_JOIN_ENABLED.key -> "true") {
+  run(JoinSetting(
+RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = 
sortMergeJoin))
+  run(JoinSetting(
+RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = 
shuffledHashJoin,
+shjBuildSide = Some(BuildLeft)))
+  // Coalescing bucket should not happen when the target is on shuffled 
hash join

Review comment:
   @imback82 - yes, extracting this to a new test - `bucket coalescing 
shouldn't be applied to shuffled hash join build side`.
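
The rule that test covers can be sketched as follows, assuming the side with more buckets is the one that gets coalesced. `BuildSideSketch`, `LeftBuild`, `RightBuild`, and `coalescesStreamSide` are hypothetical names, not the PR's actual types.

```scala
// Hypothetical stand-ins for the build-side markers used by the join.
sealed trait BuildSideSketch
case object LeftBuild extends BuildSideSketch
case object RightBuild extends BuildSideSketch

object StreamSideSketch {
  // The side with more buckets is the one that would be coalesced down to
  // the smaller count. Coalescing is allowed only if that side is the stream
  // side, i.e. not the side the hash table is built from.
  // Assumes the two bucket counts differ.
  def coalescesStreamSide(
      buildSide: BuildSideSketch,
      numLeftBuckets: Int,
      numRightBuckets: Int): Boolean = {
    val coalesceLeft = numLeftBuckets > numRightBuckets
    if (coalesceLeft) buildSide != LeftBuild else buildSide != RightBuild
  }
}
```

With 4 buckets on the left and 8 on the right, the right side would be coalesced, so this check passes for a left build side and fails for a right build side.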








[GitHub] [spark] ulysses-you opened a new pull request #29142: [SPARK-29343][SQL][FOLLOW-UP] Add more aggregate function to support eliminate sorts.

2020-07-16 Thread GitBox


ulysses-you opened a new pull request #29142:
URL: https://github.com/apache/spark/pull/29142


   
   
   ### What changes were proposed in this pull request?
   
   Add more aggregate functions and make these cases support sort elimination.
   
   ### Why are the changes needed?
   
   Make `EliminateSorts` match more cases.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. When a query matches one of these cases, the user will see a different execution plan.
   
   ### How was this patch tested?
   
   Not needed.






[GitHub] [spark] AmplabJenkins commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-659874311










[GitHub] [spark] cloud-fan commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-07-16 Thread GitBox


cloud-fan commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r456233068



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##
@@ -135,7 +136,16 @@ class SessionCatalog(
 
   private val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {

Review comment:
   Ah, that's a good point. We should probably investigate how to design the 
data source API so that sources that don't need to infer a schema can skip this 
cache. It's hard to use the JDBC data source, as we need to run REFRESH TABLE 
(or wait for the TTL after this PR) once the table is changed outside of Spark 
(which is common for JDBC sources).
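
To illustrate the trade-off, here is a minimal sketch of a time-to-live cache with manual invalidation. It is a single-threaded toy, not Spark's `tableRelationCache`; `TtlCacheSketch`, `ttlMillis`, and the injected `now` clock are hypothetical names chosen for the example.

```scala
import scala.collection.mutable

// Hypothetical, single-threaded TTL cache; not Spark's implementation.
final class TtlCacheSketch[K, V](ttlMillis: Long, now: () => Long) {
  // Maps each key to its value and the time at which it was stored.
  private val entries = mutable.Map.empty[K, (V, Long)]

  // Returns the cached value if it is younger than the TTL; otherwise
  // reloads it, which is what an explicit refresh would otherwise force.
  def get(key: K)(load: => V): V = entries.get(key) match {
    case Some((value, storedAt)) if now() - storedAt < ttlMillis => value
    case _ =>
      val fresh = load
      entries.update(key, (fresh, now()))
      fresh
  }

  // Manual invalidation, analogous in spirit to running REFRESH TABLE.
  def invalidate(key: K): Unit = {
    entries.remove(key)
    ()
  }
}
```

Until the TTL elapses or the entry is invalidated, a reader keeps seeing the stale value, which is the problem for tables changed outside of Spark.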








[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659874570










[GitHub] [spark] SparkQA commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


SparkQA commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-659874701


   **[Test build #126032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126032/testReport)**
 for PR 29079 at commit 
[`d620940`](https://github.com/apache/spark/commit/d6209407731bbed2602c1d6a05c7c50982561faf).






[GitHub] [spark] SparkQA commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox


SparkQA commented on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659873878


   **[Test build #126017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126017/testReport)**
 for PR 29128 at commit 
[`5f7fe1b`](https://github.com/apache/spark/commit/5f7fe1bd4d9673d52151320f3a4193c313683736).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456232826



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
##
@@ -178,7 +235,16 @@ class CoalesceBucketsInSortMergeJoinSuite extends 
SQLTestUtils with SharedSparkS
 rightKeys = rCols.reverse,
 leftRelation = lRel,
 rightRelation = RelationSetting(rCols, 8, Some(4)),
-isSortMergeJoin = true))
+joinOperator = sortMergeJoin,
+shjBuildSide = None))
+
+  run(JoinSetting(
+leftKeys = lCols.reverse,
+rightKeys = rCols.reverse,
+leftRelation = lRel,
+rightRelation = RelationSetting(rCols, 8, Some(4)),
+joinOperator = shuffledHashJoin,
+shjBuildSide = Some(BuildLeft)))

Review comment:
   @imback82 - updated.








[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456232773



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import scala.annotation.tailrec
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight}
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, 
Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, 
ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.joins.{BaseJoinExec, 
ShuffledHashJoinExec, SortMergeJoinExec}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule coalesces one side of the `SortMergeJoin` and `ShuffledHashJoin`
+ * if the following conditions are met:
+ *   - Two bucketed tables are joined.
+ *   - Join keys match with output partition expressions on their respective 
sides.
+ *   - The larger bucket number is divisible by the smaller bucket number.
+ *   - COALESCE_BUCKETS_IN_JOIN_ENABLED is set to true.
+ *   - The ratio of the number of buckets is less than the value set in
+ * COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO (`SortMergeJoin`) 
or,
+ * COALESCE_BUCKETS_IN_SHUFFLED_HASH_JOIN_MAX_BUCKET_RATIO 
(`ShuffledHashJoin`).
+ */
+case class CoalesceBucketsInJoin(conf: SQLConf) extends Rule[SparkPlan] {
+  private def updateNumCoalescedBuckets(
+  join: BaseJoinExec,
+  numLeftBuckets: Int,
+  numRightBucket: Int,
+  numCoalescedBuckets: Int): BaseJoinExec = {
+if (numCoalescedBuckets != numLeftBuckets) {
+  val leftCoalescedChild = join.left transformUp {
+case f: FileSourceScanExec =>
+  f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets))
+  }
+  join match {
+case j: SortMergeJoinExec => j.copy(left = leftCoalescedChild)
+case j: ShuffledHashJoinExec => j.copy(left = leftCoalescedChild)
+  }
+} else {
+  val rightCoalescedChild = join.right transformUp {
+case f: FileSourceScanExec =>
+  f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets))
+  }
+  join match {
+case j: SortMergeJoinExec => j.copy(right = rightCoalescedChild)
+case j: ShuffledHashJoinExec => j.copy(right = rightCoalescedChild)
+  }
+}
+  }
+
+  private def isCoalesceSHJStreamSide(
+  join: ShuffledHashJoinExec,
+  numLeftBuckets: Int,
+  numRightBucket: Int,
+  numCoalescedBuckets: Int): Boolean = {
+if (numCoalescedBuckets == numLeftBuckets) {
+  join.buildSide != BuildRight
+} else {
+  join.buildSide != BuildLeft
+}
+  }
+
+  def apply(plan: SparkPlan): SparkPlan = {
+if (!conf.coalesceBucketsInJoinEnabled) {
+  return plan
+}
+
+plan transform {
+  case ExtractJoinWithBuckets(join, numLeftBuckets, numRightBuckets) =>
+val bucketRatio = math.max(numLeftBuckets, numRightBuckets) /
+  math.min(numLeftBuckets, numRightBuckets)
+val numCoalescedBuckets = math.min(numLeftBuckets, numRightBuckets)
+join match {
+  case j: SortMergeJoinExec
+if bucketRatio <= 
conf.coalesceBucketsInSortMergeJoinMaxBucketRatio =>
+updateNumCoalescedBuckets(j, numLeftBuckets, numRightBuckets, 
numCoalescedBuckets)
+  case j: ShuffledHashJoinExec
+// Only coalesce the buckets for shuffled hash join stream side,
+// to avoid OOM for build side.
+if bucketRatio <= 
conf.coalesceBucketsInShuffledHashJoinMaxBucketRatio &&
+  isCoalesceSHJStreamSide(j, numLeftBuckets, numRightBuckets, 
numCoalescedBuckets) =>
+updateNumCoalescedBuckets(j, numLeftBuckets, numRightBuckets, 
numCoalescedBuckets)
+  case other => other
+}
+  case other => other
+}
+  }
+}
+
+/**
+ * An extractor that extracts `SortMergeJoinE

[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456232703



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
##
@@ -103,46 +119,69 @@ class CoalesceBucketsInSortMergeJoinSuite extends 
SQLTestUtils with SharedSparkS
   }
 
   test("bucket coalescing - basic") {
-withSQLConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key -> 
"true") {
+withSQLConf(SQLConf.COALESCE_BUCKETS_IN_JOIN_ENABLED.key -> "true") {
+  run(JoinSetting(
+RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = 
sortMergeJoin))
+  run(JoinSetting(
+RelationSetting(4, None), RelationSetting(8, Some(4)), joinOperator = 
shuffledHashJoin,
+shjBuildSide = Some(BuildLeft)))
+  // Coalescing bucket should not happen when the target is on shuffled 
hash join
+  // build side.
   run(JoinSetting(
-RelationSetting(4, None), RelationSetting(8, Some(4)), isSortMergeJoin 
= true))
+RelationSetting(4, None), RelationSetting(8, None), joinOperator = 
shuffledHashJoin,
+shjBuildSide = Some(BuildRight)))
 }
-withSQLConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key -> 
"false") {
-  run(JoinSetting(RelationSetting(4, None), RelationSetting(8, None), 
isSortMergeJoin = true))
+withSQLConf(SQLConf.COALESCE_BUCKETS_IN_JOIN_ENABLED.key -> "false") {
+  run(JoinSetting(
+RelationSetting(4, None), RelationSetting(8, None), joinOperator = 
broadcastHashJoin))

Review comment:
   @cloud-fan - updated with extra test for SMJ.








[GitHub] [spark] zsxwing commented on pull request #29131: [SPARK-32321][SS] Remove KAFKA-7703 workaround

2020-07-16 Thread GitBox


zsxwing commented on pull request #29131:
URL: https://github.com/apache/spark/pull/29131#issuecomment-659873692


   Thanks for raising the PR. Could you clarify what the cost of keeping this is?
   
   I believe KAFKA-7703 has been fixed, since you have verified it using my 
reproduction code. However, I'd be more conservative. Although I did report 
KAFKA-7703, I didn't have any evidence that this was exactly the issue we hit 
in production, or that it was the only possible issue. Unfortunately, there were 
not enough logs to prove it. What I know is that the workaround we patched in 
Spark did prevent the Kafka consumer from reporting incorrect offsets, but it 
could hide other potential unknown issues.
   
   Currently there is no Spark release using Kafka 2.5.0, so I don't feel 
confident that there are no other unknown issues causing the same incorrect 
offset issue. If the cost of keeping this workaround is minor, can we wait until 
a Spark release using Kafka 2.5.0 has been out for a while? Once there is a 
Spark release available and people start to use it, I can look at our internal 
logs to see if the warning log in `fetchLatestOffsets` is really gone, which 
would be evidence that KAFKA-7703 is likely the only issue.
   






[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456232535



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import scala.annotation.tailrec
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight}
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, 
Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, 
ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.joins.{BaseJoinExec, 
ShuffledHashJoinExec, SortMergeJoinExec}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule coalesces one side of the `SortMergeJoin` and `ShuffledHashJoin`
+ * if the following conditions are met:
+ *   - Two bucketed tables are joined.
+ *   - Join keys match with output partition expressions on their respective 
sides.
+ *   - The larger bucket number is divisible by the smaller bucket number.
+ *   - COALESCE_BUCKETS_IN_JOIN_ENABLED is set to true.
+ *   - The ratio of the number of buckets is less than the value set in
+ * COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO (`SortMergeJoin`) 
or,
+ * COALESCE_BUCKETS_IN_SHUFFLED_HASH_JOIN_MAX_BUCKET_RATIO 
(`ShuffledHashJoin`).
+ */
+case class CoalesceBucketsInJoin(conf: SQLConf) extends Rule[SparkPlan] {
+  private def updateNumCoalescedBuckets(
+  join: BaseJoinExec,
+  numLeftBuckets: Int,
+  numRightBucket: Int,
+  numCoalescedBuckets: Int): BaseJoinExec = {
+if (numCoalescedBuckets != numLeftBuckets) {
+  val leftCoalescedChild = join.left transformUp {
+case f: FileSourceScanExec =>
+  f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets))
+  }

Review comment:
   @maropu - sure. updated.








[GitHub] [spark] c21 commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox


c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456232607



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
##
@@ -19,17 +19,21 @@ package org.apache.spark.sql.execution.bucketing
 
 import org.apache.spark.sql.catalyst.catalog.BucketSpec
 import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
-import org.apache.spark.sql.catalyst.optimizer.BuildLeft
+import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight, 
BuildSide}
 import org.apache.spark.sql.catalyst.plans.Inner
 import org.apache.spark.sql.execution.{BinaryExecNode, FileSourceScanExec, 
SparkPlan}
 import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
InMemoryFileIndex, PartitionSpec}
 import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
-import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, 
SortMergeJoinExec}
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, 
ShuffledHashJoinExec, SortMergeJoinExec}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils}
 import org.apache.spark.sql.types.{IntegerType, StructType}
 
-class CoalesceBucketsInSortMergeJoinSuite extends SQLTestUtils with 
SharedSparkSession {
+class CoalesceBucketsInJoinSuite extends SQLTestUtils with SharedSparkSession {
+  private val sortMergeJoin = "sortMergeJoin"

Review comment:
   @cloud-fan - sure. updated.








[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-16 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r456232227



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,12 +99,46 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+  hadoopConf: Configuration,
+  sessionScratchDir: String,
+  scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is 
HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's emtpy, otherwise uses scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else 
scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), 
hadoopConf, sessionPath)
+logDebug(s"MR scratch dir '$mrScratchDir/-mr-1' is used")
+val path = new Path(mrScratchDir, "-mr-1")
+val scheme = Option(path.toUri.getScheme).getOrElse("")
+if (scheme.equals("file")) {
+  logWarning(s"Temporary data will be written into a local file system " +
+s"(scheme: '$scheme', path: '$mrScratchDir'). If your Spark is not in 
local mode, " +
+s"you might need to configure 'hive.exec.scratchdir' " +
+s"to use accessible file system (e.g. HDFS path) from any executors in 
the cluster.")

Review comment:
   Removed the `s` at the head. BTW, a lot of existing code 
includes it, but I left that as it is.
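
As a small illustration of the point, the `s` prefix is only needed on string segments that actually reference a value with `$`. `InterpolatorSketch` and `warning` are hypothetical names, and the message text is shortened from the diff above.

```scala
object InterpolatorSketch {
  def warning(scheme: String, dir: String): String =
    // No `$` references in this segment, so a plain literal is enough.
    "Temporary data will be written into a local file system " +
      // This segment interpolates values, so it keeps the `s` prefix.
      s"(scheme: '$scheme', path: '$dir')."
}
```

Both forms produce the same string when nothing is interpolated, so dropping the prefix is a readability change only.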








[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-07-16 Thread GitBox


moomindani commented on a change in pull request #27690:
URL: https://github.com/apache/spark/pull/27690#discussion_r456231877



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
##
@@ -97,12 +99,46 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  // Mostly copied from Context.java#getMRTmpPath of Hive 2.3.
+  // Visible for testing.
+  private[execution] def getNonBlobTmpPath(
+  hadoopConf: Configuration,
+  sessionScratchDir: String,
+  scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath.
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is 
HIVE-7090.
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's emtpy, otherwise uses scratchDir.
+val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else 
scratchDir
+val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), 
hadoopConf, sessionPath)
+logDebug(s"MR scratch dir '$mrScratchDir/-mr-1' is used")
+val path = new Path(mrScratchDir, "-mr-1")
+val scheme = Option(path.toUri.getScheme).getOrElse("")
+if (scheme.equals("file")) {
+  logWarning(s"Temporary data will be written into a local file system " +
+s"(scheme: '$scheme', path: '$mrScratchDir'). If your Spark is not in 
local mode, " +
+s"you might need to configure 'hive.exec.scratchdir' " +
+s"to use accessible file system (e.g. HDFS path) from any executors in 
the cluster.")
+}
+path
+  }
+
+  private def supportSchemeToUseNonBlobStore(path: Path): Boolean = {
+path != null && {
+  val supportedBlobSchemes = SQLConf.get.supportedSchemesToUseNonBlobstore
+  val scheme = Option(path.toUri.getScheme).getOrElse("")
+  
Utils.stringToSeq(supportedBlobSchemes).contains(scheme.toLowerCase(Locale.ROOT))
+}
+  }
+
+  def getExternalTmpPath(
   sparkSession: SparkSession,
   hadoopConf: Configuration,
   path: Path): Path = {
 import org.apache.spark.sql.hive.client.hive._
-

Review comment:
   Thanks for pointing it out. Reverted.








[GitHub] [spark] cloud-fan commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-16 Thread GitBox


cloud-fan commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659872039


   LGTM. It's a much simpler and more robust solution!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659870258










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659870428


   **[Test build #126031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126031/testReport)**
 for PR 29117 at commit 
[`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).






[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659848866


   **[Test build #126028 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126028/testReport)**
 for PR 29117 at commit 
[`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).






[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659870258










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659870049


   **[Test build #126028 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126028/testReport)**
 for PR 29117 at commit 
[`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] venkata91 edited a comment on pull request #28994: [SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics.

2020-07-16 Thread GitBox


venkata91 edited a comment on pull request #28994:
URL: https://github.com/apache/spark/pull/28994#issuecomment-659869847


   This is an interesting idea and a good start. Considering the runTime of a task alone might not be useful in many cases. Thanks!






[GitHub] [spark] venkata91 commented on pull request #28994: [SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics.

2020-07-16 Thread GitBox


venkata91 commented on pull request #28994:
URL: https://github.com/apache/spark/pull/28994#issuecomment-659869847


   This is an interesting idea and a good start. Considering the runTime of a task alone might not be useful in many cases.






[GitHub] [spark] venkata91 commented on a change in pull request #28994: [SPARK-32170][CORE] Improve the speculation for the inefficient tasks by the task metrics.

2020-07-16 Thread GitBox


venkata91 commented on a change in pull request #28994:
URL: https://github.com/apache/spark/pull/28994#discussion_r456228668



##
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##
@@ -1125,6 +1142,78 @@ private[spark] class TaskSetManager(
   def executorAdded(): Unit = {
 recomputeLocality()
   }
+
+  /**
+   * A class for checking inefficient tasks to be speculated, the inefficient tasks come from
+   * the tasks which may be speculated by the previous strategy.
+   */
+  private class InefficientTask {
+private var taskData: Map[Long, TaskData] = null
+private var successTaskProgress = 0.0
+private val checkInefficientTask = speculationTaskMinDuration > 0
+
+if (checkInefficientTask) {
+  val appStatusStore = sched.sc.statusTracker.getAppStatusStore
+  if (appStatusStore != null) {
+successTaskProgress =
+  computeSuccessTaskProgress(taskSet.stageId, taskSet.stageAttemptId, appStatusStore)
+val stageData = appStatusStore.stageAttempt(taskSet.stageId, taskSet.stageAttemptId, true)
+if (stageData != null) {
+  taskData = stageData._1.tasks.orNull
+}
+  }
+}
+
+private def computeSuccessTaskProgress(stageId: Int, stageAttemptId: Int,
+  appStatusStore: AppStatusStore): Double = {
+  var sumInputRecords, sumShuffleReadRecords, sumExecutorRunTime = 0.0
+  appStatusStore.taskList(stageId, stageAttemptId, Int.MaxValue).filter {
+_.status == "SUCCESS"
+  }.map(_.taskMetrics).filter(_.isDefined).map(_.get).foreach { task =>
+if (task.inputMetrics != null) {
+  sumInputRecords += task.inputMetrics.recordsRead
+}

Review comment:
   How about recordsWritten? Should that also be considered with respect to progress, and likewise shuffleRecordsWritten?
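
To make the question above concrete, here is a minimal sketch of a progress rate that counts the write side as well as the read side. `TaskSample` and `ProgressRate` are hypothetical stand-ins written for this illustration; they are not Spark's metric classes, and whether written records belong in the rate at all is exactly the open question.

```scala
// Hypothetical stand-in for per-task metrics; not Spark's TaskMetrics.
final case class TaskSample(
    recordsRead: Long,
    shuffleRecordsRead: Long,
    recordsWritten: Long,
    shuffleRecordsWritten: Long,
    executorRunTimeMs: Long)

object ProgressRate {
  // Records processed per second across the given (successful) tasks,
  // counting both read and written records.
  // Returns 0.0 when no executor run time was recorded.
  def successTaskProgress(tasks: Seq[TaskSample]): Double = {
    val totalRecords = tasks.map { t =>
      t.recordsRead + t.shuffleRecordsRead + t.recordsWritten + t.shuffleRecordsWritten
    }.sum
    val totalRunTimeMs = tasks.map(_.executorRunTimeMs).sum
    if (totalRunTimeMs > 0) totalRecords / (totalRunTimeMs / 1000.0) else 0.0
  }
}
```

With this shape, a stage that only writes (no input and no shuffle read) would still get a non-zero rate instead of falling into the `successTaskProgress <= 0` case.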

##
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##
@@ -1125,6 +1142,78 @@ private[spark] class TaskSetManager(
   def executorAdded(): Unit = {
 recomputeLocality()
   }
+
+  /**
+   * A class for checking inefficient tasks to be speculated, the inefficient tasks come from
+   * the tasks which may be speculated by the previous strategy.
+   */
+  private class InefficientTask {
+private var taskData: Map[Long, TaskData] = null
+private var successTaskProgress = 0.0
+private val checkInefficientTask = speculationTaskMinDuration > 0
+
+if (checkInefficientTask) {
+  val appStatusStore = sched.sc.statusTracker.getAppStatusStore
+  if (appStatusStore != null) {
+successTaskProgress =
+  computeSuccessTaskProgress(taskSet.stageId, taskSet.stageAttemptId, appStatusStore)
+val stageData = appStatusStore.stageAttempt(taskSet.stageId, taskSet.stageAttemptId, true)
+if (stageData != null) {
+  taskData = stageData._1.tasks.orNull
+}
+  }
+}
+
+private def computeSuccessTaskProgress(stageId: Int, stageAttemptId: Int,
+  appStatusStore: AppStatusStore): Double = {
+  var sumInputRecords, sumShuffleReadRecords, sumExecutorRunTime = 0.0
+  appStatusStore.taskList(stageId, stageAttemptId, Int.MaxValue).filter {
+_.status == "SUCCESS"
+  }.map(_.taskMetrics).filter(_.isDefined).map(_.get).foreach { task =>
+if (task.inputMetrics != null) {
+  sumInputRecords += task.inputMetrics.recordsRead
+}
+if (task.shuffleReadMetrics != null) {
+  sumShuffleReadRecords += task.shuffleReadMetrics.recordsRead
+}
+sumExecutorRunTime += task.executorRunTime
+  }
+  if (sumExecutorRunTime > 0) {
+(sumInputRecords + sumShuffleReadRecords) / (sumExecutorRunTime / 1000.0)
+  } else 0
+}
+
+def maySpeculateTask(tid: Long, runtimeMs: Long, taskInfo: TaskInfo): Boolean = {
+  // note: 1) only check inefficient tasks when 'SPECULATION_TASK_DURATION_THRESHOLD' > 0.
+  // 2) some tasks may have neither input records nor shuffleRead records, so
+  // the 'successTaskProgress' may be zero all the time, this case we should not consider,
+  // eg: some spark-sql like that 'msck repair table' or 'drop table' and so on.
+  if (!checkInefficientTask || successTaskProgress <= 0) {
+true
+  } else if (runtimeMs < speculationTaskMinDuration) {
+false
+  } else if (taskData != null && taskData.contains(tid) && taskData(tid) != null &&
+taskData(tid).taskMetrics.isDefined) {
+val taskMetrics = taskData(tid).taskMetrics.get
+val currentTaskProgressRate = (taskMetrics.inputMetrics.recordsRead +

Review comment:
   Would it make sense to add taskProgress as part of taskMetrics, so that it can also be shown in the Spark UI? That said, taskProgress for tasks that don't involve input/output/shuffle records would be hard to measure.
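
One possible shape for such a metric, as a rough sketch only: `TaskProgress` is a hypothetical holder invented for this illustration and is not part of Spark's `TaskMetrics`. It models the hard-to-measure case as `None` rather than as a rate of zero, so a UI could show "n/a" for tasks that process no records.

```scala
// Hypothetical metric holder; not part of Spark's TaskMetrics.
final case class TaskProgress(records: Long, runTimeMs: Long) {
  // Records per second, or None when the task has no measurable
  // records or no recorded run time.
  def rate: Option[Double] =
    if (records > 0 && runTimeMs > 0) Some(records / (runTimeMs / 1000.0)) else None
}
```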

##
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
#

[GitHub] [spark] SparkQA removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-659819077


   **[Test build #126021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126021/testReport)**
 for PR 29120 at commit 
[`e56f5d4`](https://github.com/apache/spark/commit/e56f5d4936fc8105d672fea5fe8ae441b7de0f2b).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-659862630


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126021/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29141:
URL: https://github.com/apache/spark/pull/29141#issuecomment-659852368










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-659862619


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox


SparkQA commented on pull request #29141:
URL: https://github.com/apache/spark/pull/29141#issuecomment-659862760


   **[Test build #126030 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126030/testReport)**
 for PR 29141 at commit 
[`3210002`](https://github.com/apache/spark/commit/321000236e5571545912af0db1c02a3fa06f1a9a).






[GitHub] [spark] AmplabJenkins commented on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-659862619










[GitHub] [spark] SparkQA commented on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-16 Thread GitBox


SparkQA commented on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-659862399


   **[Test build #126021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126021/testReport)**
 for PR 29120 at commit 
[`e56f5d4`](https://github.com/apache/spark/commit/e56f5d4936fc8105d672fea5fe8ae441b7de0f2b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] gengliangwang edited a comment on pull request #29133: [SPARK-32253][INFRA] Show errors only for the sbt tests of github actions

2020-07-16 Thread GitBox


gengliangwang edited a comment on pull request #29133:
URL: https://github.com/apache/spark/pull/29133#issuecomment-659821378


   @HyukjinKwon I have updated the PR description.
   Meanwhile, I created a PR on my repo to see what the test failure log will look like: https://github.com/gengliangwang/spark/pull/6
   
   Here is an example of the failed log output: https://github.com/gengliangwang/spark/pull/6/checks?check_run_id=880362871
   






[GitHub] [spark] dongjoon-hyun commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox


dongjoon-hyun commented on pull request #29141:
URL: https://github.com/apache/spark/pull/29141#issuecomment-659859923


   Thank you, @cloud-fan !






[GitHub] [spark] xuanyuanking edited a comment on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group

2020-07-16 Thread GitBox


xuanyuanking edited a comment on pull request #28977:
URL: https://github.com/apache/spark/pull/28977#issuecomment-659327525


   Summary for separating all `hive.execution` suites
   
   Test | Worker | Scala test time
   --- | --- | ---
   https://github.com/apache/spark/pull/28977#issuecomment-659309943 | worker-03 | s
   https://github.com/apache/spark/pull/28977#issuecomment-659486466 | worker-04 | 8403s






[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-16 Thread GitBox


SparkQA commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659856381


   **[Test build #126029 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126029/testReport)**
 for PR 29032 at commit 
[`9356fac`](https://github.com/apache/spark/commit/9356facb887328a2e781f46dc533f41eb6751392).






[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659856207










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659856207










[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659856015










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659856015










[GitHub] [spark] SparkQA removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659746319


   **[Test build #126007 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126007/testReport)**
 for PR 28840 at commit 
[`94fa132`](https://github.com/apache/spark/commit/94fa132ca4d58f631cc7666e25b126bc28c7f34e).






[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox


SparkQA commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659855587


   **[Test build #126007 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126007/testReport)**
 for PR 28840 at commit 
[`94fa132`](https://github.com/apache/spark/commit/94fa132ca4d58f631cc7666e25b126bc28c7f34e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


HyukjinKwon commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659855366


   retest this please






[GitHub] [spark] viirya commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-07-16 Thread GitBox


viirya commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r456219067



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##
@@ -135,7 +136,16 @@ class SessionCatalog(
 
   private val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {

Review comment:
   Hmm, I think this cache is still useful for avoiding inferring the schema again, which is also an expensive operation.
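
To illustrate the point, here is a minimal sketch of a cache with a TTL: entries expire eventually, but within the TTL the expensive load (standing in for schema inference) runs only once. `TtlCache` is a hypothetical, single-threaded toy written for this illustration; it is not the `Cache[QualifiedTableName, LogicalPlan]` that `SessionCatalog` holds, and the injected `now` clock is an assumption made to keep the example deterministic.

```scala
import scala.collection.mutable

// Hypothetical, single-threaded TTL cache for illustration only.
final class TtlCache[K, V](ttlMs: Long, now: () => Long) {
  // Each entry keeps the value together with the time it was stored.
  private val entries = mutable.Map.empty[K, (V, Long)]

  // Returns the cached value if it is younger than ttlMs;
  // otherwise evaluates `load`, stores the result, and returns it.
  def getOrLoad(key: K)(load: => V): V = entries.get(key) match {
    case Some((value, storedAt)) if now() - storedAt < ttlMs => value
    case _ =>
      val value = load
      entries.update(key, (value, now()))
      value
  }
}
```

Even with a short TTL, repeated lookups of the same table inside the window skip the expensive load, which is the benefit being argued for here.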








[GitHub] [spark] AmplabJenkins commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29141:
URL: https://github.com/apache/spark/pull/29141#issuecomment-659852368










[GitHub] [spark] cloud-fan commented on pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox


cloud-fan commented on pull request #29141:
URL: https://github.com/apache/spark/pull/29141#issuecomment-659852057


   cc @dongjoon-hyun 






[GitHub] [spark] cloud-fan opened a new pull request #29141: [SPARK-32018][SQL][2.4] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox


cloud-fan opened a new pull request #29141:
URL: https://github.com/apache/spark/pull/29141


   backport https://github.com/apache/spark/pull/29125






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659850454










[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659850454










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659849344










[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659849344










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659848866


   **[Test build #126028 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126028/testReport)**
 for PR 29117 at commit 
[`d7974a4`](https://github.com/apache/spark/commit/d7974a4d58bd51f99d6c010ac536e63a5094fbf3).






[GitHub] [spark] cloud-fan closed pull request #29140: [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation

2020-07-16 Thread GitBox


cloud-fan closed pull request #29140:
URL: https://github.com/apache/spark/pull/29140


   






[GitHub] [spark] cloud-fan commented on pull request #29140: [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation

2020-07-16 Thread GitBox


cloud-fan commented on pull request #29140:
URL: https://github.com/apache/spark/pull/29140#issuecomment-659848160


   thanks, merging to master!






[GitHub] [spark] cloud-fan commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-16 Thread GitBox


cloud-fan commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659846826


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659845800










[GitHub] [spark] AmplabJenkins commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659845800










[GitHub] [spark] cloud-fan commented on a change in pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-07-16 Thread GitBox


cloud-fan commented on a change in pull request #28852:
URL: https://github.com/apache/spark/pull/28852#discussion_r456214684



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##
@@ -135,7 +136,16 @@ class SessionCatalog(
 
   private val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {

Review comment:
   For external data sources, it's common that data are changed outside of 
Spark. I think it's more important to make sure we get the latest data in a new 
query. Maybe we should disable this relation cache by default.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659844495


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126025/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28852: [SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache

2020-07-16 Thread GitBox


SparkQA commented on pull request #28852:
URL: https://github.com/apache/spark/pull/28852#issuecomment-659844682


   **[Test build #126027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126027/testReport)** for PR 28852 at commit [`3e761dc`](https://github.com/apache/spark/commit/3e761dcd790b9c30e5cee7bffe916dfc2c82b7a5).






[GitHub] [spark] SparkQA removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659773878


   **[Test build #126012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126012/testReport)** for PR 29015 at commit [`31b231e`](https://github.com/apache/spark/commit/31b231e1b0a984ebdfc408beedaadeec6881ddff).






[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-16 Thread GitBox


SparkQA commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659844348


   **[Test build #126012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126012/testReport)** for PR 29015 at commit [`31b231e`](https://github.com/apache/spark/commit/31b231e1b0a984ebdfc408beedaadeec6881ddff).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  case class DecommissionWorkersOnHosts(hostnames: Seq[String])`






[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659834294


   **[Test build #126025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126025/testReport)** for PR 29117 at commit [`f6207b0`](https://github.com/apache/spark/commit/f6207b038a67b575b65ed4adba1c407b6a0d0ecd).






[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659844490










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-65984


   **[Test build #126025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126025/testReport)** for PR 29117 at commit [`f6207b0`](https://github.com/apache/spark/commit/f6207b038a67b575b65ed4adba1c407b6a0d0ecd).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659844490


   Merged build finished. Test FAILed.





