[GitHub] [spark] HyukjinKwon commented on pull request #28819: [SPARK-31980][SQL]Function sequence() fails if start and end of range are equal dates

2020-06-13 Thread GitBox


HyukjinKwon commented on pull request #28819:
URL: https://github.com/apache/spark/pull/28819#issuecomment-643722914


   I think it matches Presto's behaviour (see 
https://github.com/apache/spark/pull/21155). Shall we check how it works in 
Presto?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sarutak edited a comment on pull request #28820: [SPARK-31632][CORE][WEBUI][FOLLOWUP] Enrich the exception message when application summary is unavailable

2020-06-13 Thread GitBox


sarutak edited a comment on pull request #28820:
URL: https://github.com/apache/spark/pull/28820#issuecomment-643722372


   @HyukjinKwon Sorry, I didn't think it was necessary to file one because this is 
just a small followup. I'll take care. Thanks.






[GitHub] [spark] HyukjinKwon closed pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update

2020-06-13 Thread GitBox


HyukjinKwon closed pull request #28391:
URL: https://github.com/apache/spark/pull/28391


   






[GitHub] [spark] sarutak commented on pull request #28820: [SPARK-31632][CORE][WEBUI][FOLLOWUP] Enrich the exception message when application summary is unavailable

2020-06-13 Thread GitBox


sarutak commented on pull request #28820:
URL: https://github.com/apache/spark/pull/28820#issuecomment-643722372


   @HyukjinKwon Sorry, I didn't think it was necessary to file one because this is 
just a small followup. I'll take care. Thanks.






[GitHub] [spark] HyukjinKwon commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update

2020-06-13 Thread GitBox


HyukjinKwon commented on pull request #28391:
URL: https://github.com/apache/spark/pull/28391#issuecomment-643722422


   Merged to master and branch-3.0.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643722166










[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643722166










[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


SparkQA removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643706423


   **[Test build #123987 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123987/testReport)**
 for PR 27694 at commit 
[`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).






[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643721948


   **[Test build #123987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123987/testReport)**
 for PR 27694 at commit 
[`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon closed pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


HyukjinKwon closed pull request #28607:
URL: https://github.com/apache/spark/pull/28607


   






[GitHub] [spark] HyukjinKwon commented on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


HyukjinKwon commented on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643721610


   Merged to master.






[GitHub] [spark] AmplabJenkins commented on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643721071










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643721071










[GitHub] [spark] SparkQA removed a comment on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


SparkQA removed a comment on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643706416


   **[Test build #123984 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123984/testReport)**
 for PR 28607 at commit 
[`4216405`](https://github.com/apache/spark/commit/4216405789c07f7ded54be05d5ecd8797ee05291).






[GitHub] [spark] SparkQA commented on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


SparkQA commented on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643720854


   **[Test build #123984 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123984/testReport)**
 for PR 28607 at commit 
[`4216405`](https://github.com/apache/spark/commit/4216405789c07f7ded54be05d5ecd8797ee05291).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on pull request #28814: [SPARK-31968][SQL]Duplicate partition columns check when writing data

2020-06-13 Thread GitBox


dongjoon-hyun commented on pull request #28814:
URL: https://github.com/apache/spark/pull/28814#issuecomment-643720843


   Hi, @TJX2014 . What is your JIRA id?






[GitHub] [spark] dongjoon-hyun closed pull request #28814: [SPARK-31968][SQL]Duplicate partition columns check when writing data

2020-06-13 Thread GitBox


dongjoon-hyun closed pull request #28814:
URL: https://github.com/apache/spark/pull/28814


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643720129










[GitHub] [spark] HyukjinKwon commented on pull request #28820: [SPARK-31632][CORE][WEBUI][FOLLOWUP] Enrich the exception message when application summary is unavailable

2020-06-13 Thread GitBox


HyukjinKwon commented on pull request #28820:
URL: https://github.com/apache/spark/pull/28820#issuecomment-643720151


   Oh, @sarutak, let's file a new JIRA next time when the fixed versions are 
expected to conflict.






[GitHub] [spark] AmplabJenkins commented on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643720129










[GitHub] [spark] HyukjinKwon commented on pull request #28820: [SPARK-31632][CORE][WEBUI][FOLLOWUP] Enrich the exception message when application summary is unavailable

2020-06-13 Thread GitBox


HyukjinKwon commented on pull request #28820:
URL: https://github.com/apache/spark/pull/28820#issuecomment-643720070


   Merged to master, branch-3.0, and branch-2.4.






[GitHub] [spark] SparkQA commented on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-13 Thread GitBox


SparkQA commented on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643720025


   **[Test build #123996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123996/testReport)**
 for PR 28821 at commit 
[`707b0cf`](https://github.com/apache/spark/commit/707b0cf949e2532429bdc62d7ef219fe98a0751e).






[GitHub] [spark] HyukjinKwon closed pull request #28820: [SPARK-31632][CORE][WEBUI][FOLLOWUP] Enrich the exception message when application summary is unavailable

2020-06-13 Thread GitBox


HyukjinKwon closed pull request #28820:
URL: https://github.com/apache/spark/pull/28820


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643600098


   Can one of the admins verify this patch?






[GitHub] [spark] HyukjinKwon commented on pull request #28821: [SPARK-31981][SQL] Keep TimestampType when taking an average of a Timestamp

2020-06-13 Thread GitBox


HyukjinKwon commented on pull request #28821:
URL: https://github.com/apache/spark/pull/28821#issuecomment-643719783


   ok to test






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-643714697










[GitHub] [spark] AmplabJenkins commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-643714697










[GitHub] [spark] SparkQA commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


SparkQA commented on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-643714577


   **[Test build #123995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123995/testReport)**
 for PR 28818 at commit 
[`ef3f523`](https://github.com/apache/spark/commit/ef3f52364afffb9943d0e0a9e36e2b043c4a3284).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643714120


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123983/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643714118


   Merged build finished. Test FAILed.






[GitHub] [spark] holdenk commented on a change in pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28818:
URL: https://github.com/apache/spark/pull/28818#discussion_r439788769



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##
@@ -497,6 +453,70 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
 
   protected def minRegisteredRatio: Double = _minRegisteredRatio
 
+  /**
+   * Request that the cluster manager decommission the specified executors.
+   * Default implementation delegates to kill, scheduler must override
+   * if it supports graceful decommissioning.

Review comment:
   That's copied from the trait it's implementing (this is the part doing 
the override). I'll delete it here, though, since it's confusing in this 
context. Good catch.
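
   The "default implementation delegates to kill" wording in the doc comment 
above can be illustrated with a minimal sketch. All names here 
(`ExecutorClient`, `KillOnlyBackend`, `GracefulBackend`) are hypothetical and 
are not Spark's actual classes; only the idea of a default that falls back to 
kill comes from the quoted comment.

```scala
// Hypothetical names throughout; this is not Spark's real API.
trait ExecutorClient {
  /** Forcefully remove the given executors; returns the ids that were removed. */
  def killExecutors(ids: Seq[String]): Seq[String]

  /** Default: no graceful path is known, so fall back to kill. */
  def decommissionExecutors(ids: Seq[String]): Seq[String] = killExecutors(ids)
}

// A backend without graceful decommissioning inherits the kill fallback.
class KillOnlyBackend extends ExecutorClient {
  var killed: Vector[String] = Vector.empty

  override def killExecutors(ids: Seq[String]): Seq[String] = {
    killed ++= ids
    ids
  }
}

// A backend that supports graceful decommissioning overrides the default.
class GracefulBackend extends KillOnlyBackend {
  var decommissioning: Set[String] = Set.empty

  override def decommissionExecutors(ids: Seq[String]): Seq[String] = {
    decommissioning ++= ids // record the executors, but do not kill them
    ids
  }
}
```

   In this sketch the doc comment belongs on the trait, where the default 
lives. On the overriding class it would describe behaviour that the override 
replaces, which is presumably why it reads as confusing there.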








[GitHub] [spark] AmplabJenkins commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643714118










[GitHub] [spark] agrawaldevesh commented on a change in pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


agrawaldevesh commented on a change in pull request #28817:
URL: https://github.com/apache/spark/pull/28817#discussion_r439788752



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala
##
@@ -52,6 +52,8 @@ private[spark] object CoarseGrainedClusterMessages {
   case class UpdateDelegationTokens(tokens: Array[Byte])
     extends CoarseGrainedClusterMessage
 
+  case object DecommissionSelf extends CoarseGrainedClusterMessage // Mark as decommissioned.

Review comment:
   Sounds good ;-) I checked that DecommissionSelf is indeed not used 
anywhere else, so it should be unambiguous. Let's keep the name.








[GitHub] [spark] agrawaldevesh commented on a change in pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


agrawaldevesh commented on a change in pull request #28817:
URL: https://github.com/apache/spark/pull/28817#discussion_r439788696



##
File path: 
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##
@@ -258,26 +262,60 @@ private[spark] class CoarseGrainedExecutorBackend(
     System.exit(code)
   }
 
-  private def decommissionSelf(): Boolean = {
-    logInfo("Decommissioning self w/sync")
-    try {
-      decommissioned = true
-      // Tell master we are are decommissioned so it stops trying to schedule us
-      if (driver.nonEmpty) {
-        driver.get.askSync[Boolean](DecommissionExecutor(executorId))
+  private def shutdownIfDone(): Unit = {
+    val numRunningTasks = executor.numRunningTasks
+    logInfo(s"Checking to see if we can shutdown have ${numRunningTasks} running tasks.")
+    if (executor.numRunningTasks == 0) {
+      if (env.conf.get(STORAGE_DECOMMISSION_ENABLED)) {
+        val allBlocksMigrated = env.blockManager.decommissionManager match {
+          case Some(m) => m.allBlocksMigrated
+          case None => false // We haven't started migrations yet.
+        }
+        if (allBlocksMigrated) {
+          logInfo("No running tasks, all blocks migrated, stopping.")
+          exitExecutor(0, "Finished decommissioning", notifyDriver = true)

Review comment:
   I was talking about the case where we get shot down before we had a 
chance to cleanly exit on line 276. Say, for example, some timeout expires and 
the executor/node is brought down.
   
   Are `decom.sh` and `decommission-slave.sh` expected to wait until the 
executor/worker process has properly shut down? I think they have some 
timeouts in them to kill the executor? Or consider a spot kill scenario where 
you get some warning (like 2 minutes) and then the machine is yanked out.
   
   In this case, the executor will eventually be marked as lost via a 
heartbeat/timeout. That loss would be deemed the fault of the task, and 
could cause job failures. I am wondering if we can fix that scenario of an 
unclean exit?
   
   One workaround I suggested above was to send a message to the driver saying 
that the executor is going to go away soon. When that happens (in a clean or 
unclean way), the loss shouldn't be attributed to the task.
   
   Perhaps this unclean executor loss/timeout handling is follow-up work? We 
(or rather I) can create JIRAs for this under the parent ticket :-).
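
   The workaround described above might look roughly like the following 
driver-side bookkeeping. This is only a sketch under stated assumptions: 
`DecommissionTracker` and its methods are hypothetical names, not part of 
Spark's scheduler, and the real accounting of task failures is more involved.

```scala
// Hypothetical driver-side bookkeeping; not Spark's actual scheduler code.
final class DecommissionTracker {
  // Executors that have announced they are about to go away.
  private var pending = Set.empty[String]

  /** Called when an executor reports that it is being decommissioned. */
  def markDecommissioning(executorId: String): Unit =
    pending += executorId

  /**
   * Called on any executor loss, whether a clean exit or a heartbeat timeout.
   * Returns true if the loss should count against the tasks that were running.
   */
  def lossCountsAgainstTasks(executorId: String): Boolean = {
    val expected = pending.contains(executorId)
    pending -= executorId // forget the executor once its loss is handled
    !expected
  }
}
```

   With this shape, an executor that announced its decommissioning and is 
then killed by a timeout is treated the same as one that exited cleanly, so 
neither loss would be blamed on the task.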








[GitHub] [spark] holdenk commented on a change in pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28818:
URL: https://github.com/apache/spark/pull/28818#discussion_r439788707



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##
@@ -333,11 +335,19 @@ private[spark] class ExecutorMonitor(
   }
 
   override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
-    if (!event.blockUpdatedInfo.blockId.isInstanceOf[RDDBlockId]) {
-      return
-    }
     val exec = ensureExecutorIsTracked(event.blockUpdatedInfo.blockManagerId.executorId,
       UNKNOWN_RESOURCE_PROFILE_ID)
+
+    // Check if it is a shuffle file, or RDD to pick the correct codepath for update
+    if (event.blockUpdatedInfo.blockId.isInstanceOf[ShuffleDataBlockId] && shuffleTrackingEnabled) {
+      event.blockUpdatedInfo.blockId match {
+        case ShuffleDataBlockId(shuffleId, _, _) => exec.addShuffle(shuffleId)
+        case _ => // For now we only update on data blocks
+      }

Review comment:
   Sure, I'll add one :)








[GitHub] [spark] SparkQA removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


SparkQA removed a comment on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643703342


   **[Test build #123983 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123983/testReport)**
 for PR 28817 at commit 
[`a2c0557`](https://github.com/apache/spark/commit/a2c055715ccf2992e399cef3768b1299c24d9a82).






[GitHub] [spark] SparkQA commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


SparkQA commented on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643713767


   **[Test build #123983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123983/testReport)** for PR 28817 at commit [`a2c0557`](https://github.com/apache/spark/commit/a2c055715ccf2992e399cef3768b1299c24d9a82).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] agrawaldevesh commented on a change in pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


agrawaldevesh commented on a change in pull request #28818:
URL: https://github.com/apache/spark/pull/28818#discussion_r439788125



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##
@@ -333,11 +335,19 @@ private[spark] class ExecutorMonitor(
   }
 
   override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
-    if (!event.blockUpdatedInfo.blockId.isInstanceOf[RDDBlockId]) {
-      return
-    }
     val exec = ensureExecutorIsTracked(event.blockUpdatedInfo.blockManagerId.executorId,
       UNKNOWN_RESOURCE_PROFILE_ID)
+
+    // Check if it is a shuffle file, or RDD to pick the correct codepath for update
+    if (event.blockUpdatedInfo.blockId.isInstanceOf[ShuffleDataBlockId] && shuffleTrackingEnabled) {
+      event.blockUpdatedInfo.blockId match {
+        case ShuffleDataBlockId(shuffleId, _, _) => exec.addShuffle(shuffleId)
+        case _ => // For now we only update on data blocks
+      }

Review comment:
   Makes perfect sense! A comment in the code would be helpful ;-)








[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643712139


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123988/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643712138


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


SparkQA removed a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643706415


   **[Test build #123988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123988/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).






[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


SparkQA commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643712100


   **[Test build #123988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123988/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643712138










[GitHub] [spark] zhli1142015 commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


zhli1142015 commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r439787481



##
File path: 
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
##
@@ -238,6 +254,14 @@ public void close() throws IOException {
   }
 
   try {
+    if (iteratorTracker != null) {
+      for (SoftReference<LevelDBIterator<?>> ref: iteratorTracker) {
+        LevelDBIterator<?> it = ref.get();
+        if(it != null) {

Review comment:
   @srowen, thanks for your review; updated.
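
The pattern in the hunk above (keep soft references to every iterator handed out, then close the ones that are still reachable when the store itself closes) can be sketched in isolation. This is only a minimal illustration, not the code from the PR: `Store` and `TrackedIterator` are hypothetical stand-ins for `LevelDB` and `LevelDBIterator`, and the queue type used for `iteratorTracker` is an assumption.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-ins; these are not the Spark kvstore classes.
class IteratorTrackingSketch {

  // An iterator that owns a resource and must be closed explicitly.
  static final class TrackedIterator implements AutoCloseable {
    private boolean closed = false;

    @Override
    public void close() {
      closed = true; // a real iterator would release its native handle here
    }

    boolean isClosed() {
      return closed;
    }
  }

  // A store that remembers the iterators it created so close() can clean them up.
  static final class Store implements AutoCloseable {
    // Soft references let unused iterators be garbage collected under memory pressure.
    private final ConcurrentLinkedQueue<SoftReference<TrackedIterator>> iteratorTracker =
        new ConcurrentLinkedQueue<>();

    TrackedIterator iterator() {
      TrackedIterator it = new TrackedIterator();
      iteratorTracker.add(new SoftReference<>(it));
      return it;
    }

    @Override
    public void close() {
      // Close every iterator that is still reachable before releasing the store.
      for (SoftReference<TrackedIterator> ref : iteratorTracker) {
        TrackedIterator it = ref.get();
        if (it != null) {
          it.close();
        }
      }
      iteratorTracker.clear();
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    TrackedIterator it = store.iterator();
    store.close();
    System.out.println(it.isClosed()); // prints true
  }
}
```

The null check matters because a soft reference may already have been cleared by the garbage collector, in which case there is nothing left to close.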








[GitHub] [spark] zhli1142015 commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


zhli1142015 commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r439787376



##
File path: 
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
##
@@ -229,6 +241,10 @@ public long count(Class<?> type, String index, Object indexedValue) throws Excep
     return idx.getCount(idx.end(null, indexedValue));
   }
 
+  /**
+   * Trying to close a JNI LevelDB handle with a closed DB can cause JVM crashes,

Review comment:
   Thanks, updated.








[GitHub] [spark] AmplabJenkins commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-643711604










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-643711604










[GitHub] [spark] zhli1142015 commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


zhli1142015 commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r439787328



##
File path: 
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java
##
@@ -276,6 +277,41 @@ public void testNegativeIndexValues() throws Exception {
 assertEquals(expected, results);
   }
 
+  @Test
+  public void testCloseLevelDBIterator() throws Exception {

Review comment:
   Yes, this is tested on Windows; without this patch, the assert on line 312 
would fail.








[GitHub] [spark] zhli1142015 commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


zhli1142015 commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r439787366



##
File path: 
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java
##
@@ -276,6 +277,41 @@ public void testNegativeIndexValues() throws Exception {
 assertEquals(expected, results);
   }
 
+  @Test
+  public void testCloseLevelDBIterator() throws Exception {
+    // SPARK-31929: test when LevelDB.close() is called, related LevelDBIterators
+    // are closed. And files opened by iterators are also closed.
+    File dbPathForCloseTest = File
+      .createTempFile(
+        "test_db_close.",
+        ".ldb");
+    dbPathForCloseTest.delete();
+    LevelDB dbForCloseTest = new LevelDB(dbPathForCloseTest);
+    for (int i = 0; i < 8192; i++) {
+      dbForCloseTest.write(createCustomType1(i));
+    }
+    String key = dbForCloseTest
+      .view(CustomType1.class).iterator().next().key;
+    assertEquals("key0", key);
+    Iterator<CustomType1> it0 = dbForCloseTest
+      .view(CustomType1.class).max(1).iterator();
+    while(it0.hasNext()) {
+      it0.next();
+    }
+    System.gc();
+    Iterator<CustomType1> it1 = dbForCloseTest
+      .view(CustomType1.class).iterator();
+    assertEquals("key0", it1.next().key);
+    try(KVStoreIterator<CustomType1> it2 = dbForCloseTest

Review comment:
   @HeartSaVioR, thanks for your review; updated.








[GitHub] [spark] SparkQA commented on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


SparkQA commented on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-643711520


   **[Test build #123994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123994/testReport)** for PR 28769 at commit [`9bcc084`](https://github.com/apache/spark/commit/9bcc084160021dd9a7dc1d573b787075016e9f01).






[GitHub] [spark] holdenk commented on a change in pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28818:
URL: https://github.com/apache/spark/pull/28818#discussion_r439785650



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##
@@ -333,11 +335,19 @@ private[spark] class ExecutorMonitor(
   }
 
   override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
-    if (!event.blockUpdatedInfo.blockId.isInstanceOf[RDDBlockId]) {
-      return
-    }
     val exec = ensureExecutorIsTracked(event.blockUpdatedInfo.blockManagerId.executorId,
       UNKNOWN_RESOURCE_PROFILE_ID)
+
+    // Check if it is a shuffle file, or RDD to pick the correct codepath for update
+    if (event.blockUpdatedInfo.blockId.isInstanceOf[ShuffleDataBlockId] && shuffleTrackingEnabled) {
+      event.blockUpdatedInfo.blockId match {
+        case ShuffleDataBlockId(shuffleId, _, _) => exec.addShuffle(shuffleId)
+        case _ => // For now we only update on data blocks
+      }

Review comment:
   So it's not (I still want to get to SPARK-31974). The executor monitor 
keeps track of the locations of cached and shuffle blocks, and this can be used 
to decide which executor(s) Spark should shut down first. Since we now move 
shuffle blocks around, this wires it up so that the monitor keeps track of them.
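
As a rough illustration of the bookkeeping described above, here is a minimal, self-contained sketch. It is not the `ExecutorMonitor` implementation: the class, its methods, and the rule that executors holding no tracked blocks are shut down first are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the monitor's bookkeeping; not Spark's API.
class ShuffleTrackingSketch {

  // Per-executor record of which shuffles and cached RDDs have blocks on it.
  static final class Tracker {
    final Set<Integer> shuffleIds = new HashSet<>();
    final Set<Integer> cachedRddIds = new HashSet<>();

    boolean holdsBlocks() {
      return !shuffleIds.isEmpty() || !cachedRddIds.isEmpty();
    }
  }

  // TreeMap keeps executor ids sorted so the candidate list has a stable order.
  private final Map<String, Tracker> executors = new TreeMap<>();
  private final boolean shuffleTrackingEnabled;

  ShuffleTrackingSketch(boolean shuffleTrackingEnabled) {
    this.shuffleTrackingEnabled = shuffleTrackingEnabled;
  }

  // Start tracking an executor if it is not tracked yet.
  Tracker ensureTracked(String executorId) {
    return executors.computeIfAbsent(executorId, id -> new Tracker());
  }

  // Record that a shuffle data block now lives on this executor.
  void onShuffleBlockUpdated(String executorId, int shuffleId) {
    Tracker tracker = ensureTracked(executorId);
    if (shuffleTrackingEnabled) {
      tracker.shuffleIds.add(shuffleId);
    }
  }

  // Record that a cached RDD block now lives on this executor.
  void onRddBlockUpdated(String executorId, int rddId) {
    ensureTracked(executorId).cachedRddIds.add(rddId);
  }

  // Executors holding no tracked blocks: assumed here to be the first to shut down.
  List<String> removalCandidates() {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Tracker> entry : executors.entrySet()) {
      if (!entry.getValue().holdsBlocks()) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    ShuffleTrackingSketch monitor = new ShuffleTrackingSketch(true);
    monitor.ensureTracked("exec-1");
    monitor.onShuffleBlockUpdated("exec-2", 7);
    monitor.onRddBlockUpdated("exec-3", 42);
    System.out.println(monitor.removalCandidates()); // prints [exec-1]
  }
}
```

With shuffle tracking disabled, a shuffle block update still registers the executor but records nothing for it, so that executor remains a removal candidate.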








[GitHub] [spark] dongjoon-hyun closed pull request #28822: [SPARK-31644][BUILD][FOLLOWUP] Make Spark's guava version configurable from the command line for sbt

2020-06-13 Thread GitBox


dongjoon-hyun closed pull request #28822:
URL: https://github.com/apache/spark/pull/28822


   






[GitHub] [spark] AmplabJenkins commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #24173:
URL: https://github.com/apache/spark/pull/24173#issuecomment-643706534










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27620: [SPARK-30866][SS] FileStreamSource: Cache fetched list of files beyond maxFilesPerTrigger as unread files

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27620:
URL: https://github.com/apache/spark/pull/27620#issuecomment-643706511










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643706508










[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-643706531










[GitHub] [spark] AmplabJenkins commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-643706528










[GitHub] [spark] AmplabJenkins removed a comment on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #24173:
URL: https://github.com/apache/spark/pull/24173#issuecomment-643706534










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27333:
URL: https://github.com/apache/spark/pull/27333#issuecomment-643706522










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28422:
URL: https://github.com/apache/spark/pull/28422#issuecomment-643706501










[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-643706484










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643706496










[GitHub] [spark] AmplabJenkins commented on pull request #27620: [SPARK-30866][SS] FileStreamSource: Cache fetched list of files beyond maxFilesPerTrigger as unread files

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #27620:
URL: https://github.com/apache/spark/pull/27620#issuecomment-643706511










[GitHub] [spark] AmplabJenkins commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #27333:
URL: https://github.com/apache/spark/pull/27333#issuecomment-643706522










[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643706496










[GitHub] [spark] wangyum commented on pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table

2020-06-13 Thread GitBox


wangyum commented on pull request #28032:
URL: https://github.com/apache/spark/pull/28032#issuecomment-643706474


   > @wangyum Question: if we have a repartition hint on p1 and p2 in the SELECT query, would it have a similar effect?
   
   Yes, it has a similar effect.






[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643706508










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643706485










[GitHub] [spark] AmplabJenkins commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28422:
URL: https://github.com/apache/spark/pull/28422#issuecomment-643706501










[GitHub] [spark] AmplabJenkins commented on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643706485










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-643706484










[GitHub] [spark] SparkQA commented on pull request #27620: [SPARK-30866][SS] FileStreamSource: Cache fetched list of files beyond maxFilesPerTrigger as unread files

2020-06-13 Thread GitBox


SparkQA commented on pull request #27620:
URL: https://github.com/apache/spark/pull/27620#issuecomment-643706447


   **[Test build #123989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123989/testReport)** for PR 27620 at commit [`8251b74`](https://github.com/apache/spark/commit/8251b744d40f4f8744df53d68842894489808c2b).






[GitHub] [spark] SparkQA commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-06-13 Thread GitBox


SparkQA commented on pull request #24173:
URL: https://github.com/apache/spark/pull/24173#issuecomment-643706432


   **[Test build #123993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123993/testReport)** for PR 24173 at commit [`1fcfff5`](https://github.com/apache/spark/commit/1fcfff5c2ca78049eb38cf4ef7c041d0005ab9b3).






[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-06-13 Thread GitBox


SparkQA commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-643706446


   **[Test build #123991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123991/testReport)**
 for PR 26935 at commit 
[`895fe06`](https://github.com/apache/spark/commit/895fe068bd3b32ed70ef84cc68e3352306099214).






[GitHub] [spark] SparkQA commented on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption

2020-06-13 Thread GitBox


SparkQA commented on pull request #25965:
URL: https://github.com/apache/spark/pull/25965#issuecomment-643706439


   **[Test build #123992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123992/testReport)**
 for PR 25965 at commit 
[`d15acef`](https://github.com/apache/spark/commit/d15acef9698528239dc8a5b92d55c950cdf602b2).






[GitHub] [spark] SparkQA commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-06-13 Thread GitBox


SparkQA commented on pull request #27333:
URL: https://github.com/apache/spark/pull/27333#issuecomment-643706441


   **[Test build #123990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123990/testReport)**
 for PR 27333 at commit 
[`466363e`](https://github.com/apache/spark/commit/466363edb22ea83a81e21a72f1b983dc7b5a733e).






[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


SparkQA commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643706415


   **[Test build #123988 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123988/testReport)**
 for PR 27649 at commit 
[`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).






[GitHub] [spark] SparkQA commented on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


SparkQA commented on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643706416


   **[Test build #123984 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123984/testReport)**
 for PR 28607 at commit 
[`4216405`](https://github.com/apache/spark/commit/4216405789c07f7ded54be05d5ecd8797ee05291).






[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643706423


   **[Test build #123987 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123987/testReport)**
 for PR 27694 at commit 
[`3933018`](https://github.com/apache/spark/commit/3933018575441fca267e0a0fe93bfef7d9cf58f5).






[GitHub] [spark] SparkQA commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-06-13 Thread GitBox


SparkQA commented on pull request #28422:
URL: https://github.com/apache/spark/pull/28422#issuecomment-643706409


   **[Test build #123985 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123985/testReport)**
 for PR 28422 at commit 
[`06ee53d`](https://github.com/apache/spark/commit/06ee53d9dee60756be8563d584d589e198d670f1).






[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-06-13 Thread GitBox


SparkQA commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-643706412


   **[Test build #123986 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123986/testReport)**
 for PR 28363 at commit 
[`9383fcb`](https://github.com/apache/spark/commit/9383fcbce44522e396dfa7b5a56a9ff84f951bb7).






[GitHub] [spark] HeartSaVioR commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #28422:
URL: https://github.com/apache/spark/pull/28422#issuecomment-643706117


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-643706110


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28607: [SPARK-24634][SS] Add a new metric regarding number of inputs later than watermark plus allowed delay

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #28607:
URL: https://github.com/apache/spark/pull/28607#issuecomment-643706118


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #27333:
URL: https://github.com/apache/spark/pull/27333#issuecomment-643706079


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-643706104


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #27620: [SPARK-30866][SS] FileStreamSource: Cache fetched list of files beyond maxFilesPerTrigger as unread files

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #27620:
URL: https://github.com/apache/spark/pull/27620#issuecomment-643706085


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #24173:
URL: https://github.com/apache/spark/pull/24173#issuecomment-643706060


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #26935:
URL: https://github.com/apache/spark/pull/26935#issuecomment-643706074


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #27649:
URL: https://github.com/apache/spark/pull/27649#issuecomment-643706092


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #25965: [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #25965:
URL: https://github.com/apache/spark/pull/25965#issuecomment-643706072


   retest this, please






[GitHub] [spark] HeartSaVioR commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-06-13 Thread GitBox


HeartSaVioR commented on pull request #28422:
URL: https://github.com/apache/spark/pull/28422#issuecomment-643705743


   I can even tolerate the fact that maxFileAge is derived from the latest 
timestamp among the paths. If we don't trust the node's wall time (though I 
suspect other logic works well in such a case), then yes, it might be the 
source of truth across nodes.
   
   I feel all the confusion comes from the behavior of `latestFirst`. Yes, we 
would like to read from the latest files in some cases, when we're only 
interested in the latest files. But then should we really leave open the 
possibility of going back to older files? Could we simply do what we do with 
Kafka's "latest" option, which only affects the first batch and is a no-op in 
further batches?
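
   To make the question above concrete, here is a minimal Java sketch of one possible reading of "latest only affects the first batch". All names (`FileEntry`, `LatestOnFirstBatchSelector`, `nextBatch`) are hypothetical and are not part of Spark's `FileStreamSource`; the sketch also assumes file timestamps are unique, which real listings do not guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical file entry for illustration; not a Spark type.
final class FileEntry {
  final String path;
  final long timestamp;

  FileEntry(String path, long timestamp) {
    this.path = path;
    this.timestamp = timestamp;
  }
}

// Hypothetical selector: "latest" semantics apply to the first batch only.
final class LatestOnFirstBatchSelector {
  private boolean firstBatch = true;
  private long lastSeenTimestamp = Long.MIN_VALUE;

  List<FileEntry> nextBatch(List<FileEntry> listing, int maxFiles) {
    // Only files newer than anything already handed out are candidates.
    List<FileEntry> candidates = new ArrayList<>();
    for (FileEntry f : listing) {
      if (f.timestamp > lastSeenTimestamp) {
        candidates.add(f);
      }
    }
    candidates.sort(Comparator.comparingLong((FileEntry f) -> f.timestamp));

    List<FileEntry> batch;
    if (firstBatch) {
      // First batch: take the newest maxFiles; older files are skipped for good.
      int from = Math.max(0, candidates.size() - maxFiles);
      batch = new ArrayList<>(candidates.subList(from, candidates.size()));
      firstBatch = false;
    } else {
      // Later batches: plain oldest-first order, no going back in time.
      int to = Math.min(maxFiles, candidates.size());
      batch = new ArrayList<>(candidates.subList(0, to));
    }
    if (!batch.isEmpty()) {
      lastSeenTimestamp = batch.get(batch.size() - 1).timestamp;
    }
    return batch;
  }
}
```

   Under this reading, files older than the first batch are never revisited, so no age-based cutoff is needed to keep them out of later batches. Whether that matches what users expect from `latestFirst` is exactly the open question.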






[GitHub] [spark] HeartSaVioR commented on a change in pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close

2020-06-13 Thread GitBox


HeartSaVioR commented on a change in pull request #28769:
URL: https://github.com/apache/spark/pull/28769#discussion_r439782275



##
File path: common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
##
@@ -229,6 +241,10 @@ public long count(Class type, String index, Object indexedValue) throws Excep
 return idx.getCount(idx.end(null, indexedValue));
   }
 
+  /**
+   * Trying to close a JNI LevelDB handle with a closed DB can cause JVM crashes,

Review comment:
   nit: Trying to close a JNI LevelDB handle with a closed DB can cause JVM 
crashes`. T`his ensures that all iterators are correctly closed before DB is 
closed.
   
   Btw, I think this comment reads better than the one for `iteratorTracker`. 
We may only need one comment, on `iteratorTracker`, with its sentences replaced 
by these, and it would be better to also explain why we use soft references to 
track these iterators.
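
   As an illustration of the idea under review, here is a minimal Java sketch of a store that tracks its iterators through soft references and closes them before closing itself. `TrackedIterator` and `TrackingStore` are hypothetical stand-ins, not Spark's `LevelDBIterator` or `LevelDB`, and the rationale in the comments is an assumption about why soft references are used here.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an iterator that owns a native handle.
final class TrackedIterator implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  @Override
  public void close() {
    // Idempotent, so closing from both the caller and the store is safe.
    closed.set(true);
  }

  boolean isClosed() {
    return closed.get();
  }
}

// Hypothetical stand-in for the store that hands out iterators.
final class TrackingStore implements AutoCloseable {
  // Assumed rationale: soft references let the GC reclaim iterators the
  // caller has dropped, so the tracker does not keep them alive on its own.
  private final ConcurrentLinkedQueue<SoftReference<TrackedIterator>> iteratorTracker =
      new ConcurrentLinkedQueue<>();
  private boolean closed = false;

  synchronized TrackedIterator iterator() {
    if (closed) {
      throw new IllegalStateException("store is closed");
    }
    TrackedIterator it = new TrackedIterator();
    iteratorTracker.add(new SoftReference<>(it));
    return it;
  }

  @Override
  public synchronized void close() {
    if (closed) {
      return;
    }
    // Close every iterator that is still reachable before the store itself.
    for (SoftReference<TrackedIterator> ref : iteratorTracker) {
      TrackedIterator it = ref.get();
      if (it != null) {
        it.close();
      }
    }
    iteratorTracker.clear();
    closed = true; // a real store would release its native DB handle here
  }
}
```

   The ordering in `close()` is the point: iterators first, then the DB handle, so that no iterator is closed against a DB that is already closed.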

##
File path: common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java
##
@@ -276,6 +277,41 @@ public void testNegativeIndexValues() throws Exception {
 assertEquals(expected, results);
   }
 
+  @Test
+  public void testCloseLevelDBIterator() throws Exception {
+// SPARK-31929: test when LevelDB.close() is called, related LevelDBIterators
+// are closed. And files opened by iterators are also closed.
+File dbPathForCloseTest = File
+  .createTempFile(
+"test_db_close.",
+".ldb");
+dbPathForCloseTest.delete();
+LevelDB dbForCloseTest = new LevelDB(dbPathForCloseTest);
+for (int i = 0; i < 8192; i++) {
+  dbForCloseTest.write(createCustomType1(i));
+}
+String key = dbForCloseTest
+  .view(CustomType1.class).iterator().next().key;
+assertEquals("key0", key);
+Iterator it0 = dbForCloseTest
+  .view(CustomType1.class).max(1).iterator();
+while(it0.hasNext()) {
+  it0.next();
+}
+System.gc();
+Iterator it1 = dbForCloseTest
+  .view(CustomType1.class).iterator();
+assertEquals("key0", it1.next().key);
+try(KVStoreIterator it2 = dbForCloseTest

Review comment:
   nit: space after `try` 

##
File path: common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java
##
@@ -276,6 +277,41 @@ public void testNegativeIndexValues() throws Exception {
 assertEquals(expected, results);
   }
 
+  @Test
+  public void testCloseLevelDBIterator() throws Exception {

Review comment:
   I haven't tested this, but it might not fail on Linux/macOS even without the 
patch. That's OK as long as you've tested on Windows and made sure this test 
fails without the patch. Could you please confirm? (I don't have a Windows 
environment for development.)








[GitHub] [spark] AmplabJenkins commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


AmplabJenkins commented on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643703413










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643703413










[GitHub] [spark] SparkQA commented on pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


SparkQA commented on pull request #28817:
URL: https://github.com/apache/spark/pull/28817#issuecomment-643703342


   **[Test build #123983 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123983/testReport)**
 for PR 28817 at commit 
[`a2c0557`](https://github.com/apache/spark/commit/a2c055715ccf2992e399cef3768b1299c24d9a82).






[GitHub] [spark] holdenk commented on a change in pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28817:
URL: https://github.com/apache/spark/pull/28817#discussion_r439781961



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1887,7 +1891,7 @@ private[spark] class BlockManager(
* but rather shadows them.
* Requires an Indexed based shuffle resolver.

Review comment:
   Good catch








[GitHub] [spark] holdenk commented on a change in pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28817:
URL: https://github.com/apache/spark/pull/28817#discussion_r439781897



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -233,6 +233,7 @@ private[spark] class Executor(
* Mark an executor for decommissioning and avoid launching new tasks.
*/
   private[spark] def decommission(): Unit = {
+logInfo("Executor asked to decommission. Starting shutdown thread.")

Review comment:
   Just logging for now. The reason I propagate the message to the executor 
is that if we end up in a state where the executor believes it is 
decommissioned (say, from a local SIGPWR) but the driver doesn't, things could 
get weird, so having some logging is useful.








[GitHub] [spark] holdenk commented on a change in pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28817:
URL: https://github.com/apache/spark/pull/28817#discussion_r439781822



##
File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##
@@ -258,26 +262,65 @@ private[spark] class CoarseGrainedExecutorBackend(
 System.exit(code)
   }
 
-  private def decommissionSelf(): Boolean = {
-logInfo("Decommissioning self w/sync")
-try {
-  decommissioned = true
-  // Tell master we are are decommissioned so it stops trying to schedule us
-  if (driver.nonEmpty) {
-driver.get.askSync[Boolean](DecommissionExecutor(executorId))
+  private var previousAllBlocksMigrated = false
+  private def shutdownIfDone(): Unit = {
+val numRunningTasks = executor.numRunningTasks
+logInfo(s"Checking to see if we can shutdown have ${numRunningTasks} running tasks.")
+if (executor.numRunningTasks == 0) {
+  if (env.conf.get(STORAGE_DECOMMISSION_ENABLED)) {
+val allBlocksMigrated = env.blockManager.decommissionManager match {
+  case Some(m) => m.allBlocksMigrated
+  case None => false // We haven't started migrations yet.
+}
+if (allBlocksMigrated && previousAllBlocksMigrated) {
+  logInfo("No running tasks, all blocks migrated, stopping.")
+  exitExecutor(0, "Finished decommissioning", notifyDriver = true)
+}
+previousAllBlocksMigrated = allBlocksMigrated
   } else {
-logError("No driver to message decommissioning.")
+logInfo("No running tasks, no block migration configured, stopping.")
+exitExecutor(0, "Finished decommissioning", notifyDriver = true)
   }
-  if (executor != null) {
-executor.decommission()
+} else {
+  // If there's a running task it could store blocks.
+  previousAllBlocksMigrated = false
+}
+  }
+
+  private def decommissionSelf(): Boolean = {
+if (!decommissioned) {
+  logInfo("Decommissioning self w/sync")

Review comment:
   Sure
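
   For readers following the hunk above, the exit decision in `shutdownIfDone()` can be modelled as a small state machine. This is a minimal Java sketch with hypothetical names (`ShutdownCheck`, `shouldExit`), not Spark's Scala code: the two inputs stand in for `executor.numRunningTasks` and `allBlocksMigrated`, and the constructor flag stands in for `STORAGE_DECOMMISSION_ENABLED`.

```java
// Hypothetical model of the decision made on each poll; not Spark's code.
final class ShutdownCheck {
  private final boolean storageDecommissionEnabled;
  private boolean previousAllBlocksMigrated = false;

  ShutdownCheck(boolean storageDecommissionEnabled) {
    this.storageDecommissionEnabled = storageDecommissionEnabled;
  }

  /** Returns true when the executor may exit on this poll. */
  boolean shouldExit(int numRunningTasks, boolean allBlocksMigrated) {
    if (numRunningTasks != 0) {
      // A running task could still store blocks, so reset the streak.
      previousAllBlocksMigrated = false;
      return false;
    }
    if (!storageDecommissionEnabled) {
      // No block migration configured: idle is enough to stop.
      return true;
    }
    // Require two consecutive idle polls that both report full migration.
    boolean exit = allBlocksMigrated && previousAllBlocksMigrated;
    previousAllBlocksMigrated = allBlocksMigrated;
    return exit;
  }
}
```

   The second consecutive observation guards against a task finishing between polls and leaving behind blocks that the migration had not yet seen.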








[GitHub] [spark] holdenk commented on a change in pull request #28817: [WIP][SPARK-31197][CORE] Exit the executor once all tasks and migrations are finished built on top of on top of spark20629

2020-06-13 Thread GitBox


holdenk commented on a change in pull request #28817:
URL: https://github.com/apache/spark/pull/28817#discussion_r439781710



##
File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##
@@ -258,26 +262,60 @@ private[spark] class CoarseGrainedExecutorBackend(
 System.exit(code)
   }
 
-  private def decommissionSelf(): Boolean = {
-logInfo("Decommissioning self w/sync")
-try {
-  decommissioned = true
-  // Tell master we are are decommissioned so it stops trying to schedule us
-  if (driver.nonEmpty) {
-driver.get.askSync[Boolean](DecommissionExecutor(executorId))
+  private def shutdownIfDone(): Unit = {
+val numRunningTasks = executor.numRunningTasks
+logInfo(s"Checking to see if we can shutdown have ${numRunningTasks} running tasks.")
+if (executor.numRunningTasks == 0) {
+  if (env.conf.get(STORAGE_DECOMMISSION_ENABLED)) {
+val allBlocksMigrated = env.blockManager.decommissionManager match {
+  case Some(m) => m.allBlocksMigrated
+  case None => false // We haven't started migrations yet.
+}
+if (allBlocksMigrated) {
+  logInfo("No running tasks, all blocks migrated, stopping.")
+  exitExecutor(0, "Finished decommissioning", notifyDriver = true)

Review comment:
   My understanding is that the `TaskSchedulerImpl` shouldn't see any job 
failures, because we've waited for all the tasks on the executor to finish 
before taking this code path. Or is there something I've missed?
   
   I think swapping out `exitExecutor` for telling the driver to stop the 
executor, and avoiding the `System.exit`, makes sense either way though.







