Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11976#discussion_r57813212
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -212,9 +210,18 @@ class StreamExecution(
populateStartOffsets()
logDebug(s"Stream running from $committedOffsets to
$availableOffsets")
while (isActive) {
+ val batchStartTimeMs = System.currentTimeMillis()
--- End diff ---
Maybe we can pull this logic out into its own class so that we can override
time and unit test it properly (rough sketch at the end of this comment)?
Some questions:
- I think this might be implementing the "alternative" rather than the
proposal from the design doc. If I understand correctly, the intended behavior
is: "If the cluster is overloaded, we will skip some firings and wait until
the next multiple of period." Whereas this code seems to execute the next
batch ASAP when overloaded.
- What about failures? What if it's an hourly trigger and the cluster fails
and comes back up after 10 minutes? (We might defer this case.)
/cc @rxin
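
For the "override time" part, here is a rough sketch of what such a separate
class could look like. This is just illustrative: `Clock`,
`IntervalTriggerExecutor`, and the method names are made-up here, not existing
Spark classes. The idea is that the executor always waits for the next multiple
of the period, so an overrunning batch skips the missed firings, and tests can
drive it with a manual clock:

```scala
// Rough sketch, not actual Spark code: a swappable clock plus an executor
// that fires batches only at multiples of `intervalMs`, skipping any
// firings that were missed while a batch overran.
trait Clock {
  def nowMs(): Long
  def sleep(ms: Long): Unit
}

object SystemClock extends Clock {
  override def nowMs(): Long = System.currentTimeMillis()
  override def sleep(ms: Long): Unit = Thread.sleep(ms)
}

class IntervalTriggerExecutor(intervalMs: Long, clock: Clock = SystemClock) {

  /** Runs `batch` on every interval boundary while `continueRunning` returns true. */
  def execute(continueRunning: () => Boolean)(batch: () => Unit): Unit = {
    while (continueRunning()) {
      // Always align to the next multiple of the period. If the previous
      // batch overran, this skips the missed firings instead of running ASAP.
      waitUntil(nextBoundary(clock.nowMs()))
      batch()
    }
  }

  private def nextBoundary(now: Long): Long = (now / intervalMs + 1) * intervalMs

  private def waitUntil(target: Long): Unit = {
    var remaining = target - clock.nowMs()
    while (remaining > 0) {
      clock.sleep(remaining)
      remaining = target - clock.nowMs()
    }
  }
}
```

In a unit test we could plug in a fake `Clock` whose `sleep` just advances an
internal counter, so we can assert deterministically that, e.g., a batch taking
1.5 intervals fires again at the 2x boundary rather than immediately.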