Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11976#discussion_r57813212
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -212,9 +210,18 @@ class StreamExecution(
populateStartOffsets()
logDebug(s"Stream running from $committedOffsets to
$availableOffsets")
while (isActive) {
+ val batchStartTimeMs = System.currentTimeMillis()
--- End diff ---
Maybe we can pull this logic out into its own class so that we can override
time and unit test it properly (rough sketch at the end of this comment)?
Some questions:
- I think this might be implementing the "alternative" rather than the
proposal from the design doc. If I understand correctly, the intended behavior
is: "If the cluster is overloaded, we will skip some firings and wait until
the next multiple of period." Whereas this code seems to execute the next
batch ASAP when overloaded.
- What about failures? What if it's an hourly trigger and the cluster fails
and comes back up after 10 minutes? (We might defer this case.)
/cc @rxin
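
For the "override time" part, here is a rough sketch of what such a separate
class could look like. This is just illustrative: `Clock`,
`IntervalTriggerExecutor`, and the method names are made-up here, not existing
Spark classes. The idea is that the executor always waits for the next multiple
of the period, so an overrunning batch skips the missed firings, and tests can
drive it with a manual clock:

```scala
// Rough sketch, not actual Spark code: a swappable clock plus an executor
// that fires batches only at multiples of `intervalMs`, skipping any
// firings that were missed while a batch overran.
trait Clock {
  def nowMs(): Long
  def sleep(ms: Long): Unit
}

object SystemClock extends Clock {
  override def nowMs(): Long = System.currentTimeMillis()
  override def sleep(ms: Long): Unit = Thread.sleep(ms)
}

class IntervalTriggerExecutor(intervalMs: Long, clock: Clock = SystemClock) {

  /** Runs `batch` on every interval boundary while `continueRunning` returns true. */
  def execute(continueRunning: () => Boolean)(batch: () => Unit): Unit = {
    while (continueRunning()) {
      // Always align to the next multiple of the period. If the previous
      // batch overran, this skips the missed firings instead of running ASAP.
      waitUntil(nextBoundary(clock.nowMs()))
      batch()
    }
  }

  private def nextBoundary(now: Long): Long = (now / intervalMs + 1) * intervalMs

  private def waitUntil(target: Long): Unit = {
    var remaining = target - clock.nowMs()
    while (remaining > 0) {
      clock.sleep(remaining)
      remaining = target - clock.nowMs()
    }
  }
}
```

In a unit test we could plug in a fake `Clock` whose `sleep` just advances an
internal counter, so we can assert deterministically that, e.g., a batch taking
1.5 intervals fires again at the 2x boundary rather than immediately.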