GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/21345
[SPARK-24159] [SS] Enable no-data micro batches for streaming mapGroupsWithState
## What changes were proposed in this pull request?
Enabled no-data batches in flatMapGroupsWithState in the following two cases:
- When a ProcessingTime timeout is used, we always run a batch every
trigger interval.
- When an event-time watermark is defined, the user may be running
arbitrary logic against the watermark value even if timeouts are not set. In
such cases, it is best to run a batch whenever the watermark has changed,
irrespective of whether timeouts (i.e. event-time timeouts) have been explicitly
enabled.
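
The two cases above can be read as a small decision rule. The following is a minimal sketch of that rule in plain Python, not Spark's actual implementation: `BatchState`, `should_run_no_data_batch`, and the millisecond watermark fields are hypothetical names introduced only for illustration, and "watermark has changed" is modeled as a simple inequality between the previous and new watermark values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Timeout(Enum):
    # Mirrors the names of the timeout modes; this enum itself is illustrative.
    NO_TIMEOUT = "NoTimeout"
    PROCESSING_TIME = "ProcessingTimeTimeout"
    EVENT_TIME = "EventTimeTimeout"


@dataclass(frozen=True)
class BatchState:
    """Hypothetical snapshot of what is known at a trigger with no new data."""
    timeout: Timeout
    prev_watermark_ms: Optional[int]  # None means no watermark is defined
    new_watermark_ms: Optional[int]


def should_run_no_data_batch(state: BatchState) -> bool:
    # Case 1: a processing-time timeout means a batch runs every trigger interval.
    if state.timeout is Timeout.PROCESSING_TIME:
        return True
    # Case 2: a watermark is defined, so run whenever its value has changed,
    # regardless of whether an event-time timeout was explicitly enabled.
    if state.new_watermark_ms is not None:
        return state.new_watermark_ms != state.prev_watermark_ms
    # Otherwise there is nothing for a no-data batch to do.
    return False


if __name__ == "__main__":
    print(should_run_no_data_batch(BatchState(Timeout.PROCESSING_TIME, None, None)))
    print(should_run_no_data_batch(BatchState(Timeout.NO_TIMEOUT, 1000, 2000)))
    print(should_run_no_data_batch(BatchState(Timeout.NO_TIMEOUT, 2000, 2000)))
```

Note that in this sketch the second case does not check the timeout mode at all, which reflects the point that user logic may depend on the watermark even when no timeout is set.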
## How was this patch tested?
Updated tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-24159
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21345.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21345
----
commit e24313341a91cebb4411eb2c804861dd87a7a257
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-05-08T12:32:39Z
Enabled and fixed test
----