Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/15949
Hi @tcondie, just want to provide some information regarding watermark
recovery:
After some investigation, I found that even when we correctly recover the
watermark from the log, the test `WatermarkSuite#test("recovery")` still
fails. This is because a sink may skip the actual execution of a
re-submitted batch, so the watermark does not advance correctly. I've opened
[SPARK-18552](https://issues.apache.org/jira/browse/SPARK-18552) for this.
As a provisional fix to make `WatermarkSuite#test("recovery")` pass and to
verify that we can recover the watermark from the log correctly, we can make
this change to `MemorySink`:
```
if (...) {
  ...
} else {
  // this provisional fix forces the execution of a re-submitted batch
  data.collect()
  logDebug(s"Skipping already committed batch: $batchId")
}
```
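To illustrate why skipping a re-submitted batch can stall the watermark, here is a toy model in plain Scala. It is only a sketch and does not use Spark's real classes: `LazyBatch`, `ToySink`, `markCommitted` and `maxEventTimeAfterReplay` are hypothetical names, and it assumes that the maximum event time is only observed as a side effect of actually running the batch.

```scala
// Toy model only: none of these types are Spark's real classes.
object WatermarkRecoverySketch {

  // A lazily evaluated batch. Assumption: the max event time is reported
  // only when the batch is actually executed via collect().
  final class LazyBatch(eventTimes: Seq[Long], onMaxEventTime: Long => Unit) {
    def collect(): Seq[Long] = {
      if (eventTimes.nonEmpty) onMaxEventTime(eventTimes.max)
      eventTimes
    }
  }

  // A sink that remembers the latest committed batch id.
  // forceExecution = true mimics the provisional fix.
  final class ToySink(forceExecution: Boolean) {
    private var latestBatchId: Option[Long] = None

    // Hypothetical helper: pretend this batch was committed before a restart.
    def markCommitted(batchId: Long): Unit = latestBatchId = Some(batchId)

    def addBatch(batchId: Long, data: LazyBatch): Unit = {
      if (latestBatchId.forall(batchId > _)) {
        data.collect()
        latestBatchId = Some(batchId)
      } else if (forceExecution) {
        // Already committed, but run it anyway so the side effects happen.
        data.collect()
      }
      // Otherwise the batch is skipped and never executed.
    }
  }

  // Replays batch 0 against a sink that has already committed batch 0 and
  // returns the max event time observed during the replay (0 if none).
  def maxEventTimeAfterReplay(forceExecution: Boolean): Long = {
    var maxEventTime = 0L
    def observe(t: Long): Unit = { maxEventTime = math.max(maxEventTime, t) }

    val sink = new ToySink(forceExecution)
    sink.markCommitted(0L)
    sink.addBatch(0L, new LazyBatch(Seq(10L, 25L, 15L), observe))
    maxEventTime
  }
}
```

In this model, `maxEventTimeAfterReplay(false)` leaves the observed event time at its initial value because the replayed batch never runs, while `maxEventTimeAfterReplay(true)` picks up the batch's maximum event time.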
Hope this helps!