GitHub user zsxwing commented on the issue:
https://github.com/apache/spark/pull/16449
> the watermark time is Thu Jul 17 21:10:50 PDT 2014 and the current time
is Sun Jan 01 20:10:50 PST 2017
> It sounds like it is caused by our intentional over-estimation (that is,
by using 31 days per month)?
@gatorsmile this is expected behavior. It is intentional, and it is correct
per the comment in `Dataset.withWatermark`:
> the actual watermark used is only guaranteed to be at least
`delayThreshold` behind the actual event time. In some cases we may still
process records that arrive more than `delayThreshold` late.
If users need late data dropped at an exact cutoff, they must apply `filter`
explicitly.
That's why I changed the test rather than the watermark calculation.
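
To make the over-estimation concrete, here is a small arithmetic sketch in plain Python (standard library only, not Spark code). It assumes the delay threshold in the test was 2 years 5 months; the comment does not state it, but 29 months at 31 days each is 899 days, which is exactly the gap between the two quoted timestamps. The helper names and the sample event are hypothetical.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Fixed-offset zones for the two timestamps quoted in the comment.
PDT = timezone(timedelta(hours=-7))
PST = timezone(timedelta(hours=-8))

current_time = datetime(2017, 1, 1, 20, 10, 50, tzinfo=PST)
reported_watermark = datetime(2014, 7, 17, 21, 10, 50, tzinfo=PDT)

# Assumption: the delay threshold was "2 years 5 months" (29 months).
DELAY_MONTHS = 2 * 12 + 5


def overestimated_watermark(now, months):
    """Subtract the delay, counting every month as 31 days (over-estimate)."""
    return now - timedelta(days=31 * months)


def calendar_cutoff(now, months):
    """Subtract the delay using real calendar months, clamping the day."""
    total = now.year * 12 + (now.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


watermark = overestimated_watermark(current_time, DELAY_MONTHS)
cutoff = calendar_cutoff(current_time, DELAY_MONTHS)

# Hypothetical event that is more than 2 years 5 months old by the calendar,
# yet not older than the over-estimated watermark.
late_event = datetime(2014, 7, 25, 12, 0, 0, tzinfo=PST)
kept_by_watermark = late_event >= watermark   # the watermark alone keeps it
kept_by_filter = late_event >= cutoff         # an exact filter drops it

print(watermark == reported_watermark, cutoff - watermark)
print(kept_by_watermark, kept_by_filter)
```

Under that assumption the watermark trails the calendar-exact cutoff by 15 days, so a record in that window is still processed unless an explicit filter removes it.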