Hi Jacek:
Thanks for your response.
I am just trying to understand the fundamentals of watermarking and how it
behaves in aggregation vs non-aggregation scenarios.
On Tuesday, February 6, 2018 9:04 AM, Jacek Laskowski wrote:
Hi,
What would you expect? The data is simply dropped as that's the purpose of
watermarking it. That's my understanding at least.
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark
Just checking if anyone has more details on how the watermark works in cases
where the event time is earlier than the processing timestamp.
On Friday, February 2, 2018 8:47 AM, M Singh wrote:
Hi Vishu/Jacek:
Thanks for your responses.
Jacek - At the moment, the current time for my use case is processing time.
Vishnu - Spark documentation
(https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)
does indicate that it can dedup using watermark. So I believe th
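The deduplication case the documentation describes can be sketched without Spark at all. Below is a pure-Python toy (the class name and internals are mine, not Spark's implementation) that mimics the documented semantics of `dropDuplicates` combined with `withWatermark`: a key stays in state only until the watermark passes its event time, so a record arriving after that point is dropped rather than deduplicated.

```python
# Pure-Python sketch of watermark-bounded streaming dedup (NOT Spark itself):
# state for a key is kept only until the watermark passes its event time,
# mirroring how withWatermark bounds the state kept by dropDuplicates.
from datetime import datetime, timedelta

class DedupWithWatermark:
    def __init__(self, delay: timedelta):
        self.delay = delay
        self.watermark = datetime.min   # max event time seen, minus delay
        self.seen = {}                  # key -> event_time still held in state

    def process(self, key, event_time):
        """Return True if the record is emitted, False if dropped."""
        if event_time < self.watermark:
            return False                # too late: its state may already be gone
        duplicate = key in self.seen
        if not duplicate:
            self.seen[key] = event_time
        # advance the watermark, then expire state older than it
        self.watermark = max(self.watermark, event_time - self.delay)
        self.seen = {k: t for k, t in self.seen.items() if t >= self.watermark}
        return not duplicate
```

For example, with a 10-minute delay, a repeat of key "a" is suppressed while its state is live, but once a much newer event pushes the watermark past "a"'s event time, a re-arrival of "a" is simply dropped rather than recognized as a duplicate.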
Hi Mans,
The watermark in Spark is used to decide when to clear the state, so if the
event is delayed past the point when Spark has already cleared the state, it
will be ignored.
I recently wrote a blog post on this:
http://vishnuviswanath.com/spark_structured_streaming.html#watermark
Yes, this State is ap
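The state-clearing behavior described above can be sketched in plain Python (a toy of the semantics, not Spark's implementation): per-window counts are kept only while the window's end is above the watermark, so an event whose window has already been finalized is ignored.

```python
# Pure-Python sketch (NOT Spark itself) of why a late event is "simply
# dropped": per-window counts are kept only while window_end > watermark.
from datetime import datetime, timedelta

class WindowedCount:
    def __init__(self, window: timedelta, delay: timedelta):
        self.window, self.delay = window, delay
        self.watermark = datetime.min
        self.counts = {}                       # window_start -> count

    def process(self, event_time):
        """Return True if the event is counted, False if dropped as late."""
        epoch = datetime(1970, 1, 1)
        start = epoch + ((event_time - epoch) // self.window) * self.window
        if start + self.window <= self.watermark:
            return False                       # window's state already cleared
        self.counts[start] = self.counts.get(start, 0) + 1
        self.watermark = max(self.watermark, event_time - self.delay)
        # clear (finalize) windows that fell entirely below the watermark
        self.counts = {s: c for s, c in self.counts.items()
                       if s + self.window > self.watermark}
        return True
```

With 5-minute windows and a 10-minute delay, an event at 12:00 is counted; an event at 13:00 moves the watermark to 12:50 and clears the 12:00-12:05 window; a second 12:00 event after that is ignored.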
Hi,
I'm curious how you would implement the requirement "by a certain amount of
time" without a watermark. How would you know what's current and compute the lag?
Let's forget about watermark for a moment and see if it pops up as an
inevitable feature :)
Hi:
I am trying to filter out records which are lagging behind (based on event
time) by a certain amount of time.
Is the watermark API applicable to this scenario (i.e., filtering lagging
records), or is it only applicable with aggregation? I could not get a clear
understanding from the documentation.
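For the non-aggregation case in this question, if processing time is an acceptable notion of "current" (as stated later in the thread), the filtering itself needs no watermark: it is a stateless per-record predicate. A minimal sketch, where the field name `event_time` and the lag threshold are illustrative:

```python
# Plain per-record lag filter: no watermark, no aggregation state needed.
from datetime import datetime, timedelta

def keep_fresh(records, now, max_lag):
    """Keep records whose event time lags 'now' by at most max_lag."""
    return [r for r in records if now - r["event_time"] <= max_lag]
```

In Spark this would correspond to an ordinary `where`/`filter` on the event-time column; `withWatermark` only bounds the state of stateful operators (aggregations, `dropDuplicates`, stream-stream joins), which matches the earlier point in the thread that late data is dropped there by Spark rather than filtered by your own predicate.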