Hi Jacek: Thanks for your response. I am just trying to understand the fundamentals of watermarking and how it behaves in aggregation vs non-aggregation scenarios.
On Tuesday, February 6, 2018 9:04 AM, Jacek Laskowski <ja...@japila.pl> wrote:

Hi,

What would you expect? The data is simply dropped, as that's the purpose of watermarking it. That's my understanding at least.

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Mon, Feb 5, 2018 at 8:11 PM, M Singh <mans2si...@yahoo.com> wrote:

Just checking if anyone has more details on how watermark works in cases where the event time is earlier than the processing timestamp.

On Friday, February 2, 2018 8:47 AM, M Singh <mans2si...@yahoo.com> wrote:

Hi Vishnu/Jacek:

Thanks for your responses.

Jacek - At the moment, the current time for my use case is processing time.

Vishnu - The Spark documentation (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) does indicate that it can dedup using watermark. So I believe there are more use cases for watermark, and that is what I am trying to find.

I am hoping that TD can clarify or point me to the documentation.

Thanks

On Wednesday, January 31, 2018 6:37 AM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:

Hi Mans,

Watermark in Spark is used to decide when to clear the state, so if an event is delayed beyond the point at which Spark clears the state, it will be ignored. I recently wrote a blog post on this: http://vishnuviswanath.com/spark_structured_streaming.html#watermark

Yes, this state applies to aggregation only. If you have only a map function and don't want to process late events, you could filter on the EventTime field, but I guess you will have to compare it with the processing time, since there is no API for the user to access the watermark.
-Vishnu

On Fri, Jan 26, 2018 at 1:14 PM, M Singh <mans2si...@yahoo.com.invalid> wrote:

Hi:

I am trying to filter out records which are lagging behind (based on event time) by a certain amount of time. Is the watermark API applicable to this scenario (i.e., filtering lagging records), or is it only applicable with aggregation? I could not get a clear understanding from the documentation, which only refers to its usage with aggregation.

Thanks
Mans
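
For reference, the filter Vishnu suggests in this thread (compare each record's event time with the processing time and drop anything that lags too far) could look roughly like the sketch below. It is plain Python rather than Spark code, so it only models the logic. The names is_late, drop_late and max_lag, the record layout, and the sample timestamps are all made up for illustration and are not part of any Spark API.

```python
from datetime import datetime, timedelta


def is_late(event_time, processing_time, max_lag):
    """Return True when the event lags processing time by more than max_lag."""
    return processing_time - event_time > max_lag


def drop_late(records, processing_time, max_lag):
    """Keep only records whose event time is within max_lag of processing_time."""
    return [
        r for r in records
        if not is_late(r["event_time"], processing_time, max_lag)
    ]


# Hypothetical sample data: three records with different lags.
now = datetime(2018, 1, 26, 13, 14, 0)
records = [
    {"id": 1, "event_time": now - timedelta(minutes=2)},   # 2 minutes behind
    {"id": 2, "event_time": now - timedelta(minutes=15)},  # 15 minutes behind
    {"id": 3, "event_time": now - timedelta(minutes=10)},  # exactly at the limit
]

# Assumed threshold: tolerate up to 10 minutes of lag (inclusive).
kept = drop_late(records, now, timedelta(minutes=10))
print([r["id"] for r in kept])  # [1, 3]
```

In a streaming job the same comparison would presumably be written as a filter on the event-time column against the current timestamp. Because it uses processing time and not the watermark, it keeps no state and does not depend on any aggregation.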