Hi,

What would you expect? Data that arrives later than the watermark allows is simply dropped; that is the purpose of watermarking. That's my understanding, at least.
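
For the map-only case that Vishnu describes in the quoted thread (which, per his reply, the watermark state does not cover), the filter would have to be written by hand by comparing event time with processing time. Here is a minimal sketch of that comparison in plain Python. It is not Spark API: the names `MAX_LAG`, `is_fresh`, and `drop_lagging`, the 10-minute threshold, and the record layout with an `event_time` field are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how far an event may lag behind processing time.
MAX_LAG = timedelta(minutes=10)


def is_fresh(event_time, processing_time, max_lag=MAX_LAG):
    """Return True when the event lags processing time by at most max_lag."""
    return processing_time - event_time <= max_lag


def drop_lagging(records, processing_time, max_lag=MAX_LAG):
    """Keep only records whose 'event_time' field is recent enough."""
    return [
        r for r in records
        if is_fresh(r["event_time"], processing_time, max_lag)
    ]


# Illustrative data only: one recent record and one that lags by 30 minutes.
now = datetime(2018, 2, 5, 20, 11, tzinfo=timezone.utc)
records = [
    {"id": 1, "event_time": now - timedelta(minutes=2)},
    {"id": 2, "event_time": now - timedelta(minutes=30)},
]
kept = drop_lagging(records, now)
```

In a streaming query the same comparison would presumably go into a `filter` on the event-time column against the current processing time (for example `current_timestamp()`), with no watermark involved.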

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Mon, Feb 5, 2018 at 8:11 PM, M Singh <mans2si...@yahoo.com> wrote:

> Just checking if anyone has more details on how watermark works in cases
> where event time is earlier than the processing timestamp.
>
> On Friday, February 2, 2018 8:47 AM, M Singh <mans2si...@yahoo.com> wrote:
>
> Hi Vishnu/Jacek:
>
> Thanks for your responses.
>
> Jacek - At the moment, the current time for my use case is processing time.
>
> Vishnu - The Spark documentation
> (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)
> does indicate that it can dedup using a watermark. So I believe there are
> more use cases for watermark, and that is what I am trying to find.
>
> I am hoping that TD can clarify or point me to the documentation.
>
> Thanks
>
> On Wednesday, January 31, 2018 6:37 AM, Vishnu Viswanath
> <vishnu.viswanat...@gmail.com> wrote:
>
> Hi Mans,
>
> Watermark in Spark is used to decide when to clear the state, so if an
> event is delayed beyond the point at which Spark clears the state, it
> will be ignored.
> I recently wrote a blog post on this:
> http://vishnuviswanath.com/spark_structured_streaming.html#watermark
>
> Yes, this state is applicable to aggregation only. If you have only a map
> function and don't want to process a late record, you could filter on its
> EventTime field, but I guess you will have to compare it with the
> processing time, since there is no API for the user to access the
> watermark.
>
> -Vishnu
>
> On Fri, Jan 26, 2018 at 1:14 PM, M Singh <mans2si...@yahoo.com.invalid>
> wrote:
>
> Hi:
>
> I am trying to filter out records which are lagging behind (based on event
> time) by a certain amount of time.
>
> Is the watermark API applicable to this scenario (i.e., filtering lagging
> records), or is it only applicable with aggregation? I could not get a
> clear understanding from the documentation, which only refers to its usage
> with aggregation.
>
> Thanks
>
> Mans