Re: Watermarking without aggregation with Structured Streaming

2018-10-25 Thread sanjay_awat
Hello peay-2, Were you able to get a solution to your problem ? Were you able to get watermark timestamp available through a function ? Regards, Sanjay peay-2 wrote > Thanks for the pointers. I guess right now the only workaround would be to > apply a "dummy" aggregation (e.g., group by the tim

Re: Watermarking without aggregation with Structured Streaming

2018-10-24 Thread Sanjay Awatramani
Try if this works... println(query.lastProgress.eventTime.get("watermark")) Regards,Sanjay On 2018/09/30 09:05:40, peay wrote:  > Thanks for the pointers. I guess right now the only workaround would be to > apply a "dummy" aggregation (e.g., group by the timestamp itself) only to > have the sta

Re: Watermarking without aggregation with Structured Streaming

2018-10-24 Thread sanjay_awat
Try this peay-2 wrote > For my purposes, an alternative solution to pushing it out to the source > would be to make the watermark timestamp available through a function so > that it can be used in a regular filter clause. Based on my experiments, > the timestamp is computed and updated even when

Re: Watermarking without aggregation with Structured Streaming

2018-09-30 Thread peay
Thanks for the pointers. I guess right now the only workaround would be to apply a "dummy" aggregation (e.g., group by the timestamp itself) only to have the stateful processing logic kick in and apply the filtering? For my purposes, an alternative solution to pushing it out to the source would

Re: Watermarking without aggregation with Structured Streaming

2018-09-30 Thread Jungtaek Lim
The purpose of watermark is to set a limitation on handling records due to state going infinity. In other cases (non-stateful operations), it is pretty normal to handle all of records even they're pretty late. Btw, there was some comments regarding this: while Spark delegates to filter out late re

Re: Watermarking without aggregation with Structured Streaming

2018-09-30 Thread chandan prakash
Interesting question. I do not think without any aggregation operation/groupBy , watermark is supported currently . Reason: Watermark in Structured Streaming is used for limiting the size of state needed to keep intermediate information in-memory. And state only comes in picture in case of statefu

Watermarking without aggregation with Structured Streaming

2018-09-22 Thread peay
Hello, I am trying to use watermarking without aggregation, to filter out records that are just too late, instead of appending them to the output. My understanding is that aggregation is required for `withWatermark` to have any effect. Is that correct? I am looking for something along the line