Thanks again! Looks like the update mode is not available in 2.1 (which seems to be the latest version as of today) and I am assuming there will be a way to specify trigger interval with the next release because with the following code I don't see a way to specify trigger interval.
val windowedCounts = words.groupBy( window($"timestamp", "24 hours", "24 hours"), $"word").count() On Mon, Apr 10, 2017 at 12:32 PM, Michael Armbrust <mich...@databricks.com> wrote: > It sounds like you want a tumbling window (where the slide and duration > are the same). This is the default if you give only one interval. You > should set the output mode to "update" (i.e. output only the rows that have > been updated since the last trigger) and the trigger to "1 second". > > Try thinking about the batch query that would produce the answer you > want. Structured streaming will figure out an efficient way to compute > that answer incrementally as new data arrives. > > On Mon, Apr 10, 2017 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Michael, >> >> Thanks for the response. I guess I was thinking more in terms of the >> regular streaming model. so In this case I am little confused what my >> window interval and slide interval be for the following case? >> >> I need to hold a state (say a count) for 24 hours while capturing all its >> updates and produce results every second. I also need to reset the state >> (the count) back to zero every 24 hours. >> >> >> >> >> >> >> On Mon, Apr 10, 2017 at 11:49 AM, Michael Armbrust < >> mich...@databricks.com> wrote: >> >>> Nope, structured streaming eliminates the limitation that micro-batching >>> should affect the results of your streaming query. Trigger is just an >>> indication of how often you want to produce results (and if you leave it >>> blank we just run as quickly as possible). >>> >>> To control how tuples are grouped into a window, take a look at the >>> window >>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time> >>> function. >>> >>> On Thu, Apr 6, 2017 at 10:26 AM, kant kodali <kanth...@gmail.com> wrote: >>> >>>> Hi All, >>>> >>>> Is the trigger interval mentioned in this doc >>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html> >>>> the same as batch interval in structured streaming? For example I have a >>>> long running receiver(not kafka) which sends me a real time stream I want >>>> to use window interval, slide interval of 24 hours to create the Tumbling >>>> window effect but I want to process updates every second. >>>> >>>> Thanks! >>>> >>> >>> >> >