Thanks again! It looks like the update mode is not available in 2.1 (which
seems to be the latest version as of today), and I am assuming there will be
a way to specify the trigger interval in the next release, because with the
following code I don't see a way to specify one.

val windowedCounts = words.groupBy(
  window($"timestamp", "24 hours", "24 hours"),
  $"word").count()
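For reference, here is roughly the shape I was hoping for. This is just a sketch: it assumes the trigger interval is set on the DataStreamWriter via ProcessingTime, and that a later release accepts "update" as an output mode (the part missing from 2.1). The socket source and column names are placeholders standing in for my actual receiver:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_timestamp, window}
import org.apache.spark.sql.streaming.ProcessingTime

val spark = SparkSession.builder.appName("tumbling-counts").getOrCreate()
import spark.implicits._

// Placeholder source standing in for my long-running receiver.
val words = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()                                        // one column: value
  .withColumn("timestamp", current_timestamp())  // tag each row with arrival time
  .toDF("word", "timestamp")

// The aggregation from above: counts per word over a 24-hour tumbling window.
val windowedCounts = words.groupBy(
  window($"timestamp", "24 hours", "24 hours"),
  $"word").count()

// The trigger interval goes on the writer, not on the aggregation;
// the "update" output mode is the piece not available in 2.1.
val query = windowedCounts.writeStream
  .outputMode("update")                 // assumed available in a later release
  .trigger(ProcessingTime("1 second"))  // produce results every second
  .format("console")
  .start()
```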


On Mon, Apr 10, 2017 at 12:32 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> It sounds like you want a tumbling window (where the slide and duration
> are the same).  This is the default if you give only one interval.  You
> should set the output mode to "update" (i.e. output only the rows that have
> been updated since the last trigger) and the trigger to "1 second".
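> Concretely, with a single interval the window becomes tumbling, and the
> output mode and trigger go on the writer. A sketch, assuming a release
> where the "update" output mode is available, and reusing the `words`
> DataFrame from your snippet:
>
> ```scala
> import org.apache.spark.sql.functions.window
> import org.apache.spark.sql.streaming.ProcessingTime
>
> // One interval => tumbling window (slide == duration), resetting every 24h.
> val counts = words.groupBy(window($"timestamp", "24 hours"), $"word").count()
>
> counts.writeStream
>   .outputMode("update")                 // only rows changed since the last trigger
>   .trigger(ProcessingTime("1 second"))  // emit updated counts every second
>   .format("console")
>   .start()
> ```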
>
> Try thinking about the batch query that would produce the answer you
> want.  Structured streaming will figure out an efficient way to compute
> that answer incrementally as new data arrives.
>
> On Mon, Apr 10, 2017 at 12:20 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> Thanks for the response. I guess I was thinking more in terms of the
>> regular streaming model, so in this case I am a little confused about what
>> my window interval and slide interval should be for the following case:
>>
>> I need to hold a state (say a count) for 24 hours while capturing all its
>> updates and produce results every second. I also need to reset the state
>> (the count) back to zero every 24 hours.
>>
>> On Mon, Apr 10, 2017 at 11:49 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> Nope, structured streaming eliminates the limitation that micro-batching
>>> affects the results of your streaming query.  The trigger is just an
>>> indication of how often you want to produce results (and if you leave it
>>> blank we just run as quickly as possible).
>>>
>>> To control how tuples are grouped into a window, take a look at the
>>> window
>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time>
>>> function.
>>>
>>> On Thu, Apr 6, 2017 at 10:26 AM, kant kodali <kanth...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Is the trigger interval mentioned in this doc
>>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>
>>>> the same as the batch interval in structured streaming? For example, I
>>>> have a long-running receiver (not Kafka) which sends me a real-time
>>>> stream. I want to use a window interval and slide interval of 24 hours to
>>>> create the tumbling window effect, but I want to process updates every
>>>> second.
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>
>
