Hi,

I'm trying to use Spark 2.1.1 structured streaming to *count the number of
records* from Kafka *for each time window* with the code in this GitHub gist
<https://gist.github.com/erdavila/b6ab0c216e82ae77fa8192c48cb816e4>.

I expected that, *once each minute* (the slide duration), it would *output
a single record* (since the only aggregation key is the window) with the
*record count for the last 5 minutes* (the window duration). However, it
outputs batches of several records 2-3 times per minute, as in the sample
output included in the gist.
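To make the setup concrete, the aggregation amounts to the standard
windowed count sketched below (simplified from what the gist does, per the
description above; the Kafka options, topic name, and console sink are
illustrative placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.window

    val spark = SparkSession.builder
      .appName("WindowedKafkaCount")
      .getOrCreate()
    import spark.implicits._

    // Read records from Kafka (server and topic names are placeholders)
    val records = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Count records per 5-minute window, sliding every 1 minute;
    // the window is the only grouping key
    val counts = records
      .groupBy(window($"timestamp", "5 minutes", "1 minute"))
      .count()

    // Emit the full aggregation state on each trigger
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()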

Changing the output mode to "append" changes the behavior, but it is
still far from what I expected.
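For the "append" attempt, my understanding is that append mode requires an
event-time watermark on the aggregation, so the variant looks roughly like
this (again a sketch, continuing from the `records` stream above; the
10-minute watermark duration is an arbitrary choice):

    // Append mode only emits a window once it is considered final,
    // which requires a watermark on the event-time column
    val appendCounts = records
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes", "1 minute"))
      .count()

    val appendQuery = appendCounts.writeStream
      .outputMode("append")
      .format("console")
      .start()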

What is wrong with my assumptions about how this should work? Given the
code, how should the sample output be interpreted or used?

Thanks,

Eduardo
