Hi, I'm trying to use Spark 2.1.1 Structured Streaming to *count the number of records* from Kafka *for each time window*, with the code in this GitHub gist: <https://gist.github.com/erdavila/b6ab0c216e82ae77fa8192c48cb816e4>.
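For reference, my mental model of Spark's sliding windows is that each record belongs to window-duration / slide-duration overlapping windows. Here is a plain-Python sketch of that assumption (units in seconds, end-exclusive windows; the 300/60 values mirror the 5-minute window and 1-minute slide from the gist, and the function name is mine, not a Spark API):

```python
def sliding_windows(ts_seconds, window=300, slide=60):
    """Sketch of which [start, end) sliding windows contain a timestamp.

    Assumes windows are aligned to the epoch and end-exclusive,
    which is my understanding of window(col, "5 minutes", "1 minute").
    """
    # Latest window start at or before the timestamp.
    last_start = (ts_seconds // slide) * slide
    windows = []
    start = last_start
    # Walk back one slide at a time while the window still contains ts.
    while start > ts_seconds - window:
        windows.append((start, start + window))
        start -= slide
    return windows

# A record at t=600s falls into 5 overlapping windows:
# (600, 900), (540, 840), (480, 780), (420, 720), (360, 660)
print(sliding_windows(600))
```

If this model is right, every incoming record updates 5 window rows at once, which may be related to the multiple output records I'm seeing.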
I expected that, *once each minute* (the slide duration), it would *output a single record* (since the only aggregation key is the window) with the *record count for the last 5 minutes* (the window duration). However, it outputs several records 2-3 times per minute, as in the sample output included in the gist. Changing the output mode to "append" changes the behavior, but it is still far from what I expected.

What is wrong with my assumptions about the way it should work? Given the code, how should the sample output be interpreted or used?

Thanks,
Eduardo