Hi, I have recently been experimenting with stream processing systems: Kafka Streams, Spark Structured Streaming, and Flink.
Spark Structured Streaming has a micro-batch architecture, while Kafka Streams is described as having a full (record-at-a-time) streaming architecture. However, a few days ago I noticed that by default Kafka Streams does not emit results for events immediately; results are forwarded to the output topic roughly every 30 seconds.

So what is the actual difference between micro-batch processing and the full stream processing that Kafka Streams performs? And if I set the Spark Structured Streaming trigger to 30 seconds and the outputMode to update, would I notice any difference in behavior? Does anyone have an example of a data stream that would show the differences?

Kind regards,
Krzysztof
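For context, here is roughly the kind of configuration I am talking about. If I understand the Kafka Streams documentation correctly, the 30-second flushing comes from the record cache (`cache.max.bytes.buffering`) together with `commit.interval.ms`, whose default is 30000 ms. This is only a sketch using plain `java.util.Properties` (the `application.id` and `bootstrap.servers` values are placeholders), showing how one could disable the cache to get per-record output:

```java
import java.util.Properties;

public class StreamsConfigSketch {

    public static Properties lowLatencyConfig() {
        Properties props = new Properties();

        // Required basics (placeholder values, not a real deployment).
        props.put("application.id", "latency-test-app");
        props.put("bootstrap.servers", "localhost:9092");

        // Disable the record cache so every updated aggregate is forwarded
        // downstream immediately instead of being buffered until a flush.
        props.put("cache.max.bytes.buffering", "0");

        // Commit (and therefore flush) far more often than the 30 s
        // default of 30000 ms.
        props.put("commit.interval.ms", "100");

        return props;
    }

    public static void main(String[] args) {
        Properties p = lowLatencyConfig();
        System.out.println("cache.max.bytes.buffering=" + p.getProperty("cache.max.bytes.buffering"));
        System.out.println("commit.interval.ms=" + p.getProperty("commit.interval.ms"));
    }
}
```

These properties would be passed to `new KafkaStreams(topology, props)`. With the cache disabled, each input record should produce an output record right away, which I would expect to make Kafka Streams' per-record behavior easier to compare against Spark with a 30-second `Trigger.ProcessingTime`.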
