Hi all!
I'm using Spark Structured Streaming for a data ingestion pipeline.
The pipeline reads events (notifications of newly available data) from
a Kafka topic and then queries a REST endpoint to fetch the actual
data (inside a flatMap).
A single event produces a few thousand records (rows) that have to be
stored, and I use foreachBatch() to write them.
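
Roughly, the pipeline looks like this (a simplified sketch; the broker
address, topic name, sink path, and the fetchRecords helper are
placeholders, not my real code):

import org.apache.spark.sql.{Dataset, SparkSession}

case class Record(eventId: String, payload: String)

object IngestSketch {
  // Hypothetical helper standing in for the REST call:
  // returns the few thousand rows produced by one event.
  def fetchRecords(eventId: String): Seq[Record] =
    Seq(Record(eventId, "payload"))  // stub

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ingestion").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
      .option("subscribe", "data-notifications")         // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]

    // One incoming event fans out into many output records.
    val records: Dataset[Record] = events.flatMap(fetchRecords _)

    records.writeStream
      .foreachBatch { (batch: Dataset[Record], batchId: Long) =>
        // Persist the current micro-batch.
        batch.write.mode("append").parquet("/tmp/records")  // placeholder sink
      }
      .start()
      .awaitTermination()
  }
}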
My question: does Spark guarantee that all output records of a single
event always land in the same micro-batch, or can the records be split
across multiple batches?
Best,
Rico.