Hi all!

I'm using Spark Structured Streaming for a data ingestion pipeline. Basically, the pipeline reads events (notifications of newly available data) from a Kafka topic and then queries a REST endpoint to fetch the actual data (within a flatMap).

For a single event the pipeline creates a few thousand records (rows) that have to be stored. To write the data I use foreachBatch().
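
Roughly, the pipeline looks like the sketch below (broker address, topic name, the fetchRecords helper and the Parquet sink are simplified placeholders, not my real code):

import org.apache.spark.sql.{Dataset, SparkSession}

// Simplified record type standing in for the real schema
case class Record(eventId: String, payload: String)

object IngestionPipeline {

  // Placeholder for the REST call that expands one event into its records
  def fetchRecords(eventJson: String): Seq[Record] = {
    // ... call the REST endpoint and parse the response ...
    Seq.empty
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ingestion").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
      .option("subscribe", "events")                      // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS event")
      .as[String]

    // One incoming event fans out into a few thousand records here
    val records = events.flatMap(fetchRecords _)

    val query = records.writeStream
      .foreachBatch { (batch: Dataset[Record], batchId: Long) =>
        // write the micro-batch to the sink (Parquet just as an example)
        batch.write.mode("append").parquet(s"/data/out/batch=$batchId")
      }
      .start()

    query.awaitTermination()
  }
}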

My question now is: does Spark guarantee that all output records of one event are always contained in a single micro-batch, or can the records also be split across multiple batches?


Best,

Rico.

