Hi all!
I'm using Spark Structured Streaming for a data ingestion pipeline.
The pipeline reads events (notifications of newly available data) from
a Kafka topic and then queries a REST endpoint to fetch the actual
data (inside a flatMap).
A single event produces a few thousand records (rows) that have to be
stored, and I use foreachBatch() to write them.
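
Roughly, the pipeline looks like this (a simplified sketch; the broker
address, topic name, sink path, and the fetchRecords helper are
placeholders, not my real code):

import org.apache.spark.sql.{Dataset, SparkSession}

case class Record(eventId: String, payload: String)

object IngestSketch {
  // Hypothetical helper standing in for the REST call:
  // returns the few thousand rows produced by one event.
  def fetchRecords(eventId: String): Seq[Record] =
    Seq(Record(eventId, "payload"))  // stub

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ingestion").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
      .option("subscribe", "data-notifications")         // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]

    // One incoming event fans out into many output records.
    val records: Dataset[Record] = events.flatMap(fetchRecords _)

    records.writeStream
      .foreachBatch { (batch: Dataset[Record], batchId: Long) =>
        // Persist the current micro-batch.
        batch.write.mode("append").parquet("/tmp/records")  // placeholder sink
      }
      .start()
      .awaitTermination()
  }
}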
My question: does Spark guarantee that all output records of a single
event always land in the same micro-batch, or can the records be split
across multiple batches?
Best,
Rico.