Hi,

In the spark docs
<http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node>
it mentions "However, output operations (like foreachRDD) have *at-least
once* semantics, that is, the transformed data may get written to an
external entity more than once in the event of a worker failure. "

I would like to set up a Kafka pipeline in which I write my data to a
single topic 1, process it with Spark Streaming, write the transformed
results to topic 2, and finally read the results from topic 2. How do I
configure Spark Streaming so that I maintain exactly-once semantics when
writing to topic 2?
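For context on what I'm trying to avoid: with at-least-once output, a worker
failure can cause the same transformed record to be written to topic 2 twice.
One common mitigation I've seen discussed is attaching a deterministic key to
each record so the duplicates can be discarded downstream. A minimal sketch of
that idea (the record format, key scheme, and function names here are all
hypothetical, not Spark or Kafka API):

```python
# Sketch: at-least-once delivery followed by consumer-side deduplication.
# All names and the record format are hypothetical, for illustration only.

def deliver_at_least_once(records):
    """Simulate a worker failure: one write is retried and re-sent."""
    out = []
    for rec in records:
        out.append(rec)
        if rec["key"] == "batch-1/0":  # pretend this write was retried
            out.append(rec)
    return out

def deduplicate(delivered):
    """Keep only the first record seen for each deterministic key."""
    seen = set()
    result = []
    for rec in delivered:
        if rec["key"] not in seen:
            seen.add(rec["key"])
            result.append(rec)
    return result

records = [{"key": f"batch-1/{i}", "value": i * 10} for i in range(3)]
delivered = deliver_at_least_once(records)
assert len(delivered) == 4                  # the retry introduced a duplicate
assert deduplicate(delivered) == records    # effectively exactly-once after dedup
```

My question is essentially whether Spark Streaming (or the Kafka writer) can
give me exactly-once directly, or whether I need something like this
dedup-by-key scheme on the topic 2 consumer side.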

Thanks,
Josh
