Hi,

In the spark docs
<http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node>
it mentions "However, output operations (like foreachRDD) have *at-least
once* semantics, that is, the transformed data may get written to an
external entity more than once in the event of a worker failure. "

I would like to set up a Kafka pipeline in which I write my data to a
single topic 1, process it with Spark Streaming, write the transformed
results to topic 2, and finally read the results from topic 2. How do I
configure Spark Streaming so that I maintain exactly-once semantics when
writing to topic 2?
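For context on what I'm trying to avoid: with at-least-once output, a worker
failure can cause the same transformed record to be written to topic 2 twice.
One common mitigation I've seen discussed is attaching a deterministic key to
each record so the duplicates can be discarded downstream. A minimal sketch of
that idea (the record format, key scheme, and function names here are all
hypothetical, not Spark or Kafka API):

```python
# Sketch: at-least-once delivery followed by consumer-side deduplication.
# All names and the record format are hypothetical, for illustration only.

def deliver_at_least_once(records):
    """Simulate a worker failure: one write is retried and re-sent."""
    out = []
    for rec in records:
        out.append(rec)
        if rec["key"] == "batch-1/0":  # pretend this write was retried
            out.append(rec)
    return out

def deduplicate(delivered):
    """Keep only the first record seen for each deterministic key."""
    seen = set()
    result = []
    for rec in delivered:
        if rec["key"] not in seen:
            seen.add(rec["key"])
            result.append(rec)
    return result

records = [{"key": f"batch-1/{i}", "value": i * 10} for i in range(3)]
delivered = deliver_at_least_once(records)
assert len(delivered) == 4                  # the retry introduced a duplicate
assert deduplicate(delivered) == records    # effectively exactly-once after dedup
```

My question is essentially whether Spark Streaming (or the Kafka writer) can
give me exactly-once directly, or whether I need something like this
dedup-by-key scheme on the topic 2 consumer side.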

Thanks,
Josh
