Re: kafka pipeline exactly once semantics

2014-11-30 Thread Tobias Pfeiffer
Josh,

On Sun, Nov 30, 2014 at 10:17 PM, Josh J wrote:
>
> I would like to set up a Kafka pipeline whereby I write my data to a
> single topic, topic1, then continue processing with Spark Streaming,
> write the transformed results to topic2, and finally read the results
> from topic2.
>

Not really related to your question, but you may also want to look into
Samza, which was built for exactly this kind of processing.

Tobias


kafka pipeline exactly once semantics

2014-11-30 Thread Josh J
Hi,

In the Spark Streaming docs it mentions: "However, output operations
(like foreachRDD) have *at-least once* semantics, that is, the
transformed data may get written to an external entity more than once
in the event of a worker failure."
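
For example, a straightforward foreachRDD write to Kafka, like the
sketch below, would re-send part of a partition if its task fails
midway and is retried (the producer settings, the topic name, and the
stream variable are illustrative, not from my actual code):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // stream is assumed to be the DStream[String] of transformed results.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One producer per partition; the settings are illustrative.
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        // If the task dies after some sends and is retried, everything
        // sent before the failure shows up in topic2 a second time.
        records.foreach { r =>
          producer.send(new ProducerRecord[String, String]("topic2", r))
        }
        producer.close()
      }
    }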

I would like to set up a Kafka pipeline whereby I write my data to a
single topic, topic1, then continue processing with Spark Streaming,
write the transformed results to topic2, and finally read the results
from topic2. How do I configure Spark Streaming so that I can maintain
exactly-once semantics when writing to topic2?
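
One workaround I have seen suggested is to make the writes idempotent:
key each record with a deterministic (batch time, partition, position)
triple, so a replayed batch rewrites the same keys and the reader of
topic2 can drop keys it has already seen. A rough sketch of that idea
(mkProducer, process, and the in-memory seen set are hypothetical
placeholders):

    import org.apache.kafka.clients.producer.ProducerRecord
    import org.apache.spark.TaskContext

    // Writer side: the key is (batch time, partition id, position
    // within the partition), which is stable if a batch is replayed.
    stream.foreachRDD { (rdd, batchTime) =>
      val batchMs = batchTime.milliseconds
      rdd.foreachPartition { records =>
        val producer = mkProducer() // hypothetical helper, as in the sketch above
        val partitionId = TaskContext.get.partitionId
        records.zipWithIndex.foreach { case (value, i) =>
          val key = s"$batchMs-$partitionId-$i"
          producer.send(new ProducerRecord[String, String]("topic2", key, value))
        }
        producer.close()
      }
    }

    // Reader side: drop keys that were already processed. An in-memory
    // set shows the idea; real code would need a durable store.
    val seen = scala.collection.mutable.Set.empty[String]
    def handle(key: String, value: String): Unit =
      if (seen.add(key)) process(value) // process() is a hypothetical handler

This relies on a replayed partition producing the same records in the
same order, which should hold as long as the upstream transformations
are deterministic.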

Thanks,
Josh