Accumulators aren't going to work to communicate state changes between executors. You need external storage.
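To make that concrete, here is a minimal sketch of option 2 from the original mail: record the highest processed offset per partition in external storage, and let a retried task skip anything at or below it. This is plain Python rather than Spark code, and every name in it is hypothetical — `external_store` is a dict standing in for HBase / ZooKeeper / a database row.

```python
# Hypothetical stand-in for durable external storage (HBase, ZooKeeper, a DB).
external_store = {}


def process_partition(topic, partition, messages, start_offset, sink,
                      fail_at_offset=None):
    """Post each message to `sink`, skipping offsets that a previous attempt
    already recorded. `fail_at_offset` simulates an executor failure."""
    key = (topic, partition)
    # Highest offset durably recorded for this partition, if any.
    last_done = external_store.get(key, start_offset - 1)
    for offset, event in enumerate(messages, start=start_offset):
        if offset <= last_done:
            continue                   # already posted by the failed attempt
        if offset == fail_at_offset:
            raise RuntimeError("simulated task failure")
        sink.append(event)             # stand-in for posting to the server
        external_store[key] = offset   # record progress after each post


sink = []
msgs = ["e0", "e1", "e2", "e3", "e4"]

# First attempt dies before posting offset 13.
try:
    process_partition("t", 0, msgs, 10, sink, fail_at_offset=13)
except RuntimeError:
    pass

# The retried task covers the same offset range but skips e0..e2.
process_partition("t", 0, msgs, 10, sink)
```

Note the window between posting an event and recording its offset: if the task dies exactly there, the retry re-posts that one in-flight message. So this narrows duplicates rather than eliminating them, unless the post and the offset update can be made atomic — which is why the docs linked below recommend transactional output when the sink supports it.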
On Tue, Aug 11, 2015 at 11:28 AM, Shushant Arora <shushantaror...@gmail.com> wrote:

> What if processing is neither idempotent nor transactional — say I am
> posting events to some external server after processing?
>
> Is it possible to get the accumulator of a failed task in the retry task?
> Is there any way to detect whether a task is a retried task or the
> original task?
>
> I was trying to achieve something like incrementing a counter after each
> event is processed; if the task fails, the retry task would skip the
> already-processed events by reading the failed task's counter. Is it
> possible to access an accumulator per task without writing to HDFS or
> HBase?
>
> On Tue, Aug 11, 2015 at 3:15 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
>>
>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations
>>
>> https://www.youtube.com/watch?v=fXnNEq1v3VA
>>
>> On Mon, Aug 10, 2015 at 4:32 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> How can I avoid duplicate processing of Kafka messages in Spark
>>> Streaming 1.3 after an executor failure?
>>>
>>> 1. Can I somehow access the accumulators of a failed task in the retry
>>> task, to skip the events already processed by the failed task on this
>>> partition?
>>>
>>> 2. Or will I have to persist each message as it is processed, check
>>> before processing each message whether it was already processed by the
>>> failed task, and delete this persisted information at the end of each
>>> batch?
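For sinks that do support transactions, the pattern described in the semantics-of-output-operations doc linked above is to commit the batch's results and its Kafka offset range in the same transaction, so a replayed batch either finds its offsets already committed (and is skipped whole) or commits atomically. A hedged sketch, using SQLite purely as a stand-in for the real store — the table and function names here are hypothetical, not a Spark API:

```python
import sqlite3

# In-memory SQLite stands in for whatever transactional store the job uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (event TEXT)")
conn.execute("CREATE TABLE offsets (topic TEXT, part INTEGER, until INTEGER, "
             "PRIMARY KEY (topic, part))")


def save_batch(topic, part, until_off, events):
    """Commit results and the offset high-water mark atomically.
    Returns False (and writes nothing) if this batch was already committed."""
    row = conn.execute("SELECT until FROM offsets WHERE topic=? AND part=?",
                       (topic, part)).fetchone()
    if row is not None and row[0] >= until_off:
        return False              # replayed batch: already committed, skip it
    with conn:                    # results + offsets commit or roll back together
        conn.executemany("INSERT INTO results VALUES (?)",
                         [(e,) for e in events])
        conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                     (topic, part, until_off))
    return True


# A batch commits once; replaying the same offset range is a no-op.
save_batch("t", 0, 12, ["e0", "e1", "e2"])
save_batch("t", 0, 12, ["e0", "e1", "e2"])
```

This works with the direct (receiverless) stream because each partition of each batch corresponds to a known Kafka offset range, which gives the batch a natural deduplication key.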