Accumulators aren't going to work to communicate state changes between executors. You need external storage.
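To make that concrete, here is a minimal sketch of option 2 from the original mail: record the highest processed offset per partition in external storage, and let a retried task skip anything at or below it. This is plain Python rather than Spark code, and every name in it is hypothetical — `external_store` is a dict standing in for HBase / ZooKeeper / a database row.

```python
# Hypothetical stand-in for durable external storage (HBase, ZooKeeper, a DB).
external_store = {}


def process_partition(topic, partition, messages, start_offset, sink,
                      fail_at_offset=None):
    """Post each message to `sink`, skipping offsets that a previous attempt
    already recorded. `fail_at_offset` simulates an executor failure."""
    key = (topic, partition)
    # Highest offset durably recorded for this partition, if any.
    last_done = external_store.get(key, start_offset - 1)
    for offset, event in enumerate(messages, start=start_offset):
        if offset <= last_done:
            continue                   # already posted by the failed attempt
        if offset == fail_at_offset:
            raise RuntimeError("simulated task failure")
        sink.append(event)             # stand-in for posting to the server
        external_store[key] = offset   # record progress after each post


sink = []
msgs = ["e0", "e1", "e2", "e3", "e4"]

# First attempt dies before posting offset 13.
try:
    process_partition("t", 0, msgs, 10, sink, fail_at_offset=13)
except RuntimeError:
    pass

# The retried task covers the same offset range but skips e0..e2.
process_partition("t", 0, msgs, 10, sink)
```

Note the window between posting an event and recording its offset: if the task dies exactly there, the retry re-posts that one in-flight message. So this narrows duplicates rather than eliminating them, unless the post and the offset update can be made atomic — which is why the docs linked below recommend transactional output when the sink supports it.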
On Tue, Aug 11, 2015 at 11:28 AM, Shushant Arora <shushantaror...@gmail.com> wrote:

> What if processing is neither idempotent nor transactional — say I am
> posting events to some external server after processing?
>
> Is it possible to get the accumulator of a failed task in the retry task?
> Is there any way to detect whether a task is a retried task or the
> original task?
>
> I was trying to achieve something like incrementing a counter after each
> event is processed; if the task fails, the retry task would skip the
> already-processed events by reading the failed task's counter. Is it
> possible to access an accumulator per task without writing to HDFS or
> HBase?
>
> On Tue, Aug 11, 2015 at 3:15 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
>>
>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations
>>
>> https://www.youtube.com/watch?v=fXnNEq1v3VA
>>
>> On Mon, Aug 10, 2015 at 4:32 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> How can I avoid duplicate processing of Kafka messages in Spark
>>> Streaming 1.3 after an executor failure?
>>>
>>> 1. Can I somehow access the accumulators of a failed task in the retry
>>> task, to skip the events already processed by the failed task on this
>>> partition?
>>>
>>> 2. Or will I have to persist each message as it is processed, check
>>> before processing each message whether it was already processed by the
>>> failed task, and delete this persisted information at the end of each
>>> batch?
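For sinks that do support transactions, the pattern described in the semantics-of-output-operations doc linked above is to commit the batch's results and its Kafka offset range in the same transaction, so a replayed batch either finds its offsets already committed (and is skipped whole) or commits atomically. A hedged sketch, using SQLite purely as a stand-in for the real store — the table and function names here are hypothetical, not a Spark API:

```python
import sqlite3

# In-memory SQLite stands in for whatever transactional store the job uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (event TEXT)")
conn.execute("CREATE TABLE offsets (topic TEXT, part INTEGER, until INTEGER, "
             "PRIMARY KEY (topic, part))")


def save_batch(topic, part, until_off, events):
    """Commit results and the offset high-water mark atomically.
    Returns False (and writes nothing) if this batch was already committed."""
    row = conn.execute("SELECT until FROM offsets WHERE topic=? AND part=?",
                       (topic, part)).fetchone()
    if row is not None and row[0] >= until_off:
        return False              # replayed batch: already committed, skip it
    with conn:                    # results + offsets commit or roll back together
        conn.executemany("INSERT INTO results VALUES (?)",
                         [(e,) for e in events])
        conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                     (topic, part, until_off))
    return True


# A batch commits once; replaying the same offset range is a no-op.
save_batch("t", 0, 12, ["e0", "e1", "e2"])
save_batch("t", 0, 12, ["e0", "e1", "e2"])
```

This works with the direct (receiverless) stream because each partition of each batch corresponds to a known Kafka offset range, which gives the batch a natural deduplication key.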