Also, is there a way to tell Spark not to resubmit failed stages/tasks, but
to fail fast if any fetch failure occurs?

Grega
--
*Grega Kešpret*
Analytics engineer

Celtra — Rich Media Mobile Advertising
celtra.com <http://www.celtra.com/> |
@celtramobile<http://www.twitter.com/celtramobile>


On Mon, Nov 25, 2013 at 9:58 AM, Grega Kešpret <[email protected]> wrote:

> Hi!
>
> We use Spark to process logs in batches and persist the end result in a
> db. Last week, we re-ran the job on the same data a couple of times, only to
> find that one run had more results than the rest. Digging through the logs,
> we found that a task had been lost and marked for resubmission.
>
> I marked the lines here:
>
> https://gist.github.com/gregakespret/7541805#file-spark-fetch-failure-L1432-L1509
>
> Because of that, one block of data was processed two times and the final
> result was not correct.
>
> My question is: how can we catch such occurrences in code, so that we
> can roll back or discard the data that will get recomputed?
>
> Thanks,
>
>
> Grega

