Hi!

We use Spark to process logs in batches and persist the end result in a
database. Last week, we re-ran the job on the same data a couple of times,
only to find that one run had more results than the rest. Digging through
the logs, we found that a task had been lost and marked for resubmission.
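To surface these events programmatically rather than by grepping the logs,
we are thinking of registering a SparkListener and inspecting the task-end
reason. A minimal sketch, assuming a Spark version that exposes
SparkListener.onTaskEnd and SparkContext.addSparkListener:

import org.apache.spark.{FetchFailed, Resubmitted, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Flags tasks the scheduler lost and resubmitted, and the fetch
// failures that typically trigger the resubmission.
class ResubmissionListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    taskEnd.reason match {
      case Resubmitted =>
        println(s"Task ${taskEnd.taskInfo.taskId} in stage ${taskEnd.stageId} was resubmitted")
      case fetch: FetchFailed =>
        println(s"Fetch failure in stage ${taskEnd.stageId}: $fetch")
      case _ => // successful or otherwise uninteresting task end
    }
}

// Registered once on the driver, right after creating the context:
// sc.addSparkListener(new ResubmissionListener)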

I've marked the relevant lines in the logs here:
https://gist.github.com/gregakespret/7541805#file-spark-fetch-failure-L1432-L1509

Because of that, one block of data was processed twice and the final
result was incorrect.

My question is: how can we catch such occurrences in the code, so that we
can do an effective rollback and discard the data that will be recomputed?
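
For the rollback/discard part, the best idea we have so far is to make the
write idempotent per partition, keyed by the partition id, so that a
recomputed task replaces what an earlier attempt wrote instead of appending
to it. A rough sketch of the idea, assuming a Spark version where
TaskContext.get is available and a JDBC sink (the URL, credentials, table
and schema below are made up):

import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object IdempotentSink {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("idempotent-sink").setMaster("local[*]"))
    // Stand-in for the real log-processing pipeline.
    val results = sc.parallelize(1 to 1000).map(_ * 2)

    results.foreachPartition { rows =>
      // Deterministic key: a resubmitted task runs over the same partition,
      // so it computes the same partition id as the lost attempt.
      val pk = TaskContext.get().partitionId()
      val conn = DriverManager.getConnection(
        "jdbc:postgresql://db-host/logs", "user", "pass")
      try {
        conn.setAutoCommit(false)
        // Replace, don't append: wipe whatever an earlier attempt of this
        // partition persisted, then insert this attempt's rows, all in one
        // transaction.
        val del = conn.prepareStatement(
          "DELETE FROM results WHERE partition_key = ?")
        del.setInt(1, pk)
        del.executeUpdate()
        val ins = conn.prepareStatement(
          "INSERT INTO results (partition_key, value) VALUES (?, ?)")
        rows.foreach { v =>
          ins.setInt(1, pk)
          ins.setInt(2, v)
          ins.addBatch()
        }
        ins.executeBatch()
        conn.commit()
      } finally {
        conn.close()
      }
    }

    sc.stop()
  }
}

With this, a lost-and-resubmitted task simply overwrites the earlier
attempt's rows, so duplicates from a recomputation are removed rather than
kept. Does that sound like a reasonable pattern, or is there a better hook?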

Thanks,


Grega
--
*Grega Kešpret*
Analytics engineer

Celtra — Rich Media Mobile Advertising
celtra.com <http://www.celtra.com/> |
@celtramobile<http://www.twitter.com/celtramobile>
