I searched the code base and looked at: https://spark.apache.org/docs/latest/running-on-yarn.html
I didn't find mapred.max.map.failures.percent or its counterpart. FYI

On Fri, Nov 13, 2015 at 9:05 AM, Nicolae Marasoiu <nicolae.maras...@adswizz.com> wrote:

> Hi,
>
> I know a task can fail 2 times and only the 3rd attempt breaks the entire job.
> I am fine with that number of attempts. What I would like is for Spark to
> continue with the other tasks after a task has failed 3 times. The job can be
> marked "failed", but I want all tasks to run.
>
> Here is my use case. I read a Hadoop input set, and some gzip files are
> incomplete. I would like to just skip them, and the only way I see is to tell
> Spark to tolerate some permanently failing tasks, if that is possible. With
> traditional Hadoop map-reduce this was possible using
> mapred.max.map.failures.percent.
>
> Do map-reduce parameters like mapred.max.map.failures.percent apply to
> Spark-on-YARN jobs? I edited $HADOOP_CONF_DIR/mapred-site.xml and added
> mapred.max.map.failures.percent=30, but it does not seem to apply; the job
> still failed after 3 task attempt failures.
>
> Should Spark transmit this parameter, or do the mapred.* settings not apply?
> Are other Hadoop parameters (e.g. the ones involved in input reading, rather
> than in the "processing" or "application" side like max.map.failures) taken
> into account and forwarded? I saw that Spark should scan HADOOP_CONF_DIR and
> forward those settings, but I guess this does not apply to every parameter,
> since Spark has its own distribution and DAG stage processing logic, which
> just happens to have a YARN implementation.
>
> Do you know a way to do this in Spark: to tolerate a predefined number of
> task failures, but let the job continue? That way I could see all the faulty
> input files in one job run, delete them all, and continue with the rest.
>
> Just to mention, doing a manual gzip -t on top of hadoop cat is infeasible,
> and map-reduce is much faster at scanning the 15K files worth 70GB (it does
> about 25 MB/s per node), while the old-style hadoop cat does far less.
>
> Thanks,
>
> Nicu
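The closest knob I know of on the Spark side is spark.task.maxFailures, which only raises the per-task retry count; as far as I can tell there is no percentage-based counterpart, and passing mapred.* values through (e.g. via spark.hadoop.*) only populates the Hadoop Configuration used by the input format, not Spark's own scheduler. For your specific goal of surfacing all the corrupt gzip files in one run, a rough, untested sketch follows: it lists the input files on the driver, then tries to decompress each one in parallel and collects the paths that fail. The object name FindCorruptGzip, the single flat input directory, the partition count, and the buffer size are all placeholders of mine, not anything from Spark itself.

import java.util.zip.GZIPInputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object FindCorruptGzip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("find-corrupt-gzip"))
    val inputDir = args(0)  // e.g. an hdfs:// directory; assumed flat (no recursion)

    // List the .gz files on the driver, then distribute just the path strings.
    val fs = FileSystem.get(new Configuration())
    val paths = fs.listStatus(new Path(inputDir))
      .map(_.getPath.toString)
      .filter(_.endsWith(".gz"))

    val corrupt = sc.parallelize(paths.toSeq, math.max(1, math.min(paths.length, 500)))
      .flatMap { p =>
        val conf = new Configuration()
        val hdfs = FileSystem.get(new java.net.URI(p), conf)
        val raw = hdfs.open(new Path(p))
        try {
          val gz = new GZIPInputStream(raw)
          val buf = new Array[Byte](64 * 1024)
          while (gz.read(buf) != -1) {}          // stream to the end; truncation throws here
          None                                   // file decompressed cleanly
        } catch {
          case _: java.io.IOException => Some(p) // truncated or corrupt gzip
        } finally {
          raw.close()
        }
      }
      .collect()

    corrupt.foreach(println)
    sc.stop()
  }
}

Since the faulty tasks never fail in this sketch (the exception is caught per file), the job completes and prints every bad path, which you could then delete before running the real job.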