I searched the code base and looked at: https://spark.apache.org/docs/latest/running-on-yarn.html
I didn't find mapred.max.map.failures.percent or its counterpart. FYI

On Fri, Nov 13, 2015 at 9:05 AM, Nicolae Marasoiu <nicolae.maras...@adswizz.com> wrote:

> Hi,
>
> I know a task can fail 2 times and only the 3rd attempt breaks the entire job.
> I am fine with that number of attempts. What I would like is for Spark to
> continue with the other tasks after a task has failed 3 times. The job can be
> marked "failed", but I want all tasks to run.
>
> Here is my use case. I read a Hadoop input set, and some gzip files are
> incomplete. I would like to just skip them, and the only way I see is to tell
> Spark to tolerate some permanently failing tasks, if that is possible. With
> traditional Hadoop map-reduce this was possible using
> mapred.max.map.failures.percent.
>
> Do map-reduce parameters like mapred.max.map.failures.percent apply to
> Spark-on-YARN jobs? I edited $HADOOP_CONF_DIR/mapred-site.xml and added
> mapred.max.map.failures.percent=30, but it does not seem to apply; the job
> still failed after 3 task attempt failures.
>
> Should Spark transmit this parameter, or do the mapred.* settings not apply?
> Are other Hadoop parameters (e.g. the ones involved in input reading, rather
> than in the "processing" or "application" side like max.map.failures) taken
> into account and forwarded? I saw that Spark should scan HADOOP_CONF_DIR and
> forward those settings, but I guess this does not apply to every parameter,
> since Spark has its own distribution and DAG stage processing logic, which
> just happens to have a YARN implementation.
>
> Do you know a way to do this in Spark: to tolerate a predefined number of
> task failures, but let the job continue? That way I could see all the faulty
> input files in one job run, delete them all, and continue with the rest.
>
> Just to mention, doing a manual gzip -t on top of hadoop cat is infeasible,
> and map-reduce is much faster at scanning the 15K files worth 70GB (it does
> about 25 MB/s per node), while the old-style hadoop cat does far less.
>
> Thanks,
>
> Nicu
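The closest knob I know of on the Spark side is spark.task.maxFailures, which only raises the per-task retry count; as far as I can tell there is no percentage-based counterpart, and passing mapred.* values through (e.g. via spark.hadoop.*) only populates the Hadoop Configuration used by the input format, not Spark's own scheduler. For your specific goal of surfacing all the corrupt gzip files in one run, a rough, untested sketch follows: it lists the input files on the driver, then tries to decompress each one in parallel and collects the paths that fail. The object name FindCorruptGzip, the single flat input directory, the partition count, and the buffer size are all placeholders of mine, not anything from Spark itself.

import java.util.zip.GZIPInputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object FindCorruptGzip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("find-corrupt-gzip"))
    val inputDir = args(0)  // e.g. an hdfs:// directory; assumed flat (no recursion)

    // List the .gz files on the driver, then distribute just the path strings.
    val fs = FileSystem.get(new Configuration())
    val paths = fs.listStatus(new Path(inputDir))
      .map(_.getPath.toString)
      .filter(_.endsWith(".gz"))

    val corrupt = sc.parallelize(paths.toSeq, math.max(1, math.min(paths.length, 500)))
      .flatMap { p =>
        val conf = new Configuration()
        val hdfs = FileSystem.get(new java.net.URI(p), conf)
        val raw = hdfs.open(new Path(p))
        try {
          val gz = new GZIPInputStream(raw)
          val buf = new Array[Byte](64 * 1024)
          while (gz.read(buf) != -1) {}          // stream to the end; truncation throws here
          None                                   // file decompressed cleanly
        } catch {
          case _: java.io.IOException => Some(p) // truncated or corrupt gzip
        } finally {
          raw.close()
        }
      }
      .collect()

    corrupt.foreach(println)
    sc.stop()
  }
}

Since the faulty tasks never fail in this sketch (the exception is caught per file), the job completes and prints every bad path, which you could then delete before running the real job.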