The underlying issue is filesystem corruption on the workers. In the case where I use HDFS with a sufficient number of replicas, would Spark try to launch the task on another node where a block replica is present?
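For concreteness, here is a minimal sketch of what I have in mind, assuming the job reads a replicated HDFS file. The path and values are placeholders, and spark.task.maxFailures / spark.locality.wait are simply the knobs I believe govern task retries and locality fallback:

import org.apache.spark.{SparkConf, SparkContext}

object ReplicaRetrySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("replica-retry-sketch")
      // Number of times a single task may fail before the whole job is aborted.
      .set("spark.task.maxFailures", "8")
      // How long the scheduler waits for a data-local slot before falling
      // back to a less local executor (rack-local, then any).
      .set("spark.locality.wait", "3s")
    val sc = new SparkContext(conf)

    // With HDFS replication > 1, each block reports a preferred location for
    // every node holding a replica, so a retried task could be scheduled there.
    val lines = sc.textFile("hdfs:///user/example/input.txt") // placeholder path
    println(lines.count())

    sc.stop()
  }
}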
Thanks :-)

--
Henri Maxime Demoulin

2015-06-29 9:10 GMT-04:00 ayan guha <guha.a...@gmail.com>:

> No, Spark cannot do that, as it does not replicate partitions (so there is
> no retry on a different worker). It seems your cluster is not provisioned
> with the correct permissions. I would suggest automating node provisioning.
>
> On Mon, Jun 29, 2015 at 11:04 PM, maxdml <maxdemou...@gmail.com> wrote:
>
>> Hi there,
>>
>> I have some traces from my master and some workers where, for some reason,
>> the ./work directory of an application cannot be created on the workers.
>> There is also an issue with the creation of the master's temp directory.
>>
>> Master logs: http://pastebin.com/v3NCzm0u
>> Worker logs: http://pastebin.com/Ninkscnx
>>
>> It seems that some of the executors can create the directories, but as
>> some others repeatedly fail, the job ends up failing. Shouldn't Spark
>> manage to keep working with a smaller number of executors instead of
>> failing?
>
> --
> Best Regards,
> Ayan Guha
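On the directory-creation failure quoted above, a hedged sketch of one possible workaround, assuming a healthy writable mount exists (the path below is a placeholder): point the application's scratch space at a known-good location. Note that on standalone workers the per-application ./work directory is controlled by SPARK_WORKER_DIR in conf/spark-env.sh, and SPARK_LOCAL_DIRS set there overrides spark.local.dir, so the worker-side provisioning still has to be fixed.

import org.apache.spark.{SparkConf, SparkContext}

object WritableScratchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("writable-scratch-sketch")
      // Scratch space for shuffle/spill files; "/mnt/healthy-disk/spark-tmp"
      // is a placeholder for a mount known to be writable, not a path from
      // the logs above. SPARK_LOCAL_DIRS on a worker takes precedence, and
      // the executor ./work directory itself comes from SPARK_WORKER_DIR.
      .set("spark.local.dir", "/mnt/healthy-disk/spark-tmp")

    val sc = new SparkContext(conf)
    // ... job as usual ...
    sc.stop()
  }
}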