Thanks a lot! That was a quick response. Somebody has definitely encountered the same problem before and added two hidden modes to the master URL:

(from SparkContext.scala, line 1431)

// Regular expression for local[N, maxRetries], used in tests with failing tasks
val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r

// Regular expression for simulating a Spark cluster of [N, cores, memory] locally
val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
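
So, if I'm reading the first regex right, something like the following should turn retries back on in local mode. This is only a sketch; I haven't verified the exact semantics of the second number, but it appears to be the per-task failure limit:

import org.apache.spark.{SparkConf, SparkContext}

// "local[2,4]" should match LOCAL_N_FAILURES_REGEX:
// 2 worker threads, and (if I read the code right) each task may fail
// up to 4 times before the job is aborted.
val conf = new SparkConf()
  .setMaster("local[2,4]")
  .setAppName("local-retry-test")
val sc = new SparkContext(conf)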

Unfortunately they never made it into the documentation, and the retry setting ends up scattered across two different places (the master URL and spark.task.maxFailures). I'm thinking of adding a new config parameter, spark.task.maxLocalFailures, to override the hard-coded 1 in local mode. What do you think?
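
For comparison, on a real cluster the retry count goes through the config rather than the master URL, roughly like this (the commented-out parameter at the end is only my proposal; it doesn't exist today):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("cluster-retry-test")
  // existing parameter, honored by cluster deployments (default is 4)
  .set("spark.task.maxFailures", "8")
  // hypothetical parameter from my proposal above; NOT a real Spark setting
  // .set("spark.task.maxLocalFailures", "8")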

Thanks again buddy.

Yours Peng

On Mon 09 Jun 2014 01:33:45 PM EDT, Aaron Davidson wrote:
Looks like your problem is local mode:
https://github.com/apache/spark/blob/640f9a0efefd42cff86aecd4878a3a57f5ae85fa/core/src/main/scala/org/apache/spark/SparkContext.scala#L1430

For some reason, someone decided not to do retries when running in
local mode. Not exactly sure why, feel free to submit a JIRA on this.
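
Paraphrasing from memory (not verbatim), the relevant part of createTaskScheduler looks roughly like this, with the plain "local" case hard-coding a single attempt:

// rough sketch of the match in SparkContext.createTaskScheduler
master match {
  case "local" =>
    // plain local mode: maxTaskFailures is fixed at 1, so no retries
    new TaskSchedulerImpl(sc, maxTaskFailures = 1, isLocal = true)

  case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
    // local[N, maxRetries]: the second number is passed through as maxTaskFailures
    new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)

  // ... other cases (local[N], local-cluster[...], spark://..., etc.)
}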


On Mon, Jun 9, 2014 at 8:59 AM, Peng Cheng <pc...@uow.edu.au> wrote:

    I speculate that Spark will only retry on exceptions that are registered
    with TaskSetScheduler, so a task that is guaranteed to fail will fail
    quickly without taking more resources. However, I haven't found any
    documentation or web page on this.



    --
    View this message in context:
    http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-fault-tolerance-tp7250p7255.html
    Sent from the Apache Spark User List mailing list archive at Nabble.com.

