Hi again, Ok, so I do not know of any way to fix the problem other than deleting the "bad" machine from the config and restarting .. and you will need admin privileges on the cluster for that :(
However, before we give up on speculative execution: I suspect the task keeps being run on the same "faulty" machine because that is where the data resides. You could try persisting your RDD with MEMORY_ONLY_2 or MEMORY_AND_DISK_2, as that forces a replica of each partition onto another node. With the data on two nodes, the scheduler may then choose to run the speculative copy of the task on the second node (I'm not sure about this, as I'm just not familiar enough with Spark's scheduler priorities). I'm not very hopeful, but it may be worth a try, if you have the disk/RAM space to afford duplicating all the data, that is. If not, I'm afraid I'm out of ideas ;) Regards and good luck, Gylfi.
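If it helps, here's a minimal sketch of what I mean, assuming speculation is already turned on as discussed earlier in the thread. The app name and input path are just placeholders for your job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ReplicatedPersistExample {
  def main(args: Array[String]): Unit = {
    // Speculation is assumed to be enabled, as in the original problem setup.
    val conf = new SparkConf()
      .setAppName("ReplicatedPersistExample")   // placeholder app name
      .set("spark.speculation", "true")
    val sc = new SparkContext(conf)

    // Placeholder input path.
    val rdd = sc.textFile("hdfs:///path/to/input")

    // Persist with 2x replication so every partition also lives on a second node.
    // MEMORY_ONLY_2 keeps both replicas in memory; MEMORY_AND_DISK_2 spills to disk
    // when memory is short.
    val replicated = rdd.persist(StorageLevel.MEMORY_AND_DISK_2)

    // Any action that runs the slow stage; a speculative copy of a straggling task
    // can then be scheduled on the node that holds the replica.
    println(replicated.count())

    sc.stop()
  }
}
```

Whether the scheduler actually picks the second node is up to its locality preferences, so treat this as an experiment rather than a guaranteed fix.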