Hi all,

Spark currently has blacklisting enabled on Mesos, no matter what: [SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend
Blacklisting also prevents new drivers from running on nodes where previous drivers had failed tasks. We've tried restarting the Spark dispatcher before submitting new tasks. Even creating new machines (with the same hostname) does not help.

Looking at TaskSetBlacklist <https://github.com/apache/spark/blob/e18d6f5326e0d9ea03d31de5ce04cb84d3b8ab37/core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala#L66>, I don't understand how a fresh Spark job submitted through a fresh Spark dispatcher starts reporting that all the nodes are blacklisted right away. How does Spark know about previous task failures?

This issue is severely disrupting our work. How can we disable blacklisting on Spark 2.3.0? Creative ideas are welcome :)

Best,
Han
