Github user attilapiros commented on the issue:

    https://github.com/apache/spark/pull/20203
  
    Node blacklisting is covered by the following unit test suites:
    - HistoryServerSuite
    - TaskSetBlacklistSuite
    - AppStatusListenerSuite
    
    And manually on a 2-node cluster: 
https://issues.apache.org/jira/secure/attachment/12906833/node_blacklisting_for_stage.png
    
    Here you can see that apiros3.gce.test.com was blacklisted at the node level for
    the stage because of failures on executors 4 and 5. As expected, executor 3 is
    also blacklisted even though it has no failures itself, since it shares the node
    with executors 4 and 5.
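    
    To make that escalation explicit, here is a minimal, self-contained sketch of the
    stage-level logic (hypothetical names, not Spark's internal `TaskSetBlacklist`
    implementation), assuming the thresholds used in the manual test:
    
    ``` scala
    // Simplified model: an executor is blacklisted for the stage after
    // maxFailedTasksPerExecutor failed tasks, and a node is blacklisted once
    // maxFailedExecutorsPerNode distinct executors on it are blacklisted.
    object StageBlacklistSketch {
      val maxFailedTasksPerExecutor = 1
      val maxFailedExecutorsPerNode = 1
    
      // executor -> number of failed tasks in this stage
      private val failedTasksPerExecutor = scala.collection.mutable.Map.empty[String, Int]
      // executor -> host it runs on
      private val executorToHost = scala.collection.mutable.Map.empty[String, String]
    
      def taskFailed(executorId: String, host: String): Unit = {
        executorToHost(executorId) = host
        failedTasksPerExecutor(executorId) = failedTasksPerExecutor.getOrElse(executorId, 0) + 1
      }
    
      def isExecutorBlacklisted(executorId: String): Boolean =
        failedTasksPerExecutor.getOrElse(executorId, 0) >= maxFailedTasksPerExecutor
    
      def isNodeBlacklisted(host: String): Boolean =
        executorToHost.count { case (exec, h) =>
          h == host && isExecutorBlacklisted(exec)
        } >= maxFailedExecutorsPerNode
    }
    
    // One failure each on executors 4 and 5 (same host) blacklists the host,
    // which also shuts out executor 3 even though it never failed a task.
    StageBlacklistSketch.taskFailed("4", "apiros3.gce.test.com")
    StageBlacklistSketch.taskFailed("5", "apiros3.gce.test.com")
    assert(StageBlacklistSketch.isNodeBlacklisted("apiros3.gce.test.com"))
    ```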
    
    Spark was started as:
    
    ``` bash
    ./bin/spark-shell --master yarn --deploy-mode client \
      --executor-memory=2G --num-executors=8 \
      --conf "spark.blacklist.enabled=true" \
      --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
      --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
      --conf "spark.blacklist.application.maxFailedTasksPerExecutor=10" \
      --conf "spark.eventLog.enabled=true"
    ```
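    
    The same settings can also be applied programmatically through `SparkConf`
    (a sketch for reference; the keys mirror the `--conf` flags above):
    
    ``` scala
    import org.apache.spark.SparkConf
    
    val conf = new SparkConf()
      .set("spark.blacklist.enabled", "true")
      .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "1")
      .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "1")
      .set("spark.blacklist.application.maxFailedTasksPerExecutor", "10")
      .set("spark.eventLog.enabled", "true")
    ```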
    
    And the job was:
    
    ``` scala
    import org.apache.spark.SparkEnv
    
    sc.parallelize(1 to 10000, 10).map { x =>
      // Fail every task that runs on executors 4..8 so their node gets blacklisted.
      if (SparkEnv.get.executorId.toInt >= 4) throw new RuntimeException("Bad executor")
      else (x % 3, x)
    }.reduceByKey((a, b) => a + b).collect()
    ```
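    
    With `spark.blacklist.stage.maxFailedTasksPerExecutor=1` a single task failure is
    enough to blacklist an executor for the stage, and with
    `spark.blacklist.stage.maxFailedExecutorsPerNode=1` one blacklisted executor is
    enough to blacklist its node, so the failures on executors 4 and 5 escalate to the
    node level and take executor 3 out of the stage as well, as the screenshot shows.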
    