Github user zhaoyunjiong commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14887#discussion_r77217138
  
    --- Diff: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ---
    @@ -25,6 +25,8 @@
     import com.google.common.collect.Lists;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.util.DiskChecker;
    --- End diff --
    
    Yes, my log is from Spark 1.6, and in that branch YarnShuffleService puts registeredExecutors.ldb under localDirs[0].
    
    By default, both yarn.nodemanager.recovery.enabled and spark.yarn.shuffle.stopOnFailure are false. Many users keep these defaults, so I believe they will continue to hit this issue whenever localDirs[0] is broken:
    ```java
    // No recovery path configured: fall back to the first NodeManager local dir
    if (_recoveryPath == null) {
      _recoveryPath = new Path(localDirs[0]);
    }
    ```
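The diff adds an import of `org.apache.hadoop.util.DiskChecker`. As a minimal sketch of one way to avoid depending on `localDirs[0]`, assuming we simply want the first directory that is actually usable, the code below scans the local dirs in order. It uses plain `java.io.File` checks as a stand-in for `DiskChecker` so it runs without Hadoop on the classpath; `RecoveryDirSketch`, `isUsable`, and `firstUsableDir` are hypothetical names, not Spark APIs.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: choose the first usable local dir for the shuffle
 * service's recovery DB instead of always taking localDirs[0].
 * Plain java.io checks stand in for Hadoop's DiskChecker here.
 */
public class RecoveryDirSketch {

    /** True if the path is a directory we can read, write and traverse. */
    static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /** Returns the first usable dir, or empty if every dir looks broken. */
    static Optional<String> firstUsableDir(List<String> localDirs) {
        for (String d : localDirs) {
            if (isUsable(new File(d))) {
                return Optional.of(d);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        // The first entry does not exist, so the sketch skips it and picks tmp.
        System.out.println(firstUsableDir(Arrays.asList("/nonexistent/nm-local-0", tmp)));
    }
}
```

Returning `Optional.empty()` when no directory passes leaves the caller to decide whether to fail initialization or run without recovery, which is exactly where the stopOnFailure default becomes relevant.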
    
    At the very least, changing the default value of spark.yarn.shuffle.stopOnFailure to true would help users.
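To illustrate why the default matters, here is a hypothetical sketch (not Spark's actual code) of an init step guarded by a stopOnFailure-style flag. With the flag off, a broken recovery disk is only logged and the service keeps running in a degraded state; with it on, the failure propagates, which, as the setting's name implies, would stop the NodeManager and make the problem visible immediately. `StopOnFailureSketch`, `openRecoveryDb`, and `init` are illustrative names.

```java
/**
 * Hypothetical sketch of how a stopOnFailure-style flag changes the outcome
 * when shuffle-service initialization fails (e.g. the recovery DB cannot be
 * opened because its disk is broken). Not Spark's actual implementation.
 */
public class StopOnFailureSketch {

    /** Stand-in for opening the recovery DB; throws when the disk is bad. */
    static void openRecoveryDb(boolean diskHealthy) {
        if (!diskHealthy) {
            throw new IllegalStateException("cannot open registeredExecutors.ldb");
        }
    }

    /**
     * Returns "running" when init succeeds, "degraded" when the failure is
     * swallowed, and rethrows when stopOnFailure is set.
     */
    static String init(boolean diskHealthy, boolean stopOnFailure) {
        try {
            openRecoveryDb(diskHealthy);
            return "running";
        } catch (IllegalStateException e) {
            if (stopOnFailure) {
                // Fail fast: the error surfaces to whoever started the service.
                throw e;
            }
            // Flag off (the default discussed here): log and keep going with a broken service.
            System.err.println("Shuffle service init failed: " + e.getMessage());
            return "degraded";
        }
    }

    public static void main(String[] args) {
        System.out.println(init(false, false));
    }
}
```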
