tgravescs commented on issue #25962: [SPARK-29285][Shuffle] Temporary shuffle files should be able to handle disk failures
URL: https://github.com/apache/spark/pull/25962#issuecomment-547664664

Yes, it depends on how often your executors are created and destroyed. If you are using dynamic allocation and there is a lot of long tail, executors could be cycling fairly often and the YARN disk checker should help; if not, it won't. For lots of jobs it won't help by itself.

I mostly wonder how much this helps, because it might help with the temp shuffle file, but then the task might fail later when creating the real shuffle file. Is this OK? Maybe, but it potentially changes the behavior from failing fast to failing later. If there is a long time between those two points, you are potentially taking longer to fail. There are a lot of "mights" in that sentence, and I don't have any concrete idea how much it will help or hurt. Has this actually been run on real jobs, and have you seen a benefit?
