zhouyejoe commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-672428047
@yanxiaole I don't think there is a race condition between the checkForLogs() and cleanLogs() threads. Both tasks are launched from the same pool, which is defined as a single-thread pool, so they cannot run simultaneously. With the latest trunk, however, I do see a race condition between the replay tasks and checkForLogs(). In some earlier versions of Spark, checkForLogs() waited for all replay tasks to finish before it exited. Now replay tasks can run concurrently with the checkForLogs() thread: while checkForLogs() iterates over listing.ldb to scan for stale entries, a replay task can trigger listing.delete if the log file is past the maximum retention time.
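
To illustrate the scheduling argument, here is a minimal sketch in Python (not Spark code; Python is used only for brevity). The names `scan`, `other`, `max_overlap`, and `OverlapTracker` are hypothetical stand-ins for checkForLogs(), cleanLogs() or a replay task, and the bookkeeping around them. It assumes, as the comment implies, that replay tasks run on a different executor than checkForLogs(). It shows that two tasks submitted to one single-thread pool never overlap, while a task on a separate pool can run in the middle of the scan.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class OverlapTracker:
    """Records the highest number of tasks inside the tracked section at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._active -= 1
        return False


def max_overlap(scan_pool, other_pool, wait_seconds):
    """Submit a scan task and a second task; return how many ran at the same time."""
    tracker = OverlapTracker()
    scan_started = threading.Event()
    other_done = threading.Event()

    def scan():  # hypothetical stand-in for checkForLogs()
        with tracker:
            scan_started.set()
            # Stay inside the scan so that another thread can overlap, if one exists.
            other_done.wait(timeout=wait_seconds)

    def other():  # hypothetical stand-in for cleanLogs() or a replay task
        scan_started.wait(timeout=wait_seconds)
        with tracker:
            pass  # this is where a listing delete would happen
        other_done.set()

    futures = [scan_pool.submit(scan), other_pool.submit(other)]
    for future in futures:
        future.result()
    return tracker.max_active


def demo():
    # Case 1: both tasks share one single-thread pool, so they are serialized.
    with ThreadPoolExecutor(max_workers=1) as shared:
        same_pool = max_overlap(shared, shared, wait_seconds=0.2)
    # Case 2: the second task has its own pool, so it can run during the scan.
    with ThreadPoolExecutor(max_workers=1) as scan_pool, \
            ThreadPoolExecutor(max_workers=1) as replay_pool:
        separate_pools = max_overlap(scan_pool, replay_pool, wait_seconds=5.0)
    return same_pool, separate_pools


if __name__ == "__main__":
    print(demo())
```

In the first case the second task only starts after the scan has returned, which is why checkForLogs() and cleanLogs() cannot interleave. In the second case the other task enters while the scan is still in progress, which is the window in which a delete can land during iteration.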
