Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16142

> The current scan code does not make one request to the NameNode per log file in the directory. Your code does. That should be avoided.

Makes sense — the current implementation can certainly be optimized; my fault.

> If they come last, then you're first accounting for log sizes of apps that have already finished and might end up trying to delete logs from apps that are still running (!!!).

I see what you mean. Currently, the ordering of application logs depends on the end time of the most recent attempt:

```
private def compareAppInfo(
    i1: FsApplicationHistoryInfo,
    i2: FsApplicationHistoryInfo): Boolean = {
  val a1 = i1.attempts.head
  val a2 = i2.attempts.head
  if (a1.endTime != a2.endTime) a1.endTime >= a2.endTime else a1.startTime >= a2.startTime
}
```

So when cleanup is needed, completed job logs will be cleaned first, followed by older in-progress logs (if any exist). #16165 has added support for deleting in-progress job logs that are too old, so I think this case is covered.
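To illustrate the ordering the comparator above produces, here is a small, hypothetical sketch. The `Attempt` case class is a stand-in for `FsApplicationAttemptInfo`, and it assumes in-progress attempts report `endTime = -1` (the convention used by the history server for unfinished logs):

```scala
// Hypothetical stand-in for FsApplicationAttemptInfo; only the fields
// the comparator reads are modeled here.
case class Attempt(endTime: Long, startTime: Long)

object OrderingSketch {
  def main(args: Array[String]): Unit = {
    // Two completed attempts and one in-progress attempt (endTime = -1).
    val attempts = Seq(Attempt(300L, 100L), Attempt(-1L, 200L), Attempt(500L, 400L))

    // Same comparison logic as compareAppInfo, applied to single attempts:
    // sort by endTime descending, breaking ties by startTime descending.
    val sorted = attempts.sortWith { (a1, a2) =>
      if (a1.endTime != a2.endTime) a1.endTime >= a2.endTime
      else a1.startTime >= a2.startTime
    }

    // Most recently finished attempt comes first; the in-progress attempt
    // (endTime = -1) sorts to the very end of the list.
    println(sorted.map(_.endTime).mkString(", ")) // prints "500, 300, -1"
  }
}
```

Under this ordering, a cleaner that walks the list from the tail encounters the oldest completed logs before any recently finished ones, while in-progress attempts sit at the end of the sequence.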