Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16142

@vanzin

> For the feature, it feels like it's trying to make the SHS more like a "log management system" than a history server.

Sorry, I do not follow. I am only adding a new clean-up mode, not the cleaner itself. One mode is based on age, and the other is based on space usage. That's all.

> But I'll entertain the thought, even though you can probably get pretty close to this by using time-based deletion with a shorter max age, coupled with log compression (both features that already exist).

I think you may have misunderstood what I mean. You can set `max age` to a very small value and use compression or whatever else. Yes, you will end up using little space. But that approach deletes many recent job event logs, and then we cannot review job history, because we are using the wrong tool to restrict space usage.

> For the implementation, you cannot delete things just based on size. You need to account for time too; you have to delete older logs first, otherwise you risk deleting the logs for just finished applications instead of a large log that's been sitting there for months.

I believe the current `space` mode deletes logs based on space usage, oldest file first. If that is not what you see, there may be something wrong in my implementation. I will check it.

> Your change will also bombard the NameNode with requests on every scan, to get the size of each log.

`cleanLogs` is called every `CLEAN_INTERVAL_S` (1 day by default):

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L218

If `CLEAN_INTERVAL_S` is set to a very small value, the current implementation already hurts the NameNode. So I do not think getting the size of each log will add much load on the NameNode.
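To make "delete based on space usage, oldest file first" concrete, here is a minimal sketch in Python. It is not the Scala code from this PR: `LogInfo`, `select_logs_to_delete`, and `max_total_bytes` are hypothetical names used only for illustration, and the sketch assumes the size and modification time of each log are already known from a directory listing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogInfo:
    """Hypothetical record for one event log (not a Spark class)."""
    path: str
    size_bytes: int
    last_modified: int  # e.g. epoch seconds from the file listing


def select_logs_to_delete(logs: List[LogInfo], max_total_bytes: int) -> List[LogInfo]:
    """Pick logs to delete, oldest first, until total usage fits the budget."""
    total = sum(log.size_bytes for log in logs)
    to_delete = []
    # Walk from the oldest log to the newest so recent history survives longest.
    for log in sorted(logs, key=lambda l: l.last_modified):
        if total <= max_total_bytes:
            break
        to_delete.append(log)
        total -= log.size_bytes
    return to_delete


logs = [
    LogInfo("app-old-large", 700, last_modified=100),
    LogInfo("app-mid", 200, last_modified=200),
    LogInfo("app-new", 300, last_modified=300),
]
# Total usage is 1200 bytes; with a 600-byte budget only the oldest log goes.
print([log.path for log in select_logs_to_delete(logs, max_total_bytes=600)])
```

With this ordering, a large log that has been sitting there for months is removed before the log of an application that just finished, which is the behavior the `space` mode is meant to have.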