HeartSaVioR edited a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554808184

Hmm... I'm starting to see a need to compact the old event log files in the driver instead of the SHS, even though I know the suggestion was to do it in the SHS because the driver already does too many things.

The main reason is that the configuration scopes of "rolling event log" and "compaction" will differ: the former applies per application, the latter per SHS. Does that matter? I think so.

1) We're going to let end users manage the overall size of event log files via "max size of event log file" * "number of files to retain", where the former is application-wise and the latter is SHS-wise. The combination could therefore end up at an unintended value. That said, this isn't only a downside; it can also be seen as flexibility: the number of files to retain can be "modified" in the SHS if we want to compact more aggressively to save space, whereas it cannot be changed in the driver.

2) Given the approach, compaction only benefits streaming queries; for batch queries it just adds huge overhead with no gain. (It might also help if the application is jobserver-like and only executes "interactive" queries.) In other words, compaction should be configurable per application so that end users can enable it only for streaming queries. An SHS-wise configuration doesn't allow that.

If we really want to avoid having this in the driver, one alternative is to let the driver pass the app's configuration to the SHS. We may only need this for rolling event logs, so we don't need to worry about compatibility, which makes things easier.

One more point, which may or may not be a big deal: managing the overall size of event log files only works when the driver and the SHS work together. If we want to guarantee it strictly, compaction is better placed in the driver, or at least the SHS should check for and perform compaction more aggressively.
(For now I've placed compaction only at the point where the app UI needs to be reloaded, but that may be too loose.)

@vanzin @squito I'd like to hear your thoughts on this, as you have been leading the event logging efforts.
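To make points 1) and 2) concrete, here is a minimal Python sketch under stated assumptions. The classes `AppEventLogSettings` and `HistoryServerSettings` and the functions below are hypothetical illustrations, not Spark APIs or configuration keys. The sketch computes the approximate retained-size cap as "max size of event log file" times "number of files to retain", and shows an SHS-side compaction decision that honors a per-application opt-in passed along from the driver, i.e. the alternative mentioned in the comment.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class AppEventLogSettings:
    # Hypothetical application-wise settings, chosen per app at submission time
    # and (under the proposed alternative) passed from the driver to the SHS.
    max_file_bytes: int          # "max size of event log file"
    compaction_enabled: bool     # per-app opt-in, e.g. only for streaming queries


@dataclass(frozen=True)
class HistoryServerSettings:
    # Hypothetical SHS-wise setting, shared by every application the SHS reads.
    max_files_to_retain: int     # "number of files to retain"


def retained_size_bound(app: AppEventLogSettings, shs: HistoryServerSettings) -> int:
    """Approximate cap on retained event log bytes: file size x files retained.

    The two factors come from different scopes, so the same SHS setting
    yields a different cap for each application.
    """
    return app.max_file_bytes * shs.max_files_to_retain


def should_compact(app: AppEventLogSettings, num_rolled_files: int,
                   shs: HistoryServerSettings) -> bool:
    """Compact only apps that opted in and exceed the SHS retention count."""
    return app.compaction_enabled and num_rolled_files > shs.max_files_to_retain


if __name__ == "__main__":
    shs = HistoryServerSettings(max_files_to_retain=2)
    streaming_small = AppEventLogSettings(max_file_bytes=128 * MIB, compaction_enabled=True)
    streaming_large = AppEventLogSettings(max_file_bytes=512 * MIB, compaction_enabled=True)
    batch = AppEventLogSettings(max_file_bytes=1024 * MIB, compaction_enabled=False)

    print(retained_size_bound(streaming_small, shs) // MIB)  # 256
    print(retained_size_bound(streaming_large, shs) // MIB)  # 1024
    print(should_compact(streaming_small, 5, shs))           # True
    print(should_compact(batch, 5, shs))                     # False
```

With one SHS-wide retention count of 2, the two streaming apps end up with caps of 256 MiB and 1024 MiB, which illustrates how the effective limit depends on both scopes; the batch app never compacts because it did not opt in.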
