vanzin commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-557333525

To me, "rolling log" is an application concern. It helps, e.g., with matching a part of the event log to when errors occurred, instead of having to dig through gigabytes of data in a large file to find that information.

"Compaction" to me is an admin concern, like cleaning up old event log files. It's something that apps don't care about, but admins do. And thus it's something that belongs in the SHS, just like the log cleaner.

In fact, compaction isn't even necessarily related to the concept of rolling log. You can compact an existing humongous event log to just contain the data the SHS would show.

The differences between the examples you mention (streaming query vs. long batch job) can be worked around in the code. E.g., for the long batch case, you can decide not to compact because there are not enough finished jobs after parsing an event log. But let's say you parse 2 or 3 of them and then you start seeing jobs going away; at that point you can do some compaction. You could keep some state to help with figuring that out.

Anyway, long way of saying that no, compaction does not belong in the driver.
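
To make the "keep some state" idea concrete, here is a rough sketch of what that bookkeeping could look like on the SHS side. Everything in it is hypothetical: `CompactionState`, `min_finished_jobs`, and the `(kind, job_id)` event tuples are illustrative stand-ins, not Spark classes, configs, or listener events, and the threshold rule is just one possible heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionState:
    """Hypothetical per-application state kept between event log scans."""

    # Assumed threshold for illustration only; not a real Spark setting.
    min_finished_jobs: int = 2
    live_jobs: set = field(default_factory=set)
    finished_jobs: set = field(default_factory=set)
    files_parsed: int = 0

    def parse_file(self, events):
        """Fold one rolled event log file into the state.

        `events` is a list of (kind, job_id) tuples, a simplified
        stand-in for the real events in an event log file.
        """
        for kind, job_id in events:
            if kind == "job_start":
                self.live_jobs.add(job_id)
            elif kind == "job_end":
                # The job is "going away": it no longer needs live data.
                self.live_jobs.discard(job_id)
                self.finished_jobs.add(job_id)
        self.files_parsed += 1

    def should_compact(self):
        """Compact only once enough jobs have finished to be worth it."""
        return len(self.finished_jobs) >= self.min_finished_jobs
```

With this shape, a long batch job that has only started (no finished jobs yet) never triggers compaction, while a streaming query that keeps finishing jobs crosses the threshold after a few files are parsed.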
