vanzin commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-557333525
 
 
   To me, "rolling log" is an application concern. It helps, e.g., with matching
a part of the event log to the time when errors occurred, instead of having to
dig through gigabytes of data in one large file to find that information.
   
   "Compaction", to me, is an admin concern, like cleaning up old event log
files. It's something that apps don't care about, but admins do, and thus it's
something that belongs in the SHS (Spark History Server), just like the log
cleaner.
   
   In fact, compaction isn't even necessarily tied to the concept of rolling
logs. You could compact an existing humongous event log so that it contains only
the data the SHS would show.
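A minimal sketch of compacting a single existing log, written in Python and assuming a simplified event log: one JSON object per line, where every job-scoped event carries a "Job ID" field. (Real Spark logs tie stage and task events to jobs only indirectly, through stage IDs; that mapping is omitted here.) The function `compact_event_log` and its `retained_jobs` parameter are hypothetical and not Spark APIs; `retained_jobs` stands in for a UI retention limit such as `spark.ui.retainedJobs`, so the output keeps roughly what the SHS would still show.

```python
import json
from typing import Iterable, List


def compact_event_log(lines: Iterable[str], retained_jobs: int) -> List[str]:
    """Hypothetical compactor: drop events of all but the newest finished jobs.

    Assumes every job-scoped event has a "Job ID" key and that a job is
    finished once its "SparkListenerJobEnd" event has been seen.
    """
    parsed = [(line, json.loads(line)) for line in lines]
    # Job IDs in the order the jobs finished.
    finished = [e["Job ID"] for _, e in parsed if e.get("Event") == "SparkListenerJobEnd"]
    # Keep only the last `retained_jobs` finished jobs; older ones are dropped.
    cutoff = max(len(finished) - retained_jobs, 0)
    dropped = set(finished[:cutoff])
    # Events without a "Job ID" (app-level events) and events of live or
    # retained jobs survive unchanged, in their original order.
    return [line for line, e in parsed if e.get("Job ID") not in dropped]
```

Because the filter only drops lines and never rewrites them, the compacted output can still be replayed by the same parser that reads the original log.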
   
   The differences between the examples you mention (streaming query vs. long
batch job) can be worked around in the code. For example, in the long batch
case, you can decide not to compact because there are not enough finished jobs
after parsing an event log. But say you parse 2 or 3 of those files and then
start seeing jobs go away; at that point you can do some compaction. You could
keep some state to help figure that out.
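One way that state could look, sketched in Python under the same simplified assumption that job events carry a "Job ID" field. `CompactionState`, `observe`, `should_compact` and the `min_finished_jobs` threshold are all hypothetical names, not Spark code: the idea is only that the SHS carries this object across the rolled files of one application and defers compaction until enough jobs have finished.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class CompactionState:
    """Hypothetical per-application state kept by the history server
    between scans of that application's rolled event log files."""

    min_finished_jobs: int  # below this, compaction isn't worth it
    live_jobs: Set[int] = field(default_factory=set)
    finished_jobs: List[int] = field(default_factory=list)

    def observe(self, events: Iterable[dict]) -> None:
        """Update job bookkeeping from one parsed event log file."""
        for e in events:
            kind = e.get("Event")
            if kind == "SparkListenerJobStart":
                self.live_jobs.add(e["Job ID"])
            elif kind == "SparkListenerJobEnd":
                self.live_jobs.discard(e["Job ID"])
                self.finished_jobs.append(e["Job ID"])

    def should_compact(self) -> bool:
        # A long batch job that is still running yields no finished jobs,
        # so nothing is compacted; a streaming query accumulates finished
        # jobs quickly and crosses the threshold after a few files.
        return len(self.finished_jobs) >= self.min_finished_jobs
```

With this shape the batch and streaming cases need no separate code paths; they differ only in how fast `finished_jobs` grows.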
   
   Anyway, long way of saying that no, compaction does not belong in the driver.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.