HeartSaVioR edited a comment on issue #26416: [SPARK-29779][CORE] Compact old 
event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-554808184
 
 
   Hmm... I'm starting to see a need to compact the old event log files in the 
driver instead of in the SHS, though I know the suggestion was to do it in the 
SHS because the driver already does too many things. 
   
   The main reason is that the configuration scopes of "rolling event log" and 
"compaction" are going to differ: the former applies per application, the 
latter applies per SHS. Does it matter? I guess yes.
   
   1) We're going to let end users manage the overall size of event log files 
via ("max size of event log file" * "number of files to retain"), where the 
former is set per application and the latter per SHS. The two could end up in 
an unintended combination. Btw, this isn't purely a downside; it can also be 
treated as flexibility: max files to retain can be "modified" in the SHS if we 
want to compact more to free up space, whereas it cannot be modified in the 
driver.
   
   2) Due to the approach, compaction only helps streaming queries; for batch 
queries it just brings huge overhead with no benefit. (It might also help if 
the application is jobserver-like and only executes "interactive" queries.) 
In other words, compaction should be configurable per application so that end 
users can enable it only for streaming queries. An SHS-wide configuration 
doesn't allow that. We might be able to advise "don't use rolling event log 
for batch queries", but that would be odd advice, because rolling event log 
for batch queries would work perfectly "without compaction".
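
   To make the scope mismatch in 1) and 2) concrete, here is a minimal sketch 
of the size bound, assuming it is simply "max size of event log file" * 
"number of files to retain". The class and field names are hypothetical and 
are not Spark APIs; the numbers are made up for illustration.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class AppEventLogConf:
    """Per-application setting, fixed when the driver starts (hypothetical)."""
    max_file_size_bytes: int


@dataclass(frozen=True)
class HistoryServerConf:
    """SHS-wide setting, shared by every application it serves (hypothetical)."""
    max_files_to_retain: int


def retained_size_bound(app: AppEventLogConf, shs: HistoryServerConf) -> int:
    """Approximate upper bound on the uncompacted event log files kept for one app."""
    return app.max_file_size_bytes * shs.max_files_to_retain


# One SHS-wide value is combined with whatever each application chose.
shs = HistoryServerConf(max_files_to_retain=10)
streaming_app = AppEventLogConf(max_file_size_bytes=128 * MIB)
batch_app = AppEventLogConf(max_file_size_bytes=1024 * MIB)

streaming_bound = retained_size_bound(streaming_app, shs)
batch_bound = retained_size_bound(batch_app, shs)
```

   The same SHS-wide value yields an eightfold difference between the two 
bounds, and neither application can opt out of compaction on its own.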
   
   One alternative, if we really want to avoid having this in the driver, is 
letting the driver pass the app's configuration to the SHS. We may only need 
this for rolling event logs, so we don't need to worry about compatibility, 
which makes things easier.
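
   A rough sketch of how that alternative might look on the SHS side, assuming 
the driver records its own value somewhere the SHS can read it. The key name 
and the function are hypothetical, not existing Spark configuration or code.

```python
from typing import Mapping, Optional

# Hypothetical key for illustration; not an existing Spark configuration.
APP_MAX_FILES_TO_RETAIN = "example.eventLog.rolling.maxFilesToRetain"


def resolve_max_files_to_retain(
    app_conf: Mapping[str, str], shs_default: Optional[int]
) -> Optional[int]:
    """Prefer the value the driver passed for the app, else the SHS-wide one.

    None means "do not compact this application's event logs".
    """
    raw = app_conf.get(APP_MAX_FILES_TO_RETAIN)
    return int(raw) if raw is not None else shs_default


# A streaming app opts in; a batch app passes nothing and is left alone.
streaming_conf = {APP_MAX_FILES_TO_RETAIN: "5"}
batch_conf = {}

streaming_retain = resolve_max_files_to_retain(streaming_conf, shs_default=None)
batch_retain = resolve_max_files_to_retain(batch_conf, shs_default=None)
```

   With no SHS-wide default, only applications that opted in get compacted, 
which matches the "streaming only" intent in 2).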
   
   One more point, which may or may not be a big deal: managing the overall 
size of event log files only works when the driver and the SHS work together. 
If we want to guarantee this strictly, the compaction would be better placed 
in the driver, or at least the SHS should check and compact more aggressively. 
(For now I run the compaction only when the app UI has to be reloaded, but 
that may be too loose.)
   
   @vanzin @squito I would like to hear your thoughts on this, as you have 
been leading the efforts on event logging.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
