shuyouZZ opened a new pull request, #38983:
URL: https://github.com/apache/spark/pull/38983

   ### What changes were proposed in this pull request?
   After restarting the history server, the history server log shows many
   `INFO FsHistoryProvider: Finished parsing application_xxx` lines followed by
   `INFO FsHistoryProvider: Deleting expired event log for application_xxx`.
   `startPolling` executes `checkForLogs` first, which parses the expired
   event log files, and then executes `checkAndCleanLog`, which deletes the
   info that was just parsed. Parsing the expired files is therefore unnecessary.
   
   If the log directory contains a large number of expired log files, this
   unnecessary parsing slows down replay.
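
   The cost of the current order can be illustrated with a small toy model. This is a sketch in Python, not Spark's `FsHistoryProvider` code: the function names only mirror the ones mentioned above, and `MAX_AGE`, the dictionaries, and the timestamps are hypothetical. It assumes the listing is empty after a restart.

```python
# Toy model of the current polling order (check first, clean second).
# NOT Spark's implementation; all names and values here are illustrative.
MAX_AGE = 7  # hypothetical retention period, in days


def check_for_logs(log_dir, listing):
    """Parse every log that is not yet in the listing; return the parse count."""
    parsed = 0
    for name, mtime in log_dir.items():
        if name not in listing:
            listing[name] = mtime  # stands in for replaying the event log
            parsed += 1
    return parsed


def clean_logs(log_dir, listing, now):
    """Delete logs whose listing entry is older than MAX_AGE; return the count."""
    expired = [n for n, mtime in listing.items() if now - mtime > MAX_AGE]
    for name in expired:
        del listing[name]
        log_dir.pop(name, None)
    return len(expired)


# Log name -> last-modified day (hypothetical data).
log_dir = {"application_1": 0, "application_2": 1, "application_3": 29}
listing = {}  # assumed empty after a restart
now = 30

parsed = check_for_logs(log_dir, listing)   # parses all three logs
deleted = clean_logs(log_dir, listing, now)  # then deletes the two expired ones
print(f"parsed={parsed}, deleted right after parsing={deleted}")
```

   In this model, two of the three parsed logs are deleted immediately afterwards, so the work of parsing them is wasted.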
   
   In order to avoid this, we can put `cleanLogs` before `checkForLogs`.
   
   In addition, because `cleanLogs` now runs before `checkForLogs`, the info for
   expired logs may not yet exist in the listing db when the history server
   starts, so `cleanLogs` also needs to clean up these log files itself.
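
   A minimal sketch of the proposed order, again as a Python toy model rather than the actual patch. It assumes that an unlisted file's expiry can be judged from its modification time; `MAX_AGE`, the dictionaries, and the timestamps are hypothetical.

```python
# Toy model of the proposed order (clean first, check second).
# NOT the code in this PR; all names and values here are illustrative.
MAX_AGE = 7  # hypothetical retention period, in days


def clean_logs(log_dir, listing, now):
    """Remove expired logs, including files that were never listed."""
    # Expired entries that the listing already knows about.
    expired = {n for n, mtime in listing.items() if now - mtime > MAX_AGE}
    # Extra pass: expired files in the directory that have no listing entry
    # (assumption: expiry is judged by the file's modification time).
    expired |= {
        n for n, mtime in log_dir.items()
        if n not in listing and now - mtime > MAX_AGE
    }
    for name in expired:
        listing.pop(name, None)
        log_dir.pop(name, None)
    return len(expired)


def check_for_logs(log_dir, listing):
    """Parse every log that is not yet in the listing; return the parse count."""
    parsed = 0
    for name, mtime in log_dir.items():
        if name not in listing:
            listing[name] = mtime  # stands in for replaying the event log
            parsed += 1
    return parsed


# Log name -> last-modified day (hypothetical data).
log_dir = {"application_1": 0, "application_2": 1, "application_3": 29}
listing = {}  # assumed empty after a restart
now = 30

deleted = clean_logs(log_dir, listing, now)  # removes the two expired files
parsed = check_for_logs(log_dir, listing)    # parses only the remaining log
print(f"deleted={deleted}, parsed={parsed}")
```

   Without the extra pass over the directory, cleaning first would find nothing to delete in an empty listing, and the expired files would still be parsed.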
   
   ### Why are the changes needed?
       
       Avoid parsing unnecessary log files.
   
   ### Does this PR introduce _any_ user-facing change?
   
       No.
       
   ### How was this patch tested?
       
       Unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

