Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/22444
I see the reasoning here:
* @jianjianjiao has a very large cluster with many thousands of history
files of past (successful) jobs.
* history server startup needs to go through all these logs before being
usable, so any server restart results in hours of downtime, just from scanning.
* this patch breaks things up to be incremental.
I don't have any opinions on the patch itself; I've not looked at that code
for so long my reviews are probably dangerous.
Two thoughts:
1. would it make sense for the initial scans to go for the most recent logs
first? That 2.5 hour time to scan all files is still there.
2. would you want the UI and REST API to indicate that the scan was still
in progress, and not to worry if the listing was incomplete?
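
To make the two thoughts a bit more concrete, here is a rough Python sketch. It is not Spark code and not what the patch does: `LogFile`, `ListingState` and `scan_newest_first` are hypothetical names made up for illustration, and the "parsing" step is a placeholder. It only shows the ordering (newest modification time first) and a status flag that a UI or REST endpoint could surface while the listing is incomplete.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for one event log; not a Spark class."""
    path: str
    mtime: float  # modification time, seconds since epoch


@dataclass
class ListingState:
    """Hypothetical status object a UI or REST endpoint could expose."""
    total: int = 0
    scanned: int = 0
    complete: bool = False
    apps: List[str] = field(default_factory=list)

    @property
    def in_progress(self) -> bool:
        return not self.complete


def scan_newest_first(
    logs: Iterable[LogFile], state: ListingState, batch_size: int = 2
) -> Iterator[ListingState]:
    """Scan logs in batches, most recently modified first.

    Yields the shared state after each batch so callers can serve a
    partial listing while the scan is still running.
    """
    ordered = sorted(logs, key=lambda f: f.mtime, reverse=True)
    state.total = len(ordered)
    for start in range(0, len(ordered), batch_size):
        for log in ordered[start:start + batch_size]:
            state.apps.append(log.path)  # placeholder for real log parsing
            state.scanned += 1
        state.complete = state.scanned == state.total
        yield state
    state.complete = True  # also covers the empty-directory case
```

With something like this, the listing would be usable after the first batch, and `in_progress` plus `scanned`/`total` would give the UI enough to say "still scanning" rather than look as if jobs were missing.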