cxzl25 commented on pull request #30780: URL: https://github.com/apache/spark/pull/30780#issuecomment-745870369
`FsHistoryProvider#checkForLogs` currently fetches the file status of every application. With a large number of applications, for example 10,000, that adds 10,000 RPC calls just to obtain file statuses. The 23 minutes is not the time needed to get the file status of a single application; it is the approximate duration of one `checkForLogs` call. Because `FsHistoryProvider#checkForLogs` does not log anything itself, I can only estimate its duration from the time the last processing task of the previous round completed and the time the first processing task of the current round started.
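To make the estimate described above concrete, here is a minimal sketch of the arithmetic. It is not code from Spark: the class name, method names, and timestamps are hypothetical and chosen only so that the gap comes out to the 23 minutes mentioned. It also assumes, for illustration, that the whole gap is spent on file-status calls, so the per-application figure is an upper bound rather than a measurement.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class ScanEstimate {
    // Estimate one checkForLogs round as the gap between the last task
    // completion of the previous round and the first task start of this round.
    static Duration estimateScanDuration(LocalDateTime prevRoundLastCompletion,
                                         LocalDateTime thisRoundFirstStart) {
        return Duration.between(prevRoundLastCompletion, thisRoundFirstStart);
    }

    // Average time per application, assuming (hypothetically) that the whole
    // round is spent on one file-status call per application.
    static double millisPerStatusCall(Duration scan, int applicationCount) {
        return (double) scan.toMillis() / applicationCount;
    }

    public static void main(String[] args) {
        // Hypothetical timestamps, as they might be read from history server logs.
        LocalDateTime prevEnd = LocalDateTime.parse("2020-12-16T10:00:00");
        LocalDateTime nextStart = LocalDateTime.parse("2020-12-16T10:23:00");

        Duration scan = estimateScanDuration(prevEnd, nextStart);
        System.out.println("Estimated checkForLogs duration: "
                + scan.toMinutes() + " min");
        System.out.println("Average per application (10,000 apps): "
                + millisPerStatusCall(scan, 10_000) + " ms");
    }
}
```

Any idle time between rounds is included in the gap, so the real `checkForLogs` duration can only be shorter than this estimate.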
