yanxiaole opened a new pull request #29350:
URL: https://github.com/apache/spark/pull/29350
### What changes were proposed in this pull request?
This PR wraps the code that adds a new entry to the history server application
listing in a `try`/`catch` for `FileNotFoundException`, so that paths that no
longer exist are skipped.
### Why are the changes needed?
If the log dir holds a large number (>100k) of application logs, listing it
takes a few seconds. By the time the path list has been retrieved, some
applications may already have finished, and their log files will have been
renamed from `foo.inprogress` to `foo`.
This causes a problem when adding an entry to the listing: querying file
status, for example via `fileSizeForLastIndex`, throws a
`FileNotFoundException` if the application has finished in the meantime. The
exception aborts the current loop, so on a busy cluster the history server may
be unable to list and load any application logs.
```
20/08/03 15:17:23 ERROR FsHistoryProvider: Exception in checking for event log updates
java.io.FileNotFoundException: File does not exist: hdfs://xx/logs/spark/application_11111111111111.lz4.inprogress
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1527)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1520)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1520)
    at org.apache.spark.deploy.history.SingleFileEventLogFileReader.status$lzycompute(EventLogFileReaders.scala:170)
```
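The race and the skip-on-missing behaviour can be illustrated with a minimal, self-contained sketch. This is not the actual `FsHistoryProvider` code: the class `ListingSketch`, the methods `fileSize` and `addEntries`, and the in-memory `Map` standing in for the file system are all hypothetical, chosen only to show why catching `FileNotFoundException` per entry keeps the loop going.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; every name here is hypothetical, not Spark's API. */
class ListingSketch {

    /** Stand-in for a file-status lookup: throws if the path no longer exists. */
    static long fileSize(Map<String, Long> fs, String path) throws FileNotFoundException {
        Long size = fs.get(path);
        if (size == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return size;
    }

    /**
     * Walks a (possibly stale) path listing and returns the paths that could
     * still be loaded. A path that disappeared after the listing was taken is
     * skipped instead of aborting the whole loop.
     */
    static List<String> addEntries(List<String> listed, Map<String, Long> fs) {
        List<String> added = new ArrayList<>();
        for (String path : listed) {
            try {
                fileSize(fs, path); // may throw if the log was renamed meanwhile
                added.add(path);
            } catch (FileNotFoundException e) {
                // The application finished after the listing was taken;
                // skip this entry and continue with the remaining paths.
            }
        }
        return added;
    }
}
```

In this sketch, a listing taken while `app_1.inprogress` still existed can be processed after the file has been renamed to `app_1`: that one entry is skipped and the remaining entries are still added.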
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. Set up a separate script that keeps renaming application log files under
the history log dir.
2. Launch the history server.
3. Check that the `File does not exist` error no longer appears in the log.