Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18887#discussion_r137942907
--- Diff:
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -316,25 +353,21 @@ private[history] class FsHistoryProvider(conf:
SparkConf, clock: Clock)
try {
val newLastScanTime = getNewLastScanTime()
logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
- val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
- .getOrElse(Seq.empty[FileStatus])
+ val statusList = Option(fs.listStatus(new
Path(logDir))).map(_.toSeq).getOrElse(Nil)
// scan for modified applications, replay and merge them
- val logInfos: Seq[FileStatus] = statusList
+ val logInfos = statusList
.filter { entry =>
- val fileInfo = fileToAppInfo.get(entry.getPath())
- val prevFileSize = if (fileInfo != null) fileInfo.fileSize else
0L
!entry.isDirectory() &&
// FsHistoryProvider generates a hidden file which can't be
read. Accidentally
// reading a garbage file is safe, but we would log an error
which can be scary to
// the end-user.
!entry.getPath().getName().startsWith(".") &&
- prevFileSize < entry.getLen() &&
- SparkHadoopUtil.get.checkAccessPermission(entry, FsAction.READ)
+ SparkHadoopUtil.get.checkAccessPermission(entry,
FsAction.READ) &&
+ recordedFileSize(entry.getPath()) < entry.getLen()
--- End diff --
Can we add a comment to explain what `recordedFileSize(entry.getPath())`
returns? In the original code, the variable name is self descriptive. The new
change does not have it any more.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]