Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22752#discussion_r225908701

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -449,7 +450,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           listing.write(info.copy(lastProcessed = newLastScanTime, fileSize = entry.getLen()))
         }

-        if (info.fileSize < entry.getLen()) {
+        if (info.fileSize < entry.getLen() || checkAbsoluteLength(info, entry)) {
--- End diff --

Have you looked at this getFileLength() call to see how well it updates? FWIW, [HADOOP-15606](https://issues.apache.org/jira/browse/HADOOP-15606) proposes adding a method like this for all streams, though that proposal includes the need for a specification and tests. Generally the HDFS team are a bit lax about that spec -> test workflow, which doesn't help downstream code or other implementations.
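The logic under review can be sketched in isolation. This is a minimal, self-contained illustration of the patched condition, with stub types standing in for Spark's listing entry and Hadoop's `FileStatus`; the `checkAbsoluteLength` stand-in here is hypothetical and simply compares a stream-reported absolute length against the recorded size, which is only an assumption about what the real helper does.

```scala
// Stub types standing in for Spark's LogInfo and Hadoop's FileStatus.
case class LogInfo(fileSize: Long, lastProcessed: Long)
case class FileStatus(len: Long) { def getLen(): Long = len }

// Hypothetical stand-in for checkAbsoluteLength: treat the log as updated
// when the absolute (in-progress) length reported by the open stream is
// ahead of the size recorded in the listing.
def checkAbsoluteLength(info: LogInfo, absoluteLen: Long): Boolean =
  absoluteLen > info.fileSize

// The patched condition: reprocess when either the FileStatus length grew,
// or the absolute length indicates unflushed progress the namenode has not
// yet reflected in getLen().
def shouldReprocess(info: LogInfo, entry: FileStatus, absoluteLen: Long): Boolean =
  info.fileSize < entry.getLen() || checkAbsoluteLength(info, absoluteLen)
```

The point of the extra disjunct is that for a file still being written, `FileStatus.getLen()` can lag behind the bytes actually readable from the stream, so comparing only against the recorded size can miss updates.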