Github user devaraj-kavali commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22752#discussion_r226014153
  
    --- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -449,7 +450,7 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
                   listing.write(info.copy(lastProcessed = newLastScanTime, 
fileSize = entry.getLen()))
                 }
     
    -            if (info.fileSize < entry.getLen()) {
    +            if (info.fileSize < entry.getLen() || 
checkAbsoluteLength(info, entry)) {
    --- End diff --
    
    Thanks @steveloughran for looking into this.
    > Have you looked @ this getFileLength() call to see how well it updates?
    
    I looked at the DFSInputStream.getFileLength() api, it gives 
locatedBlocks.getFileLength() + lastBlockBeingWrittenLength, here 
locatedBlocks.getFileLength() is the value got from NameNode for all the 
completed blocks and lastBlockBeingWrittenLength is the lastblock lenth from 
DataNode which is not the completed block.
    
    > FwIW HADOOP-15606 proposes adding a method like this for all streams
    
    Thanks for the pointer, once this is available we can update to use it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to