Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/22752#discussion_r226243409
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -449,7 +450,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
          listing.write(info.copy(lastProcessed = newLastScanTime, fileSize = entry.getLen()))
        }
-        if (info.fileSize < entry.getLen()) {
+        if (info.fileSize < entry.getLen() || checkAbsoluteLength(info, entry)) {
--- End diff --
...there's no timetable for that getLength thing, but if HDFS already
supports the API, I'm more motivated to implement it. It has benefits for cloud
stores in general:
1. It saves apps doing an up-front HEAD/getFileStatus() call just to learn how
long their data is; the GET should return it.
2. With S3 Select, you get back the filtered data, so you don't know how much
you will see until the GET is issued.
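
To make the new guard concrete, here is a rough sketch of the idea behind a
checkAbsoluteLength-style helper, assuming the HDFS API in question is
HdfsDataInputStream.getVisibleLength() (that is an assumption on my part; the
helper name, signature, and body below are illustrative, not the PR's actual
implementation):

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem}
import org.apache.hadoop.hdfs.client.HdfsDataInputStream

// Hypothetical helper (name and signature are illustrative): returns true if
// the file's "visible" length, as seen by an open HDFS stream, is larger than
// the length the listing has recorded. HDFS only updates the NameNode-side
// length at block boundaries or on hflush/hsync, so getFileStatus() /
// entry.getLen() can lag behind what a reader can actually see.
def checkAbsoluteLength(
    fs: FileSystem,
    recordedFileSize: Long,
    entry: FileStatus): Boolean = {
  val in = fs.open(entry.getPath)
  try {
    in match {
      case hdfsIn: HdfsDataInputStream =>
        // Visible length: bytes readable right now, including data past the
        // last block boundary reported to the NameNode.
        hdfsIn.getVisibleLength > recordedFileSize
      case _ =>
        // Non-HDFS filesystems: no cheaper source of truth than getLen().
        false
    }
  } finally {
    in.close()
  }
}
```

Note the ordering in the diff's condition: because || short-circuits, the
expensive path (opening the file just to ask for its length, an extra round
trip per log on every scan) only runs when the cheap
info.fileSize < entry.getLen() comparison says nothing has changed.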