Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6935#discussion_r51446687
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -42,6 +42,35 @@ import org.apache.spark.util.{Clock, SystemClock, ThreadUtils, Utils}
      * A class that provides application history from event logs stored in the file system.
      * This provider checks for new finished applications in the background periodically and
      * renders the history application UI by parsing the associated event logs.
    + *
    + * ==How new and updated attempts are detected==
    + *
    + * - New attempts are detected in [[checkForLogs]]: the log dir is scanned, and any
    + * entries in the log dir whose modification time is greater than the last scan time
    + * are considered new or updated. These are replayed to create a new [[FsApplicationAttemptInfo]]
    + * entry and update or create a matching [[FsApplicationHistoryInfo]] element in the list
    + * of applications.
    + * - Updated attempts are checked by scanning all known attempts, and if their file size
    + * has changed, considering them as updated. A new [[FsApplicationAttemptInfo]] instance
    + * is created copying over all the original data, the current size, and an incremented version
    + * counter. Accordingly, the fact the attempt is updated is detected, but there is no replay
    + * cost.
    + * - When [[updateProbe()]] is invoked to check if a loaded [[SparkUI]]
    + * instance is out of date, the version counter of the application attempt loaded is
    + * compared with that attempt's current value; the loaded UI is considered out of date
    + * if its version is less than that of the current listing.
    + *
    + * The use of a version counter, rather than simply relying on modification times, is needed to
    + * address the following issues
    + * - some filesystems do not appear to update the `modtime` value whenever data is flushed to
    + * an open file output stream. Changes to the history may not be picked up.
    + * - the granularity of the `modtime` field may be 2+ seconds. Rapid changes to the FS can be
    + * missed.
    --- End diff --
    
    The comments about version counters are out of date now; you can replace
    them with file size. (Definitely keep the explanation of why modtime is
    insufficient.)
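
    For illustration only, here is a minimal sketch of what a file-size-based
    check could look like. `AttemptInfo`, `SizeProbe`, `isOutOfDate`, and
    `updatedAttempts` are hypothetical, simplified stand-ins and are not the
    actual `FsHistoryProvider` types or methods; the sketch also assumes that
    event logs only ever grow.

```scala
// Hypothetical stand-in for an application attempt entry; not the real
// FsApplicationAttemptInfo class.
final case class AttemptInfo(logPath: String, fileSize: Long)

object SizeProbe {
  /** A loaded UI is treated as out of date once its log has grown. */
  def isOutOfDate(loaded: AttemptInfo, currentSize: Long): Boolean =
    currentSize > loaded.fileSize

  /**
   * Returns refreshed copies of the attempts whose size has changed.
   * `sizeOf` is an assumed lookup from log path to current size in bytes;
   * no log replay is involved, only a copy with the new size.
   */
  def updatedAttempts(
      known: Seq[AttemptInfo],
      sizeOf: String => Long): Seq[AttemptInfo] =
    known.flatMap { attempt =>
      val size = sizeOf(attempt.logPath)
      if (size != attempt.fileSize) Some(attempt.copy(fileSize = size)) else None
    }
}
```

    Comparing sizes rather than modification times sidesteps both problems
    listed in the Scaladoc: a flush that does not touch `modtime`, and a
    `modtime` granularity that is too coarse to register rapid changes.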

