Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19582#discussion_r148901343
  
    --- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -486,8 +580,21 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
         bus.addListener(listener)
     
         replay(fileStatus, isApplicationCompleted(fileStatus), bus, 
eventsFilter)
    -    listener.applicationInfo.foreach(addListing)
    -    listing.write(LogInfo(logPath.toString(), fileStatus.getLen()))
    +    listener.applicationInfo.foreach { app =>
    +      // Invalidate the existing UI for the reloaded app attempt, if any. 
Note that this does
    +      // not remove the UI from the active list; that has to be done in 
onUIDetached, so that
    +      // cleanup of files can be done in a thread-safe manner. It does 
mean the UI will remain
    +      // in memory for longer than it should.
    --- End diff --
    
    it took me some time to figure how the cache & invalidation worked, mostly 
because I wasn't looking in the right places.  I don't think you've made this 
any more confusing than it was before (in fact its probably better), but seems 
like a good opportunity to improve commenting a little.  I think it might help 
to have one comment in the code where the entire sequence is described ( here 
on `mergeApplicationListing`, or on `AppCache`, or 
`ApplicationCacheCheckFilter`, doesn't really matter, but they could all 
reference the longer comment).  if I understand correctly, it would be 
something like:
    
    Logs of incomplete apps are regularly polled to see if they have been 
updated (based on an increase in file size).  If they have, the existing data 
for that app is marked as invalid in LoadedAppUI.  However, no memory is freed, 
no files are cleaned up at this time, nor is a new UI built.  On each request 
for one app's UI, the application cache is checked  to see if it has a valid 
`LoadedAppUI` in the cache.  If there is data in the cache and its valid, then 
its served.  If there is data in the cache but it is invalid, then the UI is 
rebuilt from the raw event logs.  If there is nothing in the cache, then the UI 
is built from the raw event logs and added to the cache.  This may kick another 
entry out of the cache -- if its for an incomplete app, then any KVStore data 
written to disk is deleted (as the KVStore for an incomplete app is always 
regenerated from scratch anyway). 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to