tanvn commented on PR #36424:
URL: https://github.com/apache/spark/pull/36424#issuecomment-1119230516

   @gengliangwang 
   Thank you for taking the time to review this.
   Regarding the `isProcessing` method, consider the scenario I described in the PR summary:
   In the next run of `checkForLogs`, the AAA application has finished, the 
log `viewfs://iu/log/spark3/AAA_1.inprogress` has been deleted, and a new log 
`viewfs://iu/log/spark3/AAA_1` has been created. 
   In this case, `viewfs://iu/log/spark3/AAA_1` will be marked as `processing`.
   
https://github.com/apache/spark/blob/v3.2.1/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L1391
   but `stale` is built from the `listing` map, 
   
https://github.com/apache/spark/blob/v3.2.1/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L586-L592
   which means it will contain `viewfs://iu/log/spark3/AAA_1.inprogress` (as it 
was added to `listing` in the previous execution of `checkForLogs`), and 
`isProcessing` for `viewfs://iu/log/spark3/AAA_1.inprogress` will return false, 
so `cleanAppData` will be executed for it (which I think is not a problem).
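
   To make the scenario concrete, here is a minimal standalone sketch. It is not the actual `FsHistoryProvider` code: `StaleLogSketch`, `LogEntry`, and `staleToClean` are hypothetical names, `listing` is modelled as a plain list, and the set of paths being processed is modelled as a `Set[String]`. It only illustrates that the check is keyed on the exact log path, so marking `AAA_1` as processing does not protect the `AAA_1.inprogress` entry.

```scala
object StaleLogSketch {
  // Hypothetical stand-in for an entry stored in `listing`.
  final case class LogEntry(logPath: String, appId: String, lastProcessed: Long)

  // Entries that the current scan did not touch and that are not being processed.
  // `processing` is an assumed model of the paths that `isProcessing` would report as true.
  def staleToClean(
      listing: Iterable[LogEntry],
      processing: Set[String],
      newLastScanTime: Long): List[LogEntry] = {
    def isProcessing(entry: LogEntry): Boolean = processing.contains(entry.logPath)
    listing
      .filter(_.lastProcessed < newLastScanTime) // not touched by the current scan
      .filterNot(isProcessing)
      .toList
  }

  def main(args: Array[String]): Unit = {
    val inProgress = "viewfs://iu/log/spark3/AAA_1.inprogress"
    val finished = "viewfs://iu/log/spark3/AAA_1"

    // The previous run of checkForLogs recorded the .inprogress log.
    val listing = List(LogEntry(inProgress, "AAA", lastProcessed = 100L))
    // In the current run, only the finished log is marked as processing.
    val processing = Set(finished)

    staleToClean(listing, processing, newLastScanTime = 200L)
      .foreach(e => println(s"cleanAppData would run for ${e.logPath}"))
  }
}
```

   In this model the `.inprogress` entry is the only one selected for cleanup, because the path lookup does not match the finished log's path.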
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


