Ngone51 commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore 
checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-542717490
 
 
   > As long as you have the SHS configured to use local disk, after it parses 
the logs once, it'll just read the leveldb kvstore which will be very fast.
   
   Yes, but this assumes the SHS is always running alongside those in-progress 
applications. If the SHS is not running (e.g. not started, or crashed) while 
the applications are in progress, we end up with completed event log files. 
When the SHS does start, it has to parse those completed event log files from 
start to end.
   
   To be honest, though, I think this is a rare case, since a user who really 
wants to use the SHS will have it running alongside in-progress applications 
most of the time. Still, whenever the SHS shuts down, or a user supplies 
completed event log files from somewhere else, the slow replay problem 
remains.
   
   I also agree that, with SPARK-28594, the first parse (or replay) in the SHS 
will be much better than it is today, since we currently re-parse a file from 
start to end whenever an in-progress file is updated. Of course, SPARK-28594 
only supports multiple rolled files so far, so we should either extend it or 
open a new task later to optimize the single-file case. 
   
   All in all, if we can ignore that rare case, SPARK-28867 isn't really 
necessary once SPARK-28594 is done.
   
   > Is your goal to avoid having the SHS even parse the file one time?
   
   If an application finishes normally, we can avoid parsing the file 
entirely. If not (e.g. it crashes), we need to do an incremental parse based 
on the last snapshot taken before the crash. 
   
   >  If you really wanted to do that, I'd have the driver just write out the 
leveldb kvstore when the application terminates. 
   
   Actually, that was the original plan when we started on this. But, as 
@gengliangwang pointed out, an application may crash unexpectedly and leave 
LevelDB corrupted. So we settled on the current plan, which takes snapshots 
periodically and does an incremental parse if the application crashes.
   
   > the pr description seems to mostly focus on in-progress, so I'm surprised 
you're saying this is primarily for complete applications.
   
   Sorry for the confusion. As mentioned above, if an application finishes 
normally, we don't need to parse the file, so that path is simpler. But if the 
application crashes, we need to do an incremental parse, which requires more 
work (e.g. recovering live entities and deciding where to continue from). That 
is why I put more effort into explaining how we handle in-progress 
applications.
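
   To make the two paths concrete, here is a minimal Python sketch of the idea. 
It is not the code in this PR: `Snapshot`, `driver_run`, `shs_load` and the toy 
`apply_event` listener are hypothetical names for illustration, and the 
snapshot is assumed to record how many event log lines it already covers.

```python
import copy
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    # Hypothetical checkpoint format: a copy of the store plus the number of
    # event log lines that the copy already reflects.
    state: dict = field(default_factory=dict)
    lines_applied: int = 0


def apply_event(state: dict, line: str) -> None:
    # Toy stand-in for the real listeners: count events by type.
    event = json.loads(line)
    state[event["Event"]] = state.get(event["Event"], 0) + 1


def driver_run(lines: list, snapshot_every: int, finished: bool) -> Optional[Snapshot]:
    """Simulate the driver: apply events live and checkpoint every N lines.

    `lines` is whatever reached the event log. On a normal finish a final
    snapshot covering the whole log is written; after a crash only the last
    periodic snapshot (if any) exists.
    """
    state, last = {}, None
    for i, line in enumerate(lines, start=1):
        apply_event(state, line)
        if i % snapshot_every == 0:
            last = Snapshot(copy.deepcopy(state), i)
    if finished:
        last = Snapshot(copy.deepcopy(state), len(lines))
    return last


def shs_load(lines: list, snapshot: Optional[Snapshot]):
    """Simulate the SHS: restore the snapshot, then replay only what it misses.

    Returns the rebuilt state and the number of lines that had to be parsed.
    """
    if snapshot is None:
        state, start = {}, 0  # no snapshot: full replay from the first line
    else:
        state, start = copy.deepcopy(snapshot.state), snapshot.lines_applied
    for line in lines[start:]:
        apply_event(state, line)
    return state, len(lines) - start


names = ["SparkListenerApplicationStart"] + ["SparkListenerJobStart", "SparkListenerJobEnd"] * 3
names.append("SparkListenerApplicationEnd")
log = [json.dumps({"Event": n}) for n in names]

# Normal finish: the final snapshot covers the whole log, nothing to parse.
_, parsed_ok = shs_load(log, driver_run(log, snapshot_every=3, finished=True))

# Crash after 7 lines: resume from the last periodic snapshot (line 6).
crashed_log = log[:7]
state_inc, parsed_crash = shs_load(crashed_log, driver_run(crashed_log, 3, finished=False))
```

   In this toy model the normal-finish path parses zero lines, while the crash 
path parses only the lines written after the last snapshot. The real work in 
the crash path is restoring live entities, which the deep copy of `state` only 
stands in for here.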
   