Ngone51 commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore 
checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-532737445
 
 
   > For snapshotting in driver, I also considered both sides (driver/SHS) in 
design phase but took SHS because listener is the thing which must not be 
blocked by long operation.
   
   Well, in SPARK-28867, I don't think there's a choice of creating snapshot on 
driver or SHS. As I mentioned above, there can be a case where SHS not running 
before an application finished. So, I think it can only create snapshot on 
driver.
   
   > Either you'll end up with cloning InMemoryStore and snapshotting with 
cloned one (requires more memory on driver, but unless deep copy is required, 
then maybe applicable), or block listener to process next event until snapshot 
has been taken. You can't take snapshot asynchronously with live 
AppStatusListener.
   
   Yes, you're right. And in this PR, we don't have a copy for InMemoryStore 
and we just block(but with optimizations) during snapshotting. I also realized 
that listener couldn't take too much time while processing events. So, in this 
PR, we use a separate thread to do snapshot and we don't block for frequent 
events(e.g. `SparkListenerTaskStart`, `SparkListenerTaskEnd`). Snapshotting is 
more likely to be triggered within frequent incoming events, and this would 
make blocking less happening.
   
   > ideally you're ensuring your design is already reviewed internally in your 
team, but that's optional.
   
   Actually, I've reached an agreement with @gengliangwang before I made this 
PR.
   
   > I can see the major benefits from taking snapshot from driver, so if you 
can prove it doesn't bring considerable latency in listener side and no memory 
pressure, it would be better approach.
   
   As you know, this's still a work in process and I only wrote a prototype yet 
in order to validate the thoughts. And there still be some work to be done  
before this PR gets perfectly work. The one is that correctly handling 
different behavior between AppStatusListener and EventLoggingListener, which I 
mentioned above. Maybe, we could resolve it with more careful control at code 
level, but I'm afraid the whole idea is not good enough and codes will result 
in buggy finally. All in all, this still couldn't be a competed implementation 
yet. So, for me, it still far from testing of latency and memory. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to