zsxwing commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore 
checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-532841153
 
 
   I'm +1 on taking the snapshot in the driver rather than in the SHS. One of the 
issues I hit in the past is that the SHS cannot render the UI for a long-running 
Spark application because replaying events takes too long. For example, if you 
have a streaming query that has been running for 7 days, the event logs will be 
huge and it may take the SHS several days to replay them. If we can take the 
snapshot in the driver, the number of events that need to be replayed in the SHS 
will be small.
   
   > You can't take snapshot asynchronously with live AppStatusListener.
   
   I think we can take a snapshot of `InMemoryStore` asynchronously. For example, 
we can have two maps in `InMemoryStore`. At first, we write to one map. When 
flushing out, we freeze the current map and new updates go to the other 
(backup) map. We can write out the frozen map asynchronously, and any query 
going to `InMemoryStore` can just check both maps. After flushing it out, we add 
all the items in the backup map to the frozen map and re-activate it. The number 
of items to copy here should be small.
