zsxwing commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-532867328
 
 
   > This seems to assume there are only "appends", but in reality there are 
also "updates". This will require special care when updating existing objects, 
and we need to choose one of: 1) simply cloning all events, 2) copying the map 
and cloning an object whenever it is updated, or 3) making updates synchronous.
   
   The tricky part we need to deal with is deleting objects, but I think that's 
doable. We can keep a delete flag in the backup map and remove the flagged 
objects when merging the two maps. I think the main latency comes from holding 
the lock while we copy the items from the second map into the first one. But if 
flushing the snapshot is not very slow, we won't accumulate many objects in the 
second map, so the number of objects to copy should be small and the latency 
should be acceptable. IIRC, AppStatusListener has some code to avoid flushing 
items to InMemoryStore too frequently, which also helps here.
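A minimal sketch of the two-map idea above, assuming a single lock, non-null values, and one snapshot at a time. `DoubleBufferedStore`, `write`, `delete`, `get`, and `snapshot` are hypothetical names for illustration, not Spark's `InMemoryStore` API, and `Optional.empty()` stands in for the "delete flag" in the backup map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Spark's InMemoryStore API: while a snapshot is being
 * flushed, writes and deletes go to a second "pending" map. Deletes are stored as
 * tombstones (Optional.empty()) and applied when pending is merged into primary.
 */
class DoubleBufferedStore<K, V> {
  private final Object lock = new Object();          // guards primary, pending, snapshotting
  private final Object snapshotMutex = new Object(); // assumption: one snapshot at a time
  private final Map<K, V> primary = new HashMap<>();
  private final Map<K, Optional<V>> pending = new HashMap<>();
  private boolean snapshotting = false;

  void write(K key, V value) { // value must be non-null (Optional.of rejects null)
    synchronized (lock) {
      if (snapshotting) pending.put(key, Optional.of(value));
      else primary.put(key, value);
    }
  }

  void delete(K key) {
    synchronized (lock) {
      if (snapshotting) pending.put(key, Optional.empty()); // tombstone ("delete flag")
      else primary.remove(key);
    }
  }

  V get(K key) {
    synchronized (lock) {
      Optional<V> p = pending.get(key);
      if (p != null) return p.orElse(null); // newest pending value, or null if deleted
      return primary.get(key);
    }
  }

  /** Hands a read-only view of primary to `flush`, then merges pending into primary. */
  void snapshot(Consumer<Map<K, V>> flush) {
    synchronized (snapshotMutex) {
      synchronized (lock) { snapshotting = true; }
      try {
        // The slow flush runs without holding `lock`. Writers only touch `pending`
        // until `snapshotting` is cleared, so `primary` is not modified meanwhile.
        flush.accept(Collections.unmodifiableMap(primary));
      } finally {
        synchronized (lock) {
          // Short critical section: cost grows with pending.size(), not primary.size().
          for (Map.Entry<K, Optional<V>> e : pending.entrySet()) {
            if (e.getValue().isPresent()) primary.put(e.getKey(), e.getValue().get());
            else primary.remove(e.getKey());
          }
          pending.clear();
          snapshotting = false;
        }
      }
    }
  }
}
```

In this sketch writers are blocked only during the merge loop, so the pause scales with the number of writes that arrived during the flush. That matches the point above: if flushing is reasonably fast, the second map stays small and the merge is cheap.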
