zsxwing commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-532867328

> This seems to assume there're only "appends", but the reality is that there're also "updates". This will require special care of updating existing object and it needs to choose one of 1) simply cloning all events 2) copying map and cloning object whenever it is updated 3) let update be synchronous.

The tricky part we need to deal with is deleting objects, but I think that's doable. We can have a delete flag in the backup map and delete the flagged objects when merging the two maps.

I think the major latency comes from the lock we hold while copying the items from the second map to the first one. But if flushing the snapshot is not very slow, we won't accumulate many objects in the second map, so the number of objects to copy should be small and the latency should be acceptable.

IIRC, AppStatusListener has some code to avoid flushing items to InMemoryStore too frequently. This is also helpful here.
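
To make the two-map idea concrete, here is a rough sketch of how it could look. This is not the PR's code and not the real InMemoryStore API: `TwoMapStore`, `beginSnapshot`, and `endSnapshot` are hypothetical names, an empty `Optional` stands in for the delete flag, and values are assumed to be replaced wholesale rather than mutated in place (in-place updates would still need one of the cloning options from the quoted comment).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a two-map store; not Spark's InMemoryStore. */
class TwoMapStore<K, V> {
    // First map: the data a snapshot writer reads while a flush is running.
    private final Map<K, V> primary = new HashMap<>();
    // Second (backup) map: changes made during a flush.
    // Optional.empty() is the delete flag for a key.
    private final Map<K, Optional<V>> backup = new HashMap<>();
    private boolean snapshotting = false;

    synchronized void write(K key, V value) {
        if (snapshotting) {
            backup.put(key, Optional.of(value));
        } else {
            primary.put(key, value);
        }
    }

    synchronized void delete(K key) {
        if (snapshotting) {
            backup.put(key, Optional.empty()); // flag only, primary stays untouched
        } else {
            primary.remove(key);
        }
    }

    synchronized V read(K key) {
        Optional<V> pending = backup.get(key);
        if (pending != null) {
            return pending.orElse(null); // pending write wins; delete flag hides the key
        }
        return primary.get(key);
    }

    /** Starts a flush. The returned view stays stable until endSnapshot(). */
    synchronized Map<K, V> beginSnapshot() {
        snapshotting = true;
        return Collections.unmodifiableMap(primary);
    }

    /** Merges the backup map into the first map; the lock is held only for this copy. */
    synchronized void endSnapshot() {
        for (Map.Entry<K, Optional<V>> e : backup.entrySet()) {
            if (e.getValue().isPresent()) {
                primary.put(e.getKey(), e.getValue().get());
            } else {
                primary.remove(e.getKey()); // apply the delete flag
            }
        }
        backup.clear();
        snapshotting = false;
    }
}
```

The cost of `endSnapshot()` is proportional to the number of entries in the backup map, which is why a fast flush (and less frequent writes into the store) keeps the lock hold time short.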
