HeartSaVioR commented on issue #25577: [WIP][CORE][SPARK-28867] InMemoryStore 
checkpoint to speed up replay log file in HistoryServer
URL: https://github.com/apache/spark/pull/25577#issuecomment-533680034
 
 
   One of the main goals of SPARK-28594 is to limit the overall size of the 
log directory per application. (End users have been concerned about it.) That 
means we should provide a way to roll the event log file at a deterministic 
size, which rolling the file by number of lines cannot guarantee. In the 
following patch I'll introduce a max number of files (max file size is 
introduced in #25670) and clean up old event files by replacing them with a 
snapshot file. So it takes a snapshot for a different purpose, though the 
snapshot also helps faster reading.
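   
   To make the size argument concrete, here is a rough sketch of rolling by 
bytes and of deciding how many old files to fold into a snapshot. It is only 
an illustration under my own assumptions: `SizeBasedRoller`, `maxFileBytes`, 
and `filesToCompact` are hypothetical names, not the code in #25670 or in the 
follow-up patch, and it tracks file sizes in memory instead of writing files.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Spark's event log writer: it only tracks how many
// bytes each rolled file would hold.
final class SizeBasedRoller {
    private final long maxFileBytes;
    private final List<Long> fileSizes = new ArrayList<>();

    SizeBasedRoller(long maxFileBytes) {
        this.maxFileBytes = maxFileBytes;
        fileSizes.add(0L); // start with one empty file
    }

    /**
     * Appends one event line. If the line would push the current file past
     * the cap, roll to a new file first. A single line larger than the cap
     * still gets a file of its own, so the bound is "cap or one event".
     */
    void append(String eventJson) {
        long len = eventJson.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        int last = fileSizes.size() - 1;
        long current = fileSizes.get(last);
        if (current > 0 && current + len > maxFileBytes) {
            fileSizes.add(len);
        } else {
            fileSizes.set(last, current + len);
        }
    }

    /** Returns a copy of the per-file byte counts, oldest first. */
    List<Long> fileSizes() {
        return new ArrayList<>(fileSizes);
    }

    /** How many of the oldest files must be replaced by a snapshot to keep at most maxFiles. */
    static int filesToCompact(int fileCount, int maxFiles) {
        return Math.max(0, fileCount - maxFiles);
    }
}
```

   With both caps in place the directory size is bounded by roughly 
`maxFiles * maxFileBytes` plus the snapshot, which a per-line roll cannot 
promise because event sizes vary widely.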
   
   Given that the two issues take snapshots for different purposes, I'm kind 
of OK to go with different approaches and consolidate them later (assuming the 
snapshot files are compatible). 
   
   One thing I'm concerned about is that we only talk about the new approach 
for the in-memory store, while Spark hides the KVStore implementation by 
wrapping it with ElementTrackingStore. The change should be reflected in the 
KVStore API so that the caller side can deal with snapshotting properly. 
(Right now we only add some necessary methods to KVStore to snapshot from 
outside, but if we have both sync and async snapshots for KVStore, that should 
be reflected in the KVStore API.)
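   
   As a rough illustration of what "reflected in the API" could mean, here is 
a minimal sketch. Everything in it is an assumption for discussion: 
`SnapshotCapableStore`, `snapshot()`, and `snapshotAsync()` are hypothetical 
and are not methods of the real KVStore, and the wrapper only stands in for 
the role ElementTrackingStore plays.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

// Hypothetical interface, NOT org.apache.spark.util.kvstore.KVStore.
interface SnapshotCapableStore {
    void put(String key, String value);

    /** Blocking snapshot: returns once a consistent copy has been taken. */
    Map<String, String> snapshot();

    /** Non-blocking snapshot; by default runs the blocking one on another thread. */
    default CompletableFuture<Map<String, String>> snapshotAsync() {
        return CompletableFuture.supplyAsync(this::snapshot);
    }
}

// Toy in-memory implementation; synchronized so a snapshot never sees a partial write.
final class InMemorySketchStore implements SnapshotCapableStore {
    private final Map<String, String> data = new TreeMap<>();

    @Override
    public synchronized void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public synchronized Map<String, String> snapshot() {
        return new TreeMap<>(data);
    }
}

// Stand-in for a wrapper such as ElementTrackingStore: the caller only sees the
// interface, so both snapshot flavors must be part of it to be reachable.
final class WrappingSketchStore implements SnapshotCapableStore {
    private final SnapshotCapableStore delegate;

    WrappingSketchStore(SnapshotCapableStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public void put(String key, String value) {
        delegate.put(key, value);
    }

    @Override
    public Map<String, String> snapshot() {
        return delegate.snapshot();
    }

    @Override
    public CompletableFuture<Map<String, String>> snapshotAsync() {
        return delegate.snapshotAsync();
    }
}
```

   The point of the sketch is only that a caller holding the wrapper can pick 
sync or async without knowing which store sits underneath.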
   
   To add some context on this: previously (in internal review) I proposed 
snapshotting the underlying LevelDB (archiving the directory would just work) 
for the LevelDB KVStore implementation, and it was suggested that I find a way 
to support snapshotting for all implementations of KVStore. That's why the 
current snapshot mechanism is based on the KVStore interface. As long as we 
respect the format of the snapshot file, sync and async snapshots would be 
compatible, but in the same spirit, ideally we should support both snapshot 
approaches smoothly at the KVStore interface level.

