baohe-zhang opened a new pull request #28412:
URL: https://github.com/apache/spark/pull/28412


   ### What changes were proposed in this pull request?
   Add a new class, HybridKVStore, to make the history server faster when 
loading event log files. When rebuilding the application state from event 
logs, HybridKVStore first writes data to an in-memory store, while a 
background thread keeps pushing the changes to LevelDB.
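
The write path described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the code in this PR: the class name `HybridStoreSketch`, its methods, and the dict-like `disk_store` standing in for LevelDB are all hypothetical.

```python
import queue
import threading


class HybridStoreSketch:
    """Illustrative only: write to memory first, copy to disk in the background."""

    def __init__(self, disk_store):
        self._memory = {}               # fast in-memory store used while parsing
        self._disk = disk_store         # hypothetical stand-in for LevelDB (dict-like)
        self._pending = queue.Queue()   # writes not yet pushed to the disk store
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def write(self, key, value):
        # The caller only waits for the in-memory write.
        self._memory[key] = value
        self._pending.put((key, value))

    def read(self, key):
        return self._memory[key]

    def _drain(self):
        # Background thread: keep pushing queued changes to the disk store.
        while True:
            item = self._pending.get()
            if item is None:            # sentinel: no more writes will arrive
                break
            key, value = item
            self._disk[key] = value

    def close(self):
        # Flush the remaining writes, then stop the background thread.
        self._pending.put(None)
        self._writer.join()
```

In this sketch, event parsing never blocks on the slower store; only `close()` waits for the background thread to catch up, which loosely corresponds to the "switch to leveldb" step measured below.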
   
   ### Why are the changes needed?
   HybridKVStore can significantly reduce event log loading time, 
especially for large log files. The table below shows the results of tests I 
ran on macOS.
   
   kvstore type / log size | 100m | 200m | 500m | 1g | 2g
   -- | -- | -- | -- | -- | --
   HybridKVStore | 5s to parse, 7s (including the parsing time) to switch to LevelDB | 6s to parse, 10s to switch to LevelDB | 15s to parse, 23s to switch to LevelDB | 23s to parse, 40s to switch to LevelDB | 37s to parse, 73s to switch to LevelDB
   LevelDB | 12s to parse | 19s to parse | 43s to parse | 69s to parse | 124s to parse
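
To make the comparison easier to read, the speedups implied by the table can be computed as below. The numbers are copied from the table. The variable names are illustrative, and the second ratio assumes that the "switch to leveldb" figure includes parsing time in every column, which the table states only for the 100m case.

```python
# Times in seconds, copied from the table above.
sizes = ["100m", "200m", "500m", "1g", "2g"]
hybrid_parse = [5, 6, 15, 23, 37]
hybrid_switch = [7, 10, 23, 40, 73]   # assumed to include parsing time
leveldb_parse = [12, 19, 43, 69, 124]


def speedups(baseline, candidate):
    """Return baseline/candidate ratios, rounded to one decimal place."""
    return [round(b / c, 1) for b, c in zip(baseline, candidate)]


# How much sooner the parsed data is available in memory.
parse_speedup = speedups(leveldb_parse, hybrid_parse)
# How much sooner the data is fully available in LevelDB.
switch_speedup = speedups(leveldb_parse, hybrid_switch)

for size, p, s in zip(sizes, parse_speedup, switch_speedup):
    print(f"{size}: parse {p}x faster, switch to LevelDB {s}x faster")
```

With these figures, parsing is roughly 2.4x to 3.4x faster than with LevelDB alone, and the data is fully in LevelDB roughly 1.7x to 1.9x sooner.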
   
   
   
   ### Does this PR introduce any user-facing change?
   This PR adds a new config "spark.history.store.hybridKVStore.enabled". 
   
   ### How was this patch tested?
   A test suite for HybridKVStore is added. I also tested it manually on 
3.0.1 on macOS.
   
   This is a follow-up for the work done by Hieu Huynh in 2019.

