baohe-zhang commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-639668009


   I compared HybridStore, InMemoryStore, and LevelDB for parsing large log
files. The environment is macOS, Spark 3.1.0.
   
   | Store / log file size | 250 MB (SparkPi with 100000 iterations) | 1.2 GB (SparkPi with 500000 iterations) | 2.3 GB (SparkPi with 900000 iterations) |
   | --------------------- | --------------------------------------- | --------------------------------------- | --------------------------------------- |
   | HybridStore | 8s to parse, 16s to dump data to LevelDB | 33s to parse, 16s to dump data to LevelDB | 68s to parse, 15s to dump data to LevelDB |
   | InMemoryStore | 8s to parse | 34s to parse | 68s to parse |
   | LevelDB | 28s to parse | 187s to parse | 434s to parse |
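
To make the gap easier to read, here is a small sketch that derives the LevelDB-to-HybridStore parse-time ratio from the numbers in the table above. The dictionary layout and the helper name `leveldb_slowdown` are only illustrative and not part of the PR; the dump-to-LevelDB time is left out because the table reports it separately from parsing.

```python
# Parse times in seconds, copied from the table above.
parse_seconds = {
    "250 MB": {"HybridStore": 8, "InMemoryStore": 8, "LevelDB": 28},
    "1.2 GB": {"HybridStore": 33, "InMemoryStore": 34, "LevelDB": 187},
    "2.3 GB": {"HybridStore": 68, "InMemoryStore": 68, "LevelDB": 434},
}


def leveldb_slowdown(times):
    """Return the LevelDB parse time divided by the HybridStore parse time."""
    return times["LevelDB"] / times["HybridStore"]


# Ratio per log file size, rounded to one decimal place.
slowdowns = {size: round(leveldb_slowdown(t), 1) for size, t in parse_seconds.items()}

for size, ratio in slowdowns.items():
    print(f"{size}: LevelDB parse takes {ratio}x as long as HybridStore")
```

With these numbers the ratios come out to about 3.5x, 5.7x, and 6.4x, so on this one machine the gap grows with log file size.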




