baohe-zhang commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-653110422
Hi @HeartSaVioR @tgravescs , I measured the memory and disk usage for
a 1.21 GB log file and for logs of the same application compressed with different
codecs. The logs were generated by Spark 3 and parsed by the Spark 3 SHS. The application
contains 400 jobs; each job contains one stage, and each stage contains 1000 tasks.
| codec | uncompressed | lz4 | lzf | snappy | zstd |
| ----- | ------------ | --- | --- | ------ | ---- |
| log filesize | 1.21 gb | 108 mb | 128 mb | 136 mb | 40 mb |
| actual memory usage (measured through Utils.SizeEstimator) | 254.8 mb | 252.1 mb | 260.5 mb | 256.4 mb | 279.2 mb |
| estimated memory usage (log size / 2 for uncompressed log, log size \* 2 for compressed log) | 605 mb | 216 mb | 256 mb | 272 mb | 80 mb |
| disk usage (leveldb filesize) | 393 mb | 398 mb | 403 mb | 395 mb | 424 mb |
From these results, it seems we are overestimating the memory usage of uncompressed
files and underestimating the memory usage of zstd-compressed files. I think
filesize / 4 for uncompressed logs and filesize \* 4 for zstd-compressed logs might
be a better estimation.