[ https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196076#comment-17196076 ]
Stephen O'Donnell commented on HDDS-3630:
-----------------------------------------

Have a look at HDDS-4246 - it seems there is only one 8 MB cache shared by all RocksDBs related to datanode containers.

Looking at the RocksDB manual, one key memory user is the "write buffer size": https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#write-buffer-size

{quote}
It represents the amount of data to build up in memory (backed by an unsorted log on disk) before converting to a sorted on-disk file. The default is 64 MB. You need to budget for 2 x your worst case memory use. If you don't have enough memory for this, you should reduce this value. Otherwise, it is not recommended to change this option.
{quote}

It seems to me this default of 64 MB is set up for "high write throughput", which is probably the usual use case for RocksDB. However, for datanode containers I doubt RocksDB is really stressed, especially for closed containers. What if we:

1. Reduced this value significantly - e.g. to 1 MB?
2. Reduced it significantly for only closed containers?

There are also some other interesting RocksDB options. You can configure a "Write Buffer Manager" and give it a target size for the write buffers of all RocksDB instances / column families, so that all open instances share that budget. You can also make it part of the LRU cache: https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager

You can have the index and filter blocks cached in the LRU cache too, via the cache_index_and_filter_blocks option.

Therefore, if we created a large shared LRU cache, used a shared Write Buffer Manager which stores the memtables inside that LRU cache, and also cached the index and filter blocks there, perhaps we could constrain the RocksDB memory within reasonable bounds (a rough sketch of such a configuration is at the end of this message).

It would be good to experiment with some of these options before jumping into a major refactor to use a single RocksDB per disk or other major changes.

> Merge rocksdb in datanode
> -------------------------
>
>                 Key: HDDS-3630
>                 URL: https://issues.apache.org/jira/browse/HDDS-3630
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: Merge RocksDB in Datanode-v1.pdf, Merge RocksDB in Datanode-v2.pdf
>
> Currently, there is one RocksDB per container, and one container has 5 GB capacity, so 10 TB of data needs more than 2000 RocksDB instances on one datanode. It is difficult to limit the memory of 2000 RocksDB instances, so maybe we should limit the number of RocksDB instances per disk.
> The design of the improvement is in the link below, but it is still a draft.
> TODO:
> 1. Compatibility with the current logic, i.e. one RocksDB for each container
> 2. Measure the memory usage before and after the improvement
> 3. Effect on the efficiency of reads and writes
> https://docs.google.com/document/d/18Ybg-NjyU602c-MYXaJHP6yrg-dVMZKGyoK5C_pp1mM/edit#
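To make the idea above more concrete, here is a rough, untested sketch of how those options could be wired together with the RocksDB Java API (org.rocksdb). The class name, the cache capacity, the write buffer budget, the 1 MB per-instance write buffer size and the DB path are placeholder values chosen only for illustration, not recommendations; the real change would need to share the single LRUCache and WriteBufferManager objects across all container DB instances on a datanode.

{code:java}
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBufferManager;

public class SharedRocksDbMemorySketch {

  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();

    // One LRU block cache shared by every container RocksDB on the datanode.
    // 256 MB is an arbitrary illustration value, not a recommendation.
    final Cache sharedCache = new LRUCache(256L * 1024 * 1024);

    // Charge all memtables against the same cache, with a 128 MB overall
    // budget, so write buffers and block cache compete for one memory pool.
    final WriteBufferManager writeBufferManager =
        new WriteBufferManager(128L * 1024 * 1024, sharedCache);

    // Table options: keep index and filter blocks inside the shared LRU
    // cache rather than in unbounded heap outside of it.
    final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
        .setBlockCache(sharedCache)
        .setCacheIndexAndFilterBlocks(true)
        .setPinL0FilterAndIndexBlocksInCache(true);

    // Per-instance options; every container DB would reuse the two shared
    // objects above. write_buffer_size is dropped from the 64 MB default to
    // 1 MB, on the theory that closed containers see almost no writes.
    final Options options = new Options()
        .setCreateIfMissing(true)
        .setWriteBufferManager(writeBufferManager)
        .setTableFormatConfig(tableConfig)
        .setWriteBufferSize(1L * 1024 * 1024);

    // Placeholder path - in Ozone this would be the container metadata DB.
    try (RocksDB db = RocksDB.open(options, "/tmp/container-db-sketch")) {
      db.put("key".getBytes(), "value".getBytes());
    }
  }
}
{code}

With something along these lines, every container DB charges its memtables, index/filter blocks and data blocks against the single shared cache, so the total RocksDB memory on a datanode should stay roughly within the cache capacity plus the write buffer budget, regardless of the number of containers.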