elek commented on issue #726: HDDS-3267. replace ContainerCache in BlockUtils by LoadingCache URL: https://github.com/apache/hadoop-ozone/pull/726#issuecomment-610966771 Thanks the patch @esahekmat, I am trying to understand it, but to be honest I am not very familiar with this code path. I found this comment from @fapifta which was referenced by the Jira: > LoadingCache is a guava utility, it provides and LRU cache, and provides facilities to create the instances that are cached, and also guava provides facilities to configure how the cache works. What we need to consider before replacing is that why the ContainerCache is implemented in a way where you can not evict a DB instance from the cache if there is a reference for it. Looking at all the usages of BlockUtils getDb and removeDB methods, all the cached instances are accessed for just short term, and even though creating an instance might be costly, with the default 1024 cache size if the LoadingCache evicts the least recently used item when the cache is full, we do not suffer a huge perf hit, while in the current implementation if the cache is full, then we get an exception, so I would definitely trade this off. > With LoadingCache which just have a size restriction, we can have the same cache functionality, if we get rid of reference counting which I think we do not need, as we use the instances just for short periods of time, so most of the time they are in the cache with a zero reference count, and can be evicted. If we still want to ensure that in extreme cases when the cache is pressured, and all items in the cache are referenced, we can use weakValues() in the guava cache to prevent eviction in this case, but this does not seem to be relevant based on the usage of the cached items. Based on my understanding (but fix me if I am wrong) the reference count is used to check if the DB is used by anybody at the moment. The reference count is increased when the db is retrieved from the cache and decreased at the auto closable block. If I understand well, this is removed in this patch. Let's say we have one code block which starts to use the DB but after a very short time the DB is evicted from the cache (which also means that it will be closed). The code block still tries to use the db reference and an exception will be thrown. With LRU cache there is a very low chance to this, unless we have more open DBs at the same time than the limit. Which means that DB can be evicted after a get. It is a possible problem? What do you think?
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
