[kudu-CR] KUDU-613: Introduce SLRU cache

Mahesh Reddy (Code Review) Thu, 16 May 2024 14:21:11 -0700

Mahesh Reddy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20607 )


Change subject: KUDU-613: Introduce SLRU cache
......................................................................


Patch Set 18:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/20607/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20607/16//COMMIT_MSG@22
PS16, Line 22: s and 2.6GHz
> It would be nice to mentions the following details:
This was a DEBUG build run locally on macOS. 6 cores and 2.6 GHz. I'll try to 
run it on a release build on a linux machine, but having some issues with 
va1022 right now.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache-bench.cc
File src/kudu/util/cache-bench.cc:

http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache-bench.cc@92
PS16, Line 92:       ret += " SLRU";
> nit for here and elsewhere: separate variable and the operator by space
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/cache.h
File src/kudu/util/cache.h:

http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/cache.h@520
PS17, Line 520:   // Returns true if probationary segment contains key.
              :   // Returns false if not.
              :   // Only used for SLRU cache.
              :   virtual bool ProbationaryContains(const Slice& key) = 0;
              :
              :   // Returns true if protected segment contains key.
              :   // Returns false if not.
              :   // Only used for SLRU cache.
              :   virtual bool ProtectedContains(const Slice& key) = 0;
> SLRUCacheShard is the right place for these, even without any extras, I don
I guess the issue stems from returning a pointer to Cache rather than 
ShardedSLRUCache. If we return a pointer to Cache, then all methods used in the 
SLRU cache will have to be defined in the base class and overridden in the 
appropriate derived class. If we return a pointer to ShardedSLRUCache, then we 
don't have to define all these methods in the base class, but then the 
integration with block_cache and the cache tests is more difficult as they all 
accept a pointer to Cache for the old LRU implementation, but they don't accept 
a pointer to the new ShardedSLRUCache.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h
File src/kudu/util/cache.h:

http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h@58
PS16, Line 58:
> nit: consider either making this part of the comment more generic (as it sh
Done


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h@91
PS16, Line 91: in_protected_segm
> nit: should be rather named 'in_protected_segment'?
Done


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/cache.h@116
PS16, Line 116: Sanitize
> style nit: since this isn't an accessor-like method, its name should be cap
Done


http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/cache.h
File src/kudu/util/cache.h:

http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/cache.h@228
PS14, Line 228:     // ensure this is safe, e.g. by destructing the entity that 
owned the
              :     // original metrics.
              :     kReset,
              :   };
              :   // Set the cache metrics to update corresponding counters 
accordingly.
              :   virtual void SetMetrics(std::unique_ptr<CacheMetrics> metrics,
              :                           ExistingMetricsPolicy metrics_policy) 
= 0;
              :
              :   // Sets probationary and protected segments metrics to update 
corresponding counters accordingly.
              :   //
              :   // An upgrade of one entry will increment the number of 
evictions from the probationary segment by
              :   // one and the number of insertions into the protected 
segment by one.
              :   //
              :   // A downgrade of one entry will increment the number of 
evictions from the protected segment by
              :   // one and the number of insertions into the probationary 
segment by one.
              :   //
              :   // To calculate the number of inserts for the SLRU cache, 
subtract the number of evictions
              :   // of the protected segment from the number of inserts of the 
probationary segment.
              :   //
> OK.  But even with that, instead of polluting the interface of the base Cac
SetMetrics() is used in other areas of the code that use a cache (file_cache, 
ttl_cache), so that can't be removed from the Cache interface. Similar to my 
comment from ProbationaryContains and ProtectedContains, this has to do with 
returning a pointer to Cache instead of ShardedSLRUCache so all methods will 
have to be defined here.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/nvm_cache.h
File src/kudu/util/nvm_cache.h:

http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/nvm_cache.h@46
PS16, Line 46:   Slice key() const {
             :     return Slice(kv_data, key_length);
             :   }
> Is this really needed?  I didn't see NVM-based implementation of SLRU cache
Removed it for now, a follow up patch with the NVM-based implementation of SLRU 
cache will include this instead.


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache-test.cc
File src/kudu/util/slru_cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache-test.cc@413
PS17, Line 413: evicted to probationary segment
> If the probationary segment doesn't have enough free space for this new ent
Yep that's right. I guess the entry being upgraded from the probationary 
segment will create some space in the probationary segment, but if the entry 
being evicted from the protected segment is larger than the entry being 
upgraded, then more entries from the probationary segment will have to be 
evicted to create space. Added a test case to test for this behavior.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.h
File src/kudu/util/slru_cache.h:

http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.h@40
PS16, Line 40: NewSLRUCache
> Instead, is it possible to use the same approach as in cache.cc where it's
I guess then the question is what are the template parameters? It makes sense 
in cache.cc since there are multiple possible eviction policies (FIFO and LRU). 
This file is just for DRAM and for SLRU eviction policy so I don't think 
specializing this for template parameters makes sense since there's only one 
combination of parameters. Let me know if you think otherwise or if there are 
other benefits to doing this.


http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/slru_cache.h
File src/kudu/util/slru_cache.h:

http://gerrit.cloudera.org:8080/#/c/20607/14/src/kudu/util/slru_cache.h@40
PS14, Line 40: Cache
> ShardedSLRUCache inherits from Cache, so the returned ShardedSLRUCache* cou
Tried to make this change, but the cache in block_cache is a unique pointer to 
Cache. Given the old LRU cache returns Cache*, it makes the most sense to keep 
this as Cache* as it doesn't accept ShardedSLRUCache*.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc
File src/kudu/util/slru_cache.cc:

http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@57
PS16, Line 57: // A single shard of s
> Did you consider using the CacheShard template, templatized by EvictionPoli
It's definitely possible, however, the cache file would be even more polluted 
with SLRUCacheShard specific methods (ProtectedInsert(), 
InsertAndReturnEvicted(), ReInsert()). I also removed the use of the mutex in 
the SLRUCacheShard level methods since the ShardPair has a mutex now, so all 
methods on the shard level with a mutex would need a new method just for 
SLRUCacheShard.

So it's definitely possible, but it would pollute the cache.cc file even more. 
Let me know if you think it's worth the tradeoff.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@254
PS16, Line 254:
> What guarantees do we have to be sure that 'table_' cannot be modified unde
Only SLRUCacheShardPair::Lookup() calls this method, and within it everything 
is protected by a mutex. So until that mutex is released, no other ShardPair 
method can access the underlying shards and thus the underlying table within 
each shard. I'll add a note about the use of the mutex in the pair level 
methods.


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@267
PS16, Line 267:   return reinterpret_cast<Cache
> ditto: what if Insert() is running concurrently and modifying the 'table_'
Its only usage in the slru_cache.cc is protected by a mutex, similar to the 
explanation above where it's used within a ShardPair method that's protected by 
a mutex. Other than that, it's only used in unit tests through 
ProbationaryContains and ProtectedContains..


http://gerrit.cloudera.org:8080/#/c/20607/16/src/kudu/util/slru_cache.cc@401
PS16, Line 401:   }
              : }
> nit here and below: define 'e' along with assigning the value at the same l
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc
File src/kudu/util/slru_cache.cc:

http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@57
PS17, Line 57: ingle shard of
> Does it make sense to explicitly forbid copying and assigning of this class
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@273
PS17, Line 273: }
              :
> nit: could these be joined, adding a comment on what's the intention of thi
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@434
PS17, Line 434: on_callback);
> Why to pass this as a parameter instead of having it as a constant for SLRU
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@441
PS17, Line 441:
              :  private:
> What's the reason of wrapping instances of SLRUCacheShard into std::unique_
Good point, removed the use of unique pointers here.


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@460
PS17, Line 460:                                        
probationary_shard_(SLRUCacheShard(mem_tracker,
              : 
> This isn't needed.
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@471
PS17, Line 471: //
> What is the purpose of this extra scope?
Removed it


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@490
PS17, Line 490: seg
> nit: delete 'and'
Done


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@501
PS17, Line 501:   C
> What is the purpose of this extra scope?
Removed it


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@543
PS17, Line 543:   } else {
              :     protected_shard_.Release(handle);
              :   }
              : }
              :
              : void SLRUCacheShardPair::Erase(const Slice& key, uint32_t hash) 
{
              :
> What guards against the following scenarios:
I don't think having the atomic bool protects us from the scenarios you 
described.

The only shard agnostic behavior happens when the lookup handle in thread A is 
the last reference to the entry. Since in both scenarios you described thread B 
calls Lookup() thus adding another reference to the entry, both the contents 
and the metrics of the SLRU cache will stay consistent.


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@552
PS17, Line 552: }
> What is the purpose of this extra scope?
Removed it


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@561
PS17, Line 561: boo
> What is the purpose of this extra scope?
Removed it


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@581
PS17, Line 581:
> nit:
I'm not sure I understand the proposed suggestion here. A clarification would 
be great, thanks!


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@601
PS17, Line 601:     }
              :     shards_.clear();
              :   }
              :
              :   UniquePendingHandle Allocate(Slice key, int val_len, int 
charge) override {
              :     int key_len = key.size();
              :     D
> Is this really needed?  Wouldn't the contained unique_ptr be destroyed auto
The mem tracker doesn't get properly updated unless we explicitly clear each 
pointer to shard within the vector.


http://gerrit.cloudera.org:8080/#/c/20607/17/src/kudu/util/slru_cache.cc@694
PS17, Line 694:   }
              :
> nit: could these lines be joined?
Done



--
To view, visit http://gerrit.cloudera.org:8080/20607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45531534a2049dd38c002f4dc7982df9fd46e0bb
Gerrit-Change-Number: 20607
Gerrit-PatchSet: 18
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 16 May 2024 21:17:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-613: Introduce SLRU cache

Reply via email to