Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22132 )

Change subject: KUDU-613: Don't modify refs when moving entries
......................................................................


Patch Set 5: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22132/4/src/kudu/util/slru_cache.cc
File src/kudu/util/slru_cache.cc:

http://gerrit.cloudera.org:8080/#/c/22132/4/src/kudu/util/slru_cache.cc@346
PS4, Line 346:
> This is an interesting idea. Instead of having the loop on the ShardPair le
Yes, certainly -- that's an idea for a follow-up patch.  I'm not suggesting to 
address that in this patch, no.

As for the extra complexity -- I was talking about some logic that would chop 
off elements in the input container when it's clean they aren't fitting into 
the cache due to the usage vs capacity, that's it :)


http://gerrit.cloudera.org:8080/#/c/22132/4/src/kudu/util/slru_cache.cc@482
PS4, Line 482: void SLRUCacheShardPair::Release(Handle* handle) {
             :   SLRUHandle* e = reinterpret_cast<SLRUHandle*>(handle);
             :
             :   // Release from either the probationary or the protected shard.
             :   if (!e->in_protected_segment.load(std::memory_order_relaxed)) {
             :
> IIUC, your question has more to do with which segment the handle is being r
Maybe, in a follow-up patch you could add a comment here to describe that it's 
guaranteed that an entry is freed and removed from the cache regardless of the 
'in_protected_segment' concurrency -- that would help at least to understand 
that the only inconsistency here might be in updating metrics, so it's not a 
major issue and it doesn't lead to leaking memory and cache handles.

As for ways to deal with the race condition here, I don't have a recipe yet.  
Probably, we might consider something like CAS approach by keeping two 32-bit 
counters in an atomic 64 bit integer.  Now, here we could do compare_exchange 
with incrementing both counters by one, and proceed with updating the metrics 
only in case when we knew that were us who updated the counter from the value 
read prior atomic update.  The difference between two 32-bit halves of the 
64-bit number would give us an idea whether it was in protected or in the 
probational segment -- all those settings would increment the first or the 
second part of the 64-bit number by two.  We'd start with {0, 1} that means the 
element's in the probational segment (when it went into the protected segment, 
it'd become {2, 1}).

Also, I suspect that this SLRUCacheShardPair::Release() method might be called 
concurrently with the cache handle migrating from protected to the probational 
segment only, but not vice versa.  If we can confirm that, that should be 
enough to come up with a simpler way of dealing with this race condition, I 
guess.



--
To view, visit http://gerrit.cloudera.org:8080/22132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I643907612d43eba2c5f8dbc19d2f74f88bbca869
Gerrit-Change-Number: 22132
Gerrit-PatchSet: 5
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Fri, 06 Dec 2024 08:34:17 +0000
Gerrit-HasComments: Yes

Reply via email to