Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22132 )
Change subject: KUDU-613: Don't modify refs when moving entries ...................................................................... Patch Set 5: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/22132/4/src/kudu/util/slru_cache.cc File src/kudu/util/slru_cache.cc: http://gerrit.cloudera.org:8080/#/c/22132/4/src/kudu/util/slru_cache.cc@346 PS4, Line 346: > This is an interesting idea. Instead of having the loop on the ShardPair le Yes, certainly -- that's an idea for a follow-up patch. I'm not suggesting to address that in this patch, no. As for the extra complexity -- I was talking about some logic that would chop off elements in the input container when it's clean they aren't fitting into the cache due to the usage vs capacity, that's it :) http://gerrit.cloudera.org:8080/#/c/22132/4/src/kudu/util/slru_cache.cc@482 PS4, Line 482: void SLRUCacheShardPair::Release(Handle* handle) { : SLRUHandle* e = reinterpret_cast<SLRUHandle*>(handle); : : // Release from either the probationary or the protected shard. : if (!e->in_protected_segment.load(std::memory_order_relaxed)) { : > IIUC, your question has more to do with which segment the handle is being r Maybe, in a follow-up patch you could add a comment here to describe that it's guaranteed that an entry is freed and removed from the cache regardless of the 'in_protected_segment' concurrency -- that would help at least to understand that the only inconsistency here might be in updating metrics, so it's not a major issue and it doesn't lead to leaking memory and cache handles. As for ways to deal with the race condition here, I don't have a recipe yet. Probably, we might consider something like CAS approach by keeping two 32-bit counters in an atomic 64 bit integer. Now, here we could do compare_exchange with incrementing both counters by one, and proceed with updating the metrics only in case when we knew that were us who updated the counter from the value read prior atomic update. The difference between two 32-bit halves of the 64-bit number would give us an idea whether it was in protected or in the probational segment -- all those settings would increment the first or the second part of the 64-bit number by two. We'd start with {0, 1} that means the element's in the probational segment (when it went into the protected segment, it'd become {2, 1}). Also, I suspect that this SLRUCacheShardPair::Release() method might be called concurrently with the cache handle migrating from protected to the probational segment only, but not vice versa. If we can confirm that, that should be enough to come up with a simpler way of dealing with this race condition, I guess. -- To view, visit http://gerrit.cloudera.org:8080/22132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I643907612d43eba2c5f8dbc19d2f74f88bbca869 Gerrit-Change-Number: 22132 Gerrit-PatchSet: 5 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Comment-Date: Fri, 06 Dec 2024 08:34:17 +0000 Gerrit-HasComments: Yes
