iamaleksey commented on PR #4173: URL: https://github.com/apache/cassandra/pull/4173#issuecomment-3069678346
The single partition read path looks valid to me. Some of my notes on it, for future reference and follow up PRs: 1. With the simplified reconciliation logic, the `TrackedLocalReadCoordinator` will only be getting a single event upon the end of reconciliation: with all the potentially relevant mutation ids from other replicas, missing from the data-reading instance. With that logic, the single partition case here can be significantly simplified. We take the initial summary, we make the storage iterator, we take the secondary summary to capture any local races. We wait for reconciliation to finish and grab the augmenting mutation ids that we need to apply (with the mutations themselves already in our log). Now, from the delta between secondary summary and initial summary (most often null), and the augmenting mutation ids, we can build the augmenting `SimpleBTreePartition` all in one go, and concat the two iterators. No intermediate state and steps necessary, with simpler logic and GC-friendlier behaviour. 2. In the current PR implementation, we unconditionally "freeze" the memtable partition and sstable references of the initial single partition read. This simplifies the implementation, so for now I think this is the behaviour to keep. What we discussed initially, though, and maybe something we should implement in the future, is to only do this for the SRP path. Do not hold on to the immutable memtable partitions and allow them to be GCd when heavy overwrites occur. Assume by default that SRP will be rare. And only if we detect the possibility of a short read after reconciliation, read with this partition-freezing behaviour as a retry read. Either have a flag that allows to switch defaults, or make the decision based on metrics, but long-term, I think, we should probably default to cheaper, optimistic protocol as the initial read. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org