iamaleksey commented on PR #4173:
URL: https://github.com/apache/cassandra/pull/4173#issuecomment-3069678346

   The single partition read path looks valid to me. Some of my notes on it, 
for future reference and follow up PRs:
   
   1. With the simplified reconciliation logic, the 
`TrackedLocalReadCoordinator` will only be getting a single event upon the end 
of reconciliation: with all the potentially relevant mutation ids from other 
replicas, missing from the data-reading instance. With that logic, the single 
partition case here can be significantly simplified. We take the initial 
summary, we make the storage iterator, we take the secondary summary to capture 
any local races. We wait for reconciliation to finish and grab the augmenting 
mutation ids that we need to apply (with the mutations themselves already in 
our log). Now, from the delta between secondary summary and initial summary 
(most often null), and the augmenting mutation ids, we can build the augmenting 
`SimpleBTreePartition` all in one go, and concat the two iterators. No 
intermediate state and steps necessary, with simpler logic and GC-friendlier 
behaviour.
   2. In the current PR implementation, we unconditionally "freeze" the 
memtable partition and sstable references of the initial single partition read. 
This simplifies the implementation, so for now I think this is the behaviour to 
keep. What we discussed initially, though, and maybe something we should 
implement in the future, is to only do this for the SRP path. Do not hold on to 
the immutable memtable partitions and allow them to be GCd when heavy 
overwrites occur. Assume by default that SRP will be rare. And only if we 
detect the possibility of a short read after reconciliation, read with this 
partition-freezing behaviour as a retry read. Either have a flag that allows to 
switch defaults, or make the decision based on metrics, but long-term, I think, 
we should probably default to cheaper, optimistic protocol as the initial read.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org

Reply via email to