belliottsmith commented on PR #3426: URL: https://github.com/apache/cassandra/pull/3426#issuecomment-2316946511
So, I only have one minor concern, and that's the unmark stale operation. It simply removes a replica from the stale collection without first checking whether the replica actually is stale. I'm not sure this is a problem, and it might be scope creep to introduce the work necessary to _stop_ a replica being stale. But we should briefly consider our roadmap for safely migrating a replica from stale to not stale.

_Ideally_ the operator would have an unmark stale operation that tells the other replicas to stop cleaning up state until this replica has caught up, which is what this achieves. But equally, we might want the replica to remain marked stale in some way until we know that has happened. For example, we could wait until a round of durability scheduling has run successfully on the previously stale replica, confirming it is no longer stale; otherwise the operator might need to run a repair.

Which raises a question: do we want more than one stale flag? For example, one flag could control whether a replica is included in durability scheduling, and another could control whether the replica can be treated as healthy by other replicas for querying and by the operator for availability.
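To make the two-flag idea concrete, here is a minimal sketch of one way the split could behave. It assumes that "unmark stale" only re-includes the replica in durability scheduling, and that a later successful durability round is what clears the "not healthy" flag. The class, method names and `String` replica ids are hypothetical and do not come from the Cassandra or Accord codebase.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of two independent "stale" flags per replica.
// None of these names exist in Cassandra/Accord; replica ids are plain Strings.
final class StaleReplicaTracker
{
    // Replicas that durability scheduling skips; while a replica is in this set,
    // other replicas may clean up state without waiting for it.
    private final Set<String> excludedFromDurability = new HashSet<>();

    // Replicas that other replicas (for querying) and the operator (for
    // availability) must not yet treat as healthy.
    private final Set<String> notHealthy = new HashSet<>();

    /** Operator marks a replica stale: both flags are set. */
    void markStale(String replica)
    {
        excludedFromDurability.add(replica);
        notHealthy.add(replica);
    }

    /**
     * Operator unmarks a replica: it rejoins durability scheduling, so other
     * replicas stop cleaning up state it may still need, but it stays unhealthy.
     * Returns false and changes nothing if the replica was not currently
     * excluded from durability scheduling.
     */
    boolean unmarkStale(String replica)
    {
        return excludedFromDurability.remove(replica);
    }

    /**
     * Called after a round of durability scheduling succeeds on the replica.
     * Only a replica that has already been unmarked is promoted back to healthy.
     */
    void onDurabilityRoundSucceeded(String replica)
    {
        if (!excludedFromDurability.contains(replica))
            notHealthy.remove(replica);
    }

    boolean includedInDurabilityScheduling(String replica)
    {
        return !excludedFromDurability.contains(replica);
    }

    boolean isHealthy(String replica)
    {
        return !notHealthy.contains(replica);
    }
}
```

With this split, unmarking alone never makes a replica look healthy; only evidence that it has caught up does. Having `unmarkStale` report whether anything changed also surfaces the case of unmarking a replica that was never stale, instead of silently ignoring it.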

