David Ribeiro Alves has posted comments on this change.

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................


Patch Set 24:

I chased down the bug. It's the following. Say node A is the leader at term 10 
and the current TSK seq no is 0.
1 - A starts CatalogManagerBgTasks::Run(), which runs since it's the leader, but 
takes a while to actually get to the part where TryGenerateNewTskUnlocked() is 
called.
2 - In the meanwhile A loses leadership, B takes over and generates TSK 1, and 
later TSK 2.
3 - B loses leadership, A wins it again.
4 - Before A gets a chance to run the "leader election callback", the bg task 
from step 1 completes (it can because A is leader again). The TSK that gets 
written has seq no 1 (A still thinks the current seq no is 0), breaking 
monotonicity.
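
To make the interleaving in steps 1 to 4 concrete, here is a minimal 
standalone sketch. It is not Kudu code: TskStore, Master, BgTaskSnapshot, 
TryGenerateUnguarded(), TryGenerateGuarded() and ReplayInterleaving() are all 
hypothetical names, and the terms 11 and 12 are assumed for illustration. The 
"guarded" variant shows one possible mitigation (drop the write if the term 
changed since the task started); it is an assumption, not necessarily what this 
patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the replicated table that stores TSK records.
struct TskStore {
  std::vector<int64_t> written_seq_nos;  // sequence numbers in write order

  void Write(int64_t seq_no) { written_seq_nos.push_back(seq_no); }

  // True if every written seq no is strictly greater than the previous one.
  bool IsMonotonic() const {
    for (std::size_t i = 1; i < written_seq_nos.size(); ++i) {
      if (written_seq_nos[i] <= written_seq_nos[i - 1]) return false;
    }
    return true;
  }
};

// Hypothetical per-master state. In this sketch the cached seq no is only
// refreshed by the leader election callback, which never runs here.
struct Master {
  int64_t cached_tsk_seq_no = 0;
  int64_t current_term = 0;
  bool is_leader = false;
};

// What the background task remembers from the moment it started (step 1).
struct BgTaskSnapshot {
  int64_t term_at_start;
};

// Unguarded variant: only checks "am I leader right now?" before writing.
bool TryGenerateUnguarded(const Master& m, TskStore* store) {
  if (!m.is_leader) return false;
  store->Write(m.cached_tsk_seq_no + 1);
  return true;
}

// Guarded variant (assumed mitigation): also require that the term has not
// changed since the background task started.
bool TryGenerateGuarded(const Master& m, const BgTaskSnapshot& snap,
                        TskStore* store) {
  if (!m.is_leader || m.current_term != snap.term_at_start) return false;
  store->Write(m.cached_tsk_seq_no + 1);
  return true;
}

// Replays steps 1 to 4 and returns the resulting store contents.
TskStore ReplayInterleaving(bool guarded) {
  TskStore store;
  Master a;
  a.current_term = 10;
  a.is_leader = true;

  // Step 1: A's background task starts at term 10 but stalls before writing.
  BgTaskSnapshot snap{a.current_term};

  // Step 2: A loses leadership; B (assumed term 11) writes TSK 1, then TSK 2.
  a.is_leader = false;
  store.Write(1);
  store.Write(2);
  assert(store.IsMonotonic());  // B's writes alone are in order

  // Step 3: B loses leadership; A wins again (assumed term 12). The leader
  // election callback has not run yet, so A's cached seq no is still 0.
  a.current_term = 12;
  a.is_leader = true;

  // Step 4: the stalled background task from step 1 finally tries to write.
  if (guarded) {
    TryGenerateGuarded(a, snap, &store);
  } else {
    TryGenerateUnguarded(a, &store);
  }
  return store;
}
```

With ReplayInterleaving(false) the store ends up holding seq nos 1, 2, 1, 
which is the monotonicity break described above. With ReplayInterleaving(true) 
the stale task is rejected and the store holds only 1, 2.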

Note that this is a very contrived scenario: it needs a leadership interleaving 
that is likely unrealistic when TSKs last for days.

-- 
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 24
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No
