David Ribeiro Alves has posted comments on this change.

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................

Patch Set 24:

I chased the bug. It's the following. Say node A is at term 10 and is leader, and the current TSK seq no is 0.

1 - A starts CatalogManagerBgTasks::Run(), which runs since A is leader, but takes a while to actually get to the part where TryGenerateNewTskUnlocked() is called.
2 - In the meantime A loses leadership, B takes over and generates TSK 1, later TSK 2.
3 - B loses leadership, A wins it again.
4 - Before A gets a chance to run the "leader election callback", the bg task from 1 completes (it can because A is leader again). The TSK that gets written is 1, breaking monotonicity.

Note that this is a very contrived scenario: it needs a leadership interleaving that is likely unrealistic when TSKs last for days.

--
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 24
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No
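
For anyone following the thread, the four-step interleaving from the comment can be replayed with a small standalone model. This is only a sketch under stated assumptions, not Kudu code: `TskStore`, `BgTask`, `CompleteTask`, and `RunScenario` are hypothetical names, and the term check used as a guard is one possible mitigation chosen for illustration, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the replicated table that records TSK sequence numbers.
struct TskStore {
  std::vector<int64_t> written_seq_nums;

  void Write(int64_t seq_num) { written_seq_nums.push_back(seq_num); }

  // True if every written sequence number is strictly greater than the previous one.
  bool IsMonotonic() const {
    for (std::size_t i = 1; i < written_seq_nums.size(); ++i) {
      if (written_seq_nums[i] <= written_seq_nums[i - 1]) return false;
    }
    return true;
  }
};

// What the background task remembers from the moment it started (hypothetical).
struct BgTask {
  int64_t start_term;    // term in which the task began running
  int64_t next_seq_num;  // sequence number it decided to write
};

// Completes the task. Without the guard, only "am I leader right now?" is
// checked. With the guard (an assumption of this sketch), the task also
// refuses to write if the term changed since it started.
bool CompleteTask(const BgTask& task, bool is_leader_now, int64_t current_term,
                  bool use_term_guard, TskStore* store) {
  assert(store != nullptr);
  if (!is_leader_now) return false;
  if (use_term_guard && task.start_term != current_term) return false;
  store->Write(task.next_seq_num);
  return true;
}

// Replays steps 1-4 from the comment and returns the resulting store.
TskStore RunScenario(bool use_term_guard) {
  TskStore store;
  int64_t term = 10;
  const int64_t seq_seen_by_a = 0;  // A's view: current TSK seq no is 0

  // Step 1: A's background task starts as leader and picks the next seq no.
  BgTask task{term, seq_seen_by_a + 1};

  // Step 2: A loses leadership; B takes over and generates TSK 1, then TSK 2.
  ++term;
  store.Write(1);
  store.Write(2);

  // Step 3: B loses leadership; A wins it again in a newer term.
  ++term;
  const bool a_is_leader = true;

  // Step 4: the stale task completes before A's leader election callback
  // has refreshed A's view of the latest TSK.
  CompleteTask(task, a_is_leader, term, use_term_guard, &store);
  return store;
}
```

Calling `RunScenario(false)` leaves the store with sequence numbers 1, 2, 1, which is the monotonicity violation described above. Calling `RunScenario(true)` leaves 1, 2, because the stale task notices the term moved from 10 to 12 and skips the write.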
