[ https://issues.apache.org/jira/browse/KUDU-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367473#comment-16367473 ]
Grant Henke commented on KUDU-639:
----------------------------------

Should we close this and open a Jira to track adding an integration test?

> Leader doesn't overwrite demoted follower's log properly
> --------------------------------------------------------
>
>                 Key: KUDU-639
>                 URL: https://issues.apache.org/jira/browse/KUDU-639
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: M4.5
>            Reporter: David Alves
>            Assignee: Todd Lipcon
>            Priority: Minor
>
> We just ran into this situation in the YCSB cluster, which is apparently a
> log divergence.
> We have nodes a, b, c (corresponding to nodes
> 33c8fb1dc4434df0938ccc27ecfd58a1/a1219,
> 4ed2e09f80e04d198edeb53e15b3539e/a1220,
> ab8ed89f9041495a95b8d2b77591c9d7/a1215).
> Node a is leader for term 3, then times out.
> Node b is elected leader for term 5 with votes from b, c.
> When b is elected leader the log state is:
> State: All replicated op: 3.6546, Majority replicated op: 3.6533, Committed
> index: 3.6533, Last appended: 3.6546, Current term: 5
> b never actually replicates anything and eventually loses leadership to node
> a, again.
> When b loses leadership its WAL is at the following state:
> State: All replicated op: 0.0, Majority replicated op: 3.6533, Committed
> index: 3.6533, Last appended: 5.6547, Current term: 5
> That is, b appended a message in term 5 but never actually got to commit it.
> However, if we look at b's log we find a message in term 5 committed:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> 5.6547@99789 REPLICATE CHANGE_CONFIG_OP
> COMMIT 3.6535
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6545
> COMMIT 3.6546
> COMMIT 3.6544
> COMMIT 3.6539
> COMMIT 5.6547
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> And more problematically, that diverges from the other two nodes' logs:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6535
> COMMIT 3.6539
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6544
> 3.6547@99429 REPLICATE WRITE_OP
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> 6.6550@99878 REPLICATE WRITE_OP
> 6.6551@99879 REPLICATE WRITE_OP
> 6.6552@99880 REPLICATE WRITE_OP
> COMMIT 3.6545
> COMMIT 3.6548
> COMMIT 3.6547
> COMMIT 3.6546
> COMMIT 6.6549

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
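
For anyone who wants to locate the divergence point mechanically, the comparison of the two quoted logs can be sketched in Python. This is only an illustration under stated assumptions, not Kudu code: it assumes entries are formatted like the excerpts above ("term.index@timestamp REPLICATE ..."), the helper names (parse_op_id, replicate_terms, first_divergence) are hypothetical, and the two sample lists are abbreviated from the logs quoted in the issue.

```python
from typing import Dict, List, Optional, Tuple


def parse_op_id(text: str) -> Tuple[int, int]:
    """Parse 'term.index', ignoring an optional '@timestamp' suffix."""
    op = text.split("@", 1)[0]
    term, index = op.split(".")
    return int(term), int(index)


def replicate_terms(log_lines: List[str]) -> Dict[int, int]:
    """Map log index -> term for every REPLICATE entry (COMMIT lines are skipped)."""
    terms: Dict[int, int] = {}
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "REPLICATE":
            term, index = parse_op_id(fields[0])
            terms[index] = term
    return terms


def first_divergence(log_a: List[str], log_b: List[str]) -> Optional[int]:
    """Return the lowest shared index whose term differs, or None if none does."""
    terms_a, terms_b = replicate_terms(log_a), replicate_terms(log_b)
    for index in sorted(terms_a.keys() & terms_b.keys()):
        if terms_a[index] != terms_b[index]:
            return index
    return None


# Abbreviated excerpts from the issue: node b's log and the other nodes' log.
node_b = [
    "3.6546@99404 REPLICATE WRITE_OP",
    "COMMIT 3.6533",
    "5.6547@99789 REPLICATE CHANGE_CONFIG_OP",
    "COMMIT 5.6547",
    "3.6548@99430 REPLICATE WRITE_OP",
    "6.6549@99795 REPLICATE CHANGE_CONFIG_OP",
]
others = [
    "3.6546@99404 REPLICATE WRITE_OP",
    "3.6547@99429 REPLICATE WRITE_OP",
    "3.6548@99430 REPLICATE WRITE_OP",
    "6.6549@99795 REPLICATE CHANGE_CONFIG_OP",
]

print(first_divergence(node_b, others))
```

On these excerpts the sketch reports index 6547, where b holds a term 5 entry and the other nodes hold a term 3 entry. An integration test along the lines suggested in the comment could assert that no such index exists once the leader has caught the follower up.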