Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11759 )
Change subject: KUDU-1678: Race during abort of pending operations during raft shutdown
......................................................................

KUDU-1678: Race during abort of pending operations during raft shutdown

When a tablet replica is shutting down, the following race can occur:

0: The replica receives an ALTER_SCHEMA op adding the column 'foo'.
1: The replica receives a WRITE_OP inserting a row with column 'foo' present.
2: The replica starts to abort its pending operations because it is shutting down. The ALTER_SCHEMA is aborted.
3: Before the WRITE_OP can be aborted, it replicates.
4: The tablet server crashes as a result, with a message like:

  F1023 20:49:18.088703 1409 transaction_driver.cc:382] T 3fe5651e31d8486780d0e480f8748ead P 12e38846d8c648df8a11b9d8da2ad407 S R-NP Ts 6309182492379934720: Cannot cancel transactions that have already replicated: Invalid argument: Client provided column c587 INT32 NOT NULL not present in tablet

The tablet server will crash with this same message every time it tries to bootstrap. Other tablet servers hosting replicas may crash if they replicated the bad write.

The problem is that transactions need to be aborted in reverse order of their index, since later transactions may depend on earlier ones.

This bug reproduced about 1% of the time in alter_table-randomized-test when run in DEBUG mode with 16 stress threads: I saw it 10 times in 1000 runs. With this change, I saw 0 failures in 1000 runs.
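The ordering rule can be illustrated with a minimal Python sketch. It assumes pending operations are tracked in a dict keyed by Raft index and that an operation can only depend on operations with lower indexes. `PendingOp`, `abort_order`, and `is_safe_order` are hypothetical names used for illustration only; they are not Kudu APIs, and this is not the C++ code changed in src/kudu/consensus/pending_rounds.cc.

```python
from dataclasses import dataclass, field


@dataclass
class PendingOp:
    """Hypothetical stand-in for a pending Raft round (not Kudu's own type)."""
    index: int
    op_type: str
    # Indexes of earlier ops this op relies on (assumed to be lower indexes).
    depends_on: list = field(default_factory=list)


def abort_order(pending):
    """Return the indexes of pending ops in the order they should be aborted.

    Later ops may depend on earlier ones (e.g. a WRITE_OP that uses a column
    added by an earlier ALTER_SCHEMA), so abort from the highest index down.
    """
    return sorted(pending, reverse=True)


def is_safe_order(order, pending):
    """Return False if some op is aborted while a still-pending op depends on it."""
    still_pending = set(order)
    for idx in order:
        for other in still_pending:
            if other != idx and idx in pending[other].depends_on:
                return False  # 'other' could still replicate without 'idx'
        still_pending.remove(idx)
    return True


pending = {
    10: PendingOp(10, "ALTER_SCHEMA"),               # adds column 'foo'
    11: PendingOp(11, "WRITE_OP", depends_on=[10]),  # writes a row using 'foo'
}
print(abort_order(pending))  # [11, 10]
```

Aborting in ascending order ([10, 11]) reproduces the window described in steps 2 and 3: the ALTER_SCHEMA is gone while the WRITE_OP that needs it is still pending and can replicate, so `is_safe_order([10, 11], pending)` returns False.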
Change-Id: Idde75bd1fe966a1a3d53aa1e5de6a01a48ff1103
Reviewed-on: http://gerrit.cloudera.org:8080/11759
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Will Berkeley <wdberke...@gmail.com>
---
M src/kudu/consensus/pending_rounds.cc
M src/kudu/integration-tests/alter_table-randomized-test.cc
2 files changed, 23 insertions(+), 13 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Will Berkeley: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11759

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idde75bd1fe966a1a3d53aa1e5de6a01a48ff1103
Gerrit-Change-Number: 11759
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>