[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

This fixes a bug in the way we handle tablet copies while replacing
existing tombstoned tablets:

- a tablet exists in TABLET_DATA_TOMBSTONED state
- we begin copying a new replica on top of this one
-- this calls TabletMetadata::ReplaceSuperBlock() using the remote
   superblock (importantly, this remote superblock contains remote
   block IDs)
- we crash mid-copy
- on restart, we see the "TABLET_DATA_COPYING" state and "roll forward"
  the deletion of this tablet. However the block IDs here are the IDs from
  the remote machine, and we incorrectly delete a bunch of blocks.

This has always been an issue, but was made worse in 0.10 by the fix for
KUDU-1538. After fixing KUDU-1538, the likelihood of a remote block ID
matching a local one is quite high, whereas before we'd usually not see
this bug.

The fix here is relatively simple: rather than writing the remote
superblock to disk when starting the copy, we just change the state of the
existing superblock to indicate 'copying'.
Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Reviewed-on: http://gerrit.cloudera.org:8080/4392
Reviewed-by: Mike Percy
Tested-by: Todd Lipcon
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 165 insertions(+), 65 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Todd Lipcon: Verified

--
To view, visit http://gerrit.cloudera.org:8080/4392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
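
The crash scenario in the commit message can be modeled with a small toy program. This is only a sketch under stated assumptions, not Kudu's actual code: the `SuperBlock`, `TabletServer`, `BeginCopyBuggy`, `BeginCopyFixed`, `RecoverAfterCrash`, and `SurvivingUnrelatedBlocks` names are hypothetical, block IDs are plain integers, and the tombstoned tablet is assumed to list no blocks of its own. It shows why persisting the remote superblock before the copy completes lets recovery delete unrelated local blocks, and why only flipping the state of the existing superblock avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not Kudu's real types.
enum class DataState { kReady, kTombstoned, kCopying };

struct SuperBlock {
  DataState state;
  std::vector<uint64_t> block_ids;  // blocks this tablet claims to own
};

struct TabletServer {
  std::set<uint64_t> local_blocks;  // every block present on this server
  SuperBlock on_disk;               // persisted superblock of one tablet

  // Buggy approach: persist the remote superblock, including its remote
  // block IDs, before any data has actually been copied.
  void BeginCopyBuggy(const SuperBlock& remote) {
    on_disk = remote;
    on_disk.state = DataState::kCopying;
  }

  // Approach described in the commit message: keep the existing superblock
  // and only change its state to "copying".
  void BeginCopyFixed() { on_disk.state = DataState::kCopying; }

  // On restart, a tablet found in the copying state has its deletion
  // "rolled forward": every block listed in its superblock is removed.
  // The final tombstoned state is an assumption of this sketch.
  void RecoverAfterCrash() {
    if (on_disk.state != DataState::kCopying) return;
    for (uint64_t id : on_disk.block_ids) local_blocks.erase(id);
    on_disk.block_ids.clear();
    on_disk.state = DataState::kTombstoned;
  }
};

// Returns how many blocks belonging to an unrelated local tablet survive a
// crash that happens right after the copy begins.
std::size_t SurvivingUnrelatedBlocks(bool use_fix) {
  TabletServer ts;
  ts.local_blocks = {1, 2, 3};                 // owned by another local tablet
  ts.on_disk = {DataState::kTombstoned, {}};   // tombstoned: lists no blocks
  SuperBlock remote{DataState::kReady, {2, 3, 4}};  // IDs collide with local

  if (use_fix) {
    ts.BeginCopyFixed();
  } else {
    ts.BeginCopyBuggy(remote);
  }
  ts.RecoverAfterCrash();  // simulate the crash followed by a restart
  return ts.local_blocks.size();
}
```

In this toy model, `SurvivingUnrelatedBlocks(false)` leaves one of the three unrelated blocks, while `SurvivingUnrelatedBlocks(true)` leaves all three, because recovery only ever walks block IDs that were allocated locally.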
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Mike Percy has posted comments on this change.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Patch Set 3: Code-Review+2

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Patch Set 3:

took care of all the nits

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4392 to look at the new patch set (#3).

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/4392/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/3393/

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/3392/

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4392 to look at the new patch set (#2).

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/4392/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Hello Mike Percy,

I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/4392 to review the following change.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/4392/1

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy

Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/3385/

Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-HasComments: No