[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. KUDU-2115: avoid compacting already-compacted rowsets Tablets will perform compaction selection on a copy of the currently available rowsets in order to avoid holding the component_lock_ for the duration of rowset selection. It is then verified that nothing else compacted the selected rowsets by iterating over the selected rowsets and checking that they still exist to be compacted. However, the initial selection of rowsets is performed on a snapshotted copy of the available rowsets, and in between the snapshot of the rowsets and the verification, the component_lock_ is dropped, allowing the following race: T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction T2: runs PickRowSetsToCompact: T2: snapshots the rowset tree (which still contains {A, B}) T2: drops component_lock_ after taking the snapshot T1: finishes compaction and removes {A, B} from the rowset tree, unlocking each rowset's compact_flush_lock_ T2: iterates over the rowset tree and sees {A, B} as IsAvailableForCompaction(), since they have been unlocked T2: selects {A, B} as a part of its compaction T2: grabs the component_lock_ to verify that the selected rowsets are still available T2: sees that some of the selected rowsets would not be returned because they are missing from the currently available rowsets (DFATALs and aborts the compaction) I verified the diagnosis by placing a random sleep just after making the copy in PickRowSetsToCompact() and seeing TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many compactions, fail consistently with the failed verification. Upon making the fix, this passes. This can lead to scheduling a fairly useless compaction. This patch adds an API to RowSets to mark itself as compacted, and ensure that when selecting rowsets to compact, check that rowset candidates have not already been compacted. To verify the solution, I've added a small test that schedules several concurrent compactions. Upon adding the aforementioned randomized delay and removing a couple sanity D/CHECKs (including the above verification), I saw the test fail, leading to an unexpected number of rows left in the tablet. Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Reviewed-on: http://gerrit.cloudera.org:8080/8859 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset-test.cc M src/kudu/tablet/memrowset.cc M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet.cc 9 files changed, 170 insertions(+), 77 deletions(-) Approvals: Kudu Jenkins: Verified Todd Lipcon: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 10 Jan 2018 01:58:37 + Gerrit-HasComments: No
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 6: IWYU and clock sync issues -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 05 Jan 2018 00:38:42 + Gerrit-HasComments: No
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8859 to look at the new patch set (#7). Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. KUDU-2115: avoid compacting already-compacted rowsets Tablets will perform compaction selection on a copy of the currently available rowsets in order to avoid holding the component_lock_ for the duration of rowset selection. It is then verified that nothing else compacted the selected rowsets by iterating over the selected rowsets and checking that they still exist to be compacted. However, the initial selection of rowsets is performed on a snapshotted copy of the available rowsets, and in between the snapshot of the rowsets and the verification, the component_lock_ is dropped, allowing the following race: T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction T2: runs PickRowSetsToCompact: T2: snapshots the rowset tree (which still contains {A, B}) T2: drops component_lock_ after taking the snapshot T1: finishes compaction and removes {A, B} from the rowset tree, unlocking each rowset's compact_flush_lock_ T2: iterates over the rowset tree and sees {A, B} as IsAvailableForCompaction(), since they have been unlocked T2: selects {A, B} as a part of its compaction T2: grabs the component_lock_ to verify that the selected rowsets are still available T2: sees that some of the selected rowsets would not be returned because they are missing from the currently available rowsets (DFATALs and aborts the compaction) I verified the diagnosis by placing a random sleep just after making the copy in PickRowSetsToCompact() and seeing TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many compactions, fail consistently with the failed verification. Upon making the fix, this passes. This can lead to scheduling a fairly useless compaction. This patch adds an API to RowSets to mark itself as compacted, and ensure that when selecting rowsets to compact, check that rowset candidates have not already been compacted. To verify the solution, I've added a small test that schedules several concurrent compactions. Upon adding the aforementioned randomized delay and removing a couple sanity D/CHECKs (including the above verification), I saw the test fail, leading to an unexpected number of rows left in the tablet. Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 --- M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset-test.cc M src/kudu/tablet/memrowset.cc M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet.cc 9 files changed, 170 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/7 -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc@1626 PS5, Line 1626: // rs->set_has_been_compacted(); > oops Done -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Jan 2018 23:15:37 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8859 to look at the new patch set (#6). Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. KUDU-2115: avoid compacting already-compacted rowsets Tablets will perform compaction selection on a copy of the currently available rowsets in order to avoid holding the component_lock_ for the duration of rowset selection. It is then verified that nothing else compacted the selected rowsets by iterating over the selected rowsets and checking that they still exist to be compacted. However, the initial selection of rowsets is performed on a snapshotted copy of the available rowsets, and in between the snapshot of the rowsets and the verification, the component_lock_ is dropped, allowing the following race: T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction T2: runs PickRowSetsToCompact: T2: snapshots the rowset tree (which still contains {A, B}) T2: drops component_lock_ after taking the snapshot T1: finishes compaction and removes {A, B} from the rowset tree, unlocking each rowset's compact_flush_lock_ T2: iterates over the rowset tree and sees {A, B} as IsAvailableForCompaction(), since they have been unlocked T2: selects {A, B} as a part of its compaction T2: grabs the component_lock_ to verify that the selected rowsets are still available T2: sees that some of the selected rowsets would not be returned because they are missing from the currently available rowsets (DFATALs and aborts the compaction) I verified the diagnosis by placing a random sleep just after making the copy in PickRowSetsToCompact() and seeing TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many compactions, fail consistently with the failed verification. Upon making the fix, this passes. This can lead to scheduling a fairly useless compaction. This patch adds an API to RowSets to mark itself as compacted, and ensure that when selecting rowsets to compact, check that rowset candidates have not already been compacted. To verify the solution, I've added a small test that schedules several concurrent compactions. Upon adding the aforementioned randomized delay and removing a couple sanity D/CHECKs (including the above verification), I saw the test fail, leading to an unexpected number of rows left in the tablet. Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 --- M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset-test.cc M src/kudu/tablet/memrowset.cc M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet.cc 9 files changed, 168 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/6 -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc@1626 PS5, Line 1626: // rs->set_has_been_compacted(); oops -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Jan 2018 23:12:45 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8859 to look at the new patch set (#5). Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. KUDU-2115: avoid compacting already-compacted rowsets Tablets will perform compaction selection on a copy of the currently available rowsets in order to avoid holding the component_lock_ for the duration of rowset selection. It is then verified that nothing else compacted the selected rowsets by iterating over the selected rowsets and checking that they still exist to be compacted. However, the initial selection of rowsets is performed on a snapshotted copy of the available rowsets, and in between the snapshot of the rowsets and the verification, the component_lock_ is dropped, allowing the following race: T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction T2: runs PickRowSetsToCompact: T2: snapshots the rowset tree (which still contains {A, B}) T2: drops component_lock_ after taking the snapshot T1: finishes compaction and removes {A, B} from the rowset tree, unlocking each rowset's compact_flush_lock_ T2: iterates over the rowset tree and sees {A, B} as IsAvailableForCompaction(), since they have been unlocked T2: selects {A, B} as a part of its compaction T2: grabs the component_lock_ to verify that the selected rowsets are still available T2: sees that some of the selected rowsets would not be returned because they are missing from the currently available rowsets (DFATALs and aborts the compaction) I verified the diagnosis by placing a random sleep just after making the copy in PickRowSetsToCompact() and seeing TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many compactions, fail consistently with the failed verification. Upon making the fix, this passes. This can lead to scheduling a fairly useless compaction. This patch adds an API to RowSets to mark itself as compacted, and ensure that when selecting rowsets to compact, check that rowset candidates have not already been compacted. To verify the solution, I've added a small test that schedules several concurrent compactions. Upon adding the aforementioned randomized delay and removing a couple sanity D/CHECKs (including the above verification), I saw the test fail, leading to an unexpected number of rows left in the tablet. Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 --- M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset-test.cc M src/kudu/tablet/memrowset.cc M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet.cc 9 files changed, 168 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/5 -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h File src/kudu/tablet/diskrowset.h: http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@435 PS4, Line 435: virtual Status DebugDump(std::vector *lines = NULL) override; > warning: default arguments on virtual or override methods are prohibited [g Done http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@478 PS4, Line 478: std::atomic has_been_compacted_; > add doc Done http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h File src/kudu/tablet/memrowset.h: http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@293 PS4, Line 293: // TODO: unit test me > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@350 PS4, Line 350: virtual Status DebugDump(std::vector *lines = NULL) override; > warning: default arguments on virtual or override methods are prohibited [g Done http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@376 PS4, Line 376: double DeltaStoresCompactionPerfImprovementScore(DeltaCompactionType type) const override { > warning: parameter 'type' is unused [misc-unused-parameters] Done http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@441 PS4, Line 441: std::atomic has_been_compacted_; > doc Done http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h File src/kudu/tablet/rowset.h: http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h@237 PS4, Line 237: rowset > i think better to be explicit and say that it has been removed from the Row Done -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Jan 2018 23:11:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 4: (3 comments) looks like all the tests failed :) http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h File src/kudu/tablet/diskrowset.h: http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@478 PS4, Line 478: std::atomic has_been_compacted_; add doc http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h File src/kudu/tablet/memrowset.h: http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@441 PS4, Line 441: std::atomic has_been_compacted_; doc http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h File src/kudu/tablet/rowset.h: http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h@237 PS4, Line 237: rowset i think better to be explicit and say that it has been removed from the RowSetTree -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Jan 2018 21:23:18 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 ) Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@19 PS3, Line 19: rowsets and the verification, the component_lock_ is dropped, allowing : the following race: : > The race wasn't immediately clear to me from the commit message but I looke Yep, the biggest piece of this is that the component_lock_ is dropped while running compaction selection policy. T1: makes copy {A,B} and drops component_lock_ T2: makes copy {A,B} and drops component_lock_ T1: considers {A,B}, and selects them for compaction, taking their compact_flush_lock_ T1: finishes compaction, removes rowsets from tree, drops {A,B}'s compact_flush_lock_ T2: considers {A,B}, and selects them for compaction, taking their compact_flush_lock_ T2: can't proceed because it can't find them in {A,B} in the rowset tree http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@24 PS3, Line 24: T2: snapshots the rowset tree (which still contains {A, B}) > Ignoring and proceeding could have the side effect of running a fairly usel I'm not super convinced (a) would be able to avoid the race without holding the component lock while selecting, so I'm going with (b). -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Jan 2018 19:00:21 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8859 to look at the new patch set (#4). Change subject: KUDU-2115: avoid compacting already-compacted rowsets .. KUDU-2115: avoid compacting already-compacted rowsets Tablets will perform compaction selection on a copy of the currently available rowsets in order to avoid holding the component_lock_ for the duration of rowset selection. It is then verified that nothing else compacted the selected rowsets by iterating over the selected rowsets and checking that they still exist to be compacted. However, the initial selection of rowsets is performed on a snapshotted copy of the available rowsets, and in between the snapshot of the rowsets and the verification, the component_lock_ is dropped, allowing the following race: T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction T2: runs PickRowSetsToCompact: T2: snapshots the rowset tree (which still contains {A, B}) T2: drops component_lock_ after taking the snapshot T1: finishes compaction and removes {A, B} from the rowset tree, unlocking each rowset's compact_flush_lock_ T2: iterates over the rowset tree and sees {A, B} as IsAvailableForCompaction(), since they have been unlocked T2: selects {A, B} as a part of its compaction T2: grabs the component_lock_ to verify that the selected rowsets are still available T2: sees that some of the selected rowsets would not be returned because they are missing from the currently available rowsets (DFATALs and aborts the compaction) I verified the diagnosis by placing a random sleep just after making the copy in PickRowSetsToCompact() and seeing TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many compactions, fail consistently with the failed verification. Upon making the fix, this passes. This can lead to scheduling a fairly useless compaction. This patch adds an API to RowSets to mark itself as compacted, and ensure that when selecting rowsets to compact, check that rowset candidates have not already been compacted. To verify the solution, I've added a small test that schedules several concurrent compactions. Upon adding the aforementioned randomized delay and removing a couple sanity D/CHECKs (including the above verification), I saw the test fail, leading to an unexpected number of rows left in the tablet. Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 --- M src/kudu/tablet/compaction-test.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet.cc 6 files changed, 155 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/4 -- To view, visit http://gerrit.cloudera.org:8080/8859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236 Gerrit-Change-Number: 8859 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon