[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
still available
T2: sees that some of the selected rowsets would not be returned because
they are missing from the currently available rowsets (DFATALs and
aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Reviewed-on: http://gerrit.cloudera.org:8080/8859
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 170 insertions(+), 77 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 10 Jan 2018 01:58:37 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 6:

IWYU and clock sync issues


--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 05 Jan 2018 00:38:42 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#7).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
still available
T2: sees that some of the selected rowsets would not be returned because
they are missing from the currently available rowsets (DFATALs and
aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 170 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/7
--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc@1626
PS5, Line 1626:   //  rs->set_has_been_compacted();
> oops
Done



--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 04 Jan 2018 23:15:37 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#6).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
still available
T2: sees that some of the selected rowsets would not be returned because
they are missing from the currently available rowsets (DFATALs and
aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 168 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/6
--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc@1626
PS5, Line 1626:   //  rs->set_has_been_compacted();
oops



--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 04 Jan 2018 23:12:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#5).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
still available
T2: sees that some of the selected rowsets would not be returned because
they are missing from the currently available rowsets (DFATALs and
aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 168 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/5
--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h
File src/kudu/tablet/diskrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@435
PS4, Line 435:   virtual Status DebugDump(std::vector *lines = 
NULL) override;
> warning: default arguments on virtual or override methods are prohibited [g
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@478
PS4, Line 478:   std::atomic has_been_compacted_;
> add doc
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@293
PS4, Line 293:   // TODO: unit test me
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@350
PS4, Line 350:   virtual Status DebugDump(std::vector *lines = 
NULL) override;
> warning: default arguments on virtual or override methods are prohibited [g
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@376
PS4, Line 376:   double 
DeltaStoresCompactionPerfImprovementScore(DeltaCompactionType type) const 
override {
> warning: parameter 'type' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@441
PS4, Line 441:   std::atomic has_been_compacted_;
> doc
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h@237
PS4, Line 237: rowset
> i think better to be explicit and say that it has been removed from the Row
Done



--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 04 Jan 2018 23:11:10 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 4:

(3 comments)

looks like all the tests failed :)

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h
File src/kudu/tablet/diskrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@478
PS4, Line 478:   std::atomic has_been_compacted_;
add doc


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@441
PS4, Line 441:   std::atomic has_been_compacted_;
doc


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h@237
PS4, Line 237: rowset
i think better to be explicit and say that it has been removed from the 
RowSetTree



--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 04 Jan 2018 21:23:18 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@19
PS3, Line 19: rowsets and the verification, the component_lock_ is dropped, 
allowing
: the following race:
:
> The race wasn't immediately clear to me from the commit message but I looke
Yep, the biggest piece of this is that the component_lock_ is dropped while 
running compaction selection policy.

T1: makes copy {A,B} and drops component_lock_
T2: makes copy {A,B} and drops component_lock_

T1: considers {A,B}, and selects them for compaction, taking their 
compact_flush_lock_
T1: finishes compaction, removes rowsets from tree, drops {A,B}'s 
compact_flush_lock_

T2: considers {A,B}, and selects them for compaction, taking their 
compact_flush_lock_
T2: can't proceed because it can't find them in {A,B} in the rowset tree


http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@24
PS3, Line 24: T2:   snapshots the rowset tree (which still contains {A, B})
> Ignoring and proceeding could have the side effect of running a fairly usel
I'm not super convinced (a) would be able to avoid the race without holding the 
component lock while selecting, so I'm going with (b).



--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 04 Jan 2018 19:00:21 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

2018-01-04 Thread Andrew Wong (Code Review)
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#4).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
..

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
still available
T2: sees that some of the selected rowsets would not be returned because
they are missing from the currently available rowsets (DFATALs and
aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
6 files changed, 155 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/4
--
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon