Hello Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13264

to look at the new patch set (#3).

Change subject: KUDU-2807 Crash when flush or compaction overlaps with another compaction
......................................................................

KUDU-2807 Crash when flush or compaction overlaps with another compaction

Commit d3684a7b2add8f06b7189adb9ce9222b8ae1eff5 introduced a metric for
average rowset height. Computing this requires examining the rowsets in
the rowset tree and briefly taking each one's `compact_flush_lock_`.
However, any time a thread takes the `compact_flush_lock_` of a rowset,
it must hold the `compact_select_lock_` of the tablet that rowset
belongs to. This was not happening in two of the three places where the
average height is computed:

1. When opening the tablet.
2. When updating the rowset tree during a flush or compaction.

The first case is benign (as far as I know). The second case could cause
a crash like

F0429 07:26:56.918041 34043 tablet.cc:2268] Check failed: lock.owns_lock() RowSet(24130) unable to lock compact_flush_lock

MM ops enforced the invariant above by try-locking the
`compact_flush_lock_` and checking that they obtained the lock, while
holding the `compact_select_lock_`. So, if an MM op try-locked a rowset
at the same time as another MM op was holding its `compact_flush_lock_`,
the above crash would result.
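
The race can be illustrated with a minimal, self-contained sketch. It is
not Kudu code: `TabletSketch`, `RowSetSketch`, `TrySelect`, and
`RaceWithHeightComputation` are hypothetical stand-ins, and plain
`std::mutex` replaces Kudu's own lock types. One thread plays the height
computation and the caller plays an MM op selecting the same rowset.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for a tablet and one of its rowsets; not Kudu's classes.
struct RowSetSketch {
  std::mutex compact_flush_lock;
};

struct TabletSketch {
  std::mutex compact_select_lock;
  RowSetSketch rowset;
};

enum class Outcome { kSelectLockBusy, kFlushLockBusy, kLocked };

// Models an MM op selecting the rowset: it takes the tablet-level lock, then
// try-locks the rowset. A real MM op would block on the tablet-level lock;
// this sketch try-locks it too so that the outcome can be reported.
Outcome TrySelect(TabletSketch& t) {
  std::unique_lock<std::mutex> select(t.compact_select_lock, std::try_to_lock);
  if (!select.owns_lock()) return Outcome::kSelectLockBusy;
  std::unique_lock<std::mutex> flush(t.rowset.compact_flush_lock, std::try_to_lock);
  // kFlushLockBusy corresponds to the failed owns_lock() check in the log above.
  return flush.owns_lock() ? Outcome::kLocked : Outcome::kFlushLockBusy;
}

// Runs a "height computation" thread that holds the rowset's lock, with or
// without the tablet-level lock, while the caller attempts a selection.
Outcome RaceWithHeightComputation(bool hold_select_lock) {
  TabletSketch t;
  std::promise<void> holding, release;
  std::future<void> holding_f = holding.get_future();
  std::future<void> release_f = release.get_future();
  std::thread height([&] {
    std::unique_lock<std::mutex> select(t.compact_select_lock, std::defer_lock);
    if (hold_select_lock) select.lock();
    std::lock_guard<std::mutex> flush(t.rowset.compact_flush_lock);
    holding.set_value();  // tell the caller the rowset lock is now held
    release_f.wait();     // keep holding until the caller has tried to select
  });
  holding_f.wait();
  Outcome o = TrySelect(t);
  assert(o != Outcome::kLocked);  // the other thread holds the rowset lock throughout
  release.set_value();
  height.join();
  return o;
}
```

In this sketch, `RaceWithHeightComputation(false)` yields
`kFlushLockBusy`, the analogue of the crash, while
`RaceWithHeightComputation(true)` yields `kSelectLockBusy`: the selecting
op would simply wait on the tablet-level lock and never reach the
try-lock while the rowset's lock is held.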

This patch fixes the crash by ensuring that the `compact_select_lock_`
is held whenever `ComputeCdfAndCheckOrdered`, which computes the average
rowset height, is called. I also made a small modification to the scope
of a `component_lock_` to avoid having to define a lock order for
`component_lock_` and `compact_select_lock_`.
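
As a rough illustration of how narrowing a lock's scope avoids the need
for a lock order, the sketch below copies a reference to the components
under one lock, releases it, and only then takes the second lock, so the
two are never held together. This is an assumption about the general
technique, not the code in this patch: `TabletScopeSketch`,
`ComponentsSketch`, and `CountRowSetsUnderSelectLock` are hypothetical
names, and `std::mutex` stands in for Kudu's lock types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the set of rowsets a tablet currently exposes.
struct ComponentsSketch {
  std::vector<int> rowset_ids;
};

class TabletScopeSketch {
 public:
  explicit TabletScopeSketch(std::vector<int> ids)
      : components_(std::make_shared<ComponentsSketch>(
            ComponentsSketch{std::move(ids)})) {}

  // Inspects the rowsets while holding only compact_select_lock_.
  std::size_t CountRowSetsUnderSelectLock() {
    std::shared_ptr<const ComponentsSketch> snapshot;
    {
      // Narrow scope: component_lock_ is held just long enough to copy the
      // pointer and is released at the closing brace.
      std::lock_guard<std::mutex> l(component_lock_);
      snapshot = components_;
    }
    assert(snapshot != nullptr);
    // component_lock_ is no longer held here, so the two locks never nest
    // and no ordering between them has to be defined.
    std::lock_guard<std::mutex> l(compact_select_lock_);
    return snapshot->rowset_ids.size();
  }

 private:
  std::mutex component_lock_;
  std::mutex compact_select_lock_;
  std::shared_ptr<const ComponentsSketch> components_;
};
```

The snapshot keeps the components alive after `component_lock_` is
released, which is what makes the early release safe in this sketch.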

Change-Id: Ic255f0466aa2c158fa32e8e38428eddfcf901b99
---
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
3 files changed, 36 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/13264/3
--
To view, visit http://gerrit.cloudera.org:8080/13264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic255f0466aa2c158fa32e8e38428eddfcf901b99
Gerrit-Change-Number: 13264
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <[email protected]>
