Hello Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/13264
to look at the new patch set (#2).
Change subject: KUDU-2807 Crash when flush or compaction overlaps with another
compaction
......................................................................
KUDU-2807 Crash when flush or compaction overlaps with another compaction
Commit d3684a7b2add8f06b7189adb9ce9222b8ae1eff5 introduced a metric for
average rowset height. Computing this requires examining the rowsets in
the rowset tree and briefly taking each one's `compact_flush_lock_`.
However, any time a thread takes the `compact_flush_lock_` of a rowset,
it must hold the `compact_select_lock_` of the tablet that rowset
belongs to. This was not happening in two of the three places where the
average height is computed:
1. When opening the tablet.
2. When updating the rowset tree during a flush or compaction.
The first case is benign (as far as I know). The second case could cause
a crash like:
F0429 07:26:56.918041 34043 tablet.cc:2268] Check failed: lock.owns_lock()
RowSet(24130) unable to lock compact_flush_lock
Maintenance manager (MM) ops enforce the invariant above by try-locking
the `compact_flush_lock_` while holding the `compact_select_lock_` and
checking that they obtained the lock. So, if an MM op try-locked a rowset
while another MM op was holding that rowset's `compact_flush_lock_`, the
check failed and produced the crash above.
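The failure mode can be reproduced in a minimal model. The sketch below is an illustration only: `RowSetModel`, `TabletModel`, `TryPickRowSet`, and `PickWhileHeldWithoutSelectLock` are hypothetical names, not Kudu code, and the try-lock returns `false` where Kudu would fail its `CHECK`.

```cpp
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for a rowset and its tablet; not Kudu's classes.
struct RowSetModel {
  std::mutex compact_flush_lock;
};

struct TabletModel {
  std::mutex compact_select_lock;
  RowSetModel rowset;
};

// Models an MM op selecting a rowset: take the tablet's select lock, then
// try-lock the rowset. Kudu CHECK-fails when the try-lock does not succeed;
// this sketch reports the outcome instead of crashing.
bool TryPickRowSet(TabletModel& tablet) {
  std::lock_guard<std::mutex> select(tablet.compact_select_lock);
  std::unique_lock<std::mutex> lock(tablet.rowset.compact_flush_lock,
                                    std::try_to_lock);
  return lock.owns_lock();
}

// Runs TryPickRowSet while a second thread holds the rowset's lock WITHOUT
// holding the select lock, i.e. while the invariant is being violated.
bool PickWhileHeldWithoutSelectLock(TabletModel& tablet) {
  std::promise<void> held;
  std::promise<void> release;
  std::future<void> held_future = held.get_future();
  std::future<void> release_future = release.get_future();

  std::thread holder([&] {
    // Violation: rowset lock taken with no select lock held.
    std::lock_guard<std::mutex> l(tablet.rowset.compact_flush_lock);
    held.set_value();
    release_future.wait();  // keep the lock until the picker has tried
  });

  held_future.wait();  // the rowset lock is now definitely held
  bool picked = TryPickRowSet(tablet);
  release.set_value();
  holder.join();
  return picked;
}
```

In this model `PickWhileHeldWithoutSelectLock` returns `false`, because holding the select lock offers no protection against a thread that bypasses it.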
This patch fixes the crash by ensuring that the `compact_select_lock_`
is held whenever `ComputeCdfAndCheckOrdered`, which computes the average
rowset height, is called. I also made a small modification to the scope
of a `component_lock_` to avoid having to define a lock order for
`component_lock_` and `compact_select_lock_`.
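A rough sketch of the locking shape described above, assuming the rowset list can be snapshotted under `component_lock_` and used after that lock is released. All types and function names here are hypothetical, and the averaged per-rowset value is a placeholder, not the actual height computation performed by `ComputeCdfAndCheckOrdered`.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; the lock names echo the commit message only.
struct RowSetSketch {
  std::mutex compact_flush_lock;
  double value = 0.0;  // placeholder for whatever the metric reads
};

using RowSetVector = std::vector<std::shared_ptr<RowSetSketch>>;

struct TabletSketch {
  std::mutex component_lock;       // guards the `rowsets` pointer
  std::mutex compact_select_lock;  // required before any compact_flush_lock
  std::shared_ptr<RowSetVector> rowsets = std::make_shared<RowSetVector>();
};

// Placeholder computation: briefly locks each rowset and averages `value`.
// Precondition: the caller holds the tablet's compact_select_lock.
double AverageUnderRowSetLocks(const RowSetVector& rowsets) {
  if (rowsets.empty()) return 0.0;
  double sum = 0.0;
  for (const auto& rs : rowsets) {
    std::lock_guard<std::mutex> l(rs->compact_flush_lock);
    sum += rs->value;
  }
  return sum / rowsets.size();
}

double ComputeAverageSketch(TabletSketch& tablet) {
  std::shared_ptr<RowSetVector> snapshot;
  {
    // Narrow scope: copy the pointer, then drop component_lock right away.
    std::lock_guard<std::mutex> l(tablet.component_lock);
    snapshot = tablet.rowsets;
  }
  // component_lock is no longer held, so the two locks are never nested
  // and no lock order between them needs to be defined.
  std::lock_guard<std::mutex> select(tablet.compact_select_lock);
  return AverageUnderRowSetLocks(*snapshot);
}
```

Because the select lock is held for the whole computation, a concurrent MM op that follows the same rule waits on `compact_select_lock` instead of finding a rowset unexpectedly locked.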
Change-Id: Ic255f0466aa2c158fa32e8e38428eddfcf901b99
---
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
3 files changed, 33 insertions(+), 17 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/13264/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic255f0466aa2c158fa32e8e38428eddfcf901b99
Gerrit-Change-Number: 13264
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <[email protected]>