Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13264 )
Change subject: KUDU-2807 Crash when flush or compaction overlaps with another compaction
......................................................................

KUDU-2807 Crash when flush or compaction overlaps with another compaction

Commit d3684a7b2add8f06b7189adb9ce9222b8ae1eff5 introduced a metric for
average rowset height. Computing this requires examining the rowsets in
the rowset tree and briefly taking each one's `compact_flush_lock_`.
However, any time a thread takes the `compact_flush_lock_` of a rowset,
it must hold the `compact_select_lock_` of the tablet that rowset
belongs to. This was not happening in two of the three places where the
average height is computed:

1. When opening the tablet.
2. When updating the rowset tree during a flush or compaction.

The first case is benign (as far as I know). The second case could cause
a crash like

  F0429 07:26:56.918041 34043 tablet.cc:2268] Check failed: lock.owns_lock() RowSet(24130) unable to lock compact_flush_lock

MM ops enforced the invariant above by try-locking the
`compact_flush_lock_` and checking that they obtained the lock, while
holding the `compact_select_lock_`. So, if a MM op try-locked a rowset at
the same time as another MM op was holding its `compact_flush_lock_`,
the above crash would result.

This patch fixes the crash by ensuring that the `compact_select_lock_`
is held whenever `ComputeCdfAndCheckOrdered`, which computes the average
rowset height, is called. I also made a small modification to the scope
of a `component_lock_` to avoid having to define a lock order for
`component_lock_` and `compact_select_lock_`.

Change-Id: Ic255f0466aa2c158fa32e8e38428eddfcf901b99
Reviewed-on: http://gerrit.cloudera.org:8080/13264
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Will Berkeley <[email protected]>
---
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
3 files changed, 36 insertions(+), 28 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Will Berkeley: Verified

--
To view, visit http://gerrit.cloudera.org:8080/13264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic255f0466aa2c158fa32e8e38428eddfcf901b99
Gerrit-Change-Number: 13264
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley <[email protected]>
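
For readers unfamiliar with the locking invariant the commit message describes, here is a minimal sketch of it. This is not the code from `tablet.cc`. `FlagLock`, `RowSetSketch`, `TabletSketch`, `PickRowSet`, and `CountAvailable` are hypothetical names made up for illustration. `CountAvailable` only stands in for the part of the height computation that briefly takes each rowset's lock. The sketch assumes a simple try-lockable flag in place of the real per-rowset lock, so that it stays well defined when a single thread probes a lock it already holds.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical try-lockable flag standing in for a rowset's
// compact_flush_lock_ (an assumption, not the Kudu type).
class FlagLock {
 public:
  bool try_lock() { return !held_.exchange(true, std::memory_order_acquire); }
  void lock() { while (!try_lock()) {} }
  void unlock() { held_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> held_{false};
};

// Hypothetical rowset: only the lock matters for this sketch.
struct RowSetSketch {
  FlagLock compact_flush_lock;
};

// Hypothetical tablet that owns the select lock and the rowsets.
class TabletSketch {
 public:
  explicit TabletSketch(std::size_t num_rowsets) {
    for (std::size_t i = 0; i < num_rowsets; ++i) {
      rowsets_.push_back(std::make_unique<RowSetSketch>());
    }
  }

  // MM-op side: try-lock one rowset while holding the select lock.
  // The commit message says the real code CHECK-fails when the try-lock
  // does not succeed; here the caller inspects owns_lock() instead.
  std::unique_lock<FlagLock> PickRowSet(std::size_t i) {
    std::lock_guard<std::mutex> select(compact_select_lock_);
    std::unique_lock<FlagLock> lock(rowsets_[i]->compact_flush_lock,
                                    std::try_to_lock);
    return lock;
  }

  // Metric side: briefly take each rowset's lock, but only while holding
  // the select lock, so the probe cannot overlap with PickRowSet.
  std::size_t CountAvailable() {
    std::lock_guard<std::mutex> select(compact_select_lock_);
    std::size_t available = 0;
    for (const auto& rs : rowsets_) {
      std::unique_lock<FlagLock> probe(rs->compact_flush_lock,
                                       std::try_to_lock);
      if (probe.owns_lock()) ++available;  // released when probe goes out of scope
    }
    return available;
  }

 private:
  std::mutex compact_select_lock_;
  std::vector<std::unique_ptr<RowSetSketch>> rowsets_;
};
```

Because both methods take `compact_select_lock_` first, a probe in `CountAvailable` can never hold a rowset's lock at the moment `PickRowSet` try-locks it. That is the property whose absence led to the `Check failed: lock.owns_lock()` crash quoted above.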
