Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14061
to look at the new patch set (#10).
Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................
[tablet] Fixed the bug of DeltaTracker::CountDeletedRows
When Tablet.CountLiveRows is called concurrently from multiple threads,
the following failure can occasionally occur.
User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@ 0x4ae34b boost::_mfi::mf1<>::operator()()
@ 0x4add25 boost::_bi::list2<>::operator()<>()
@ 0x4acfe9 boost::_bi::bind_t<>::operator()()
@ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone
This happens because DeltaTracker lacks lock protection covering both the
update of the live row count in rowset_metadata_ and the reset of
deleted_row_count_. As a result, deleted_row_count_ can be subtracted twice
when the number of live rows of a DRS is calculated. Consider the following
sequence:
| T1                                               | T2
|----------                                        |----------
|+ In DT::Flush                                    |
|  Take compact_flush_lock_ (excl)                 |
|  Take component_lock_ (excl)                     |
|  deleted_row_count_ = ...                        |
|  Release component_lock_                         |
|  + In DT::FlushDMS                               |
|    Call RSMD::IncrementLiveRows                  |
|    --> RSMD::live_row_count - deleted_row_count_ |
|                                                  |+ In DRS::CountLiveRows
|                                                  |  Take component_lock_ (shared)
|                                                  |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                                  |  --> RSMD::live_row_count - deleted_row_count_
|                                                  |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)                     |
|  deleted_row_count_ = 0                          |
|  Release component_lock_                         |
|  Release compact_flush_lock_                     |
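
The interleaving above can be replayed deterministically in a small toy
model. This is only a sketch under stated assumptions: ToyRowSet and all of
its members and helper functions are hypothetical stand-ins, not Kudu's real
classes, and the "atomic" variant is just one possible way to close the
window; the actual patch may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical toy model; these are not Kudu's real classes.
struct ToyRowSet {
  std::mutex component_lock;      // plays the role of component_lock_
  int64_t live_row_count = 10;    // plays the role of RSMD::live_row_count
  int64_t deleted_row_count = 3;  // plays the role of deleted_row_count_

  // Reader (T2): live rows = persisted count minus in-memory deletes.
  int64_t CountLiveRows() {
    std::lock_guard<std::mutex> l(component_lock);
    return live_row_count - deleted_row_count;
  }

  // Racy flush (T1), step 1: fold the deletes into the persisted count
  // without holding component_lock, as in the sequence above.
  void RacyFoldDeletes() { live_row_count -= deleted_row_count; }

  // Racy flush (T1), step 2: reset the in-memory counter under the lock.
  void RacyResetDeletes() {
    std::lock_guard<std::mutex> l(component_lock);
    deleted_row_count = 0;
  }

  // Possible fix (an assumption): do both updates in one critical section.
  void AtomicFlush() {
    std::lock_guard<std::mutex> l(component_lock);
    live_row_count -= deleted_row_count;
    deleted_row_count = 0;
  }
};

// Replays the interleaving single-threaded: the reader runs between the
// two racy steps and subtracts the 3 deletes twice.
int64_t ObserveDuringRacyFlush() {
  ToyRowSet rs;
  rs.RacyFoldDeletes();
  int64_t seen = rs.CountLiveRows();
  rs.RacyResetDeletes();
  return seen;
}

// With the atomic variant a reader can only run before or after the flush.
// Returns the common value, or -1 if the two observations differ.
int64_t ObserveAroundAtomicFlush() {
  ToyRowSet rs;
  int64_t before = rs.CountLiveRows();
  rs.AtomicFlush();
  int64_t after = rs.CountLiveRows();
  return before == after ? after : -1;
}
```

In this model, with 10 rows and 3 deletes, ObserveDuringRacyFlush() yields 4
rather than the correct 7, while ObserveAroundAtomicFlush() yields 7 both
before and after the flush.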
Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 177 insertions(+), 80 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/10
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <[email protected]>
Gerrit-Reviewer: helifu <[email protected]>