[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Reviewed-on: http://gerrit.cloudera.org:8080/14061 Reviewed-by: Adar Dembo Tested-by: Adar Dembo --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 177 insertions(+), 80 deletions(-) Approvals: Adar Dembo: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 11 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 10: Verified+1 Code-Review+2 Overriding Jenkins, failure seems unrelated to this fix. -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 10 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Wed, 21 Aug 2019 17:59:08 + Gerrit-HasComments: No
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Adar Dembo has removed a vote on this change. Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 10 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc@239 PS9, Line 239: > Nit: add a using for this. Done -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 10 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Wed, 21 Aug 2019 05:19:10 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#10). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 177 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/10 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 10 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc@239 PS9, Line 239: std:: Nit: add a using for this. -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 9 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Tue, 20 Aug 2019 16:30:47 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 9: (9 comments) http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@969 PS4, Line 969: // Otherwise, test deleting data on MRS. : int32_t write_interval_millis = 0 > Why are these necessary? Please doc. Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@3139 PS4, Line 3139: WriteRequestPB w_req; > Nit: did you mean to add this empty line at the end? a slip of a pen http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h File src/kudu/integration-tests/test_workload.h: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h@152 PS4, Line 152: enum WritePattern { > Nit: "Insert random rows, then delete them." Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@140 PS4, Line 140: while (should_run_.Load()) { > Since you have to pass 'this', how about you make this a named member funct Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@168 PS4, Line 168: // Note: overridi > Nit: the previous style ('KuduPartialRow*' ) is more correct. Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@237 PS4, Line 237: : size_t TestWorkload::GetNumberOfErrors(kudu::client::KuduSession* session) { > Should note that when INSERT_RANDOM_ROWS_WITH_DELETE is used, ReadThread do Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc@748 PS4, Line 748: // Merge the deleted row count of the old DMS to the RowSetMetadata : // and reset deleted_row_count_ should be atomic, so we lock the : // component_lock_ in exclusive mode. : std::lock_guard loc > These comments just duplicate the actual code. It would be more interesting Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h File src/kudu/tablet/rowset_metadata.h: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h@120 PS4, Line 120: // Atomically commit the new redo delta block to RowSetMetadata. > Could you doc this function? Done http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc File src/kudu/tablet/rowset_metadata.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc@291 PS4, Line 291: std::lock_guard l(lock_); > Additional locks to old tables that do not support live row > counting. Emm, I think our newly created table will turn this on by default, so this should be acceptable. -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 9 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Tue, 20 Aug 2019 11:11:36 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#9). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 171 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/9 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 9 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#8). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 171 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/8 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 8 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#7). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 172 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/7 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 7 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#6). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 170 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/6 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 6 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#5). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 170 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/5 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 5 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@969 PS4, Line 969: workload.set_write_pattern(TestWorkload::INSERT_RANDOM_ROWS_WITH_DELETE); : workload.set_write_batch_size(3); Why are these necessary? Please doc. http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@3139 PS4, Line 3139: Nit: did you mean to add this empty line at the end? http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h File src/kudu/integration-tests/test_workload.h: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h@152 PS4, Line 152: // Insert random row keys and delete them. Nit: "Insert random rows, then delete them." http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@140 PS4, Line 140: auto check_error_func = [](KuduSession* session, TestWorkload* workload) { Since you have to pass 'this', how about you make this a named member function instead? http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@168 PS4, Line 168: KuduPartialRow *row Nit: the previous style ('KuduPartialRow*' ) is more correct. http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@237 PS4, Line 237: int64_t expected_row_count = : write_pattern_ == INSERT_RANDOM_ROWS_WITH_DELETE ? 0 : rows_inserted_.Load(); Should note that when INSERT_RANDOM_ROWS_WITH_DELETE is used, ReadThread doesn't really verify anything except that a scan works. http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746 PS2, Line 746: > Yeah, very good case!! L746 updates the in-memory tablet superblock to reflect the DMS flush. L747 atomically writes that new tablet superblock to disk. If the tserver crashes in between, the changes made by L746 are lost. But that's OK; the post-crash state is correct as long as it reflects either the entirety of the DMS flush, or none of it. http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc@748 PS4, Line 748: // Lock the component_lock_ in exclusive mode. : std::lock_guard lock(component_lock_); : // Merge the deleted row count of the old DMS to the RowSetMetadata : // and reset deleted_row_count_. These comments just duplicate the actual code. It would be more interesting to instead describe _why_ this is necessary. i.e. why do we need to take component_lock_ in exclusive mode before doing this? http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h File src/kudu/tablet/rowset_metadata.h: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h@120 PS4, Line 120: Status CommitRedoDeltaDataBlock(int64_t dms_id, Could you doc this function? -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 4 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Mon, 19 Aug 2019 19:19:38 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746 PS2, Line 746: > Is it problematic that the live row count isn't changed atomically as part Yeah, very good case!! And I'm curious about the result: what will happen when the tserver crashes between DMS flushing thread calls CommitRedoDeltaDataBlock and flush the tablet metadata? Meaning, crash between Line746 and Line747. Thanks in advance. http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc File src/kudu/tablet/rowset_metadata.cc: http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc@291 PS4, Line 291: std::lock_guard l(lock_); Additional locks to old tables that do not support live row counting. -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 4 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Mon, 19 Aug 2019 11:31:23 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 4: I implemented an integration test case that simulated this scenario with high-frequency flush, tablet server crashes, and heartbeats(count live rows). -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 4 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Thu, 15 Aug 2019 15:00:35 + Gerrit-HasComments: No
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#4). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 145 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/4 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 4 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#3). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Consider the following sequence: | T1| T2 |-- |-- |+ In DT::Flush | | Take compact_flush_lock_ (excl) | | Take component_lock_ (excl) | | deleted_row_count_ = ... | | Release component_lock_ | | + In DT::FlushDMS| |Call RSMD::IncrementLiveRows | |--> RSMD::live_row_count - deleted_row_count_ | |+ In DRS::CountLiveRows | | Take component_lock_ (shared) | | Call RSMD::live_row_count - DT::CountDeletedRows | | --> RSMD::live_row_count - deleted_row_count_ | | --> we double counted deleted_row_count_ !!! | Take component_lock_ (excl) | | deleted_row_count_ = 0 | | Release component_lock_ | | Release compact_flush_lock_ | Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata-test.cc M src/kudu/tablet/mt-tablet-test.cc M src/kudu/tablet/rowset_metadata.cc M src/kudu/tablet/rowset_metadata.h 10 files changed, 148 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/3 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 3 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Yao Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG@38 PS2, Line 38: This is because there is DeltaTracker lack of lock protection when modify : the number of live rows in rowset_metadata_ and reset the deleted_row_count_. : This caused deleted_row_count_ to be duplicated when calculating the number : of live rows of DRS. > It took me a while to understand the race. Below is a thread interleaving t Well, it's really clear. Got a new skill !! http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746 PS2, Line 746: RETURN_NOT_OK(rowset_metadata_->CommitRedoDeltaDataBlock(dms->id(), block_id)); > Is it problematic that the live row count isn't changed atomically as part Mmm, this is a real problem, and I'll try to add some test cases to my integration tests. Great Thanks :) -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 2 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yao Xu Gerrit-Reviewer: helifu Gerrit-Comment-Date: Thu, 15 Aug 2019 03:44:06 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG@38 PS2, Line 38: This is because there is DeltaTracker lack of lock protection when modify : the number of live rows in rowset_metadata_ and reset the deleted_row_count_. : This caused deleted_row_count_ to be duplicated when calculating the number : of live rows of DRS. It took me a while to understand the race. Below is a thread interleaving that I sketched which describes it: T1 T2 --- --- + In DT::Flush Take compact_flush_lock_ (excl) Take component_lock_ (excl) deleted_row_count_ = ... Release component_lock_ + In DT::FlushDMS Call RSMD::IncrementLiveRows --> RSMD::live_row_count now includes deleted_row_count_ + In DRS::CountLiveRows Take component_lock_ (shared) Call RSMD::live_row_count - DT::CountDeletedRows --> RSMD::live_row_count - deleted_row_count_ --> we double counted deleted_row_count_ !!! Take component_lock_ (excl) deleted_row_count_ = 0 Release component_lock_ Maybe you can clean this up and add it to the commit message? http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746 PS2, Line 746: RETURN_NOT_OK(rowset_metadata_->CommitRedoDeltaDataBlock(dms->id(), block_id)); Is it problematic that the live row count isn't changed atomically as part of this operation? Consider the following sequence: 1. One MM thread is in the middle of flushing a DMS, and makes it as far as IncrementLiveRows before a context switch. Meaning, the RSMD's live row count has changed, but neither the RSMD's last durable DMS ID nor its list of redo block IDs has been updated. 2. Meanwhile, another MM thread does a compaction, flushing the tablet metadata (and all RSMDs), including the updated live row count from #1. 3. The server crashes before the DMS flushing thread can call CommitRedoDeltaDataBlock and flush the tablet metadata. 4. On restart, the tablet is bootstrapped. Its metadata includes a live row count that reflects the DMS flush, but it's as if the DMS flush "never happened" because a) the new redo block ID wasn't committed, and b) the rowset's last durable DMS ID wasn't updated. As I see it, it seems like all of the RSMD changes in the DMS flush need to happen atomically; otherwise the live row count may not reflect reality after a crash. Do you think this is a real problem? If so, could we prove it in a unit test in some way? -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 2 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: helifu Gerrit-Comment-Date: Wed, 14 Aug 2019 17:23:26 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14061 ) Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. Patch Set 2: Code-Review+1 (1 comment) There is a gap between updating "rowset_metadata_" and resetting the "deleted_row_count_" indeed. Yeah, it is a bug! Thanks to Yao Xu. http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@786 PS2, Line 786: rowset_metadata_->IncrementLiveRows(-old_dms->deleted_row_count()); It seems to be the only way although it looks a bit obtrusive. -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 2 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: helifu Gerrit-Comment-Date: Wed, 14 Aug 2019 07:59:36 + Gerrit-HasComments: Yes
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Hello Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#2). Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/mt-tablet-test.cc 3 files changed, 15 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/2 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 2 Gerrit-Owner: Yao Xu Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: helifu
[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
Yao Xu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14061 Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows .. [tablet] Fixed the bug of DeltaTracker::CountDeletedRows When Tablet.CountLiveRows was called in a multi-thread case, there's a chance we'll see the following failure. User stack: F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0) *** Check failure stack trace: *** *** Aborted at 156571 (unix time) try "date -d @156571" if you are using GNU date *** PC: @ 0x7f9bd20425f7 __GI_raise *** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: *** @ 0x7f9bdaff6100 (unknown) @ 0x7f9bd20425f7 __GI_raise @ 0x7f9bd2043ce8 __GI_abort @ 0x7f9bd4540c99 google::logging_fail() @ 0x7f9bd454246d google::LogMessage::Fail() @ 0x7f9bd45443c3 google::LogMessage::SendToLog() @ 0x7f9bd4541fc9 google::LogMessage::Flush() @ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal() @ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows() @ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows() @ 0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread() @ 0x4ae34b boost::_mfi::mf1<>::operator()() @ 0x4add25 boost::_bi::list2<>::operator()<>() @ 0x4acfe9 boost::_bi::bind_t<>::operator()() @ 0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x7f9bd7116492 boost::function0<>::operator()() @ 0x7f9bd62e5324 kudu::Thread::SuperviseThread() @ 0x7f9bdafeedc5 start_thread @ 0x7f9bd2103ced __clone This is because there is DeltaTracker lack of lock protection when modify the number of live rows in rowset_metadata_ and reset the deleted_row_count_. This caused deleted_row_count_ to be duplicated when calculating the number of live rows of DRS. Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf --- M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/delta_tracker.h M src/kudu/tablet/mt-tablet-test.cc 3 files changed, 14 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/1 -- To view, visit http://gerrit.cloudera.org:8080/14061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf Gerrit-Change-Number: 14061 Gerrit-PatchSet: 1 Gerrit-Owner: Yao Xu