[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

[tablet] fix nullptr dereference while capturing iterators

Added a check to Tablet::CaptureConsistentIterators() to make sure the tablet is not stopped or shut down.

Before this patch, I saw stack traces like the one below in one test scenario (DEBUG build):

kudu-tserver: src/kudu/gutil/ref_counted.h:284: T *scoped_refptr::operator->() const [T = kudu::tablet::TabletComponents]: Assertion `ptr_ != __null' failed.
*** Aborted at 1517534012 (unix time) try "date -d @1517534012" if you are using GNU date ***
PC: @ 0x7ff9ad39cc37 gsignal
*** SIGABRT (@0x3e8745f) received by PID 29791 (TID 0x7ff99a0bc700) from PID 29791; stack trace: ***
    @ 0x7ff9b5129330 (unknown) at ??:0
    @ 0x7ff9ad39cc37 gsignal at ??:0
    @ 0x7ff9ad3a0028 abort at ??:0
    @ 0x7ff9ad395bf6 (unknown) at ??:0
    @ 0x7ff9ad395ca2 __assert_fail at ??:0
    @ 0x7ff9b7f2ce52 scoped_refptr<>::operator->() at ??:0
    @ 0x7ff9b7f1bf6d kudu::tablet::Tablet::CaptureConsistentIterators() at ??:0
    @ 0x7ff9b7f225f6 kudu::tablet::Tablet::Iterator::Init() at ??:0
    @ 0x7ff9b94372e3 kudu::tserver::TabletServiceImpl::HandleNewScanRequest() at ??:0
    @ 0x7ff9b943a906 kudu::tserver::TabletServiceImpl::Checksum() at ??:0
    @ 0x7ff9b3d3c83d kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_11::operator()() at ??:0
    @ 0x7ff9b3d3c682 std::_Function_handler<>::_M_invoke() at ??:0
    @ 0x7ff9b2ea026b std::function<>::operator()() at ??:0
    @ 0x7ff9b2e9fb2d kudu::rpc::GeneratedServiceIf::Handle() at ??:0
    @ 0x7ff9b2ea1ee6 kudu::rpc::ServicePool::RunThread() at ??:0
    @ 0x7ff9b2ea4499 boost::_mfi::mf0<>::operator()() at ??:0
    @ 0x7ff9b2ea4400 boost::_bi::list1<>::operator()<>() at ??:0
    @ 0x7ff9b2ea43aa boost::_bi::bind_t<>::operator()() at ??:0
    @ 0x7ff9b2ea418d boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @ 0x7ff9b2e45f68 boost::function0<>::operator()() at ??:0
    @ 0x7ff9b115162d kudu::Thread::SuperviseThread() at ??:0
    @ 0x7ff9b5121184 start_thread at ??:0
    @ 0x7ff9ad463ffd clone at ??:0
    @ 0x0 (unknown)

I used the following WIP stress test to reproduce the issue: https://gerrit.cloudera.org/#/c/9255/

In DEBUG builds, without the fix the issue appeared in ~0.5% of cases. With the fix, the issue could not be reproduced:
  Without fix: http://dist-test.cloudera.org//job?job_id=aserbin.1518492521.137030
  With fix: http://dist-test.cloudera.org//job?job_id=aserbin.1518492937.141401

Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Reviewed-on: http://gerrit.cloudera.org:8080/9189
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy
---
M src/kudu/tablet/tablet.cc
1 file changed, 11 insertions(+), 9 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
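A minimal C++ sketch of the kind of guard this change describes. Everything in it (SketchTablet, TabletState, the small Status struct, and the use of std::shared_ptr plus a single std::mutex) is hypothetical and simplified; it is not the code from src/kudu/tablet/tablet.cc, whose diff is not included in this thread (only the diffstat is). It only illustrates the idea: once shutdown has released the tablet components, CaptureConsistentIterators() reports an error instead of dereferencing a null pointer.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT Kudu's real classes or signatures.
struct TabletComponents {
  std::vector<std::string> rowsets;  // stand-in for the rowsets a scan captures
};

enum class TabletState { kRunning, kStopped, kShutdown };

struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status IllegalState(std::string msg) { return {false, std::move(msg)}; }
};

class SketchTablet {
 public:
  SketchTablet() : components_(std::make_shared<TabletComponents>()) {
    components_->rowsets = {"rs0", "rs1"};
  }

  // Shutdown releases the components; a concurrent scan must not touch them afterwards.
  void Shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    state_ = TabletState::kShutdown;
    components_.reset();
  }

  // The guard: check the state under the same lock Shutdown() takes, and
  // return an error rather than dereferencing a possibly-null components_.
  Status CaptureConsistentIterators(std::vector<std::string>* iters) const {
    std::lock_guard<std::mutex> l(lock_);
    if (state_ != TabletState::kRunning) {
      return Status::IllegalState("tablet is not running");
    }
    assert(components_ != nullptr);  // holds while the tablet is running
    *iters = components_->rowsets;
    return Status::OK();
  }

 private:
  mutable std::mutex lock_;
  TabletState state_ = TabletState::kRunning;
  std::shared_ptr<TabletComponents> components_;
};
```

In this sketch the state check is only meaningful because it happens under the same lock that Shutdown() takes; an unlocked check could still race with a concurrent shutdown. How Kudu itself synchronizes the tablet state and its components is an assumption here, not something this thread shows.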
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

Patch Set 3: Code-Review+2

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Tue, 13 Feb 2018 21:04:59 +
Gerrit-HasComments: No
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9189 to look at the new patch set (#3).

Change subject: [tablet] fix nullptr dereference while capturing iterators

[tablet] fix nullptr dereference while capturing iterators

Added a check to Tablet::CaptureConsistentIterators() to make sure the tablet is not stopped or shut down.

Before this patch, I saw stack traces like the one below in one test scenario (DEBUG build):

kudu-tserver: src/kudu/gutil/ref_counted.h:284: T *scoped_refptr::operator->() const [T = kudu::tablet::TabletComponents]: Assertion `ptr_ != __null' failed.
(the rest of the stack trace is identical to the one in the merged change message above)

I used the following WIP stress test to reproduce the issue: https://gerrit.cloudera.org/#/c/9255/

In DEBUG builds, without the fix the issue appeared in ~0.5% of cases. With the fix, the issue could not be reproduced:
  Without fix: http://dist-test.cloudera.org//job?job_id=aserbin.1518492521.137030
  With fix: http://dist-test.cloudera.org//job?job_id=aserbin.1518492937.141401

Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
---
M src/kudu/tablet/tablet.cc
1 file changed, 11 insertions(+), 9 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/9189/3

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

Patch Set 2:

> This looks good to me, if we can show before / after failure stats
> on your test linked below then I'd be on board with merging this
> as-is.

Thanks! I added information on the runs with and without the fix to the commit message.

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Tue, 13 Feb 2018 03:38:21 +
Gerrit-HasComments: No
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9189 to look at the new patch set (#2).

Change subject: [tablet] fix nullptr dereference while capturing iterators

[tablet] fix nullptr dereference while capturing iterators

Added a check to Tablet::CaptureConsistentIterators() to make sure the tablet is not stopped or shut down.

Before this patch, I saw stack traces like the one below in one test scenario (DEBUG build):

kudu-tserver: src/kudu/gutil/ref_counted.h:284: T *scoped_refptr::operator->() const [T = kudu::tablet::TabletComponents]: Assertion `ptr_ != __null' failed.
(the rest of the stack trace is identical to the one in the merged change message above)

The failure rate was about 0.5%: http://dist-test.cloudera.org/job?job_id=aserbin.1518482380.49973

In DEBUG builds, without the fix the issue appeared in ~0.5% of cases. With the fix, the issue could not be reproduced:
  Without fix: http://dist-test.cloudera.org//job?job_id=aserbin.1518492521.137030
  With fix: http://dist-test.cloudera.org//job?job_id=aserbin.1518492937.141401

Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
---
M src/kudu/tablet/tablet.cc
1 file changed, 11 insertions(+), 9 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/9189/2

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

Patch Set 1: Code-Review+1

This looks good to me. If we can show before / after failure stats on your test linked below, then I'd be on board with merging this as-is.

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Tue, 13 Feb 2018 02:53:27 +
Gerrit-HasComments: No
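For the before/after failure stats requested here, a rough way to judge how many clean runs are convincing: if the fix did nothing and each run still failed independently with probability p = 0.005 (the ~0.5% rate reported in the commit message), the chance of seeing zero failures in N runs is (1 - p)^N. The C++ sketch below assumes independent runs with a constant failure rate; the number of runs in the linked dist-test jobs is not stated in this thread, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Probability of seeing zero failures in `runs` independent runs if each run
// still failed with probability `p` (i.e., if the fix had no effect).
double ProbZeroFailures(double p, int runs) {
  return std::pow(1.0 - p, runs);
}

// Smallest N with (1 - p)^N <= alpha: the number of consecutive clean runs
// after which "no failures" would be unlikely (below alpha) to be luck alone.
int CleanRunsNeeded(double p, double alpha) {
  return static_cast<int>(std::ceil(std::log(alpha) / std::log(1.0 - p)));
}
```

Under these assumptions, CleanRunsNeeded(0.005, 0.05) is 598: with a ~0.5% failure rate, roughly 600 consecutive clean runs are needed before a zero-failure result would occur by chance less than 5% of the time. Fewer clean runs still make the fix plausible, but are weaker evidence.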
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

Patch Set 1:

> > What test did you see this in? Can we add a new test that stresses
> > creation of scanners while deleting a replica?
>
> That's one of the variations of the test I used to repro the
> flash-cluster-bug. I'll post it as a WIP patch.
>
> Yes, I think we can add more tests that specifically target this
> particular issue. I'll add one.

Now we have an integration test that triggers this issue: https://gerrit.cloudera.org/#/c/9255/

I think we can go with that, and I can add a dedicated test for this particular issue a bit later.

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Mon, 12 Feb 2018 20:34:01 +
Gerrit-HasComments: No
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

Patch Set 1:

> What test did you see this in? Can we add a new test that stresses
> creation of scanners while deleting a replica?

That's one of the variations of the test I used to repro the flash-cluster-bug. I'll post it as a WIP patch.

Yes, I think we can add more tests that specifically target this particular issue. I'll add one.

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Fri, 02 Feb 2018 07:38:16 +
Gerrit-HasComments: No
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

Patch Set 1:

What test did you see this in? Can we add a new test that stresses creation of scanners while deleting a replica?

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Fri, 02 Feb 2018 06:30:08 +
Gerrit-HasComments: No
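One possible shape for the stress test asked about in this comment, as a self-contained C++ sketch: several threads keep "creating scanners" while the tablet is shut down underneath them. MiniTablet, Capture() and RunScanVsShutdownStress() are hypothetical stand-ins, not Kudu test utilities; the integration test actually used for this change is the WIP patch at https://gerrit.cloudera.org/#/c/9255/.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal tablet (NOT Kudu's API). Capture() stands in for
// Tablet::CaptureConsistentIterators(); Shutdown() releases the components.
class MiniTablet {
 public:
  bool Capture(int* num_rowsets) {
    std::lock_guard<std::mutex> l(lock_);
    if (shut_down_) return false;  // guard in the spirit of the patch
    *num_rowsets = *components_;   // would dereference null without the guard
    return true;
  }
  void Shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    shut_down_ = true;
    components_.reset();
  }

 private:
  std::mutex lock_;
  bool shut_down_ = false;
  std::unique_ptr<int> components_ = std::make_unique<int>(3);
};

struct StressResult {
  long ok;                      // captures that succeeded (all before shutdown)
  long rejected;                // captures refused because of shutdown
  bool capture_after_shutdown;  // did a capture succeed after all threads joined?
};

// Starts `num_scanners` threads that repeatedly "open scanners" while the
// calling thread shuts the tablet down underneath them.
StressResult RunScanVsShutdownStress(int num_scanners, int iters_per_thread) {
  MiniTablet tablet;
  std::atomic<long> ok{0};
  std::atomic<long> rejected{0};
  std::vector<std::thread> scanners;
  for (int i = 0; i < num_scanners; ++i) {
    scanners.emplace_back([&] {
      for (int j = 0; j < iters_per_thread; ++j) {
        int n = 0;
        if (tablet.Capture(&n)) {
          assert(n == 3);
          ++ok;
        } else {
          ++rejected;
        }
      }
    });
  }
  // Give the scanners a head start so that shutdown usually lands mid-run.
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
  tablet.Shutdown();
  for (auto& t : scanners) t.join();
  int n = 0;
  return {ok.load(), rejected.load(), tablet.Capture(&n)};
}
```

A real test would go through the tablet server's scan RPCs and replica deletion rather than calling a method directly. The property the sketch checks is the one that matters for this bug: every attempt either succeeds or is cleanly rejected, and none crashes.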
[kudu-CR] [tablet] fix nullptr dereference while capturing iterators
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9189 )

Change subject: [tablet] fix nullptr dereference while capturing iterators

[tablet] fix nullptr dereference while capturing iterators

Added a check to Tablet::CaptureConsistentIterators() to make sure the tablet is not stopped or shut down.

Before this patch, I saw stack traces like the one below in one test scenario (DEBUG build):

kudu-tserver: src/kudu/gutil/ref_counted.h:284: T *scoped_refptr::operator->() const [T = kudu::tablet::TabletComponents]: Assertion `ptr_ != __null' failed.
(the rest of the stack trace is identical to the one in the merged change message above)

Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
---
M src/kudu/tablet/tablet.cc
1 file changed, 11 insertions(+), 9 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/9189/1

--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia7600f006c8df7f445cc2551e99390177378bcff
Gerrit-Change-Number: 9189
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin
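Why the DEBUG build aborted with SIGABRT instead of crashing on the null dereference itself: the trace shows an assert() on ptr_ inside scoped_refptr::operator->() (ref_counted.h:284), and a failed assert() calls abort(). The C++ sketch below is a hypothetical, non-owning simplification (RefPtrSketch and ComponentsSketch are made-up names, and reference counting is omitted) that shows the same kind of check; it is not the gutil implementation.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a scoped_refptr-style pointer:
// non-owning, no reference counting. It only reproduces the debug check.
template <typename T>
class RefPtrSketch {
 public:
  explicit RefPtrSketch(T* p = nullptr) : ptr_(p) {}

  T* get() const { return ptr_; }

  // When assert() is enabled (NDEBUG not defined), a null ptr_ makes assert()
  // call abort(), which raises SIGABRT as in the trace. With NDEBUG defined,
  // the check disappears and dereferencing the returned null pointer is
  // undefined behavior.
  T* operator->() const {
    assert(ptr_ != nullptr);
    return ptr_;
  }

 private:
  T* ptr_;
};

// Hypothetical stand-in for the pointee (kudu::tablet::TabletComponents).
struct ComponentsSketch {
  int num_rowsets = 2;
};
```

This is why the fix belongs before the dereference: the assertion only makes the problem loud in DEBUG builds, while the missing state check is the actual bug in every build type.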