[jira] [Commented] (KUDU-2295) nullptr dereference while scanning on already shutdown tablet replica
[ https://issues.apache.org/jira/browse/KUDU-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394329#comment-16394329 ]

Alexey Serbin commented on KUDU-2295:
-------------------------------------

One more stack trace, captured after the RYW changes were committed:
{noformat}
PC: @ 0x7f563b5e31cb std::atomic_bool::load()
*** SIGSEGV (@0x1f8) received by PID 19893 (TID 0x7f561ce82700) from PID 504; stack trace: ***
    @ 0x7f5638713330 (unknown) at ??:0
    @ 0x7f563b5e31cb std::atomic_bool::load() at ??:0
    @ 0x7f563b609c31 kudu::tablet::MvccManager::is_open() at ??:0
    @ 0x7f563b6085f3 kudu::tablet::MvccManager::CheckOpen() at ??:0
    @ 0x7f563b607fc5 kudu::tablet::MvccManager::WaitUntil() at ??:0
    @ 0x7f563b608938 kudu::tablet::MvccManager::WaitForSnapshotWithAllCommitted() at ??:0
    @ 0x7f563ca61b55 kudu::tserver::TabletServiceImpl::HandleScanAtSnapshot() at ??:0
    @ 0x7f563ca5c0e2 kudu::tserver::TabletServiceImpl::HandleNewScanRequest() at ??:0
    @ 0x7f563ca59793 kudu::tserver::TabletServiceImpl::Scan() at ??:0
    @ 0x7f5637324e4d kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_5::operator()() at ??:0
    @ 0x7f5637324c92 std::_Function_handler<>::_M_invoke() at ??:0
    @ 0x7f563648992b std::function<>::operator()() at ??:0
    @ 0x7f56364891ed kudu::rpc::GeneratedServiceIf::Handle() at ??:0
    @ 0x7f563648b5e6 kudu::rpc::ServicePool::RunThread() at ??:0
    @ 0x7f563648dc29 boost::_mfi::mf0<>::operator()() at ??:0
    @ 0x7f563648db90 boost::_bi::list1<>::operator()<>() at ??:0
    @ 0x7f563648db3a boost::_bi::bind_t<>::operator()() at ??:0
    @ 0x7f563648d91d boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @ 0x7f5636430078 boost::function0<>::operator()() at ??:0
    @ 0x7f563472c08d kudu::Thread::SuperviseThread() at ??:0
    @ 0x7f563870b184 start_thread at ??:0
    @ 0x7f5630a2affd clone at ??:0
{noformat}

> nullptr dereference while scanning on already shutdown tablet replica
> ---------------------------------------------------------------------
>
>                 Key: KUDU-2295
>                 URL: https://issues.apache.org/jira/browse/KUDU-2295
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> While running the {{raft_consensus_stress-itest}}, I found that one of the tablet servers crashed with the following stack trace:
> {noformat}
> *** Aborted at 1518480865 (unix time) try "date -d @1518480865" if you are using GNU date ***
> PC: @ 0x7f1e02025790 scoped_refptr<>::operator->()
> *** SIGSEGV (@0x160) received by PID 8782 (TID 0x7f1de3c7e700) from PID 352; stack trace: ***
>     @ 0x7f1dfdcfc330 (unknown) at ??:0
>     @ 0x7f1e02025790 scoped_refptr<>::operator->() at ??:0
>     @ 0x7f1e00ae62e7 kudu::tablet::Tablet::GetTabletAncientHistoryMark() at ??:0
>     @ 0x7f1e00ae627d kudu::tablet::Tablet::GetHistoryGcOpts() at ??:0
>     @ 0x7f1e02012c53 kudu::tserver::(anonymous namespace)::VerifyNotAncientHistory() at ??:0
>     @ 0x7f1e0201223b kudu::tserver::TabletServiceImpl::HandleScanAtSnapshot() at ??:0
>     @ 0x7f1e0200c6dd kudu::tserver::TabletServiceImpl::HandleNewScanRequest() at ??:0
>     @ 0x7f1e02009d33 kudu::tserver::TabletServiceImpl::Scan() at ??:0
>     @ 0x7f1dfc90de4d kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_5::operator()() at ??:0
>     @ 0x7f1dfc90dc92 std::_Function_handler<>::_M_invoke() at ??:0
>     @ 0x7f1dfba728ab std::function<>::operator()() at ??:0
>     @ 0x7f1dfba7216d kudu::rpc::GeneratedServiceIf::Handle() at ??:0
>     @ 0x7f1dfba74526 kudu::rpc::ServicePool::RunThread() at ??:0
>     @ 0x7f1dfba76ad9 boost::_mfi::mf0<>::operator()() at ??:0
>     @ 0x7f1dfba76a40 boost::_bi::list1<>::operator()<>() at ??:0
>     @ 0x7f1dfba769ea boost::_bi::bind_t<>::operator()() at ??:0
>     @ 0x7f1dfba767cd boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
>     @ 0x7f1dfba190f8 boost::function0<>::operator()() at ??:0
>     @ 0x7f1df9d1788d kudu::Thread::SuperviseThread() at ??:0
[jira] [Commented] (KUDU-2295) nullptr dereference while scanning on already shutdown tablet replica
[ https://issues.apache.org/jira/browse/KUDU-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363343#comment-16363343 ]

Alexey Serbin commented on KUDU-2295:
-------------------------------------

Just another manifestation of the same issue, I think:
{noformat}
*** Aborted at 1518568875 (unix time) try "date -d @1518568875" if you are using GNU date ***
PC: @ 0x7f68aaf6e3ec scoped_refptr<>::get()
*** SIGSEGV (@0xe0) received by PID 4224 (TID 0x7f687fbb5700) from PID 224; stack trace: ***
    @ 0x7f68a6c4d330 (unknown) at ??:0
    @ 0x7f68aaf6e3ec scoped_refptr<>::get() at ??:0
    @ 0x7f68aaf6e3ac kudu::tablet::Tablet::metadata() at ??:0
    @ 0x7f68aaf69ec2 kudu::tablet::TabletReplica::permanent_uuid() at ??:0
    @ 0x7f68aaf52fa8 kudu::tserver::ConsensusServiceImpl::UpdateConsensus() at ??:0
    @ 0x7f68a53ab1bd kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()() at ??:0
    @ 0x7f68a53ab002 std::_Function_handler<>::_M_invoke() at ??:0
    @ 0x7f68a49c388b std::function<>::operator()() at ??:0
    @ 0x7f68a49c314d kudu::rpc::GeneratedServiceIf::Handle() at ??:0
    @ 0x7f68a49c5506 kudu::rpc::ServicePool::RunThread() at ??:0
    @ 0x7f68a49c7ab9 boost::_mfi::mf0<>::operator()() at ??:0
    @ 0x7f68a49c7a20 boost::_bi::list1<>::operator()<>() at ??:0
    @ 0x7f68a49c79ca boost::_bi::bind_t<>::operator()() at ??:0
    @ 0x7f68a49c77ad boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @ 0x7f68a496a0d8 boost::function0<>::operator()() at ??:0
    @ 0x7f68a2c6886d kudu::Thread::SuperviseThread() at ??:0
    @ 0x7f68a6c45184 start_thread at ??:0
    @ 0x7f689ef74ffd clone at ??:0
    @ 0x0 (unknown)
{noformat}

> nullptr dereference while scanning on already shutdown tablet replica
> ---------------------------------------------------------------------
>
>                 Key: KUDU-2295
>                 URL: https://issues.apache.org/jira/browse/KUDU-2295
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> While running the {{raft_consensus_stress-itest}}, I found that one of the tablet servers crashed with the following stack trace:
> {noformat}
> *** Aborted at 1518480865 (unix time) try "date -d @1518480865" if you are using GNU date ***
> PC: @ 0x7f1e02025790 scoped_refptr<>::operator->()
> *** SIGSEGV (@0x160) received by PID 8782 (TID 0x7f1de3c7e700) from PID 352; stack trace: ***
>     @ 0x7f1dfdcfc330 (unknown) at ??:0
>     @ 0x7f1e02025790 scoped_refptr<>::operator->() at ??:0
>     @ 0x7f1e00ae62e7 kudu::tablet::Tablet::GetTabletAncientHistoryMark() at ??:0
>     @ 0x7f1e00ae627d kudu::tablet::Tablet::GetHistoryGcOpts() at ??:0
>     @ 0x7f1e02012c53 kudu::tserver::(anonymous namespace)::VerifyNotAncientHistory() at ??:0
>     @ 0x7f1e0201223b kudu::tserver::TabletServiceImpl::HandleScanAtSnapshot() at ??:0
>     @ 0x7f1e0200c6dd kudu::tserver::TabletServiceImpl::HandleNewScanRequest() at ??:0
>     @ 0x7f1e02009d33 kudu::tserver::TabletServiceImpl::Scan() at ??:0
>     @ 0x7f1dfc90de4d kudu::tserver::TabletServerServiceIf::TabletServerServiceIf()::$_5::operator()() at ??:0
>     @ 0x7f1dfc90dc92 std::_Function_handler<>::_M_invoke() at ??:0
>     @ 0x7f1dfba728ab std::function<>::operator()() at ??:0
>     @ 0x7f1dfba7216d kudu::rpc::GeneratedServiceIf::Handle() at ??:0
>     @ 0x7f1dfba74526 kudu::rpc::ServicePool::RunThread() at ??:0
>     @ 0x7f1dfba76ad9 boost::_mfi::mf0<>::operator()() at ??:0
>     @ 0x7f1dfba76a40 boost::_bi::list1<>::operator()<>() at ??:0
>     @ 0x7f1dfba769ea boost::_bi::bind_t<>::operator()() at ??:0
>     @ 0x7f1dfba767cd boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
>     @ 0x7f1dfba190f8 boost::function0<>::operator()() at ??:0
>     @ 0x7f1df9d1788d kudu::Thread::SuperviseThread() at ??:0
>     @ 0x7f1dfdcf4184 start_thread at ??:0
>     @ 0x7f1df6023ffd clone at ??:0
>     @