[ceph-users] osds crashing during hit_set_trim and hit_set_remove_all

2017-03-03 Thread Mike Lovell
I started an upgrade from 0.94.7 (hammer) to 10.2.5 (jewel) on a production
cluster that is using cache tiering. The cluster has 3 monitors, 28 storage
nodes, and around 370 OSDs. The upgrade of the monitors completed without
issue. I then upgraded 2 of the storage nodes, and after the restarts the
OSDs started crashing during hit_set_trim. Here is some of the output from
the log.

2017-03-02 22:41:32.338290 7f8bfd6d7700 -1 osd/ReplicatedPG.cc: In function
'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)'
thread 7f8bfd6d7700 time 2017-03-02 22:41:32.335020
osd/ReplicatedPG.cc: 10514: FAILED assert(obc)

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0xbddac5]
 2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned
int)+0x75f) [0x87e48f]
 3: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab]
 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xe3a) [0x8a0d1a]
 5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x68a) [0x83be4a]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x69a5c5]
 7: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x333) [0x69ab33]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f)
[0xbcd1cf]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300]
 10: (()+0x7dc5) [0x7f8c1c209dc5]
 11: (clone()+0x6d) [0x7f8c1aceaced]

It started on just one OSD and then spread to others until most of the OSDs
that are part of the cache tier were crashing. That was happening both on
the OSDs that were running jewel and on the ones still running hammer. In
the process of trying to sort this out, the use_gmt_hitset option was set
to true and all of the OSDs were upgraded to jewel. We still have not been
able to determine a cause or a fix.

It looks like when hit_set_trim and hit_set_remove_all are called, they
call hit_set_archive_object() to generate an object name based on a
timestamp, and then call get_object_context(), which returns nothing and
triggers the assert.
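To make that failure mode concrete, here is a minimal, self-contained C++
sketch. It is not the Ceph code: the helper names (stamp, archive_name),
the exact name format, and the tie to use_gmt_hitset are assumptions,
modeled only on the timestamp-based soid that shows up in the log below.
It shows how formatting the same hit set interval in local time versus GMT
produces two different object names, so a lookup under one convention
cannot find an object that was stored under the other -- the same kind of
empty result get_object_context() reports just before assert(obc) fires.

// Illustrative sketch only -- NOT the Ceph implementation.
// All names and the format string below are assumptions for illustration.
#include <cstdio>
#include <ctime>
#include <iostream>
#include <map>
#include <string>

// Format a time_t either in the local timezone or in GMT.
static std::string stamp(std::time_t t, bool use_gmt) {
  std::tm *p = use_gmt ? std::gmtime(&t) : std::localtime(&t);
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d %02d:%02d:%02d%s",
                p->tm_year + 1900, p->tm_mon + 1, p->tm_mday,
                p->tm_hour, p->tm_min, p->tm_sec, use_gmt ? "Z" : "");
  return buf;
}

// Build an archive-style object name from a hit set's begin/end stamps,
// loosely patterned after the hit_set_<pgid>_archive_<begin>_<end> soid
// in the log below.
static std::string archive_name(const std::string &pgid,
                                std::time_t begin, std::time_t end,
                                bool use_gmt) {
  return "hit_set_" + pgid + "_archive_" +
         stamp(begin, use_gmt) + "_" + stamp(end, use_gmt);
}

int main() {
  std::time_t begin = 1488520558;   // arbitrary example interval
  std::time_t end = begin + 60;

  // Pretend one OSD persisted the archive object under the local-time name.
  std::map<std::string, std::string> store;
  store[archive_name("19.5d4", begin, end, false)] = "hit set payload";

  // Another OSD (or the same one after the naming convention changes)
  // computes the GMT name and looks that up instead.  Unless the host
  // timezone happens to be UTC, the two names differ and the lookup
  // finds nothing -- analogous to "no obc for soid ... and !can_create"
  // followed by FAILED assert(obc).
  std::string wanted = archive_name("19.5d4", begin, end, true);
  if (store.find(wanted) == store.end())
    std::cout << "no object under name: " << wanted << std::endl;
  return 0;
}

Whether a local-vs-GMT naming mismatch is what actually bit this cluster is
only a guess from the symptoms (mixed hammer/jewel OSDs plus the
use_gmt_hitset change); the sketch is just the simplest way the
generate-name-then-look-it-up sequence described above can come back empty.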

I raised debug_osd to 10/10 and then analyzed the logs after the crash. I
found the following in the ceph-osd log afterwards.

2017-03-03 03:10:31.918470 7f218c842700 10 osd.146 pg_epoch: 266043
pg[19.5d4( v 264786'61233923 (262173'61230715,264786'61233923]
local-les=266043 n=393 ec=83762 les/c/f 266043/264767/0
266042/266042/266042) [146,116,179] r=0 lpr=266042
 pi=264766-266041/431 crt=262323'61233250 lcod 0'0 mlcod 0'0
active+degraded NIBBLEWISE] get_object_context: no obc for soid
19:2ba0:.ceph-internal::hit_set_19.5d4_archive_2017-03-03
05%3a55%3a58.459084Z_2017-03-03 05%3a56%3a58.981016Z:head and !can_create
2017-03-03 03:10:31.921064 7f2194051700 10 osd.146 266043 do_waiters --
start
2017-03-03 03:10:31.921072 7f2194051700 10 osd.146 266043 do_waiters --
finish
2017-03-03 03:10:31.921076 7f2194051700  7 osd.146 266043 handle_pg_notify
from osd.255
2017-03-03 03:10:31.921096 7f2194051700 10 osd.146 266043 do_waiters --
start
2017-03-03 03:10:31.921099 7f2194051700 10 osd.146 266043 do_waiters --
finish
2017-03-03 03:10:31.925858 7f218c041700 -1 osd/ReplicatedPG.cc: In function
'void ReplicatedPG::hit_set_remove_all()' thread 7f218c041700 time
2017-03-03 03:10:31.918201
osd/ReplicatedPG.cc: 11494: FAILED assert(obc)

 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7f21acee9425]
 2: (ReplicatedPG::hit_set_remove_all()+0x412) [0x7f21ac9cba92]
 3: (ReplicatedPG::on_activate()+0x6dd) [0x7f21ac9f73fd]
 4: (PG::RecoveryState::Active::react(PG::AllReplicasActivated
const&)+0xac) [0x7f21ac916adc]
 5: (boost::statechart::simple_state<PG::RecoveryState::Active,
PG::RecoveryState::Primary, PG::RecoveryState::Activating,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0x179) [0x7f21ac974909]
 6: (boost::statechart::simple_state<PG::RecoveryState::Activating,
PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na>,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0xcd) [0x7f21ac977ccd]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
PG::RecoveryState::Initial, std::allocator<void>,
boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
const&)+0x6b) [0x7f21ac95d9cb]
 8: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>,
PG::RecoveryCtx*)+0x1f4) [0x7f21ac924d24]
 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> >
const&, ThreadPool::TPHandle&)+0x259) [0x7f21ac87de99]
 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&,