Re: [ceph-users] BlueStore.cc: 11208: ceph_abort_msg("unexpected error")

2019-08-25 Thread Brad Hubbard
https://tracker.ceph.com/issues/38724

On Fri, Aug 23, 2019 at 10:18 PM Paul Emmerich wrote:
>
> I've seen that before (but never on Nautilus). There's already an
> issue at tracker.ceph.com, but I don't recall its ID or title.



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] BlueStore.cc: 11208: ceph_abort_msg("unexpected error")

2019-08-23 Thread Paul Emmerich
I've seen that before (but never on Nautilus). There's already an
issue at tracker.ceph.com, but I don't recall its ID or title.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90



Re: [ceph-users] BlueStore.cc: 11208: ceph_abort_msg("unexpected error")

2019-08-23 Thread Lars Täuber
Hi Paul,

The result of the fgrep is attached.
Can you make anything of it?

I can't read it. Maybe this is the relevant part:
" bluestore(/var/lib/ceph/osd/first-16) _txc_add_transaction error (39) 
Directory not empty not handled on operation 21 (op 1, counting from 0)"
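
For anyone who wants to pull the same fields out of a larger log, here is a minimal Python sketch that parses the message quoted above. The regular expression is inferred from this single log line only, so treat it as an assumption rather than a documented Ceph log format; the names `TxcError` and `parse_txc_error` are hypothetical. On Linux, errno 39 is ENOTEMPTY, which matches the "Directory not empty" text.

```python
import re
from typing import NamedTuple, Optional


class TxcError(NamedTuple):
    """Fields of a '_txc_add_transaction error' log message (hypothetical type)."""
    code: int        # errno value, e.g. 39
    message: str     # errno text, e.g. "Directory not empty"
    operation: int   # transaction operation code, e.g. 21
    op_index: int    # zero-based position of the op in the transaction


# Pattern inferred from the one line quoted in this thread (an assumption).
_PATTERN = re.compile(
    r"_txc_add_transaction error \((\d+)\) (.+?) not handled "
    r"on operation (\d+) \(op (\d+), counting from 0\)"
)


def parse_txc_error(line: str) -> Optional[TxcError]:
    """Return the parsed fields, or None if the line does not match."""
    # Collapse whitespace first so a message wrapped by a mail client still matches.
    normalized = " ".join(line.split())
    match = _PATTERN.search(normalized)
    if match is None:
        return None
    code, message, operation, op_index = match.groups()
    return TxcError(int(code), message, int(operation), int(op_index))


sample = (
    "bluestore(/var/lib/ceph/osd/first-16) _txc_add_transaction error (39) \n"
    "Directory not empty not handled on operation 21 (op 1, counting from 0)"
)
print(parse_txc_error(sample))
```

The error code and operation number are the two values worth quoting when searching the tracker for a matching issue.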

Later I tried again, and the OSD is working again.

It feels like I hit a bug!?

Huge thanks for your help.

Cheers,
Lars



7f266bdc9700.log.gz
Description: application/gzip


Re: [ceph-users] BlueStore.cc: 11208: ceph_abort_msg("unexpected error")

2019-08-23 Thread Paul Emmerich
Filter the log for "7f266bdc9700", which is the ID of the crashed
thread. The matching lines should contain more information on the
transaction that caused the crash.
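
As a rough illustration of that filtering step, here is a small Python sketch that does the same job as a fixed-string grep and also reads gzip-compressed logs. The helper names `filter_by_thread` and `open_log` are hypothetical, and the sample lines are made up for the demo; only the thread ID comes from this thread.

```python
import gzip
from typing import IO, Iterable, Iterator


def filter_by_thread(lines: Iterable[str], thread_id: str) -> Iterator[str]:
    """Yield only the lines that contain the given thread id as a substring."""
    for line in lines:
        if thread_id in line:
            yield line


def open_log(path: str) -> IO[str]:
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


# Demo on in-memory lines (invented for illustration, not real log output).
demo_lines = [
    "2019-08-23 08:56:13.316 7f266bdc9700 -1 first line from the crashed thread\n",
    "2019-08-23 08:56:13.317 7f2600000000  5 line from another thread\n",
    " 1: (a backtrace frame without a thread id)\n",
    "2019-08-23 08:56:13.318 7f266bdc9700 -1 second line from the crashed thread\n",
]
for line in filter_by_thread(demo_lines, "7f266bdc9700"):
    print(line, end="")
```

To run it against a real file, pass the handle from `open_log(path)` as the first argument. Note that continuation lines such as backtrace frames do not carry the thread ID, so a plain substring match will skip them.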


Paul




[ceph-users] BlueStore.cc: 11208: ceph_abort_msg("unexpected error")

2019-08-23 Thread Lars Täuber
Hi there!

In our test cluster there is an OSD that won't start anymore.

Here is a short part of the log:

-1> 2019-08-23 08:56:13.316 7f266bdc9700 -1 /tmp/release/Debian/WORKDIR/ceph-14.2.2/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f266bdc9700 time 2019-08-23 08:56:13.318938
/tmp/release/Debian/WORKDIR/ceph-14.2.2/src/os/bluestore/BlueStore.cc: 11208: ceph_abort_msg("unexpected error")

 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string, std::allocator > const&)+0xdf) [0x564406ac153a]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2830) [0x5644070e48d0]
 3: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x42a) [0x5644070ec33a]
 4: (ObjectStore::queue_transaction(boost::intrusive_ptr&, ObjectStore::Transaction&&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x7f) [0x564406cd620f]
 5: (PG::_delete_some(ObjectStore::Transaction*)+0x945) [0x564406d32d85]
 6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x71) [0x564406d337d1]
 7: (boost::statechart::simple_state, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x109) [0x564406d81ec9]
 8: (boost::statechart::state_machine, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x564406d4e7cb]
 9: (PG::do_peering_event(std::shared_ptr, PG::RecoveryCtx*)+0x2af) [0x564406d3f39f]
 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr, ThreadPool::TPHandle&)+0x1b4) [0x564406c7e644]
 11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0xc4) [0x564406c7e8c4]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x564406c72667]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x56440724f7d4]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5644072521d0]
 15: (()+0x7fa3) [0x7f26862f6fa3]
 16: (clone()+0x3f) [0x7f2685ea64cf]


The log is so huge that I don't know which part may be of interest. The
excerpt above is the part I think is most useful.
Can anybody read and explain this?


Thanks in advance,
Lars

