[ceph-users] Re: ceph osd continously fails

2021-08-12 Thread Wesley Dillingham
Can you send the output of "ceph daemon osd.0 status", and maybe do the same
for a couple of other OSD IDs? You may need to target ones which are currently
running.
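
If it helps, the same query can be looped over a few OSD IDs. This is only a rough sketch: the IDs are placeholders, and `osd_status` and the `CEPH_BIN` override are hypothetical names introduced here for illustration, not part of Ceph. It assumes it is run on the host (or inside the container) where those OSD daemons live, since "ceph daemon" talks to the local admin socket.

```shell
#!/bin/sh
# CEPH_BIN is a hypothetical override so the loop can be dry-run
# (for example CEPH_BIN=echo); by default the real "ceph" CLI is used.
CEPH_BIN="${CEPH_BIN:-ceph}"

# osd_status ID [ID ...]
# Runs "ceph daemon osd.<ID> status" for each given OSD ID.
osd_status() {
    for id in "$@"; do
        echo "== osd.$id =="
        # The admin socket is local, so this only works for OSDs that are
        # running on this host; a stopped OSD will not answer.
        $CEPH_BIN daemon "osd.$id" status ||
            echo "osd.$id: no response (daemon down or not on this host?)"
    done
}
```

For example, `osd_status 0 3 12` would query three OSDs in turn; replace the IDs with ones that are currently up on the node in question.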

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com


On Wed, Aug 11, 2021 at 9:51 AM Amudhan P wrote:

> Hi,
>
> Below are the logs from one of the failed OSDs.
>
> Aug 11 16:55:48 bash[27152]: debug-20> 2021-08-11T11:25:47.433+
> 7fbf3b819700  3 osd.12 6697 handle_osd_map epochs [6696,6697], i have 6697,
> src has [
> Aug 11 16:55:48 bash[27152]: debug-19> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-18> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-17> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-16> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-15> 2021-08-11T11:25:47.441+
> 7fbf3b819700  3 osd.12 6697 handle_osd_map epochs [6696,6697], i have 6697,
> src has [
> Aug 11 16:55:48 bash[27152]: debug-14> 2021-08-11T11:25:47.561+
> 7fbf3a817700  2 osd.12 6697 ms_handle_refused con 0x563b53a3cc00 session
> 0x563b51aecb
> Aug 11 16:55:48 bash[27152]: debug-13> 2021-08-11T11:25:47.561+
> 7fbf3a817700 10 monclient: _send_mon_message to mon.strg-node2 at v2:
> 10.0.103.2:3300/
> Aug 11 16:55:48 bash[27152]: debug-12> 2021-08-11T11:25:47.565+
> 7fbf3b819700  2 osd.12 6697 ms_handle_refused con 0x563b66226000 session 0
> Aug 11 16:55:48 bash[27152]: debug-11> 2021-08-11T11:25:47.581+
> 7fbf3b819700  2 osd.12 6697 ms_handle_refused con 0x563b66227c00 session 0
> Aug 11 16:55:48 bash[27152]: debug-10> 2021-08-11T11:25:47.581+
> 7fbf4e0ae700 10 monclient: get_auth_request con 0x563b53a4f400 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -9> 2021-08-11T11:25:47.581+
> 7fbf39815700  2 osd.12 6697 ms_handle_refused con 0x563b53a3c800 session
> 0x563b679120
> Aug 11 16:55:48 bash[27152]: debug -8> 2021-08-11T11:25:47.581+
> 7fbf39815700 10 monclient: _send_mon_message to mon.strg-node2 at v2:
> 10.0.103.2:3300/
> Aug 11 16:55:48 bash[27152]: debug -7> 2021-08-11T11:25:47.581+
> 7fbf4f0b0700 10 monclient: get_auth_request con 0x563b6331d000 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -6> 2021-08-11T11:25:47.581+
> 7fbf4e8af700 10 monclient: get_auth_request con 0x563b53a4f000 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -5> 2021-08-11T11:25:47.717+
> 7fbf4f0b0700 10 monclient: get_auth_request con 0x563b66226c00 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -4> 2021-08-11T11:25:47.789+
> 7fbf43623700  5 prioritycache tune_memory target: 1073741824 mapped:
> 388874240 unmap
> Aug 11 16:55:48 bash[27152]: debug -3> 2021-08-11T11:25:47.925+
> 7fbf32807700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> Aug 11 16:55:48 bash[27152]:
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZ
> Aug 11 16:55:48 bash[27152]:  ceph version 15.2.7
> (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
> Aug 11 16:55:48 bash[27152]:  1: (ceph::__ceph_assert_fail(char const*,
> char const*, int, char const*)+0x158) [0x563b46835dbe]
> Aug 11 16:55:48 bash[27152]:  2: (()+0x504fd8) [0x563b46835fd8]
> Aug 11 16:55:48 bash[27152]:  3: (OSD::do_recovery(PG*, unsigned int,
> unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x563b46918c25]
> Aug 11 16:55:48 bash[27152]:  4:
> (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
> boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x1d) [0x563b46b74
> Aug 11 16:55:48 bash[27152]:  5: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x12ef) [0x563b469364df]
> Aug 11 16:55:48 bash[27152]:  6:
> (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
> [0x563b46f6f224]
> Aug 11 16:55:48 bash[27152]:  7:
> (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563b46f71e84]
> Aug 11 16:55:48 bash[27152]:  8: (()+0x82de) [0x7fbf528952de]
> Aug 11 16:55:48 bash[27152]:  9: (clone()+0x43) [0x7fbf515cce83]
> Aug 11 16:55:48 bash[27152]: debug -2> 2021-08-11T11:25:47.929+
> 7fbf32807700 -1 *** Caught signal (Aborted) **
> Aug 11 16:55:48 bash[27152]:  in thread 7fbf32807700 thread_name:tp_osd_tp
> Aug 11 16:55:48 bash[27152]:  ceph version 15.2.7
> (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
> Aug 11 16:55:48 bash[27152]:  1: (()+0x12dd0) [0x7fbf5289fdd0]
> Aug 11 16:55:48 bash[27152]:  2: (gsignal()+0x10f) [0x7fbf5150870f]
> Aug 11 16:55:48 

[ceph-users] Re: ceph osd continously fails

2021-08-11 Thread Amudhan P
Hi,

Below are the logs from one of the failed OSDs.

Aug 11 16:55:48 bash[27152]: debug-20> 2021-08-11T11:25:47.433+
7fbf3b819700  3 osd.12 6697 handle_osd_map epochs [6696,6697], i have 6697,
src has [
Aug 11 16:55:48 bash[27152]: debug-19> 2021-08-11T11:25:47.433+
7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
(4460'174466,6312'18356
Aug 11 16:55:48 bash[27152]: debug-18> 2021-08-11T11:25:47.433+
7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
(4460'174466,6312'18356
Aug 11 16:55:48 bash[27152]: debug-17> 2021-08-11T11:25:47.433+
7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
(4460'174466,6312'18356
Aug 11 16:55:48 bash[27152]: debug-16> 2021-08-11T11:25:47.433+
7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
(4460'174466,6312'18356
Aug 11 16:55:48 bash[27152]: debug-15> 2021-08-11T11:25:47.441+
7fbf3b819700  3 osd.12 6697 handle_osd_map epochs [6696,6697], i have 6697,
src has [
Aug 11 16:55:48 bash[27152]: debug-14> 2021-08-11T11:25:47.561+
7fbf3a817700  2 osd.12 6697 ms_handle_refused con 0x563b53a3cc00 session
0x563b51aecb
Aug 11 16:55:48 bash[27152]: debug-13> 2021-08-11T11:25:47.561+
7fbf3a817700 10 monclient: _send_mon_message to mon.strg-node2 at v2:
10.0.103.2:3300/
Aug 11 16:55:48 bash[27152]: debug-12> 2021-08-11T11:25:47.565+
7fbf3b819700  2 osd.12 6697 ms_handle_refused con 0x563b66226000 session 0
Aug 11 16:55:48 bash[27152]: debug-11> 2021-08-11T11:25:47.581+
7fbf3b819700  2 osd.12 6697 ms_handle_refused con 0x563b66227c00 session 0
Aug 11 16:55:48 bash[27152]: debug-10> 2021-08-11T11:25:47.581+
7fbf4e0ae700 10 monclient: get_auth_request con 0x563b53a4f400 auth_method 0
Aug 11 16:55:48 bash[27152]: debug -9> 2021-08-11T11:25:47.581+
7fbf39815700  2 osd.12 6697 ms_handle_refused con 0x563b53a3c800 session
0x563b679120
Aug 11 16:55:48 bash[27152]: debug -8> 2021-08-11T11:25:47.581+
7fbf39815700 10 monclient: _send_mon_message to mon.strg-node2 at v2:
10.0.103.2:3300/
Aug 11 16:55:48 bash[27152]: debug -7> 2021-08-11T11:25:47.581+
7fbf4f0b0700 10 monclient: get_auth_request con 0x563b6331d000 auth_method 0
Aug 11 16:55:48 bash[27152]: debug -6> 2021-08-11T11:25:47.581+
7fbf4e8af700 10 monclient: get_auth_request con 0x563b53a4f000 auth_method 0
Aug 11 16:55:48 bash[27152]: debug -5> 2021-08-11T11:25:47.717+
7fbf4f0b0700 10 monclient: get_auth_request con 0x563b66226c00 auth_method 0
Aug 11 16:55:48 bash[27152]: debug -4> 2021-08-11T11:25:47.789+
7fbf43623700  5 prioritycache tune_memory target: 1073741824 mapped:
388874240 unmap
Aug 11 16:55:48 bash[27152]: debug -3> 2021-08-11T11:25:47.925+
7fbf32807700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
Aug 11 16:55:48 bash[27152]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZ
Aug 11 16:55:48 bash[27152]:  ceph version 15.2.7
(88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
Aug 11 16:55:48 bash[27152]:  1: (ceph::__ceph_assert_fail(char const*,
char const*, int, char const*)+0x158) [0x563b46835dbe]
Aug 11 16:55:48 bash[27152]:  2: (()+0x504fd8) [0x563b46835fd8]
Aug 11 16:55:48 bash[27152]:  3: (OSD::do_recovery(PG*, unsigned int,
unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x563b46918c25]
Aug 11 16:55:48 bash[27152]:  4:
(ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x1d) [0x563b46b74
Aug 11 16:55:48 bash[27152]:  5: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x12ef) [0x563b469364df]
Aug 11 16:55:48 bash[27152]:  6:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x563b46f6f224]
Aug 11 16:55:48 bash[27152]:  7:
(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563b46f71e84]
Aug 11 16:55:48 bash[27152]:  8: (()+0x82de) [0x7fbf528952de]
Aug 11 16:55:48 bash[27152]:  9: (clone()+0x43) [0x7fbf515cce83]
Aug 11 16:55:48 bash[27152]: debug -2> 2021-08-11T11:25:47.929+
7fbf32807700 -1 *** Caught signal (Aborted) **
Aug 11 16:55:48 bash[27152]:  in thread 7fbf32807700 thread_name:tp_osd_tp
Aug 11 16:55:48 bash[27152]:  ceph version 15.2.7
(88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
Aug 11 16:55:48 bash[27152]:  1: (()+0x12dd0) [0x7fbf5289fdd0]
Aug 11 16:55:48 bash[27152]:  2: (gsignal()+0x10f) [0x7fbf5150870f]
Aug 11 16:55:48 bash[27152]:  3: (abort()+0x127) [0x7fbf514f2b25]
Aug 11 16:55:48 bash[27152]:  4: (ceph::__ceph_assert_fail(char const*,
char const*, int, char const*)+0x1a9) [0x563b46835e0f]
Aug 11 16:55:48 bash[27152]:  5: (()+0x504fd8) [0x563b46835fd8]
Aug 11 16:55:48 bash[27152]:  6: (OSD::do_recovery(PG*, unsigned int,
unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x563b46918c25]
Aug 11 16:55:48 bash[27152]:  7:
(ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr&,