Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >= 0)
Hi all,

Our MDS is still fine today. Thanks everyone!

Regards,
Bazli

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mohd Bazli Ab Karim
Sent: Monday, January 19, 2015 11:38 AM
To: John Spray
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: RE: MDS aborted after recovery and active, FAILED assert (r >= 0)

Hi John,

Good shot! I've increased osd_max_write_size to 1 GB (still smaller than the OSD journal size) and the MDS is still running fine after an hour. Now checking whether the filesystem is still accessible or not. Will update from time to time. Thanks again, John.

Regards,
Bazli

-----Original Message-----
From: john.sp...@inktank.com [mailto:john.sp...@inktank.com] On Behalf Of John Spray
Sent: Friday, January 16, 2015 11:58 PM
To: Mohd Bazli Ab Karim
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: MDS aborted after recovery and active, FAILED assert (r >= 0)

It has just been pointed out to me that you can also work around this issue on your existing system by increasing the osd_max_write_size setting on your OSDs (default 90 MB) to something higher, but still smaller than your OSD journal size. That might get you on a path to having an accessible filesystem before you consider an upgrade.

John

On Fri, Jan 16, 2015 at 10:57 AM, John Spray john.sp...@redhat.com wrote:

Hmm, upgrading should help here, as the problematic data structure (anchortable) no longer exists in the latest version. I haven't checked, but hopefully we don't try to write it during upgrades. The bug you're hitting is more or less the same as a similar one we have with the sessiontable in the latest Ceph, but you won't hit it there unless you're very unlucky!

John

On Fri, Jan 16, 2015 at 7:37 AM, Mohd Bazli Ab Karim bazli.abka...@mimos.my wrote:

Dear Ceph-Users, Ceph-Devel,

Apologies if you get a double post of this email. I am running a Ceph cluster version 0.72.2 and one MDS (in fact there are 3: 2 down and only 1 up) at the moment, plus I have one CephFS client mounted to it. Now the MDS always gets aborted after recovery and being active for 4 seconds. Some parts of the log are as below:

-3 2015-01-15 14:10:28.464706 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 osd_op_reply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 uv1871414 ondisk = 0) v6 221+0+0 (261801329 0 0) 0x7770bc80 con 0x69c7dc0
-2 2015-01-15 14:10:28.464730 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 osd_op_reply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) v6 179+0+0 (3759887079 0 0) 0x7757ec80 con 0x1c6bb00
-1 2015-01-15 14:10:28.464754 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 osd_op_reply(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message too long)) v6 174+0+0 (3942056372 0 0) 0x69f94a00 con 0x1c6b9a0
0 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 'void MDSTable::save_2(int, version_t)' thread 7fbcc8226700 time 2015-01-15 14:10:28.46
mds/MDSTable.cc: 83: FAILED assert(r >= 0)
ceph version ()
1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
2: (Context::complete(int)+0x9) [0x568d29]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
7: (DispatchQueue::entry()+0x549) [0x975739]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
9: (()+0x7e9a) [0x7fbcccb0de9a]
10: (clone()+0x6d) [0x7fbccb4ba3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Is there any workaround/patch to fix this issue? Let me know if you need to see the log with debug-mds at a certain level as well. Any help would be very much appreciated.

Thanks,
Bazli
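For anyone wanting to apply the same workaround described in the message above, a minimal sketch follows. The 1024 MB value simply mirrors what Bazli used and is only an example; it must stay below the OSD journal size, and whether injectargs takes effect without an OSD restart can vary by release, so treat that form as an assumption.

# ceph.conf on the OSD hosts: raise the largest single write an OSD accepts (value in MB, default 90)
[osd]
        osd max write size = 1024

# Or push the change to the running OSDs:
ceph tell osd.* injectargs '--osd_max_write_size 1024'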
Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >= 0)
Hi John,

Good shot! I've increased osd_max_write_size to 1 GB (still smaller than the OSD journal size) and the MDS is still running fine after an hour. Now checking whether the filesystem is still accessible or not. Will update from time to time. Thanks again, John.

Regards,
Bazli

-----Original Message-----
From: john.sp...@inktank.com [mailto:john.sp...@inktank.com] On Behalf Of John Spray
Sent: Friday, January 16, 2015 11:58 PM
To: Mohd Bazli Ab Karim
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: MDS aborted after recovery and active, FAILED assert (r >= 0)

It has just been pointed out to me that you can also work around this issue on your existing system by increasing the osd_max_write_size setting on your OSDs (default 90 MB) to something higher, but still smaller than your OSD journal size. That might get you on a path to having an accessible filesystem before you consider an upgrade.

John

On Fri, Jan 16, 2015 at 10:57 AM, John Spray john.sp...@redhat.com wrote:

Hmm, upgrading should help here, as the problematic data structure (anchortable) no longer exists in the latest version. I haven't checked, but hopefully we don't try to write it during upgrades. The bug you're hitting is more or less the same as a similar one we have with the sessiontable in the latest Ceph, but you won't hit it there unless you're very unlucky!

John

On Fri, Jan 16, 2015 at 7:37 AM, Mohd Bazli Ab Karim bazli.abka...@mimos.my wrote:

Dear Ceph-Users, Ceph-Devel,

Apologies if you get a double post of this email. I am running a Ceph cluster version 0.72.2 and one MDS (in fact there are 3: 2 down and only 1 up) at the moment, plus I have one CephFS client mounted to it. Now the MDS always gets aborted after recovery and being active for 4 seconds. Some parts of the log are as below:

-3 2015-01-15 14:10:28.464706 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 osd_op_reply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 uv1871414 ondisk = 0) v6 221+0+0 (261801329 0 0) 0x7770bc80 con 0x69c7dc0
-2 2015-01-15 14:10:28.464730 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 osd_op_reply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) v6 179+0+0 (3759887079 0 0) 0x7757ec80 con 0x1c6bb00
-1 2015-01-15 14:10:28.464754 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 osd_op_reply(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message too long)) v6 174+0+0 (3942056372 0 0) 0x69f94a00 con 0x1c6b9a0
0 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 'void MDSTable::save_2(int, version_t)' thread 7fbcc8226700 time 2015-01-15 14:10:28.46
mds/MDSTable.cc: 83: FAILED assert(r >= 0)
ceph version ()
1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
2: (Context::complete(int)+0x9) [0x568d29]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
7: (DispatchQueue::entry()+0x549) [0x975739]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
9: (()+0x7e9a) [0x7fbcccb0de9a]
10: (clone()+0x6d) [0x7fbccb4ba3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Is there any workaround/patch to fix this issue? Let me know if you need to see the log with debug-mds at a certain level as well. Any help would be very much appreciated.

Thanks,
Bazli
[ceph-users] MDS aborted after recovery and active, FAILED assert (r >= 0)
Dear Ceph-Users, Ceph-Devel,

Apologies if you get a double post of this email. I am running a Ceph cluster version 0.72.2 and one MDS (in fact there are 3: 2 down and only 1 up) at the moment, plus I have one CephFS client mounted to it. Now the MDS always gets aborted after recovery and being active for 4 seconds. Some parts of the log are as below:

-3 2015-01-15 14:10:28.464706 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 osd_op_reply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 uv1871414 ondisk = 0) v6 221+0+0 (261801329 0 0) 0x7770bc80 con 0x69c7dc0
-2 2015-01-15 14:10:28.464730 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 osd_op_reply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) v6 179+0+0 (3759887079 0 0) 0x7757ec80 con 0x1c6bb00
-1 2015-01-15 14:10:28.464754 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 osd_op_reply(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message too long)) v6 174+0+0 (3942056372 0 0) 0x69f94a00 con 0x1c6b9a0
0 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 'void MDSTable::save_2(int, version_t)' thread 7fbcc8226700 time 2015-01-15 14:10:28.46
mds/MDSTable.cc: 83: FAILED assert(r >= 0)
ceph version ()
1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
2: (Context::complete(int)+0x9) [0x568d29]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
7: (DispatchQueue::entry()+0x549) [0x975739]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
9: (()+0x7e9a) [0x7fbcccb0de9a]
10: (clone()+0x6d) [0x7fbccb4ba3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Is there any workaround/patch to fix this issue? Let me know if you need to see the log with debug-mds at a certain level as well. Any help would be very much appreciated.

Thanks,
Bazli
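The failing operation in the log above is a writefull of 94,394,932 bytes to the mds_anchortable object. The OSD rejects single writes larger than osd_max_write_size (default 90, applied by the OSD as MiB, i.e. 94,371,840 bytes), so the reply carries -90 (EMSGSIZE, "Message too long") and MDSTable::save_2() asserts on the negative return code. A quick sanity check of the numbers, as a sketch:

# writefull length from the log vs. the default 90 MiB cap (94371840 bytes)
echo '94394932 / (1024 * 1024)' | bc -l    # ~90.02 MiB, just over the limit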
Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >= 0)
Agreed. I was about to upgrade to 0.90 but have postponed it due to this error. Any chance for me to recover it first before upgrading?

Thanks, Wido.

Regards,
Bazli

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Wido den Hollander
Sent: Friday, January 16, 2015 3:50 PM
To: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: MDS aborted after recovery and active, FAILED assert (r >= 0)

On 01/16/2015 08:37 AM, Mohd Bazli Ab Karim wrote:

Dear Ceph-Users, Ceph-Devel,

Apologies if you get a double post of this email. I am running a Ceph cluster version 0.72.2 and one MDS (in fact there are 3: 2 down and only 1 up) at the moment, plus I have one CephFS client mounted to it.

In the Ceph world 0.72.2 is ancient and pretty old. If you want to play with CephFS I recommend you upgrade to 0.90 and also use at least kernel 3.18.

Now the MDS always gets aborted after recovery and being active for 4 seconds. Some parts of the log are as below:

-3 2015-01-15 14:10:28.464706 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 osd_op_reply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 uv1871414 ondisk = 0) v6 221+0+0 (261801329 0 0) 0x7770bc80 con 0x69c7dc0
-2 2015-01-15 14:10:28.464730 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 osd_op_reply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) v6 179+0+0 (3759887079 0 0) 0x7757ec80 con 0x1c6bb00
-1 2015-01-15 14:10:28.464754 7fbcc8226700 1 -- 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 osd_op_reply(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message too long)) v6 174+0+0 (3942056372 0 0) 0x69f94a00 con 0x1c6b9a0
0 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 'void MDSTable::save_2(int, version_t)' thread 7fbcc8226700 time 2015-01-15 14:10:28.46
mds/MDSTable.cc: 83: FAILED assert(r >= 0)
ceph version ()
1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
2: (Context::complete(int)+0x9) [0x568d29]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
7: (DispatchQueue::entry()+0x549) [0x975739]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
9: (()+0x7e9a) [0x7fbcccb0de9a]
10: (clone()+0x6d) [0x7fbccb4ba3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Is there any workaround/patch to fix this issue? Let me know if you need to see the log with debug-mds at a certain level as well. Any help would be very much appreciated.

Thanks,
Bazli

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
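Before following Wido's upgrade advice above, it is worth confirming exactly what is running. These are standard Ceph/Linux commands that should already exist on a 0.72.x cluster; the expected values in the comments are taken from this thread:

ceph --version     # installed release on this node (0.72.2 here)
ceph mds stat      # current mdsmap summary, i.e. which MDS is up/active
uname -r           # running kernel, against the "at least 3.18" recommendation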
Re: [ceph-users] Ceph mds laggy and failed assert in function replay mds/journal.cc
Hi Zheng,

Sorry for the late reply. For sure, I will try this again after we completely verify all content in the file system. Hopefully all will be good. And please confirm this: I will set debug_mds=10 for the ceph-mds; do you want me to send the ceph-mon log too? BTW, how do I confirm whether the MDS has passed the beacon to the mon or not?

Thank you so much Zheng!

Bazli

-----Original Message-----
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Tuesday, April 29, 2014 10:13 PM
To: Mohd Bazli Ab Karim
Cc: Luke Jing Yuan; Wong Ming Tat
Subject: Re: [ceph-users] Ceph mds laggy and failed assert in function replay mds/journal.cc

On Tue, Apr 29, 2014 at 5:30 PM, Mohd Bazli Ab Karim bazli.abka...@mimos.my wrote:

Hi Zheng,

The other issue that Luke mentioned just now was like this. At first, we ran one MDS (mon01) with the newly compiled ceph-mds. It worked fine with only one MDS running at that time. However, when we ran two more MDSes (mon02 and mon03) with the newly compiled ceph-mds, it started acting weird. mon01, which became active at first, would hit the error and start respawning. Once the respawn happened, mon03 would take over from mon01 as the active MDS, and replay happened again. Again, when mon03 became active, it would hit the same error as below and respawn again. So it seems to me that replay will keep moving from one MDS to another as they get respawned.

2014-04-29 15:36:24.917798 7f5c36476700 1 mds.0.server reconnect_clients -- 1 sessions
2014-04-29 15:36:24.919620 7f5c2fb3e700 0 -- 10.4.118.23:6800/26401 >> 10.1.64.181:0/1558263174 pipe(0x2924f5780 sd=41 :6800 s=0 pgs=0 cs=0 l=0 c=0x37056e0).accept peer addr is really 10.1.64.181:0/1558263174 (socket is 10.1.64.181:57649/0)
2014-04-29 15:36:24.921661 7f5c36476700 0 log [DBG] : reconnect by client.884169 10.1.64.181:0/1558263174 after 0.003774
2014-04-29 15:36:24.921786 7f5c36476700 1 mds.0.12858 reconnect_done
2014-04-29 15:36:25.109391 7f5c36476700 1 mds.0.12858 handle_mds_map i am now mds.0.12858
2014-04-29 15:36:25.109413 7f5c36476700 1 mds.0.12858 handle_mds_map state change up:reconnect --> up:rejoin
2014-04-29 15:36:25.109417 7f5c36476700 1 mds.0.12858 rejoin_start
2014-04-29 15:36:26.918067 7f5c36476700 1 mds.0.12858 rejoin_joint_start
2014-04-29 15:36:33.520985 7f5c36476700 1 mds.0.12858 rejoin_done
2014-04-29 15:36:36.252925 7f5c36476700 1 mds.0.12858 handle_mds_map i am now mds.0.12858
2014-04-29 15:36:36.252927 7f5c36476700 1 mds.0.12858 handle_mds_map state change up:rejoin --> up:active
2014-04-29 15:36:36.252932 7f5c36476700 1 mds.0.12858 recovery_done -- successful recovery!
2014-04-29 15:36:36.745833 7f5c36476700 1 mds.0.12858 active_start
2014-04-29 15:36:36.987854 7f5c36476700 1 mds.0.12858 cluster recovered.
2014-04-29 15:36:40.182604 7f5c36476700 0 mds.0.12858 handle_mds_beacon no longer laggy
2014-04-29 15:36:57.947441 7f5c2fb3e700 0 -- 10.4.118.23:6800/26401 >> 10.1.64.181:0/1558263174 pipe(0x2924f5780 sd=41 :6800 s=2 pgs=156 cs=1 l=0 c=0x37056e0).fault with nothing to send, going to standby
2014-04-29 15:37:10.534593 7f5c36476700 1 mds.-1.-1 handle_mds_map i (10.4.118.23:6800/26401) dne in the mdsmap, respawning myself
2014-04-29 15:37:10.534604 7f5c36476700 1 mds.-1.-1 respawn
2014-04-29 15:37:10.534609 7f5c36476700 1 mds.-1.-1 e: '/usr/bin/ceph-mds'
2014-04-29 15:37:10.534612 7f5c36476700 1 mds.-1.-1 0: '/usr/bin/ceph-mds'
2014-04-29 15:37:10.534616 7f5c36476700 1 mds.-1.-1 1: '--cluster=ceph'
2014-04-29 15:37:10.534619 7f5c36476700 1 mds.-1.-1 2: '-i'
2014-04-29 15:37:10.534621 7f5c36476700 1 mds.-1.-1 3: 'mon03'
2014-04-29 15:37:10.534623 7f5c36476700 1 mds.-1.-1 4: '-f'
2014-04-29 15:37:10.534641 7f5c36476700 1 mds.-1.-1 cwd /
2014-04-29 15:37:12.155458 7f8907c8b780 0 ceph version (), process ceph-mds, pid 26401
2014-04-29 15:37:12.249780 7f8902d10700 1 mds.-1.0 handle_mds_map standby

P/S: we ran ceph-mon and ceph-mds on the same servers (mon01, mon02, mon03). I sent you two log files, mon01 and mon03, where mon03 goes through the standby -> replay -> active -> respawned scenario, and also mon01, which is now running as the single active MDS at this moment.

After the MDS became active, it did not send a beacon to the monitor. It seems like the MDS was busy doing something else. If this issue still happens, set debug_mds=10 and send the log to me.

Regards
Yan, Zheng

Regards,
Bazli

-----Original Message-----
From: Luke Jing Yuan
Sent: Tuesday, April 29, 2014 4:46 PM
To: Yan, Zheng
Cc: Mohd Bazli Ab Karim; Wong Ming Tat
Subject: RE: [ceph-users] Ceph mds laggy and failed assert in function replay mds/journal.cc

Hi Zheng,

Thanks for the information. Actually we encountered another issue: in our original setup, we have 3 MDSes running (say mon01, mon02 and mon03), and when we did the replay/recovery we did it on mon01. After we completed, we restarted the mds again on mon02 and mon03 (without
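Two practical notes on Bazli's questions above, as a hedged sketch (the option names are standard Ceph settings, but the exact output wording and whether injectargs applies live may differ on a 0.72.x build):

# Raise MDS debug logging to 10 on the running rank-0 daemon, as Zheng requested:
ceph tell mds.0 injectargs '--debug_mds 10'

# Or persistently in ceph.conf on the MDS host (restart ceph-mds afterwards):
[mds]
        debug mds = 10

# An MDS whose beacons stop reaching the monitors is flagged in the mdsmap,
# so the quickest beacon check is the mdsmap state itself:
ceph mds stat      # healthy: "up:active"; missing beacons: marked laggy or crashed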
[ceph-users] Ceph mds laggy and failed assert in function replay mds/journal.cc
Dear Ceph-devel, ceph-users,

I am currently facing an issue with my Ceph MDS server. The ceph-mds daemon does not want to come back up. I tried running it manually with ceph-mds -i mon01 -d, but it shows that it gets stuck at a failed assert(session) at line 1303 in mds/journal.cc and aborts. Can someone shed some light on this issue?

ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

Let me know if I need to send a log with debug enabled.

Regards,
Bazli

2014-04-25 12:17:27.210367 7f3c30250780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 5492
starting mds.mon01 at :/0
2014-04-25 12:17:27.441530 7f3c2b2d6700 1 mds.-1.0 handle_mds_map standby
2014-04-25 12:17:27.624820 7f3c2b2d6700 1 mds.0.12834 handle_mds_map i am now mds.0.12834
2014-04-25 12:17:27.624825 7f3c2b2d6700 1 mds.0.12834 handle_mds_map state change up:standby --> up:replay
2014-04-25 12:17:27.624830 7f3c2b2d6700 1 mds.0.12834 replay_start
2014-04-25 12:17:27.624836 7f3c2b2d6700 1 mds.0.12834 recovery set is
2014-04-25 12:17:27.624837 7f3c2b2d6700 1 mds.0.12834 need osdmap epoch 29082, have 29081
2014-04-25 12:17:27.624839 7f3c2b2d6700 1 mds.0.12834 waiting for osdmap 29082 (which blacklists prior instance)
2014-04-25 12:17:30.138623 7f3c2b2d6700 0 mds.0.cache creating system inode with ino:100
2014-04-25 12:17:30.138890 7f3c2b2d6700 0 mds.0.cache creating system inode with ino:1
mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f3c26fbd700 time 2014-04-25 12:17:30.441635
mds/journal.cc: 1303: FAILED assert(session)
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x7830) [0x5af890]
2: (EUpdate::replay(MDS*)+0x3a) [0x5b67ea]
3: (MDLog::_replay_thread()+0x678) [0x79dbb8]
4: (MDLog::ReplayThread::entry()+0xd) [0x58bded]
5: (()+0x7e9a) [0x7f3c2f675e9a]
6: (clone()+0x6d) [0x7f3c2e56a3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-04-25 12:17:30.442489 7f3c26fbd700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f3c26fbd700 time 2014-04-25 12:17:30.441635
mds/journal.cc: 1303: FAILED assert(session)
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x7830) [0x5af890]
2: (EUpdate::replay(MDS*)+0x3a) [0x5b67ea]
3: (MDLog::_replay_thread()+0x678) [0x79dbb8]
4: (MDLog::ReplayThread::entry()+0xd) [0x58bded]
5: (()+0x7e9a) [0x7f3c2f675e9a]
6: (clone()+0x6d) [0x7f3c2e56a3fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-172 2014-04-25 12:17:27.208884 7f3c30250780 5 asok(0x1b99000) register_command
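To produce the debug log Bazli offers above, the daemon can be run in the foreground with verbose subsystems enabled. A sketch; the debug levels are only examples and the long-option spellings are the usual Ceph config overrides:

ceph-mds -i mon01 -d --debug_mds=20 --debug_journaler=10 2>&1 | tee ceph-mds.mon01.debug.log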
Re: [ceph-users] MDS crash when client goes to sleep
Hi Hong,

Could you apply the patch and see if it still crashes after sleep? This could lead us to the correct fix for the MDS/client too. As far as I can see here, this patch should fix the crash, but how do we fix the MDS if the crash happens? It happened to us: when it crashed, it was a total crash, and even restarting the ceph-mds service with --reset-journal did not help. Can anyone shed some light on this matter?

P/S: Are there any steps/tools to back up the MDS metadata? Say the MDS crashes and refuses to run normally; can we restore the backed-up metadata? I'm thinking of it as a preventive step, just in case it happens again in future.

Many thanks.
Bazli

-----Original Message-----
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Sunday, March 23, 2014 2:53 PM
To: Sage Weil
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

On Sun, Mar 23, 2014 at 11:47 AM, Sage Weil s...@inktank.com wrote:

Hi, I looked at this a bit earlier and wasn't sure why we would be getting a remote_reset event after a sleep/wake cycle. The patch should fix the crash, but I'm a bit worried something is not quite right on the client side, too...

When the client wakes up, it first tries reconnecting the old session. The MDS refuses the reconnect request and sends a session close message to the client. After receiving the session close message, the client closes the old session, then sends a session open message to the MDS. The MDS receives the open request and triggers a remote reset (Pipe.cc:466).

sage

On Sun, 23 Mar 2014, Yan, Zheng wrote:

thank you for reporting this. Below patch should fix this issue

---
diff --git a/src/mds/MDS.cc b/src/mds/MDS.cc
index 57c7f4a..6b53c14 100644
--- a/src/mds/MDS.cc
+++ b/src/mds/MDS.cc
@@ -2110,6 +2110,7 @@ bool MDS::ms_handle_reset(Connection *con)
   if (session->is_closed()) {
     dout(3) << "ms_handle_reset closing connection for session " << session->info.inst << dendl;
     messenger->mark_down(con);
+    con->set_priv(NULL);
     sessionmap.remove_session(session);
   }
   session->put();
@@ -2138,6 +2139,7 @@ void MDS::ms_handle_remote_reset(Connection *con)
   if (session->is_closed()) {
     dout(3) << "ms_handle_remote_reset closing connection for session " << session->info.inst << dendl;
     messenger->mark_down(con);
+    con->set_priv(NULL);
     sessionmap.remove_session(session);
   }
   session->put();

On Fri, Mar 21, 2014 at 4:16 PM, Mohd Bazli Ab Karim bazli.abka...@mimos.my wrote:

Hi Hong,

How's the client now? Would it be able to mount the filesystem now? It looks similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html. However, you need to collect some logs to confirm this.

Thanks.

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, March 21, 2014 2:30 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Luke,

Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS when this happened there was no longer any process with ceph-mds when I ran one daemon. When I ran three, there was one left but it wasn't doing much. I didn't record the logs, but the behavior was very similar in 0.72 Emperor. I am using Debian packages. The client went to sleep for a while (like 8+ hours). There was no I/O prior to the sleep other than the fact that CephFS was still mounted.

Regards,
Hong

From: Luke Jing Yuan jyl...@mimos.my
To: hjcho616 hjcho...@yahoo.com
Cc: Mohd Bazli Ab Karim bazli.abka...@mimos.my; ceph-users@lists.ceph.com ceph-users@lists.ceph.com
Sent: Friday, March 21, 2014 1:17 AM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,

That's interesting. For Mr. Bazli and I, we ended up with the MDS stuck in (up:replay) and a flapping ceph-mds daemon, but then again we are using version 0.72.2. Having said so, the triggering point seems similar to ours as well, which is the following line:

-38 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION

So how long did your client go into sleep? Was there any I/O prior to the sleep?

Regards,
Luke

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Nope, just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 7f59eb6b3520 error 4 in libgcc_s.so
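On Bazli's question above about backing up the MDS metadata: on a 0.72.x cluster there is no dedicated tool (cephfs-journal-tool style tooling only arrived in later releases), but the metadata pool objects can be copied out with plain RADOS commands. A rough sketch, assuming the default pool name "metadata"; object names vary by rank and release, stop the MDS first if you want a consistent copy, and treat this as an assumption-laden workaround rather than an official backup procedure:

# list every object in the CephFS metadata pool
rados -p metadata ls > metadata-objects.txt

# save individual objects, e.g. the tables implicated in the crashes in this thread
rados -p metadata get mds_anchortable mds_anchortable.bak
rados -p metadata get mds0_sessionmap mds0_sessionmap.bak

# a saved object can later be pushed back with:
rados -p metadata put mds_anchortable mds_anchortable.bak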
Re: [ceph-users] MDS crash when client goes to sleep
Hi Hong,

How's the client now? Would it be able to mount the filesystem now? It looks similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html. However, you need to collect some logs to confirm this.

Thanks.

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, March 21, 2014 2:30 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Luke,

Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS when this happened there was no longer any process with ceph-mds when I ran one daemon. When I ran three, there was one left but it wasn't doing much. I didn't record the logs, but the behavior was very similar in 0.72 Emperor. I am using Debian packages. The client went to sleep for a while (like 8+ hours). There was no I/O prior to the sleep other than the fact that CephFS was still mounted.

Regards,
Hong

From: Luke Jing Yuan jyl...@mimos.my
To: hjcho616 hjcho...@yahoo.com
Cc: Mohd Bazli Ab Karim bazli.abka...@mimos.my; ceph-users@lists.ceph.com
Sent: Friday, March 21, 2014 1:17 AM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,

That's interesting. For Mr. Bazli and I, we ended up with the MDS stuck in (up:replay) and a flapping ceph-mds daemon, but then again we are using version 0.72.2. Having said so, the triggering point seems similar to ours as well, which is the following line:

-38 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION

So how long did your client go into sleep? Was there any I/O prior to the sleep?

Regards,
Luke

From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Nope, just these segfaults.

[149884.709608] ceph-mds[17366]: segfault at 200 ip 7f09de9d60b8 sp 7f09db461520 error 4 in libgcc_s.so.1[7f09de9c7000+15000]
[211263.265402] ceph-mds[17135]: segfault at 200 ip 7f59eec280b8 sp 7f59eb6b3520 error 4 in libgcc_s.so.1[7f59eec19000+15000]
[214638.927759] ceph-mds[16896]: segfault at 200 ip 7fcb2c89e0b8 sp 7fcb29329520 error 4 in libgcc_s.so.1[7fcb2c88f000+15000]
[289338.461271] ceph-mds[20878]: segfault at 200 ip 7f4b7211c0b8 sp 7f4b6eba7520 error 4 in libgcc_s.so.1[7f4b7210d000+15000]
[373738.961475] ceph-mds[21341]: segfault at 200 ip 7f36c3d480b8 sp 7f36c07d3520 error 4 in libgcc_s.so.1[7f36c3d39000+15000]

Regards,
Hong

From: Luke Jing Yuan jyl...@mimos.my
To: hjcho616 hjcho...@yahoo.com
Cc: Mohd Bazli Ab Karim bazli.abka...@mimos.my; ceph-users@lists.ceph.com ceph-users@lists.ceph.com
Sent: Thursday, March 20, 2014 10:53 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep

Did you see any messages in dmesg saying ceph-mds respawning or stuff like that?

Regards,
Luke

On Mar 21, 2014, at 11:09 AM, hjcho616 hjcho...@yahoo.com wrote:

On the client, I was no longer able to access the filesystem. It would hang. Makes sense since the MDS has crashed.

I tried running 3 MDS daemons on the same machine. Two crashed and one appears to be hung(?). ceph health says the MDS is in a degraded state when that happened. I was able to recover by restarting every node. I currently have three machines, one with MDS and MON, and two with OSDs. It is failing every time my client machine goes to sleep. If you need me to run something, let me know what and how.

Regards,
Hong

From: Mohd Bazli Ab Karim bazli.abka...@mimos.my
To: hjcho616 hjcho...@yahoo.com; ceph-users@lists.ceph.com ceph-users@lists.ceph.com
Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep

Hi Hong,

May I know what has happened to your MDS once it crashed? Was it able to recover from replay? We are also facing this issue and I am interested to know how to reproduce it.

Thanks.
Bazli

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
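Luke's dmesg question above is easy to check on the MDS host; a small sketch, where the log path is the stock packaging default and an assumption:

dmesg | grep -i ceph-mds                         # kernel-side segfault lines like the ones Hong pasted
grep -i respawn /var/log/ceph/ceph-mds.*.log     # MDS-side "respawn" / "respawning myself" events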
Re: [ceph-users] MDS crash when client goes to sleep
Hi Hong,

May I know what has happened to your MDS once it crashed? Was it able to recover from replay? We are also facing this issue and I am interested to know how to reproduce it.

Thanks.
Bazli

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] MDS crash when client goes to sleep

When CephFS is mounted on a client and the client decides to go to sleep, the MDS segfaults. Has anyone seen this? Below is a part of the MDS log. This happened in Emperor and the recent 0.77 release. I am running Debian Wheezy with testing kernel 3.13. What can I do to not crash the whole system if a client goes to sleep (and it looks like a disconnect may do the same)? Let me know if you need any more info.

Regards,
Hong

-43 2014-03-20 20:08:42.463357 7fee3f0cf700 1 -- 192.168.1.20:6801/17079 --> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 -- ?+0 0x1ee9f080 con 0x2e56580
-42 2014-03-20 20:08:42.463787 7fee411d4700 1 -- 192.168.1.20:6801/17079 <== mon.0 192.168.1.20:6789/0 21764 mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
-41 2014-03-20 20:08:43.373099 7fee3f0cf700 2 mds.0.cache check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 caps per inode
-40 2014-03-20 20:08:44.494963 7fee3d7c4700 1 -- 192.168.1.20:6801/17079 >> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept sd=18 192.168.1.101:52026/0
-39 2014-03-20 20:08:44.495033 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 192.168.1.101:52026/0)
-38 2014-03-20 20:08:44.495565 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
-37 2014-03-20 20:08:44.496015 7fee3d7c4700 2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 c=0x1f0e2160).fault 0: Success
-36 2014-03-20 20:08:44.496099 7fee411d4700 5 mds.0.35 ms_handle_reset on 192.168.1.101:0/2113152127
-35 2014-03-20 20:08:44.496120 7fee411d4700 3 mds.0.35 ms_handle_reset closing connection for session client.6019 192.168.1.101:0/2113152127
-34 2014-03-20 20:08:44.496207 7fee411d4700 1 -- 192.168.1.20:6801/17079 mark_down 0x1f0e2160 -- pipe dne
-33 2014-03-20 20:08:44.653628 7fee3d7c4700 1 -- 192.168.1.20:6801/17079 >> :/0 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept sd=18 192.168.1.101:52027/0
-32 2014-03-20 20:08:44.653677 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 192.168.1.101:52027/0)
-31 2014-03-20 20:08:44.925618 7fee411d4700 1 -- 192.168.1.20:6801/17079 <== client.6019 192.168.1.101:0/2113152127 1 client_reconnect(77349 caps) v2 0+0+11032578 (0 0 3293767716) 0x2e92780 con 0x1f0e22c0
-30 2014-03-20 20:08:44.925682 7fee411d4700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
-29 2014-03-20 20:08:44.925735 7fee411d4700 0 log [INF] : denied reconnect attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 2014-03-20 20:08:44.925679 (allowed interval 45)
-28 2014-03-20 20:08:44.925748 7fee411d4700 1 -- 192.168.1.20:6801/17079 --> 192.168.1.101:0/2113152127 -- client_session(close) v1 -- ?+0 0x3ea6540 con 0x1f0e22c0
-27 2014-03-20 20:08:44.927727 7fee3d7c4700 2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 c=0x1f0e22c0).reader couldn't read tag, Success
-26 2014-03-20 20:08:44.927797 7fee3d7c4700 2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 c=0x1f0e22c0).fault 0: Success
-25 2014-03-20 20:08:44.927849 7fee3d7c4700 0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 c=0x1f0e22c0).fault, server, going to standby
-24 2014-03-20 20:08:46.372279 7fee401d2700 10 monclient: tick
-23 2014-03-20 20:08:46.372339 7fee401d2700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2014-03-20 20:08:16.372333)
-22 2014-03-20 20:08:46.372373 7fee401d2700 10 monclient: renew subs? (now: 2014-03-20 20:08:46.372372; renew after: 2014-03-20 20:09:56.370811) -- no
-21 2014-03-20 20:08:46.372403 7fee401d2700 10 log_queue is 1 last_log 2 sent 1 num 1 unsent 1 sending 1
-20 2014-03-20 20:08:46.372421 7fee401d2700 10