Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-21 Thread Mohd Bazli Ab Karim
Hi all,

Our MDS is still fine today. Thanks, everyone!

Regards,
Bazli


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-18 Thread Mohd Bazli Ab Karim
Hi John,

Good shot!
I've increased osd_max_write_size to 1 GB (still smaller than the OSD journal
size) and the MDS has now been running fine for an hour.
Now checking whether the filesystem is still accessible. Will update from time to time.
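A rough sketch of that check (the mount point /mnt/cephfs is illustrative;
adjust to your own setup):

    ceph -s            # overall cluster health
    ceph mds stat      # the MDS should stay up:active
    df -h /mnt/cephfs                                  # the mount still responds
    touch /mnt/cephfs/.probe && rm /mnt/cephfs/.probe  # a basic metadata write succeeds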

Thanks again John.

Regards,
Bazli



Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread Wido den Hollander
On 01/16/2015 08:37 AM, Mohd Bazli Ab Karim wrote:
 Dear Ceph-Users, Ceph-Devel,

 Apologies if you receive this email twice.

 I am running a Ceph cluster, version 0.72.2, with one MDS currently up (there
 are actually three, but two are down).
 I also have one CephFS client mounted.
 

In the Ceph world 0.72.2 is ancient. If you want to play with CephFS I
recommend you upgrade to 0.90 and also use at least kernel 3.18.
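For reference, quick ways to confirm what is currently running (illustrative
commands, nothing cluster-specific):

    ceph -v     # version of the locally installed Ceph binaries
    uname -r    # kernel version on the CephFS client machine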



-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


[ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread Mohd Bazli Ab Karim
Dear Ceph-Users, Ceph-Devel,

Apologies if you receive this email twice.

I am running a Ceph cluster, version 0.72.2, with one MDS currently up (there
are actually three, but two are down).
I also have one CephFS client mounted.

Now the MDS always aborts about 4 seconds after recovering and going active.
Some parts of the log are below:

    -3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 ==== osd_op_reply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 uv1871414 ondisk = 0) v6 ==== 221+0+0 (261801329 0 0) 0x7770bc80 con 0x69c7dc0
    -2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 ==== osd_op_reply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) v6 ==== 179+0+0 (3759887079 0 0) 0x7757ec80 con 0x1c6bb00
    -1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 ==== osd_op_reply(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message too long)) v6 ==== 174+0+0 (3942056372 0 0) 0x69f94a00 con 0x1c6b9a0
     0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 'void MDSTable::save_2(int, version_t)' thread 7fbcc8226700 time 2015-01-15 14:10:28.46
mds/MDSTable.cc: 83: FAILED assert(r >= 0)

 ceph version  ()
 1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
 2: (Context::complete(int)+0x9) [0x568d29]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
 4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
 5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
 6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
 7: (DispatchQueue::entry()+0x549) [0x975739]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
 9: (()+0x7e9a) [0x7fbcccb0de9a]
 10: (clone()+0x6d) [0x7fbccb4ba3fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Is there any workaround or patch to fix this issue? Let me know if you need to
see the log at a particular debug-mds level as well.
Any help would be very much appreciated.
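For reference, a minimal sketch of how the debug-mds level is usually raised
(mds.0 and the levels shown are illustrative, not taken from this cluster):

    # persistent: in ceph.conf on the MDS host, then restart the MDS
    #   [mds]
    #   debug mds = 20
    #   debug ms = 1
    # or inject into the running MDS without a restart:
    ceph tell mds.0 injectargs '--debug-mds 20 --debug-ms 1'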

Thanks.
Bazli




Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread Mohd Bazli Ab Karim
Agreed. I was about to upgrade to 0.90, but postponed it due to this error.
Is there any chance for me to recover it first before upgrading?

Thanks Wido.

Regards,
Bazli


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread John Spray
Hmm, upgrading should help here, as the problematic data structure
(anchortable) no longer exists in the latest version.  I haven't
checked, but hopefully we don't try to write it during upgrades.

The bug you're hitting is more or less the same as a similar one we
have with the sessiontable in the latest ceph, but you won't hit it
there unless you're very unlucky!

John



Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread John Spray
It has just been pointed out to me that you can also work around this issue on
your existing system by increasing the osd_max_write_size setting on your OSDs
(default 90 MB) to something higher, but still smaller than your OSD journal
size. That might get you on a path to having an accessible filesystem before
you consider an upgrade.

John



Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread Lindsay Mathieson
On Fri, 16 Jan 2015 08:48:38 AM Wido den Hollander wrote:
 In the Ceph world 0.72.2 is ancient. If you want to play with CephFS I
 recommend you upgrade to 0.90 and also use at least kernel 3.18.

Does the kernel version matter if you are using ceph-fuse?



Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread Yan, Zheng
On Sat, Jan 17, 2015 at 11:47 AM, Lindsay Mathieson
lindsay.mathie...@gmail.com wrote:
 On Fri, 16 Jan 2015 08:48:38 AM Wido den Hollander wrote:
 In the Ceph world 0.72.2 is ancient. If you want to play with CephFS I
 recommend you upgrade to 0.90 and also use at least kernel 3.18.

 Does the kernel version matter if you are using ceph-fuse?


No, the kernel version does not matter if you use ceph-fuse.
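For completeness, the two client types being discussed, with an illustrative
monitor address, mount point and secret file:

    # userspace client -- no dependency on the kernel version:
    ceph-fuse -m mon-host:6789 /mnt/cephfs

    # kernel client -- where the "at least kernel 3.18" recommendation applies:
    mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret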
