Re: [ceph-users] OSD Marked down unable to restart continuously failing

2020-01-11 Thread Eugen Block

Hi,

you say the daemons are locally up and running, but restarting them fails?
Which of the two is it?
Do you see any messages suggesting flapping OSDs? After being marked
down five times within ten minutes (osd_max_markdown_count within
osd_max_markdown_period), an OSD shuts itself down. What do your checks
with iostat etc. show? Anything pointing to high load on the OSD node?
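
If it is flapping, something like this could help narrow it down
(osd.12 and the log path are placeholders for one of the affected OSDs):

  # Which OSDs does the cluster consider down, and which requests are blocked?
  ceph osd tree down
  ceph health detail

  # Flapping OSDs usually log this after being marked down by their peers:
  grep "wrongly marked me down" /var/log/ceph/ceph-osd.12.log

  # Thresholds after which a repeatedly marked-down OSD stops itself:
  ceph daemon osd.12 config get osd_max_markdown_count
  ceph daemon osd.12 config get osd_max_markdown_period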


Regards,
Eugen


Quoting Radhakrishnan2 S:


Can someone please help respond to the query below?

Regards
Radha Krishnan S



-Radhakrishnan2 S/CHN/TCS wrote: -
To: "Ceph Users" 
From: Radhakrishnan2 S/CHN/TCS
Date: 01/09/2020 08:34AM
Subject: OSD Marked down unable to restart continuously failing

Hello Everyone,

One OSD node out of 16 has 12 OSDs with an NVMe bcache in front.
Locally those OSD daemons seem to be up and running, while ceph osd
tree shows them as down. The logs show the OSDs have had IO stuck for
over 4096 sec.


I tried checking iostat, netstat and ceph -w along with the logs. Is
there a way to identify why this is happening? In addition, when I
restart the OSD daemons on the respective OSD node, the restart fails.
Any quick help, please.


Regards
Radha Krishnan S



-"ceph-users"  wrote: -
To: d.aber...@profihost.ag, "Janne Johansson" 
From: "Wido den Hollander"
Sent by: "ceph-users"
Date: 01/09/2020 08:19AM
Cc: "Ceph Users" , a.bra...@profihost.ag,  
"p.kra...@profihost.ag" , j.kr...@profihost.ag

Subject: Re: [ceph-users] Looking for experience

"External email. Open with Caution"


On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote:


Am 09.01.20 um 13:39 schrieb Janne Johansson:


I'm currently trying to workout a concept for a ceph cluster which can
be used as a target for backups which satisfies the following
requirements:

- approx. write speed of 40.000 IOP/s and 2500 Mbyte/s


You might need to have a large (at least non-1) number of writers to get
to that sum of operations, as opposed to trying to reach it with one
single stream written from one single client.



We are aiming for about 100 writers.


So if I read it correctly the writes will be 64k each.

That should be doable, but you probably want something like NVMe for DB+WAL.

You might want to tune that larger writes also go into the WAL to speed
up the ingress writes. But you mainly want more spindles then less.

Wido



Cheers
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Marked down unable to restart continuously failing

2020-01-10 Thread Radhakrishnan2 S
Can someone please help respond to the query below?

Regards
Radha Krishnan S


[ceph-users] OSD Marked down unable to restart continuously failing

2020-01-09 Thread Radhakrishnan2 S
Hello Everyone, 

One OSD node out of 16 has 12 OSDs with an NVMe bcache in front. Locally those
OSD daemons seem to be up and running, while ceph osd tree shows them as
down. The logs show the OSDs have had IO stuck for over 4096 sec.

I tried checking iostat, netstat and ceph -w along with the logs. Is there a
way to identify why this is happening? In addition, when I restart the OSD
daemons on the respective OSD node, the restart fails. Any quick help, please.
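
For reference, the checks look roughly like this (osd.12, the unit name
and the paths are placeholders):

  # Cluster view: which OSDs are down, and which requests are blocked?
  ceph health detail
  ceph osd tree down

  # On the OSD node: daemon status and recent log entries
  systemctl status ceph-osd@12
  journalctl -u ceph-osd@12 --since "1 hour ago"

  # Device-level symptoms on the bcache/NVMe devices
  iostat -x 1 5
  dmesg | grep -iE "bcache|nvme"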

Regards
Radha Krishnan S



-"ceph-users"  wrote: -
To: d.aber...@profihost.ag, "Janne Johansson" 
From: "Wido den Hollander" 
Sent by: "ceph-users" 
Date: 01/09/2020 08:19AM
Cc: "Ceph Users" , a.bra...@profihost.ag, 
"p.kra...@profihost.ag" , j.kr...@profihost.ag
Subject: Re: [ceph-users] Looking for experience

"External email. Open with Caution"


On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote:
> 
> Am 09.01.20 um 13:39 schrieb Janne Johansson:
>>
>> I'm currently trying to work out a concept for a ceph cluster which can
>> be used as a target for backups and satisfies the following
>> requirements:
>>
>> - approx. write speed of 40,000 IOPS and 2,500 MByte/s
>>
>>
>> You might need to have a large (at least non-1) number of writers to get
>> to that sum of operations, as opposed to trying to reach it with one
>> single stream written from one single client. 
> 
> 
> We are aiming for about 100 writers.

So if I read it correctly, the writes will be about 64k each
(2,500 MByte/s / 40,000 IOPS ≈ 64 KB per write).

That should be doable, but you probably want something like NVMe for DB+WAL.
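
For example, the DB (and, unless split out separately, the WAL) can be
placed on NVMe when the OSD is created; the device names below are
illustrative only:

  # BlueStore OSD: data on HDD, DB+WAL on an NVMe partition
  ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1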

You might want to tune BlueStore so that larger writes also go into the
WAL, to speed up the ingest writes. But mainly you want more spindles
rather than fewer.
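
The relevant knob for this should be BlueStore's deferred-write
threshold: writes at or below it are committed to the WAL first and
flushed to the data device later. A sketch, with a purely illustrative
value of 128 KiB:

  # Let writes up to 128 KiB take the deferred (WAL-first) path on HDD OSDs
  ceph config set osd bluestore_prefer_deferred_size_hdd 131072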

Wido

> 
> Cheers


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com