Re: [ceph-users] HA and data recovery of CEPH

2019-12-11 Thread Peng Bo
Thanks to all. We can now get that duration down to around 25 seconds, which is the best result we can achieve. BR On Tue, Dec 3, 2019 at 10:30 PM Wido den Hollander wrote: > On 12/3/19 3:07 PM, Aleksey Gutikov wrote: >> That is true. When an OSD goes down it will take a few seconds for its
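
A 20-25 second floor is roughly what default OSD heartbeat detection plus peering adds up to. A minimal sketch of the kind of heartbeat tuning that shortens the detection window, assuming a Nautilus-era cluster with centralized config; the values are illustrative, not recommendations:

    # Report an OSD down after ~10s of missed heartbeats instead of the 20s default
    ceph config set osd osd_heartbeat_grace 10
    ceph config set mon osd_heartbeat_grace 10
    # Probe peer OSDs more frequently (default 6s)
    ceph config set osd osd_heartbeat_interval 3

Lowering the grace period trades faster failover for a higher risk of flapping OSDs under transient load, so it is usually tuned together with the reporting thresholds rather than in isolation.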

Re: [ceph-users] HA and data recovery of CEPH

2019-12-03 Thread Wido den Hollander
On 12/3/19 3:07 PM, Aleksey Gutikov wrote: That is true. When an OSD goes down it will take a few seconds for its Placement Groups to re-peer with the other OSDs. During that period writes to those PGs will stall for a couple of seconds. I wouldn't say it's 40s, but it can take ~10s.
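
To put a number on that stall for a specific cluster, you can watch PG states while an OSD fails; these are standard Ceph CLI commands, and the example output is only indicative:

    # In one terminal, stream cluster events while you stop an OSD
    ceph -w
    # In another, check how long PGs stay out of active+clean
    ceph pg stat            # e.g. "128 pgs: 123 active+clean, 5 peering"
    ceph health detail      # lists the PGs that are peering/degraded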

Re: [ceph-users] HA and data recovery of CEPH

2019-12-03 Thread Aleksey Gutikov
That is true. When an OSD goes down it will take a few seconds for its Placement Groups to re-peer with the other OSDs. During that period writes to those PGs will stall for a couple of seconds. I wouldn't say it's 40s, but it can take ~10s. Hello, According to my experience, in case of

Re: [ceph-users] HA and data recovery of CEPH

2019-11-28 Thread Wido den Hollander
On 11/29/19 6:28 AM, jes...@krogh.cc wrote: > Hi Nathan. Is that true? The time it takes to reallocate the primary PG delivers “downtime” by design, right? Seen from a writing client's perspective. That is true. When an OSD goes down it will take a few seconds for its Placement

Re: [ceph-users] HA and data recovery of CEPH

2019-11-28 Thread h...@portsip.cn
"rule_id": 1, "rule_name": "myfs-metadata", "ruleset": 1, "type": 1, "min_size": 1, "max_size": 10, "steps": [ {

Re: [ceph-users] HA and data recovery of CEPH

2019-11-28 Thread jesper
Hi Nathan, Is that true? The time it takes to reallocate the primary PG delivers “downtime” by design, right? Seen from a writing client's perspective. Jesper. Friday, 29 November 2019, 06.24 +0100 from pen...@portsip.com: > Hi Nathan, thanks for the help.

Re: [ceph-users] HA and data recovery of CEPH

2019-11-28 Thread Peng Bo
Hi Nathan, Thanks for the help. My colleague will provide more details. BR On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish wrote: > If correctly configured, your cluster should have zero downtime from a single OSD or node failure. What is your crush map? Are you using replica or EC? If your

Re: [ceph-users] HA and data recovery of CEPH

2019-11-28 Thread Nathan Fish
If correctly configured, your cluster should have zero downtime from a single OSD or node failure. What is your crush map? Are you using replica or EC? If your 'min_size' is not smaller than 'size', then you will lose availability. On Thu, Nov 28, 2019 at 10:50 PM Peng Bo wrote: > Hi all,
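
A quick way to check the availability condition Nathan describes, assuming a replicated pool named "mypool" (the name is a placeholder):

    ceph osd pool get mypool size       # replica count, e.g. 3
    ceph osd pool get mypool min_size   # must be < size to keep serving writes with one OSD down, e.g. 2
    # adjust if needed:
    ceph osd pool set mypool min_size 2

With size=3 and min_size=2, a single OSD or node failure leaves every PG above min_size, so I/O continues while recovery runs in the background.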

[ceph-users] HA and data recovery of CEPH

2019-11-28 Thread Peng Bo
Hi all, We are working on using Ceph to build our HA system; the goal is that the system keeps providing service even when a Ceph node is down or an OSD is lost. Currently, in our tests, once a node/OSD goes down the Ceph cluster takes about 40 seconds to sync data, and our system can't
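
The ~40 seconds described here is typically dominated by failure detection (heartbeat grace) plus the monitor marking the OSD down and PGs re-peering, rather than by data sync itself. A sketch of where to look, assuming default settings on a Nautilus-era cluster:

    # How long peers wait before reporting an OSD dead (default 20s)
    ceph config get osd osd_heartbeat_grace
    # How often OSDs ping each other (default 6s)
    ceph config get osd osd_heartbeat_interval
    # How long a down OSD stays 'in' before data is rebalanced (default 600s)
    ceph config get mon mon_osd_down_out_interval

Note that an OSD stopped cleanly notifies the monitors immediately, so the window in that case shrinks to roughly the peering time; the long detection delay applies to hard failures.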