Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. C.A.
> Sent: 23 September 2016 10:38
> To: n...@fisk.me.uk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
>
> Ummm, ok.
>
> And how would the affected PG recover? Just by replacing the affected
> OSD/disk, or would the affected PG migrate to another OSD/disk?

Yes, Ceph would start recovering the PGs to other OSDs, but until the affected PGs are back up to min_size copies, I/O to them will be blocked.

> thx
>
> On 23/09/16 10:56, Nick Fisk wrote:
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. C.A.
>>> Sent: 23 September 2016 09:50
>>> To: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
>>>
>>> Hi,
>>>
>>> With rep_size=2 and min_size=2, what drawbacks are removed compared
>>> with rep_size=2 and min_size=1?
>> If you lose a disk, everything will stop working until the affected PGs
>> are at size=2 again.
>>
>>> thx
>>> J.
>>>
>>> On 23/09/16 10:07, Wido den Hollander wrote:
>>>>> On 23 September 2016 at 10:04, mj wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>>>>>>> If you care about your data you run with size = 3 and min_size = 2.
>>>>>>>
>>>>>>> Wido
>>>>> We're currently running with min_size 1. Can we simply change this,
>>>>> online, with:
>>>>>
>>>>> ceph osd pool set vm-storage min_size 2
>>>>>
>>>>> and expect everything to continue running?
>>>>>
>>>> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph
>>>> that 2 replicas need to be online for I/O (Read and Write) to continue.
>>>>
>>>> Wido
>>>>
>>>>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>>>>>
>>>>> MJ
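For anyone who wants to watch that recovery (and the blocked I/O described above) from the command line, a rough sketch; the PG id is only a placeholder and the commands assume a reasonably current ceph CLI:

    # cluster-wide view; undersized/degraded PGs and blocked requests show up here
    ceph -s
    ceph health detail

    # list PGs stuck with fewer copies than 'size'
    ceph pg dump_stuck undersized

    # query one of them (placeholder PG id) to see which OSDs it is waiting on
    ceph pg 2.1f query

Once recovery has brought every affected PG back up to at least min_size copies, client I/O resumes on its own.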
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
Ummm, ok.

And how would the affected PG recover? Just by replacing the affected OSD/disk, or would the affected PG migrate to another OSD/disk?

thx

On 23/09/16 10:56, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. C.A.
>> Sent: 23 September 2016 09:50
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
>>
>> Hi,
>>
>> With rep_size=2 and min_size=2, what drawbacks are removed compared with
>> rep_size=2 and min_size=1?
> If you lose a disk, everything will stop working until the affected PGs are
> at size=2 again.
>
>> thx
>> J.
>>
>> On 23/09/16 10:07, Wido den Hollander wrote:
>>>> On 23 September 2016 at 10:04, mj wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>>>>>> If you care about your data you run with size = 3 and min_size = 2.
>>>>>>
>>>>>> Wido
>>>> We're currently running with min_size 1. Can we simply change this,
>>>> online, with:
>>>>
>>>> ceph osd pool set vm-storage min_size 2
>>>>
>>>> and expect everything to continue running?
>>>>
>>> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2
>>> replicas need to be online for I/O (Read and Write) to continue.
>>>
>>> Wido
>>>
>>>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>>>>
>>>> MJ
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. C.A.
> Sent: 23 September 2016 09:50
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
>
> Hi,
>
> With rep_size=2 and min_size=2, what drawbacks are removed compared with
> rep_size=2 and min_size=1?

If you lose a disk, everything will stop working until the affected PGs are at size=2 again.

> thx
> J.
>
> On 23/09/16 10:07, Wido den Hollander wrote:
>>> On 23 September 2016 at 10:04, mj wrote:
>>>
>>> Hi,
>>>
>>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>>>>> If you care about your data you run with size = 3 and min_size = 2.
>>>>>
>>>>> Wido
>>> We're currently running with min_size 1. Can we simply change this,
>>> online, with:
>>>
>>> ceph osd pool set vm-storage min_size 2
>>>
>>> and expect everything to continue running?
>>>
>> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2
>> replicas need to be online for I/O (Read and Write) to continue.
>>
>> Wido
>>
>>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>>>
>>> MJ
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
Hi,

With rep_size=2 and min_size=2, what drawbacks are removed compared with rep_size=2 and min_size=1?

thx
J.

On 23/09/16 10:07, Wido den Hollander wrote:
>> On 23 September 2016 at 10:04, mj wrote:
>>
>> Hi,
>>
>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>>>> If you care about your data you run with size = 3 and min_size = 2.
>>>>
>>>> Wido
>> We're currently running with min_size 1. Can we simply change this,
>> online, with:
>>
>> ceph osd pool set vm-storage min_size 2
>>
>> and expect everything to continue running?
>>
> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2
> replicas need to be online for I/O (Read and Write) to continue.
>
> Wido
>
>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>>
>> MJ
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> On 23 September 2016 at 10:04, mj wrote:
>
> Hi,
>
> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>> If you care about your data you run with size = 3 and min_size = 2.
>>
>> Wido
>
> We're currently running with min_size 1. Can we simply change this,
> online, with:
>
> ceph osd pool set vm-storage min_size 2
>
> and expect everything to continue running?

Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2 replicas need to be online for I/O (Read and Write) to continue.

Wido

> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>
> MJ
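For reference, a minimal sketch of doing exactly that; the pool name vm-storage is taken from the question being answered, and the values are only examples:

    # current replication settings for the pool
    ceph osd pool get vm-storage size
    ceph osd pool get vm-storage min_size

    # raise min_size; this takes effect immediately and moves no data
    ceph osd pool set vm-storage min_size 2

    # confirm the pool line now shows min_size 2
    ceph osd dump | grep vm-storage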
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
Hi,

On 09/23/2016 09:41 AM, Dan van der Ster wrote:
> If you care about your data you run with size = 3 and min_size = 2.
>
> Wido

We're currently running with min_size 1. Can we simply change this, online, with:

ceph osd pool set vm-storage min_size 2

and expect everything to continue running?

(our cluster is HEALTH_OK, enough disk space, etc, etc)

MJ
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
On Fri, Sep 23, 2016 at 9:29 AM, Wido den Hollander wrote:
>> On 23 September 2016 at 9:11, Tomasz Kuzemko wrote:
>>
>> Hi,
>>
>> The biggest issue with replica size 2 is that if you find an inconsistent
>> object you will not be able to tell which copy is the correct one. With
>> replica size 3 you could assume that those 2 copies that are the same
>> are correct.
>>
>> Until Ceph guarantees stored data integrity (that is, until we have
>> production-ready BlueStore), I would not go with replica size 2.
>>
> Not only that, but the same could happen if you have flapping OSDs.
>
> OSD 0 and 1 share a PG.
>
> 0 goes down, 1 is up and acting and accepts writes. Now 1 goes down and 0
> comes up. 0 becomes primary, but the PG is 'down' because 1 had the last
> data. You really need 1 to come back in this case before the PG will work
> again.
>
> I have seen this happen multiple times in systems which got overloaded.
>
> If you care about your data you run with size = 3 and min_size = 2.
>
> Wido

FWIW, when Intel presented their reference architectures at Ceph Day Switzerland, their "IOPS-Optimized" config had 2 replicas on "Intel SSD DC Series". I guess they trust their hardware.

But personally, even if I was forced to run 2x replicas, I'd try to use size=2, min_size=2.

--
Dan
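If someone did go the size=2, min_size=2 route, a hedged sketch of what that looks like; the pool name is only an example, and note that with min_size equal to size, any single OSD failure pauses I/O on the affected PGs until recovery completes:

    # example pool; lowering 'size' simply drops the extra replica,
    # raising it again later triggers backfill
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 2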
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> On 23 September 2016 at 9:11, Tomasz Kuzemko wrote:
>
> Hi,
>
> The biggest issue with replica size 2 is that if you find an inconsistent
> object you will not be able to tell which copy is the correct one. With
> replica size 3 you could assume that those 2 copies that are the same
> are correct.
>
> Until Ceph guarantees stored data integrity (that is, until we have
> production-ready BlueStore), I would not go with replica size 2.
>

Not only that, but the same could happen if you have flapping OSDs.

OSD 0 and 1 share a PG.

0 goes down, 1 is up and acting and accepts writes. Now 1 goes down and 0 comes up. 0 becomes primary, but the PG is 'down' because 1 had the last data. You really need 1 to come back in this case before the PG will work again.

I have seen this happen multiple times in systems which got overloaded.

If you care about your data you run with size = 3 and min_size = 2.

Wido

> On 23.09.2016 09:02, Götz Reinicke - IT Koordinator wrote:
>> Hi,
>>
>> On 23.09.16 at 05:55, Zhongyan Gu wrote:
>>> Hi there,
>>> the default rbd pool replica size is 3. However, I found that in our
>>> all-SSD environment, capacity becomes a cost issue. We want to save
>>> more capacity, so one option is to change the replica size from 3 to 2.
>>> Can anyone share their experience of the pros and cons of replica size
>>> 2 vs 3?
>> From my (still limited) POV, one main aspect is: how reliable is your
>> hardware if you think about this? How often will a disk break, a server
>> crash, a datacenter burn down, a network switch fail? And if there is a
>> failure, how fast could that broken part be replaced, or how fast can your
>> available hardware replicate the lost OSD to the remaining system?
>>
>> I don't have numbers, but for our first initial cluster we are also going
>> with a repl size of 2, and I don't have bad feelings yet when I look at
>> the server and network infrastructure we've got.
>>
>> Others with more experience will give some other hints and maybe
>> numbers. I never found some sort of calculator which can say "Oh, you've
>> got this hardware? Then a repl size of x y z is what you need."
>>
>> HTH a bit. Regards, Götz
>
> --
> Tomasz Kuzemko
> tomasz.kuze...@corp.ovh.com
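When a PG ends up 'down' like that, the peering state can be inspected directly; a rough sketch, with a placeholder PG id:

    # which PGs are down or otherwise inactive
    ceph health detail
    ceph pg dump_stuck inactive

    # the recovery_state section of the output shows peering being blocked
    # and which down OSD (OSD 1 in the example above) it still wants to probe
    ceph pg 2.1f query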
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
Hi,

The biggest issue with replica size 2 is that if you find an inconsistent object you will not be able to tell which copy is the correct one. With replica size 3 you could assume that those 2 copies that are the same are correct.

Until Ceph guarantees stored data integrity (that is, until we have production-ready BlueStore), I would not go with replica size 2.

On 23.09.2016 09:02, Götz Reinicke - IT Koordinator wrote:
> Hi,
>
> On 23.09.16 at 05:55, Zhongyan Gu wrote:
>> Hi there,
>> the default rbd pool replica size is 3. However, I found that in our
>> all-SSD environment, capacity becomes a cost issue. We want to save
>> more capacity, so one option is to change the replica size from 3 to 2.
>> Can anyone share their experience of the pros and cons of replica size
>> 2 vs 3?
> From my (still limited) POV, one main aspect is: how reliable is your
> hardware if you think about this? How often will a disk break, a server
> crash, a datacenter burn down, a network switch fail? And if there is a
> failure, how fast could that broken part be replaced, or how fast can your
> available hardware replicate the lost OSD to the remaining system?
>
> I don't have numbers, but for our first initial cluster we are also going
> with a repl size of 2, and I don't have bad feelings yet when I look at
> the server and network infrastructure we've got.
>
> Others with more experience will give some other hints and maybe
> numbers. I never found some sort of calculator which can say "Oh, you've
> got this hardware? Then a repl size of x y z is what you need."
>
> HTH a bit. Regards, Götz

--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
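For context, a sketch of what handling such an inconsistency looks like on a Jewel-era cluster; the pool name and PG id are placeholders, and with only two copies the repair step may well pick the wrong one, which is exactly the point above:

    # scrub errors surface in cluster health
    ceph health detail

    # list inconsistent PGs in a pool, then the objects whose copies disagree
    rados list-inconsistent-pg rbd
    rados list-inconsistent-obj 2.1f --format=json-pretty

    # ask the primary to repair; with size=2 there is no majority to trust
    ceph pg repair 2.1f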
Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
Hi,

On 23.09.16 at 05:55, Zhongyan Gu wrote:
> Hi there,
> the default rbd pool replica size is 3. However, I found that in our
> all-SSD environment, capacity becomes a cost issue. We want to save
> more capacity, so one option is to change the replica size from 3 to 2.
> Can anyone share their experience of the pros and cons of replica size
> 2 vs 3?

From my (still limited) POV, one main aspect is: how reliable is your hardware if you think about this? How often will a disk break, a server crash, a datacenter burn down, a network switch fail? And if there is a failure, how fast could that broken part be replaced, or how fast can your available hardware replicate the lost OSD to the remaining system?

I don't have numbers, but for our first initial cluster we are also going with a repl size of 2, and I don't have bad feelings yet when I look at the server and network infrastructure we've got.

Others with more experience will give some other hints and maybe numbers. I never found some sort of calculator which can say "Oh, you've got this hardware? Then a repl size of x y z is what you need."

HTH a bit. Regards, Götz
[ceph-users] rbd pool:replica size choose: 2 vs 3
Hi there,
the default rbd pool replica size is 3. However, I found that in our all-SSD environment, capacity becomes a cost issue. We want to save more capacity, so one option is to change the replica size from 3 to 2. Can anyone share their experience of the pros and cons of replica size 2 vs 3?

thanks,
Zhongyan
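For anyone starting from the defaults, a minimal sketch of inspecting and changing the replica count; the pool name rbd is just the default, and dropping size from 3 to 2 removes the third copy of every object (min_size is worth reviewing at the same time):

    # current settings
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # reduce replication to 2 copies (frees capacity, weakens redundancy)
    ceph osd pool set rbd size 2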