Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. 
> C.A.
> Sent: 23 September 2016 10:38
> To: n...@fisk.me.uk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> 
> ummm... ok.
> 
> and, how would the affected PG recover? Just by replacing the affected OSD/DISK,
> or would the affected PG migrate to another OSD/DISK?

Yes, Ceph would start recovering the PGs to other OSDs. But until the affected
PGs have at least min_size copies available again, IO to them will be blocked.
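
Roughly, you can watch that from the command line; a minimal sketch (the pool
name "rbd" is just an example, output details vary by release):

    ceph -s                          # degraded/undersized PGs and recovery progress
    ceph health detail               # lists the affected PGs and any blocked requests
    ceph osd pool get rbd min_size   # the min_size the pool is enforcing
    ceph -w                          # follow recovery until the PGs are active+clean again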

> 
> thx
> 
> On 23/09/16 10:56, Nick Fisk wrote:
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> >> Ja. C.A.
> >> Sent: 23 September 2016 09:50
> >> To: ceph-users@lists.ceph.com
> >> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> >>
> >> Hi
> >>
> >> with rep_size=2 and min_size=2, what drawbacks are removed compared
> >> with
> >> rep_size=2 and min_size=1?
> > If you lose a disk, everything will stop working until the affected PGs
> > are back at size=2 again.
> >
> >> thx
> >> J.
> >>
> >> On 23/09/16 10:07, Wido den Hollander wrote:
> >>>> On 23 September 2016 at 10:04, mj wrote:
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
> >>>>>> If you care about your data you run with size = 3 and min_size = 2.
> >>>>>>
> >>>>>> Wido
> >>>> We're currently running with min_size 1. Can we simply change this,
> >>>> online, with:
> >>>>
> >>>> ceph osd pool set vm-storage min_size 2
> >>>>
> >>>> and expect everything to continue running?
> >>>>
> >>> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph
> >>> that 2 replicas need to be online for I/O (Read and Write) to
> >>> continue.
> >>> Wido
> >>>
> >>>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
> >>>>
> >>>> MJ

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Ja. C.A.
ummm... ok.

and, how would the affected PG recover? Just by replacing the affected
OSD/DISK, or would the affected PG migrate to another OSD/DISK?

thx

On 23/09/16 10:56, Nick Fisk wrote:
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. 
>> C.A.
>> Sent: 23 September 2016 09:50
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
>>
>> Hi
>>
>> with rep_size=2 and min_size=2, what drawbacks are removed compared with
>> rep_size=2 and min_size=1?
> If you lose a disk, everything will stop working until the affected PGs are
> back at size=2 again.
>
>> thx
>> J.
>>
>> On 23/09/16 10:07, Wido den Hollander wrote:
>>>> On 23 September 2016 at 10:04, mj wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>>>>>> If you care about your data you run with size = 3 and min_size = 2.
>>>>>>
>>>>>> Wido
>>>> We're currently running with min_size 1. Can we simply change this,
>>>> online, with:
>>>>
>>>> ceph osd pool set vm-storage min_size 2
>>>>
>>>> and expect everything to continue running?
>>>>
>>> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2
>>> replicas need to be online for I/O (Read and Write) to continue.
>>> Wido
>>>
>>>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>>>>
>>>> MJ

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ja. 
> C.A.
> Sent: 23 September 2016 09:50
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] rbd pool:replica size choose: 2 vs 3
> 
> Hi
> 
> with rep_size=2 and min_size=2, what drawbacks are removed compared with
> rep_size=2 and min_size=1?

If you lose a disk, everything will stop working until the affected PGs are
back at size=2 again.

> 
> thx
> J.
> 
> On 23/09/16 10:07, Wido den Hollander wrote:
> >> On 23 September 2016 at 10:04, mj wrote:
> >>
> >>
> >> Hi,
> >>
> >> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
> >>>> If you care about your data you run with size = 3 and min_size = 2.
> >>>>
> >>>> Wido
> >> We're currently running with min_size 1. Can we simply change this,
> >> online, with:
> >>
> >> ceph osd pool set vm-storage min_size 2
> >>
> >> and expect everything to continue running?
> >>
> > Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2
> > replicas need to be online for I/O (Read and Write) to continue.
> >
> > Wido
> >
> >> (our cluster is HEALTH_OK, enough disk space, etc, etc)
> >>
> >> MJ

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Ja. C.A.
Hi

with rep_size=2 and min_size=2, what drawbacks are removed compared with 
rep_size=2 and min_size=1?

thx
J.

On 23/09/16 10:07, Wido den Hollander wrote:
>> On 23 September 2016 at 10:04, mj wrote:
>>
>>
>> Hi,
>>
>> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
>>>> If you care about your data you run with size = 3 and min_size = 2.
>>>>
>>>> Wido
>> We're currently running with min_size 1. Can we simply change this,
>> online, with:
>>
>> ceph osd pool set vm-storage min_size 2
>>
>> and expect everything to continue running?
>>
> Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2 
> replicas need to be online for I/O (Read and Write) to continue.
>
> Wido
>
>> (our cluster is HEALTH_OK, enough disk space, etc, etc)
>>
>> MJ

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Wido den Hollander

> On 23 September 2016 at 10:04, mj wrote:
> 
> 
> Hi,
> 
> On 09/23/2016 09:41 AM, Dan van der Ster wrote:
> >> If you care about your data you run with size = 3 and min_size = 2.
> >>
> >> Wido
> 
> We're currently running with min_size 1. Can we simply change this, 
> online, with:
> 
> ceph osd pool set vm-storage min_size 2
> 
> and expect everything to continue running?
> 

Yes, it will. No rebalance will happen. min_size = 2 just tells Ceph that 2 
replicas need to be online for I/O (Read and Write) to continue.
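
For example, a minimal check-and-change sketch (pool name taken from the
question above; these are the standard pool get/set commands):

    ceph osd pool get vm-storage size        # current replica count
    ceph osd pool get vm-storage min_size    # current minimum for serving I/O
    ceph osd pool set vm-storage min_size 2  # applied to the live pool, no data movement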

Wido

> (our cluster is HEALTH_OK, enough disk space, etc, etc)
> 
> MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread mj

Hi,

On 09/23/2016 09:41 AM, Dan van der Ster wrote:

>> If you care about your data you run with size = 3 and min_size = 2.
>>
>> Wido

We're currently running with min_size 1. Can we simply change this, 
online, with:


ceph osd pool set vm-storage min_size 2

and expect everything to continue running?

(our cluster is HEALTH_OK, enough disk space, etc, etc)

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Dan van der Ster
On Fri, Sep 23, 2016 at 9:29 AM, Wido den Hollander  wrote:
>
>
> > > On 23 September 2016 at 9:11, Tomasz Kuzemko wrote:
> >
> >
> > Hi,
> >
> > biggest issue with replica size 2 is that if you find an inconsistent
> > object you will not be able to tell which copy is the correct one. With
> > replica size 3 you could assume that those 2 copies that are the same
> > are correct.
> >
> > Until Ceph guarantees stored data integrity (that is - until we have
> > production-ready Bluestore), I would not go with replica size 2.
> >
>
> Not only that, but the same could happen if you have flapping OSDs.
>
> OSD 0 and 1 share a PG.
>
> 0 goes down, 1 is up and acting and accepts writes. Now 1 goes down and 0 
> comes up. 0 becomes primary, but the PG is 'down' because 1 had the last 
> data. You really need 1 to come back in this case before the PG will work 
> again.
>
> I have seen this happen multiple times in systems which got overloaded.
>
> If you care about your data you run with size = 3 and min_size = 2.
>
> Wido

FWIW, when Intel presented their reference architectures at Ceph Day
Switzerland, their "IOPS-Optimized" config had 2 replicas on "Intel
SSD DC Series".

I guess they trust their hardware. But personally even if I was forced
to run 2x replicas, I'd try to use size=2, min_size=2.

-- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Wido den Hollander

> On 23 September 2016 at 9:11, Tomasz Kuzemko wrote:
> 
> 
> Hi,
> 
> biggest issue with replica size 2 is that if you find an inconsistent
> object you will not be able to tell which copy is the correct one. With
> replica size 3 you could assume that those 2 copies that are the same
> are correct.
> 
> Until Ceph guarantees stored data integrity (that is - until we have
> production-ready Bluestore), I would not go with replica size 2.
> 

Not only that, but the same could happen if you have flapping OSDs.

OSD 0 and 1 share a PG.

0 goes down, 1 is up and acting and accepts writes. Now 1 goes down and 0 comes 
up. 0 becomes primary, but the PG is 'down' because 1 had the last data. You 
really need 1 to come back in this case before the PG will work again.

I have seen this happen multiple times in systems which got overloaded.
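
As a rough sketch of how such a down PG shows up (the PG id is made up and the
exact output differs between releases):

    ceph health detail   # reports the PG as down / stuck peering
    ceph pg 2.1a query   # 2.1a = hypothetical PG id; the recovery_state section
                         # normally names the OSD the PG is still waiting for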

If you care about your data you run with size = 3 and min_size = 2.

Wido

> On 23.09.2016 09:02, Götz Reinicke - IT Koordinator wrote:
> > Hi,
> > 
> > On 23.09.16 at 05:55, Zhongyan Gu wrote:
> >> Hi there,
> >> the default rbd pool replica size is 3. However, I found that in our
> >> all-SSD environment, capacity becomes a cost issue. We want to save
> >> more capacity, so one option is to change the replica size from 3 to 2.
> >> Can anyone share their experience of the pros and cons of replica size
> >> 2 vs 3?
> > From my (still limited) POV, one main aspect is: how reliable is your
> > hardware when you think about this? How often will a disk break, a server
> > crash, a datacenter burn down, a network switch fail? And if there is a
> > failure, how fast can that broken part be replaced, and how fast can your
> > available hardware replicate the lost OSD's data to the remaining system?
> > 
> > I don't have numbers, but for our first cluster we are also going with a
> > repl size of 2, and I don't have bad feelings yet when I look at the
> > server and network infrastructure we have.
> > 
> > Others with more experience will give further hints and maybe numbers. I
> > never found any sort of calculator that can say "Oh, you are getting this
> > hardware? Then a repl size of x y z is what you need."
> >  
> > HTH a bit . Regards . Götz
> > 
> > 
> > 
> > 
> > 
> 
> -- 
> Tomasz Kuzemko
> tomasz.kuze...@corp.ovh.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Tomasz Kuzemko
Hi,

The biggest issue with replica size 2 is that if you find an inconsistent
object, you will not be able to tell which copy is the correct one. With
replica size 3 you can assume that the two copies that match are the
correct ones.

Until Ceph guarantees stored data integrity (that is, until we have
production-ready BlueStore), I would not go with replica size 2.
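
To illustrate why that matters in practice (the PG id is made up, and
list-inconsistent-obj is only available on reasonably recent releases):

    ceph health detail                                     # e.g. "1 pgs inconsistent; 1 scrub errors"
    rados list-inconsistent-obj 2.1a --format=json-pretty  # shows which copy disagrees
    ceph pg repair 2.1a                                    # with only 2 copies this typically
                                                           # trusts the primary, which may be the bad one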

On 23.09.2016 09:02, Götz Reinicke - IT Koordinator wrote:
> Hi,
> 
> On 23.09.16 at 05:55, Zhongyan Gu wrote:
>> Hi there,
>> the default rbd pool replica size is 3. However, I found that in our
>> all-SSD environment, capacity becomes a cost issue. We want to save
>> more capacity, so one option is to change the replica size from 3 to 2.
>> Can anyone share their experience of the pros and cons of replica size
>> 2 vs 3?
> From my (still limited) POV, one main aspect is: how reliable is your
> hardware when you think about this? How often will a disk break, a server
> crash, a datacenter burn down, a network switch fail? And if there is a
> failure, how fast can that broken part be replaced, and how fast can your
> available hardware replicate the lost OSD's data to the remaining system?
> 
> I don't have numbers, but for our first cluster we are also going with a
> repl size of 2, and I don't have bad feelings yet when I look at the
> server and network infrastructure we have.
> 
> Others with more experience will give further hints and maybe numbers. I
> never found any sort of calculator that can say "Oh, you are getting this
> hardware? Then a repl size of x y z is what you need."
>  
> HTH a bit . Regards . Götz
> 
> 
> 
> 
> 

-- 
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-23 Thread Götz Reinicke - IT Koordinator
Hi,

On 23.09.16 at 05:55, Zhongyan Gu wrote:
> Hi there,
> the default rbd pool replica size is 3. However, I found that in our
> all-SSD environment, capacity becomes a cost issue. We want to save
> more capacity, so one option is to change the replica size from 3 to 2.
> Can anyone share their experience of the pros and cons of replica size
> 2 vs 3?
From my (still limited) POV, one main aspect is: how reliable is your
hardware when you think about this? How often will a disk break, a server
crash, a datacenter burn down, a network switch fail? And if there is a
failure, how fast can that broken part be replaced, and how fast can your
available hardware replicate the lost OSD's data to the remaining system?
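
(On the "how fast can it re-replicate" point: recovery and backfill speed are
largely bounded by a few OSD options; the values below are only examples, not
recommendations:)

    osd_max_backfills = 1           # concurrent backfills per OSD
    osd_recovery_max_active = 3     # concurrent recovery ops per OSD
    osd_recovery_op_priority = 3    # priority of recovery vs client I/O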

I don't have numbers, but for our first cluster we are also going with a
repl size of 2, and I don't have bad feelings yet when I look at the server
and network infrastructure we have.

Others with more experience will give further hints and maybe numbers. I
never found any sort of calculator that can say "Oh, you are getting this
hardware? Then a repl size of x y z is what you need."
 
HTH a bit . Regards . Götz




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-22 Thread Zhongyan Gu
Hi there,
the default rbd pool replica size is 3. However, I found that in our all-SSD
environment, capacity becomes a cost issue. We want to save more capacity, so
one option is to change the replica size from 3 to 2. Can anyone share their
experience of the pros and cons of replica size 2 vs 3?
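
For a rough sense of the capacity side of the trade-off (numbers purely
illustrative):

    raw capacity:          10 x 1 TB SSD  = 10 TB
    usable with size=3:    10 TB / 3      = ~3.3 TB
    usable with size=2:    10 TB / 2      = 5 TB
    (both before the free-space headroom you keep for recovery/rebalancing)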

thanks
Zhongyan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com