Re: [ceph-users] replica questions

2017-03-03 Thread Vy Nguyen Tan
Hi,

You should read this email from Wido den Hollander:
"Hi,

As a Ceph consultant I get numerous calls throughout the year to help people
 with getting their broken Ceph clusters back online.

The causes of downtime vary widely, but one of the biggest is 2x
replication: size = 2, min_size = 1.

In 2016 the number of cases I had where data was lost due to these
settings grew exponentially.

Usually a disk fails, recovery kicks in, and while recovery is happening a
second disk fails, causing PGs to become incomplete.

There have been too many times where I had to use xfs_repair on broken disks
and use ceph-objectstore-tool to export/import PGs.

I really don't like these cases, mainly because they can be prevented
easily by using size = 3 and min_size = 2 for all pools.

With size = 2 you go into the danger zone as soon as a single disk/daemon
fails. With size = 3 you still have two additional copies left, keeping
your data safe(r).

If you are running CephFS, at least consider running the 'metadata' pool
with size = 3 to keep the MDS happy.

Please, let this be a big warning to everybody who is running with size =
2. The downtime and problems caused by missing objects/replicas are usually
severe, and it can take days to recover from them. Very often data is also
lost and/or corrupted, which causes even more problems.

I can't stress this enough. Running with size = 2 in production is a
SERIOUS hazard and should not be done imho.

To anyone out there running with size = 2, please reconsider this!

Thanks,

Wido"
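
The settings Wido recommends are applied per pool with the standard Ceph CLI; a minimal sketch (the pool names `rbd` and `cephfs_metadata` below are placeholders for your own pools):

```shell
# Keep three copies of every object, and refuse client I/O
# once fewer than two copies remain (pool name is a placeholder).
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2

# If running CephFS, give the metadata pool the same treatment.
ceph osd pool set cephfs_metadata size 3
ceph osd pool set cephfs_metadata min_size 2
```

Note that raising `size` triggers backfill to create the extra replicas, so expect recovery traffic until the cluster is clean again.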

Btw, could you please share your experience with HA networking for Ceph?
What type of bonding do you use? Are you using stackable switches?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replica questions

2017-03-03 Thread Maxime Guyot
Hi Henrik and Matteo,

I agree with Henrik: increasing your replication factor won't improve
recovery or read performance on its own. However, if you are changing from
replica 2 to replica 3, you might need to scale out your cluster to have
enough space for the additional replica, and that scale-out would improve
recovery and read performance.
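
As a back-of-envelope check on Matteo's numbers (150 TB raw), here is roughly what the extra replica costs in usable space, ignoring the usual fill-ratio headroom:

```python
# Back-of-envelope usable capacity for a 150 TB raw cluster.
raw_tb = 150

usable_replica2 = raw_tb / 2   # two copies of every object
usable_replica3 = raw_tb / 3   # three copies of every object

# Raw capacity multiplier needed to keep the same usable space
# after moving from replica 2 to replica 3.
growth_factor = usable_replica2 / usable_replica3

print(usable_replica2, usable_replica3, growth_factor)
```

So moving from replica 2 to replica 3 cuts usable space from 75 TB to 50 TB unless roughly 50% more raw capacity is added, which is why the scale-out usually comes with the change.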

Cheers,
Maxime





Re: [ceph-users] replica questions

2017-03-03 Thread Henrik Korkuc

On 17-03-03 12:30, Matteo Dacrema wrote:

Hi All,

I’ve a production cluster made of 8 nodes, 166 OSDs and 4 Journal SSD 
every 5 OSDs with replica 2 for a total RAW space of 150 TB.

I’ve few question about it:

  * Is it critical to run replica 2? Why?

Replica size 3 is highly recommended. I do not know the exact numbers, but 
it decreases the chance of data loss, as overlapping two-disk failures 
appear to be quite frequent, especially in larger clusters.
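
Henrik's point can be illustrated with a crude independence model (all numbers below are assumptions for illustration, not measured Ceph failure rates): with size = 2 a single additional disk failure during recovery can lose data, while with size = 3 two more overlapping failures are needed.

```python
from math import comb

def p_at_least(k, m, p):
    """Probability that at least k of m independent disks fail,
    each with failure probability p (binomial upper tail)."""
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k, m + 1))

# Illustrative assumptions only: 4% annual disk failure rate,
# a 24-hour recovery window, and 100 peer disks holding replicas
# of the degraded PGs.
p_window = 0.04 * 24 / (365 * 24)  # per-disk failure probability in the window
peers = 100

loss_size2 = p_at_least(1, peers, p_window)  # one more failure loses data
loss_size3 = p_at_least(2, peers, p_window)  # needs two overlapping failures

print(f"size=2: {loss_size2:.2e}  size=3: {loss_size3:.2e}")
```

Under these made-up inputs the size = 2 loss risk per incident comes out two orders of magnitude higher than size = 3, which matches the qualitative advice in this thread.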


  * Does replica 3 make recovery faster?


no


  * Does replica 3 make rebalancing and recovery less heavy for
customers? If I lose 1 node, does replica 3 reduce the I/O impact
compared to replica 2?


no


  * Does read performance increase with replica 3?


no


Thank you
Regards
Matteo











[ceph-users] replica questions

2017-03-03 Thread Matteo Dacrema
Hi All,

I have a production cluster of 8 nodes and 166 OSDs, with 4 journal SSDs per 
node (one SSD per 5 OSDs) and replica 2, for a total raw space of 150 TB.
I have a few questions about it:

Is it critical to run replica 2? Why?
Does replica 3 make recovery faster?
Does replica 3 make rebalancing and recovery less heavy for customers? If I
lose 1 node, does replica 3 reduce the I/O impact compared to replica 2?
Does read performance increase with replica 3?
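
For anyone checking where their own cluster stands before answering these questions, the current replication settings can be read back with the standard CLI (the pool name `rbd` is a placeholder):

```shell
# Inspect replication settings for a single pool (name is a placeholder).
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# Or list every pool with its size/min_size in one shot.
ceph osd dump | grep 'replicated size'
```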

Thank you
Regards
Matteo


This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this email in error please notify the system manager. This 
message contains confidential information and is intended only for the 
individual named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and delete 
this e-mail from your system. If you are not the intended recipient you are 
notified that disclosing, copying, distributing or taking any action in 
reliance on the contents of this information is strictly prohibited.
