Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-02 Thread Anthony D'Atri
All very true and worth considering, but I feel compelled to mention the strategy of setting mon_osd_down_out_subtree_limit carefully to prevent automatic rebalancing. *If* the loss of a failure domain is temporary, i.e. something you can fix fairly quickly, it can be preferable to not start
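
For reference, a minimal sketch of how that option can be set on the monitors; the failure-domain type ("rack", which is also the usual default for this option) and the hostname lookup are assumptions, so adjust both to your own CRUSH tree:

    # ceph.conf on the monitor hosts
    [mon]
        mon osd down out subtree limit = rack

    # confirm the value a running monitor is using (run on that mon host)
    ceph daemon mon.$(hostname -s) config show | grep mon_osd_down_out_subtree_limit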

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-02 Thread David Turner
I agree that running with min_size of 1 is worse than running with only 3 failure domains. Even if it's just for a short time and you're monitoring it closely... it takes mere seconds before you could have corrupt data with min_size of 1 (depending on your use case). That right there is the key.
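
As a quick check against that scenario, something like the following shows and raises min_size on a replicated pool; the pool name "rbd" is only a placeholder:

    ceph osd pool get rbd size
    ceph osd pool get rbd min_size
    # with size 3, a min_size of 2 makes the pool pause I/O rather than
    # accept writes onto a single surviving copy
    ceph osd pool set rbd min_size 2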

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-02 Thread Laszlo Budai
You're saying that if we only have 3 failure domains, then Ceph can do nothing to maintain 3 copies when an entire failure domain is lost; that is correct. BUT if you're losing 2 replicas out of 3 of your data, and your min_size is set to 2 (the recommended minimum), then you have an

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-02 Thread David Turner
You wouldn't be able to guarantee that the cluster will not use 2 servers from the same rack. The problem with 3 failure domains, however, is that if you lose a full failure domain, Ceph can do nothing to maintain 3 copies of your data. It leaves you in a position where you need to rush to the
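
To make the 3-failure-domain case concrete, a rule along the lines of the sketch below (pre-Luminous decompiled syntax, names made up) places each of the 3 replicas in a different rack, so with one rack down there is simply no third rack left to backfill to and the PGs stay degraded:

    rule replicated_rack {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
    }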

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-02 Thread Laszlo Budai
Hi David, If I understand correctly, your suggestion is the following: if we have, for instance, 12 servers grouped into 3 racks (4 per rack), then you would build a crush map saying that you have 6 racks (virtual ones) with 2 servers in each of them, right? In this case, if we are setting the failure
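
If that is the layout you end up with, one way to express it is to create the extra rack buckets and move the hosts at the CLI rather than editing the map by hand; the bucket and host names below are made up, and every move will shuffle data:

    ceph osd crush add-bucket rack1a rack
    ceph osd crush add-bucket rack1b rack
    ceph osd crush move rack1a root=default
    ceph osd crush move rack1b root=default
    ceph osd crush move node01 rack=rack1a
    ceph osd crush move node02 rack=rack1a
    ceph osd crush move node03 rack=rack1b
    ceph osd crush move node04 rack=rack1b
    # repeat for the other two physical racks, then point the rule's
    # chooseleaf step at type rack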

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread Deepak Naidu
If all 6 racks are tagged for Ceph storage nodes, I'd go ahead and just put the nodes in there now and configure the crush map accordingly. That way you can grow each of the

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread David Turner
The way to do this is to download your crush map, modify it manually after

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread Deepak Naidu
The way to do this is to download your crush map, modify it manually after decompiling it to text format or modify it using the crushtool. Once you have your crush map with the rules in place that you want, you will upload

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread David Turner
The way to do this is to download your crush map, modify it manually after decompiling it to text format or modify it using the crushtool. Once you have your crush map with the rules in place that you want, you will upload the crush map to the cluster. When you change your failure domain from
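
Roughly, that workflow looks like the sketch below; the file names are arbitrary and the edited line is only an example of switching the failure domain from host to rack:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt, e.g. change
    #   step chooseleaf firstn 0 type host
    # to
    #   step chooseleaf firstn 0 type rack
    crushtool -c crushmap.txt -o crushmap.new
    # dry-run the mappings before injecting the new map
    crushtool -i crushmap.new --test --show-statistics --rule 0 --num-rep 3
    ceph osd setcrushmap -i crushmap.new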