Hi Tim,

With the current setup you can only handle 1 host failure without losing any data, BUT everything will probably freeze until you bring the failed node (or the OSDs in it) back up.
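You can check both the pool's min_size and the profile behind it on a live cluster, for example ('ecpool' and 'ecprofile' below are just placeholder names):

ceph osd pool get ecpool min_size             # 7 for a k=6/m=2 pool, see below
ceph osd erasure-code-profile get ecprofile   # shows k, m, crush-failure-domain, ...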
Your setup indicates k=6, m=2, and all 8 shards are distributed to 4 hosts (2 shards/OSDs per host). Be aware that a pool which uses this erasure code profile will have a min_size of 7 (min_size = k+1)! So in case of a node failure only 6 shards are available and no writes are accepted to the pool anymore -> freeze of I/O.

If you change the profile to k=5 and m=3 you can have a node failure without freezing I/O (min_size = 6).

If you want to sustain 2 node failures you must increase m even further, for instance k=7, m=5 with:

step choose indep 6 type host
step choose indep 2 type osd

This will distribute the 12 (k+m) shards over your 6 hosts (2 shards per host). With min_size = 8 you can then have 2 node failures without freezing I/O.

Caspar

2018-02-08 21:43 GMT+01:00 Tim Gipson <tgip...@ena.com>:

> Hey all,
>
> We are trying to get an erasure coding cluster up and running, but we are
> having a problem getting the cluster to remain up if we lose an OSD host.
>
> Currently we have 6 OSD hosts with 6 OSDs apiece. I'm trying to build an
> EC profile and a crush rule that will allow the cluster to continue
> running if we lose a host, but I seem to misunderstand how the
> configuration of an EC pool/cluster is supposed to be implemented. I
> would like to be able to set this up to allow for 2 host failures before
> data loss occurs.
>
> Here is my crush rule:
>
> {
>     "rule_id": 2,
>     "rule_name": "EC_ENA",
>     "ruleset": 2,
>     "type": 3,
>     "min_size": 6,
>     "max_size": 8,
>     "steps": [
>         {
>             "op": "take",
>             "item": -1,
>             "item_name": "default"
>         },
>         {
>             "op": "choose_indep",
>             "num": 4,
>             "type": "host"
>         },
>         {
>             "op": "choose_indep",
>             "num": 2,
>             "type": "osd"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
> Here is my EC profile:
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Any direction or help would be greatly appreciated.
>
> Thanks,
>
> Tim Gipson
> Systems Engineer
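For completeness, a rough sketch of how the k=7/m=5 variant could be set up (untested; 'ec75' and 'ecpool75' are placeholder names, and the PG count must fit your cluster):

ceph osd erasure-code-profile set ec75 \
    k=7 m=5 plugin=jerasure technique=reed_sol_van \
    crush-failure-domain=host crush-root=default
ceph osd pool create ecpool75 128 128 erasure ec75

# The rule Ceph generates from this profile picks 12 separate hosts, so
# export the crush map and give the rule the two-level placement from above:
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
#   in crush.txt, inside the rule for ecpool75, use these steps:
#       step take default
#       step choose indep 6 type host
#       step choose indep 2 type osd
#       step emit
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new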