Hi Tim,

With the current setup you can only handle 1 host failure without losing
any data, BUT everything will probably freeze until you bring the failed
node (or the OSDs in it) back up.

Your setup indicates k=6, m=2 and all 8 shards are distributed to 4 hosts
(2 shards/OSDs per host). Be aware that a pool which uses this erasure code
profile will have a min_size of 7! (min_size = k+1)
So in case of a node failure only 6 shards are available, which is below
min_size, so the pool no longer accepts writes -> freeze of i/o.

If you change the profile to k=5 and m=3 you can have a node failure
without freezing i/o. (min_size = 6)
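
To make the arithmetic concrete, here is a small Python sketch (not a Ceph
tool; the function names are made up for illustration). It assumes
min_size = k+1 as stated above and that every host holds the same number of
shards:

```python
def shards_left(k, m, shards_per_host, failed_hosts):
    """Shards still available after whole hosts go down."""
    return (k + m) - shards_per_host * failed_hosts


def pool_state(k, m, shards_per_host, failed_hosts):
    """Classify the pool, assuming min_size = k + 1 as described above."""
    left = shards_left(k, m, shards_per_host, failed_hosts)
    min_size = k + 1
    if left >= min_size:
        return "active"      # enough shards, i/o continues
    if left >= k:
        return "frozen"      # data is intact, but i/o is blocked
    return "data loss"       # fewer than k shards, cannot reconstruct


# Both profiles place 8 shards on 4 hosts, 2 shards per host.
for k, m in [(6, 2), (5, 3)]:
    print(f"k={k} m={m}: one host down -> {pool_state(k, m, 2, 1)}")
```

For k=6, m=2 this reports "frozen" with one host down, and for k=5, m=3 it
reports "active".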

If you want to sustain 2 node failures you must increase the m even further:

for instance k=7, m=5

step choose indep 6 type host
step choose indep 2 type osd

This will distribute the 12 (k+m) shards over your 6 hosts (2 shards per
host).

min_size = 8, and two failed nodes take out 4 shards, leaving 8, so you can
have 2 node failures without freezing i/o.
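
If you want to try other layouts, the same reasoning can be turned around:
given the number of hosts, shards per host and host failures to survive,
work out the largest k that still leaves min_size shards. This is only a
sketch with a hypothetical helper name, again assuming min_size = k+1 and
an even spread of shards over the hosts:

```python
def largest_k_profile(hosts, shards_per_host, failed_hosts):
    """Return (k, m) with k + m = hosts * shards_per_host such that
    min_size = k + 1 shards are still left after `failed_hosts` fail."""
    total = hosts * shards_per_host
    left = total - shards_per_host * failed_hosts
    k = left - 1             # largest k that satisfies k + 1 <= left
    if k < 1:
        raise ValueError("not enough surviving shards for any profile")
    return k, total - k


print(largest_k_profile(6, 2, 2))   # 6 hosts, 2 shards each, 2 failures
print(largest_k_profile(4, 2, 1))   # 4 hosts, 2 shards each, 1 failure
```

The first call gives k=7, m=5 and the second gives k=5, m=3, matching the
profiles above.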

Caspar

2018-02-08 21:43 GMT+01:00 Tim Gipson <tgip...@ena.com>:

> Hey all,
>
> We are trying to get an erasure coding cluster up and running but we are
> having a problem getting the cluster to remain up if we lose an OSD host.
>
> Currently we have 6 OSD hosts with 6 OSDs apiece.  I'm trying to build an
> EC profile and a crush rule that will allow the cluster to continue running
> if we lose a host, but I seem to misunderstand how the configuration of an
> EC pool/cluster is supposed to be implemented.  I would like to be able to
> set this up to allow for 2 host failures before data loss occurs.
>
> Here is my crush rule:
>
> {
>     "rule_id": 2,
>     "rule_name": "EC_ENA",
>     "ruleset": 2,
>     "type": 3,
>     "min_size": 6,
>     "max_size": 8,
>     "steps": [
>         {
>             "op": "take",
>             "item": -1,
>             "item_name": "default"
>         },
>         {
>             "op": "choose_indep",
>             "num": 4,
>             "type": "host"
>         },
>         {
>             "op": "choose_indep",
>             "num": 2,
>             "type": "osd"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
> Here is my EC profile:
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Any direction or help would be greatly appreciated.
>
> Thanks,
>
> Tim Gipson
> Systems Engineer
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>