I'm looking at adding per-region erasure code policies to our Swift cluster. Currently I'm experimenting with a small one: 3 hosts per region, each with 6 devices. My experiments seem to have highlighted a subtle relationship between the desire to minimize overhead and the ability to survive a *host* outage. I'll work through some examples below; feel free to check my math :-)

For brevity, let k = number of data fragments and m = number of parity fragments.

Suppose I use a (k=4, m=2) policy for each region. My overhead is m/k = 50% (i.e. 1G of data uses 1.5G on disk). Each of my 3 hosts holds 2 of the 6 fragments, so if I lose a host I still have 4 in total and can reassemble objects :-)
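A minimal sketch of the (k=4, m=2) arithmetic above, assuming the ring deals fragments out as evenly as possible across hosts (real Swift placement may differ). The helper names `overhead` and `spread_evenly` are hypothetical, not Swift APIs.

```python
from fractions import Fraction


def overhead(k: int, m: int) -> Fraction:
    """Storage overhead of a (k, m) erasure code: parity bytes per data byte."""
    return Fraction(m, k)


def spread_evenly(total_fragments: int, hosts: int) -> list[int]:
    """Hypothetical placement: deal fragments out as evenly as possible."""
    base, extra = divmod(total_fragments, hosts)
    return [base + 1 if i < extra else base for i in range(hosts)]


k, m, hosts = 4, 2, 3
layout = spread_evenly(k + m, hosts)
print(f"overhead = {overhead(k, m)} -> 1G of data uses {1 + float(overhead(k, m))}G on disk")
print(f"fragments per host = {layout}")  # [2, 2, 2]
# Losing host i removes layout[i] fragments; objects stay readable
# as long as at least k fragments remain.
for lost, count in enumerate(layout):
    remaining = k + m - count
    print(f"lose host {lost}: {remaining} fragments left, readable = {remaining >= k}")
```

Every host holds 2 fragments, so any single host loss leaves exactly k = 4.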

Suppose instead I use a (k=8, m=2) policy. Now my overhead is m/k = 25% (yay, better than 50%). However, the 10 fragments now get spread across the hosts as 3, 3, 4. If I lose a host I have at most 7 fragments left, which is not enough to reassemble objects :-(
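The same check for (k=8, m=2), again a sketch that assumes an even spread of fragments across hosts. The helper names are hypothetical. It enumerates every single-host loss rather than only the best case.

```python
def spread_evenly(total_fragments: int, hosts: int) -> list[int]:
    """Hypothetical placement: deal fragments out as evenly as possible."""
    base, extra = divmod(total_fragments, hosts)
    return [base + 1 if i < extra else base for i in range(hosts)]


def remaining_after_each_host_loss(k: int, m: int, hosts: int) -> list[int]:
    """Fragments left if host i (and everything on it) is lost, for each i."""
    total = k + m
    return [total - on_host for on_host in spread_evenly(total, hosts)]


k, m, hosts = 8, 2, 3
print("fragments per host:", spread_evenly(k + m, hosts))  # [4, 3, 3]
left = remaining_after_each_host_loss(k, m, hosts)
print("fragments left after losing each host:", left)  # [6, 7, 7]
print("any single host loss survivable:", min(left) >= k)  # False
```

Losing a 3-fragment host leaves 7 and losing the 4-fragment host leaves 6, both below k = 8.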

To me this suggests that a certain minimum number of *hosts* per region is needed for a given EC policy to remain durable in the event of a host outage (or destruction). Is this correct, or have I flubbed the calculations?
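Under the same even-spread assumption, the hunch can be made precise. With H hosts the fullest host holds ceil((k+m)/H) fragments. Losing it still leaves at least k exactly when ceil((k+m)/H) <= m, which rearranges to H >= ceil((k+m)/m), or equivalently k <= m(H-1), i.e. overhead m/k >= 1/(H-1). This considers a single host failure only. The functions below are hypothetical helpers, not part of Swift.

```python
import math


def min_hosts(k: int, m: int) -> int:
    """Fewest hosts H such that an even spread of k + m fragments puts at
    most m on any one host, i.e. ceil((k + m) / H) <= m."""
    return math.ceil((k + m) / m)


def max_data_fragments(m: int, hosts: int) -> int:
    """Largest k that still tolerates one host loss: k <= m * (hosts - 1)."""
    return m * (hosts - 1)


for k, m in [(4, 2), (6, 3), (8, 2), (10, 4)]:
    print(f"k={k}, m={m}: overhead {m / k:.0%}, needs at least {min_hosts(k, m)} hosts")

# With 3 hosts, k <= 2m, so the overhead m/k can never drop below 50%.
print([(max_data_fragments(m, 3), m) for m in (1, 2, 3)])  # [(2, 1), (4, 2), (6, 3)]
```

By this arithmetic (k=8, m=2) would need 5 hosts per region, and with 3 hosts the best achievable overhead is 50%.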



Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack