We built an 8-node cluster using Ignite persistence and 1 backup, and had
two nodes fail at different times: the first because its storage did not get
mounted and it ran out of space early, and the second because an SSD failed.
There are some things we could have done better, but the event raises the
question of how backups are distributed.

There are two approaches that have substantially different behavior on
double faults, and double faults are more likely at scale.

1) random placement of backup partitions relative to their primaries
2) backup partitions placed with affinity to the primary partitions; in the
extreme, nodes are paired so that every primary on one node of a pair has
its backup on the other node of the pair

With a 64-node cluster, #2 would have roughly 1/63 the likelihood of data
loss when 2 nodes fail vs #1: under random placement with many partitions,
nearly any pair of failed nodes holds both copies of some partition, whereas
under pairing, data is lost only if the second failure happens to hit the
first node's partner, i.e. 1 of the 63 remaining nodes.

I'm guessing that Ignite ships with #1 by default, but could we provide our
own affinity function to accomplish #2 if we chose?
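For illustration, here is a minimal, untested sketch of one way #2 might be
done without writing a full AffinityFunction, assuming each node is started
with a user attribute (I'm calling it PAIR_ID, a name I made up) shared by
the two nodes of a pair: keep the default RendezvousAffinityFunction for
primary placement and install an affinity backup filter that only accepts a
backup candidate whose PAIR_ID matches the primary's. Treat it as a starting
point, not a tested configuration.

import java.util.List;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.lang.IgniteBiPredicate;

public class PairedBackupConfig {
    /** Hypothetical user attribute set via IgniteConfiguration.setUserAttributes()
     *  on each node, e.g. "pair-0", "pair-1", ..., shared by both nodes of a pair. */
    private static final String PAIR_ID = "PAIR_ID";

    public static <K, V> CacheConfiguration<K, V> paired(CacheConfiguration<K, V> ccfg) {
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();

        // Accept a backup candidate only if it shares the primary's PAIR_ID.
        // 'candidate' is the node being considered for the backup copy;
        // 'assigned' holds the nodes already chosen for the partition (primary first).
        aff.setAffinityBackupFilter(new IgniteBiPredicate<ClusterNode, List<ClusterNode>>() {
            @Override public boolean apply(ClusterNode candidate, List<ClusterNode> assigned) {
                Object primaryPair = assigned.get(0).attribute(PAIR_ID);
                Object candidatePair = candidate.attribute(PAIR_ID);
                return primaryPair != null && primaryPair.equals(candidatePair);
            }
        });

        ccfg.setAffinity(aff);
        ccfg.setBackups(1);
        return ccfg;
    }
}

One trade-off to note: presumably, if a node's partner is offline, no
candidate passes the filter and those partitions run with fewer backups than
requested until the partner returns, which is the cost of concentrating
backups on a single peer.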
