Thanks for the suggestions. I will try to validate the infrastructure as
suggested, and will explore whether the topology validator can be used for
this.
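
For reference, here is a minimal sketch in Python of the pre-deployment check
suggested below. It is only an illustration, not an Ignite API: the function
name, the exception class, and the node-to-AZ mapping are hypothetical, and it
assumes the backup filter requires every copy of a partition (the primary plus
each backup) to sit in a different AZ.

```python
from collections import Counter


class DeploymentValidationError(Exception):
    """Raised when the planned topology cannot hold every copy in its own AZ."""


def validate_backup_placement(planned_nodes, backups):
    """Fail early if there are fewer AZs than copies of each partition.

    planned_nodes: hypothetical mapping of node name -> availability zone,
                   e.g. collected from the config files about to be deployed.
    backups:       the backup count configured for the cache.
    """
    nodes_per_az = Counter(planned_nodes.values())
    required_azs = backups + 1  # one primary plus `backups` backup copies
    if len(nodes_per_az) < required_azs:
        raise DeploymentValidationError(
            f"backups={backups} needs {required_azs} AZs, "
            f"but only {len(nodes_per_az)} are planned: {sorted(nodes_per_az)}"
        )
    # Return the per-AZ node counts so the caller can log or inspect them.
    return nodes_per_az
```

With the three-node layout from the original question (two nodes in AZ1, one
in AZ2) and a backup count of 2, this raises before any Ignite node is
started; with a backup count of 1 it passes.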

On Tue, 1 Nov 2022, 21:51 Jeremy McMillan, <>

> Can you tell two stories that both start with all nodes in the intended
> cluster configuration down: one resulting in a successful cluster startup,
> and the other detecting an invalid configuration and refusing to start?
> I can anticipate problems understanding what to do when the first node
> attempts to start, but only has its own AZ represented in the topology. How
> can this first node know whether future nodes will be able to fulfill the
> condition AZ_count >= backup_replicas + 1? The general case, allowing
> elastic deployment, requires individual Ignite nodes to work in a
> best-effort capacity.
> I would approach this from a DevOps perspective, and just validate the
> deployment before starting up any infrastructure. Look at all of the
> relevant config files which would be deployed. Enumerate a projection of
> deployed nodes and their AZs. Compare this against the desired backup
> filter configuration and, on a mismatch, fail with a deployment automation
> tool exception before starting any Ignite nodes.
> On Tue, Nov 1, 2022 at 9:49 AM Surinder Mehra <> wrote:
>> Thanks for your reply. Let me try to answer your 2 questions below.
>> 1. I understand that it sacrifices the backups in case it can't place
>> them appropriately. The question is whether it is possible to fail the
>> deployment rather than risk having only a single copy of the data in the
>> cluster. If this only copy goes down, we will have downtime because the
>> data won't be present in the cluster. We would rather throw an error if
>> enough hardware is not present than risk data unavailability during
>> business activity.
>> 2. Why we want 3 copies of the data: it's a design choice. We want to
>> ensure that even if 2 nodes go down, we still have a 3rd present to serve
>> the data.
>> I hope that answers your questions.
>> On Tue, 1 Nov 2022, 19:40 Jeremy McMillan, <>
>> wrote:
>>> This question is a design question.
>>> What kinds of fault states do you expect to tolerate? What is your
>>> failure budget?
>>> Why are you trying to make more than 2 copies of the data distributed
>>> across only two failure domains?
>>> Also "fail fast" means discover your implementation defects faster than
>>> your release cycle, not how fast you can cause data loss.
>>> On Tue, Nov 1, 2022, 09:01 Surinder Mehra <> wrote:
>>>> Gentle reminder.
>>>> One additional question: we have observed that if the number of
>>>> available AZs is less than the backup count, Ignite skips creating
>>>> backups. Is this understanding correct? If yes, how can we fail fast if
>>>> backups cannot be placed due to the AZ limitation?
>>>> On Mon, Oct 31, 2022 at 6:30 PM Surinder Mehra <>
>>>> wrote:
>>>>> Hi,
>>>>> As per the attached link, to ensure primary and backup partitions are
>>>>> not stored on the same node, we used the AWS AZ as the backup filter.
>>>>> Now I can see that if I start two Ignite nodes on the same machine,
>>>>> primary partitions are evenly distributed but backups are always zero,
>>>>> which is expected.
>>>>> My question is what would happen if AZ-1 has 2 machines and AZ-2 has 1
>>>>> machine, and the Ignite cluster has only 3 nodes, with each machine
>>>>> running one Ignite node.
>>>>> Node1[AZ1] - keys 1-100
>>>>> Node2[AZ1] - keys 101-200
>>>>> Node3[AZ2] - keys 201-300
>>>>> In the above scenario, if the backup count is 2, how would backup
>>>>> partitions be distributed?
>>>>> 1. Would it mean node3 will have 2 backup copies of the primary
>>>>> partitions of nodes 1 and 2?
>>>>> 2. If we have a 4-node cluster with 2 nodes in each AZ, would backup
>>>>> copies also be placed on different nodes (in other words, does the
>>>>> backup filter also apply to how backup copies are placed on nodes)?
