Hey,

I had two running master nodes and had to add a third master node. My 
distributed config:

{
  "replication": true,
  "hotAlignment" : false,
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "executionMode": "synchronous",
  "readYourWrites": true,
  "newNodeStrategy": "dynamic",
  "servers": {
    "orientdbMaster1": "master",
    "orientdbMaster2": "master",
    "orientdbMaster3": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}


2017-08-14 10:44:44:254 WARNI [orientdbMaster1] Timeout (20001ms) on 
waiting for synchronous responses from nodes=[orientdbMaster2, 
orientdbMaster3] responsesSoFar=[orientdbMaster3] request=(id=1.263 
task=gossip timestamp: 1502707464247 lockManagerServer: orientdbMaster1) 
[ODistributedDatabaseImpl]
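
To make the warning above easier to read: the node that failed to answer is the one listed in nodes=[...] but missing from responsesSoFar=[...]. The following is a minimal sketch for pulling that out of a log line. It is not an OrientDB tool; the function name missing_responders is hypothetical, and the parsing assumes only the message format shown in the warning above.

```python
import re


def missing_responders(log_line):
    """Return nodes named in nodes=[...] that do not appear in responsesSoFar=[...]."""
    # The log line may be wrapped across several lines, so collapse whitespace first.
    text = " ".join(log_line.split())
    expected = re.search(r"nodes=\[([^\]]*)\]", text)
    answered = re.search(r"responsesSoFar=\[([^\]]*)\]", text)
    if expected is None or answered is None:
        return []

    def names(group):
        return [n.strip() for n in group.split(",") if n.strip()]

    got = set(names(answered.group(1)))
    return [n for n in names(expected.group(1)) if n not in got]


sample = """WARNI [orientdbMaster1] Timeout (20001ms) on
waiting for synchronous responses from nodes=[orientdbMaster2,
orientdbMaster3] responsesSoFar=[orientdbMaster3] request=(id=1.263
task=gossip timestamp: 1502707464247 lockManagerServer: orientdbMaster1)
[ODistributedDatabaseImpl]"""

print(missing_responders(sample))
```

For the warning above, this reports orientdbMaster2 as the node that did not respond within the timeout.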

As soon as the new machine joined the cluster, the following chain of events 
happened:

1. Added orientdbMaster3
2. orientdbMaster3 started synchronising the database with orientdbMaster2
3. During this time orientdbMaster2 became unreachable from orientdbMaster1. 
This warning appeared in the log continuously:

WARNI [orientdbMaster1] Timeout (20001ms) on waiting for synchronous 
responses from nodes=[orientdbMaster2, orientdbMaster3] 
responsesSoFar=[orientdbMaster3] request=(id=1.263 task=gossip timestamp: 
1502707464247 lockManagerServer: orientdbMaster1) [ODistributedDatabaseImpl]

4. Writes were not possible as the quorum of 2 was not reached. All the 
writes failed.
5. After orientdbMaster3 was up, orientdbMaster2 started to rebuild the 
indexes, which took a long time.


This caused significant downtime.
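
The quorum arithmetic behind step 4 can be sketched as follows. This is only an illustration, assuming that "majority" means floor(N/2) + 1 of the configured masters, which matches the quorum of 2 observed with three masters. The function names are hypothetical and are not part of OrientDB.

```python
def majority_quorum(master_count):
    """Assumed meaning of writeQuorum "majority": more than half of the masters."""
    return master_count // 2 + 1


def writes_possible(master_count, responding_masters):
    """A write succeeds only if enough masters acknowledge it."""
    return responding_masters >= majority_quorum(master_count)


# Three configured masters, as in the "servers" section of the config.
print(majority_quorum(3))     # quorum needed with three masters
print(writes_possible(3, 1))  # only one master acknowledging
print(writes_possible(3, 2))  # two masters acknowledging
```

Under this assumption, adding the third master leaves the quorum at 2, so writes fail whenever only one master is able to acknowledge them.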

The same issue happens whenever the node that is the lock manager is 
restarted: the machine starts fetching the entire database again.

Questions:

1. Why does the entire database need to be fetched again on every restart 
of the lock manager node?
2. How is the lock manager elected initially, and what is the process of 
re-election?
3. Can I specify that the new node should get the database from a specific 
node?
4. Why are writes not possible on the node that is serving the re-sync of 
the database?
5. Why are the indexes rebuilt whenever there is a re-sync?

I lost a lot of time when I added the new machine, and it caused a huge 
downtime as well.


Thanks,
Zeeshan
