Hi,

We need an expert's view on a problem we see now and then: an
ovsdb-server node in a 3-node Raft cluster keeps printing the
"raft_is_connected: false" message, and the "connected" state in its _Server DB
stays false.

According to the ovsdb-server(5) manpage, this means the server is not in
contact with a majority of its cluster.

Apart from its "connected" state, as far as we can see this server is in the
follower state and works fine, and the connections between it and the other two
servers appear healthy as well.

Below is a snapshot of its raft structure at the time of the problem. Note that
its candidate_retrying field stays true.

Hopefully the information provided helps to figure out what is going wrong here.
Unfortunately, we don't have a reliable way to reproduce it.

(gdb) print *(struct raft *)0xa872c0
$19 = {
  hmap_node = {
    hash = 2911123117,
    next = 0x0
  },
  log = 0xa83690,
  cid = {
    parts = {2699238234, 2258650653, 3035282424, 813064186}
  },
  sid = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  local_address = 0xa874e0 "tcp:10.8.51.55:6643",
  local_nickname = 0xa876d0 "3fdb",
  name = 0xa876b0 "OVN_Northbound",
  servers = {
    buckets = 0xad4bc0,
    one = 0x0,
    mask = 3,
    n = 3
  },
  election_timer = 1000,
  election_timer_new = 0,
  term = 3,
  vote = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  synced_term = 3,
  synced_vote = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  entries = 0xbf0fe0,
  log_start = 2,
  log_end = 312,
  log_synced = 311,
  allocated_log = 512,
  snap = {
    term = 1,
    data = 0xaafb10,
    eid = {
      parts = {1838862864, 1569866528, 2969429118, 3021055395}
    },
    servers = 0xaafa70,
    election_timer = 1000
  },
  role = RAFT_FOLLOWER,
  commit_index = 311,
  last_applied = 311,
  leader_sid = {
    parts = {642765114, 43797788, 2533161504, 3088745929}
  },
  election_base = 6043283367,
  election_timeout = 6043284593,
  joining = false,
  remote_addresses = {
    map = {
      buckets = 0xa87410,
      one = 0xa879c0,
      mask = 0,
      n = 1
    }
  },
  join_timeout = 6037634820,
  leaving = false,
  left = false,
  leave_timeout = 0,
  failed = false,
  waiters = {
    prev = 0xa87448,
    next = 0xa87448
  },
  listener = 0xaafad0,
  listen_backoff = -9223372036854775808,
  conns = {
    prev = 0xbcd660,
    next = 0xaafc20
  },
  add_servers = {
    buckets = 0xa87480,
    one = 0x0,
    mask = 0,
    n = 0
  },
  remove_server = 0x0,
  commands = {
    buckets = 0xa874a8,
    one = 0x0,
    mask = 0,
    n = 0
  },
  ping_timeout = 6043283700,
  n_votes = 1,
  candidate_retrying = true,
  had_leader = false,
  ever_had_leader = true
}
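
To make the candidate_retrying observation concrete, here is a minimal sketch of
how the "connected" state might be derived from the flags in the dump. It
assumes the check is a plain AND over these fields, which is my reading of
ovsdb/raft.c and may not match every OVS version. The names RaftFlags,
is_connected and blocking_flags are made up for illustration and are not taken
from the source.

```python
from dataclasses import dataclass


@dataclass
class RaftFlags:
    """Subset of struct raft; defaults are the values from the gdb dump."""
    candidate_retrying: bool = True
    joining: bool = False
    leaving: bool = False
    left: bool = False
    failed: bool = False
    ever_had_leader: bool = True


def blocking_flags(r: RaftFlags) -> list:
    """Return the names of the flags that would keep 'connected' false.

    Assumed rule (not confirmed against the running version): the server
    counts as connected only if none of the first five flags is set and it
    has had a leader at some point.
    """
    blockers = [name for name in
                ("candidate_retrying", "joining", "leaving", "left", "failed")
                if getattr(r, name)]
    if not r.ever_had_leader:
        blockers.append("ever_had_leader")
    return blockers


def is_connected(r: RaftFlags) -> bool:
    """True when no flag blocks the 'connected' state under the assumed rule."""
    return not blocking_flags(r)


if __name__ == "__main__":
    snapshot = RaftFlags()
    print("connected:", is_connected(snapshot))
    print("blocked by:", blocking_flags(snapshot))
```

Under that assumption, candidate_retrying is the only value in the snapshot
that would hold "connected" at false, even though role is RAFT_FOLLOWER.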

Thanks
- Yun
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss