Hi,
We need an expert's view on a problem we are seeing now and then: an
ovsdb-server node in a 3-node raft cluster keeps printing the
"raft_is_connected: false" message, and the "connected" state in its _Server DB
stays false.
According to the ovsdb-server(5) manpage, this means the server is not in
contact with a majority of its cluster.
Apart from its "connected" state, as far as we can see this server is in the
follower state and works fine, and the connections between it and the other two
servers appear healthy as well.
Below is a snapshot of its raft structure at the time of the problem. Note that
its candidate_retrying field stays true.
Hopefully the information provided helps to figure out what is going wrong here.
Unfortunately, we don't have a reliable way to reproduce it.
(gdb) print *(struct raft *)0xa872c0
$19 = {
  hmap_node = {
    hash = 2911123117,
    next = 0x0
  },
  log = 0xa83690,
  cid = {
    parts = {2699238234, 2258650653, 3035282424, 813064186}
  },
  sid = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  local_address = 0xa874e0 "tcp:10.8.51.55:6643",
  local_nickname = 0xa876d0 "3fdb",
  name = 0xa876b0 "OVN_Northbound",
  servers = {
    buckets = 0xad4bc0,
    one = 0x0,
    mask = 3,
    n = 3
  },
  election_timer = 1000,
  election_timer_new = 0,
  term = 3,
  vote = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  synced_term = 3,
  synced_vote = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  entries = 0xbf0fe0,
  log_start = 2,
  log_end = 312,
  log_synced = 311,
  allocated_log = 512,
  snap = {
    term = 1,
    data = 0xaafb10,
    eid = {
      parts = {1838862864, 1569866528, 2969429118, 3021055395}
    },
    servers = 0xaafa70,
    election_timer = 1000
  },
  role = RAFT_FOLLOWER,
  commit_index = 311,
  last_applied = 311,
  leader_sid = {
    parts = {642765114, 43797788, 2533161504, 3088745929}
  },
  election_base = 6043283367,
  election_timeout = 6043284593,
  joining = false,
  remote_addresses = {
    map = {
      buckets = 0xa87410,
      one = 0xa879c0,
      mask = 0,
      n = 1
    }
  },
  join_timeout = 6037634820,
  leaving = false,
  left = false,
  leave_timeout = 0,
  failed = false,
  waiters = {
    prev = 0xa87448,
    next = 0xa87448
  },
  listener = 0xaafad0,
  listen_backoff = -9223372036854775808,
  conns = {
    prev = 0xbcd660,
    next = 0xaafc20
  },
  add_servers = {
    buckets = 0xa87480,
    one = 0x0,
    mask = 0,
    n = 0
  },
  remove_server = 0x0,
  commands = {
    buckets = 0xa874a8,
    one = 0x0,
    mask = 0,
    n = 0
  },
  ping_timeout = 6043283700,
  n_votes = 1,
  candidate_retrying = true,
  had_leader = false,
  ever_had_leader = true
}
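
To make the suspicion about candidate_retrying concrete, here is a small sketch of how the status flags in the dump might combine into the "connected" state. This is only an assumption on our side: struct raft_flags, sketch_is_connected() and flags_from_dump() are hypothetical names made up for illustration, and the real check in ovsdb-server may differ between versions. The flag values are copied from the dump above.

```c
#include <stdbool.h>

/* Hypothetical subset of the status flags seen in the gdb dump.
 * This is NOT the real struct raft, only the fields of interest. */
struct raft_flags {
    bool candidate_retrying;
    bool joining;
    bool leaving;
    bool left;
    bool failed;
    bool ever_had_leader;
};

/* Assumed predicate: the server counts as connected only if none of the
 * "trouble" flags is set and it has seen a leader at least once.
 * Note that the role (follower/candidate/leader) is not consulted here. */
bool
sketch_is_connected(const struct raft_flags *f)
{
    return !f->candidate_retrying
           && !f->joining
           && !f->leaving
           && !f->left
           && !f->failed
           && f->ever_had_leader;
}

/* Flag values as they appear in the dump of the affected server. */
struct raft_flags
flags_from_dump(void)
{
    struct raft_flags f = {
        .candidate_retrying = true,
        .joining = false,
        .leaving = false,
        .left = false,
        .failed = false,
        .ever_had_leader = true,
    };
    return f;
}
```

If the check is anything like this, candidate_retrying alone would be enough to keep "connected" false, even though role is RAFT_FOLLOWER and leader_sid is set. That would match what we observe, but we have not confirmed it against the source.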
Thanks
- Yun