On 01.03.2021 16:45, Jan Friesse wrote:
> Andrei,
>
>> On 01.03.2021 15:45, Jan Friesse wrote:
>>> Andrei,
>>>
>>>> On 01.03.2021 12:26, Jan Friesse wrote:
>>>>>>
>>>>>
>>>>> Thanks for digging into the logs. I believe Eric is hitting
>>>>> https://github.com/corosync/corosync-qdevice/issues/10 (already fixed,
>>>>> but it may take some time to get into distributions) - it also contains
>>>>> a workaround.
>>>>>
>>>>
>>>> I tested corosync-qnetd at df3c672, which should include these fixes. It
>>>> changed the behavior, but I still cannot explain it.
>>>>
>>>> Again, ha1+ha2+qnetd, ha2 is current DC, I disconnect ha1 (block
>>>> everything with ha1 source MAC), stonith disabled. corosync and
>>>
>>> So ha1 is blocked on both ha2 and qnetd and blocking is symmetric (I
>>> mean, nothing is sent to ha1 and nothing is received from ha1)?
>>>
>>
>> No, it is asymmetric. ha1 cannot *send* anything to ha2 or qnetd; it
>> should be able to *receive* from both.
>
> That's a problem for corosync 2.x. Corosync 3.x with knet solves this by
> establishing a connection only when a node can both send packets to and
> receive packets from other nodes, but udpu behavior is weird (on the
> corosync side) when it is possible to receive a message but not send one
> (or vice versa).
>
> It also explains why there are multiple "waiting for qdevice" messages
> logged.
>
> Could you please try to block both outgoing and incoming packets?
>
Several times both nodes detected the problem and reacted almost synchronously, so that was probably it.

...
>>>
>>> No matter what, are you able to provide a step-by-step reproducer of
>>> that 40 sec delay?
>>
>> No. As I said, the next time I tested I got entirely different timing. I
>> will try again after a cold boot.
>>
>
> Perfect, thanks.
>

I was able to reproduce it again with asymmetric fencing after a cold boot. Are you still interested?

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
