Andrei,

On 01.03.2021 15:45, Jan Friesse wrote:
Andrei,

On 01.03.2021 12:26, Jan Friesse wrote:


Thanks for digging into logs. I believe Eric is hitting
https://github.com/corosync/corosync-qdevice/issues/10 (already fixed,
but may take some time to get into distributions) - it also contains
a workaround.


I tested corosync-qnetd at df3c672 which should include these fixes. It
changed behavior, still I cannot explain it.

Again, ha1+ha2+qnetd, ha2 is current DC, I disconnect ha1 (block
everything with ha1 source MAC), stonith disabled. corosync and

So ha1 is blocked on both ha2 and qnetd and blocking is symmetric (I
mean, nothing is sent to ha1 and nothing is received from ha1)?


No, it is asymmetric. ha1 cannot *send* anything to ha2 or qnetd; it
should be able to *receive* from both.

That's a problem for corosync 2.x. Corosync 3.x with knet solves this by establishing a connection only when a node can both send packets to and receive packets from other nodes, but udpu behavior is weird (on the corosync side) when it is possible to receive a message but not send one (or vice versa).

It also explains why there are multiple "waiting for qdevice" messages logged.

Could you please try to block both outgoing and incoming packets?
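
For illustration only, a minimal sketch of what blocking in both directions could look like. It is not the exact setup used in this thread: the original test matched on ha1's source MAC, while this sketch assumes matching on ha1's IP address, because the iptables mac match only applies to incoming packets. The address 192.0.2.11 and the helper names are hypothetical, and the script only prints the commands, which would then be run as root on ha2 and on the qnetd host.

```python
import shlex


def symmetric_block_rules(peer_ip):
    """Build iptables commands that drop traffic from and to peer_ip.

    Assumption: the peer is matched by IP address, not by MAC.
    """
    return [
        # Drop everything received from the peer.
        ["iptables", "-I", "INPUT", "-s", peer_ip, "-j", "DROP"],
        # Drop everything sent to the peer.
        ["iptables", "-I", "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def render(rules):
    """Render the commands as shell lines for review (nothing is executed)."""
    return "\n".join(shlex.join(rule) for rule in rules)


if __name__ == "__main__":
    # 192.0.2.11 is a placeholder for ha1's address.
    print(render(symmetric_block_rules("192.0.2.11")))
```

Replacing -I with -D in the same commands removes the rules again once the test is done.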


corosync-qdevice on nodes is still 2.4.5, if it matters.

Shouldn't really matter as long as both corosync-qdevice and
corosync-qnetd are version 3.0.1.


corosync-qdevice on nodes is still 2.4.5. corosync-qnetd on witness is
git snapshot from last November. I was not sure I could mix corosync and
corosync-qdevice of different versions and looking at git commit all

It is (or should be) possible. I was testing this scenario (old qnetd + new qdevice and old qdevice + new qnetd) before releasing 3.0.1 (not extensively though, so there can be some bugs which I haven't spotted).

changes seem to be in qnetd anyway.

True


...


That's a bit harder to explain but it has a reason.


OK, thank you.
...

No matter what, are you able to provide some step-by-step reproducer of
that 40 sec delay?

No. As I said next time I tested I got entirely different timing. I will
try after cold boot again.


Perfect, thanks.

Regards,
  Honza
