On 1/13/21 3:31 PM, Ulrich Windl wrote:
Roger Zhou <zz...@suse.com> wrote on 13.01.2021 at 05:32 in message
<97ac2305-85b4-cbb0-7133-ac1372143...@suse.com>:
On 1/12/21 4:23 PM, Ulrich Windl wrote:
Hi!

Before setting up our first pacemaker cluster we thought one low-speed
redundant network would be good in addition to the normal high-speed network.
However, as it seems now (SLES15 SP2), there is NO reasonable RRP mode to
drive such a configuration with corosync.

Passive RRP mode with UDPU still sends each packet through both nets,

Indeed, packets are sent in round-robin fashion.

being throttled by the slower network.
(Originally we were using multicast, but that was even worse)
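
For reference, a corosync 2 setup of the kind described here (udpu transport, passive RRP across a fast and a slow ring) looks roughly like the fragment below; node IDs and addresses are placeholders, not the actual configuration:

    totem {
        version: 2
        transport: udpu        # unicast UDP instead of multicast
        rrp_mode: passive      # despite the name, packets alternate across both rings
    }

    nodelist {
        node {
            nodeid: 1
            ring0_addr: 192.168.0.16   # placeholder: fast fibre network
            ring1_addr: 10.0.0.16      # placeholder: slow copper network
        }
        node {
            nodeid: 2
            ring0_addr: 192.168.0.17
            ring1_addr: 10.0.0.17
        }
    }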

Now I realized that even under modest load, I see messages about "retransmit
list", like this:
Jan 08 10:57:56 h16 corosync[3562]:   [TOTEM ] Retransmit List: 3e2
Jan 08 10:57:56 h16 corosync[3562]:   [TOTEM ] Retransmit List: 3e2 3e4
Jan 08 11:13:21 h16 corosync[3562]:   [TOTEM ] Retransmit List: 60e 610 612 614
Jan 08 11:13:21 h16 corosync[3562]:   [TOTEM ] Retransmit List: 610 614
Jan 08 11:13:21 h16 corosync[3562]:   [TOTEM ] Retransmit List: 614
Jan 08 11:13:41 h16 corosync[3562]:   [TOTEM ] Retransmit List: 6ed


What's the latency of this low-speed link?

The normal net is fibre-based:
4 packets transmitted, 4 received, 0% packet loss, time 3058ms
rtt min/avg/max/mdev = 0.131/0.175/0.205/0.027 ms

The redundant net is copper-based:
5 packets transmitted, 5 received, 0% packet loss, time 4104ms
rtt min/avg/max/mdev = 0.293/0.304/0.325/0.019 ms


Aha, RTT < 1 ms, so the network is fast enough. That clears up my doubt; I had guessed the latency of the slow link might be in the tens or even hundreds of milliseconds. Then I wonder whether the corosync packets are simply unlucky and get delayed by workload on one of the links.


Questions on that:
Will the situation be much better with knet?

knet provides "link_mode: passive", which is not round-robin and may fit your
idea somewhat. But it still doesn't suit your case well, since knet again
assumes similar latency across the links. You may have to tune parameters for
the low-speed link and likely sacrifice some of the benefit of the fast link.
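
To give an idea of the per-link tuning this implies, something along these lines could relax the failure-detection timing on the slow link only; the option names are the per-interface knet settings from corosync.conf(5), but the values are purely illustrative assumptions:

    totem {
        transport: knet
        link_mode: passive

        # Loosen heartbeat timing on the slow copper link (link 1) so brief
        # congestion there does not flap the link; leave the fast link at defaults.
        interface {
            linknumber: 1
            knet_ping_interval: 1000   # ms between knet pings on this link (illustrative)
            knet_ping_timeout: 5000    # ms without a pong before the link is marked down
            knet_pong_count: 2         # pongs required before the link is marked up again
        }
    }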

Well, in the past, when using HP Service Guard, things worked quite differently:
there was a true heartbeat on each cluster net, determining its "being alive",
and when the cluster performed no actions there was no traffic on the cluster links
(except that heartbeat).
When the cluster actually had to talk, it used the link that was flagged
"alive", preferring the primary first, then the secondary when both were
available.


"link_mode: passive" together with knet_link_priority would be useful. Also, use sctp in knet could be the alternative too.

Cheers,
Roger
