On 26/06/2020 07:56, Jan Friesse wrote:
> Robert,
> thank you for the info/report. More comments inside.
>
>> All,
>> Hello. Hope all is well. I have been researching Oracle Linux 8.2
>> and ran across a situation that is not well documented. I decided to
>> provide some details to the community in case I am missing something.
>>
>> Basically, if you increase the totem token above approximately 33000
>> with the knet transport, then a two node cluster will not properly
>> form. The exact threshold value will slightly fluctuate, depending
>> on hardware type and debugging, but will consistently fail above 40000.
>
> At least corosync with 40sec timeout works just fine for me.
>

I just tried a 41 second token timeout on a 2-node and a 4-node cluster
(pcs/corosync/pacemaker) and it started up just fine. I think we'd need
to see the logs.

> # corosync-cmapctl | grep token
> runtime.config.totem.token (u32) = 40650
>
> # corosync-quorumtool
> Quorum information
> ------------------
> Date:             Fri Jun 26 08:45:12 2020
> Quorum provider:  corosync_votequorum
> Nodes:            2
> Node ID:          1
> Ring ID:          1.11be1
> Quorate:          Yes
>
> Votequorum information
> ----------------------
> Expected votes:   3
> Highest expected: 3
> Total votes:      2
> Quorum:           2
> Flags:            Quorate
>
> Membership information
> ----------------------
>     Nodeid      Votes Name
>          1          1 vmvlan-vmcos8-n05 (local)
>          6          1 vmvlan-vmcos8-n06
>
>
> It is indeed true that forming took a bit more time (30 sec to be more
> precise)
>
>>
>> The failure to form a cluster would occur when running the "pcs
>> cluster start --all" command or if I would start one cluster, let it
>> stabilize, then start the second. When it fails to form a cluster,
>> each side would say they are ONLINE, but the other side is
>> UNCLEAN(offline) (cluster state: partition WITHOUT quorum). If I
>> define proper stonith resources, then they will not fence since the
>> cluster never makes it to an initial quorum state. So, the cluster
>> will stay in this split state indefinitely.
>
> Maybe some timeout in pcs?
>
>>
>> Changing the transport back to udpu or udp, the higher totem tokens
>> worked as expected.
>
> Yup. You've correctly found out that the knet_* timeouts help.
> Basically, knet keeps a link marked as not working until it gets
> enough pongs. UDP/UDPU doesn't have this concept, so it will form the
> cluster faster.
>
>>
>> From the debug logging, I suspect that the Election Trigger (20
>> seconds) fires before all nodes are properly identified by the knet
>> transport. I noticed that with a totem token passing 32 seconds, the
>> knet_ping* defaults were pushing up against that 20 second mark.
>> The output of "corosync-cfgtool -s" will show each node's link as
>> enabled, but each side will state the other side's link is not
>> connected. Since each side thinks the other node is not active, they
>> fail to properly send a join message to the other node during the
>> election. They will essentially form a singleton cluster(??).
>
> Up to this point your analysis is correct. Corosync is really unable
> to send the join message and forms a single-node cluster.
>
>> It is more puzzling when you start one node at a time, waiting for the
>> node to stabilize before starting the other. It is like the first
>> node will never see the remote knet interfaces become active,
>> regardless of how long you wait.
>
> This shouldn't happen. Knet will eventually receive enough pongs, so
> corosync broadcasts a message to the other nodes, which find out that
> a new membership should be formed.
>
>>
>> The solution is to manually set the knet ping_timeout and
>> ping_interval to lower values than the default values derived from the
>> totem token. This seems to allow for the knet transport to determine
>> link status of all nodes before the election timer pops.
>
> These timeouts are indeed not the best ones. I had a few ideas how to
> improve them, because currently they favor multi-link clusters.
> Single-link clusters may work better with slightly different
> defaults.
>
>>
>> I tested this on both physical hardware and with VMs. Both react
>> similarly.
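
For reference, the knet ping defaults can be estimated from the token
value. This is only a minimal sketch, assuming the formulas given in
corosync.conf(5): knet_ping_timeout = token / knet_pong_count and
knet_ping_interval = token / (knet_pong_count * 2), with knet_pong_count
defaulting to 2. The function name knet_defaults is made up for
illustration, and the values on a running system may differ.

```shell
#!/bin/sh
# Sketch: estimate the knet ping defaults derived from the totem token.
# Assumptions (taken from corosync.conf(5), not from this thread):
#   knet_ping_timeout  = token / knet_pong_count
#   knet_ping_interval = token / (knet_pong_count * 2)
#   knet_pong_count defaults to 2
# knet_defaults is a hypothetical helper, not part of corosync or pcs.
knet_defaults() {
    token_ms=$1
    pong_count=${2:-2}
    # Integer arithmetic in milliseconds, as the token is given in ms.
    echo "ping_interval=$((token_ms / (pong_count * 2))) ping_timeout=$((token_ms / pong_count))"
}

knet_defaults 41000   # token from the failing test case
knet_defaults 33000   # approximate threshold reported above
```

Under these assumptions, token=41000 gives a ping timeout of 20500 ms,
which would be consistent with the defaults pushing up against the
20 second mark described above.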
>>
>> Bare bones test case to reproduce:
>> yum install pcs pacemaker fence-agents-all
>> firewall-cmd --permanent --add-service=high-availability
>> firewall-cmd --add-service=high-availability
>> systemctl start pcsd.service
>> systemctl enable pcsd.service
>> systemctl disable corosync
>> systemctl disable pacemaker
>> passwd hacluster
>> pcs host auth node1 node2
>> pcs cluster setup rhcs_test node1 node2 totem token=41000
>> pcs cluster start --all
>>
>> Example command to create cluster that will properly form and get quorum:
>> pcs cluster setup rhcs_test node1 node2 totem token=61000 transport knet link ping_interval=1250 ping_timeout=2500
>>
>> Hope this helps someone in the future.
>
> Yup. It is an interesting finding, thanks for that.
>
> Regards,
> Honza
>
>>
>> Thanks
>> Robert
>>
>>
>> Robert Hayden | Lead Technology Architect | Cerner Corporation
>>
>>
>> CONFIDENTIALITY NOTICE This message and any included attachments are
>> from Cerner Corporation and are intended only for the addressee. The
>> information contained in this message is confidential and may
>> constitute inside or non-public information under international,
>> federal, or state securities laws. Unauthorized forwarding, printing,
>> copying, distribution, or use of such information is strictly
>> prohibited and may be unlawful. If you are not the addressee, please
>> promptly delete this message and notify the sender of the delivery
>> error by e-mail or you may call Cerner's corporate offices in Kansas
>> City, Missouri, U.S.A at (+1) (816)221-1024.
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/