On 3/13/21 12:55 AM, Strahil Nikolov wrote:
I will try to get into the details on Monday, when I have access to the cluster again. I guess /var/log/cluster/corosync.log and /etc/corosync/corosync.conf are the most interesting.

So far, I have a 6-node cluster with separate VLANs for HANA replication, prod and backup. Initially, I used pcs to create the corosync.conf with 2 IPs per node, token 40000, consensus 48000 and wait_for_all=1. Later I expanded the cluster to 3 links and added qnet to the setup (only after I got it running with token 29000), so I'm ruling qnet out as the cause.
qdevice isn't using knet - right?
And VOTEQUORUM_QDEVICE_DEFAULT_SYNC_TIMEOUT is 30s. Unrelated coincidence?

Klaus
I updated the cluster nodes from RHEL 8.1 to 8.2, removed the consensus setting and enabled debug.

As knet uses UDP by default, and the problem hits me with both UDP (default settings) and SCTP, the problem is not in the protocol.

I've also enabled pacemaker blackbox, although I doubt that has any effect on corosync.

How can I enable trace logs for corosync only?

Best Regards,
Strahil Nikolov



    On Fri, Mar 12, 2021 at 17:01, Jan Friesse
    <[email protected]> wrote:
    Strahil,

    > Interesting...
    > Yet, this doesn't explain why a token of 30000 causes the nodes to
    > never assemble a cluster (waiting for half an hour, using
    > wait_for_all=1), while setting it to 29000 works like a charm.

    Definitely.

    Could you please provide a bit more info about your setup
    (config/logs/how many nodes the cluster has/...)? Because I've just
    briefly tested a two-node setup with a 30 sec token timeout and it
    was working perfectly fine.

    >
    > Thankfully we have an RH subscription, so RH devs will provide more
    > detailed output on the issue.

    As Jehan correctly noted, if it really got to RH devs it would
    probably get to me ;) But before that GSS will take care of checking
    configs/hw/logs/... and they are really good at finding problems with
    setup/hw/...

    >
    > I was hoping that I had missed something in the documentation
    > about the maximum token size...

    Nope.

    No matter what, if you can send config/logs/... we may try to find
    out what the root of the problem is here on the ML, or you can
    really try GSS, but as Jehan said, it would be nice if you could
    post the result so other people (me included) know what the main
    problem was.

    Thanks and regards,
      Honza


    >
    > Best Regards,
    > Strahil Nikolov
    >
    >
    >
    >
    >
    >
    > On Thursday, March 11, 2021, 19:12:58 GMT+2, Jan Friesse
    > <[email protected] <mailto:[email protected]>> wrote:
    >
    >
    >
    >
    >
    > Strahil,
    >> Hello all,
    >> I'm building a test cluster on RHEL 8.2 and I have noticed that
    >> the cluster fails to assemble (nodes stay inquorate as if the
    >> network is not working) if I set the token at 30000 or more (30s+).
    >
    > Knet waits for enough pong replies from other nodes before it
    > marks them as alive and starts sending/receiving packets from them.
    > By default it needs to receive 2 pongs, and a ping is sent 4 times
    > per token timeout, so with a 30 sec token timeout it takes 15 sec
    > until a node is considered up.
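
The arithmetic Honza describes can be written down as a small sketch. It only encodes the numbers from his reply (2 pongs required, 4 pings per token timeout); the function name and parameter names are hypothetical and are not corosync or knet identifiers, and real clusters may override these defaults.

```python
def knet_link_up_delay_ms(token_ms, pongs_required=2, pings_per_token=4):
    """Estimate how long knet waits before it marks a peer as up.

    Assumes the defaults quoted in the thread: a ping is sent
    pings_per_token times per token timeout, and pongs_required
    replies are needed before the peer counts as alive.
    """
    ping_interval_ms = token_ms / pings_per_token
    return pongs_required * ping_interval_ms


# Token values mentioned in this thread (milliseconds).
for token_ms in (29000, 30000, 40000):
    delay_s = knet_link_up_delay_ms(token_ms) / 1000
    print(f"token={token_ms} ms -> about {delay_s:.2f} s until a peer counts as up")
```

For token 30000 this gives the 15 sec from the reply, and for token 29000 it gives 14.5 sec.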
    >
    >> What is the maximum token value with knet? On SLES12 (I think
    >> it was corosync 1), I used to set the token/consensus to far
    >> greater values on some of our clusters.
    >
    > I'm really not aware of any arbitrary limits.
    >
    >
    >> Best Regards, Strahil Nikolov
    >>
    >
    > Regards,
    >
    >    Honza
    >
    >>
    >>
    >> _______________________________________________
    >> Manage your subscription:
    >> https://lists.clusterlabs.org/mailman/listinfo/users
    >>
    >> ClusterLabs home: https://www.clusterlabs.org/
    >
    >>
    >


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
