I tried to install corosync 3.x and it works pretty well. But when I install pacemaker, it pulls in the previous version of corosync as a dependency and breaks the whole setup. Any suggestions?
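(One possible workaround, not discussed in the thread: build pacemaker from source as well, so the package manager never gets a chance to pull in the packaged corosync 2.x. This is only a sketch; it assumes the self-compiled corosync 3.x is installed under /usr, the configure flags are illustrative, and a pacemaker version with corosync 3 support is needed.)

    # Sketch only: build pacemaker against the self-compiled corosync 3.x
    # instead of installing the distribution package (which depends on
    # corosync 2.x). Prefix and flags are assumptions; adjust as needed.
    git clone https://github.com/ClusterLabs/pacemaker.git
    cd pacemaker
    ./autogen.sh
    ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
    make
    sudo make install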
2018-08-23 9:32 GMT+02:00 Jan Friesse <jfrie...@redhat.com>:

> David,
>
>> BTW, where can I download Corosync 3.x? I've only seen Corosync 2.99.3
>> Alpha4 at http://corosync.github.io/corosync/
>
> Yes, that's Alpha 4 of Corosync 3.
>
> 2018-08-23 9:11 GMT+02:00 David Tolosa <david.tol...@upcnet.es>:
>
>> I'm currently using an Ubuntu 18.04 server configuration with netplan.
>> Here you have my current YAML configuration:
>>
>> # This file describes the network interfaces available on your system
>> # For more information, see netplan(5).
>> network:
>>   version: 2
>>   renderer: networkd
>>   ethernets:
>>     eno1:
>>       addresses: [192.168.0.1/24]
>>     enp4s0f0:
>>       addresses: [192.168.1.1/24]
>>     enp5s0f0: {}
>>   vlans:
>>     vlan.XXX:
>>       id: XXX
>>       link: enp5s0f0
>>       addresses: [ 10.1.128.5/29 ]
>>       gateway4: 10.1.128.1
>>       nameservers:
>>         addresses: [ 8.8.8.8, 8.8.4.4 ]
>>         search: [ foo.com, bar.com ]
>>     vlan.YYY:
>>       id: YYY
>>       link: enp5s0f0
>>       addresses: [ 10.1.128.5/29 ]
>>
>> So, eno1 and enp4s0f0 are the two ethernet ports connected to each other
>> with crossover cables to node2. The enp5s0f0 port is used to connect to
>> the outside/services, using the VLANs defined in the same file.
>>
>> In short, I'm using systemd-networkd, the default Ubuntu 18.04 server
>> service, to manage networks.
>
> Ok, so systemd-networkd is really doing ifdown, and somebody actually
> tried to fix that and merge it upstream (sadly without much luck :( ):
>
> https://github.com/systemd/systemd/pull/7403
>
>> I'm not detecting any NetworkManager-config-server package in my
>> repository either.
>
> I'm not sure what it's called in Debian-based distributions, but it's
> just one small file in /etc, so you can extract it from the RPM.
>
>> So the only solution that I have left, I suppose, is to test corosync
>> 3.x and see if it handles RRP better.
>
> You may also consider trying either a completely static network
> configuration, or NetworkManager + NetworkManager-config-server.
>
> Corosync 3.x with knet will work for sure, but be prepared for quite a
> long compile path, because you first have to compile knet and then
> corosync (see the build sketch below). What may help you a bit is that
> we have an Ubuntu 18.04 machine in our Jenkins, so you can compare
> against the corosync build log
> https://ci.kronosnet.org/view/corosync/job/corosync-build-all-voting/lastBuild/corosync-build-all-voting=ubuntu-18-04-lts-x86-64/consoleText
> and the knet build log
> https://ci.kronosnet.org/view/knet/job/knet-build-all-voting/lastBuild/knet-build-all-voting=ubuntu-18-04-lts-x86-64/consoleText
>
> Also please consult http://people.redhat.com/ccaulfie/docs/KnetCorosync.pdf
> about the changes in corosync configuration.
>
> Regards,
>   Honza
>
>> Thank you for your quick response!
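(For reference, a rough sketch of the compile path Honza describes above: knet first, then corosync. The repository URLs are the upstream projects; the dependency list, configure flags, and install prefix are assumptions, not taken from the thread; configure will report anything still missing.)

    # Build dependencies (illustrative package names for Ubuntu 18.04):
    sudo apt-get install build-essential git autoconf automake libtool \
        pkg-config libqb-dev

    # 1) kronosnet (libknet) -- corosync 3.x links against it:
    git clone https://github.com/kronosnet/kronosnet.git
    cd kronosnet
    ./autogen.sh
    ./configure --prefix=/usr
    make
    sudo make install
    cd ..

    # 2) corosync 3.x, built against the knet installed above:
    git clone https://github.com/corosync/corosync.git
    cd corosync
    ./autogen.sh
    ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
    make
    sudo make install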
>> 2018-08-23 8:40 GMT+02:00 Jan Friesse <jfrie...@redhat.com>:
>>
>>> David,
>>>
>>>> Hello,
>>>>
>>>> I'm going crazy over this problem, which I hope to resolve here with
>>>> your help, guys:
>>>>
>>>> I have 2 nodes with the Corosync redundant ring feature.
>>>>
>>>> Each node has 2 similarly connected/configured NICs. Both nodes are
>>>> connected to each other by two crossover cables.
>>>
>>> I believe this is the root of the problem. Are you using NetworkManager?
>>> If so, have you installed NetworkManager-config-server? If not, please
>>> install it and test again.
>>>
>>>> I configured both nodes with rrp_mode passive. Everything is working
>>>> well at this point, but when I shut down one node to test failover and
>>>> that node comes back online, corosync marks the interface as FAULTY
>>>> and RRP fails to recover the initial state:
>>>
>>> I believe it's because, with a crossover cable configuration, when the
>>> other side is shut down NetworkManager detects it and does an ifdown of
>>> the interface. And corosync is unable to handle ifdown properly. Ifdown
>>> is bad with a single ring, but it's just a killer with RRP (127.0.0.1
>>> poisons every node in the cluster).
>>>
>>>> 1. Initial scenario:
>>>>
>>>> # corosync-cfgtool -s
>>>> Printing ring status.
>>>> Local node ID 1
>>>> RING ID 0
>>>>         id      = 192.168.0.1
>>>>         status  = ring 0 active with no faults
>>>> RING ID 1
>>>>         id      = 192.168.1.1
>>>>         status  = ring 1 active with no faults
>>>>
>>>> 2. When I shut down node 2, everything continues with no faults.
>>>> Sometimes the ring IDs bind to 127.0.0.1 and then go back to their
>>>> respective heartbeat IPs.
>>>
>>> Again, a result of ifdown.
>>>
>>>> 3. When node 2 is back online:
>>>>
>>>> # corosync-cfgtool -s
>>>> Printing ring status.
>>>> Local node ID 1
>>>> RING ID 0
>>>>         id      = 192.168.0.1
>>>>         status  = ring 0 active with no faults
>>>> RING ID 1
>>>>         id      = 192.168.1.1
>>>>         status  = Marking ringid 1 interface 192.168.1.1 FAULTY
>>>>
>>>> # service corosync status
>>>> ● corosync.service - Corosync Cluster Engine
>>>>    Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
>>>>    Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
>>>>      Docs: man:corosync
>>>>            man:corosync.conf
>>>>            man:corosync_overview
>>>>  Main PID: 1439 (corosync)
>>>>     Tasks: 2 (limit: 4915)
>>>>    CGroup: /system.slice/corosync.service
>>>>            └─1439 /usr/sbin/corosync -f
>>>>
>>>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
>>>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
>>>> Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>>>> Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>>>> Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>>>
>>>> If I execute corosync-cfgtool, it clears the faulty error, but after a
>>>> few seconds the ring returns to FAULTY. The only thing that resolves
>>>> the problem is to restart the service with service corosync restart.
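(For readers following along: the command that temporarily clears the fault state is presumably corosync-cfgtool -r, which resets the redundant-ring state cluster-wide after a fault.)

    # Print ring status:
    corosync-cfgtool -s
    # Re-enable rings marked FAULTY (resets RRP state cluster-wide);
    # in the setup above, the ring went FAULTY again a few seconds later:
    corosync-cfgtool -r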
>>>> Here you have some of my configuration settings on node 1 (I already
>>>> tried changing rrp_mode):
>>>>
>>>> - corosync.conf
>>>>
>>>> totem {
>>>>     version: 2
>>>>     cluster_name: node
>>>>     token: 5000
>>>>     token_retransmits_before_loss_const: 10
>>>>     secauth: off
>>>>     threads: 0
>>>>     rrp_mode: passive
>>>>     nodeid: 1
>>>>     interface {
>>>>         ringnumber: 0
>>>>         bindnetaddr: 192.168.0.0
>>>>         #mcastaddr: 226.94.1.1
>>>>         mcastport: 5405
>>>>         broadcast: yes
>>>>     }
>>>>     interface {
>>>>         ringnumber: 1
>>>>         bindnetaddr: 192.168.1.0
>>>>         #mcastaddr: 226.94.1.2
>>>>         mcastport: 5407
>>>>         broadcast: yes
>>>>     }
>>>> }
>>>>
>>>> logging {
>>>>     fileline: off
>>>>     to_stderr: yes
>>>>     to_syslog: yes
>>>>     to_logfile: yes
>>>>     logfile: /var/log/corosync/corosync.log
>>>>     debug: off
>>>>     timestamp: on
>>>>     logger_subsys {
>>>>         subsys: AMF
>>>>         debug: off
>>>>     }
>>>> }
>>>>
>>>> amf {
>>>>     mode: disabled
>>>> }
>>>>
>>>> quorum {
>>>>     provider: corosync_votequorum
>>>>     expected_votes: 2
>>>> }
>>>>
>>>> nodelist {
>>>>     node {
>>>>         nodeid: 1
>>>>         ring0_addr: 192.168.0.1
>>>>         ring1_addr: 192.168.1.1
>>>>     }
>>>>     node {
>>>>         nodeid: 2
>>>>         ring0_addr: 192.168.0.2
>>>>         ring1_addr: 192.168.1.2
>>>>     }
>>>> }
>>>>
>>>> aisexec {
>>>>     user: root
>>>>     group: root
>>>> }
>>>>
>>>> service {
>>>>     name: pacemaker
>>>>     ver: 1
>>>> }
>>>>
>>>> - /etc/hosts
>>>>
>>>> 127.0.0.1   localhost
>>>> 10.4.172.5  node1.upc.edu node1
>>>> 10.4.172.6  node2.upc.edu node2
>>>
>>> So the machines have 3 NICs? Two for corosync/cluster traffic and one
>>> for regular traffic/services/outside world?
>>>
>>>> Thank you for your help in advance!
>>>
>>> To conclude:
>>> - If you are using NetworkManager, try installing
>>>   NetworkManager-config-server; it will probably help.
>>> - If you are brave enough, try corosync 3.x (the current Alpha4 is
>>>   pretty stable; actually, some other projects only gain this stability
>>>   with SP1 :) ), which has no RRP but uses knet to support redundant
>>>   links (up to 8 links can be configured) and doesn't have problems
>>>   with ifdown (see the configuration sketch below).
>>>
>>> Honza
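(For reference, a minimal sketch of what this two-node configuration might look like under corosync 3.x with knet, based on the configuration changes described in the KnetCorosync.pdf linked above. Option names and defaults should be verified against the corosync.conf(5) man page of the version actually built.)

    totem {
        version: 2
        cluster_name: node
        transport: knet         # knet replaces UDP/UDPU + RRP
        link_mode: passive      # replaces rrp_mode
        crypto_cipher: none     # sketch: mirrors secauth: off above
        crypto_hash: none
    }

    nodelist {
        node {
            nodeid: 1
            ring0_addr: 192.168.0.1   # knet link 0
            ring1_addr: 192.168.1.1   # knet link 1 (up to 8 links)
        }
        node {
            nodeid: 2
            ring0_addr: 192.168.0.2
            ring1_addr: 192.168.1.2
        }
    }

    quorum {
        provider: corosync_votequorum
        expected_votes: 2
    }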
--
David Tolosa Martínez
Customer Support & Infrastructure
UPCnet - Edifici Vèrtex
Plaça d'Eusebi Güell, 6, 08034 Barcelona
Tel: 934054555
https://www.upcnet.es
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org