BTW, where can I download Corosync 3.x? I've only seen Corosync 2.99.3
Alpha4 at http://corosync.github.io/corosync/
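
Meanwhile, since I'm on systemd-networkd rather than NetworkManager, I
guess the closest equivalent to NetworkManager-config-server's
ignore-carrier behaviour would be ConfigureWithoutCarrier= from
systemd.network(5). Untested sketch -- the drop-in path below is my
guess and would have to match whatever netplan generates under
/run/systemd/network/ (and the same would be needed for enp4s0f0):

# /etc/systemd/network/10-netplan-eno1.network.d/override.conf
[Network]
# Keep the static address configured even while the peer node is down,
# so corosync never sees the interface lose its address.
ConfigureWithoutCarrier=yes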
2018-08-23 9:11 GMT+02:00 David Tolosa <[email protected]>:

> I'm currently using an Ubuntu 18.04 server configuration with netplan.
>
> Here you have my current YAML configuration:
>
> # This file describes the network interfaces available on your system
> # For more information, see netplan(5).
> network:
>   version: 2
>   renderer: networkd
>   ethernets:
>     eno1:
>       addresses: [192.168.0.1/24]
>     enp4s0f0:
>       addresses: [192.168.1.1/24]
>     enp5s0f0: {}
>   vlans:
>     vlan.XXX:
>       id: XXX
>       link: enp5s0f0
>       addresses: [ 10.1.128.5/29 ]
>       gateway4: 10.1.128.1
>       nameservers:
>         addresses: [ 8.8.8.8, 8.8.4.4 ]
>         search: [ foo.com, bar.com ]
>     vlan.YYY:
>       id: YYY
>       link: enp5s0f0
>       addresses: [ 10.1.128.5/29 ]
>
> So, eno1 and enp4s0f0 are the two Ethernet ports connected to each
> other with crossover cables to node2.
> The enp5s0f0 port is used to connect to the outside/services, using
> the VLANs defined in the same file.
>
> In short, I'm using systemd-networkd, the default Ubuntu 18.04 server
> service, to manage the networks. I'm not finding any
> NetworkManager-config-server package in my repositories either.
> So the only solution I have left, I suppose, is to test Corosync 3.x
> and see if it handles RRP better.
>
> Thank you for your quick response!
>
> 2018-08-23 8:40 GMT+02:00 Jan Friesse <[email protected]>:
>
>> David,
>>
>>> Hello,
>>> I'm going crazy over this problem, which I hope to resolve here with
>>> your help, guys:
>>>
>>> I have 2 nodes with the Corosync redundant ring feature.
>>>
>>> Each node has 2 similarly connected/configured NICs. Both nodes are
>>> connected to each other by two crossover cables.
>>
>> I believe this is the root of the problem. Are you using
>> NetworkManager? If so, have you installed NetworkManager-config-server?
>> If not, please install it and test again.
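>>
>> For reference, NetworkManager-config-server is just a tiny
>> configuration package, no code. If it doesn't exist in the Ubuntu
>> repositories, I believe dropping something like the following into
>> /etc/NetworkManager/conf.d/ should have the same effect (this mirrors
>> what the package ships on other distributions; see
>> NetworkManager.conf(5) for the exact semantics):
>>
>> # /etc/NetworkManager/conf.d/00-server.conf
>> [main]
>> # do not touch configured interfaces when they lose carrier
>> ignore-carrier=*
>> # do not auto-create "Wired connection" profiles for new devices
>> no-auto-default=*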
>>
>>> I configured both nodes with rrp_mode passive. Everything is working
>>> well at this point, but when I shut down one node to test failover,
>>> and this node comes back online, corosync marks the interface as
>>> FAULTY and rrp
>>
>> I believe it's because, with the crossover cable configuration, when
>> the other side is shut down, NetworkManager detects it and does an
>> ifdown of the interface. And corosync is unable to handle ifdown
>> properly. Ifdown is bad with a single ring, but it's simply a killer
>> with RRP (127.0.0.1 poisons every node in the cluster).
>>
>>> fails to recover the initial state:
>>>
>>> 1. Initial scenario:
>>>
>>> # corosync-cfgtool -s
>>> Printing ring status.
>>> Local node ID 1
>>> RING ID 0
>>>         id      = 192.168.0.1
>>>         status  = ring 0 active with no faults
>>> RING ID 1
>>>         id      = 192.168.1.1
>>>         status  = ring 1 active with no faults
>>>
>>> 2. When I shut down node 2, everything continues with no faults.
>>> Sometimes the ring IDs bind to 127.0.0.1 and then bind back to their
>>> respective heartbeat IPs.
>>
>> Again, a result of ifdown.
>>
>>> 3. When node 2 is back online:
>>>
>>> # corosync-cfgtool -s
>>> Printing ring status.
>>> Local node ID 1
>>> RING ID 0
>>>         id      = 192.168.0.1
>>>         status  = ring 0 active with no faults
>>> RING ID 1
>>>         id      = 192.168.1.1
>>>         status  = Marking ringid 1 interface 192.168.1.1 FAULTY
>>>
>>> # service corosync status
>>> ● corosync.service - Corosync Cluster Engine
>>>    Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
>>>    Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
>>>      Docs: man:corosync
>>>            man:corosync.conf
>>>            man:corosync_overview
>>>  Main PID: 1439 (corosync)
>>>     Tasks: 2 (limit: 4915)
>>>    CGroup: /system.slice/corosync.service
>>>            └─1439 /usr/sbin/corosync -f
>>>
>>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.0.1] is now up.
>>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
>>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.1.1] is now up.
>>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
>>> Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>>> Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>>> Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>>> Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>>> Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>> Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>>
>>> If I execute corosync-cfgtool, it clears the FAULTY state, but after
>>> some seconds it goes back to FAULTY.
>>> The only thing that resolves the problem is to restart the service
>>> with `service corosync restart`.
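>>
>> I assume "execute corosync-cfgtool" here means resetting the redundant
>> ring state, i.e. something like:
>>
>> # corosync-cfgtool -r
>>
>> That only clears the FAULTY flag cluster-wide; it doesn't fix the
>> underlying rebind to 127.0.0.1, which is why the ring goes FAULTY
>> again a few seconds later.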
>>
>>> Here you have some of my configuration settings on node 1 (I have
>>> already tried changing rrp_mode):
>>>
>>> - corosync.conf
>>>
>>> totem {
>>>     version: 2
>>>     cluster_name: node
>>>     token: 5000
>>>     token_retransmits_before_loss_const: 10
>>>     secauth: off
>>>     threads: 0
>>>     rrp_mode: passive
>>>     nodeid: 1
>>>     interface {
>>>         ringnumber: 0
>>>         bindnetaddr: 192.168.0.0
>>>         #mcastaddr: 226.94.1.1
>>>         mcastport: 5405
>>>         broadcast: yes
>>>     }
>>>     interface {
>>>         ringnumber: 1
>>>         bindnetaddr: 192.168.1.0
>>>         #mcastaddr: 226.94.1.2
>>>         mcastport: 5407
>>>         broadcast: yes
>>>     }
>>> }
>>>
>>> logging {
>>>     fileline: off
>>>     to_stderr: yes
>>>     to_syslog: yes
>>>     to_logfile: yes
>>>     logfile: /var/log/corosync/corosync.log
>>>     debug: off
>>>     timestamp: on
>>>     logger_subsys {
>>>         subsys: AMF
>>>         debug: off
>>>     }
>>> }
>>>
>>> amf {
>>>     mode: disabled
>>> }
>>>
>>> quorum {
>>>     provider: corosync_votequorum
>>>     expected_votes: 2
>>> }
>>>
>>> nodelist {
>>>     node {
>>>         nodeid: 1
>>>         ring0_addr: 192.168.0.1
>>>         ring1_addr: 192.168.1.1
>>>     }
>>>
>>>     node {
>>>         nodeid: 2
>>>         ring0_addr: 192.168.0.2
>>>         ring1_addr: 192.168.1.2
>>>     }
>>> }
>>>
>>> aisexec {
>>>     user: root
>>>     group: root
>>> }
>>>
>>> service {
>>>     name: pacemaker
>>>     ver: 1
>>> }
>>>
>>> - /etc/hosts
>>>
>>> 127.0.0.1   localhost
>>> 10.4.172.5  node1.upc.edu node1
>>> 10.4.172.6  node2.upc.edu node2
>>
>> So the machines have 3 NICs? 2 for corosync/cluster traffic and one
>> for regular traffic/services/outside world?
>>
>>> Thank you for your help in advance!
>>
>> To conclude:
>> - If you are using NetworkManager, try to install
>>   NetworkManager-config-server; it will probably help.
>> - If you are brave enough, try corosync 3.x (the current Alpha4 is
>>   pretty stable -- actually, some other projects only gain this
>>   stability with SP1 :) ). It has no RRP, but uses knet to support
>>   redundant links (up to 8 links can be configured) and it doesn't
>>   have problems with ifdown. A rough sketch of the equivalent
>>   configuration follows below.
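>>
>> Untested sketch, but your two rings would translate to roughly this --
>> knet takes the link addresses from the nodelist, link_mode replaces
>> rrp_mode, and the interface {} / bindnetaddr sections are no longer
>> needed:
>>
>> totem {
>>     version: 2
>>     cluster_name: node
>>     # knet is the corosync 3.x transport with redundant link support
>>     transport: knet
>>     link_mode: passive
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     expected_votes: 2
>> }
>>
>> nodelist {
>>     node {
>>         nodeid: 1
>>         # ring0_addr/ring1_addr become knet links 0 and 1
>>         ring0_addr: 192.168.0.1
>>         ring1_addr: 192.168.1.1
>>     }
>>     node {
>>         nodeid: 2
>>         ring0_addr: 192.168.0.2
>>         ring1_addr: 192.168.1.2
>>     }
>> }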
>>
>> Honza

--
David Tolosa Martínez
Customer Support & Infrastructure
UPCnet - Edifici Vèrtex
Plaça d'Eusebi Güell, 6, 08034 Barcelona
Tel: 934054555

<https://www.upcnet.es>

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
