Hi Christine,

Thanks for your reply. Let me add a few details.

When I run the corosync service I see the corosync process running. If I stop it and run:

corosync -f

I see the following warnings:

warning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
warning [MAIN ] Please migrate config file to nodelist.
warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
warning [MAIN ] Could not set priority -2147483648: Permission denied (13)

but I see the node joined.
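About the first two warnings: my corosync.conf (pasted below) still carries bindnetaddr in the interface sections alongside the nodelist, so corosync ignores the bindnetaddr values. If I understand the migration hint correctly, the fix is just to drop those bindnetaddr lines and let the nodelist addresses drive the rings; something like this is what I plan to try (my own guess, not tested yet):

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    rrp_mode: passive
    transport: udpu

    # bindnetaddr removed: with transport udpu the ring addresses
    # should come from ring0_addr/ring1_addr in the nodelist
    interface {
        ringnumber: 0
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        mcastport: 5405
        ttl: 1
    }
}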
My corosync.conf file is below. With the corosync service up and running I have the following output:

corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.0.11
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.0.11
        status  = ring 1 active with no faults

corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.11) r(1) ip(192.168.0.11)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12) r(1) ip(192.168.0.12)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

For the moment I have two nodes in my cluster (the third node has some issues, so for now I did crm node standby on it).

Here are the dependencies I have installed for corosync (they work fine with pacemaker 1.1.14 and corosync 2.3.5):

libnspr4-dev_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
libnspr4_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
libnss3-dev_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
libnss3-nssdb_2%3a3.19.2.1-0ubuntu0.14.04.2_all.deb
libnss3_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
libqb-dev_0.16.0.real-1ubuntu4_amd64.deb
libqb0_0.16.0.real-1ubuntu4_amd64.deb

corosync.conf
---------------------
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.0.0
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
    max_network_delay: 100
    retransmits_before_loss_const: 25
    window_size: 150
}

nodelist {
    node {
        ring0_addr: pg1
        ring1_addr: pg1p
        nodeid: 1
    }
    node {
        ring0_addr: pg2
        ring1_addr: pg2p
        nodeid: 2
    }
    node {
        ring0_addr: pg3
        ring1_addr: pg3p
        nodeid: 3
    }
}

logging {
    to_syslog: yes
}
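Next time it fails I will also enable debug logging as you suggest below, so corosync leaves more detail in syslog before it dies. If I read the documentation correctly, the logging section above just needs a debug flag (my assumption, not yet tried):

logging {
    to_syslog: yes
    debug: on
}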
> On 22 Jun 2018, at 09:24, Christine Caulfield <ccaul...@redhat.com> wrote:
>
> On 21/06/18 16:16, Salvatore D'angelo wrote:
>> Hi,
>>
>> I upgraded my PostgreSQL/Pacemaker cluster with these versions:
>> Pacemaker 1.1.14 -> 1.1.18
>> Corosync 2.3.5 -> 2.4.4
>> Crmsh 2.2.0 -> 3.0.1
>> Resource agents 3.9.7 -> 4.1.1
>>
>> I started on a first node (I am trying a one-node-at-a-time upgrade).
>> On a PostgreSQL slave node I did:
>>
>> crm node standby <node>
>> service pacemaker stop
>> service corosync stop
>>
>> Then I built the tools above as described on their GitHub.com pages:
>>
>> ./autogen.sh (where required)
>> ./configure
>> make (where required)
>> make install
>>
>> Everything went OK. I expected the new files to overwrite the old ones. I left
>> the dependencies I had with the old software because I noticed ./configure
>> didn't complain.
>> I started corosync:
>>
>> service corosync start
>>
>> To verify corosync works properly I used the following commands:
>> corosync-cfgtool -s
>> corosync-cmapctl | grep members
>>
>> Everything seemed OK and I verified my node joined the cluster (at least
>> this is my impression).
>>
>> Here I verified a problem. Running the command:
>> corosync-quorumtool -ps
>>
>> I got the following error:
>> Cannot initialise CFG service
>>
> That says that corosync is not running. Have a look in the log files to
> see why it stopped. The pacemaker logs below are showing the same thing,
> but we can't make any more guesses until we see what corosync itself is
> doing. Enabling debug in corosync.conf will also help if more detail is
> needed.
>
> Also starting corosync with 'corosync -pf' on the command line is often
> a quick way of checking things are starting OK.
>
> Chrissie
>
>> If I try to start pacemaker, I only see the pacemaker process running and
>> pacemaker.log containing the following lines:
>>
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores/
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: get_cluster_type: Detected an active 'corosync' cluster
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: mcp_read_config: Reading configure for stack: corosync
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: notice: main: Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615
>> Jun 21 15:09:38 [17115] pg1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: warning: corosync_node_name: Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: notice: get_node_name: Could not obtain a node name for corosync nodeid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_get_peer: Node 1 has uuid 1
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: error: cluster_connect_quorum: Could not connect to the Quorum API: 2
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: main: Exiting pacemakerd
>> Jun 21 15:09:53 [17115] pg1 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>>
>> What is wrong in my procedure?
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org