Hi,
Thanks for the reply. I scrapped my cluster and created it again, then migrated as before. This time I uninstalled pacemaker, corosync, crmsh and the resource agents with make uninstall, then installed the new packages. The problem is the same. When I launch: corosync-quorumtool -ps
I got: Cannot initialize QUORUM service
Here is the log with debug enabled:
corosync.log
[18019] pg3 corosyncerror [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
[18019] pg3 corosyncerror [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
[18019] pg3 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
[18019] pg3 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
[18019] pg3 corosyncerror [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
[18019] pg3 corosyncerror [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
I tried to check /dev/shm. I am not sure these are the right commands, but here is what I see:
df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
shm              64M   16M   49M  24% /dev/shm
ls /dev/shm
qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
Is 64 MB enough for /dev/shm? If not, why did it work with the previous corosync release?
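If the size turns out to be the problem, I could enlarge it, e.g. with something like the following (just a sketch, assuming /dev/shm is an ordinary tmpfs mount; the 256M value is only a placeholder, not a size anyone recommended):

    # temporary, until the next reboot
    mount -o remount,size=256M /dev/shm

    # persistent: tmpfs entry in /etc/fstab
    tmpfs  /dev/shm  tmpfs  defaults,size=256M  0  0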
On 22/06/18 11:23, Salvatore D'angelo wrote:
Hi,
Here is the log:

[17323] pg1 corosyncerror [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-17324-17334-23-data
[17323] pg1 corosyncerror [QB    ] qb_rb_open:cfg-event-17324-17334-23: Resource temporarily unavailable (11)
[17323] pg1 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-17324-17334-23-header
[17323] pg1 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-17324-17334-23-header
[17323] pg1 corosyncerror [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
[17323] pg1 corosyncerror [QB    ] Error in connection setup (17324-17334-23): Resource temporarily unavailable (11)
[17323] pg1 corosyncdebug [QB    ] qb_ipcs_disconnect(17324-17334-23) state:0

is /dev/shm full?

Chrissie
On 22 Jun 2018, at 12:10, Christine Caulfield <ccaul...@redhat.com> wrote:
On 22/06/18 10:39, Salvatore D'angelo wrote:
Hi,
Can you tell me exactly which log you need? I'll provide it as soon as possible.
Regarding some settings, I am not the original author of this cluster. The people who created it left the company I work for and I inherited the code, so sometimes I do not know why certain settings are used. The old versions of pacemaker, corosync, crmsh and the resource agents were compiled and installed from source. I simply downloaded the new versions, compiled and installed them. I didn't get any complaint from ./configure, which usually checks for library compatibility.
To be honest I do not know if this is the right approach. Should I "make uninstall" the old versions before installing the new ones? What is the suggested approach? Thanks in advance for your help.
OK fair enough!
To be honest the best approach is almost always to get the latest packages from the distributor rather than compile from source. That way you can be more sure that upgrades will go more smoothly. Though, to be honest, I'm not sure how good the Ubuntu packages are (they might be great, they might not, I genuinely don't know).
When building from source and if you don't know the provenance of the previous version then I would recommend a 'make uninstall' first - or removal of the packages if that's where they came from.
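For example, something along these lines, run as root on each node (just a sketch; the source-tree paths are placeholders for wherever you built the old and new versions):

    # remove the files installed by the previous source build
    cd ~/src/corosync-2.3.5
    make uninstall

    # then build and install the new release
    cd ~/src/corosync-2.4.4
    ./autogen.sh
    ./configure
    make
    make install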
One thing you should do is make sure that all the cluster nodes are running the same version. If some are running older versions then nodes could drop out for obscure reasons. We try and keep minor versions on-wire compatible but it's always best to be cautious.
The tidying of your corosync.conf can wait for the moment, let's get things mostly working first. If you enable debug logging in corosync.conf:
logging {
    to_syslog: yes
    debug: on
}
Then see what happens and post the syslog file that has all of the corosync messages in it, we'll take it from there.
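For example, on Ubuntu something like this should pull the relevant messages out (assuming the corosync messages end up in /var/log/syslog):

    grep -i corosync /var/log/syslog > corosync-debug.log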
Chrissie
On 22 Jun 2018, at 11:30, Christine Caulfield <ccaul...@redhat.com> wrote:
On 22/06/18 10:14, Salvatore D'angelo wrote:
Hi Christine,
Thanks for the reply. Let me add a few details. When I run the corosync service I see the corosync process running. If I stop it and run:
corosync -f
I see the following warnings:

warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
warning [MAIN  ] Please migrate config file to nodelist.
warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
warning [MAIN  ] Could not set priority -2147483648: Permission denied (13)
but I see the node joined.
Those certainly need fixing but are probably not the cause. Also why do you have these values below set?
max_network_delay: 100
retransmits_before_loss_const: 25
window_size: 150
I'm not saying they are causing the trouble, but they aren't going to help keep a stable cluster.
Without more logs (full logs are always better than just the bits you think are meaningful) I still can't be sure. It could easily be just that you've overwritten a packaged version of corosync with your own compiled one and they have different configure options, or that the libraries now don't match.
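A quick way to check which binary and libqb are actually being picked up would be something like this (just a sketch):

    # where the corosync binary on the PATH lives
    which corosync

    # which libqb it is linked against at runtime
    ldd $(which corosync) | grep libqb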
Chrissie
My corosync.conf file is below.
With the corosync service up and running I have the following output:

*corosync-cfgtool -s*
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.0.11
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.0.11
        status  = ring 1 active with no faults
*corosync-cmapctl | grep members*
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.11) r(1) ip(192.168.0.11)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.12) r(1) ip(192.168.0.12)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
For the moment I have two nodes in my cluster (the third node had some issues and at the moment I have put it in standby with crm node standby).
Here are the dependencies I have installed for corosync (they work fine with pacemaker 1.1.14 and corosync 2.3.5):
libnspr4-dev_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
libnspr4_2%3a4.10.10-0ubuntu0.14.04.1_amd64.deb
libnss3-dev_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
libnss3-nssdb_2%3a3.19.2.1-0ubuntu0.14.04.2_all.deb
libnss3_2%3a3.19.2.1-0ubuntu0.14.04.2_amd64.deb
libqb-dev_0.16.0.real-1ubuntu4_amd64.deb
libqb0_0.16.0.real-1ubuntu4_amd64.deb
*corosync.conf*
---------------------
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.0.0
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
    max_network_delay: 100
    retransmits_before_loss_const: 25
    window_size: 150
}

nodelist {
    node {
        ring0_addr: pg1
        ring1_addr: pg1p
        nodeid: 1
    }
    node {
        ring0_addr: pg2
        ring1_addr: pg2p
        nodeid: 2
    }
    node {
        ring0_addr: pg3
        ring1_addr: pg3p
        nodeid: 3
    }
}

logging {
    to_syslog: yes
}
On 22 Jun 2018, at 09:24, Christine Caulfield <ccaul...@redhat.com> wrote:
On 21/06/18 16:16, Salvatore D'angelo wrote:
Hi,
I upgraded my PostgreSQL/Pacemaker cluster to these versions:
Pacemaker        1.1.14 -> 1.1.18
Corosync         2.3.5  -> 2.4.4
Crmsh            2.2.0  -> 3.0.1
Resource agents  3.9.7  -> 4.1.1
I started on the first node (I am trying a one-node-at-a-time upgrade). On a PostgreSQL slave node I did:
*crm node standby <node>*
*service pacemaker stop*
*service corosync stop*
Then I built the tools above as described on their GitHub.com pages.
*./autogen.sh (where required)*
*./configure*
*make (where required)*
*make install*
Everything went OK and I expect the new files overwrote the old ones. I left the dependencies I had for the old software in place because I noticed ./configure didn't complain. I started corosync.
*service corosync start*
To verify corosync works properly I used the following commands:
*corosync-cfgtool -s*
*corosync-cmapctl | grep members*
Everything seemed ok and I verified my node joined the cluster (at least this is my impression).
Here I found a problem. Running the command: corosync-quorumtool -ps
I got the following problem: Cannot initialise CFG service
That says that corosync is not running. Have a look in the log files to see why it stopped. The pacemaker logs below are showing the same thing, but we can't make any more guesses until we see what corosync itself is doing. Enabling debug in corosync.conf will also help if more detail is needed.
Also starting corosync with 'corosync -pf' on the command-line is often a quick way of checking things are starting OK.
Chrissie
If I try to start pacemaker, I only see the pacemakerd process running, and pacemaker.log contains the following lines:
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: crm_log_init:          Changed active directory to /var/lib/pacemaker/cores
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: get_cluster_type:      Detected an active 'corosync' cluster
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: mcp_read_config:       Reading configure for stack: corosync
Jun 21 15:09:38 [17115] pg1 pacemakerd:   notice: main:                  Starting Pacemaker 1.1.18 | build=2b07d5c5a9 features: libqb-logging libqb-ipc lha-fencing nagios corosync-native atomic-attrd acls
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: main:                  Maximum core file size is: 18446744073709551615
Jun 21 15:09:38 [17115] pg1 pacemakerd:     info: qb_ipcs_us_publish:    server name: pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd:  warning: corosync_node_name:    Could not connect to Cluster Configuration Database API, error CS_ERR_TRY_AGAIN
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: corosync_node_name:    Unable to get node name for nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd:   notice: get_node_name:         Could not obtain a node name for corosync nodeid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer:          Created entry 1aeef8ac-643b-44f7-8ce3-d82bbf40bbc1/0x557dc7f05d30 for node (null)/1 (1 total)
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_get_peer:          Node 1 has uuid 1
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_update_peer_proc:  cluster_connect_cpg: Node (null)[1] - corosync-cpg is now online
Jun 21 15:09:53 [17115] pg1 pacemakerd:    error: cluster_connect_quorum: Could not connect to the Quorum API: 2
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: qb_ipcs_us_withdraw:   withdrawing server sockets
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: main:                  Exiting pacemakerd
Jun 21 15:09:53 [17115] pg1 pacemakerd:     info: crm_xml_cleanup:       Cleaning up memory from libxml2
*What is wrong with my procedure?*
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org