Your message dated Fri, 19 May 2017 19:42:39 +0200 with message-id <[email protected]> and subject line Re: Bug#756606: [Pkg-postgresql-public] Bug#756606: postgresql-9.1: Init-Script does not work together with heartbeat has caused the Debian Bug report #756606, regarding postgresql-9.1: Init-Script does not work together with heartbeat to be marked as done.
This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact [email protected] immediately.)

-- 
756606: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=756606
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: postgresql-9.1
Version: 9.1.13-0wheezy1
Severity: important

Dear Maintainer,

After drbd, heartbeat and postgresql-9.1 are installed and basically configured, the attempt to run the postgresql init script from heartbeat's haresources fails in multiple ways. Unfortunately, I cannot tell why it doesn't work, since the ha-debug log shows no reason for the failure even with debugging enabled.

The following lists what's going wrong:

1) A usual case for Postgres HA clusters is to have /var/lib/postgresql on a DRBD-synced resource, which is only mounted on one node at a time. When you have a resource group configured to start drbddisk, mount /var/lib/postgresql, and start postgresql (in that order; see the haresources file listed later in this report) and start up heartbeat on both nodes, these resources are only started on the primary node for this resource group (first field in the haresources file). These resources are not acquired on the standby node.

Unfortunately, when stopping heartbeat on the standby node, heartbeat nevertheless tries to give up resources, even though it hasn't acquired them before. Since /var/lib/postgresql wasn't mounted before on that node, issuing "/etc/init.d/postgresql stop" on the standby node fails, since it cannot find necessary files in /var/lib/postgresql. Even without having heartbeat STONITH configured, this somehow leads to a hard server reset.

Solution: "/etc/init.d/postgresql stop" shouldn't return an error when the datadir is empty, to make it usable along with heartbeat.

2) When starting heartbeat, it seems like postgresql isn't started at all. I do not understand this, since all other init scripts I have tested (samba, cron) work fine when used instead of postgresql in the haresources file quoted below. I have tried this on multiple clean Debian wheezy installs, from bare-metal servers to workstation VirtualBox setups. The result is always the same.

You find the logs and configurations used following this line:

/etc/ha.d/haresources :

prod-cl3 drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 IPaddr::192.168.20.18/24/eth0 postgresql

=======================

/etc/ha.d/ha.cf :

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 90
udpport 694
ucast eth1 10.250.250.16
auto_failback on
node prod-cl3
node prod-cl4

=======================

/etc/drbd.conf :

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

=======================

/etc/drbd.d/global_common.conf :

global {
    usage-count yes;
}
common {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
    }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 96256;
    }
}

=======================

/etc/drbd.d/prod-cl.res :

resource var_lib_postgres {
    protocol C;
    on prod-cl3 {
        device /dev/drbd0;
        disk /dev/prod-cl3_data/var_lib_postgres;
        address 10.250.250.16:7789;
        meta-disk internal;
    }
    on prod-cl4 {
        device /dev/drbd0;
        disk /dev/prod-cl4_data/var_lib_postgres;
        address 10.250.250.17:7789;
        meta-disk internal;
    }
}

=======================

ha-debug log, showing postgres isn't even started on primary node when heartbeat starts:

Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Core dumps could be lost if multiple dumps occur.
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Pacemaker support: false
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: **************************
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Configuration validated. Starting heartbeat 3.0.5
Jul 30 13:51:11 prod-cl3 heartbeat: [20847]: info: heartbeat: version 3.0.5
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Heartbeat generation: 1406638883
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound send socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound receive socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: started on port 694 interface eth1 to 10.250.250.17
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Local status now set to: 'up'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: node prod-cl4: is dead
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Comm_now_up(): updating status to active
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Local status now set to: 'active'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: No STONITH device configured.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: Shared disks are not protected.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Resources being acquired from prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20876]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[20876]: 2014/07/30_13:52:43 info: Running /etc/ha.d//rc.d/status status
Jul 30 13:52:43 prod-cl3 heartbeat: [20877]: info: Local Resource acquisition completed.
mach_down[20910]: 2014/07/30_13:52:43 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq(): child count 2
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: mach_down takeover complete.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Initial resource acquisition complete (mach_down)
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq(): child count 1
mach_down[20910]: 2014/07/30_13:52:43 info: mach_down takeover complete for node prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20968]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[20968]: 2014/07/30_13:52:43 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[20968]: 2014/07/30_13:52:43 received ip-request-resp drbddisk::var_lib_postgres OK yes
ResourceManager[20989]: 2014/07/30_13:52:43 info: Acquiring resource group: prod-cl3 drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0 postgresql
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running /etc/ha.d/resource.d/drbddisk var_lib_postgres start
Filesystem[21057]: 2014/07/30_13:52:43 INFO: Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/postgresql ext4 start
Filesystem[21131]: 2014/07/30_13:52:43 INFO: Running start for /dev/drbd0 on /var/lib/postgresql
FATAL: Module scsi_hostadapter not found.
Filesystem[21125]: 2014/07/30_13:52:43 INFO: Success
INFO: Success
IPaddr[21200]: 2014/07/30_13:52:43 INFO: Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running /etc/ha.d/resource.d/IPaddr 192.168.20.18/24/eth0 start
IPaddr[21282]: 2014/07/30_13:52:43 INFO: Using calculated netmask for 192.168.20.18: 255.255.255.0
IPaddr[21282]: 2014/07/30_13:52:43 INFO: eval ifconfig eth0:0 192.168.20.18 netmask 255.255.255.0 broadcast 192.168.20.255
IPaddr[21258]: 2014/07/30_13:52:43 INFO: Success
INFO: Success
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: Local Resource acquisition completed. (none)
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: local resource transition completed.

=======================

ha-debug log, showing server crash when postgresql isn't properly stopped (due to missing files in datadir as described):

Jul 30 13:57:49 prod-cl4 heartbeat: [3340]: info: Heartbeat shutdown in progress. (3340)
Jul 30 13:57:49 prod-cl4 heartbeat: [3410]: info: Giving up all HA resources.
ResourceManager[3424]: 2014/07/30_13:57:49 info: Releasing resource group: prod-cl3 drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0 postgresql
ResourceManager[3424]: 2014/07/30_13:57:49 info: Running /etc/init.d/postgresql stop
Stopping PostgreSQL 9.1 database server: mainError: /var/lib/postgresql/9.1/main is not accessible or does not exist ... failed! failed!
ResourceManager[3424]: 2014/07/30_13:57:50 ERROR: Return code 1 from /etc/init.d/postgresql
ResourceManager[3424]: 2014/07/30_13:57:51 info: Retrying failed stop operation [postgresql]
ResourceManager[3424]: 2014/07/30_13:5

-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages postgresql-9.1 depends on:
ii  libc6                  2.13-38+deb7u3
ii  libcomerr2             1.42.5-1.1
ii  libgssapi-krb5-2       1.10.1+dfsg-5+deb7u1
ii  libkrb5-3              1.10.1+dfsg-5+deb7u1
ii  libldap-2.4-2          2.4.31-1+nmu2
ii  libpam0g               1.1.3-7.1
ii  libpq5                 9.1.13-0wheezy1
ii  libssl1.0.0            1.0.1e-2+deb7u11
ii  libxml2                2.8.0+dfsg1-7+wheezy1
ii  locales                2.13-38+deb7u3
ii  postgresql-client-9.1  9.1.13-0wheezy1
ii  postgresql-common      134wheezy4
ii  ssl-cert               1.0.32
ii  tzdata                 2014e-0wheezy1

postgresql-9.1 recommends no packages.

Versions of packages postgresql-9.1 suggests:
pn  locales-all             <none>
pn  oidentd | ident-server  <none>

-- no debconf information
--- End Message ---
--- Begin Message ---
Re: To Marc Richter 2014-07-31 <[email protected]>
> > Unfortunately, when stopping heartbeat on the standby node, heartbeat
> > nevertheless tries to give up resources, even though it hasn't acquired them
> > before. Since /var/lib/postgresql wasn't mounted before on that node,
> > issuing "/etc/init.d/postgresql stop" on the standby node fails, since it
> > cannot find necessary files in /var/lib/postgresql .
>
> The init script was never designed to be a drop-in heartbeat HA agent.
> The exit codes are probably simply wrong in some cases for that.
>
> Any reason you aren't using the pgsql agent provided by pacemaker?

With the switch to systemd, and lots of PostgreSQL agents for HA resource managers available, fixing the init script for heartbeat isn't really going to happen anymore. Closing this bug now.

Thanks for the report,

Christoph
--- End Message ---
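
Since the init script itself will not be changed, the failing "stop" on the standby node described in the report could be handled outside the package. The following is only a sketch under stated assumptions: a small wrapper around /etc/init.d/postgresql that reports success for "stop" when the data directory is absent (for example because the DRBD filesystem is not mounted on that node). The names PGDATA_DIR, INIT_SCRIPT, datadir_missing and pg_wrapper are hypothetical and not part of heartbeat or postgresql-common; the data directory path is taken from the error message in the log above. It does not address the second problem in the report (postgresql not being started).

```shell
#!/bin/sh
# Sketch of a hypothetical heartbeat wrapper around the postgresql init script.
# All names below (PGDATA_DIR, INIT_SCRIPT, datadir_missing, pg_wrapper) are
# illustrative assumptions, not part of heartbeat or postgresql-common.

# Data directory as it appears in the error message in the report's log.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/9.1/main}"
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/postgresql}"

# Succeeds when the given directory does not exist, e.g. because the DRBD
# filesystem is not mounted on this node.
datadir_missing() {
    [ ! -d "$1" ]
}

# Forward the action to the init script, except for "stop" on a node where
# the data directory is absent: there is nothing to stop, so report success.
pg_wrapper() {
    action="$1"
    if [ "$action" = "stop" ] && datadir_missing "$PGDATA_DIR"; then
        echo "postgresql: $PGDATA_DIR not present, nothing to stop"
        return 0
    fi
    "$INIT_SCRIPT" "$action"
}
```

An installed copy would end with a line such as `pg_wrapper "$1"` and be referenced from haresources under its own name in place of postgresql. Whether this is preferable to moving to one of the dedicated PostgreSQL resource agents mentioned in the reply depends on the setup.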
