Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)
Hi,

Every action on httpd is very slow due to ModSecurity 2.9, so the reload in postrotate may take a while.

Here is the log output for this morning's message:

Jun 14 03:43:05 mail-px-** crmd[2685]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 14 03:43:05 mail-px-** pengine[2684]: notice: On loss of CCM Quorum: Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]: notice: Recover WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** pengine[2684]: notice: Calculated Transition 367: /var/lib/pacemaker/pengine/pe-input-173.bz2
Jun 14 03:43:05 mail-px-** pengine[2684]: notice: On loss of CCM Quorum: Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]: notice: Recover WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** crmd[2685]: notice: Initiating action 4: stop WebSite_stop_0 on node1 (local)
Jun 14 03:43:05 mail-px-** systemd: Reloading.
Jun 14 03:43:05 mail-px-** pengine[2684]: notice: Calculated Transition 368: /var/lib/pacemaker/pengine/pe-input-174.bz2
Jun 14 03:43:05 mail-px-** systemd: Configuration file /usr/lib/systemd/system/fusioninventory-agent.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file /usr/lib/systemd/system/auditd.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Removed slice user-0.slice.
Jun 14 03:43:05 mail-px-** systemd: Stopping user-0.slice.
Jun 14 03:44:35 mail-px-** systemd: httpd.service stop-sigterm timed out. Killing.
Jun 14 03:44:35 mail-px-** systemd: httpd.service: main process exited, code=killed, status=9/KILL
Jun 14 03:44:35 mail-px-** systemd: Stopped The Apache HTTP Server.
Jun 14 03:44:35 mail-px-** systemd: Unit httpd.service entered failed state.
Jun 14 03:44:35 mail-px-** systemd: httpd.service failed.
Jun 14 03:44:37 mail-px-** crmd[2685]: notice: Operation WebSite_stop_0: ok (node=node1, call=29, rc=0, cib-update=464, confirmed=true)
Jun 14 03:44:37 mail-px-** crmd[2685]: notice: Initiating action 10: start WebSite_start_0 on node1 (local)
Jun 14 03:44:37 mail-px-** systemd: Reloading.
Jun 14 03:44:37 mail-px-** systemd: Configuration file /usr/lib/systemd/system/fusioninventory-agent.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file /usr/lib/systemd/system/auditd.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file /run/systemd/system/httpd.service.d/50-pacemaker.conf is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Starting Cluster Controlled httpd...
Jun 14 03:44:55 mail-px-** puppet-agent[1645]: Did not receive certificate
Jun 14 03:44:57 mail-px-** systemd: Started Cluster Controlled httpd.
Jun 14 03:44:59 mail-px-** crmd[2685]: notice: Operation WebSite_start_0: ok (node=node1, call=30, rc=0, cib-update=465, confirmed=true)
Jun 14 03:44:59 mail-px-** crmd[2685]: notice: Initiating action 3: monitor WebSite_monitor_30 on node1 (local)
Jun 14 03:44:59 mail-px-** crmd[2685]: notice: Transition 368 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
Jun 14 03:44:59 mail-px-** crmd[2685]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

The strange thing is that the problem does not occur on every logrotate run...

Jérémy

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Tuesday, 14 June 2016 16:40
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
>
> We actually have a 2 nodes cluster with corosync and pacemaker for
> httpd. We have 2 VIP configured.
>
> Since we've added ModSecurity 2.9, httpd restart is very slow. So I
> increased the start / stop timeout. But
Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)
On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
>
> We actually have a 2 nodes cluster with corosync and pacemaker for
> httpd. We have 2 VIP configured.
>
> Since we've added ModSecurity 2.9, httpd restart is very slow. So I
> increased the start / stop timeout. But sometimes, after logrotate the
> following error occurs :
>
> Failed Actions:
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
>   status=complete, exitreason='none',
>   last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
>
> Here is the full output of crm_mon :
>
> Last updated: Tue Jun 14 07:22:28 2016
> Last change: Fri Jun 10 09:28:03 2016 by root via cibadmin on node1
> Stack: corosync
> Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> 2 nodes and 4 resources configured
>
> Online: [ node1 node2 ]
>
> WebSite (systemd:httpd): Started node1
> Resource Group: WAFCluster
>     VirtualIP (ocf::heartbeat:IPaddr2): Started node1
>     MailMon (ocf::heartbeat:MailTo): Started node1
>     VirtualIP2 (ocf::heartbeat:IPaddr2): Started node1
>
> Failed Actions:
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
>   status=complete, exitreason='none',
>   last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
>
> # pcs resource --full
> Resource: WebSite (class=systemd type=httpd)
>  Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status monitor=1min
>  Operations: monitor interval=300s (WebSite-monitor-interval-300s)
>              start interval=0s timeout=300s (WebSite-start-interval-0s)
>              stop interval=0s timeout=300s (WebSite-stop-interval-0s)
> Group: WAFCluster
>  Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=195.70.7.74 cidr_netmask=27
>   Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
>               stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
>               monitor interval=30s (VirtualIP-monitor-interval-30s)
>  Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
>   Attributes: email=sys...@dfi.ch
>   Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
>               stop interval=0s timeout=10 (MailMon-stop-interval-0s)
>               monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
>  Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=195.70.7.75 cidr_netmask=27
>   Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
>               stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
>               monitor interval=30s (VirtualIP2-monitor-interval-30s)
>
> If I run crm_resource -P the Failed Actions disappear.
>
> How can I fix the monitor "not running" error ?
>
> Thanks,
>
> Jérémy

Why does logrotate cause the site to stop responding? Normally it's a graceful restart, which shouldn't cause any interruptions.

Any solution will have to be in logrotate, to keep it from interrupting service.

Personally, my preferred configuration is to make apache log to syslog instead of its usual log file. You can even configure syslog to log it to the usual file, so there's no major difference. Then you don't need a separate logrotate script for apache; it gets rotated with the system log. That avoids having to restart apache, which for a busy site can be a big deal. It also gives you the option of tying into syslog tools such as remote logging.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
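
A minimal sketch of the syslog approach described in the reply above, assuming Apache httpd 2.4 with rsyslog on the same host. The facility local1, the tag httpd, the file names, and the logger path /usr/bin/logger are illustrative assumptions, not taken from the thread.

```apache
# /etc/httpd/conf.d/syslog.conf (hypothetical file name)

# Send the error log to syslog under facility local1 (assumed facility).
ErrorLog syslog:local1

# Pipe access-log lines through logger(1) so they reach syslog as well;
# "combined" is the log format nickname defined in the stock httpd.conf.
CustomLog "|/usr/bin/logger -t httpd -p local1.info" combined

# Matching rsyslog rule (belongs in e.g. /etc/rsyslog.d/httpd.conf, not in Apache):
#   local1.*    /var/log/httpd/httpd.log
```

The file written by rsyslog would then need to be covered by a logrotate rule that signals rsyslog rather than httpd, so rotation no longer depends on the slow httpd reload.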
[ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)
Hi all,

We actually have a 2 nodes cluster with corosync and pacemaker for httpd. We have 2 VIP configured.

Since we've added ModSecurity 2.9, httpd restart is very slow. So I increased the start / stop timeout. But sometimes, after logrotate the following error occurs :

Failed Actions:
* WebSite_monitor_30 on node1 'not running' (7): call=26, status=complete, exitreason='none',
  last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms

Here is the full output of crm_mon :

Last updated: Tue Jun 14 07:22:28 2016
Last change: Fri Jun 10 09:28:03 2016 by root via cibadmin on node1
Stack: corosync
Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ node1 node2 ]

WebSite (systemd:httpd): Started node1
Resource Group: WAFCluster
    VirtualIP (ocf::heartbeat:IPaddr2): Started node1
    MailMon (ocf::heartbeat:MailTo): Started node1
    VirtualIP2 (ocf::heartbeat:IPaddr2): Started node1

Failed Actions:
* WebSite_monitor_30 on node1 'not running' (7): call=26, status=complete, exitreason='none',
  last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms

# pcs resource --full
Resource: WebSite (class=systemd type=httpd)
 Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://127.0.0.1/server-status monitor=1min
 Operations: monitor interval=300s (WebSite-monitor-interval-300s)
             start interval=0s timeout=300s (WebSite-start-interval-0s)
             stop interval=0s timeout=300s (WebSite-stop-interval-0s)
Group: WAFCluster
 Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=195.70.7.74 cidr_netmask=27
  Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
              stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
              monitor interval=30s (VirtualIP-monitor-interval-30s)
 Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
  Attributes: email=sys...@dfi.ch
  Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
              stop interval=0s timeout=10 (MailMon-stop-interval-0s)
              monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
 Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=195.70.7.75 cidr_netmask=27
  Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
              stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
              monitor interval=30s (VirtualIP2-monitor-interval-30s)

If I run crm_resource -P the Failed Actions disappear.

How can I fix the monitor "not running" error ?

Thanks,

Jérémy