
Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

2016-06-14 Thread Jeremy Voisin

Hi,

Every action on httpd is very slow because of ModSecurity 2.9. The reload in
postrotate may take a while.

Here is the log output from this morning:
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: On loss of CCM Quorum:
Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op
monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Recover
WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Calculated Transition
367: /var/lib/pacemaker/pengine/pe-input-173.bz2
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: On loss of CCM Quorum:
Ignore
Jun 14 03:43:05 mail-px-** pengine[2684]: warning: Processing failed op
monitor for WebSite on node1: not running (7)
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Recover
WebSite#011(Started node1)
Jun 14 03:43:05 mail-px-** crmd[2685]:  notice: Initiating action 4: stop
WebSite_stop_0 on node1 (local)
Jun 14 03:43:05 mail-px-** systemd: Reloading.
Jun 14 03:43:05 mail-px-** pengine[2684]:  notice: Calculated Transition
368: /var/lib/pacemaker/pengine/pe-input-174.bz2
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/fusioninventory-agent.service is marked executable.
Please remove executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
Jun 14 03:43:05 mail-px-** systemd: Removed slice user-0.slice.
Jun 14 03:43:05 mail-px-** systemd: Stopping user-0.slice.
Jun 14 03:44:35 mail-px-** systemd: httpd.service stop-sigterm timed out.
Killing.
Jun 14 03:44:35 mail-px-** systemd: httpd.service: main process exited,
code=killed, status=9/KILL
Jun 14 03:44:35 mail-px-** systemd: Stopped The Apache HTTP Server.
Jun 14 03:44:35 mail-px-** systemd: Unit httpd.service entered failed state.
Jun 14 03:44:35 mail-px-** systemd: httpd.service failed.
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Operation WebSite_stop_0: ok
(node=node1, call=29, rc=0, cib-update=464, confirmed=true)
Jun 14 03:44:37 mail-px-** crmd[2685]:  notice: Initiating action 10: start
WebSite_start_0 on node1 (local)
Jun 14 03:44:37 mail-px-** systemd: Reloading.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/fusioninventory-agent.service is marked executable.
Please remove executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Configuration file
/run/systemd/system/httpd.service.d/50-pacemaker.conf is marked
world-inaccessible. This has no effect as configuration data is accessible
via APIs without restrictions. Proceeding anyway.
Jun 14 03:44:37 mail-px-** systemd: Starting Cluster Controlled httpd...
Jun 14 03:44:55 mail-px-** puppet-agent[1645]: Did not receive certificate
Jun 14 03:44:57 mail-px-** systemd: Started Cluster Controlled httpd.
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Operation WebSite_start_0:
ok (node=node1, call=30, rc=0, cib-update=465, confirmed=true)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Initiating action 3: monitor
WebSite_monitor_30 on node1 (local)
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: Transition 368 (Complete=4,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
Jun 14 03:44:59 mail-px-** crmd[2685]:  notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]

The strange thing is that the problem does not occur on every logrotate run...

Jérémy

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Tuesday, 14 June 2016 16:40
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Processing failed op monitor for WebSite on node1:
not running (7)

On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
> 
>  
> 
> We actually have a 2 nodes cluster with corosync and pacemaker for 
> httpd. We have 2 VIP configured.
> 
>  
> 
> Since we've added ModSecurity 2.9, httpd restart is very slow. So I 
> increased the start / stop timeout. But

Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

2016-06-14 Thread Ken Gaillot

On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
> 
>  
> 
> We actually have a 2 nodes cluster with corosync and pacemaker for
> httpd. We have 2 VIP configured.
> 
>  
> 
> Since we’ve added ModSecurity 2.9, httpd restart is very slow. So I
> increased the start / stop timeout. But sometimes, after logrotate the
> following error occurs :
> 
>  
> 
> Failed Actions:
> 
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
> status=complete, exitreason='none',
> 
> last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
> 
>  
> 
> Here is the full output of crm_mon :
> 
> Last updated: Tue Jun 14 07:22:28 2016  Last change: Fri Jun 10
> 09:28:03 2016 by root via cibadmin on node1
> 
> Stack: corosync
> 
> Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with
> quorum
> 
> 2 nodes and 4 resources configured
> 
>  
> 
> Online: [ node1 node2 ]
> 
>  
> 
> WebSite (systemd:httpd):Started node1
> 
> Resource Group: WAFCluster
> 
>  VirtualIP  (ocf::heartbeat:IPaddr2):   Started node1
> 
>  MailMon(ocf::heartbeat:MailTo):Started node1
> 
>  VirtualIP2 (ocf::heartbeat:IPaddr2):   Started node1
> 
>  
> 
> Failed Actions:
> 
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
> status=complete, exitreason='none',
> 
> last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
> 
>  
> 
> # pcs resource --full
> 
> Resource: WebSite (class=systemd type=httpd)
> 
>   Attributes: configfile=/etc/httpd/conf/httpd.conf
> statusurl=http://127.0.0.1/server-status monitor=1min
> 
>   Operations: monitor interval=300s (WebSite-monitor-interval-300s)
> 
>   start interval=0s timeout=300s (WebSite-start-interval-0s)
> 
>   stop interval=0s timeout=300s (WebSite-stop-interval-0s)
> 
> Group: WAFCluster
> 
>   Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
> 
>Attributes: ip=195.70.7.74 cidr_netmask=27
> 
>Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
> 
>stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
> 
>monitor interval=30s (VirtualIP-monitor-interval-30s)
> 
>   Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
> 
>Attributes: email=sys...@dfi.ch
> 
>Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
> 
>stop interval=0s timeout=10 (MailMon-stop-interval-0s)
> 
>monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
> 
>   Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
> 
>Attributes: ip=195.70.7.75 cidr_netmask=27
> 
>Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
> 
>stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
> 
>monitor interval=30s (VirtualIP2-monitor-interval-30s)
> 
>  
> 
>  
> 
> If I run /crm_resource -P/ the Failed Actions disappear.
> 
>  
> 
> How can I fix the monitor “not running” error ?
> 
>  
> 
> Thanks,
> 
> Jérémy

Why does logrotate cause the site to stop responding? Normally it's a
graceful restart, which shouldn't cause any interruptions.

Any solution will have to be in logrotate, to keep it from interrupting
service.
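
One way to check what logrotate actually runs against httpd is to print the postrotate stanza of its logrotate configuration. The following is a minimal shell sketch only: the path /etc/logrotate.d/httpd and the helper name show_postrotate are assumptions for illustration, not something confirmed in this thread.

```shell
#!/bin/sh
# Print the postrotate ... endscript stanza of a logrotate config file.
# show_postrotate is a hypothetical helper name used only in this sketch.
show_postrotate() {
    sed -n '/postrotate/,/endscript/p' "$1"
}

# Assumed default location of the httpd logrotate config; pass another
# path as the first argument if it lives elsewhere.
CONF="${1:-/etc/logrotate.d/httpd}"

if [ -r "$CONF" ]; then
    show_postrotate "$CONF"
else
    echo "no readable logrotate config at $CONF" >&2
fi
```

If the stanza shows a reload or restart of httpd, that is the command whose duration matters when ModSecurity makes every httpd action slow.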

Personally, my preferred configuration is to make apache log to syslog
instead of its usual log file. You can even configure syslog to log it
to the usual file, so there's no major difference. Then, you don't need
a separate logrotate script for apache, it gets rotated with the system
log. That avoids having to restart apache, which for a busy site can be
a big deal. It also gives you the option of tying into syslog tools such
as remote logging.
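
As a rough sketch of the syslog approach described above, the script below writes two example snippets into a scratch directory for review, without touching /etc. The facility local1, the logger tag, the destination file name, and the scratch directory are all assumptions chosen for illustration. ErrorLog with a syslog facility and a piped CustomLog are standard Apache features; the "combined" format nickname is assumed to be defined in the main httpd.conf.

```shell
#!/bin/sh
# Sketch: generate example config snippets for sending Apache logs to syslog.
# Nothing is installed; review the files and copy them into place by hand.
set -eu

OUT="${OUT:-./apache-syslog-sketch}"   # hypothetical scratch directory
mkdir -p "$OUT"

# Apache side: error log goes to syslog facility local1 (assumed facility),
# access log is piped through logger(1) with the same facility.
cat > "$OUT/httpd-syslog.conf" <<'EOF'
ErrorLog syslog:local1
CustomLog "|/usr/bin/logger -t httpd -p local1.info" combined
EOF

# rsyslog side: write everything from local1 to a single file
# (the file name here is an assumption).
cat > "$OUT/rsyslog-httpd.conf" <<'EOF'
local1.*    /var/log/httpd/httpd.log
EOF

echo "wrote $OUT/httpd-syslog.conf and $OUT/rsyslog-httpd.conf"
```

With a setup along these lines, the separate Apache logrotate script would no longer be needed for these logs, although the destination file would still have to be covered by the system log rotation rules.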

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

[ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

2016-06-14 Thread Jeremy Voisin

Hi all,

 

We currently have a two-node cluster with corosync and pacemaker for httpd. We
have 2 VIPs configured.

 

Since we've added ModSecurity 2.9, httpd restart is very slow. So I
increased the start / stop timeout. But sometimes, after logrotate, the
following error occurs:

 

Failed Actions:

* WebSite_monitor_30 on node1 'not running' (7): call=26,
status=complete, exitreason='none',

last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms

 

Here is the full output of crm_mon : 

Last updated: Tue Jun 14 07:22:28 2016  Last change: Fri Jun 10
09:28:03 2016 by root via cibadmin on node1

Stack: corosync

Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with
quorum

2 nodes and 4 resources configured

 

Online: [ node1 node2 ]

 

WebSite (systemd:httpd):Started node1

Resource Group: WAFCluster

 VirtualIP  (ocf::heartbeat:IPaddr2):   Started node1

 MailMon(ocf::heartbeat:MailTo):Started node1

 VirtualIP2 (ocf::heartbeat:IPaddr2):   Started node1

 

Failed Actions:

* WebSite_monitor_30 on node1 'not running' (7): call=26,
status=complete, exitreason='none',

last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms

 

# pcs resource --full

Resource: WebSite (class=systemd type=httpd)

  Attributes: configfile=/etc/httpd/conf/httpd.conf
statusurl=http://127.0.0.1/server-status monitor=1min

  Operations: monitor interval=300s (WebSite-monitor-interval-300s)

  start interval=0s timeout=300s (WebSite-start-interval-0s)

  stop interval=0s timeout=300s (WebSite-stop-interval-0s)

Group: WAFCluster

  Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)

   Attributes: ip=195.70.7.74 cidr_netmask=27

   Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)

   stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)

   monitor interval=30s (VirtualIP-monitor-interval-30s)

  Resource: MailMon (class=ocf provider=heartbeat type=MailTo)

   Attributes: email=sys...@dfi.ch

   Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)

   stop interval=0s timeout=10 (MailMon-stop-interval-0s)

   monitor interval=10 timeout=10 (MailMon-monitor-interval-10)

  Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)

   Attributes: ip=195.70.7.75 cidr_netmask=27

   Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)

   stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)

   monitor interval=30s (VirtualIP2-monitor-interval-30s)

 

 

If I run crm_resource -P the Failed Actions disappear.

 

How can I fix the monitor "not running" error?

 

Thanks,

Jérémy


