Hi,

I only keep a couple of pe-input files, and that pe-input-1 version was already overwritten.
I redid my tests as described in my previous mails.
At the end of the test it was again written to pe-input-1, which is included as an attachment.
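
A saved policy-engine input such as the attached one can be replayed offline with crm_simulate to see which actions the cluster wanted to schedule. The sketch below only assumes one way of wrapping that call: the function name replay_pe_input is made up for illustration, and the default path is the one mentioned in this thread.

```shell
#!/bin/sh
# Replay a saved policy-engine input and show what Pacemaker would do with it.
# replay_pe_input is a hypothetical helper name, not a Pacemaker command.
replay_pe_input() {
    pe_file="${1:-/var/lib/pacemaker/pengine/pe-input-1.bz2}"

    # Fail early if the file is missing or unreadable.
    if [ ! -r "$pe_file" ]; then
        echo "cannot read $pe_file" >&2
        return 1
    fi

    # crm_simulate ships with Pacemaker; without it there is nothing to replay.
    if ! command -v crm_simulate >/dev/null 2>&1; then
        echo "crm_simulate not found; run this on a cluster node" >&2
        return 127
    fi

    # -x: read the cluster state from the given file
    # -S: simulate the resulting transition
    # -s: also print the allocation scores
    crm_simulate -S -s -x "$pe_file"
}
```

Calling replay_pe_input with no argument uses the default path; passing another pe-input file as the first argument replays that one instead.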

Gr.
Johan

On 2013-05-07 04:08, Andrew Beekhof wrote:

I have a much clearer idea of the problem you're seeing now, thank you.
Could you attach /var/lib/pacemaker/pengine/pe-input-1.bz2 from CSE-1?

On 03/05/2013, at 10:40 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:

Hi,

Below you can see my setup and my test. It shows that my cloned resource with on-fail=block does not recover automatically.

My Setup:

# rpm -aq | grep -i pacemaker
pacemaker-libs-1.1.9-1512.el6.i686
pacemaker-cluster-libs-1.1.9-1512.el6.i686
pacemaker-cli-1.1.9-1512.el6.i686
pacemaker-1.1.9-1512.el6.i686

# crm configure show
node CSE-1
node CSE-2
primitive d_tomcat ocf:ntc:tomcat \
    op monitor interval="15s" timeout="510s" on-fail="block" \
    op start interval="0" timeout="510s" \
    params instance_name="NMS" monitor_use_ssl="no" monitor_urls="/cse/health" monitor_timeout="120" \
    meta migration-threshold="1"
primitive ip_11 ocf:heartbeat:IPaddr2 \
    op monitor interval="10s" \
    params broadcast="172.16.11.31" ip="172.16.11.31" nic="bond0.111" iflabel="ha" \
    meta migration-threshold="1" failure-timeout="10"
primitive ip_19 ocf:heartbeat:IPaddr2 \
    op monitor interval="10s" \
    params broadcast="172.18.19.31" ip="172.18.19.31" nic="bond0.119" iflabel="ha" \
    meta migration-threshold="1" failure-timeout="10"
group svc-cse ip_19 ip_11
clone cl_tomcat d_tomcat
colocation colo_tomcat inf: svc-cse cl_tomcat
order order_tomcat inf: cl_tomcat svc-cse
property $id="cib-bootstrap-options" \
    dc-version="1.1.9-1512.el6-2a917dd" \
    cluster-infrastructure="cman" \
    pe-warn-series-max="9" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    pe-input-series-max="9" \
    pe-error-series-max="9" \
    last-lrm-refresh="1367582088"

Currently only 1 node is available, CSE-1.
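
With migration-threshold="1" and on-fail="block" in the configuration above, the failure recorded for d_tomcat is worth inspecting directly. A minimal sketch, assuming crm_mon and crmsh are installed on the node: the helper names run, inspect_failures and clear_failures and the DRY_RUN switch are made up here and are not part of Pacemaker. Note that a manual cleanup only clears the recorded failure by hand; it does not by itself provide the automatic recovery asked about in this thread.

```shell
#!/bin/sh
# Inspect, and optionally clear, the recorded failures of one resource.
# Set DRY_RUN=1 to print the commands instead of running them.

# run: execute the given command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# inspect_failures RESOURCE NODE
inspect_failures() {
    rsc="$1"
    node="$2"
    # One-shot cluster status, including fail counts.
    run crm_mon -1 -f
    # Fail count of this resource on this node.
    run crm resource failcount "$rsc" show "$node"
}

# clear_failures RESOURCE
clear_failures() {
    # Ask the cluster to forget the failure history of this resource.
    run crm resource cleanup "$1"
}
```

For the setup in this thread the calls would be inspect_failures d_tomcat CSE-1 and, once the tomcat problem is fixed, clear_failures d_tomcat. Running them first with DRY_RUN=1 shows the exact commands without touching the cluster.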

This is how I am currently testing my setup:

=> Starting point: Everything up and running
# crm resource status
 Resource Group: svc-cse
     ip_19 (ocf::heartbeat:IPaddr2): Started
     ip_11 (ocf::heartbeat:IPaddr2): Started
 Clone Set: cl_tomcat [d_tomcat]
     Started: [ CSE-1 ]
     Stopped: [ d_tomcat:1 ]

=> Causing failure: Change system so tomcat is running but has a failure (in attachment step_2.log)
# crm resource status
 Resource Group: svc-cse
     ip_19 (ocf::heartbeat:IPaddr2): Stopped
     ip_11 (ocf::heartbeat:IPaddr2): Stopped
 Clone Set: cl_tomcat [d_tomcat]
     d_tomcat:0 (ocf::ntc:tomcat): Started (unmanaged) FAILED
     Stopped: [ d_tomcat:1 ]

=> Fixing failure: Revert system so tomcat is running without failure (in attachment step_3.log)
# crm resource status
 Resource Group: svc-cse
     ip_19 (ocf::heartbeat:IPaddr2): Stopped
     ip_11 (ocf::heartbeat:IPaddr2): Stopped
 Clone Set: cl_tomcat [d_tomcat]
     d_tomcat:0 (ocf::ntc:tomcat): Started (unmanaged) FAILED
     Stopped: [ d_tomcat:1 ]

As you can see in the logs, the OCF script no longer returns a failure. This is noticed by pacemaker, however it is not reflected in crm_mon and the dependent resources are not started.

Gr.
Johan

On 2013-05-03 03:04, Andrew Beekhof wrote:

On 02/05/2013, at 5:45 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:

On 2013-05-01 05:48, Andrew Beekhof wrote:

On 17/04/2013, at 9:54 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:

Hi All,

I'm trying to set up a specific configuration in our cluster, however I'm struggling with my configuration.

This is what I'm trying to achieve:
On both nodes of the cluster a daemon must be running (tomcat).
Some failover addresses are configured and must be running on the node with a correctly running tomcat.

I have achieved this with a cloned tomcat resource and a colocation between the cloned tomcat and the failover addresses.
When I cause a failure in the tomcat on the node running the failover addresses, the failover addresses will fail over to the other node as expected.
crm_mon shows that this tomcat has a failure.
When I configure the tomcat resource with failure-timeout=0, the failure alarm in crm_mon isn't cleared whenever the tomcat failure is fixed.

All sounds right so far.

If my broken tomcat is automatically fixed, I expect this to be noticed by pacemaker and that that node will be able to run my failover addresses, however I don't see this happening.

This is very hard to discuss without seeing logs.
So you created a tomcat error, waited for pacemaker to notice, fixed the error and observed that pacemaker did not re-notice?
How long did you wait? More than the 15s repeat interval I assume?
Did at least the resource agent notice?

When I configure the tomcat resource with failure-timeout=30, the failure alarm in crm_mon is cleared after 30 seconds, however the tomcat is still having a failure.

Can you define "still having a failure"?
You mean it still shows up in crm_mon?
Have you read this link?
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-rules-recheck.html

"Still having a failure" means that the tomcat is still broken and my OCF script reports it as a failure.

What I expect is that pacemaker reports the failure when the failure exists and for as long as it exists, and that pacemaker reports that everything is ok once everything is back ok.

Do I do something wrong with my configuration?
Or how can I achieve my wanted setup?

Here is my configuration:

node CSE-1
node CSE-2
primitive d_tomcat ocf:custom:tomcat \
    op monitor interval="15s" timeout="510s" on-fail="block" \
    op start interval="0" timeout="510s" \
    params instance_name="NMS" monitor_use_ssl="no" monitor_urls="/cse/health" monitor_timeout="120" \
    meta migration-threshold="1" failure-timeout="0"
primitive ip_1 ocf:heartbeat:IPaddr2 \
    op monitor interval="10s" \
    params nic="bond0" broadcast="10.1.1.1" iflabel="ha" ip="10.1.1.1"
primitive ip_2 ocf:heartbeat:IPaddr2 \
    op monitor interval="10s" \
    params nic="bond0" broadcast="10.1.1.2" iflabel="ha" ip="10.1.1.2"
group svc-cse ip_1 ip_2
clone cl_tomcat d_tomcat
colocation colo_tomcat inf: svc-cse cl_tomcat
order order_tomcat inf: cl_tomcat svc-cse
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="cman" \
    no-quorum-policy="ignore" \
    stonith-enabled="false"

Thanks!

Greetings,

Johan Huysmans

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

<step_2.log><step_3.log>

pe-input-1.bz2
Description: application/bzip