On 12/05/2016 09:30 AM, Darko Gavrilovic wrote:
> On 12/5/2016 10:17 AM, Ken Gaillot wrote:
>> On 12/03/2016 05:19 AM, Darko Gavrilovic wrote:
>>> Here is the output for that resource.. edited
>>>
>>> primitive svc-mysql ocf:heartbeat:mysql \
>>>     params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf"
>>>       datadir="/var/lib/mysql" user="mysql" group="mysql"
>>>       log="/var/log/mysqld.log" pid="/var/run/mysqld/mysqld.pid"
>>>       socket="/var/lib/mysql/mysql.sock" test_table="***" test_user="***"
>>>       test_passwd="****" \
>>>     op monitor interval="30s" timeout="60s" OCF_CHECK_LEVEL="5" \
>>>     op start interval="0" timeout="120s" \
>>>     op stop interval="0" timeout="120s" \
>>>     meta target-role="Started" migration-threshold="2"
>>>
>>> ...skipping
>>>
>>> order mysql-before-httpd inf: svc-mysql:start svc-httpd:start
>>> order mysql-before-ssh inf: svc-mysql:start svc-ssh:start
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.0.6-f709c638237cdff7556cb6ab615f32826c0f8c06" \
>>>     cluster-infrastructure="openais" \
>>>     expected-quorum-votes="2" \
>>>     last-lrm-refresh="1480762389" \
>>>     no-quorum-policy="ignore" \
>>>     stonith-enabled="true" \
>>>     ms-drbd0="Master"
>>>
>>> dg
>>>
>>> On 12/3/2016 1:25 AM, Kostiantyn Ponomarenko wrote:
>>>> I assume that you are using crmsh.
>>>> If so, I suggest posting the output of "crm configure show" here.
>>>>
>>>> Thank you,
>>>> Kostia
>>>>
>>>> On Sat, Dec 3, 2016 at 5:54 AM, Darko Gavrilovic
>>>> <da...@chass.utoronto.ca> wrote:
>>>>
>>>> Hello, I have a two-node cluster that seems to be failing to
>>>> start resources.
>>>>
>>>> Resource Group: services
>>>>     svc-mysql      (ocf::heartbeat:mysql)    Stopped
>>>>     svc-httpd      (ocf::heartbeat:apache)   Stopped
>>>>     svc-ssh        (lsb:sshd-virt)           Stopped
>>>>     svc-tomcat6    (lsb:tomcat6)             Stopped
>>>>     svc-plone      (lsb:plone)               Stopped
>>>>     svc-bacula     (lsb:bacula-fd-virt)      Stopped
>>>>
>>>> When I run "crm resource start services", the service group does not
>>>> start.
>>>>
>>>> I also tried starting the first resource in the group:
>>>>
>>>>     crm resource start svc-mysql
>>>>
>>>> It does not start either.
>>>>
>>>> The error I am seeing is:
>>>>
>>>> Dec 2 21:59:43 pengine: [25829]: WARN: native_color: Resource
>>>> svc-mysql cannot run anywhere
>>>> Dec 2 22:00:26 pengine: [25829]: WARN: native_color: Resource
>>>> svc-mysql cannot run anywhere
>>
>> The most common reasons for the above message are:
>>
>> * Location or order constraints don't leave any place for the resource
>>   to run
>>
>> * The resource has failed the maximum number of times on all nodes.
>>   (Does "crm_mon" show any failures?)
>
> crm_mon does not list any failures for this service group from what I
> can see.
>
>>>> 4b4f-a239-8a10dad40587, cib=0.3857.2) : Resource op removal
>>>> Dec 2 21:59:32 server1 crmd: [25830]: info: te_rsc_command:
> <snip>
>>>> Initiating action 56: monitor svc-mysql_monitor_0 on server2
>>>> Dec 2 21:59:33 server1 crmd: [25830]: WARN: status_from_rc: Action
>>>> 56 (svc-mysql_monitor_0) on server2 failed (target: 7 vs. rc: 0):
>>>> Error
>>
>> The above error indicates that mysql is running on server2 but the
>> cluster didn't start it there. (The "_monitor_0" operation is called a
>> "probe"; it's used to check the status of the service before the
>> cluster starts it. The "target: 7" means the cluster expects the
>> service to be stopped; the "rc: 0" means the service is actually
>> running.)
>>
>> Make sure you're not starting mysql at boot or by any other means than
>> the cluster.
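One quick way to rule that out is to ask the init system whether mysqld
is enabled at boot. A minimal sketch, assuming a SysV-init distribution
with chkconfig (which the mysqld_safe/openais vintage suggests; the
service name may differ on your machines):

    # Show the runlevels in which init itself would start mysqld
    chkconfig --list mysqld

    # If any runlevel says "on", disable it so only the cluster manages mysql
    chkconfig mysqld off

    # Confirm nothing outside the cluster has already started it
    service mysqld status

The same check is worth running on both nodes, and for the other
cluster-managed services in the group (httpd, sshd-virt, tomcat6, and
so on).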
> Thanks. Is there a way or command for me to check which server the
> cluster thinks the service was last started on?
crm_mon shows what the cluster thinks the current state is, including
which resources are started where.

For way more (user-unfriendly) detail, you can look at the operation
history in the CIB XML, which you can see with (for example) "cibadmin
-Q". The <status> section will have a <node_state> entry for each node,
with an <lrm_resource> entry for each resource, with <lrm_rsc_op>
entries for all failed operations and the most recent successful
operation. So you can see all the *_start_0 operations, along with a
whole bunch of information including rc-code, which is what the
operation returned (0 is success).

>>>> Dec 2 21:59:33 server1 crmd: [25830]: info: abort_transition_graph:
>>>> match_graph_event:272 - Triggered transition abort (complete=0,
>>>> tag=lrm_rsc_op, id=svc-mysql_monitor_0,
>>>> magic=0:0;56:15:7:aee06ee3-9576-4b4f-a239-8a10dad40587,
>>>> cib=0.3859.2) : Event failed
>>>> Dec 2 21:59:33 server1 crmd: [25830]: info: match_graph_event:
>>>> Action svc-mysql_monitor_0 (56) confirmed on server2 (rc=4)
>>>> Dec 2 21:59:33 server1 crmd: [25830]: info: te_rsc_command:
>>>> Initiating action 187: stop svc-mysql_stop_0 on server2
>>>> Dec 2 21:59:35 server1 crmd: [25830]: info: match_graph_event:
>>>> Action svc-mysql_stop_0 (187) confirmed on server2 (rc=0)
>>>> Dec 2 22:10:20 server1 crmd: [19708]: info: do_lrm_rsc_op:
>>>> Performing key=101:1:7:6e477ca6-4ffe-4e89-82c2-c6149d528128
>>>> op=svc-mysql_monitor_0 )
>>>> Dec 2 22:10:20 server1 crmd: [19708]: info: process_lrm_event: LRM
>>>> operation svc-mysql_monitor_0 (call=51, rc=7, cib-update=42,
>>>> confirmed=true) not running
>>>> Dec 2 22:12:24 server1 crmd: [19708]: info: te_rsc_command:
>>>> Initiating action 102: monitor svc-mysql_monitor_0 on server2
>>>> Dec 2 22:12:24 server1 crmd: [19708]: info: match_graph_event:
>>>> Action svc-mysql_monitor_0 (102) confirmed on server2 (rc=0)
>>>>
>>>> Any advice on how to tackle this?
>>>>
>>>> dg
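To make the cibadmin approach above concrete, here is a sketch using
standard Pacemaker tools (the XML below is abbreviated and illustrative,
not copied from this cluster):

    # One-shot snapshot of what the cluster currently thinks is running where
    crm_mon -1

    # Dump the live CIB to a file for inspection
    cibadmin -Q > /tmp/cib.xml

    # Recorded start operations for svc-mysql have ids like "svc-mysql_start_0";
    # the enclosing <node_state> shows which node each entry belongs to
    grep -n 'svc-mysql_start_0' /tmp/cib.xml

A matching entry looks roughly like this, with rc-code="0" meaning the
last recorded start of svc-mysql on that node succeeded:

    <node_state id="server1" uname="server1" ...>
      ...
      <lrm_rsc_op id="svc-mysql_start_0" operation="start" rc-code="0" .../>
      ...
    </node_state>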