Re: [ClusterLabs] Pacemaker not invoking monitor after $interval

2016-05-20 Thread Felix Zachlod (Lists)
> -Original Message-
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: Friday, 20 May 2016 13:52
> To: Felix Zachlod (Lists) <fz.li...@onesty-tech.de>
> Cc: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker not invoking monitor after 
> $interval
> 
> On Fri, 20 May 2016 11:33:39 +,
> "Felix Zachlod (Lists)" <fz.li...@onesty-tech.de> wrote:
> 
> > Hello!
> >
> > I am currently working on a cluster setup which includes several 
> > resources with "monitor interval=XXs" set. As far as I understand 
> > this should run the monitor action on the resource agent every XX 
> > seconds. But it seems it doesn't.
> 
> How do you know it doesn't? Are you looking at crm_mon? log files?

I added debug output to my RA. Furthermore, I took a blackbox dump.
It now turns out that for my resource I had to change the meta-data to advertise the 

monitor action twice (once for the Slave role, once for the Master role) and to configure 

op monitor role=x interval=y instead of op monitor interval=x

Since I changed that, monitor is working as desired, at least for this resource 
and at least for now. I am not sure why a Master/Slave resource has to 
advertise distinct monitor actions for both roles, but it seems related to that.

I still don't see any monitor invocations in the log, but there seems to be 
something wrong with the log level.
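For reference, the meta-data change described above amounts to advertising the monitor action once per role in the RA's `<actions>` section. A sketch (interval and timeout values are illustrative, not from the original agent):

```xml
<actions>
  <action name="monitor" role="Slave"  interval="20s" timeout="30s" depth="0"/>
  <action name="monitor" role="Master" interval="10s" timeout="30s" depth="0"/>
</actions>
```

Note that the two role-specific monitor operations should use different intervals, since Pacemaker distinguishes operations by name and interval; the CIB side then needs matching `op monitor role=Master interval=...` and `op monitor role=Slave interval=...` entries.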

Thanks anyway!

regards, Felix
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker not invoking monitor after $interval

2016-05-20 Thread Felix Zachlod (Lists)
Hello!

I am currently working on a cluster setup that includes several resources with 
"monitor interval=XXs" set. As far as I understand, this should run the monitor 
action on the resource agent every XX seconds. But it seems it doesn't. 
Actually, monitor is only invoked under special conditions, e.g. cleanup, start 
and so on, but never for a running (or stopped) resource. So it won't detect 
any resource failures unless a manual action takes place. Nor will it update 
the master preference when that is set in the monitor action.

Are there any special conditions under which monitor will not be executed? 
(The cluster IS managed, though.)
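For context, a master/slave RA typically updates the master preference from its monitor action via crm_master (the crm_attribute wrapper shipped with resource-agents). A sketch, with crm_master stubbed so it runs standalone, and instance_is_master as a hypothetical helper:

```shell
#!/bin/sh
# Sketch: a monitor action that reports the instance's role and updates
# the master preference. crm_master is stubbed here; the real tool sets
# a node attribute the policy engine uses when choosing whom to promote.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8

crm_master() { echo "crm_master $*"; }        # stub for the real cluster tool
instance_is_master() { [ "$1" = master ]; }   # hypothetical state check

monitor() {
    if instance_is_master "$1"; then
        crm_master -l reboot -v 100           # prefer this node for Master
        return "$OCF_RUNNING_MASTER"
    fi
    crm_master -l reboot -v 10
    return "$OCF_SUCCESS"
}

monitor master; echo "master rc=$?"
monitor slave;  echo "slave rc=$?"
```

Because Pacemaker only sees these preference updates when monitor actually runs, a monitor that is never invoked also means the master scores never change, matching the symptom above.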

property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.13-10.el7_2.2-44eb2dd \
cluster-infrastructure=corosync \
cluster-name=sancluster \
maintenance-mode=false \
symmetric-cluster=false \
last-lrm-refresh=1463739404 \
stonith-enabled=true \
stonith-action=reboot
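A monitor operation of the kind described above would be configured roughly like this in crm shell syntax (resource name, agent, and values are illustrative):

```
primitive p_example ocf:pacemaker:Dummy \
    op monitor interval=30s timeout=20s
ms ms_example p_example \
    meta master-max=1 clone-max=2 notify=true
```

With such a configuration, one would expect the lrmd to log a recurring monitor operation for p_example every 30 seconds on each node where it runs.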

Thank you in advance, regards, Felix

--
Kind regards
Dipl. Inf. (FH) Felix Zachlod

Onesty Tech GmbH
Lieberoser Str. 7
03046 Cottbus

Tel.: +49 (355) 289430
Fax: +49 (355) 28943100
f...@onesty-tech.de

Registry court: Amtsgericht Cottbus, HRB 7885; Managing directors: Romy Schötz, 
Thomas Menzel





[ClusterLabs] Pacemaker reload Master/Slave resource

2016-05-20 Thread Felix Zachlod (Lists)
version 1.1.13-10.el7_2.2-44eb2dd

Hello!

I am currently developing a master/slave resource agent. So far it is working 
just fine, but the agent implements reload(), and reload does not work as 
expected when the instance is running as Master:
the reload action is invoked and succeeds, returning 0. The resource is still 
Master, and monitor returns $OCF_RUNNING_MASTER.

But Pacemaker considers the instance to be a Slave afterwards. Only reload is 
invoked; no monitor, no demote, etc.

I first thought that reload should perhaps return $OCF_RUNNING_MASTER too, but 
that makes the resource fail on reload. It seems 0 is the only valid 
return code.
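A minimal sketch of the reload behaviour described above, in OCF shell style; reread_config is a hypothetical placeholder for the agent's actual reload work:

```shell
#!/bin/sh
# Minimal sketch of an OCF-style reload action for a master/slave RA.
# Per the observation above, reload must return 0 (OCF_SUCCESS) even
# while the instance runs as Master; returning OCF_RUNNING_MASTER makes
# the reload operation fail.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

reread_config() {
    # Hypothetical placeholder: re-read the agent's configuration
    # without a full stop/start (and without a demote).
    true
}

agent_reload() {
    reread_config || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

agent_reload
echo "reload rc=$?"
```

The problem reported here is that even though this returns 0 and the resource stays Master, Pacemaker records the instance as a Slave after the reload.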

I can recover the cluster state by running resource $resourcename promote, 
which calls

notify
promote
notify

Afterwards my resource is considered Master again. Alternatively, once the 
PEngine Recheck Timer (I_PE_CALC) pops (90ms), the cluster manager will 
promote the resource itself.
But this can lead to unexpected results: it could promote the resource on the 
wrong node, so that both sides are actually running as Master, and the cluster 
will not even notice, since it does not call monitor either.

Is this a bug?

regards, Felix


trace   May 20 12:58:31 cib_create_op(609):0: Sending call options: 0010, 
1048576
trace   May 20 12:58:31 cib_native_perform_op_delegate(384):0: Sending 
cib_modify message to CIB service (timeout=120s)
trace   May 20 12:58:31 crm_ipc_send(1175):0: Sending from client: cib_shm 
request id: 745 bytes: 1070 timeout:12 msg...
trace   May 20 12:58:31 crm_ipc_send(1188):0: Message sent, not waiting for 
reply to 745 from cib_shm to 1070 bytes...
trace   May 20 12:58:31 cib_native_perform_op_delegate(395):0: Reply: No data 
to dump as XML
trace   May 20 12:58:31 cib_native_perform_op_delegate(398):0: Async call, 
returning 268
trace   May 20 12:58:31 do_update_resource(2274):0: Sent resource state update 
message: 268 for reload=0 on scst_dg_ssd
trace   May 20 12:58:31 cib_client_register_callback_full(606):0: Adding 
callback cib_rsc_callback for call 268
trace   May 20 12:58:31 process_lrm_event(2374):0: Op scst_dg_ssd_reload_0 
(call=449, stop-id=scst_dg_ssd:449, remaining=3): Confirmed
notice  May 20 12:58:31 process_lrm_event(2392):0: Operation 
scst_dg_ssd_reload_0: ok (node=alpha, call=449, rc=0, cib-update=268, 
confirmed=true)
debug   May 20 12:58:31 update_history_cache(196):0: Updating history for 
'scst_dg_ssd' with reload op
trace   May 20 12:58:31 crm_ipc_read(992):0: No message from lrmd received: 
Resource temporarily unavailable
trace   May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition from 
lrmd[0x22b0ec0] failed: No message of desired type (-42)
trace   May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
trace   May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: 
C_FSA_INTERNAL   State: S_NOT_DC
trace   May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
trace   May 20 12:58:31 crm_fsa_trigger(295):0: Exited  (queue len: 0)
trace   May 20 12:58:31 crm_ipc_read(989):0: Received cib_shm event 2108, 
size=183, rc=183, text: ...