Re: [ClusterLabs] big trouble with a DRBD resource

Ken Gaillot Wed, 16 Aug 2017 07:34:54 -0700

On Wed, 2017-08-16 at 15:20 +0200, Lentes, Bernd wrote:
> > Hi,
> >
> > What happened:
> > I tried to configure a simple drbd resource following
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
> > I used this simple snip from the doc:
> > configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
> >    op monitor interval=60s
> > 
> > I did it on the live cluster, which is currently in testing. I will never
> > do this again. A shadow CIB will be my friend.
> > 
> > The cluster reacted promptly:
> > crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params drbd_resource=idcc-devel \
> >   > op monitor interval=60
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than the advised 240
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than the advised 100
> > WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it may not be supported by the RA
> > 
> > From what I understand so far, I didn't configure start/stop operations,
> > so the cluster uses the default from default-action-timeout. It didn't
> > configure the monitor operation, because it is not in the meta-data.
> >
> > The log says:
> > Aug  1 14:19:33 ha-idg-1 drbd(prim_drbd_idcc_devel)[11325]: ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.
> > Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation prim_drbd_idcc_devel_monitor_0: not configured (node=ha-idg-1, call=73, rc=6, cib-update=37, confirmed=true)
> > Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation prim_drbd_idcc_devel_stop_0: not configured (node=ha-idg-1, call=74, rc=6, cib-update=38, confirmed=true)
> > 
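The agent's error above ("expected clone-max -le 2, but found unset")
suggests the primitive was defined without the master/slave wrapper that the
linked Clusters from Scratch chapter adds as its next step. Below is a minimal
sketch of that pairing, staged in a shadow CIB instead of the live cluster.
The helper name drbd_ms_config, the ms_ prefix and the shadow CIB name
drbd-test are hypothetical; the meta values are the ones that guide uses for
two nodes. The crmsh calls in the usage comment (cib new, -c, configure load
update, cib commit) are assumed to behave as documented for your crmsh
version.

```shell
# Print crmsh configuration for a DRBD primitive plus the master/slave (ms)
# resource that supplies clone-max. All names are chosen by the caller.
drbd_ms_config() {
    rsc=$1        # primitive id, e.g. prim_drbd_idcc_devel
    drbd_res=$2   # DRBD resource name, e.g. idcc-devel
    cat <<EOF
primitive $rsc ocf:linbit:drbd params drbd_resource=$drbd_res op monitor interval=60s
ms ms_$rsc $rsc meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
EOF
}

# Usage on a cluster node (assumes crmsh; "drbd-test" is a hypothetical name):
#   drbd_ms_config prim_drbd_idcc_devel idcc-devel > /tmp/drbd.crm
#   crm cib new drbd-test
#   crm -c drbd-test configure load update /tmp/drbd.crm
#   crm cib commit drbd-test
```

Nothing reaches the running cluster until the commit step, so warnings and
errors can be reviewed first.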
> 
> > 
> > crm_mon said:
> > Failed actions:
> >    prim_drbd_idcc_devel_stop_0 on ha-idg-1 'not configured' (6): call=6967,
> >    status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 2017',
> >    queued=0ms, exec=41ms
> >    prim_drbd_idcc_devel_monitor_60000 on ha-idg-1 'not configured' (6): call=6968,
> >    status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 2017',
> >    queued=0ms, exec=41ms
> >    prim_drbd_idcc_devel_stop_0 on ha-idg-2 'not configured' (6): call=6963,
> >    status=complete, exit-reason='none', last-rc-change='Tue Aug  1 14:28:33 2017',
> >    queued=0ms, exec=40ms
> > 
> > A big problem was that I have a ClusterMon resource running on each node.
> > It sent about 20000 SNMP traps in 193 seconds to my management station,
> > which triggered 20000 e-mails ...
> > Where does this incredible number of traps come from? Nearly all traps
> > said that stop is not configured for the drbd resource. Why complain so
> > often? And why did it stop after ~20,000 traps?
> > It complained about the unconfigured monitor operation just 8 times.
> 
> Ok. I configured the drbd resource wrongly/incompletely, and that caused
> the trouble.
> What I would like to know:
> - Where does crm_mon retrieve its information from?


It uses the C API to be notified of CIB changes (the CIB holds all the
cluster state) and of stonith events, and additionally polls the state
every couple of seconds.
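
Because the CIB status section carries that state, the failures crm_mon
displays can also be pulled from it directly. This is a rough sketch, assuming
cibadmin is available on the node and that each lrm_rsc_op element sits on
one line with id and rc-code attributes, as in typical Pacemaker 1.1 output.
The function name is made up for illustration.

```shell
# List the ids of operations recorded in the CIB status section whose return
# code is 6 ("not configured"). Reads CIB XML on stdin.
failed_not_configured() {
    # Keep only lrm_rsc_op elements with rc-code="6", then extract their id.
    grep -o '<lrm_rsc_op [^>]*rc-code="6"[^>]*>' |
        sed -n 's/.* id="\([^"]*\)".*/\1/p'
}

# Usage on a cluster node:
#   cibadmin -Q -o status | failed_not_configured
```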

> - Why did I get tons of lines in syslog? One message that the resource
> isn't configured correctly/completely would be enough.
> I got thousands and thousands of lines saying the same thing.

I can't tell from this information alone. Most commonly, a flood like
this happens when a resource agent's start fails and migration-threshold
is left at the default (1,000,000), so start/stop is retried repeatedly.
However, "not configured" is a fatal error, so pacemaker wouldn't retry
that particular operation. It would log the message every time a new
operation was executed and returned that result, and every time it did a
policy engine run (until the error was cleaned up).
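
To see which operations account for the repeated lines, the syslog entries
can be collapsed per operation. This is a small sketch, assuming the crmd
message format shown in the quoted log with one event per line;
count_not_configured is a hypothetical helper name and the log path in the
usage comment is an assumption.

```shell
# Count "not configured" results per operation from crmd syslog lines on stdin.
# Output: "<count> <operation>" per line, most frequent first.
count_not_configured() {
    sed -n 's/.*process_lrm_event: Operation \([^:]*\): not configured.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage (adjust the log path for your distribution):
#   count_not_configured < /var/log/messages
```

Once the configuration is fixed, "crm resource cleanup prim_drbd_idcc_devel"
clears the recorded failure so it is no longer reported.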
 
> 
> Bernd

