On Wed, 2017-08-16 at 15:20 +0200, Lentes, Bernd wrote:
> > Hi,
> >
> > What happened:
> > I tried to configure a simple drbd resource following
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
> > I used this simple snip from the doc:
> > configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
> >     op monitor interval=60s
> >
> > I did it on live cluster, which is in testing currently. I will never do
> > this again. Shadow will be my friend.
> >
> > The cluster reacted promptly:
> > crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params drbd_resource=idcc-devel \
> >    > op monitor interval=60
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than the advised 240
> > WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than the advised 100
> > WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it may not be supported by the RA
> >
> > From what i understand until now is that i didn't configure start/stop
> > operations, so the cluster chooses the default from default-action-timeout.
> > It didn't configure the monitor operation, because this is not in the
> > meta-data.
> >
> > The log says:
> > Aug 1 14:19:33 ha-idg-1 drbd(prim_drbd_idcc_devel)[11325]: ERROR: meta
> > parameter misconfigured, expected clone-max -le 2, but found unset.
> >                                   ^^^^^^^^^
> > Aug 1 14:19:33 ha-idg-1 crmd[4692]: notice: process_lrm_event: Operation prim_drbd_idcc_devel_monitor_0: not configured (node=ha-idg-1, call=73, rc=6, cib-update=37, confirmed=true)
> > Aug 1 14:19:33 ha-idg-1 crmd[4692]: notice: process_lrm_event: Operation prim_drbd_idcc_devel_stop_0: not configured (node=ha-idg-1, call=74, rc=6, cib-update=38, confirmed=true)
> >
> > crm_mon said:
> > Failed actions:
> >     prim_drbd_idcc_devel_stop_0 on ha-idg-1 'not configured' (6): call=6967, status=complete, exit-reason='none', last-rc-change='Tue Aug 1 14:28:33 2017', queued=0ms, exec=41ms
> >     prim_drbd_idcc_devel_monitor_60000 on ha-idg-1 'not configured' (6): call=6968, status=complete, exit-reason='none', last-rc-change='Tue Aug 1 14:28:33 2017', queued=0ms, exec=41ms
> >     prim_drbd_idcc_devel_stop_0 on ha-idg-2 'not configured' (6): call=6963, status=complete, exit-reason='none', last-rc-change='Tue Aug 1 14:28:33 2017', queued=0ms, exec=40ms
> >
> > A big problem was that i have a ClusterMon resource running on each node. It
> > triggered about 20000 snmp traps in 193 seconds to my management station,
> > which triggered 20000 e-Mails ...
> > From where comes this incredible amount of traps ? Nearly all traps said
> > that stop is not configured for the drbd resource. Why complaining so
> > often ? And why stopping after ~20.000 traps ?
> > And complaining about not configured monitor operation just 8 times.
>
> Ok. I configured the drbd resource wrong/incompletely, and that caused the
> trouble.
> What i would like to know:
> - from where does crm_mon retrieve its information ?

It uses the C API to be notified of stonith events and of changes to the
CIB (which holds all the cluster state), and additionally polls the state
every couple of seconds.

> - why did i get tons of lines in syslog ? One message that the resource isn't
> configured correctly/completely would be enough.
> I got thousands and thousands lines telling the same.

I can't tell from this information alone. Most commonly, a flood like that
is the result of start/stop being retried repeatedly, which happens when a
resource agent's start fails and migration-threshold is left at the
default (1,000,000).

However, "not configured" is a fatal error, so pacemaker wouldn't retry
that particular operation. It would log the message every time a new
operation was executed and returned that result, and every time it did a
policy engine run (until the error was cleaned up).
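
For what it's worth, the "expected clone-max -le 2, but found unset" error
suggests the agent expects to run inside a master/slave resource rather
than as a bare primitive, which is also how the Clusters from Scratch
chapter continues after the snippet you used. A minimal sketch of that,
built in a shadow CIB first. The names drbdtest and ms_drbd_idcc_devel are
only illustrative, the start/stop timeouts are the ones the crm shell
warnings advised, and the per-role monitor intervals are my assumption, so
check them against the agent's meta-data (crm ra info ocf:linbit:drbd):

```
# crm cib new drbdtest
crm(drbdtest)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd \
    params drbd_resource=idcc-devel \
    op start timeout=240 op stop timeout=100 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
crm(drbdtest)# configure ms ms_drbd_idcc_devel prim_drbd_idcc_devel \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(drbdtest)# configure show
crm(drbdtest)# cib commit drbdtest
```

Nothing reaches the live cluster until the commit. Failures already
recorded for the old primitive stay in the status section until they are
cleared, for example with "crm resource cleanup prim_drbd_idcc_devel".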
