----- On Aug 7, 2017, at 10:43 PM, kgaillot kgail...@redhat.com wrote:

> The logs are very useful, but not particularly easy to follow. It takes
> some practice and experience, but I think it's worth it if you have to
> troubleshoot cluster events often.
I will do my best.

> It's on the to-do list to create a "Troubleshooting Pacemaker" document
> that helps with this and using tools such as crm_simulate.
>
> The first step in understanding the logs is to learn what the pacemaker
> daemons are and what they do, and what the DC node is. It starts to make
> more sense from there:
>
> pacemakerd: spawns all other daemons and re-spawns them if they crash
> attrd: manages node attributes
> cib: manages reading/writing the configuration
> lrmd: executes resource agents
> pengine: given a cluster state, determines any actions needed
> crmd: manages cluster membership and carries out the pengine's
> decisions by asking the lrmd to perform actions
>
> At any given time, one node's crmd in the cluster (or partition if there
> is a network split) is elected as the DC (designated controller). The DC
> asks the pengine what needs to be done, then farms out the results to
> all the other crmd's, which (if necessary) call their local lrmd to
> actually execute the actions.

That's very helpful. Thanks.

Bernd
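
For anyone reading along, the daemon roles quoted above can be kept at hand as a small lookup table while going through the logs. The following is only a minimal sketch in Python: the role descriptions are taken from the quoted message, while the helper name `annotate`, the whole-word matching approach, and the sample line are assumptions made for illustration. The sample line is made up and is not real Pacemaker log output.

```python
import re

# Roles as described in the quoted message; nothing here queries Pacemaker.
DAEMON_ROLES = {
    "pacemakerd": "spawns all other daemons and re-spawns them if they crash",
    "attrd": "manages node attributes",
    "cib": "manages reading/writing the configuration",
    "lrmd": "executes resource agents",
    "pengine": "given a cluster state, determines any actions needed",
    "crmd": "manages cluster membership and carries out the pengine's decisions",
}

# Match a daemon name as a whole word anywhere in the line.
# Longest names are listed first in the alternation.
_DAEMON_RE = re.compile(
    r"\b(" + "|".join(sorted(DAEMON_ROLES, key=len, reverse=True)) + r")\b"
)


def annotate(line):
    """Return (daemon, role) for the first daemon named in line, else None.

    Hypothetical helper: it only looks for the daemon name as a word and
    makes no assumption about the rest of the log line layout.
    """
    match = _DAEMON_RE.search(line)
    if match is None:
        return None
    name = match.group(1)
    return name, DAEMON_ROLES[name]


if __name__ == "__main__":
    # Made-up line for illustration only, NOT real Pacemaker output.
    sample = "node1 crmd: example message"
    print(annotate(sample))
```

Running a log file through such a helper line by line would show which daemon each message comes from, which is the first step suggested above for making sense of the logs.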