On Wed, 2018-08-01 at 14:47 -0600, Casey & Gina wrote:
> Actually, is it even necessary at all?  Based on my other E-mail to
> the list (Fence agent ends up stopped with no clear reason why), it
> seems that sometimes the monitor fails with an "unknown error",
> resulting in a cluster that won't fail over due to inability to
> fence.  I tried looking at the fence agent to determine which API
> calls are being executed, but I can't figure that out myself... in
> any case I don't see how this is offering any real value... happy to
> learn how I might be wrong, though...

A failed monitor (or start) shouldn't prevent the cluster from using
the device for fencing. If actual fence actions are failing, that
should show up separately from the fence resource failures.

> > On 2018-08-01, at 2:26 PM, Casey & Gina <[email protected]>
> > wrote:
> >
> > How is the interval adjusted?  Based on an example I found online,
> > I thought `pcs resource op monitor interval=15m vmware_fence`
> > should work, but after executing that `pcs config` still shows a
> > monitor interval of 60s.

A resource can have more than one monitor, so that command by itself
just adds a second monitor. You have to delete the original one
separately with pcs resource op remove.
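Something like this should do the swap (a sketch using the resource
and operation names from the config quoted below; exact syntax can
vary a bit by pcs version):

  # drop the default 60s monitor, then add the longer one
  pcs resource op remove vmware_fence monitor interval=60s
  pcs resource op add vmware_fence monitor interval=15m

After that, pcs config should show only the 15m monitor.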
> > Thank you,
> > --
> > Casey
> >
> > > On 2018-07-31, at 9:11 AM, Casey Allen Shobe
> > > <[email protected]> wrote:
> > >
> > > Aha, thank you!  I missed the blatantly obvious.  I will discuss
> > > with my colleague and likely use a longer interval.
> > >
> > > > On Jul 30, 2018, at 11:25 PM, Klaus Wenninger
> > > > <[email protected]> wrote:
> > > >
> > > > > On 07/31/2018 01:47 AM, Casey & Gina wrote:
> > > > > I've set up a number of clusters in a VMware environment,
> > > > > and am using the fence_vmware_rest agent for fencing (from
> > > > > fence-agents 4.2.1), as follows:
> > > > >
> > > > > Stonith Devices:
> > > > >  Resource: vmware_fence (class=stonith type=fence_vmware_rest)
> > > > >   Attributes: ip=<host> username=<username> password=<password>
> > > > >    ssl_insecure=1 pcmk_host_check=static-list
> > > > >    pcmk_host_list=b-gp2-dbpg35-1;b-gp2-dbpg35-2;b-gp2-dbpg35-3
> > > > >   Operations: monitor interval=60s
> > > > >    (vmware_fence-monitor-interval-60s)
> > > > >
> > > > > We are using a dedicated service account on the VMware side
> > > > > for pacemaker.
> > > > >
> > > > > The clusters are running fine, and no failover events have
> > > > > happened recently.  However, our VMware admin came to me
> > > > > asking why the pacemaker service account is logging in and
> > > > > executing API calls very frequently (for an environment where
> > > > > there are 3 clusters, 9 nodes total, he is seeing ~1400 API
> > > >
> > > > Haven't looked at the internals of fence_vmware_rest, but that
> > > > sounds like 2-3 API calls per monitor (or around 10 API calls
> > > > if there is just one monitored instance per cluster, which is
> > > > what the config snippet above looks like).
> > > > Have you tried increasing the 60s monitoring interval?
> > > >
> > > > Klaus
> > > >
> > > > > calls per hour as this user).  I do not see anything logged
> > > > > in corosync.log about why this would be, and my limited
> > > > > understanding was that the fence agent would only be calling
> > > > > the power-off and reboot APIs when pacemaker couldn't get a
> > > > > response from a node in the cluster.  I thought that using a
> > > > > static-list for the host_check would prevent any API calls
> > > > > for getting a list of hosts, although even if that were going
> > > > > on I would think it would be a rare event.  His concern is
> > > > > that this amount of load on the VMware hosts isn't
> > > > > sustainable.
> > > > >
> > > > > Unfortunately the logging available from VMware doesn't give
> > > > > a lot of information - it just says the number of API calls,
> > > > > not which API(s) were called.
> > > > >
> > > > > Any ideas what might be going on?  Is there a way to get
> > > > > increased logging for the fence agent?
> > > > >
> > > > > Thanks in advance,
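On the logging question: fence agents generally take a verbose
option, so one way to see what a single monitor does is to run the
agent by hand with the same parameters as the stonith resource (a
sketch, untested against your environment):

  # run one monitor action with verbose output
  fence_vmware_rest --ip=<host> --username=<username> \
      --password=<password> --ssl-insecure \
      --action=monitor --verbose

Setting verbose=1 as an attribute on the stonith resource should
likewise get the agent's debug output into the cluster logs.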
--
Ken Gaillot <[email protected]>
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
