Re: [ClusterLabs] Doing reload right
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time. Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node. It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
>
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
>
>
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance, by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* on the desired node.

The problem (for your use case) is that is-managed is cluster-wide for
the given resource. I suspect coming up with a per-node
interface/implementation for is-managed would be difficult.

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.
FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
>
> Cheers,
> Adam

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
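The manual restart described in the reply above (unmanage, disable monitors, then crm_resource --force-*) might look roughly like the following. This is only a dry-run sketch that prints the commands rather than running them; "my-clone-rsc" is a made-up resource name, and the monitor-disabling step is left as a comment because how to do it depends on the tooling in use.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# "my-clone-rsc" is a hypothetical resource name, not one from this thread.

restart_plan() {
    rsc="$1"
    # 1. Stop the cluster from managing the resource (this is cluster-wide).
    echo "crm_resource --resource $rsc --meta --set-parameter is-managed --parameter-value false"
    # 2. Disable any recurring monitors here (tool-specific, so not shown).
    # 3. On the node in question, stop and start the local instance directly.
    echo "crm_resource --resource $rsc --force-stop"
    echo "crm_resource --resource $rsc --force-start"
    # 4. Hand control back by removing the is-managed override
    #    (and re-enable the monitors disabled in step 2).
    echo "crm_resource --resource $rsc --meta --delete-parameter is-managed"
}

restart_plan my-clone-rsc
```

Because is-managed applies to the whole resource, every clone instance is unmanaged between steps 1 and 4, not just the one being restarted, which is the limitation pointed out above.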
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
Actually, according to http://linux-iscsi.org/wiki/Lio-utils, lio-utils has been deprecated and replaced by targetcli.

--

[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path

On 7/20/16, 12:09 PM, "Andrei Borzenkov" wrote:

20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling getting a HA iSCSI Target cluster in place for literally weeks. I cannot, for whatever reason, get pacemaker to create an iSCSILogicalUnit resource. The error message that I’m seeing leads me to believe that I’m missing something on the systems (“tcm_node”). Here are my setup commands leading up to seeing this error message:
>
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
>
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s
>
>
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages, but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
>
> This is with the following installed:
>
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
>
> Please please please…any ideas are appreciated. I’ve exhausted all avenues of investigation at this point and don’t know what to do. Thank you!
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
On Wed, Jul 20, 2016 at 10:09 AM, Andrei Borzenkov wrote:
> tcm_node is part of lio-utils. I am not familiar with RedHat packages,
> but I presume that searching for "lio" should reveal something.
>

I checked on both Fedora and CentOS, and there is no such package and no
package provides a file called "tcm_node". I also looked at rpmfind.net
and the only RPMs I found are for various versions of OpenSUSE.

Looks like something slipped in that is SuSE-specific.

--Greg
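A quick way to see which of the two tools mentioned in this thread is actually present on a node is to probe PATH for both. The sketch below does only that; the helper name have_cmd is made up for illustration.

```shell
#!/bin/sh
# Report which LIO management tools are on PATH on this node.
# have_cmd is a hypothetical helper name, not part of any package.

have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in tcm_node targetcli; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If only targetcli turns up, it may be worth checking whether the installed iSCSILogicalUnit agent offers an implementation that uses it. Later resource-agents releases are believed to list one named "lio-t", but that is an assumption to verify with "pcs resource describe ocf:heartbeat:iSCSILogicalUnit" rather than something confirmed in this thread.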
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling getting a HA iSCSI Target cluster in place for
> literally weeks. I cannot, for whatever reason, get pacemaker to create an
> iSCSILogicalUnit resource. The error message that I’m seeing leads me to
> believe that I’m missing something on the systems (“tcm_node”). Here are my
> setup commands leading up to seeing this error message:
>
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget
> iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
>
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit
> target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0"
> path=/dev/drbd1 implementation="lio" op monitor interval=15s
>
>
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321,
> status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages,
but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
>
> This is with the following installed:
>
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
>
> Please please please…any ideas are appreciated. I’ve exhausted all avenues of
> investigation at this point and don’t know what to do. Thank you!
[ClusterLabs] Setup problem: couldn't find command: tcm_node
I have been struggling getting a HA iSCSI Target cluster in place for literally weeks. I cannot, for whatever reason, get pacemaker to create an iSCSILogicalUnit resource. The error message that I’m seeing leads me to believe that I’m missing something on the systems (“tcm_node”). Here are my setup commands leading up to seeing this error message:

# pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

# pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s

Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, status=complete, exitreason='Setup problem: couldn't find command: tcm_node',
    last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms

This is with the following installed:

pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
corosynclib-2.3.4-7.el7.x86_64
corosync-2.3.4-7.el7.x86_64

Please please please…any ideas are appreciated. I’ve exhausted all avenues of investigation at this point and don’t know what to do. Thank you!

--

[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path
Re: [ClusterLabs] Pacemaker not always selecting the right stonith device
On 07/19/2016 06:54 PM, Andrei Borzenkov wrote:
> 19.07.2016 19:01, Andrei Borzenkov wrote:
>> 19.07.2016 18:24, Klaus Wenninger wrote:
>>> On 07/19/2016 04:17 PM, Ken Gaillot wrote:
>>>> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote:
>>>>> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot wrote:
>>>>> ...
>>>>>>> primitive p_ston_pg1 stonith:external/ipmi \
>>>>>>> params hostname=pg1 ipaddr=10.148.128.35 userid=root
>>>>>>> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>>>>>>> passwd_method=file interface=lan priv=OPERATOR
>>>>>>>
>>>>> ...
>>>>>> These constraints prevent each device from running on its intended
>>>>>> target, but they don't limit which nodes each device can fence. For
>>>>>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>>>>>> example:
>>>>>>
>>>>>>    primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>>>>>
>>>>>> Use pcmk_host_list if the fence device needs the node name as known to
>>>>>> the cluster, and pcmk_host_map if you need to translate a node name to
>>>>>> an address the device understands.
>>>>>>
>>>>> Is not pacemaker expected by default to query stonith agent instance
>>>>> (sorry I do not know proper name for it) for a list of hosts it can
>>>>> manage? And external/ipmi should return value of "hostname" parameter
>>>>> here? So the question is why it does not work?
>>>> You're right -- if not told otherwise, Pacemaker will query the device
>>>> for the target list. In this case, the output of "stonith_admin -l"
>>>> suggests it's not returning the desired information. I'm not familiar
>>>> with the external agents, so I don't know why that would be. I
>>>> mistakenly assumed it worked similarly to fence_ipmilan ...
>>> guess it worked at the times when pacemaker did fencing via
>>> cluster-glue-code...
>>> A grep for "gethosts" doesn't return much for current pacemaker-sources
>>> apart
>>> from some leftovers in cts.
>> Oh oh ... this sounds like a bug, no?
>>
> Apparently of all cluster-glue agents only ec2 supports both old and new
> variants
>
> gethosts|hostlist|list)
> # List of names we know about
>
> all others use gethosts. Not sure whether it is something to fix in
> pacemaker or cluster-glue.

I haven't dealt with legacy fencing for a while, so fading memory plus
ongoing development in Pacemaker add some uncertainty to what I'm
saying ;-)

What you could try is adding "" to /usr/sbin/fence_legacy to convince
Pacemaker to even try asking the external Linux-HA stonith plugin.

Unfortunately I currently don't have a setup (no cluster-glue stuff)
on which I could quickly experiment with legacy fencing.
Re: [ClusterLabs] Pacemaker not always selecting the right stonith device
On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel wrote:
>> > [...]
>> >
>> > primitive p_ston_pg1 stonith:external/ipmi \
>> > params hostname=pg1 ipaddr=10.148.128.35 userid=root
>> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>> > passwd_method=file interface=lan priv=OPERATOR
>> >
>> > primitive p_ston_pg2 stonith:external/ipmi \
>> > params hostname=pg2 ipaddr=10.148.128.19 userid=root
>> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
>> > passwd_method=file interface=lan priv=OPERATOR
>> >
>> > primitive p_ston_pg3 stonith:external/ipmi \
>> > params hostname=pg3 ipaddr=10.148.128.59 userid=root
>> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
>> > passwd_method=file interface=lan priv=OPERATOR
>> >
>> > location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
>> > resource-discovery=exclusive \
>> > rule #uname eq pg1 \
>> > rule #uname eq pg2 \
>> > rule #uname eq pg3
>> >
>> > location l_ston_pg1 p_ston_pg1 -inf: pg1
>> > location l_ston_pg2 p_ston_pg2 -inf: pg2
>> > location l_ston_pg3 p_ston_pg3 -inf: pg3
>>
>> These constraints prevent each device from running on its intended
>> target, but they don't limit which nodes each device can fence. For
>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>> example:
>>
>> primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>
>> Use pcmk_host_list if the fence device needs the node name as known to
>> the cluster, and pcmk_host_map if you need to translate a node name to
>> an address the device understands.
>
>
> We used the parameter "hostname". What does it do if not that ?

hostname is a resource parameter. From Pacemaker's point of view it is an
opaque string, and only the resource agent knows how to interpret it.

See the discussion in another part of this thread. The agent is supposed
to return information based on the "hostname" parameter, but apparently
it does not understand the request when Pacemaker asks for it.
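The pcmk_host_list suggestion quoted in this thread could be applied to the three devices along these lines. This is a dry-run sketch that only prints crmsh commands; it assumes crmsh is the management shell in use (the quoted configuration is in crm syntax) and that the device names follow the p_ston_<node> pattern shown there. The helper name host_list_cmd is made up.

```shell
#!/bin/sh
# Dry-run sketch: prints crmsh commands that tell Pacemaker which single
# node each fence device can fence, instead of relying on the agent's
# own host list. Nothing is executed against the cluster.

host_list_cmd() {
    node="$1"
    echo "crm resource param p_ston_$node set pcmk_host_list $node"
}

for node in pg1 pg2 pg3; do
    host_list_cmd "$node"
done
```

After applying something like this, "stonith_admin -l pg1" (the command mentioned earlier in the thread) would be the natural way to check which devices Pacemaker now considers able to fence pg1.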