Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Ken Gaillot
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
> 
> [snipped]
> 
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node.  It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
> 
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
> 
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance, by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* on the desired node.
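Below is a minimal Python sketch of that sequence. It assumes you run it on the node whose instance you want to restart. It only builds and prints the `crm_resource` command lines and does not execute them. `my-clone-rsc` is a placeholder ID. The step that disables the recurring monitor is left out, because how you do that depends on your configuration tool.

```python
import shlex
from typing import List


def clone_instance_restart_plan(rsc_id: str) -> List[List[str]]:
    """Build (but do not run) the crm_resource calls for a per-node restart.

    Run the resulting commands on the node whose instance should restart:
    --force-stop and --force-start act on the local node only.
    You also need to disable the recurring monitor between steps 1 and 2,
    and re-enable it before step 4; that part is not shown here.
    """

    def set_managed(value: str) -> List[str]:
        # is-managed is a resource meta attribute, so it applies cluster-wide.
        return ["crm_resource", "--resource", rsc_id, "--meta",
                "--set-parameter", "is-managed", "--parameter-value", value]

    return [
        set_managed("false"),                                     # 1. unmanage
        ["crm_resource", "--resource", rsc_id, "--force-stop"],   # 2. stop locally
        ["crm_resource", "--resource", rsc_id, "--force-start"],  # 3. start locally
        set_managed("true"),                                      # 4. manage again
    ]


if __name__ == "__main__":
    # "my-clone-rsc" is a placeholder resource ID.
    for argv in clone_instance_restart_plan("my-clone-rsc"):
        print(shlex.join(argv))
```

Step 1 still affects the resource on every node, because is-managed is cluster-wide for the resource.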

The problem (for your use case) is that is-managed is cluster-wide for
the given resource. I suspect coming up with a per-node
interface/implementation for is-managed would be difficult.

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.

FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
> 
> Cheers,
> Adam
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Jason A Ramsey
Actually, according to http://linux-iscsi.org/wiki/Lio-utils, lio-utils has been
deprecated and replaced by targetcli.
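The 'not installed' (5) failure means the agent could not find a helper command on the node. The small Python sketch below reports which toolset each node actually has. The mapping is an assumption: only `tcm_node` (from lio-utils) comes from the error in this thread, and `targetcli` is included because it replaced lio-utils. To see which `implementation` values your version of the agent documents, check its metadata, for example with `pcs resource describe ocf:heartbeat:iSCSILogicalUnit`.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed mapping from iSCSI toolset to the helper commands it provides.
# "tcm_node" is the command named in the error message above.
TOOLSETS: Dict[str, List[str]] = {
    "lio-utils": ["tcm_node"],
    "targetcli": ["targetcli"],
}


def missing_commands(
    toolset: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the commands of `toolset` that cannot be found on PATH."""
    return [cmd for cmd in TOOLSETS[toolset] if which(cmd) is None]


if __name__ == "__main__":
    for name in TOOLSETS:
        missing = missing_commands(name)
        status = "all present" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```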

--
 
[ jR ]
@: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/20/16, 12:09 PM, "Andrei Borzenkov" wrote:

20.07.2016 18:08, Jason A Ramsey wrote:
> [snipped]

tcm_node is part of lio-utils. I am not familiar with RedHat packages,
but I presume that searching for "lio" should reveal something.



Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Greg Woods
On Wed, Jul 20, 2016 at 10:09 AM, Andrei Borzenkov wrote:

> tcm_node is part of lio-utils. I am not familiar with RedHat packages,
> but I presume that searching for "lio" should reveal something.
>

I checked on both Fedora and CentOS, and there is no such package and no
package provides a file called "tcm_node".  I also looked at rpmfind.net
and the only RPMs I found are for various versions of OpenSUSE. Looks like
something slipped in that is SuSE-specific.

--Greg

Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Andrei Borzenkov
20.07.2016 18:08, Jason A Ramsey wrote:
> [snipped]
> 
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
> status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages,
but I presume that searching for "lio" should reveal something.



[ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Jason A Ramsey
I have been struggling to get an HA iSCSI target cluster in place for literally
weeks. For whatever reason, I cannot get Pacemaker to create an
iSCSILogicalUnit resource. The error message I’m seeing leads me to
believe that I’m missing something on the systems (“tcm_node”). Here are my
setup commands leading up to this error message:

# pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget \
    iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

# pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit \
    target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" \
    path=/dev/drbd1 implementation="lio" op monitor interval=15s


Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
status=complete, exitreason='Setup problem: couldn't find command: tcm_node',
last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms

This is with the following installed:

pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
corosynclib-2.3.4-7.el7.x86_64
corosync-2.3.4-7.el7.x86_64

Please please please…any ideas are appreciated. I’ve exhausted all avenues of 
investigation at this point and don’t know what to do. Thank you!



Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Klaus Wenninger
On 07/19/2016 06:54 PM, Andrei Borzenkov wrote:
> 19.07.2016 19:01, Andrei Borzenkov wrote:
>> 19.07.2016 18:24, Klaus Wenninger wrote:
>>> On 07/19/2016 04:17 PM, Ken Gaillot wrote:
>>>> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote:
>>>>> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot wrote:
>>>>> ...
>>>>>>> primitive p_ston_pg1 stonith:external/ipmi \
>>>>>>>  params hostname=pg1 ipaddr=10.148.128.35 userid=root
>>>>>>> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>>>>>>> passwd_method=file interface=lan priv=OPERATOR
>>>>>>>
>>>>> ...
>>>>>> These constraints prevent each device from running on its intended
>>>>>> target, but they don't limit which nodes each device can fence. For
>>>>>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>>>>>> example:
>>>>>>
>>>>>>  primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>>>>>
>>>>>> Use pcmk_host_list if the fence device needs the node name as known to
>>>>>> the cluster, and pcmk_host_map if you need to translate a node name to
>>>>>> an address the device understands.
>>>>>>
>>>>> Is not pacemaker expected by default to query stonith agent instance
>>>>> (sorry I do not know proper name for it) for a list of hosts it can
>>>>> manage? And external/ipmi should return value of "hostname" parameter
>>>>> here? So the question is why it does not work?
>>>> You're right -- if not told otherwise, Pacemaker will query the device
>>>> for the target list. In this case, the output of "stonith_admin -l"
>>>> suggests it's not returning the desired information. I'm not familiar
>>>> with the external agents, so I don't know why that would be. I
>>>> mistakenly assumed it worked similarly to fence_ipmilan ...
>>> guess it worked at the times when pacemaker did fencing via
>>> cluster-glue-code...
>>> A grep for "gethosts" doesn't return much for current pacemaker-sources
>>> apart from some leftovers in cts.
>> Oh oh ... this sounds like a bug, no?
>>
> Apparently of all cluster-glue agents only ec2 supports both old and new
> variants
>
> gethosts|hostlist|list)
> # List of names we know about
>
> all others use gethosts. Not sure whether it is something to fix in
> pacemaker or cluster-glue.
I haven't dealt with legacy fencing for a while, so fading memory plus
ongoing development in Pacemaker add some uncertainty to what I'm saying ;-)
What you could try is adding "" to /usr/sbin/fence_legacy
to convince pacemaker to even try asking the external Linux-HA stonith
plugin.
Unfortunately, I currently don't have a setup (no cluster-glue stuff) where I
could quickly experiment with legacy fencing.
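The toy Python model below makes the suspected mismatch concrete. It is not Pacemaker or cluster-glue code. It assumes, as the thread suggests but does not confirm, that the newer fencing path asks a plugin for its host list under a name other than `gethosts`; `list` is used here purely as an example. Under that assumption, a plugin that only answers `gethosts` reports no targets. A plugin with an ec2-style `gethosts|hostlist|list` branch answers either way. Both plugin functions are hypothetical.

```python
from typing import Callable, List, Optional

Plugin = Callable[[str, str], Optional[List[str]]]


def gethosts_only_plugin(action: str, hostname: str) -> Optional[List[str]]:
    """Hypothetical plugin that understands only the old action name."""
    if action == "gethosts":
        return [hostname]
    return None  # unrecognised action: no host list


def ec2_style_plugin(action: str, hostname: str) -> Optional[List[str]]:
    """Hypothetical plugin mirroring ec2's `gethosts|hostlist|list)` branch."""
    if action in ("gethosts", "hostlist", "list"):
        return [hostname]
    return None


def query_targets(plugin: Plugin, hostname: str, action: str = "list") -> List[str]:
    """Ask a plugin for its targets; an unanswered query yields an empty list."""
    return plugin(action, hostname) or []


if __name__ == "__main__":
    print(query_targets(gethosts_only_plugin, "pg1"))  # []
    print(query_targets(ec2_style_plugin, "pg1"))      # ['pg1']
```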



Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Andrei Borzenkov
On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel  wrote:
>> > [...]
>> >
>> > primitive p_ston_pg1 stonith:external/ipmi \
>> >  params hostname=pg1 ipaddr=10.148.128.35 userid=root
>> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>> > passwd_method=file interface=lan priv=OPERATOR
>> >
>> > primitive p_ston_pg2 stonith:external/ipmi \
>> >  params hostname=pg2 ipaddr=10.148.128.19 userid=root
>> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
>> > passwd_method=file interface=lan priv=OPERATOR
>> >
>> > primitive p_ston_pg3 stonith:external/ipmi \
>> >  params hostname=pg3 ipaddr=10.148.128.59 userid=root
>> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
>> > passwd_method=file interface=lan priv=OPERATOR
>> >
>> > location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
>> > resource-discovery=exclusive \
>> >  rule #uname eq pg1 \
>> >  rule #uname eq pg2 \
>> >  rule #uname eq pg3
>> >
>> > location l_ston_pg1 p_ston_pg1 -inf: pg1
>> > location l_ston_pg2 p_ston_pg2 -inf: pg2
>> > location l_ston_pg3 p_ston_pg3 -inf: pg3
>>
>> These constraints prevent each device from running on its intended
>> target, but they don't limit which nodes each device can fence. For
>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>> example:
>>
>>  primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>
>> Use pcmk_host_list if the fence device needs the node name as known to
>> the cluster, and pcmk_host_map if you need to translate a node name to
>> an address the device understands.
>
>
> We used the parameter "hostname". What does it do if not that?

hostname is a resource parameter. From Pacemaker's point of view it is an
opaque string, and only the resource agent knows how to interpret it.

See the discussion in another part of this thread. The agent is supposed to
return information based on the "hostname" parameter, but apparently it
does not understand the request when Pacemaker asks for it.
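The simplified Python model below shows the decision as this thread describes it. If `pcmk_host_list` or `pcmk_host_map` is set, the device's targets are the nodes named there. Otherwise the device itself is asked. Pacemaker never looks at an agent-specific parameter such as `hostname`.

Several details are assumptions:

- `pcmk_host_list` entries are separated by spaces or commas.
- `pcmk_host_map` entries are `node:target` pairs separated by `;`.
- The real `pcmk_host_check` option and Pacemaker's fallback behaviour are not modelled.
- `query_device` is a hypothetical stand-in for asking the agent.
- `claims_every_node` is a hypothetical unhelpful agent answer.

```python
import re
from typing import Callable, Dict, List, Set


def parse_host_map(value: str) -> Dict[str, str]:
    """Parse 'node:target;node2:target2' into a dict (simplified syntax)."""
    mapping: Dict[str, str] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        node, sep, target = entry.partition(":")
        if not sep:
            raise ValueError(f"malformed pcmk_host_map entry: {entry!r}")
        mapping[node.strip()] = target.strip()
    return mapping


def static_targets(params: Dict[str, str]) -> Set[str]:
    """Nodes named explicitly via pcmk_host_list and/or pcmk_host_map."""
    targets: Set[str] = set()
    if "pcmk_host_list" in params:
        targets.update(t for t in re.split(r"[\s,]+", params["pcmk_host_list"]) if t)
    if "pcmk_host_map" in params:
        targets.update(parse_host_map(params["pcmk_host_map"]))
    return targets


def can_fence(params: Dict[str, str], node: str,
              query_device: Callable[[], List[str]]) -> bool:
    """Decide whether a device with `params` is eligible to fence `node`."""
    explicit = static_targets(params)
    if explicit:
        return node in explicit
    # No static list: rely on whatever the device reports.
    # params["hostname"] is opaque here and plays no part in the decision.
    return node in query_device()


def claims_every_node() -> List[str]:
    # Hypothetical, unhelpful agent answer: every cluster node.
    return ["pg1", "pg2", "pg3"]


if __name__ == "__main__":
    ipmi_pg1 = {"hostname": "pg1", "ipaddr": "10.148.128.35"}
    print(can_fence(ipmi_pg1, "pg2", claims_every_node))  # True
    mapped = dict(ipmi_pg1, pcmk_host_map="pg1:pg1.ipmi.example.com")
    print(can_fence(mapped, "pg2", claims_every_node))    # False
```

In this model, `hostname=pg1` alone does not stop the pg1 device from being chosen for pg2. Adding `pcmk_host_map` does.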
