Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-07 Thread Ken Gaillot
On 03/07/2016 07:31 AM, Ferenc Wágner wrote:
> Hi,
> 
> A couple of days ago the nodes of our Pacemaker 1.1.14 cluster
> (vhbl0[3-7]) experienced a temporary storage outage, leading to processes
> getting stuck randomly for a couple of minutes and to big load spikes.  There
> were 30 monitor operation timeouts altogether on vhbl05, and an internal
> error on the DC.  What follows is my longish analysis of the logs, which
> may be wrong; I'd be glad to learn if it is.  Knowledgeable people
> may skip to the end for the main question and a short mention of the
> side questions.  So, the Pacemaker logs start with:
> 
> 12:53:51 vhbl05 lrmd[9442]:  warning: vm-niifdc_monitor_6 process (PID 
> 1867) timed out
> 12:53:51 vhbl05 lrmd[9442]:  warning: vm-niiffs_monitor_6 process (PID 
> 1868) timed out
> 12:53:51 vhbl05 lrmd[9442]:  warning: vm-niifdc_monitor_6:1867 - timed 
> out after 2ms
> 12:53:51 vhbl05 lrmd[9442]:  warning: vm-niiffs_monitor_6:1868 - timed 
> out after 2ms
> 12:53:51 vhbl05 crmd[9445]:error: Operation vm-niifdc_monitor_6: 
> Timed Out (node=vhbl05, call=720, timeout=2ms)
> 12:53:52 vhbl05 crmd[9445]:error: Operation vm-niiffs_monitor_6: 
> Timed Out (node=vhbl05, call=717, timeout=2ms)
> 
> (precise interleaving is impossible, as the vhbl05 logs arrived at the
> log server with a delay of 78 s -- probably the syslog daemon was stuck)
> 
> 12:53:51 vhbl03 crmd[8530]:   notice: State transition S_IDLE -> 
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> 12:53:52 vhbl03 pengine[8529]:  warning: Processing failed op monitor for 
> vm-niifdc on vhbl05: unknown error (1)
> 12:53:52 vhbl03 pengine[8529]:   notice: Recover vm-niifdc#011(Started vhbl05)
> 12:53:52 vhbl03 pengine[8529]:   notice: Calculated Transition 909: 
> /var/lib/pacemaker/pengine/pe-input-262.bz2
> 
> The other nodes report in:
> 
> 12:53:57 vhbl04 crmd[9031]:   notice: High CPU load detected: 74.949997
> 12:54:16 vhbl06 crmd[8676]:   notice: High CPU load detected: 93.540001
> 
> while monitor operations keep timing out on vhbl05:
> 
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-FCcontroller_monitor_6 process 
> (PID 1976) timed out
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-FCcontroller_monitor_6:1976 - 
> timed out after 2ms
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-dwdm_monitor_6 process (PID 
> 1977) timed out
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-dwdm_monitor_6:1977 - timed out 
> after 2ms
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-eiffel_monitor_6 process (PID 
> 1978) timed out
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-eiffel_monitor_6:1978 - timed 
> out after 2ms
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-web7_monitor_6 process (PID 
> 2015) timed out
> 12:54:13 vhbl05 lrmd[9442]:  warning: vm-web7_monitor_6:2015 - timed out 
> after 2ms
> 12:54:13 vhbl05 crmd[9445]:error: Operation 
> vm-FCcontroller_monitor_6: Timed Out (node=vhbl05, call=640, 
> timeout=2ms)
> 12:54:13 vhbl05 crmd[9445]:error: Operation vm-dwdm_monitor_6: Timed 
> Out (node=vhbl05, call=636, timeout=2ms)
> 12:54:13 vhbl05 crmd[9445]:error: Operation vm-eiffel_monitor_6: 
> Timed Out (node=vhbl05, call=633, timeout=2ms)
> 12:54:13 vhbl05 crmd[9445]:error: Operation vm-web7_monitor_6: Timed 
> Out (node=vhbl05, call=638, timeout=2ms)
> 12:54:17 vhbl05 lrmd[9442]:  warning: vm-ftp.pws_monitor_6 process (PID 
> 2101) timed out
> 12:54:17 vhbl05 lrmd[9442]:  warning: vm-ftp.pws_monitor_6:2101 - timed 
> out after 2ms
> 12:54:17 vhbl05 crmd[9445]:error: Operation vm-ftp.pws_monitor_6: 
> Timed Out (node=vhbl05, call=637, timeout=2ms)
> 12:54:17 vhbl05 lrmd[9442]:  warning: vm-cirkusz_monitor_6 process (PID 
> 2104) timed out
> 12:54:17 vhbl05 lrmd[9442]:  warning: vm-cirkusz_monitor_6:2104 - timed 
> out after 2ms
> 12:54:17 vhbl05 crmd[9445]:error: Operation vm-cirkusz_monitor_6: 
> Timed Out (node=vhbl05, call=650, timeout=2ms)
> 
> Back on the DC:
> 
> 12:54:22 vhbl03 crmd[8530]:  warning: Request 3308 to pengine 
> (0x7f88810214a0) failed: Resource temporarily unavailable (-11)
> 12:54:22 vhbl03 crmd[8530]:error: Could not contact the pengine: -11
> 12:54:22 vhbl03 crmd[8530]:error: FSA: Input I_ERROR from 
> do_pe_invoke_callback() received in state S_POLICY_ENGINE
> 12:54:22 vhbl03 crmd[8530]:  warning: State transition S_POLICY_ENGINE -> 
> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_pe_invoke_callback ]
> 12:54:22 vhbl03 crmd[8530]:  warning: Fast-tracking shutdown in response to 
> errors
> 12:54:22 vhbl03 crmd[8530]:  warning: Not voting in election, we're in state 
> S_RECOVERY
> 12:54:22 vhbl03 crmd[8530]:error: FSA: Input I_TERMINATE from 
> do_recover() received in state S_RECOVERY
> 12:54:22 vhbl03 crmd[8530]:   notice: Stopped 0 recurring operations at 
> shutdown (32 ops remaining)
> 12:5

Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-07 Thread Ken Gaillot
On 03/07/2016 02:03 PM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> On 03/07/2016 07:31 AM, Ferenc Wágner wrote:
>>
>>> 12:55:13 vhbl07 crmd[8484]: notice: Transition aborted by 
>>> vm-eiffel_monitor_6 'create' on vhbl05: Foreign event 
>>> (magic=0:0;521:0:0:634eef05-39c1-4093-94d4-8d624b423bb7, cib=0.613.98, 
>>> source=process_graph_event:600, 0)
>>
>> That means the action was initiated by a different node (the previous DC
>> presumably), so the new DC wants to recalculate everything.
> 
> Time travel was sort of possible in that situation, and recurring
> monitor operations are not logged, so this is indeed possible.  The main
> thing is that it wasn't mishandled.
> 
>>> recovery actions turned into start actions for the resources stopped
>>> during the previous transition.  However, almost all other recovery
>>> actions just disappeared without any comment.  This was actually
>>> correct, but I really wonder why the cluster decided to paper over
>>> the previous monitor operation timeouts.  Maybe the operations
>>> finished meanwhile and got accounted somehow, just not logged?
>>
>> I'm not sure why the PE decided recovery was not necessary. Operation
>> results wouldn't be accepted without being logged.
> 
> At which logging level?  I can't see recurring monitor operation logs in
> syslog (at default logging level: notice) nor in /var/log/pacemaker.log
> (which contains info level messages as well).
> 
> However, the info level logs contain more "Transition aborted" lines, as
> if only the first of them got logged with notice level.  This would make
> sense, since the later ones don't make any difference on an already
> aborted transition, so they aren't that important.  And in fact such
> lines were suppressed from the syslog I checked first, for example:
> 
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: Diff: --- 
> 0.613.120 2
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: Diff: +++ 
> 0.613.121 (null)
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: +  /cib:  
> @num_updates=121
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: ++ 
> /cib/status/node_state[@id='167773707']/lrm[@id='167773707']/lrm_resources/lrm_resource[@id='vm-elm']:
>operation="monitor" crm-debug-origin="do_update_resource" 
> crm_feature_set="3.0.10" 
> transition-key="473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7" 
> transition-magic="0:0;473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7" 
> on_node="vhbl05" call-id="645" rc-code="0" op-st
> 12:55:39 [8479] vhbl07cib: info: cib_process_request:
> Completed cib_modify operation for section status: OK (rc=0, 
> origin=vhbl05/crmd/362, version=0.613.121)
> 12:55:39 [8484] vhbl07   crmd: info: abort_transition_graph: 
> Transition aborted by vm-elm_monitor_6 'create' on vhbl05: Foreign event 
> (magic=0:0;473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7, cib=0.613.121, 
> source=process_graph_event:600, 0)
> 12:55:39 [8484] vhbl07   crmd: info: process_graph_event:
> Detected action (0.473) vm-elm_monitor_6.645=ok: initiated by a different 
> node
> 
> I can very much imagine this cancelling the FAILED state induced by a
> monitor timeout like:
> 
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++   
>  type="TransientDomain" class="ocf" provider="niif">
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++   
>operation_key="vm-elm_monitor_6" operation="monitor" 
> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" 
> transition-key="473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7" 
> transition-magic="2:1;473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7" 
> on_node="vhbl05" call-id="645" rc-code="1" op-status="2" interval="6" 
> last-rc-change="1456833279" exe
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++   
>operation_key="vm-elm_start_0" operation="start" 
> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" 
> transition-key="472:0:0:634eef05-39c1-4093-94d4-8d624b423bb7" 
> transition-magic="0:0;472:0:0:634eef05-39c1-4093-94d4-8d624b4

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-08 Thread Ken Gaillot
On 03/07/2016 09:10 PM, Сергей Филатов wrote:
> Thanks for the answer. It turned out the problem was not IPv6.
> The remote node is listening on port 3121 and its name resolves fine.
> I have the authkey file at /etc/pacemaker on both the remote and cluster nodes.
> What else can I check? Is there any walkthrough for Ubuntu?

Nothing specific to Ubuntu, but there's not much that is distro-specific about it.

If you "ssh -p 3121" to the remote node from a cluster node, what do you
get?

pacemaker_remote will use the usual log settings for pacemaker (probably
/var/log/pacemaker.log, probably configured in /etc/default/pacemaker on
ubuntu). You should see "New remote connection" in the remote node's log
when the cluster tries to connect, and "LRMD client connection
established" if it's successful.

As always, check for firewall and SELinux issues.
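
For example, a rough checklist (hostname and log path are only illustrative):

# from a cluster node -- "Connection refused" or a timeout points at a
# network/firewall problem; a protocol error is expected, it isn't really sshd
ssh -p 3121 remote-node

# on the remote node -- confirm the listener and watch for connection attempts
netstat -tlnp | grep 3121
grep -e "New remote connection" -e "LRMD client connection" /var/log/pacemaker.log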

> 
>> On 07 Mar 2016, at 09:40, Ken Gaillot  wrote:
>>
>> On 03/06/2016 07:43 PM, Сергей Филатов wrote:
>>> Hi,
>>> I’m trying to set up pacemaker_remote resource on ubuntu 14.04
>>> I followed "remote node walkthrough” guide 
>>> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/#idm140473081667280)
>>> After creating ocf:pacemaker:remote resource on cluster node, remote node 
>>> doesn’t show up as online.
>>> I guess I need to configure remote agent to listen on ipv4, where can I 
>>> configure it?
>>> Or is there any other steps to set up remote node besides the ones 
>>> mentioned in guide?
>>> tcp6   0  0 :::3121 :::*LISTEN  
>>> 21620/pacemaker_rem off (0.00/0/0)
>>>
>>> pacemaker and pacemaker_remote are 1.12 version
>>
>>
>> pacemaker_remote will try to bind to IPv6 addresses first, and only if
>> that fails, will it bind to IPv4. There is no way to configure this
>> behavior currently, though it obviously would be nice to have.
>>
>> The only workarounds I can think of are to make IPv6 connections work
>> between the cluster and the remote node, or disable IPv6 on the remote
>> node. Using IPv6, there could be an issue if your name resolution
>> returns both IPv4 and IPv6 addresses for the remote host; you could
>> potentially work around that by adding an IPv6-only name for it, and
>> using that as the server option to the remote resource.



Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-09 Thread Ken Gaillot
On 03/08/2016 11:38 PM, Сергей Филатов wrote:
> ssh -p 3121 compute-1
> ssh_exchange_identification: read: Connection reset by peer
> 
> That’s what I get in /var/log/pacemaker.log after restarting pacemaker_remote:
> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
> crm_signal_dispatch:  Invoking handler for signal 15: Terminated
> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
> lrmd_shutdown:Terminating with  0 clients
> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_withdraw:  withdrawing server sockets
> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
> crm_xml_cleanup:  Cleaning up memory from libxml2
> Mar 09 05:30:27 [28193] compute-1.domain.com   lrmd: info: 
> crm_log_init: Changed active directory to 
> /var/lib/heartbeat/cores/root
> Mar 09 05:30:27 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: lrmd
> Mar 09 05:30:27 [28193] compute-1.domain.com   lrmd:   notice: 
> lrmd_init_remote_tls_server:  Starting a tls listener on port 3121.
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd:   notice: 
> bind_and_listen:  Listening on address ::
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: cib_ro
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: cib_rw
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: cib_shm
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: attrd
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: stonith-ng
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
> qb_ipcs_us_publish:   server name: crmd
> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: main:  
>Starting

It looks like the cluster is not even trying to connect to the remote
node. pacemaker_remote here is binding only to IPv6, so the cluster will
need to contact it on that address.

What is your ocf:pacemaker:remote resource configuration?

Check your cluster node logs for the start action -- if your resource is
named R, the start action will be R_start_0. There will be two nodes of
interest: the node assigned the remote node resource, and the DC.
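
For example (log path assumed; substitute your actual resource name for R),
running

grep "R_start_0" /var/log/pacemaker.log

on both nodes of interest should show whether a start was attempted and with
what result.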

> I got only pacemaker-remote resource-agents pcs installed, so no 
> /etc/default/pacemaker file on remote node
> selinux is disabled and I specifically opened firewall on 2224, 3121 and 
> 21064 tcp and 5405 udp
> 
>> On 08 Mar 2016, at 08:51, Ken Gaillot  wrote:
>>
>> On 03/07/2016 09:10 PM, Сергей Филатов wrote:
>>> Thanks for the answer. It turned out the problem was not IPv6.
>>> The remote node is listening on port 3121 and its name resolves fine.
>>> I have the authkey file at /etc/pacemaker on both the remote and cluster nodes.
>>> What else can I check? Is there any walkthrough for Ubuntu?
>>
>> Nothing specific to Ubuntu, but there's not much that is distro-specific about it.
>>
>> If you "ssh -p 3121" to the remote node from a cluster node, what do you
>> get?
>>
>> pacemaker_remote will use the usual log settings for pacemaker (probably
>> /var/log/pacemaker.log, probably configured in /etc/default/pacemaker on
>> ubuntu). You should see "New remote connection" in the remote node's log
>> when the cluster tries to connect, and "LRMD client connection
>> established" if it's successful.
>>
>> As always, check for firewall and SELinux issues.
>>
>>>
>>>> On 07 Mar 2016, at 09:40, Ken Gaillot  wrote:
>>>>
>>>> On 03/06/2016 07:43 PM, Сергей Филатов wrote:
>>>>> Hi,
>>>>> I’m trying to set up pacemaker_remote resource on ubuntu 14.04
>>>>> I followed "remote node walkthrough” guide 
>>>>> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/#idm140473081667280)
>>>>> After creating ocf:pacemaker:remote resource on cluster node, remote node 
>>>>> doesn’t show up as online.
>>>>> I guess I need to configure remote agent to listen on ipv4, where can I 
>>>>> configure it?
>>>>> Or is there any other steps to set up remote node besides the ones 
>>>>> mentioned in guide?
>>>>> tcp6   0  0 :::3121

Re: [ClusterLabs] Floating IP failing over but not failing back with active/active LDAP (dirsrv)

2016-03-10 Thread Ken Gaillot
On 03/10/2016 08:48 AM, Bernie Jones wrote:
> A bit more info..
> 
>  
> 
> If, after I restart the failed dirsrv instance, I then perform a "pcs
> resource cleanup dirsrv-daemon" to clear the FAIL messages then the failover
> will work OK.
> 
> So it's as if the cleanup is changing the status in some way..
> 
>  
> 
> From: Bernie Jones [mailto:ber...@securityconsulting.ltd.uk] 
> Sent: 10 March 2016 08:47
> To: 'Cluster Labs - All topics related to open-source clustering welcomed'
> Subject: [ClusterLabs] Floating IP failing over but not failing back with
> active/active LDAP (dirsrv)
> 
>  
> 
> Hi all, could you advise please?
> 
>  
> 
> I'm trying to configure a floating IP with an active/active deployment of
> 389 directory server. I don't want pacemaker to manage LDAP but just to
> monitor and switch the IP as required to provide resilience. I've seen some
> other similar threads and based my solution on those.
> 
>  
> 
> I've amended the ocf for slapd to work with 389 DS and this tests out OK
> (dirsrv).
> 
>  
> 
> I've then created my resources as below:
> 
>  
> 
> pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100"
> cidr_netmask="32" op monitor timeout="20s" interval="5s" op start
> interval="0" timeout="20" op stop interval="0" timeout="20"
> 
> pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv op monitor
> interval="10" timeout="5" op start interval="0" timeout="5" op stop
> interval="0" timeout="5" meta "is-managed=false"

is-managed=false means the cluster will not try to start or stop the
service. It should never be used in regular production, only when doing
maintenance on the service.

> pcs resource clone dirsrv-daemon meta globally-unique="false"
> interleave="true" target-role="Started" "master-max=2"
> 
> pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip
> score=INFINITY

This constraint means that dirsrv is only allowed to run where dirsrv-ip
is. I suspect you want the reverse, dirsrv-ip with dirsrv-daemon-clone,
which means keep the IP with a working dirsrv instance.
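
Roughly (mirroring your original syntax; the existing constraint's ID is
whatever "pcs constraint --full" reports):

pcs constraint remove <id-of-the-existing-colocation-constraint>
pcs constraint colocation add dirsrv-ip with dirsrv-daemon-clone score=INFINITY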

> pcs property set no-quorum-policy=ignore

If you're using corosync 2, you generally don't need or want this.
Instead, ensure corosync.conf has two_node: 1 (which will be done
automatically if you used pcs cluster setup).
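
For reference, the corosync 2 quorum section would then look roughly like
this (votequorum assumed):

quorum {
    provider: corosync_votequorum
    two_node: 1
}

and no-quorum-policy can be left at its default in pacemaker.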

> pcs resource defaults migration-threshold=1
> 
> pcs property set stonith-enabled=false
> 
>  
> 
> On startup all looks well:
> 
> 
> 
> 
>  
> 
> Last updated: Thu Mar 10 08:28:03 2016
> 
> Last change: Thu Mar 10 08:26:14 2016
> 
> Stack: cman
> 
> Current DC: ga2.idam.com - partition with quorum
> 
> Version: 1.1.11-97629de
> 
> 2 Nodes configured
> 
> 3 Resources configured
> 
>  
> 
>  
> 
> Online: [ ga1.idam.com ga2.idam.com ]
> 
>  
> 
> dirsrv-ip   (ocf::heartbeat:IPaddr2): Started ga1.idam.com
> 
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
> 
>  dirsrv-daemon  (ocf::heartbeat:dirsrv):Started ga2.idam.com
> (unmanaged)
> 
>  dirsrv-daemon  (ocf::heartbeat:dirsrv):Started ga1.idam.com
> (unmanaged)
> 
>  
> 
>  
> 
> 
> 
> 
>  
> 
> Stop dirsrv on ga1:
> 
>  
> 
> Last updated: Thu Mar 10 08:28:43 2016
> 
> Last change: Thu Mar 10 08:26:14 2016
> 
> Stack: cman
> 
> Current DC: ga2.idam.com - partition with quorum
> 
> Version: 1.1.11-97629de
> 
> 2 Nodes configured
> 
> 3 Resources configured
> 
>  
> 
>  
> 
> Online: [ ga1.idam.com ga2.idam.com ]
> 
>  
> 
> dirsrv-ip   (ocf::heartbeat:IPaddr2): Started ga2.idam.com
> 
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
> 
>  dirsrv-daemon  (ocf::heartbeat:dirsrv):Started ga2.idam.com
> (unmanaged)
> 
>  dirsrv-daemon  (ocf::heartbeat:dirsrv):FAILED ga1.idam.com
> (unmanaged)
> 
>  
> 
> Failed actions:
> 
> dirsrv-daemon_monitor_1 on ga1.idam.com 'not running' (7): call=12,
> status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms,
> exec=0ms
> 
>  
> 
> IP fails over to ga2 OK:
> 
>  
> 
> 
> 
> 
>  
> 
> Restart dirsrv on ga1
> 
>  
> 
> Last updated: Thu Mar 10 08:30:01 2016
> 
> Last change: Thu Mar 10 08:26:14 2016
> 
> Stack: cman
> 
> Current DC: ga2.idam.com - partition with quorum
> 
> Version: 1.1.11-97629de
> 
> 2 Nodes configured
> 
> 3 Resources configured
> 
>  
> 
>  
> 
> Online: [ ga1.idam.com ga2.idam.com ]
> 
>  
> 
> dirsrv-ip   (ocf::heartbeat:IPaddr2): Started ga2.idam.com
> 
>  Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
> 
>  dirsrv-daemon  (ocf::heartbeat:dirsrv):Started ga2.idam.com
> (unmanaged)
> 
>  dirsrv-daemon  (ocf::heartbeat:dirsrv):Started ga1.idam.com
> (unmanaged)
> 
>  
> 
> Failed actions:
> 
> dirsrv-daemon_monitor_1 on ga1.idam.com 'not running' (7)

Re: [ClusterLabs] Stonith ignores resource stop errors

2016-03-10 Thread Ken Gaillot
On 03/10/2016 04:42 AM, Klechomir wrote:
> Hi List
> 
> I'm testing stonith now (pacemaker 1.1.8), and noticed that it properly kills 
> a node with stopped pacemaker, but ignores resource stop errors.
> 
> I'm pretty sure that the same version worked properly with stonith before.
> Maybe I'm missing some setting?
> 
> Regards,
> Klecho

The only setting that should be relevant there is on-fail for the
resource's stop operation, which defaults to fence but can be set to
other actions.

That said, 1.1.8 is pretty old at this point, so I'm not sure of its
behavior.
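
As a sketch only (crm shell syntax; the resource name is hypothetical),
explicitly asking for the default behavior would look like:

crm configure primitive p_example ocf:heartbeat:Dummy \
    op monitor interval=30s \
    op stop interval=0 timeout=60s on-fail=fence

so it is worth checking whether your resources override on-fail on their
stop operations.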


Re: [ClusterLabs] Floating IP failing over but not failing back with active/active LDAP (dirsrv)

2016-03-10 Thread Ken Gaillot
On 03/10/2016 09:38 AM, Bernie Jones wrote:
> Hi Ken,
> Thanks for your response, I've now corrected the constraint order but the
> behaviour is still the same, the IP does not fail over (after the first
> time) unless I issue a pcs resource cleanup command on dirsrv-daemon.
> 
> Also, I'm not sure why you advise against using is-managed=false in
> production. We are trying to use pacemaker purely to fail over on detection
> of a failure and not to control starting or stopping of the instances. It is
> essential that in normal operation we have both instances up as we are using
> MMR.
> 
> Thanks,
> Bernie

I think you misunderstand is-managed. It exists so that you can perform
maintenance on a service without pacemaker fencing the node when the
service is stopped or restarted. Failover won't work with is-managed=false,
because failover involves stopping and starting the service.

Your goal is already accomplished by using a clone with master-max=2.
With the clone, pacemaker will run the service on both nodes, and with
master-max=2, it will be master/master.
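
A minimal sketch of putting the clone back under cluster management
(assuming the rest of its options stay as they are):

pcs resource meta dirsrv-daemon is-managed=true
pcs resource cleanup dirsrv-daemon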

> -Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 10 March 2016 15:01
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] FLoating IP failing over but not failing back
> with active/active LDAP (dirsrv)
> 
> On 03/10/2016 08:48 AM, Bernie Jones wrote:
>> A bit more info..
>>
>>  
>>
>> If, after I restart the failed dirsrv instance, I then perform a "pcs
>> resource cleanup dirsrv-daemon" to clear the FAIL messages then the
> failover
>> will work OK.
>>
>> So it's as if the cleanup is changing the status in some way..
>>
>>  
>>
>> From: Bernie Jones [mailto:ber...@securityconsulting.ltd.uk] 
>> Sent: 10 March 2016 08:47
>> To: 'Cluster Labs - All topics related to open-source clustering welcomed'
>> Subject: [ClusterLabs] Floating IP failing over but not failing back with
>> active/active LDAP (dirsrv)
>>
>>  
>>
>> Hi all, could you advise please?
>>
>>  
>>
>> I'm trying to configure a floating IP with an active/active deployment of
>> 389 directory server. I don't want pacemaker to manage LDAP but just to
>> monitor and switch the IP as required to provide resilience. I've seen
> some
>> other similar threads and based my solution on those.
>>
>>  
>>
>> I've amended the ocf for slapd to work with 389 DS and this tests out OK
>> (dirsrv).
>>
>>  
>>
>> I've then created my resources as below:
>>
>>  
>>
>> pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100"
>> cidr_netmask="32" op monitor timeout="20s" interval="5s" op start
>> interval="0" timeout="20" op stop interval="0" timeout="20"
>>
>> pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv op monitor
>> interval="10" timeout="5" op start interval="0" timeout="5" op stop
>> interval="0" timeout="5" meta "is-managed=false"
> 
> is-managed=false means the cluster will not try to start or stop the
> service. It should never be used in regular production, only when doing
> maintenance on the service.
> 
>> pcs resource clone dirsrv-daemon meta globally-unique="false"
>> interleave="true" target-role="Started" "master-max=2"
>>
>> pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip
>> score=INFINITY
> 
> This constraint means that dirsrv is only allowed to run where dirsrv-ip
> is. I suspect you want the reverse, dirsrv-ip with dirsrv-daemon-clone,
> which means keep the IP with a working dirsrv instance.
> 
>> pcs property set no-quorum-policy=ignore
> 
> If you're using corosync 2, you generally don't need or want this.
> Instead, ensure corosync.conf has two_node: 1 (which will be done
> automatically if you used pcs cluster setup).
> 
>> pcs resource defaults migration-threshold=1
>>
>> pcs property set stonith-enabled=false
>>
>>  
>>
>> On startup all looks well:
>>
>>
> 
>> 
>>
>>  
>>
>> Last updated: Thu Mar 10 08:28:03 2016
>>
>> Last change: Thu Mar 10 08:26:14 2016
>>
>> Stack: cman
>>
>> Current DC: ga2.idam.com - partition with quorum
>>
>> Version: 1.1.11-97629de
>>
>>

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-11 Thread Ken Gaillot
On 03/10/2016 11:36 PM, Сергей Филатов wrote:
> This one is the right log

Something in the cluster configuration and state (for example, an
unsatisfied constraint) is preventing the cluster from starting the
resource:

Mar 10 04:00:53 [11785] controller-1.domain.compengine: info:
native_print: compute-1   (ocf::pacemaker:remote):Stopped
Mar 10 04:00:53 [11785] controller-1.domain.compengine: info:
native_color: Resource compute-1 cannot run anywhere
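
Two read-only commands that may help narrow that down:

pcs constraint --full                # any location/order rules involving compute-1?
crm_simulate -sL | grep compute-1    # allocation scores from the live CIB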


> 
> 
> 
>> On 10 Mar 2016, at 08:17, Сергей Филатов wrote:
>>
>> pcs resource show compute-1
>>
>>  Resource: compute-1 (class=ocf provider=pacemaker type=remote)
>>  Operations: monitor interval=60s (compute-1-monitor-interval-60s)
>>
>> Can’t find _start_0 template in pacemaker logs
>> I don’t have ipv6 address for remote node, but I guess it should be 
>> listening 
>> on both
>>
>> attached pacemaker.log for cluster node
>> 
>>
>>
>>> On 09 Mar 2016, at 10:23, Ken Gaillot wrote:
>>>
>>> On 03/08/2016 11:38 PM, Сергей Филатов wrote:
>>>> ssh -p 3121 compute-1
>>>> ssh_exchange_identification: read: Connection reset by peer
>>>>
>>>> That’s what I get in /var/log/pacemaker.log after restarting 
>>>> pacemaker_remote:
>>>> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
>>>> crm_signal_dispatch:  Invoking handler for signal 15: Terminated
>>>> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
>>>> lrmd_shutdown:Terminating with  0 clients
>>>> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_withdraw:  withdrawing server sockets
>>>> Mar 09 05:30:27 [28031] compute-1.domain.com   lrmd: info: 
>>>> crm_xml_cleanup:  Cleaning up memory from libxml2
>>>> Mar 09 05:30:27 [28193] compute-1.domain.com   lrmd: info: 
>>>> crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
>>>> Mar 09 05:30:27 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: lrmd
>>>> Mar 09 05:30:27 [28193] compute-1.domain.com   lrmd:   notice: 
>>>> lrmd_init_remote_tls_server:  Starting a tls listener on port 3121.
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd:   notice: 
>>>> bind_and_listen:  Listening on address ::
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: cib_ro
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: cib_rw
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: cib_shm
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: attrd
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: stonith-ng
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: 
>>>> qb_ipcs_us_publish:   server name: crmd
>>>> Mar 09 05:30:28 [28193] compute-1.domain.com   lrmd: info: main: Starting
>>>
>>> It looks like the cluster is not even trying to connect to the remote
>>> node. pacemaker_remote here is binding only to IPv6, so the cluster will
>>> need to contact it on that address.
>>>
>>> What is your ocf:pacemaker:remote resource configuratio

Re: [ClusterLabs] Unable to create HAProxy resource: no such resource agent

2016-03-11 Thread Ken Gaillot
On 03/11/2016 02:18 PM, Matthew Mucker wrote:
> I've created a Pacemaker cluster and have created a virtual IP address 
> resource that works properly. I am now attempting to add HAProxy as a 
> resource and I'm having problems.
> 
> 
> - I installed HAProxy on all nodes of the cluster
> 
> - I downloaded http://github.com/russki/cluster-agents/raw/master/haproxy to 
> /usr/lib/ocf/resource.d/heartbeat/haproxy on each node
> 
> - I ran chmod 755 on /usr/lib/ocf/resource.d/heartbeat/haproxy on each node
> 
> - I configured HAProxy.cfg on each node
> 
> - I edited /etc/default/haproxy to enable haproxy

FYI, this file will be ignored when the service is managed by the
cluster (unless the RA specifically reads it, which I've rarely seen).
That won't cause any problems, but any desired settings should be made
in the resource's cluster configuration rather than here.
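
As an illustration only (the parameter name here is a guess -- check the
agent's metadata, e.g. with "crm ra info ocf:heartbeat:haproxy", for what it
really accepts):

crm configure primitive haproxy ocf:heartbeat:haproxy \
    params conffile=/etc/haproxy/haproxy.cfg \
    op monitor interval=15s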

> - I've tested and confirmed that HAProxy will start as a service on the node 
> hosting the virtual IP address

So far, so good. Good prep work.

> 
> However, when I run the command:
> 
> 
> crm configure primitive haproxy ocf:heartbeat:haproxy op monitor interval=15s
> 
> 
> I get output:
> 
> 
> ERROR: None
> ERROR: ocf:heartbeat:haproxy: meta-data contains no resource-agent element
> ERROR: None
> ERROR: ocf:heartbeat:haproxy: meta-data contains no resource-agent element
> ERROR: ocf:heartbeat:haproxy: no such resource agent
> 
> 
> I've been unable to find a solution to this problem in the searching I've 
> done online. Does anyone have any idea what the cause and the solution to 
> this problem might be?

What does this command return:

crm_resource --show-metadata ocf:heartbeat:haproxy

> 
> Thanks,
> 
> 
> -Matthew


Re: [ClusterLabs] Unable to create HAProxy resource: no such resource agent

2016-03-11 Thread Ken Gaillot
On 03/11/2016 03:25 PM, Matthew Mucker wrote:
> I found the problem. When I used wget to retrieve the file, I was actually 
> downloading an HTML error page from my proxy server instead of the intended 
> file.
> 
> 
> Oops.

:-) I've done that before too ...

> 
> 
> 
> 
> I've created a Pacemaker cluster and have created a virtual IP address 
> resource that works properly. I am now attempting to add HAProxy as a 
> resource and I'm having problems.
> 
> 
> - I installed HAProxy on all nodes of the cluster
> 
> - I downloaded http://github.com/russki/cluster-agents/raw/master/haproxy to 
> /usr/lib/ocf/resource.d/heartbeat/haproxy on each node
> 
> - I ran chmod 755 on /usr/lib/ocf/resource.d/heartbeat/haproxy on each node
> 
> - I configured HAProxy.cfg on each node
> 
> - I edited /etc/default/haproxy to enable haproxy
> 
> - I've tested and confirmed that HAProxy will start as a service on the node 
> hosting the virtual IP address
> 
> 
> However, when I run the command:
> 
> 
> crm configure primitive haproxy ocf:heartbeat:haproxy op monitor interval=15s
> 
> 
> I get output:
> 
> 
> ERROR: None
> ERROR: ocf:heartbeat:haproxy: meta-data contains no resource-agent element
> ERROR: None
> ERROR: ocf:heartbeat:haproxy: meta-data contains no resource-agent element
> ERROR: ocf:heartbeat:haproxy: no such resource agent
> 
> 
> I've been unable to find a solution to this problem in the searching I've 
> done online. Does anyone have any idea what the cause and the solution to 
> this problem might be?
> 
> 
> Thanks,
> 
> 
> -Matthew



Re: [ClusterLabs] documentation on STONITH with remote nodes?

2016-03-14 Thread Ken Gaillot
On 03/12/2016 05:07 AM, Adam Spiers wrote:
> Is there any documentation on how STONITH works on remote nodes?  I
> couldn't find any on clusterlabs.org, and it's conspicuously missing
> from:
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/
> 
> I'm guessing the answer is more or less "it works exactly the same as
> for corosync nodes", however I expect there are nuances which might be
> worth documenting.  In particular I'm looking for confirmation that
> STONITH resources for remote nodes will only run on the corosync
> nodes, and can't run on (other) remote nodes.  My empirical tests seem
> to confirm this, but reassurance from an expert would be appreciated :-)

The above link does have some information -- search it for "fencing".

You are correct, only full cluster nodes can run fence devices or
initiate fencing actions.

Fencing of remote nodes (configured via ocf:pacemaker:remote resource)
is indeed identical to fencing of full cluster nodes. You can configure
fence devices for them the same way, and the cluster fences them the
same way.

Fencing of guest nodes (configured via remote-node property of a
resource such as VirtualDomain) is different. For those, fence devices
are ignored, and the cluster fences them by stopping and starting the
resource.
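
For an ocf:pacemaker:remote node, then, a fence device is configured just as
for a cluster node; a rough sketch (device type and credentials are made up
for illustration):

pcs stonith create fence-remote1 fence_ipmilan \
    pcmk_host_list=remote1 ipaddr=10.0.0.51 login=admin passwd=secret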


Re: [ClusterLabs] ClusterIP location constraint reappears after reboot

2016-03-14 Thread Ken Gaillot
net addr add 
> 172.20.240.123/24 brd 172.20.240.255 dev eth0
> Feb 21 23:10:42 g5se-dea2b1 IPaddr2[13282]: INFO: ip link set eth0 up
> Feb 21 23:10:42 g5se-dea2b1 IPaddr2[13282]: INFO: 
> /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p 
> /var/run/heartbeat/rsctmp/send_arp-172.20.240.123 eth0 172.20.240.123 auto 
> not_used not_used
> Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
> operation ClusterIP_start_0 (call=68, rc=0, cib-update=76, confirmed=true) ok
> Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
> action 7: monitor ClusterIP_monitor_3 on g5se-dea2b1 (local)
> Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
> operation ClusterIP_monitor_3 (call=71, rc=0, cib-update=77, 
> confirmed=false) ok
> Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: run_graph: Transition 36 
> (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-138.bz2): Complete
> Feb 21 23:10:42 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> Feb 21 23:10:43 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On 
> loss of CCM Quorum: Ignore
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.71.3
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.72.1 
> 4e5a3b6259a59f84bcfec6d0f16ad3ba
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: --  admin_epoch="0" epoch="71" num_updates="3"/>
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++   
>  role="Started" node="g5se-dea2b1" score="-INFINITY"/>
> Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss 
> of CCM Quorum: Ignore
> Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: LogActions: Stop
> ClusterIP#011(g5se-dea2b1)
> Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
> Calculated Transition 37: /var/lib/pacemaker/pengine/pe-input-139.bz2
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
> action 7: stop ClusterIP_stop_0 on g5se-dea2b1 (local)
> Feb 21 23:10:43 g5se-dea2b1 IPaddr2[13372]: INFO: IP status = ok, IP_CIP=
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: process_lrm_event: LRM 
> operation ClusterIP_stop_0 (call=75, rc=0, cib-update=79, confirmed=true) ok
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: run_graph: Transition 37 
> (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-139.bz2): Complete
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Feb 21 23:10:43 g5se-dea2b1 stonith-ng[1558]:   notice: unpack_config: On 
> loss of CCM Quorum: Ignore
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: --- 0.72.2
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: Diff: +++ 0.73.1 
> 93f902fd51a6750b828144d42f8c7a6e
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: --   
>  role="Started" node="g5se-dea2b1" score="-INFINITY"/>
> Feb 21 23:10:43 g5se-dea2b1 cib[1557]:   notice: cib:diff: ++  num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" 
> cib-last-written="Sun Feb 21 23:10:43 2016" update-origin="g5se-dea2b1" 
> update-client="crm_resource" crm_feature_set="3.0.7" have-quorum="1" 
> dc-uuid="g5se-dea2b1"/>
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: unpack_config: On loss 
> of CCM Quorum: Ignore
> Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: LogActions: Start   
> ClusterIP#011(g5se-dea2b1)
> Feb 21 23:10:43 g5se-dea2b1 pengine[1561]:   notice: process_pe_message: 
> Calculated Transition 38: /var/lib/pacemaker/pengine/pe-input-140.bz2
> Feb 21 23:10:43 g5se-dea2b1 crmd[1562]:   notice: te_rsc_command: Initiating 
> action 6: start ClusterIP_start_0 on g5se-dea2b1 (local)
> 
> 
> 
> -Original Message-
> From: users-requ...@clusterlabs.org [mailto:users-requ...@clusterlabs.org] 
> 

Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2016-03-14 Thread Ken Gaillot
On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:
> I am back to this question =)
> 
> I am still trying to understand the impact of "High CPU load detected"
> messages in the log.
> Looking in the code I figured out that setting "load-threshold" parameter
> to something higher than 100% solves the problem.
> And actually for 8 cores (12 with Hyper Threading) load-threshold=400% kind
> of works.
> 
> Also, I noticed that this parameter may affect "the maximum number of jobs
> that can be scheduled per node", as there is a formula limiting
> F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.
> 
> Is my understanding correct that setting "load-threshold" high enough (so
> there are no noisy messages) affects only "throttle_job_max" and nothing
> more?
> Also, if I got it correctly, "throttle_job_max" is the number of allowed
> parallel actions per node in the lrmd.
> And a child of the lrmd is actually an RA process running some action
> (monitor, start, etc.).
> 
> So there is no impact on how many resources (RAs) can run on a node, only on
> how many of their actions Pacemaker will run in parallel (I am not sure I
> understand this part correctly).

I believe that is an accurate description. I think the job limit applies
to fence actions as well as lrmd actions.

Note that if /proc/cpuinfo exists, pacemaker will figure out the number
of cores from there, and divide the actual reported load by that number
before comparing against load-threshold.
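
If you do want to raise it, load-threshold can be set like any other cluster
property, for example (using the value you mentioned):

pcs property set load-threshold="400%"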

> Thank you,
> Kostia
> 
> On Wed, Jun 3, 2015 at 12:17 AM, Andrew Beekhof  wrote:
> 
>>
>>> On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko <
>> konstantin.ponomare...@gmail.com> wrote:
>>>
>>> I think I wasn't precise in my questions.
>>> So I will try to ask more precise questions.
>>> 1. why the default value for "load-threshold" is 80%?
>>
>> Experimentation showed it better to begin throttling before the node
>> became saturated.
>>
>>> 2. what would be the impact to the cluster in case of
>> "load-threshold=100%”?
>>
>> Your nodes will be busier.  Will they be able to handle your load or will
>> it result in additional recovery actions (creating more load and more
>> failures)?  Only you will know when you try.
>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Mon, May 25, 2015 at 4:11 PM, Kostiantyn Ponomarenko <
>> konstantin.ponomare...@gmail.com> wrote:
>>> Guys, please, if anyone can help me to understand this parameter better,
>> I would be appreciated.
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, May 22, 2015 at 4:15 PM, Kostiantyn Ponomarenko <
>> konstantin.ponomare...@gmail.com> wrote:
>>> Another question - is it crmd specific to measure CPU usage by "I/O
>> wait"?
>>> And if I need to get the most performance of the running resources in
>> cluster, should I set "load-threshold=95%" (or even 100%)?
>>> Will it impact the cluster behavior in any ways?
>>> The man page for crmd says that it will "The cluster will slow down its
>> recovery process when the amount of system resources used (currently CPU)
>> approaches this limit".
>>> Does it mean there will be delays in cluster in moving resources in case
>> a node goes down, or something else?
>>> I just want to understand in better.
>>>
>>> That you in advance for the help =)
>>>
>>> P.S.: The main resource does a lot of disk I/Os.
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, May 22, 2015 at 3:30 PM, Kostiantyn Ponomarenko <
>> konstantin.ponomare...@gmail.com> wrote:
>>> I didn't know that.
>>> You mentioned "as opposed to other Linuxes", but I am using Debian Linux.
>>> Does it also measure CPU usage by I/O waits?
>>> You are right about "I/O waits" (a screenshot of "top" is attached).
>>> But why it shows 50% of CPU usage for a single process (that is the main
>> one) while "I/O waits" shows a bigger number?
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, May 22, 2015 at 9:40 AM, Ulrich Windl <
>> ulrich.wi...@rz.uni-regensburg.de> wrote:
>> "Ulrich Windl"  schrieb am
>> 22.05.2015 um
>>> 08:36 in Nachricht <555eea7202a10001a...@gwsmtp1.uni-regensburg.de>:
 Hi!

 I Linux I/O waits are considered for load (as opposed to other
>> Linuxes) Thus
>>> ^^ "In"
>> s/Linux/UNIX/
>>>
>>> (I should have my coffee now to awake ;-) Sorry.


Re: [ClusterLabs] Cluster failover failure with Unresolved dependency

2016-03-14 Thread Ken Gaillot
On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
> Dear List,
> 
> After creating and testing a simple 2-node active-passive
> drbd+postfix cluster, nearly everything works flawlessly (standby, failure
> of a filesystem resource + failover, split-brain + manual recovery). However,
> when deliberately killing the postfix instance, after reaching the
> migration threshold failover does not occur and the resources revert to the
> Stopped state (except the master-slave drbd resource, which works as
> expected).
> 
> Ordering and colocation is configured, STONITH and quorum disabled, the
> goal is to always have one node running all the resources and at any sign
> of error it should fail over to the passive node, nothing fancy.
> 
> Is my configuration wrong or am I hitting a bug?
> 
> All software from centos 7 + elrepo repositories.

With these versions, you can set "two_node: 1" in
/etc/corosync/corosync.conf (which will be done automatically if you
used "pcs cluster setup" initially), and then you don't need to ignore
quorum in pacemaker.

> Regarding STONITH: the machines are running on free ESXi instances on
> separate machines, so the Vmware fencing agents won't work because in the
> free version the API is read-only.
> Still trying to figure out a way to go, until then manual recovery + huge
> arp cache times on the upstream firewall...
> 
> Please find pe-input*.bz files attached, logs and config below. The
> situation: on node mail1 postfix was killed 3 times (migration threshold),
> it should have failed over to mail2.
> When killing a filesystem resource three times this happens flawlessly.
> 
> Thanks for your input!
> 
> Best regards,
> Lorand
> 
> 
> Cluster Name: mailcluster
> Corosync Nodes:
>  mail1 mail2
> Pacemaker Nodes:
>  mail1 mail2
> 
> Resources:
>  Group: network-services
>   Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
>Operations: start interval=0s timeout=20s (virtualip-1-start-interval-0s)
>stop interval=0s timeout=20s (virtualip-1-stop-interval-0s)
>monitor interval=30s (virtualip-1-monitor-interval-30s)
>  Master: spool-clone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
>   Resource: spool (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=spool
>Operations: start interval=0s timeout=240 (spool-start-interval-0s)
>promote interval=0s timeout=90 (spool-promote-interval-0s)
>demote interval=0s timeout=90 (spool-demote-interval-0s)
>stop interval=0s timeout=100 (spool-stop-interval-0s)
>monitor interval=10s (spool-monitor-interval-10s)
>  Master: mail-clone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
>   Resource: mail (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=mail
>Operations: start interval=0s timeout=240 (mail-start-interval-0s)
>promote interval=0s timeout=90 (mail-promote-interval-0s)
>demote interval=0s timeout=90 (mail-demote-interval-0s)
>stop interval=0s timeout=100 (mail-stop-interval-0s)
>monitor interval=10s (mail-monitor-interval-10s)
>  Group: fs-services
>   Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
>Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4
> options=nodev,nosuid,noexec
>Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)
>stop interval=0s timeout=60 (fs-spool-stop-interval-0s)
>monitor interval=20 timeout=40 (fs-spool-monitor-interval-20)
>   Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
>Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4
> options=nodev,nosuid,noexec
>Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)
>stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
>monitor interval=20 timeout=40 (fs-mail-monitor-interval-20)
>  Group: mail-services
>   Resource: postfix (class=ocf provider=heartbeat type=postfix)
>Operations: start interval=0s timeout=20s (postfix-start-interval-0s)
>stop interval=0s timeout=20s (postfix-stop-interval-0s)
>monitor interval=45s (postfix-monitor-interval-45s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   start network-services then promote mail-clone (kind:Mandatory)
> (id:order-network-services-mail-clone-mandatory)
>   promote mail-clone then promote spool-clone (kind:Mandatory)
> (id:order-mail-clone-spool-clone-mandatory)
>   promote spool-clone then start fs-services (kind:Mandatory)
> (id:order-spool-clone-fs-services-mandatory)
>   start fs-services then start mail-services (kind:Mandatory)
> (id:order-fs-services-mail-services-mandatory)
> Colocation Constr

Re: [ClusterLabs] fence_scsi no such device

2016-03-15 Thread Ken Gaillot
On 03/15/2016 09:10 AM, marvin wrote:
> Hi,
> 
> I'm trying to get fence_scsi working, but I get a "no such device" error.
> It's a two node cluster with nodes called "node01" and "node03". The OS
> is RHEL 7.2.
> 
> here is some relevant info:
> 
> # pcs status
> Cluster name: testrhel7cluster
> Last updated: Tue Mar 15 15:05:40 2016  Last change: Tue Mar 15
> 14:33:39 2016 by root via cibadmin on node01
> Stack: corosync
> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
> 2 nodes and 23 resources configured
> 
> Online: [ node01 node03 ]
> 
> Full list of resources:
> 
>  Clone Set: dlm-clone [dlm]
>  Started: [ node01 node03 ]
>  Clone Set: clvmd-clone [clvmd]
>  Started: [ node01 node03 ]
>  fence-node1(stonith:fence_ipmilan):Started node03
>  fence-node3(stonith:fence_ipmilan):Started node01
>  Resource Group: test_grupa
>  test_ip(ocf::heartbeat:IPaddr):Started node01
>  lv_testdbcl(ocf::heartbeat:LVM):   Started node01
>  fs_testdbcl(ocf::heartbeat:Filesystem):Started node01
>  oracle11_baza  (ocf::heartbeat:oracle):Started node01
>  oracle11_lsnr  (ocf::heartbeat:oralsnr):   Started node01
>  fence-scsi-node1   (stonith:fence_scsi):   Started node03
>  fence-scsi-node3   (stonith:fence_scsi):   Started node01
> 
> PCSD Status:
>   node01: Online
>   node03: Online
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> # pcs stonith show
>  fence-node1(stonith:fence_ipmilan):Started node03
>  fence-node3(stonith:fence_ipmilan):Started node01
>  fence-scsi-node1   (stonith:fence_scsi):   Started node03
>  fence-scsi-node3   (stonith:fence_scsi):   Started node01
>  Node: node01
>   Level 1 - fence-scsi-node3
>   Level 2 - fence-node3
>  Node: node03
>   Level 1 - fence-scsi-node1
>   Level 2 - fence-node1
> 
> # pcs stonith show fence-scsi-node1 --all
>  Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
> pcmk_reboot_action=off
>   Meta Attrs: provides=unfencing
>   Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)
> 
> # pcs stonith show fence-scsi-node3 --all
>  Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
> pcmk_reboot_action=off
>   Meta Attrs: provides=unfencing
>   Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)
> 
> node01 # pcs stonith fence node03
> Error: unable to fence 'node03'
> Command failed: No such device
> 
> node01 # tail /var/log/messages
> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client
> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
> device '(any)'
> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote
> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can
> fence (reboot) node03: static-list
> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence
> (reboot) node03: static-list
> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options
> to fence node03 for stonith_admin.29191@node01.d1df9201 failed

The above line is the key. Both of the devices registered for node03
returned failure. Pacemaker then looked for any other device capable of
fencing node03 and there is none, so that's why it reported "No such
device" (an admittedly obscure message).

It looks like the fence agents require more configuration options than
you have set. If you run "/path/to/fence/agent -o metadata", you can see
the available options. It's a good idea to first get the agent running
successfully manually on the command line ("status" command is usually
sufficient), then put those same options in the cluster configuration.
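
A rough example of that workflow (the device path is a placeholder for
whatever your shared storage actually uses):

fence_scsi -o metadata
fence_scsi -n node03 -d /dev/mapper/mpatha -o status

Once a manual invocation works, mirror the same options (e.g. devices=) in
the stonith resources, along the lines of:

pcs stonith update fence-scsi-node3 devices=/dev/mapper/mpatha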

> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Couldn't find anyone
> to fence (reboot) node03 with fence-node1
> Mar 15 14:54:04 node01 stonith-ng[20024]:   error: Operation reboot of
> node03 by  for stonith_admin.29191@node01.d1df9201: No such device
> Mar 15 14:54:04 node01 crmd[20028]:  notice: Peer node03 was not
> terminated (reboot) by  for node01: No such device
> (ref=d1df9201-5bb1-447f-9b40-d3d7235c3d0a) by client stonith_admin.29191
> 
> node03 # tail /var/log/messages
> Mar 15 14:54:04 node03 stonith-ng[2601]:  notice: fence-scsi-node1 can
> not fence (reboot) node03: static-list
> Mar 15 14:54:04 node03 stonith-ng[2601]:  notice: fence-node1 can not
> fence (reboot) node03: static-list
> Mar 15 14:54:04 node03 stonith-ng[2601]:  notice: Operation reboot of
> node03 by  for stonith_admin.29191@node01.d1df9201: No such device
> Mar 15 14:54:04 node03 crmd[2605]:  notice: Peer node03 was not
> terminated (reboot) by  for node01: No such device
> (ref=d1df9201-5bb1-447f-9b

Re: [ClusterLabs] Problems with pcs/corosync/pacemaker/drbd/vip/nfs

2016-03-15 Thread Ken Gaillot
On 03/14/2016 12:47 PM, Todd Hebert wrote:
> Hello,
> 
> I'm working on setting up a test-system that can handle NFS failover.
> 
> The base is CentOS 7.
> I'm using ZVOL block devices out of ZFS to back DRBD replicated volumes.
> 
> I have four DRBD resources (r0, r1, r2, r3, which are /dev/drbd1 drbd2 drbd3 
> and drbd4 respectively)
> 
> These all have XFS filesystems on them that mount properly and serve content 
> etc..
> 
> I tried using corosync/pacemaker/drbd/lsb:nfs-kernel-server on Ubuntu, and it 
> would serve content on the primary server without issue.  On any attempt to 
> fail over or migrate services, everything would say it migrated fine, and 
> the filesystems would be mounted and readable/writable etc., but the NFS 
> clients' access to them would "pause".
> 
> This appears to be an issue with the nfs-kernel-server in Ubuntu, where it 
> simply would not recognise the NFS session information, which was on a 
> replicated volume.
> 
> If the primary node is put back online, everything migrates back "perfect" 
> and traffic that had "paused" on failover to the secondary system resumes, 
> even if it's been sitting there for 15-20 minutes not working.
> 
> There is no difference in behaviour between offlining the primary node, and 
> migrating lsb:nfs-kernel-server to another node (by its primitive name, not 
> as lsb:nfs-kernel-server, obviously)
> 
> If I create new connections into NFS while test-sanb is active, they work, 
> only to "freeze" as above with an offline or migrate away from test-sanb, so 
> symptoms are the same in both "directions"
> 
> 
> 
> After not being able to get the lsb:nfs-kernel-server working properly in 
> Ubuntu, and reading similar stories from other users after a series of 
> googles, I switched over to CentOS 7.
> On CentOS 7, instead of lsb:nfs-kernel-server, I am trying to use 
> systemd:nfs-server, since CentOS 7 uses systemd rather than SysV init for 
> managing services.

I'm not very familiar with NFS in a cluster, but there is an
ocf:heartbeat:nfsserver resource agent in the resource-agents package.
OCF agents are generally preferable to lsb/systemd because they give
more detailed information to the cluster, and it looks like in this
case, the RA does some RPC commands that the system scripts don't.

I'd give it a shot and see if it helps.
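
A minimal sketch (resource name and parameter value are assumptions;
"pcs resource describe ocf:heartbeat:nfsserver" lists what the agent
actually takes):

pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
    nfs_shared_infodir=/var/lib/nfs/nfsinfo \
    op monitor interval=30s

and then group/colocate it with the filesystems the same way the systemd
resource is now.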

> Pretty much everything in the configuration except lsb:nfs-kernel-server came 
> right over.
> 
> Now, everything will run properly on the primary node (as was the case with 
> Ubuntu) but...
> If I put the "test-sana" node into standby, first NFS stops, then the VIP 
> stops, then the three NFS-shared filesystems get umounted (perfect so far)
> Then.. it appears that parts of the NFS service, either idmapd or rpcbind 
> haven't released their hold on the rpc_pipefs filesystem, so it's still 
> mounted... it's mounted inside /var/lib/nfs, which is on the last drbd volume.
> Pacemaker, or some other element detects that rpc_pipefs was still mounted, 
> umounts it, then umounts /var/lib/nfs, which should clear the way for 
> everything else to work.. but that's not what happens.
> 
> At this point, the ms_drbd_r should demote to "Secondary" on the primary 
> mode, allowing for the secondary node to promote to "Primary" and services to 
> start on "test-sanb", but instead the drbd processes on "test-sana" end up 
> marked as "Stopped" and checking `cat /proc/drbd` shows that the volumes are 
> still Primary/Secondary UpToDate/UpToDate on test-sana (and the opposite on 
> test-sanb)
> 
> It takes AGES (several minutes) for things to reach this state.
> 
> They stay this way indefinitely.  If I manually demote DRBD resources on 
> test-sana, they end up listed as "Master" in a "crm status" or "pcs status" 
> again, and eventually the status changes to "Primary/Secondary" in /proc/drbd 
> as well.
> 
> If I put node test-sana back online (node online test-sana) it takes a few 
> seconds for services to start back up and serve content again.
> 
> Since I cannot get services to run on test-sanb at all thus far, I don't know 
> if the symptoms would be the same in both directions.
> I can't find any differences in the two nodes that should account for this.
> 
> ---
> 
> In any case, what I need to arrive at is a working solution for NFS failure 
> across two nodes.
> 
> I have several systems where I'm using just heartbeat in order to failover IP 
> address, drbd, and nfs, but for single instances of drbd/nfs
> 
> I cannot find any working examples for either Ubuntu 14.04, nor CentOS 7 for 
> this scenario.  (There are some out there for Ubuntu, but they do not appear 
> to actually work with modern pacemaker et al.)
> 
> Does anyone have an example of working configurations for this?
> 
> My existing pacemaker configuration can be found here:  
> http://paste.ie/view/c766a4ff
> 
> As I mentioned, the configurations are nearly identical for both the Ubuntu 
> 14.04 and CentOS 7 setups, and the hardware used is the same i

Re: [ClusterLabs] Cluster failover failure with Unresolved dependency

2016-03-18 Thread Ken Gaillot
ns:   Leave   mail:0  (Master mail1)
> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine: info:
> LogActions:   Leave   mail:1  (Slave mail2)
> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine: info:
> LogActions:   Leave   fs-spool(Started mail1)
> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine: info:
> LogActions:   Leave   fs-mail (Started mail1)
> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine:   notice:
> LogActions:   Stoppostfix (mail1)
> Mar 16 11:38:06 [7420] HWJ-626.domain.local   crmd: info:
> do_state_transition:  State transition S_POLICY_ENGINE ->
> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> origin=handle_response ]
> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine:   notice:
> process_pe_message:   Calculated Transition 2964:
> /var/lib/pacemaker/pengine/pe-input-331.bz2
> Mar 16 11:38:06 [7420] HWJ-626.domain.local   crmd: info:
> do_te_invoke: Processing graph 2964 (ref=pe_calc-dc-1458124686-5542)
> derived from /var/lib/pacemaker/pengine/pe-input-331.bz2
> Mar 16 11:38:06 [7420] HWJ-626.domain.local   crmd:   notice:
> te_rsc_command:   Initiating action 5: stop postfix_stop_0 on mail1
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   Diff: --- 0.215.10 2
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   Diff: +++ 0.215.11 (null)
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   +  /cib:  @num_updates=11
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   +
>  
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
>  @operation_key=postfix_stop_0, @operation=stop,
> @transition-key=5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> @transition-magic=0:0;5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> @call-id=1335, @last-run=1458124686, @last-rc-change=1458124686,
> @exec-time=435
> Mar 16 11:38:07 [7420] HWJ-626.domain.local   crmd: info:
> match_graph_event:Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_process_request:  Completed cib_modify operation for section status: OK
> (rc=0, origin=mail1/crmd/254, version=0.215.11)
> Mar 16 11:38:07 [7420] HWJ-626.domain.local   crmd:   notice:
> te_rsc_command:   Initiating action 12: stop virtualip-1_stop_0 on mail1
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   Diff: --- 0.215.11 2
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   Diff: +++ 0.215.12 (null)
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   +  /cib:  @num_updates=12
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_perform_op:   +
>  
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
>  @operation_key=virtualip-1_stop_0, @operation=stop,
> @transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> @transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> @call-id=1337, @last-run=1458124687, @last-rc-change=1458124687,
> @exec-time=56
> Mar 16 11:38:07 [7420] HWJ-626.domain.local   crmd: info:
> match_graph_event:Action virtualip-1_stop_0 (12) confirmed on mail1
> (rc=0)
> Mar 16 11:38:07 [7415] HWJ-626.domain.localcib: info:
> cib_process_request:  Completed cib_modify operation for section status: OK
> (rc=0, origin=mail1/crmd/255, version=0.215.12)
> Mar 16 11:38:07 [7420] HWJ-626.domain.local   crmd:   notice:
> run_graph:Transition 2964 (Complete=7, Pending=0, Fired=0, Skipped=0,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-331.bz2): Complete
> Mar 16 11:38:07 [7420] HWJ-626.domain.local   crmd: info: do_log:
> FSA: Input I_TE_SUCCESS from notify_crmd() received in state
> S_TRANSITION_ENGINE
> Mar 16 11:38:07 [7420] HWJ-626.domain.local   crmd:   notice:
> do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE [
> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 16 11:38:12 [7415] HWJ-626.domain.localcib: info:
> cib_process_ping: Reporting our current digest to mail2:
> ed43bc3ecf0f15853900ba49fc514870 for 0.215.12 (0x152b110 0)
> 
> 
> On Mon, Mar 14, 2016 at 6:44 PM, Ken Gaillot  wrote:
> 
>> On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
>>> Dear List,
&g

Re: [ClusterLabs] Cluster failover failure with Unresolved dependency

2016-03-19 Thread Ken Gaillot
On 03/16/2016 11:20 AM, Lorand Kelemen wrote:
> Dear Ken,
> 
> I already modified the startup as suggested during testing, thanks! I
> swapped the postfix ocf resource to the amavisd systemd resource, as latter
> controls postfix startup also as it turns out and having both resouces in
> the mail-services group causes conflicts (postfix is detected as not
> running).
> 
> Still experiencing the same behaviour, killing amavisd returns an rc=7 for
> the monitoring operation on the "victim" node, this soungs logical, but the
> logs contain the same: amavisd and virtualip cannot run anywhere.
> 
> I made sure systemd is clean (amavisd = inactive, not running instead of
> failed) and also reset the failcount on all resources before killing
> amavisd.
> 
> How can I make sure to have a clean state for the resources beside above
> actions?

What you did is fine. I'm not sure why amavisd and virtualip can't run.
Can you show the output of "cibadmin -Q" when the cluster is in that state?

> Also note: when causing a filesystem resource to fail (e.g. with unmout),
> the failover happens successfully, all resources are started on the
> "survivor" node.
> 
> Best regards,
> Lorand
> 
> 
> On Wed, Mar 16, 2016 at 4:34 PM, Ken Gaillot  wrote:
> 
>> On 03/16/2016 05:49 AM, Lorand Kelemen wrote:
>>> Dear Ken,
>>>
>>> Thanks for the reply! I lowered migration-threshold to 1 and rearranged
>>> contraints like you suggested:
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote mail-clone then start fs-services (kind:Mandatory)
>>>   promote spool-clone then start fs-services (kind:Mandatory)
>>>   start fs-services then start network-services (kind:Mandatory)
>>
>> Certainly not a big deal, but I would change the above constraint to
>> start fs-services then start mail-services. The IP doesn't care whether
>> the filesystems are up yet or not, but postfix does.
>>
>>>   start network-services then start mail-services (kind:Mandatory)
>>> Colocation Constraints:
>>>   fs-services with spool-clone (score:INFINITY) (rsc-role:Started)
>>> (with-rsc-role:Master)
>>>   fs-services with mail-clone (score:INFINITY) (rsc-role:Started)
>>> (with-rsc-role:Master)
>>>   network-services with mail-services (score:INFINITY)
>>>   mail-services with fs-services (score:INFINITY)
>>>
>>> Now virtualip and postfix becomes stopped, I guess these are relevant
>> but I
>>> attach also full logs:
>>>
>>> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine: info:
>>> native_color: Resource postfix cannot run anywhere
>>> Mar 16 11:38:06 [7419] HWJ-626.domain.localpengine: info:
>>> native_color: Resource virtualip-1 cannot run anywhere
>>>
>>> Interesting, will try to play around with ordering - colocation, the
>>> solution must be in these settings...
>>>
>>> Best regards,
>>> Lorand
>>>
>>> Mar 16 11:38:06 [7415] HWJ-626.domain.localcib: info:
>>> cib_perform_op:   Diff: --- 0.215.7 2
>>> Mar 16 11:38:06 [7415] HWJ-626.domain.localcib: info:
>>> cib_perform_op:   Diff: +++ 0.215.8 (null)
>>> Mar 16 11:38:06 [7415] HWJ-626.domain.localcib: info:
>>> cib_perform_op:   +  /cib:  @num_updates=8
>>> Mar 16 11:38:06 [7415] HWJ-626.domain.localcib: info:
>>> cib_perform_op:   ++
>>>
>> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']:
>>>  >> operation_key="postfix_monitor_45000" operation="monitor"
>>> crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
>>> transition-key="86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
>>> transition-magic="0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
>>> on_node="mail1" call-id="1333" rc-code="7"
>>> Mar 16 11:38:06 [7420] HWJ-626.domain.local   crmd: info:
>>> abort_transition_graph:   Transition aborted by postfix_monitor_45000
>>> 'create' on mail1: Inactive graph
>>> (magic=0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.215.8,
>>> source=process_graph_event:598, 1)
>>> Mar 16 11:38:06 [7420] HWJ-626.domain.local   crmd: info:
>>> update_failcount: Updating failcount for postfix on mail1 after
>> failed
>>> monitor: rc=7 (update=

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Ken Gaillot
On 03/17/2016 05:10 PM, Christopher Harvey wrote:
> If I ignore pacemaker's existence, and just run corosync, corosync
> disagrees about node membership in the situation presented in the first
> email. While it's true that stonith just happens to quickly correct the
> situation after it occurs it still smells like a bug in the case where
> corosync in used in isolation. Corosync is after all a membership and
> total ordering protocol, and the nodes in the cluster are unable to
> agree on membership.
> 
> The Totem protocol specifies a ring_id in the token passed in a ring.
> Since all of the 3 nodes but one have formed a new ring with a new id
> how is it that the single node can survive in a ring with no other
> members passing a token with the old ring_id?
> 
> Are there network failure situations that can fool the Totem membership
> protocol or is this an implementation problem? I don't see how it could
> not be one or the other, and it's bad either way.

Neither, really. In a split brain situation, there simply is not enough
information for any protocol or implementation to reliably decide what
to do. That's what fencing is meant to solve -- it provides the
information that certain nodes are definitely not active.

There's no way for either side of the split to know whether the opposite
side is down, or merely unable to communicate properly. If the latter,
it's possible that they are still accessing shared resources, which
without proper communication, can lead to serious problems (e.g. data
corruption of a shared volume).



Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind

2016-03-19 Thread Ken Gaillot
On 03/15/2016 06:47 PM, Mike Bernhardt wrote:
> Not sure if this is a BIND question or a PCS/Corosync question, but
> hopefully someone has done this before:
> 
>  
> 
> I'm setting up a new CentOS 7 DNS server cluster to replace our very old
> CentOS 4 cluster. The old one uses heartbeat which is no longer supported,
> so I'm now using pcs, corosync, and pacemaker.  The new one is running the
> latest 9.10.x production release of BIND. I want BIND to listen on, query
> from, etc on a particular IP address, which is virtualized with pacemaker. 
> 
>  
> 
> This worked fine on the old cluster. But whereas heartbeat would create a
> virtual subinterface (i.e. eth0:0) to support the virtual IP, corosync does
> not do that; at least it doesn't by default. So although the virtual IP
> exists and is pingable, it is not tied to a "physical" interface- ifconfig
> does not find it. And when BIND tries to start up, it fails because it can't
> find the virtual IP it's configured to run on, even though it is reachable.
> I only need IPv4, not IPv6.

The old subinterfaces are no longer needed in linux-land for "virtual"
IPs, which are now first-class citizens, just one of multiple addresses
assigned to the interface. "ip addr" (the replacement for ifconfig)
will show both addresses on the same interface; plain ifconfig only
lists labeled secondary addresses, which is why it doesn't see the VIP.

BIND shouldn't have any problems finding the IP. Can you show the error
messages that come up, and your pacemaker configuration?

>  
> 
> So, I'm hoping that there is a way to tell corosync (hopefully using pcsd)
> to create a virtual interface, not just a virtual address, so BIND can find
> it.
> 
>  
> 
> Thanks in advance!




Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-19 Thread Ken Gaillot
On 03/16/2016 05:22 AM, Nikhil Utane wrote:
> I see following info gets updated in CIB. Can I use this or there is better
> way?
> 
>  crm-debug-origin="peer_update_callback" join="*down*" expected="member">

in_ccm/crmd/join reflect the current state of the node (as known by the
partition that you're looking at the CIB on), so if the node went down
and came back up, it won't tell you anything about being down.

- in_ccm indicates that the node is part of the underlying cluster layer
(heartbeat/cman/corosync)

- crmd indicates that the node is communicating at the pacemaker layer

- join indicates what phase of the join process the node is at

There's no direct way to see which node went down after the fact, but
there are several indirect ways:

- if the node was running resources, those will be failed, and those
failures (including node) will be shown in the cluster status

- the logs show all node membership events; you can search for logs such
as "state is now lost" and "left us"

- "stonith -H $NODE_NAME" will show the fence history for a given node,
so if the node went down due to fencing, it will show up there

- you can configure an ocf:pacemaker:ClusterMon resource to run crm_mon
periodically and run a script for node events, and you can write the
script to do whatever you want (email you, etc.) (in the upcoming 1.1.15
release, built-in notifications will make this more reliable and easier,
but any script you use with ClusterMon will still be usable with the new
method)
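
For example (node name and log path are placeholders):

# crm_mon -1                        # failed actions include the node they failed on
# stonith_admin -H node2            # fencing history involving node2
# grep -e "state is now lost" -e "left us" /var/log/messages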

> On Wed, Mar 16, 2016 at 12:40 PM, Nikhil Utane 
> wrote:
> 
>> Hi Ken,
>>
>> Sorry about the long delay. This activity was de-focussed but now it's
>> back on track.
>>
>> One part of question that is still not answered is on the newly active
>> node, how to find out which was the node that went down?
>> Anything that gets updated in the status section that can be read and
>> figured out?
>>
>> Thanks.
>> Nikhil
>>
>> On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot  wrote:
>>
>>> On 01/08/2016 11:13 AM, Nikhil Utane wrote:
>>>>> I think stickiness will do what you want here. Set a stickiness higher
>>>>> than the original node's preference, and the resource will want to stay
>>>>> where it is.
>>>>
>>>> Not sure I understand this. Stickiness will ensure that resources don't
>>>> move back when original node comes back up, isn't it?
>>>> But in my case, I want the newly standby node to become the backup node
>>> for
>>>> all other nodes. i.e. it should now be able to run all my resource
>>> groups
>>>> albeit with a lower score. How do I achieve that?
>>>
>>> Oh right. I forgot to ask whether you had an opt-out
>>> (symmetric-cluster=true, the default) or opt-in
>>> (symmetric-cluster=false) cluster. If you're opt-out, every node can run
>>> every resource unless you give it a negative preference.
>>>
>>> Partly it depends on whether there is a good reason to give each
>>> instance a "home" node. Often, there's not. If you just want to balance
>>> resources across nodes, the cluster will do that by default.
>>>
>>> If you prefer to put certain resources on certain nodes because the
>>> resources require more physical resources (RAM/CPU/whatever), you can
>>> set node attributes for that and use rules to set node preferences.
>>>
>>> Either way, you can decide whether you want stickiness with it.
>>>
>>>> Also can you answer, how to get the values of node that goes active and
>>> the
>>>> node that goes down inside the OCF agent?  Do I need to use
>>> notification or
>>>> some simpler alternative is available?
>>>> Thanks.
>>>>
>>>>
>>>> On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot 
>>> wrote:
>>>>
>>>>> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
>>>>>> Would like to validate my final config.
>>>>>>
>>>>>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
>>>>>> standby server.
>>>>>> The standby server should take up the role of active that went down.
>>> Each
>>>>>> active has some unique configuration that needs to be preserved.
>>>>>>
>>>>>> 1) So I will create total 5 groups. Each group has a
>>> "heartbeat::IPaddr2
>>>>>> resource (for virtual IP) and my custom resource.
>>>>

Re: [ClusterLabs] attrd: Fix sigsegv on exit if initialization failed

2016-03-19 Thread Ken Gaillot
On 10/12/2015 06:08 AM, Vladislav Bogdanov wrote:
> Hi,
> 
> This was caught with 0.17.1 libqb, which didn't play well with long pids.
> 
> commit 180a943846b6d94c27b9b984b039ac0465df64da
> Author: Vladislav Bogdanov 
> Date:   Mon Oct 12 11:05:29 2015 +
> 
> attrd: Fix sigsegv on exit if initialization failed
> 
> diff --git a/attrd/main.c b/attrd/main.c
> index 069e9fa..94e9212 100644
> --- a/attrd/main.c
> +++ b/attrd/main.c
> @@ -368,8 +368,12 @@ main(int argc, char **argv)
>  crm_notice("Cleaning up before exit");
> 
>  election_fini(writer);
> -crm_client_disconnect_all(ipcs);
> -qb_ipcs_destroy(ipcs);
> +
> +if (ipcs) {
> +crm_client_disconnect_all(ipcs);
> +qb_ipcs_destroy(ipcs);
> +}
> +
>  g_hash_table_destroy(attributes);
> 
>  if (the_cib) {

I set aside this message to merge it, then promptly lost it ... finally
ran across it again. It's merged into master now. Thanks for reporting
the problem and a patch.



Re: [ClusterLabs] reproducible split brain

2016-03-19 Thread Ken Gaillot
On 03/16/2016 03:04 PM, Christopher Harvey wrote:
> On Wed, Mar 16, 2016, at 04:00 PM, Digimer wrote:
>> On 16/03/16 03:59 PM, Christopher Harvey wrote:
>>> I am able to create a split brain situation in corosync 1.1.13 using
>>> iptables in a 3 node cluster.
>>>
>>> I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5
>>>
>>> All nodes are operational and form a 3 node cluster with all nodes are
>>> members of that ring.
>>> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-4 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-5 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> so far so good.
>>>
>>> running the following on vmr-132-4 drops all incoming (but not outgoing)
>>> packets from vmr-132-3:
>>> # iptables -I INPUT -s 192.168.132.3 -j DROP
>>> # iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target prot opt source   destination
>>> DROP   all  --  192.168.132.3anywhere
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target prot opt source   destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target prot opt source   destination
>>>
>>> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-4 ---> Online: [ vmr-132-4 vmr-132-5 ]
>>> vmr-132-5 ---> Online: [ vmr-132-4 vmr-132-5 ]
>>>
>>> vmr-132-3 thinks everything is normal and continues to provide service,
>>> vmr-132-4 and 5 form a new ring, achieve quorum and provide the same
>>> service. Splitting the link between 3 and 4 in both directions isolates
>>> vmr 3 from the rest of the cluster and everything fails over normally,
>>> so only a unidirectional failure causes problems.
>>>
>>> I don't have stonith enabled right now, and looking over the
>>> pacemaker.log file closely to see if 4 and 5 would normally have fenced
>>> 3, but I didn't see any fencing or stonith logs.
>>>
>>> Would stonith solve this problem, or does this look like a bug?
>>
>> It should, that is its job.
> 
> is there some log I can enable that would say
> "ERROR: hey, I would use stonith here, but you have it disabled! your
> warranty is void past this point! do not pass go, do not file a bug"?

Enable fencing, and create a fence device with a static host list that
doesn't match any of your nodes. Pacemaker will think fencing is
configured, but when it tries to actually fence a node, no devices will
be capable of it, and there will be errors to that effect (including "No
such device"). The cluster will block at that point. You can use
stonith_admin --confirm to manually indicate the node is down and
unblock the cluster (but be absolutely sure the node really is down!).
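
A rough sketch of that setup (assuming pcs and fence_ipmilan; the
address, credentials and device name are placeholders):

# pcs property set stonith-enabled=true
# pcs stonith create fence-nomatch fence_ipmilan \
      ipaddr=192.0.2.99 login=admin passwd=secret \
      pcmk_host_list=no-such-node pcmk_host_check=static-list
# stonith_admin --confirm vmr-132-3   # only once you've verified the node really is down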

>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?




Re: [ClusterLabs] Moving resources and implicit bans - please explain?

2016-03-19 Thread Ken Gaillot
On 03/16/2016 02:38 PM, Matthew Mucker wrote:
> I have set up my first three-node Pacemaker cluster and was doing some 
> testing by using "crm resource move" commands. I found that once I moved a 
> resource off a particular node, it would not come back up on that node. I 
> spent a while troubleshooting and eventually gave up and rebuilt the node.
> 
> After rebuild, the same thing happened. I then found in the documentation for 
> the crm_resource command under the move command "NOTE: This may prevent the 
> resource from running on the previous location node until the implicit 
> constraints expire or are removed with −−unban"
> 
> This is a regrettably vague note. What dictates the conditions for "may 
> prevent?" How do I determine what implicit constraints are present on my 
> resources and when they'll expire?
> 
> I did find that explicitly removing bans with crm_resource -U solved my 
> problem. However, I'd like to understand this further. Any explanation would 
> be appreciated. A Google search on "pacemaker move resource ban" didn't find 
> me anything that was obviously authoritative.
> 
> I'd appreciate any expertise the community could share with me!
> 
> Thanks,
> 
> -Matthew

Hi Matthew,

Sorry for that unpleasant detour. It catches a lot of people (at least
once ...).

Pacemaker dynamically chooses what node to run a resource on based on
many factors in the configuration and current state of the cluster. For
example, you can specify utilization values for each resource (RAM, CPU,
whatever) and a placement-strategy to balance them across nodes. You can
configure rules to change cluster behavior based on the time of day. You
can specify location constraints to say a resource prefers to run on or
not to run on a particular node, and colocation constraints to say a
resource prefers to run with another particular resource.

Given that dynamism, there is really no concept of "moving" a resource
in pacemaker. The ideal location is recalculated constantly based on
changing conditions.

To get around that, the various cluster tools emulate "moving" by
creating location constraints -- either a constraint pinning a resource
to a particular destination node, or a constraint banning the resource
from the original node. That forces pacemaker to accept a certain idea
of "best" location.

Those constraints stay in place permanently until you explicitly remove
them. Only you know why the resource needed to be moved, so only you
will know when it's safe to let it float again. The various tools all
offer an option to do this (like crm_resource -U, or pcs resource clear,
etc.).
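
For example, with a hypothetical resource named my-rsc (the constraint
IDs shown are the typical auto-generated ones):

# crm_resource --resource my-rsc --move --node node2   # creates a cli-prefer-my-rsc constraint
# crm_resource --resource my-rsc --ban --node node1    # creates a cli-ban-my-rsc-on-node1 constraint
# crm_resource --resource my-rsc --clear               # same as -U: removes those constraints again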

What happens when you clear the constraints? Pacemaker recalculates the
ideal location without the constraint. Maybe the ideal location is where
it already is, in which case nothing happens, or maybe the ideal
location is now somewhere else, in which case it will move again.
"Stickiness" is a resource option that allows you to tell pacemaker to
give a preference to a resource's current location, to make it less
likely to move.

Hope that saves you some more detours ...



Re: [ClusterLabs] Cluster goes to unusable state if fencing resource is down

2016-03-20 Thread Ken Gaillot
On 03/18/2016 02:58 AM, Arjun Pandey wrote:
> Hi
> 
> I am running a 2 node cluster with this config on centos 6.6  where i
> have a multi-state resource foo being run in master/slave mode and  a
> bunch of floating IP addresses configured. Additionally i have a
> collocation constraint for the IP addr to be collocated with the
> master.
> 
> When i configure fencing using fence_ilo4 agents things work fine.
> However during testing i was trying out a case where the ilo cable is
> plugged out. In this case the entire cluster is brought down.
> 
> I understand that this seems to be a safer solution to ensure
> correctness and consistency of the systems. However my requirement was

Exactly. Without working fencing, the cluster can't know whether the
node is really down, or just malfunctioning and possibly still accessing
shared resources.

> to still keep it operational since the application and the floating ip
> are still up. Is there a way to acheive this ?

If fencing fails, and the node is really down, you'd be fine ignoring
the failure. But if the node is actually up, ignoring the failure means
both nodes will activate the floating IP, which will not be operational
(packets will sometimes go to one node, sometimes the other, disrupting
any reliable communication).

> Also considering a case where there is a multi node cluster ( more
> than 10 nodes )  and one of the machines just goes down along with the
> ilo resource for that node. Does it really make sense to bring the
> services down even when the rest of nodes are up ?

It makes sense if data integrity is your highest priority. Imagine a
cluster used by a bank for customer's account balances -- it's far
better to lock up the entire cluster than risk corrupting that data.

The best solution that pacemaker offers in this situation is fencing
topology. You can have multiple fence devices, and if one fails,
pacemaker will try the next.

One common deployment is IPMI as the first level (as you have now), with
an intelligent power switch as the second (backup) level. If IPMI
doesn't respond, the cluster will cut power to the host. Another
possibility is to use an intelligent network switch to cut off network
access to the failed node (if that is sufficient to prevent the node
from accessing any shared resources). If the services being offered are
important enough to require high availability, the relatively small cost
of an intelligent power switch should be easily justified, serving as a
type of insurance.
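
A rough sketch with pcs-style syntax (device names are placeholders;
level 1 is tried first, level 2 only if every device at level 1 fails):

# pcs stonith level add 1 node1 fence-ipmi-node1
# pcs stonith level add 2 node1 fence-apc-node1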

Not having fencing has such a high chance of making a huge mess that no
company I know of that supports clusters will support a cluster without it.

That said, if you are supporting your own clusters, understand the
risks, and are willing to deal with the worst-case scenario manually,
pacemaker does offer the option to disable stonith. There is no built-in
option to try stonith but ignore any failures. However, it is possible
to configure a fencing topology that does the same thing, if the second
level simply pretends that the fencing succeeded. I'm not going to
encourage that by describing how ;)



Re: [ClusterLabs] Reload operation for multi-state resource agent

2016-03-21 Thread Ken Gaillot
On 03/19/2016 03:35 AM, Michael Lychkov wrote:
> Hello everyone,
> 
> Is there way to initiate reload operation call of master instance of
> multi-state resource agent?
> 
> I have an ocf multi-state resource agent for a daemon service and I
> added reload op into this resource agent:
> 
> * two parameters of resource agent:
> 
> 
> ...
> 
> 
> ...
> 
> 
> * reload op declaration in meta-data:
> 
> 
> 
> * reload op processing:
> 
> case "$1" in
> 
> monitor)svc_monitor
> exit $?;;
> ...
> reload) svc_reload
> exit $?;;
> ...
> 
> When I change *init_svc_reload *parameter to a different value, reload
> operation is executed only for slave instance of resource agent.
> The only impact on master instance is early execution of monitor
> operation, but I'd rather prefer reload execution for this instance.

This sounds like a bug. I'd try upgrading to RHEL 6.7 first (even if
only on a test machine), as it has additional bugfixes. If it still
occurs after that, I'd recommend opening a bug at
https://bugzilla.redhat.com/ with your resource agent and the output of
crm_report.

> [root@vm1 ~]# rpm -qi pacemaker
> Name: pacemakerRelocations: (not relocatable)
> Version : 1.1.12Vendor: Red Hat, Inc.
> Release : 4.el6 Build Date: Thu 03 Jul
> 2014 04:05:56 PM MSK
> ...
> 
> [root@vm1 ~]# rpm -qi corosync
> Name: corosync Relocations: (not relocatable)
> Version : 1.4.7 Vendor: Red Hat, Inc.
> Release : 2.el6 Build Date: Mon 02 Mar
> 2015 08:21:24 PM MSK
> ...
> 
> [root@pershing-vm5 ~]# rpm -qi resource-agents
> Name: resource-agents  Relocations: (not relocatable)
> Version : 3.9.5 Vendor: Red Hat, Inc.
> Release : 12.el6_6.4Build Date: Thu 12 Feb
> 2015 01:13:26 AM MSK
> ...
> 
> [root@vm1 ~]# lsb_release -d
> Description:Red Hat Enterprise Linux Server release 6.6 (Santiago)
> 
> ---
> 
> Best regards, Mike Lychkov.



Re: [ClusterLabs] no clone for pcs-based cluster fencing?

2016-03-21 Thread Ken Gaillot
On 03/20/2016 06:20 PM, Devin Reade wrote:
> I'm looking at a new pcs-style two node cluster running on CentOS 7
> (pacemaker 1.1.13, corosync 2.3.4) and crm_mon shows this line
> for my fencing resource, that is the resource running on only one of
> the two nodes:
> 
>fence_cl2 (stonith:fence_apc_snmp):Started nodeB
> 
> On an older CentOS 5 cluster (pacemaker 1.0.12, corosync 1.2.7),
> crm_mon shows as the comparable line the following, that is the
> fencing agent cloned across both nodes:
> 
>Clone Set: pdu-fence
>Started: [ node1 node2 ]
> 
> With the new pcs-based clusters, are the (APC SNMP) fencing agents not
> supposed to be cloned anymore, or do I just have things misconfigured?

It's actually newer pacemaker versions rather than pcs itself. Fence
agents do not need to be cloned, or even running -- as long as they're
configured and enabled, any node can use the resource. The node that's
"running" the fence device will be the one that monitors it, and as a
result it will be preferred to execute the fencing if possible.

For fence_apc_snmp in particular, you want to use pcmk_host_map instead
of pcmk_host_list/pcmk_host_check, to map node names to APC ports. For
example, pcmk_host_map="nodeA:1;nodeB:2" if those are the relevant APC
ports.
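
For example, assuming nodeA and nodeB are plugged into APC ports 1 and
2 (adjust to your wiring; with pcmk_host_map set, the existing
pcmk_host_list/pcmk_host_check settings become redundant):

# pcs stonith update fence_cl2 pcmk_host_map="nodeA:1;nodeB:2"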

> Looking at the pcs man page, it looks like cloning a fencing agent isn't
> supported.
> 
> I was pretty sure I tested fencing on both nodes when I first configured
> the cluster, but I don't have a record of where the fencing agent was
> running at the time.  I was using the RHEL7 cluster configuration guide
> and I don't seem to see anything in there talking about cloning the fencing
> agent, so I'm wondering if it's no longer a recommended/required procedure.
> 
> New cluster config:
> 
>  Stonith Devices:
>Resource: fence_cl2 (class=stonith type=fence_apc_snmp)
> Attributes: pcmk_host_list=nodeA,nodeB ipaddr=192.168.1.200
> community=somepass pcmk_host_check=static-list
> Operations: monitor interval=60s (fence_cl2-monitor-interval-60s)
> 
> Old cluster config:
> 
>   primitive st-pdu stonith:apcmastersnmp \
>params ipaddr="192.168.1.200" community="somepass" port="161" \
>op start interval="0" timeout="60s" \
>op stop interval="0" timeout="15s" \
>op monitor interval="3600" timeout="60s"
>   clone pdu-fence st-pdu
> 
> 
> Devin




Re: [ClusterLabs] fence_scsi no such device

2016-03-21 Thread Ken Gaillot
On 03/21/2016 08:39 AM, marvin wrote:
> 
> 
> On 03/15/2016 03:39 PM, Ken Gaillot wrote:
>> On 03/15/2016 09:10 AM, marvin wrote:
>>> Hi,
>>>
>>> I'm trying to get fence_scsi working, but i get "no such device" error.
>>> It's a two node cluster with nodes called "node01" and "node03". The OS
>>> is RHEL 7.2.
>>>
>>> here is some relevant info:
>>>
>>> # pcs status
>>> Cluster name: testrhel7cluster
>>> Last updated: Tue Mar 15 15:05:40 2016  Last change: Tue Mar 15
>>> 14:33:39 2016 by root via cibadmin on node01
>>> Stack: corosync
>>> Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with
>>> quorum
>>> 2 nodes and 23 resources configured
>>>
>>> Online: [ node01 node03 ]
>>>
>>> Full list of resources:
>>>
>>>   Clone Set: dlm-clone [dlm]
>>>   Started: [ node01 node03 ]
>>>   Clone Set: clvmd-clone [clvmd]
>>>   Started: [ node01 node03 ]
>>>   fence-node1(stonith:fence_ipmilan):Started node03
>>>   fence-node3(stonith:fence_ipmilan):Started node01
>>>   Resource Group: test_grupa
>>>   test_ip(ocf::heartbeat:IPaddr):Started node01
>>>   lv_testdbcl(ocf::heartbeat:LVM):   Started node01
>>>   fs_testdbcl(ocf::heartbeat:Filesystem):Started node01
>>>   oracle11_baza  (ocf::heartbeat:oracle):Started node01
>>>   oracle11_lsnr  (ocf::heartbeat:oralsnr):   Started node01
>>>   fence-scsi-node1   (stonith:fence_scsi):   Started node03
>>>   fence-scsi-node3   (stonith:fence_scsi):   Started node01
>>>
>>> PCSD Status:
>>>node01: Online
>>>node03: Online
>>>
>>> Daemon Status:
>>>corosync: active/enabled
>>>pacemaker: active/enabled
>>>pcsd: active/enabled
>>>
>>> # pcs stonith show
>>>   fence-node1(stonith:fence_ipmilan):Started node03
>>>   fence-node3(stonith:fence_ipmilan):Started node01
>>>   fence-scsi-node1   (stonith:fence_scsi):   Started node03
>>>   fence-scsi-node3   (stonith:fence_scsi):   Started node01
>>>   Node: node01
>>>Level 1 - fence-scsi-node3
>>>Level 2 - fence-node3
>>>   Node: node03
>>>Level 1 - fence-scsi-node1
>>>Level 2 - fence-node1
>>>
>>> # pcs stonith show fence-scsi-node1 --all
>>>   Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
>>>Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
>>> pcmk_reboot_action=off
>>>Meta Attrs: provides=unfencing
>>>Operations: monitor interval=60s
>>> (fence-scsi-node1-monitor-interval-60s)
>>>
>>> # pcs stonith show fence-scsi-node3 --all
>>>   Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
>>>Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
>>> pcmk_reboot_action=off
>>>Meta Attrs: provides=unfencing
>>>Operations: monitor interval=60s
>>> (fence-scsi-node3-monitor-interval-60s)
>>>
>>> node01 # pcs stonith fence node03
>>> Error: unable to fence 'node03'
>>> Command failed: No such device
>>>
>>> node01 # tail /var/log/messages
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client
>>> stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
>>> device '(any)'
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote
>>> operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can
>>> fence (reboot) node03: static-list
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence
>>> (reboot) node03: static-list
>>> Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options
>>> to fence node03 for stonith_admin.29191@node01.d1df9201 failed
>> The above line is the key. Both of the devices registered for node03
>> returned failure. Pacemaker then looked for any other device capable of
>> fencing node03 and there is none, so that's why it reported "No such
>> device" (an admittedly obscure message).
>>
>> It looks like the fence agents require more configuration options than
>> you have set. If you run &q

Re: [ClusterLabs] Antw: Re: no clone for pcs-based cluster fencing?

2016-03-21 Thread Ken Gaillot
On 03/21/2016 09:34 AM, Ulrich Windl wrote:
>>>> Ken Gaillot  wrote on 21.03.2016 at 15:22 in
>>>> message
> <56f003b0.4020...@redhat.com>:
> 
> [...]
>> It's actually newer pacemaker versions rather than pcs itself. Fence
>> agents do not need to be cloned, or even running -- as long as they're
>> configured and enabled, any node can use the resource. The node that's
>> "running" the fence device will be the one that monitors it, and as a
>> result it will be preferred to execute the fencing if possible.
> 
> Q: How can I find out which node is running the fencing device when not 
> having configured a monitor?

Fence devices show up in status output like any other resource. If
there's no monitor configured, "running" a fence device has no real
effect on anything.

> 
> [...]
> 
> Regards,
> Ulrich
> 
> 




Re: [ClusterLabs] "No such device" with fence_pve agent

2016-03-22 Thread Ken Gaillot
On 03/22/2016 06:32 AM, Stanislav Kopp wrote:
> Hi,
> 
> I have problem with using "fence_pve" agent with pacemaker, the agent
> works fine from command line, but if I simulate stonith action or use
> "crm node fence ", it doesn't work:
> 
>  Mar 22 10:38:06 [675] redis2 stonith-ng: debug:
> xml_patch_version_check: Can apply patch 0.50.22 to 0.50.21
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug: stonith_command:
> Processing st_query 0 from redis1 ( 0)
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug: stonith_query: Query
>  st_async_id="da507055-afc3-4b5c-bcc0-f887dd1736d3" st_op="st_query"
> st_callid="2" st_callopt="0"
> st_remote_op="da507055-afc3-4b5c-bcc0-f887dd1736d3" st_target="redis1"
> st_device_action="reboot" st_origin="redis1"
> st_clientid="95f8f271-e698-4a9b-91ae-950c55c230a5"
> st_clientname="crmd.674" st_timeout="60" src="redis1"/>
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug: get_capable_devices:
> Searching through 1 devices to see what is capable of action (reboot)
> for target redis1
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug:
> schedule_stonith_command: Scheduling list on stonith-redis1 for
> stonith-ng (timeout=60s)
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug: stonith_command:
> Processed st_query from redis1: OK (0)
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug: stonith_action_create:
> Initiating action list for agent fence_pve (target=(null))
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug:
> internal_stonith_action_execute: forking
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug:
> internal_stonith_action_execute: sending args
> Mar 22 10:38:06 [675] redis2 stonith-ng: debug:
> stonith_device_execute: Operation list on stonith-redis1 now running
> with pid=8707, timeout=60s
> Mar 22 10:38:07 [675] redis2 stonith-ng: debug:
> stonith_action_async_done: Child process 8707 performing action 'list'
> exited with rc 0

There is a fence parameter pcmk_host_check that specifies how pacemaker
determines which fence devices can fence which nodes. The default is
dynamic-list, which means to run the fence agent's list command to get
the nodes. So that's what we're seeing above ...

> Mar 22 10:38:07 [675] redis2 stonith-ng: info: dynamic_list_search_cb:
> Refreshing port list for stonith-redis1
> Mar 22 10:38:07 [675] redis2 stonith-ng: debug:
> search_devices_record_result: Finished Search. 0 devices can perform
> action (reboot) on node redis1

... however, not all fence agents can figure out their targets
dynamically. Above, we can see that either that's the case here, or
the device really can't fence redis1.

> Mar 22 10:38:07 [675] redis2 stonith-ng: debug:
> stonith_query_capable_device_cb: Found 0 matching devices for 'redis1'
> Mar 22 10:38:07 [675] redis2 stonith-ng: debug: stonith_command:
> Processing st_notify reply 0 from redis1 ( 0)
> Mar 22 10:38:07 [675] redis2 stonith-ng: debug:
> process_remote_stonith_exec: Marking call to reboot for redis1 on
> behalf of crmd.674@da507055-afc3-4b5c-bcc0-f887dd1736d3.redis1: No
> such device (-19)
> Mar 22 10:38:07 [675] redis2 stonith-ng: notice: remote_op_done:
> Operation reboot of redis1 by  for crmd.674@redis1.da507055:
> No such device
> Mar 22 10:38:07 [675] redis2 stonith-ng: debug: stonith_command:
> Processed st_notify reply from redis1: OK (0)
> Mar 22 10:38:07 [679] redis2 crmd: notice: tengine_stonith_notify:
> Peer redis1 was not terminated (reboot) by  for redis1: No
> such device (ref=da507055-afc3-4b5c-bcc0-f887dd1736d3) by client
> crmd.674
> Connection to 192.168.122.137 closed by remote host.
> 
> I already read similar thread ("fence_scsi no such device"), but
> didn't find anything what can help me. I'm using pacemaker
> 1.1.14-2~bpo8+1 with corosync 2.3.5-3~bpo8+1 on Debian jessie.
> 
> some info:
> 
> redis1:~# stonith_admin -L
>  stonith-redis2
> 1 devices found
> 
> redis1:~# crm configure show
> node 3232266889: redis2
> node 3232266923: redis1
> primitive ClusterIP IPaddr2 \
> params ip=192.168.122.10 nic=eth0 \
> op monitor interval=10s \
> meta is-managed=true
> primitive stonith-redis1 stonith:fence_pve \
> params ipaddr=192.168.122.6 \
> params login="root@pam" passwd=secret port=100 \
> op start interval=0 timeout=60s \
> meta target-role=Started is-managed=true

You can specify pcmk_host_list or pcmk_host_map to use a static target
list for the device. For example pcmk_host_list=redis1 would say this
fence device can target redis1 only. pcmk_host_map is the same but lets
you specify a different name for the target when calling the device --
for example, pcmk_host_map=redis1:1 would target redis1, but send just
"1" to the device.

> primitive stonith-redis2 stonith:fence_pve \
> params ipaddr=192.168.122.7 \
> params login="root@pam" passwd=secret port=101 \
> op start interval=0 timeout=60s \
> meta target-role=Started is-managed=true
> location loc_stonith-redis1 stonith-redis1 -inf: redis1
> location loc

Re: [ClusterLabs] Resource creation fails

2016-03-22 Thread Ken Gaillot
On 03/22/2016 07:37 AM, Nagorny, Dimitry wrote:
> Hi,
> 
> Forgot to attach the file, sorry.
> 
> 
> Respectfully
> Dimitry Nagorny
> Trainee
> 
> From: Nagorny, Dimitry [mailto:dimitry.nago...@robot5.de]
> Sent: Tuesday, March 22, 2016 13:32
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Resource creation fails
> 
> Good afternoon all,
> 
> My setup is that I have successfully running two CentOS 7.2 Server nodes in a 
> pacemaker cluster with failover on a virtual IP. Both Servers act as 
> OpenSIPS. I wrote my own ocf script to surveil that the opensips pid is still 
> there so I can assume opensips is still running (very basic but its all I 
> want for the moment).
> 
> My problem: If I want to create the resource I am getting messages that I 
> can't interpret and can't find anything about it with the help of google:
> 
> Traceback (most recent call last):
>   File "/usr/sbin/pcs", line 219, in 
> main(sys.argv[1:])
>   File "/usr/sbin/pcs", line 159, in main
> cmd_map[command](argv)
>   File "/usr/lib/python2.7/site-packages/pcs/resource.py", line 42, in 
> resource_cmd
> resource_create(res_id, res_type, ra_values, op_values, meta_values, 
> clone_opts)
>   File "/usr/lib/python2.7/site-packages/pcs/resource.py", line 538, in 
> resource_create
> bad_opts, missing_req_opts = utils.validInstanceAttributes(ra_id, params 
> , get_full_ra_type(ra_type, True))
>   File "/usr/lib/python2.7/site-packages/pcs/utils.py", line 1770, in 
> validInstanceAttributes
> for action in actions.findall("parameter"):
> AttributeError: 'NoneType' object has no attribute 'findall'
> 
> I get this after entering: pcs resource create opensips ocf:opensips:opensips 
> op monitor interval=15s
> The ocf is stored in /usr/lib/ocf/resource.d/opensips/opensips.
> Please find attached the ocf script. If anything else is needed please ask.
> 
> 
> 1.   What wants the traceback tell me?

It indicates a bug in pcs, most likely in response to unexpected
behavior from the resource agent.

> 2.   Is there any way to check the ocf on centOS7? ocf-tester is not 
> available.

I'm not sure why it isn't, maybe something distro-specific in it, but it
would be nice. However you can test it like this:

OCF_ROOT=/usr/lib/ocf [OCF_RESKEY_<param>=<value> ...] /path/to/agent $ACTION

where the param/value pairs are whatever parameters the agent accepts
and $ACTION is whatever action you want to test.
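
For instance, with the agent path from your message:

# OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/opensips/opensips monitor
# echo $?   # 0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING, anything else = an error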

> 3.   Any hint on how to solve this?

The major things that jump out at me in your agent:
* missing a <parameters> section (I'm not sure, but it may be required,
even if empty)
* missing the validate-all command (I believe it is required, even if it
does nothing)
* start action must check whether the process is already running, and
return success if so
* making stop do nothing seems like a bad idea to me
* monitor must distinguish between cleanly stopped (OCF_NOT_RUNNING) and
failed/unknown (OCF_ERR_GENERIC)
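
To illustrate the last two points, a rough shell sketch (the pid file
path and start command are placeholders, not taken from your agent):

OCF_SUCCESS=0; OCF_ERR_GENERIC=1; OCF_NOT_RUNNING=7   # standard OCF exit codes

pidfile="/var/run/opensips/opensips.pid"              # placeholder

opensips_monitor() {
    [ -f "$pidfile" ] || return $OCF_NOT_RUNNING      # no pid file: cleanly stopped
    pid=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS                           # daemon is alive
    fi
    return $OCF_ERR_GENERIC                           # pid file present but process gone
}

opensips_start() {
    if opensips_monitor; then
        return $OCF_SUCCESS                           # already running: start is a no-op
    fi
    /usr/sbin/opensips -P "$pidfile" || return $OCF_ERR_GENERIC   # placeholder start command
    while ! opensips_monitor; do sleep 1; done        # don't return until monitor would pass
    return $OCF_SUCCESS
}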

> 
> Very Respectfully
> Dimitry Nagorny
> Trainee




Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-22 Thread Ken Gaillot
interval=0 timeout=60 \
> params plugin_config="/etc/neutron/l3_agent.ini" 
> remove_artifacts_on_stop_start=true
> primitive p_neutron-metadata-agent ocf:fuel:ocf-neutron-metadata-agent \
> op monitor interval=60 timeout=10 \
> op start interval=0 timeout=30 \
> op stop interval=0 timeout=30
> primitive p_neutron-plugin-openvswitch-agent ocf:fuel:ocf-neutron-ovs-agent \
> op monitor interval=20 timeout=10 \
> 
>> On 11 Mar 2016, at 14:11, Ken Gaillot  wrote:
>>
>> On 03/10/2016 11:36 PM, Сергей Филатов wrote:
>>> This one is the right log
>>
>> Something in the cluster configuration and state (for example, an
>> unsatisfied constraint) is preventing the cluster from starting the
>> resource:
>>
>> Mar 10 04:00:53 [11785] controller-1.domain.compengine: info:
>> native_print: compute-1   (ocf::pacemaker:remote):Stopped
>> Mar 10 04:00:53 [11785] controller-1.domain.compengine: info:
>> native_color: Resource compute-1 cannot run anywhere
>>
>>
>>>
>>>
>>>
>>>> On 10 Mar 2016, at 08:17, Сергей Филатов >>> <mailto:filat...@gmail.com>> wrote:
>>>>
>>>> pcs resource show compute-1
>>>>
>>>> Resource: compute-1 (class=ocf provider=pacemaker type=remote)
>>>> Operations: monitor interval=60s (compute-1-monitor-interval-60s)
>>>>
>>>> Can’t find _start_0 template in pacemaker logs
>>>> I don’t have ipv6 address for remote node, but I guess it should be 
>>>> listening 
>>>> on both
>>>>
>>>> attached pacemaker.log for cluster node
>>>> 
>>>>
>>>>
>>>>> On 09 Mar 2016, at 10:23, Ken Gaillot >>>> <mailto:kgail...@redhat.com>> wrote:
>>>>>
>>>>> On 03/08/2016 11:38 PM, Сергей Филатов wrote:
>>>>>> ssh -p 3121 compute-1
>>>>>> ssh_exchange_identification: read: Connection reset by peer
>>>>>>
>>>>>> That’s what I get in /var/log/pacemaker.log after restarting 
>>>>>> pacemaker_remote:
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: crm_signal_dispatch:  Invoking handler for signal 
>>>>>> 15: 
>>>>>> Terminated
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: lrmd_shutdown:Terminating with  0 clients
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: qb_ipcs_us_withdraw:  withdrawing server sockets
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: crm_xml_cleanup:  Cleaning up memory from 
>>>>>> libxml2
>>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: crm_log_init: Changed active directory to 
>>>>>> /var/lib/heartbeat/cores/root
>>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: qb_ipcs_us_publish:   server name: lrmd
>>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd:   notice: lrmd_init_remote_tls_server:  Starting a tls 
>>>>>> listener 
>>>>>> on port 3121.
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd:   notice: bind_and_listen:  Listening on address ::
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: qb_ipcs_us_publish:   server name: cib_ro
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com 
>>>>>> <http://compute-1.domain.com/> 
>>>>>>  lrmd: info: qb_ipcs_us_publish:   server name: cib_rw
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com 
>>>>>> <http:/

Re: [ClusterLabs] "No such device" with fence_pve agent

2016-03-23 Thread Ken Gaillot
On 03/23/2016 06:41 AM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> There is a fence parameter pcmk_host_check that specifies how pacemaker
>> determines which fence devices can fence which nodes. The default is
>> dynamic-list, which means to run the fence agent's list command to get
>> the nodes.  [...]
>>
>> You can specify pcmk_host_list or pcmk_host_map to use a static target
>> list for the device.
> 
> I meant to research this, but now that you brought it up: does the
> default of pcmk_host_check automatically change to static-list if
> pcmk_host_list is defined?

If pcmk_host_check is specified, it is used;

otherwise, if pcmk_host_list and/or pcmk_host_map are specified,
pcmk_host_check=static-list;

otherwise, if the device supports the list or status commands,
pcmk_host_check=dynamic-list;

otherwise, pcmk_host_check=none.


> Does pcmk_host_map override pcmk_host_list?  Does it play together with
> pcmk_host_check=dynamic-list?

If pcmk_host_check=none, all devices can fence all nodes (i.e. if you
explicitly specify none, pcmk_host_list/pcmk_host_map are ignored);

otherwise, if pcmk_host_check=static-list, the values of pcmk_host_list
and/or pcmk_host_map are allowed targets (i.e. they combine if both are
specified, though that would be poor practice as it's confusing to read);

otherwise, the device is queried (i.e. static and dynamic never combine).



Re: [ClusterLabs] attrd does not clean per-node cache after node removal

2016-03-23 Thread Ken Gaillot
On 03/23/2016 07:35 AM, Vladislav Bogdanov wrote:
> Hi!
> 
> It seems like atomic attrd in post-1.1.14 (eb89393) does not
> fully clean node cache after node is removed.

Is this a regression? Or have you only tried it with this version?

> After our QA guys remove node wa-test-server-ha-03 from a two-node cluster:
> * stop pacemaker and corosync on wa-test-server-ha-03
> * remove node wa-test-server-ha-03 from corosync nodelist on 
> wa-test-server-ha-04
> * tune votequorum settings
> * reload corosync on wa-test-server-ha-04
> * remove node from pacemaker on wa-test-server-ha-04
> * delete everything from /var/lib/pacemaker/cib on wa-test-server-ha-03
> , and then join it with the different corosync ID (but with the same node 
> name),
> we see the following in logs:
> 
> Leave node 1 (wa-test-server-ha-03):
> Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: 
> crm_update_peer_proc: Node wa-test-server-ha-03[1] - state is now lost (was 
> member)
> Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Removing all 
> wa-test-server-ha-03 (1) attributes for attrd_peer_change_cb
> Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Lost attribute 
> writer wa-test-server-ha-03
> Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Removing 
> wa-test-server-ha-03/1 from the membership list
> Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Purged 1 peers 
> with id=1 and/or uname=wa-test-server-ha-03 from the membership cache
> Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Processing 
> peer-remove from wa-test-server-ha-04: wa-test-server-ha-03 0
> Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Removing all 
> wa-test-server-ha-03 (0) attributes for wa-test-server-ha-04
> Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Removing 
> wa-test-server-ha-03/1 from the membership list
> Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Purged 1 peers 
> with id=0 and/or uname=wa-test-server-ha-03 from the membership cache
> 
> Join node 3 (the same one, wa-test-server-ha-03, but ID differs):
> Mar 23 04:21:23 wa-test-server-ha-04 attrd[25962]: notice: 
> crm_update_peer_proc: Node wa-test-server-ha-03[3] - state is now member (was 
> (null))
> Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
> Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296
> Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
> Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79
> Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 
> share the same name 'wa-test-server-ha-03'

It took me a while to understand the above combination of messages. This
is not node 3 joining. This is node 1 joining after node 3 has already
been seen.

The warnings are a complete dump of the peer cache. So you can see that
wa-test-server-ha-03 is listed only once, with id 3.

The critical message ("Node 1 and 3") lists the new id first and the
found ID second. So id 1 is what it's trying to add to the cache.

Did you update the node ID in corosync.conf on *both* nodes?

> Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]:   notice: Node 
> 'wa-test-server-ha-03' has changed its ID from 1 to 3
> Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
> Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296
> Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
> Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79
> Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 
> share the same name 'wa-test-server-ha-03'
> Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:   notice: Node 
> 'wa-test-server-ha-03' has changed its ID from 1 to 3
> Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
> Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296
> Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
> Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79
> Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 
> share the same name 'wa-test-server-ha-03'
> Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:   notice: Node 
> 'wa-test-server-ha-03' has changed its ID from 3 to 1
> ...
> 
> On the node being joined:
> Mar 23 04:21:23 wa-test-server-ha-03 attrd[15260]:   notice: Connecting to 
> cluster infrastructure: corosync
> Mar 23 04:21:23 wa-test-server-ha-03 attrd[15260]:   notice: 
> crm_update_peer_proc: Node wa-test-server-ha-03[3] - state is now member (was 
> (null))
> Mar 23 04:21:24 wa-test-server-ha-03 attrd[15260]:   notice: 
> crm_update_peer_proc: Node wa-test-server-ha-04[2] - state is now member (was 
> (null))
> Mar 23 04:21:24 wa-test-server-ha-03 attrd[15260]:   notice: Recorded 
> attribu

[ClusterLabs] Regressions found in 1.1.14

2016-03-24 Thread Ken Gaillot
A couple of regressions have been found in the pacemaker 1.1.14 release
that may affect some users.

Commit 0fe7a4dd introduced a scalability regression. Previously, if the
compressed CIB was greater than 1MB in size, pacemaker took advantage of
libqb support for larger message sizes when available; the regression
inadvertently blocked that capability. This has been fixed by commits
eff10246 and b2c591ed.

Commit 8b98a9b2 introduced a bug where if an unseen node is fenced, and
remains unseen after the fence, it will still (incorrectly) be
considered unclean, causing a fence loop. This has been fixed by commit
98457d16.

Packagers and users compiling from source are encouraged to backport
those commits as patches, or to use the latest master branch head, which
also includes other, less significant bugfixes and is believed to be stable.

These will of course be fixed in the future 1.1.15 release.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Make sure either 0 or all resources in a group are running

2016-03-28 Thread Ken Gaillot
On 03/28/2016 02:19 PM, Sam Gardner wrote:
> Is there any way to modify the behavior of a resource group N of A, B, and C 
> so that either A, B, and C are running on the same node, or none of them are?
> 
> With Pacemaker 1.1.12 and Corosync 1.4.8, if a group N is defined via:
> pcs resource group N A B C
> 
> if resource C cannot run, A and B still do.
> 
> --
> Sam Gardner
> Trustwave | SMART SECURITY ON DEMAND

The problem with that model is that none of the resources can be placed
or started, because each depends on the others being placed and started
already.

I can think of two similar alternatives, though they would only work for
failures, not for any other reasons C might be stopped:

* Use on-fail=standby, so that if any resource fails, all resources are
forced off that node. The node must be manually taken out of standby to
be used again.

* Use rules to say that A cannot run on any node where fail-count-B gt 0
or fail-count-C gt 0, and B cannot run on any node where fail-count-C gt
0. (The group should handle the rest of the dependencies.)
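In pcs, that second alternative might look roughly like this (untested;
the fail-count attribute names simply follow the resource names):

  pcs constraint location A rule score=-INFINITY fail-count-B gt 0 or fail-count-C gt 0
  pcs constraint location B rule score=-INFINITY fail-count-C gt 0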


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] cloned pingd resource problem

2016-03-30 Thread Ken Gaillot
On 03/30/2016 08:38 AM, fatcha...@gmx.de wrote:
> Hi,
> 
> I'm running a two-node cluster on a fully updated CentOS 7 
> (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64). I see a lot of 
> this in the logfiles on one of our nodes:
> 
> Mar 30 12:32:13 localhost crmd[12986]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
> origin=crm_timer_popped ]
> Mar 30 12:32:13 localhost pengine[12985]:  notice: On loss of CCM Quorum: 
> Ignore
> Mar 30 12:32:13 localhost pengine[12985]: warning: Processing failed op 
> monitor for ping_fw:0 on kathie2: unknown error (1)
> Mar 30 12:32:13 localhost pengine[12985]: warning: Processing failed op start 
> for ping_fw:1 on stacy2: unknown error (1)
> Mar 30 12:32:13 localhost pengine[12985]: warning: Forcing ping_fw-clone away 
> from stacy2 after 100 failures (max=100)
> Mar 30 12:32:13 localhost pengine[12985]: warning: Forcing ping_fw-clone away 
> from stacy2 after 100 failures (max=100)

Pacemaker monitors the resource by calling its resource agent's monitor
action every 45 seconds. The first warning above indicates that the
resource agent returned a generic error code on kathie2, which in this
case (ocf:pacemaker:ping) means that the specified IP (192.168.16.1) did
not respond to ping.

The second warning indicates that the instance on stacy2 failed to
start, which again in this case means that the IP did not respond to a
ping from that node. The last two warnings indicate that pacemaker
retried the start continuously and eventually gave up.

> Mar 30 12:32:13 localhost pengine[12985]:  notice: Calculated Transition 
> 1823: /var/lib/pacemaker/pengine/pe-input-355.bz2
> Mar 30 12:32:13 localhost crmd[12986]:  notice: Transition 1823 (Complete=0, 
> Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-355.bz2): Complete
> Mar 30 12:32:13 localhost crmd[12986]:  notice: State transition 
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
> origin=notify_crmd ]
> 
> 
> The configuration looks like this:
> 
> Clone: ping_fw-clone
>   Resource: ping_fw (class=ocf provider=pacemaker type=ping)
>Attributes: dampen=5s multiplier=1000 host_list=192.168.16.1 timeout=60
>Operations: start interval=0s timeout=60 (ping_fw-start-interval-0s)
>stop interval=0s timeout=20 (ping_fw-stop-interval-0s)
>monitor interval=45 (ping_fw-monitor-interval-45)
> 
> 
> What can I do to resolve the problem ? 

The problem is that ping from the nodes to 192.168.16.1 does not always
work. This could be expected in your environment, or could indicate a
networking issue. But it's outside pacemaker's control; pacemaker is
simply monitoring it and reporting when there's a problem.
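Once pings to that address work reliably again, you can clear the
accumulated failures so the clone is retried on stacy2, for example:

  pcs resource cleanup ping_fw-clone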

> Any suggestions are welcome
> 
> Kind regards
> 
> fatcharly


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker on-fail standby recovery does not start DRBD slave resource

2016-03-30 Thread Ken Gaillot
On 03/30/2016 11:20 AM, Sam Gardner wrote:
> I have configured some network resources to automatically standby their node 
> if the system detects a failure on them. However, the DRBD slave that I have 
> configured does not automatically restart after the node is "unstandby-ed" 
> after the failure-timeout expires.
> Is there any way to make the "stopped" DRBDSlave resource automatically start 
> again once the node is recovered?
> 
> See the  progression of events below:
> 
> Running cluster:
> Wed Mar 30 16:04:20 UTC 2016
> Cluster name:
> Last updated: Wed Mar 30 16:04:20 2016
> Last change: Wed Mar 30 16:03:24 2016
> Stack: classic openais (with plugin)
> Current DC: ha-d1.tw.com - partition with quorum
> Version: 1.1.12-561c4cf
> 2 Nodes configured, 2 expected votes
> 7 Resources configured
> 
> 
> Online: [ ha-d1.tw.com ha-d2.tw.com ]
> 
> Full list of resources:
> 
>  Resource Group: network
>  inif   (ocf::custom:ip.sh):   Started ha-d1.tw.com
>  outif  (ocf::custom:ip.sh):   Started ha-d1.tw.com
>  dmz1   (ocf::custom:ip.sh):   Started ha-d1.tw.com
>  Master/Slave Set: DRBDMaster [DRBDSlave]
>  Masters: [ ha-d1.tw.com ]
>  Slaves: [ ha-d2.tw.com ]
>  Resource Group: filesystem
>  DRBDFS (ocf::heartbeat:Filesystem):Started ha-d1.tw.com
>  Resource Group: application
>  service_failover   (ocf::custom:service_failover):Started 
> ha-d1.tw.com
> 
> 
> version: 8.4.5 (api:1/proto:86-101)
> srcversion: 315FB2BBD4B521D13C20BF4
> 
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
> ns:4 nr:0 dw:4 dr:757 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> [153766.565352] block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), 
> RLE 21(1), total 21; compression: 100.0%
> [153766.568303] block drbd1: receive bitmap stats [Bytes(packets)]: plain 
> 0(0), RLE 21(1), total 21; compression: 100.0%
> [153766.568316] block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1
> [153766.568356] block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1 exit code 255 (0xfffe)
> [153766.568363] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent 
> -> Inconsistent )
> [153766.568374] block drbd1: Began resync as SyncSource (will sync 4 KB [1 
> bits set]).
> [153766.568444] block drbd1: updated sync UUID 
> B0DA745C79C56591:36E0631B6F022952:36DF631B6F022952:133127197CF097C6
> [153766.577695] block drbd1: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
> [153766.577700] block drbd1: updated UUIDs 
> B0DA745C79C56591::36E0631B6F022952:36DF631B6F022952
> [153766.577705] block drbd1: conn( SyncSource -> Connected ) pdsk( 
> Inconsistent -> UpToDate )¯
> 
> Failure detected:
> Wed Mar 30 16:08:22 UTC 2016
> Cluster name:
> Last updated: Wed Mar 30 16:08:22 2016
> Last change: Wed Mar 30 16:03:24 2016
> Stack: classic openais (with plugin)
> Current DC: ha-d1.tw.com - partition with quorum
> Version: 1.1.12-561c4cf
> 2 Nodes configured, 2 expected votes
> 7 Resources configured
> 
> 
> Node ha-d1.tw.com: standby (on-fail)
> Online: [ ha-d2.tw.com ]
> 
> Full list of resources:
> 
>  Resource Group: network
>  inif   (ocf::custom:ip.sh):   Started ha-d1.tw.com
>  outif  (ocf::custom:ip.sh):   Started ha-d1.tw.com
>  dmz1   (ocf::custom:ip.sh):   FAILED ha-d1.tw.com
>  Master/Slave Set: DRBDMaster [DRBDSlave]
>  Masters: [ ha-d1.tw.com ]
>  Slaves: [ ha-d2.tw.com ]
>  Resource Group: filesystem
>  DRBDFS (ocf::heartbeat:Filesystem):Started ha-d1.tw.com
>  Resource Group: application
>  service_failover   (ocf::custom:service_failover):Started 
> ha-d1.tw.com
> 
> Failed actions:
> dmz1_monitor_7000 on ha-d1.tw.com 'not running' (7): call=156, 
> status=complete, last-rc-change='Wed Mar 30 16:08:19 2016', queued=0ms, 
> exec=0ms
> 
> 
> 
> version: 8.4.5 (api:1/proto:86-101)
> srcversion: 315FB2BBD4B521D13C20BF4
> 
>  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
> ns:4 nr:0 dw:4 dr:765 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> [153766.568356] block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1 exit code 255 (0xfffe)
> [153766.568363] block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent 
> -> Inconsistent )
> [153766.568374] block drbd1: Began resync as SyncSource (will sync 4 KB [1 
> bits set]).
> [153766.568444] block drbd1: updated sync UUID 
> B0DA745C79C56591:36E0631B6F022952:36DF631B6F022952:133127197CF097C6
> [153766.577695] block drbd1: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
> [153766.577700] block drbd1: updated UUIDs 
> B0DA745C79C56591::36E0631B6F022952:36DF631B6F022952
> [153766.577705] block drbd1: conn( SyncSource -> Connected ) pdsk( 
> Inconsistent -> UpToDate )
> [154057.455270] e1000: eth2 NIC Link is Down
> [154057.455451] e1000 :02:02.0 eth2: Reset adapter
> 
> Failover complete:
> Wed Mar 30

Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2016-03-30 Thread Ken Gaillot
On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:
> Ken, thank you for the answer.
> 
> Every node in my cluster under normal conditions has "load average" of
> about 420. It is mainly connected to the high disk IO on the system.
> My system is designed to use almost 100% of its hardware (CPU/RAM/disks),
> so the situation when the system consumes almost all HW resources is
> normal.

420 suggests that HW resources are outstripped -- anything above the
system's number of cores means processes are waiting for some resource.
(Although with an I/O-bound workload like this, the number of cores
isn't very important -- most will be sitting idle despite the high
load.) And if that's during normal conditions, what will happen during a
usage spike? It sounds like a recipe for less-than-HA.

Under high load, there's a risk of a feedback loop, where monitors
time out, causing pacemaker to schedule recovery actions, which push
the load even higher and cause more monitors to time out, and so on.
That's why throttling is there.

> I would like to get rid of "High CPU load detected" messages in the
> log, because
> they flood corosync.log as well as system journal.
> 
> Maybe you can give an advice what would be the best way do to it?
> 
> So far I came up with the idea of setting "load-threshold" to 1000% ,
> because of:
> 420(load average) / 24 (cores) = 17.5 (adjusted_load);
> 2 (THROTLE_FACTOR_HIGH) * 10 (throttle_load_target) = 20
> 
> if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
> crm_notice("High %s detected: %f", desc, load);

That should work, as far as reducing the log messages, though of course
it also reduces the amount of throttling pacemaker will do.

> In this case do I need to set "node-action-limit" to something less than "2
> x cores" (which is default).

It's not necessary, but it would help compensate for the reduced
throttling by imposing a maximum number of actions run at one time.
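Both are ordinary cluster properties, so (using the numbers from your own
calculation purely as an illustration) something like:

  pcs property set load-threshold=1000%
  pcs property set node-action-limit=24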

I usually wouldn't recommend reducing log verbosity, because detailed
logs are often necessary for troubleshooting cluster issues, but if your
logs are on the same I/O controller that is overloaded, you might
consider logging only to syslog and not to an additional detail file.
That would cut back on the amount of I/O due to pacemaker itself. You
could even drop PCMK_logpriority to warning, but then you're losing even
more information.

> Because the logic is (crmd/throttle.c):
> 
> switch(r->mode) {
> case throttle_extreme:
> case throttle_high:
> jobs = 1; /* At least one job must always be allowed */
> break;
> case throttle_med:
> jobs = QB_MAX(1, r->max / 4);
> break;
> case throttle_low:
> jobs = QB_MAX(1, r->max / 2);
> break;
> case throttle_none:
> jobs = QB_MAX(1, r->max);
> break;
> default:
> crm_err("Unknown throttle mode %.4x on %s", r->mode, node);
> break;
> }
> return jobs;
> 
> 
> The thing is, I know that there is "High CPU load" and that this is a normal
> state, but I want Pacemaker not to keep telling me about it, and to treat this
> state the best it can.

If you can't improve your I/O performance, what you suggested is
probably the best that can be done.

When I/O is that critical to you, there are many tweaks that can make a
big difference in performance. I'm not sure how familiar you are with
them already. Options depend on what your storage is (local or network,
hardware/software/no RAID, etc.) and what your I/O-bound application is
(database, etc.), but I'd look closely at cache/buffer settings at all
levels from hardware to application, RAID stripe alignment, filesystem
choice and tuning, log verbosity, etc.

> 
> Thank you,
> Kostia
> 
> On Mon, Mar 14, 2016 at 7:18 PM, Ken Gaillot  wrote:
> 
>> On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:
>>> I am back to this question =)
>>>
>>> I am still trying to understand the impact of "High CPU load detected"
>>> messages in the log.
>>> Looking in the code I figured out that setting "load-threshold" parameter
>>> to something higher than 100% solves the problem.
>>> And actually for 8 cores (12 with Hyper Threading) load-threshold=400%
>> kind
>>> of works.
>>>
>>> Also I noticed that this parameter may have an impact on the number of
>> "the
>>> maximum number of jobs that can be scheduled per node". As there is a
>>> formula to limit F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.
>

Re: [ClusterLabs] spread out resources

2016-04-01 Thread Ken Gaillot
On 03/30/2016 08:37 PM, Ferenc Wágner wrote:
> Hi,
> 
> I've got a couple of resources (A, B, C, D, ... more than cluster nodes)
> that I want to spread out to different nodes as much as possible.  They
> are all the same, there's no distinguished one amongst them.  I tried
> 
> 
>   
> 
> 
> 
> 
>   
>   
> 
> 
> But crm_simulate did not finish with the above in the CIB.
> What's a good way to get this working?

Per the docs, "A colocated set with sequential=false makes sense only if
there is another set in the constraint. Otherwise, the constraint has no
effect." Using sequential=false would allow another set to depend on all
these resources, without them depending on each other.

I haven't actually tried resource sets with negative scores, so I'm not
sure what happens there. With sequential=true, I'd guess that each
resource would avoid the resource listed before it, but not necessarily
any of the others.

By default, pacemaker does spread things out as evenly as possible, so I
don't think anything special is needed. If you want more control over
the assignment, you can look into placement strategies:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617282141104
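As a minimal sketch (untested), switching the strategy is a single
property; the real work is then defining utilization attributes (cpu,
memory, ...) for your nodes and resources as described at that link:

  pcs property set placement-strategy=balanced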

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in 1.1.15: recurring stop operations

2016-04-01 Thread Ken Gaillot
In current versions of Pacemaker, trying to configure a stop operation
with a time interval results in a warning about an "invalid configuration."

A popular request from the community has been to enable this feature,
and I am proud to announce recurring stop operations will be part of the
future 1.1.15 release.

A common situation where a recurring stop operation is required is when
a particular provided service is funded in the budget for only part of
the time. Now, configuring a stop operation with interval=36h allows you
to stop providing the service every day and a half.

Another common use case requested by users is to more evenly distribute
the staff utilization at 24-hour NOC facilities. With an interval=8h stop
operation, you can be sure that you will get your salaries' worth from
every NOC shift.

Lastly, some users have requested sysadmin training exercises. With this
new feature, it is possible to use rules to apply the interval only
during conditions of your choosing. For example, you can set a 2-hour
stop interval to apply only during full moons that occur on a Friday.
That will give a thorough disaster training workout to your new sysadmins.
-- 
Ken Gaillot 

P.S. Please check the date of this post before replying. ;)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD on asymmetric-cluster

2016-04-04 Thread Ken Gaillot
On 04/02/2016 01:16 AM, Jason Voorhees wrote:
> Hello guys:
> 
> I've been recently reading "Pacemaker - Clusters from scratch" and
> working on a CentOS 7 system with pacemaker 1.1.13, corosync-2.3.4 and
> drbd84-utils-8.9.5.
> 
> The PDF instructs how to create a DRBD resource that seems to be
> automatically started due to a symmetric-cluster setup.
> 
> However I want to setup an asymmetric-cluster/opt-in
> (symmetric-cluster=false) but I don't know how to configure a
> constraint to prefer node1 over node2 to start my DRBD resource as
> Master (Primary).

I thought location constraints supported role, but that isn't
documented, so I'm not sure. But it is documented with regard to rules,
which using pcs might look like:

pcs constraint location clusterdataClone rule \
  role=master \
  score=50 \
  '#uname' eq nodo1

For a lower-level explanation of rules, see
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617356537136

> So far this are my resources and constraints:
> 
> [root@nodo1 ~]# pcs resource
>  IPService  (ocf::heartbeat:IPaddr2):   Started nodo1
>  Web(systemd:httpd):Started nodo1
>  Master/Slave Set: clusterdataClone [clusterdata]
>  Stopped: [ nodo1 nodo2 ]
> 
> [root@nodo1 ~]# pcs constraint
> Location Constraints:
>   Resource: IPService
> Enabled on: nodo2 (score:50)
> Enabled on: nodo1 (score:100)
>   Resource: Web
> Enabled on: nodo2 (score:50)
> Enabled on: nodo1 (score:100)
> Ordering Constraints:
>   start IPService then start Web (kind:Mandatory)
> Colocation Constraints:
>   Web with IPService (score:INFINITY)
> 
> My current DRBD status:
> 
> [root@nodo1 ~]# drbdadm role clusterdb
> 0: Failure: (127) Device minor not allocated
> additional info from kernel:
> unknown minor
> Command 'drbdsetup-84 role 0' terminated with exit code 10
> 
> 
> [root@nodo2 ~]# drbdadm role clusterdb
> 0: Failure: (127) Device minor not allocated
> additional info from kernel:
> unknown minor
> Command 'drbdsetup-84 role 0' terminated with exit code 10
> 
> 
> I know that it's possible to configure my cluster as asymmetric and
> use constraints to avoid a resource running (or becoming master) on
> certain nodes, but this time I would like to learn how to do it with
> an opt-in scenario.
> 
> Thanks in advance for your help.
> 
> P.D. nodo1 & nodo2 are spanish names for node1 and node2
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker on-fail standby recovery does not start DRBD slave resource

2016-04-06 Thread Ken Gaillot
On 03/30/2016 12:18 PM, Sam Gardner wrote:
> I'll check about the cluster-recheck-interval. Attached is a crm_report.
> 
> In the meantime, what is all performed on that interval? The Red Hat docs
> say the following, which doesn't make much sense to me:

Normally, the cluster only recalculates what actions need to be taken
when an interesting event occurs -- node or resource failure,
configuration change, node attribute change, etc.

The cluster-recheck-interval allows that recalculation to happen
regardless of (the lack of) events. For example, let's say you have
rules that specify that certain constraints only apply between 9am and
5pm. If there are no events happening at 9am, the rules won't actually
be noticed or take effect. So the cluster-recheck-interval is the
granularity of such "time-based changes". A cluster-recheck-interval of
5m ensures the rules kick in no later than 9:05am.
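For reference, it's an ordinary cluster property, e.g.:

  pcs property set cluster-recheck-interval=5min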

Looking at the crm_report:

I see "Configuration ERRORs found during PE processing.  Please run
"crm_verify -L" to identify issues." The offending bit is described a
little earlier: "error: RecurringOp: Invalid recurring action
DRBDSlave-start-interval-30s wth name: 'start'". There was a discussion
on the mailing list recently about this -- a recurring start action is
meaningless.

That operation will be ignored. If you want to set on-fail=standby for
DRBD starts, use an interval of 0.

I'd recommend running "crm_verify -L" to see if there are any other
issues, and take care of them. Once you have a clean crm_verify, run
"cibadmin --upgrade" to upgrade the XML of your configuration to the
latest schema. This is just good housekeeping when keeping an older
configuration after pacemaker upgrades.

I see "e1000: eth2 NIC Link is Down" shortly before the issue. If you're
using ifdown/ifup to test failure, be aware that corosync can't recover
from that particular scenario (known issue, nontrivial to fix). It's
recommended to simulate a network failure by blocking corosync traffic
via the local firewall (both inbound and outbound). Or of course you can
unplug a network cable.

Are you limited to the "classic openais (with plugin)" cluster stack?
Corosync 2 is preferred these days, and even corosync 1 + CMAN gets more
testing than the old plugin.

If it still happens after looking into those items, I'd need logs from
both nodes from the failure time to a couple minutes after the
unstandby. The other node will be the DC at this point and will have the
more interesting bits.

> Polling interval for time-based changes to options, resource 
> parameters
> and constraints. Allowed values: Zero disables polling, positive values
> are an interval in seconds (unless other SI units are specified, such as
> 5min).
> --
> Sam Gardner
> Trustwave | SMART SECURITY ON DEMAND
> 
> 
> 
> On 3/30/16, 11:46 AM, "Ken Gaillot"  wrote:
> 
>> On 03/30/2016 11:20 AM, Sam Gardner wrote:
>>> I have configured some network resources to automatically standby their
>>> node if the system detects a failure on them. However, the DRBD slave
>>> that I have configured does not automatically restart after the node is
>>> "unstandby-ed" after the failure-timeout expires.
>>> Is there any way to make the "stopped" DRBDSlave resource automatically
>>> start again once the node is recovered?
>>>
>>> See the  progression of events below:
>>>
>>> Running cluster:
>>> Wed Mar 30 16:04:20 UTC 2016
>>> Cluster name:
>>> Last updated: Wed Mar 30 16:04:20 2016
>>> Last change: Wed Mar 30 16:03:24 2016
>>> Stack: classic openais (with plugin)
>>> Current DC: ha-d1.tw.com - partition with quorum
>>> Version: 1.1.12-561c4cf
>>> 2 Nodes configured, 2 expected votes
>>> 7 Resources configured
>>>
>>>
>>> Online: [ ha-d1.tw.com ha-d2.tw.com ]
>>>
>>> Full list of resources:
>>>
>>>  Resource Group: network
>>>  inif   (ocf::custom:ip.sh):   Started ha-d1.tw.com
>>>  outif  (ocf::custom:ip.sh):   Started

Re: [ClusterLabs] Freezing/Unfreezing in Pacemaker ?

2016-04-07 Thread Ken Gaillot
On 04/07/2016 06:40 AM, jaspal singla wrote:
> Hello,
> 
> As we have clusvcadm -U <service> and clusvcadm -Z <service>
> to freeze and unfreeze a resource in CMAN. Would really appreciate it if
> someone please give some pointers for freezing/unfreezing a resource in
> Pacemaker (pcs) as well.
> 
> Thanks,
> Jaspal Singla

Hi,

The equivalent in pacemaker is "managed" and "unmanaged" resources.

The usage depends on what tools you are using. For pcs, it's "pcs
resource unmanage <resource>" to freeze, and "pcs resource manage
<resource>" to unfreeze.
At a lower level, it's setting the is-managed meta-attribute of the
resource.

It's also possible to set the maintenance-mode cluster property to
"freeze" all resources.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource user/group

2016-04-07 Thread Ken Gaillot
On 04/07/2016 09:34 AM, Andrey Rogovsky wrote:
> I was investigating a problem: lrmd started the resource as postgres:haclient
> instead of postgres:postgres.
> I don't know whether this bug has been fixed or not, because my pacemaker is a bit old.

The lrmd runs all resource agents as root. It's up to the resource agent
(or whatever commands it calls) to change the user and group if desired.
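For example, an agent that needs its daemon to run as another user would
typically do something along these lines in its start action (just a
sketch; the user and paths here are made up):

  su -s /bin/sh postgres -c '/usr/bin/pg_ctl -D /var/lib/pgsql/data start'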

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD on asymmetric-cluster

2016-04-07 Thread Ken Gaillot
On 04/06/2016 09:29 PM, Jason Voorhees wrote:
> Hey guys:
> 
> I've been reading a little bit more about rules but there are certain
> things that are not so clear to me yet. First, I've created 3 normal
> resources and one master/slave resource (clusterdataClone). My
> resources and constraints look like this:
> 
> # pcs resource
>  MTA(systemd:postfix):  Started nodo1
>  Web(systemd:httpd):Started nodo1
>  IPService  (ocf::heartbeat:IPaddr2):   Started nodo1
>  Master/Slave Set: clusterdataClone [clusterdata]
>  Masters: [ nodo1 ]
>  Slaves: [ nodo2 ]
> 
> # pcs constraint show --full
> Location Constraints:
>   Resource: IPService
> Enabled on: nodo1 (score:10) (id:location-IPService-nodo1-10)
> Enabled on: nodo2 (score:9) (id:location-IPService-nodo2-9)
> Enabled on: nodo1 (score:INFINITY) (role: Started) 
> (id:cli-prefer-IPService)

FYI, commands that "move" a resource do so by adding location
constraints. The ID of these constraints will start with "cli-". They
override the normal behavior of the cluster, and stay in effect until
you explicitly remove them. (With pcs, you can remove them with "pcs
resource clear".)

>   Resource: MTA
> Enabled on: nodo1 (score:10) (id:location-MTA-nodo1-10)
> Enabled on: nodo2 (score:9) (id:location-MTA-nodo2-9)
>   Resource: Web
> Enabled on: nodo1 (score:10) (id:location-Web-nodo1-10)
> Enabled on: nodo2 (score:9) (id:location-Web-nodo2-9)
>   Resource: clusterdataClone
> Constraint: location-clusterdataClone
>   Rule: score=INFINITY boolean-op=or  (id:location-clusterdataClone-rule)
> Expression: #uname eq nodo1  (id:location-clusterdataClone-rule-expr)
> Expression: #uname eq nodo2  
> (id:location-clusterdataClone-rule-expr-1)
> Ordering Constraints:
> Colocation Constraints:
>   Web with IPService (score:INFINITY) (id:colocation-Web-IPService-INFINITY)
>   MTA with IPService (score:INFINITY) (id:colocation-MTA-IPService-INFINITY)
>   clusterdataClone with IPService (score:INFINITY) (rsc-role:Master)
> (with-rsc-role:Started)
> (id:colocation-clusterdataClone-IPService-INFINITY)

Note that colocation constraints only specify that the resources must
run together. It does not imply any order in which they must be started.
If Web and/or MTA should be started after clusterdataClone, configure
explicit ordering constraints for that.

> These are the commands I run to create the master/slave resource and
> its contraints:
> 
> # pcs cluster cib myfile
> # pcs -f myfile resource create clusterdata ocf:linbit:drbd
> drbd_resource=clusterdb op monitor interval=30s role=Master op monitor
> interval=31s role=Slave
> # pcs -f myfile resource master clusterdataClone clusterdata
> master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> # pcs -f myfile constraint location clusterdataClone rule
> score=INFINITY \#uname eq nodo1 or \#uname eq nodo2

The above constraint as currently worded will have no effect. It says
that clusterdataClone must be located on either nodo1 or nodo2. Since
those are your only nodes, it doesn't really constrain anything.

If you want to prefer one node for the master role, you want to add
role=master, take out the node you don't want to prefer, and set score
to something less than INFINITY.
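With your names, roughly (untested):

  pcs constraint location clusterdataClone rule role=master score=50 \#uname eq nodo1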

> # pcs -f myfile constraint colocation add master clusterdataClone with 
> IPService
> # pcs cluster cib-push myfile
> 
> So now, my master/slave resource is started as Master in the same node
> where IPService is already active. So far so good. But the problem is
> that I can't move IPService from nodo1 to nodo2. When I run...
> 
> # pcs resource move IPService nodo2
> 
> it does nothing but... IPService keeps active on nodo1.
> 
> Then I tried to remove all my clusterdataClone constraints and repeat
> the same commands shows lines above (# pcs -f myfile ...) but this
> time without creating a colocation constraint between clusterdataClone
> and IPService.  When I do some tests again running...
> 
> # pcs resource move IPService nodo2
> 
> well, IPService is moved to nodo2, but clusterdataClone keeps active
> as Master in node1. I thought it would be promoted as Master in nodo2
> and demoted to Slave in nodo1.
> 
> Do you know why my master/slave resource is not being "moved as
> master" between nodes?
> 
> How do I "move" the Master role from nodo1 to nodo2 for
> clusterdataClone? I want to make  nodo2 Primary and nodo1 Secondary
> but I have no idea how to do this manually (only for testing)
> 
> I hope someone can help :(
> 
> Thanks in advance
> 
> On Mon, Apr 4, 2016 at 4:50 PM, Jason Voorhees  wrote:
>> I 

Re: [ClusterLabs] DRBD on asymmetric-cluster

2016-04-07 Thread Ken Gaillot
On 04/07/2016 10:30 AM, Jason Voorhees wrote:
>> FYI, commands that "move" a resource do so by adding location
>> constraints. The ID of these constraints will start with "cli-". They
>> override the normal behavior of the cluster, and stay in effect until
>> you explicitly remove them. (With pcs, you can remove them with "pcs
>> resource clear".)
> 
> Agree :)
> 
> 
>> Note that colocation constraints only specify that the resources must
>> run together. It does not imply any order in which they must be started.
>> If Web and/or MTA should be started after clusterdataClone, configure
>> explicit ordering constraints for that.
>>
> 
> Agree. So far I haven't created any ordering constraints because it
> isn't important to me, YET, the order for starting services. However I
> have a question... if I don't have any ordering constraints at all, am
> I still able to activate resources no matter the order?

Sort of, but not exactly.

With a colocation constraint "A with B", the cluster must assign B to a
node before it can place A. B does not have to be started, but it does
have to be *able* to be started, in order to be assigned to a node. So
if something prevents B from being started (disabled in config, not
allowed on any online node, etc.), it will not be assigned to a node,
and A will not run.

That doesn't mean that B will be started first, though. If the cluster
needs to start both A and B, it can start them in any order. With an
ordering constraint "B then A", B must be started first, and the start
must complete successfully, before A can be started.
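In pcs terms, that would be something like:

  pcs constraint order start B then start A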

>>> These are the commands I run to create the master/slave resource and
>>> its contraints:
>>>
>>> # pcs cluster cib myfile
>>> # pcs -f myfile resource create clusterdata ocf:linbit:drbd
>>> drbd_resource=clusterdb op monitor interval=30s role=Master op monitor
>>> interval=31s role=Slave
>>> # pcs -f myfile resource master clusterdataClone clusterdata
>>> master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>>> notify=true
>>> # pcs -f myfile constraint location clusterdataClone rule
>>> score=INFINITY \#uname eq nodo1 or \#uname eq nodo2
>>
>> The above constraint as currently worded will have no effect. It says
>> that clusterdataClone must be located on either nodo1 or nodo2. Since
>> those are your only nodes, it doesn't really constrain anything.
> Ok, the last command (location with rule) was created to allow
> clusterdataClone start at both nodes, because without this rule the
> resource was always in "stopped" status in both nodes. Once I added
> this rule my clusterdataClone resource started automatically but I
> don't understand why it choosed a node to run as Master and the other
> one as Slave. Is it random?

I don't know why the resource would be stopped without this constraint.
Maybe you have an opt-in cluster? But in that case you can use a normal
location constraint, you don't need a rule.

It will choose one as master and one as slave because you have
master-max=1. The choice, as with everything else in pacemaker, is based
on scores, but those scores are not normally visible to the user, so the
choice appears "random".

>> If you want to prefer one node for the master role, you want to add
>> role=master, take out the node you don't want to prefer, and set score
>> to something less than INFINITY.
> Well, I could add a rule to prefer nodo1 over nodo2 to run the Master
> role (in fact, I think I already did it) but what I want it's
> something different: I would like the Master role to follow IPService,
> I mean, clusterdataClone become Master where IPService was previously
> activated.
> 
> Is this possible? Or the only way to configure constraints is that my
> resources (IPService, Web, MTA) follow the Master role of
> clusterdataClone?

I think the latter approach makes more sense and is common. Storage is
more complicated than an IP and thus more likely to break, so it would
seem to be more reliable to follow where storage can successfully start.
The exception would be if the IP is much more important to you than the
storage and is useful without it.

You might want to look at resource sets. The syntax is a bit difficult
to follow but it's very flexible. See pcs constraint colocation/order set.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] crmd error: Cannot route message to unknown node

2016-04-07 Thread Ken Gaillot
On 04/07/2016 03:22 PM, Ferenc Wágner wrote:
> Hi,
> 
> On a freshly rebooted cluster node (after crm_mon reports it as
> 'online'), I get the following:
> 
> wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup
> Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl05, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl06, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl07, removing fail-count-vm-cedar
> Cleaning up vm-cedar on vhbl08, removing fail-count-vm-cedar
> Waiting for 6 replies from the CRMd..No messages received in 60 seconds.. 
> aborting
> 
> Meanwhile, this is written into syslog (I can also provide info level
> logs if necessary):
> 
> 22:03:02 vhbl08 crmd[8990]:error: Cannot route message to unknown node 
> vhbl03
> 22:03:02 vhbl08 crmd[8990]:error: Cannot route message to unknown node 
> vhbl04
> 22:03:02 vhbl08 crmd[8990]:error: Cannot route message to unknown node 
> vhbl06
> 22:03:02 vhbl08 crmd[8990]:error: Cannot route message to unknown node 
> vhbl07

This message can only occur when the node name is not present in this
node's peer cache.

I'm guessing that since you don't have node names in corosync, the cache
entries only have node IDs at this point. I don't know offhand when
pacemaker would figure out the association, but I bet it would be
possible to ensure it by running some command beforehand, maybe crm_node -l?

> 22:03:04 vhbl08 crmd[8990]:   notice: Operation vm-cedar_monitor_0: not 
> running (node=vhbl08, call=626, rc=7, cib-update=169, confirmed=true)
> 
> For background:
> 
> wferi@vhbl08:~$ sudo cibadmin --scope=nodes -Q
> 
>   
> 
>value="124928"/>
> 
> 
>   
>   
> 
>value="124928"/>
> 
> 
>   
>   
> 
>value="124928"/>
> 
> 
>   
>   
> 
>value="124928"/>
> 
> 
>   
>   
> 
>value="124928"/>
> 
> 
>   
>   
> 
>value="124928"/>
> 
>   
> 
> 
> Why does this happen?  I've got no node names in corosync.conf, but
> Pacemaker defaults to uname -n all right.
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] HA meetup at OpenStack Summit

2016-04-12 Thread Ken Gaillot
Hi everybody,

The upcoming OpenStack Summit is April 25-29 in Austin, Texas (US). Some
regular ClusterLabs contributors are going, so I was wondering if anyone
would like to do an informal meetup sometime during the summit.

It looks like the best time would be that Wednesday, either lunch (at
the venue) or dinner (offsite). It might also be possible to reserve a
small (10-person) meeting room, or just meet informally in the expo hall.

Anyone interested? Preferences/conflicts?
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HA meetup at OpenStack Summit

2016-04-13 Thread Ken Gaillot
On 04/12/2016 06:39 PM, Digimer wrote:
> On 12/04/16 07:09 PM, Ken Gaillot wrote:
>> Hi everybody,
>>
>> The upcoming OpenStack Summit is April 25-29 in Austin, Texas (US). Some
>> regular ClusterLabs contributors are going, so I was wondering if anyone
>> would like to do an informal meetup sometime during the summit.
>>
>> It looks like the best time would be that Wednesday, either lunch (at
>> the venue) or dinner (offsite). It might also be possible to reserve a
>> small (10-person) meeting room, or just meet informally in the expo hall.
>>
>> Anyone interested? Preferences/conflicts?
> 
> Informal meet-up, or to try and get work done?

Informal, though of course HA will be the likely topic of conversation :)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] service flap as nodes join and leave

2016-04-13 Thread Ken Gaillot
On 04/13/2016 11:23 AM, Christopher Harvey wrote:
> I have a 3 node cluster (see the bottom of this email for 'pcs config'
> output) with 3 nodes. The MsgBB-Active and AD-Active service both flap
> whenever a node joins or leaves the cluster. I trigger the leave and
> join with a pacemaker service start and stop on any node.

That's the default behavior of clones used in ordering constraints. If
you set interleave=true on your clones, each dependent clone instance
will only care about the depended-on instances on its own node, rather
than all nodes.

See
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_options

While the interleave=true behavior is much more commonly used,
interleave=false is the default because it's safer -- the cluster
doesn't know anything about the cloned service, so it can't assume the
service is OK with it. Since you know what your service does, you can
set interleave=true for services that can handle it.
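For your clone, that would be roughly (untested):

  pcs resource meta Router-clone interleave=true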

> Here is the happy steady state setup:
> 
> 3 nodes and 4 resources configured
> 
> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> 
>  Clone Set: Router-clone [Router]
>  Started: [ vmr-132-3 vmr-132-4 ]
> MsgBB-Active(ocf::solace:MsgBB-Active): Started vmr-132-3
> AD-Active   (ocf::solace:AD-Active):Started vmr-132-3
> 
> [root@vmr-132-4 ~]# supervisorctl stop pacemaker
> no change, except vmr-132-4 goes offline
> [root@vmr-132-4 ~]# supervisorctl start pacemaker
> vmr-132-4 comes back online
> MsgBB-Active and AD-Active flap very quickly (<1s)
> Steady state is resumed.
> 
> Why should the fact that vmr-132-4 coming and going affect the service
> on any other node?
> 
> Thanks,
> Chris
> 
> Cluster Name:
> Corosync Nodes:
>  192.168.132.5 192.168.132.4 192.168.132.3
> Pacemaker Nodes:
>  vmr-132-3 vmr-132-4 vmr-132-5
> 
> Resources:
>  Clone: Router-clone
>   Meta Attrs: clone-max=2 clone-node-max=1
>   Resource: Router (class=ocf provider=solace type=Router)
>Meta Attrs: migration-threshold=1 failure-timeout=1s
>Operations: start interval=0s timeout=2 (Router-start-timeout-2)
>stop interval=0s timeout=2 (Router-stop-timeout-2)
>monitor interval=1s (Router-monitor-interval-1s)
>  Resource: MsgBB-Active (class=ocf provider=solace type=MsgBB-Active)
>   Meta Attrs: migration-threshold=2 failure-timeout=1s
>   Operations: start interval=0s timeout=2 (MsgBB-Active-start-timeout-2)
>   stop interval=0s timeout=2 (MsgBB-Active-stop-timeout-2)
>   monitor interval=1s (MsgBB-Active-monitor-interval-1s)
>  Resource: AD-Active (class=ocf provider=solace type=AD-Active)
>   Meta Attrs: migration-threshold=2 failure-timeout=1s
>   Operations: start interval=0s timeout=2 (AD-Active-start-timeout-2)
>   stop interval=0s timeout=2 (AD-Active-stop-timeout-2)
>   monitor interval=1s (AD-Active-monitor-interval-1s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
>   Resource: AD-Active
> Disabled on: vmr-132-5 (score:-INFINITY) (id:ADNotOnMonitor)
>   Resource: MsgBB-Active
> Enabled on: vmr-132-4 (score:100) (id:vmr-132-4Priority)
> Enabled on: vmr-132-3 (score:250) (id:vmr-132-3Priority)
> Disabled on: vmr-132-5 (score:-INFINITY) (id:MsgBBNotOnMonitor)
>   Resource: Router-clone
> Disabled on: vmr-132-5 (score:-INFINITY) (id:RouterNotOnMonitor)
> Ordering Constraints:
>   Resource Sets:
> set Router-clone MsgBB-Active sequential=true
> (id:pcs_rsc_set_Router-clone_MsgBB-Active) setoptions kind=Mandatory
> (id:pcs_rsc_order_Router-clone_MsgBB-Active)
> set MsgBB-Active AD-Active sequential=true
> (id:pcs_rsc_set_MsgBB-Active_AD-Active) setoptions kind=Mandatory
> (id:pcs_rsc_order_MsgBB-Active_AD-Active)
> Colocation Constraints:
>   MsgBB-Active with Router-clone (score:INFINITY)
>   (id:colocation-MsgBB-Active-Router-clone-INFINITY)
>   AD-Active with MsgBB-Active (score:1000)
>   (id:colocation-AD-Active-MsgBB-Active-1000)
> 
> Resources Defaults:
>  No defaults set
> Operations Defaults:
>  No defaults set
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-recheck-interval: 1s
>  dc-version: 1.1.13-10.el7_2.2-44eb2dd
>  have-watchdog: false
>  maintenance-mode: false
>  start-failure-is-fatal: false
>  stonith-enabled: false


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] service flap as nodes join and leave

2016-04-14 Thread Ken Gaillot
On 04/14/2016 09:33 AM, Christopher Harvey wrote:
> MsgBB-Active is a dummy resource that simply returns OCF_SUCCESS on
> every operation and logs to a file.

That's a common mistake, and will confuse the cluster. The cluster
checks the status of resources both where they're supposed to be running
and where they're not. If status always returns success, the cluster
won't try to start it where it should,, and will continuously try to
stop it elsewhere, because it thinks it's already running everywhere.

It's essential that an RA distinguish between running
(OCF_SUCCESS/OCF_RUNNING_MASTER), cleanly not running (OCF_NOT_RUNNING),
and unknown/failed (OCF_ERR_*/OCF_FAILED_MASTER).

See pacemaker's Dummy agent as an example/template:

https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/Dummy

It touches a temporary file to know whether it is "running" or not.

ocf-shellfuncs has a ha_pseudo_resource() function that does the same
thing. See the ocf:heartbeat:Delay agent for example usage.
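A bare-bones sketch of the idea (nowhere near a complete agent; it
assumes ocf-shellfuncs has been sourced so $HA_RSCTMP and the OCF_*
return codes are defined):

  state="${HA_RSCTMP}/MsgBB-Active.state"

  my_start()   { touch "$state"; }      # remember that we are "running"
  my_stop()    { rm -f "$state"; }      # remember that we stopped cleanly
  my_monitor() {
      if [ -f "$state" ]; then
          return $OCF_SUCCESS           # running
      else
          return $OCF_NOT_RUNNING       # cleanly not running
      fi
  }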

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Moving Related Servers

2016-04-18 Thread Ken Gaillot
On 04/18/2016 02:34 AM, ‪H Yavari‬ ‪ wrote:
> Hi,
> 
> I have 4 CentOS servers (App1,App2.App3 and App4). I created a cluster
> for App1 and App2 with a IP float and it works well.
> In our infrastructure App1 works only with App3 and App2 only works with
> App4. I mean we have 2 server sets (App1 and App3) , (App2 and App4).
> So I want when server app1 is down and app2 will Online node, App3 will
> offline too and App4 will Online and vice versa, I mean when App3 is
> down and App4 will Online, App1 will offline too.
> 
> 
> How can I do with pacemaker ? we have our self service on servers so how
> can I user Pacemaker for monitoring these services?
> 
> Thanks for reply.
> 
> Regards.
> H.Yavari

I'm not sure I understand your requirements.

There's no way to tell one node to leave the cluster when another node
is down, and it would be a bad idea if you could: the nodes could never
start up, because each would wait to see the other before starting; and
in your cluster, two nodes shutting down would make the cluster lose
quorum, so the other nodes would refuse to run any resources.

However, it is usually possible to use constraints to enforce any
desired behavior. So even though the node might not leave the cluster,
you could make the cluster not place any resources on that node.

Can you give more information about your resources and what nodes they
are allowed to run on? What makes App1 and App3 dependent on each other?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Moving Related Servers

2016-04-19 Thread Ken Gaillot
On 04/18/2016 10:05 PM, ‪H Yavari‬ ‪ wrote:
> Hi,
> 
> This is servers maps:
> 
> App 3-> App 1(Active)
> 
> App 4 -> App 2   (Standby)
> 
> 
> Now App1 and App2 are in a cluster with IP failover.
> 
> I need when IP failover will run and App2 will be Active node, service
> "X" on server App3 will be stop and App 4 will be Active node.
> In the other words, App1 works only with App3 and App 2 works with App 4.
> 
> I have a web application on App1 and some services on App 3 (this is
> same for App2 and App 4)

This is a difficult situation to model. In particular, you could only
have a dependency one way -- so if we could get App 3 to fail over if
App 1 fails, we couldn't model the other direction (App 1 failing over
if App 3 fails). If each is dependent on the other, there's no way to
start one first.

Is there a technical reason App 3 can work only with App 1?

Is it possible for service "X" to stay running on both App 3 and App 4
all the time? If so, this becomes easier.

> 
> Sorry for heavy description.
> 
> 
> 
> *From:* Ken Gaillot 
> *To:* users@clusterlabs.org
> **
> On 04/18/2016 02:34 AM, ‪H Yavari‬ ‪ wrote:
> 
>> Hi,
>>
>> I have 4 CentOS servers (App1,App2.App3 and App4). I created a cluster
>> for App1 and App2 with a IP float and it works well.
>> In our infrastructure App1 works only with App3 and App2 only works with
>> App4. I mean we have 2 server sets (App1 and App3) , (App2 and App4).
>> So I want when server app1 is down and app2 will Online node, App3 will
>> offline too and App4 will Online and vice versa, I mean when App3 is
>> down and App4 will Online, App1 will offline too.
>>
>>
>> How can I do with pacemaker ? we have our self service on servers so how
>> can I user Pacemaker for monitoring these services?
>>
>> Thanks for reply.
>>
>> Regards.
>> H.Yavari
> 
> 
> I'm not sure I understand your requirements.
> 
> There's no way to tell one node to leave the cluster when another node
> is down, and it would be a bad idea if you could: the nodes could never
> start up, because each would wait to see the other before starting; and
> in your cluster, two nodes shutting down would make the cluster lose
> quorum, so the other nodes would refuse to run any resources.
> 
> However, it is usually possible to use constraints to enforce any
> desired behavior. So even those the node might not leave the cluster,
> you could make the cluster not place any resources on that node.
> 
> Can you give more information about your resources and what nodes they
> are allowed to run on? What makes App1 and App3 dependent on each other?


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Moving Related Servers

2016-04-20 Thread Ken Gaillot
On 04/20/2016 12:44 AM, ‪H Yavari‬ ‪ wrote:
> You got my situation right. But I couldn't find any method to do this?
> 
> Should I create one cluster with 4 nodes, or 2 clusters with 2 nodes each? How do I
> restrict the cluster nodes to each other?

Your last questions made me think of multi-site clustering using booth.
I think this might be the best solution for you.

You can configure two independent pacemaker clusters of 2 nodes each,
then use booth to ensure that one cluster has the resources at any time.
See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279413776

This is usually done with clusters at physically separate locations, but
there's no problem with using it with two clusters in one location.

Alternatively, going along more traditional lines such as what Klaus and
I have mentioned, you could use rules and node attributes to keep the
resources where desired. You could write a custom resource agent that
would set a custom node attribute for the matching node (the start
action should set the attribute to 1, and the stop action should set the
attribute to 0; if the resource was on App 1, you'd set the attribute
for App 3, and if the resource was on App 2, you'd set the attribute for
App 4). Colocate that resource with your floating IP, and use a rule to
locate service X where the custom node attribute is 1. See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279376656

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617356537136
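A very rough sketch of that last idea (the attribute and resource names
are invented, and it's untested):

  # in the custom agent's start action, when the IP lands on App1:
  crm_attribute --node App3 --name partner-active --update 1
  # and in its stop action:
  crm_attribute --node App3 --name partner-active --update 0
  # (use App4 instead when the agent finds itself on App2)

  # keep service X off nodes where the attribute isn't 1:
  pcs constraint location serviceX rule score=-INFINITY \
      not_defined partner-active or partner-active ne 1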

> 
> 
> *From:* Klaus Wenninger 
> *To:* users@clusterlabs.org
> *Sent:* Wednesday, 20 April 2016, 9:56:05
> *Subject:* Re: [ClusterLabs] Moving Related Servers
> 
> On 04/19/2016 04:32 PM, Ken Gaillot wrote:
>> On 04/18/2016 10:05 PM, ‪H Yavari‬ ‪ wrote:
>>> Hi,
>>>
>>> This is servers maps:
>>>
>>> App 3-> App 1(Active)
>>>
>>> App 4 -> App 2  (Standby)
>>>
>>>
>>> Now App1 and App2 are in a cluster with IP failover.
>>>
>>> I need when IP failover will run and App2 will be Active node, service
>>> "X" on server App3 will be stop and App 4 will be Active node.
>>> In the other words, App1 works only with App3 and App 2 works with App 4.
>>>
>>> I have a web application on App1 and some services on App 3 (this is
>>> same for App2 and App 4)
>> This is a difficult situation to model. In particular, you could only
>> have a dependency one way -- so if we could get App 3 to fail over if
>> App 1 fails, we couldn't model the other direction (App 1 failing over
>> if App 3 fails). If each is dependent on the other, there's no way to
>> start one first.
>>
>> Is there a technical reason App 3 can work only with App 1?
>>
>> Is it possible for service "X" to stay running on both App 3 and App 4
>> all the time? If so, this becomes easier.
> Just another try to understand what you are aiming for:
> 
> You have a 2-node-cluster at the moment consisting of the nodes
> App1 & App2.
> You configured something like a master/slave-group to realize
> an active/standby scenario.
> 
> To get the servers App3 & App4 into the game we would make
> them additional pacemaker-nodes (App3 & App4).
> You now have a service X that could be running either on App3 or
> App4 (which is easy by e.g. making it dependent on a node attribute)
> and it should be running on App3 when the service-group is active
> (master in pacemaker terms) on App1 and on App4 when the
> service-group is active on App2.
> 
> The standard thing would be to collocate a service with the master-role
> (see all the DRBD examples for instance).
> We would now need a locate-x when master is located-y rule instead
> of collocation.
> I don't know any way to directly specify this.
> One - ugly though - way around I could imagine would be:
> 
> - locate service X1 on App3
> - locate service X2 on App4
> - dummy service Y1 is located App1 and collocated with master-role
> - dummy service Y2 is located App2 and collocated with master-role
> - service X1 depends on Y1
> - service X2 depends on Y2
> 
> If that somehow reflects your situation the key question now would
> probably be if pengine would make the group on App2 master
> if service X1 fails on App3. I would guess yes but I'm not sure.
> 
> Regards,
> Klaus
> 
>>> Sorry for heavy de

Re: [ClusterLabs] pacemaker apache and umask on CentOS 7

2016-04-20 Thread Ken Gaillot
On 04/20/2016 09:11 AM, fatcha...@gmx.de wrote:
> Hi,
> 
> I'm running a 2-node apache webcluster on a fully patched CentOS 7 
> (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64).
> Some files which are generated by the apache are created with a umask 137 but 
> I need these files created with a umask of 117.
> To change this I first tried to add a umask 117 to /etc/sysconfig/httpd & 
> rebooted the system. This had no effect.
> So I found out (after some research) that this is not working under CentOS 7 
> and that this has to be changed via systemd.
> So I created a directory "/etc/systemd/system/httpd.service.d" and put there 
> a "umask.conf" file with this content: 
> [Service]
> UMask=0117
> 
> Again I rebooted the system but no effect.
> Is pacemaker really starting apache via systemd? And how can I 
> solve the problem?
> 
> Any suggestions are welcome
> 
> Kind regards
> 
> fatcharly

It depends on the resource agent you're using for apache.

If you were using systemd:httpd, I'd expect /etc/sysconfig/httpd or the
httpd.service.d override to work.

Since they don't, I'll guess you're using ocf:heartbeat:apache. In that
case, the file specified by the resource's envfiles parameter (which
defaults to /etc/apache2/envvars) is the right spot. So, you could
configure envfiles=/etc/sysconfig/httpd, or you could keep it default
and add your umask to /etc/apache2/envvars.
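
In CIB terms that is just one more instance attribute on the resource,
roughly like this (a sketch; adjust the ids to your own configuration):

   <primitive id="apache" class="ocf" provider="heartbeat" type="apache">
      <instance_attributes id="apache-params">
         <nvpair id="apache-envfiles" name="envfiles" value="/etc/sysconfig/httpd"/>
      </instance_attributes>
   </primitive>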



Re: [ClusterLabs] Q: Resource balancing operation

2016-04-20 Thread Ken Gaillot
On 04/20/2016 01:17 AM, Ulrich Windl wrote:
> Hi!
> 
> I'm wondering: If you boot a node on a cluster, most resources will go to 
> another node (if possible). Due to stickiness configured, those resources 
> will stay there.
> So I'm wondering whether or how I could cause a rebalance of resources on the 
> cluster. I must admit that I don't understand the details of stickiness 
> related to other parameters. In my understanding stickiness should be related 
> to a percentage of utilization dynamically, so that a resource running on a 
> node that is "almost full" should dynamically lower its stickiness to allow 
> resource migration.
> 
> So if you are going to implement a manual resource rebalance operation, could 
> you dynamically lower the stickiness for each resource (by some amount or 
> some factor), wait if something happens, and then repeat the process until 
> resources look balanced. "Looking balanced" should be no worse than if all 
> resources were started when all cluster nodes are up.
> 
> Spontaneous pros and cons for "resource rebalancing"?
> 
> Regards,
> Ulrich

Pacemaker gives you a few levers to pull. Stickiness and utilization
attributes (with a placement strategy) are the main ones.

Normally, pacemaker *will* continually rebalance according to what nodes
are available. Stickiness tells the cluster not to do that.

Whether you should use stickiness (and how much) depends mainly on how
significant the interruption is when a service is moved. For
a large database supporting a high-traffic website, stopping and
starting can take a long time and cost a lot of business -- so maybe you
want an infinite stickiness in that case, and only rebalance manually
during a scheduled window. For a small VM that can live-migrate quickly
and doesn't affect any of your customer-facing services, maybe you don't
mind setting a small or zero stickiness.

You can also use rules to make the process intelligent. For example, for
a server that provides office services, you could set a rule that sets
infinite stickiness during business hours, and small or zero stickiness
otherwise. That way, you'd get no disruptions when people are actually
using the service during the day, and at night, it would automatically
rebalance.

Normally, pacemaker's idea of "balancing" is to simply distribute the
number of resources on each node as equally as possible. Utilization
attributes and placement strategies let you add more intelligence. For
example, you can define the number of cores per node or the amount of
RAM per node, along with how much each resource is expected to use, and
let pacemaker balance by that instead of just counting the number of
resources.
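
A minimal sketch of that (the node, resource and attribute names here are
invented; utilization attribute names are entirely up to you, as long as
nodes and resources use the same ones):

   <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="opt-placement-strategy" name="placement-strategy" value="balanced"/>
      </cluster_property_set>
   </crm_config>

   <node id="1" uname="node1">
      <utilization id="node1-utilization">
         <nvpair id="node1-cores" name="cores" value="16"/>
         <nvpair id="node1-ram" name="ram" value="65536"/>
      </utilization>
   </node>

   <primitive id="big-db" class="ocf" provider="heartbeat" type="mysql">
      <utilization id="big-db-utilization">
         <nvpair id="big-db-cores" name="cores" value="4"/>
         <nvpair id="big-db-ram" name="ram" value="16384"/>
      </utilization>
   </primitive>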



Re: [ClusterLabs] pacemaker apache and umask on CentOS 7

2016-04-20 Thread Ken Gaillot
On 04/20/2016 12:20 PM, Klaus Wenninger wrote:
> On 04/20/2016 05:35 PM, fatcha...@gmx.de wrote:
>>
>>> Sent: Wednesday, 20 April 2016 at 16:31
>>> From: "Klaus Wenninger" 
>>> To: users@clusterlabs.org
>>> Subject: Re: [ClusterLabs] pacemaker apache and umask on CentOS 7
>>>
>>> On 04/20/2016 04:11 PM, fatcha...@gmx.de wrote:
 Hi,

 I'm running a 2-node apache webcluster on a fully patched CentOS 7 
 (pacemaker-1.1.13-10.el7_2.2.x86_64 pcs-0.9.143-15.el7.x86_64).
 Some files which are generated by the apache are created with a umask 137 
 but I need these files created with a umask of 117.
 To change this I first tried to add a umask 117 to /etc/sysconfig/httpd & 
 rebooted the system. This had no effect.
 So I found out (after some research) that this is not working under CentOS 
 7 and that this has to be changed via systemd.
 So I created a directory "/etc/systemd/system/httpd.service.d" and put 
 there a "umask.conf" file with this content: 
 [Service]
 UMask=0117

 Again I rebooted the system but no effect.
 Is pacemaker really starting apache via systemd? And how can 
 I solve the problem?
>>> Didn't check with CentOS7 but on RHEL7 there is a
>>> /usr/lib/ocf/resource.d/heartbeat/apache.
>>> So it depends on how you defined the resource starting apache if systemd
>>> is used or if it being done by the ocf-ra.
>> My configuration is:
>> Resource: apache (class=ocf provider=heartbeat type=apache)
>>   Attributes: configfile=/etc/httpd/conf/httpd.conf 
>> statusurl=http://127.0.0.1:8089/server-status
>>   Operations: start interval=0s timeout=40s (apache-start-timeout-40s)
>>   stop interval=0s timeout=60s (apache-stop-timeout-60s)
>>   monitor interval=1min (apache-monitor-interval-1min)
>>
>> So I guess it is ocf. But what would be the right way to do it? I lack a bit 
>> of understanding about this /usr/lib/ocf/resource.d/heartbeat/apache file.
>>
> There are the ocf-Resource-Agents (if there is none you can always
> create one for your service) which usually
> give you a little bit more control of the service from the cib. (You can
> set a couple of variables like in this example
> the pointer to the config-file)
> And of course you can always create resources referring the native
> services of your distro (systemd-units in
> this case).
>>
>>
>>
 Any suggestions are welcome

If you add envfiles="/etc/sysconfig/httpd" to your apache resource, it
should work.

 Kind regards

 fatcharly



Re: [ClusterLabs] HA meetup at OpenStack Summit

2016-04-20 Thread Ken Gaillot
Lunch on Wednesday it is!

Anyone planning to attend next week's OpenStack Summit in Austin is
cordially invited to an informal ClusterLabs meetup over lunch
(12:30pm-1:50pm by the summit schedule) Wednesday, April 27.

We'll meet at Expo Hall 5, the lunch room adjacent to the Marketplace
(vendor booths). I'll put a ClusterLabs sign on the table to help people
find it.

On 04/14/2016 09:53 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> Hi everybody,
>>
>> The upcoming OpenStack Summit is April 25-29 in Austin, Texas (US). Some
>> regular ClusterLabs contributors are going, so I was wondering if anyone
>> would like to do an informal meetup sometime during the summit.
>>
>> It looks like the best time would be that Wednesday, either lunch (at
>> the venue) or dinner (offsite). It might also be possible to reserve a
>> small (10-person) meeting room, or just meet informally in the expo hall.
>>
>> Anyone interested? Preferences/conflicts?
> 
> Yes, I'd be very interested!  I think lunch on Wednesday should work
> for me; dinner might too.




Re: [ClusterLabs] Antw: Re: Q: Resource balancing operation

2016-04-21 Thread Ken Gaillot
On 04/21/2016 01:56 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 20.04.2016 at 16:44:
>> You can also use rules to make the process intelligent. For example, for
>> a server that provides office services, you could set a rule that sets
>> infinite stickiness during business hours, and small or zero stickiness
>> otherwise. That way, you'd get no disruptions when people are actually
>> using the service during the day, and at night, it would automatically
>> rebalance.
> 
> Could you give a concrete example for this?

Sure, looking at the example in:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_using_rules_to_control_cluster_options


Normally, the CIB's resource defaults section will have a single set of
meta-attributes:

  <rsc_defaults>
     <meta_attributes id="rsc-options">
        <nvpair id="rsc-options-stickiness" name="resource-stickiness" value="INFINITY"/>
     </meta_attributes>
  </rsc_defaults>


If you have more than one, the cluster will use the one with the highest
score (in the example below, always the "core-hours" set with infinite
stickiness):

  <rsc_defaults>
     <meta_attributes id="core-hours" score="2">
        <nvpair id="core-stickiness" name="resource-stickiness" value="INFINITY"/>
     </meta_attributes>
     <meta_attributes id="after-hours" score="1">
        <nvpair id="after-stickiness" name="resource-stickiness" value="0"/>
     </meta_attributes>
  </rsc_defaults>

If you add a rule to a set, that set will only be considered when the
rule is true. So in this final result, we have infinite stickiness
during part of the day and no stickiness the rest of the time:

  <rsc_defaults>
     <meta_attributes id="core-hours" score="2">
        <rule id="core-hours-rule" score="0">
           <date_expression id="core-hours-weekdays" operation="date_spec">
              <date_spec id="core-hours-weekdays-spec" hours="9-16" weekdays="1-5"/>
           </date_expression>
        </rule>
        <nvpair id="core-stickiness" name="resource-stickiness" value="INFINITY"/>
     </meta_attributes>
     <meta_attributes id="after-hours" score="1">
        <nvpair id="after-stickiness" name="resource-stickiness" value="0"/>
     </meta_attributes>
  </rsc_defaults>


Higher-level tools may or may not provide a simpler interface; you may
have to dump, edit and push the XML.



[ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-21 Thread Ken Gaillot
Hello everybody,

The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!

The most prominent feature will be Klaus Wenninger's new implementation
of event-driven alerts -- the ability to call scripts whenever
interesting events occur (nodes joining/leaving, resources
starting/stopping, etc.).

This is the improved successor to both the ClusterMon resource agent and
the experimental "notification-agent" feature that has been in the
upstream master branch.

The new feature was renamed to "alerts" to avoid confusion with the
unrelated "notify" resource action.

High-level tools such as crm and pcs should eventually provide an easy
way to configure this, but at the XML level, the cluster configuration
may now contain an alerts section:

   <configuration>
      ...
      <alerts>
         ...
      </alerts>
   </configuration>

The alerts section can have any number of alerts, which look like:

   <alert id="alert-1" path="/srv/pacemaker/pcmk_alert_sample.sh">
      <recipient id="alert-1-recipient-1" value="/var/log/cluster-alerts.log"/>
   </alert>

As always, id is simply a unique label for the entry. The path is an
arbitrary file path to an alert script. Existing external scripts used
with ClusterMon resources will work as alert scripts, because the
interface is compatible.

We intend to provide sample scripts in the extra/alerts source
directory. The existing pcmk_notify_sample.sh script has been moved
there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.

Each alert may have any number of recipients configured. These values
will simply be passed to the script as arguments. The first recipient
will also be passed as the CRM_alert_recipient environment variable, for
compatibility with existing scripts that only support one recipient.
(All CRM_alert_* variables will also be passed as CRM_notify_* for
compatibility with existing ClusterMon scripts.)

An alert may also have instance attributes and meta-attributes, for example:

   <alert id="alert-1" path="/srv/pacemaker/pcmk_alert_sample.sh">
      <instance_attributes id="alert-1-options">
         <nvpair id="alert-1-options-debug" name="debug" value="false"/>
      </instance_attributes>
      <meta_attributes id="alert-1-meta">
         <nvpair id="alert-1-timeout" name="timeout" value="15s"/>
         <nvpair id="alert-1-tstamp-format" name="tstamp_format" value="%H:%M:%S.%06N"/>
      </meta_attributes>
      <recipient id="alert-1-recipient-1" value="/var/log/cluster-alerts.log"/>
   </alert>

The meta-attributes are optional properties used by the cluster.
Currently, they include "timeout" (which defaults to 30s) and
"tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a
microsecond-resolution timestamp provided to the alert script as the
CRM_alert_timestamp environment variable).

The instance attributes are arbitrary values that will be passed as
environment variables to the alert script. This provides you a
convenient way to configure your scripts in the cluster, so you can
easily reuse them.

In the current implementation, meta-attributes and instance attributes
may also be specified within the  block, in which case they
override any values specified in the  block when sent to that
recipient. Whether this stays in the final 1.1.15 release or not depends
on whether people find this to be useful, or confusing.

Sometime during the 1.1.15 release cycle, the previous experimental
interface (the notification-agent and notification-recipient cluster
properties) will be disabled by default at compile-time. If you are
compiling the master branch from source and require that interface, you
can define RHEL7_COMPAT when building, to enable support.

This feature is already in the upstream master branch, and will be in
the forthcoming 1.1.15-rc1 release candidate. Everyone is encouraged to
try it out and give feedback.
-- 
Ken Gaillot 



Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ken Gaillot
On 04/21/2016 06:09 PM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> Hello everybody,
>>
>> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>>
>> The most prominent feature will be Klaus Wenninger's new implementation
>> of event-driven alerts -- the ability to call scripts whenever
>> interesting events occur (nodes joining/leaving, resources
>> starting/stopping, etc.).
> 
> Ooh, that sounds cool!  Can it call scripts after fencing has
> completed?  And how is it determined which node the script runs on,
> and can that be limited via constraints or similar?

Yes, it is called after all "interesting" events (including fencing), and
the script can use the provided environment variables to determine what
type of event it was.

We don't notify before events, because at that moment we don't know
whether the event will really happen or not. We might try but fail.

> I'm wondering if it could replace the current fencing_topology hack we
> use to invoke fence_compute which starts the workflow for recovering
> VMs off dead OpenStack nova-compute nodes.

Yes, that is one of the reasons we did this!

The initial implementation only allowed for one script to be called (the
"notification-agent" property), but we quickly found out that someone
might need to email an administrator, notify nova-compute, and do other
types of handling as well. Making someone write one script that did
everything would be too complicated and error-prone (and unsupportable).
So we abandoned "notification-agent" and went with this new approach.

Coordinate with Andrew Beekhof for the nova-compute alert script, as he
already has some ideas for that.

> Although even if that's possible, maybe there are good reasons to stay
> with the fencing_topology approach?
> 
> Within the same OpenStack compute node HA scenario, it strikes me that
> this could be used to invoke "nova service-disable" when the
> nova-compute service crashes on a compute node and then fails to
> restart.  This would eliminate the window in between the crash and the
> nova server timing out the nova-compute service - during which it
> would otherwise be possible for nova-scheduler to attempt to schedule
> new VMs on the compute node with the crashed nova-compute service.
> 
> IIUC, this is one area where masakari is currently more sophisticated
> than the approach based on OCF RAs:
> 
> https://github.com/ntt-sic/masakari/blob/master/docs/evacuation_patterns.md#evacuation-patterns
> 
> Does that make sense?

Maybe. The script would need to be able to determine based on the
provided environment variables whether it's in that situation or not.




Re: [ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ken Gaillot
On 04/22/2016 02:43 AM, Klaus Wenninger wrote:
> On 04/22/2016 08:16 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot wrote on 21.04.2016 at 19:50 in message
>>>>> <571912f3.2060...@redhat.com>:
>>
>> [...]
>>> The alerts section can have any number of alerts, which look like:
>>>
>>>    <alert id="alert-1" path="/srv/pacemaker/pcmk_alert_sample.sh">
>>>       <recipient id="alert-1-recipient-1" value="/var/log/cluster-alerts.log"/>
>>>    </alert>
>>>
>>>
>> Are there any parameters supplied for the script? For the XML: I think 
>> "path" for the script to execute is somewhat generic: Why not call it "exec" 
>> or something like that? Likewise for "value": Isn't "logfile" a better name?
> exec has a certain appeal...
> but recipient can actually be anything like email-address, logfile, ... so
> keeping it general like value makes sense in my mind
>>
>>> As always, id is simply a unique label for the entry. The path is an
>>> arbitrary file path to an alert script. Existing external scripts used
>>> with ClusterMon resources will work as alert scripts, because the
>>> interface is compatible.
>>>
>>> We intend to provide sample scripts in the extra/alerts source
>>> directory. The existing pcmk_notify_sample.sh script has been moved
>>> there (as pcmk_alert_sample.sh), and so has pcmk_snmp_helper.sh.
>>>
>>> Each alert may have any number of recipients configured. These values
>> What I did not understand is how an "alert" is related to some cluster 
>> "event": By ID, or by some explict configuration?
> There are "node", "fencing" and "resource" (CRM_alert_kind tells you
> if you want to know inside a script) alerts and alerts was chosen
> as it is in sync with other frameworks like nagios, ... but you can choose
> it a synonym for event ... meaning it is not necessarily anything bad
> or good just something you might be interested in.
> 
> You get set a bunch of environment variables when your executable is
> called you can use to get more info and add intelligence if you like:
> 
> CRM_alert_node, CRM_alert_nodeid, CRM_alert_rsc, CRM_alert_task,
> CRM_alert_interval, CRM_alert_desc, CRM_alert_status,
> CRM_alert_target_rc, CRM_alert_rc, CRM_alert_kind,
> CRM_alert_version, CRM_alert_node_sequence,
> CRM_alert_timestamp
> 
> Referencing is done via node-names, resource-ids as throughout
> the pacemaker-config in the cib.
> 
> 
>>
>>> will simply be passed to the script as arguments. The first recipient
>>> will also be passed as the CRM_alert_recipient environment variable, for
>>> compatibility with existing scripts that only support one recipient.
>>> (All CRM_alert_* variables will also be passed as CRM_notify_* for
>>> compatibility with existing ClusterMon scripts.)
>>>
>>> An alert may also have instance attributes and meta-attributes, for example:
>>>
>>>    <alert id="alert-1" path="/srv/pacemaker/pcmk_alert_sample.sh">
>>>       <instance_attributes id="alert-1-options">
>>>          <nvpair id="alert-1-options-debug" name="debug" value="false"/>
>>>       </instance_attributes>
>>>       <meta_attributes id="alert-1-meta">
>>>          <nvpair id="alert-1-timeout" name="timeout" value="15s"/>
>>>          <nvpair id="alert-1-tstamp-format" name="tstamp_format" value="%H:%M:%S.%06N"/>
>>>       </meta_attributes>
>>>       <recipient id="alert-1-recipient-1" value="/var/log/cluster-alerts.log"/>
>>>    </alert>
>>>
>>>
>>> The meta-attributes are optional properties used by the cluster.
>>> Currently, they include "timeout" (which defaults to 30s) and
>>> "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a
>>> microsecond-resolution timestamp provided to the alert script as the
>>> CRM_alert_timestamp environment variable).
>>>
>>> The instance attributes are arbitrary values that will be passed as
>>> environment variables to the alert script. This provides you a
>>> convenient way to configure your scripts in the cluster, so you can
>>> easily reuse them.
>> At the moment this sounds quite abstract, yet.
> meta-attributes and instance-attributes are used as with
> resources, where meta-attributes reflect config-parameters
> you pass rather to pacemaker like in this case for the timeout
> observation when the script is executed, and the format
> string that tells pacemaker in which style you would like
> CRM_alert_timestamp to be filled.
> By the way this timestamp is created immediately before all alerts
> are fired off in parallel so to be usable for analysis of what happened
> in which order in the cluster 

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ken Gaillot
On 04/22/2016 05:19 AM, Klaus Wenninger wrote:
> On 04/22/2016 10:55 AM, Ferenc Wágner wrote:
>> Ken Gaillot  writes:
>>
>>> Each alert may have any number of recipients configured. These values
>>> will simply be passed to the script as arguments. The first recipient
>>> will also be passed as the CRM_alert_recipient environment variable,
>>> for compatibility with existing scripts that only support one
>>> recipient.
>>> [...]
>>> In the current implementation, meta-attributes and instance attributes
>>> may also be specified within the  block, in which case they
>>> override any values specified in the  block when sent to that
>>> recipient.
>> Sorry, I don't get this.  The first paragraph above tells me that for a
>> given cluster event each  is run once, with all recipients passed
>> as command line arguments to the alert executable.  But a single
>> invocation can only have a single set of environmental variables, so how
>> can you override instance attributes for individual recipients?
> The paragraph above is indeed confusing or at least it can
> be understood in a way that doesn't reflect how it is implemented.
> If the script would just be called once the parameter itself
> could already be a list.
> Anyway - as it is implemented at the moment - the script is called
> for each of the recipients in parallel.
> This has a couple of advantages as it simplifies the script
> implementation (if you have problems with concurrency use just
> one recipient and make it a list), the timeout can be observed
> on a per recipient basis and if delivering to one recipient fails
> it doesn't affect the others.
> And each of these calls inherits a global set of environment
> variables while each of them can be overwritten on a per
> recipient basis.

Sorry, that was my confusion -- an earlier design had all recipients
being passed to one call of the script, but we ended up doing one call
per recipient, as Klaus describes above.

I still have mixed feelings about that decision, since it's more
overhead spawning a process per recipient, but it's more compatible with
existing scripts that only handle one recipient, and it allows all
notifications to be fired off in parallel.

If anyone is concerned about the overhead, they can write their script
such that multiple recipients can be put in a single recipient field as
Klaus suggests above (for example, a comma-separated list of email
addresses).
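
For example (the addresses below are placeholders), a single recipient entry
can carry the whole list and the script splits it itself:

   <recipient id="alert-smtp-recipients" value="admin1@example.com,admin2@example.com"/>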

>>> Whether this stays in the final 1.1.15 release or not depends on
>>> whether people find this to be useful, or confusing.
>> Now guess..:)
> Like this it finally even might lead to detection and avoidance of
> confusion ;-)

And to be clear, alerts will definitely be included in the final 1.1.15
-- the only question is whether to include per-recipient attributes.

One possibility is to support per-recipient attributes in the XML, but
not in the higher-level tools. That way, advanced users can configure it
if they want, without complicating the commands (and troubleshooting and
support) needed by most users.



Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-22 Thread Ken Gaillot
On 04/22/2016 08:57 AM, Klaus Wenninger wrote:
> On 04/22/2016 03:29 PM, John Gogu wrote:
>> Hello community,
>> I am facing the following situation with a Pacemaker 2-node DB cluster 
>> (3 resources configured in the cluster: 1 MySQL DB resource, 1
>> Apache resource, 1 IP resource).
>> Every 61 seconds a MySQL monitor action is started, with a
>> 1200 sec timeout.
> You can increase the timeout for monitoring.
>>
>> In some situations, due to high load on the machines, the monitor action
>> runs into a timeout, and the cluster performs a failover even if
>> the DB is up and running. Do you have a hint how monitoring actions can
>> be prioritized automatically?
>>
> Consider that monitoring - at least as part of the action - should check
> if what your service is actually providing
> is working according to some functional and nonfunctional constraints as
> to simulate the experience of the
> consumer of your services. So you probably don't want that to happen
> prioritized.
> So if you relaxed the timing requirements of your monitoring to
> something that would be acceptable in terms
> of the definition of the service you are providing and you are still
> running into troubles the service quality you
> are providing wouldn't be that spiffing either...

Also, you can provide multiple levels of monitoring:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_multiple_monitor_operations

For example, you could provide a very simple check that just makes sure
MySQL is responding on its port, and run that frequently with a low
timeout. And your existing thorough monitor could be run less frequently
with a high timeout.
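
In the CIB that looks roughly like this (a sketch; the intervals, timeouts
and depth value are arbitrary, and it assumes the agent's monitor action
actually acts on OCF_CHECK_LEVEL):

   <op id="db-monitor-shallow" name="monitor" interval="60s" timeout="30s"/>
   <op id="db-monitor-deep" name="monitor" interval="20min" timeout="1200s">
      <instance_attributes id="db-monitor-deep-params">
         <nvpair id="db-monitor-deep-level" name="OCF_CHECK_LEVEL" value="10"/>
      </instance_attributes>
   </op>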

FYI there was a bug related to multiple monitors apparently introduced
in 1.1.10, such that a higher-level monitor failure might not trigger a
resource failure. It was recently fixed in the upstream master branch
(which will be in the soon-to-be-released 1.1.15-rc1).

>> Thank you and best regards,
>> John



[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 1

2016-04-22 Thread Ken Gaillot
ClusterLabs is happy to announce the first release candidate for
Pacemaker version 1.1.15. Source code is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc1

The most significant enhancements in this release are:

* A new "alerts" section of the CIB allows you to configure scripts that
will be called after significant cluster events. (For details, see the
recent "Coming in 1.1.15: Event-driven alerts" thread on this mailing list.)

* A new pcmk_action_limit option for fence devices allows multiple fence
actions to be executed concurrently. It defaults to 1 to preserve
existing behavior (i.e. serial execution of fence actions).

* Pacemaker Remote support has been improved. Most noticeably, if
pacemaker_remote is stopped without disabling the remote resource first,
any resources will be moved off the node (previously, the node would get
fenced). This allows easier software updates on remote nodes, since
updates often involve restarting the daemon.

* You may notice some files have moved from the pacemaker package to
pacemaker-cli, including most ocf:pacemaker resource agents, the
logrotate configuration, the XML schemas and the SNMP MIB. This allows
Pacemaker Remote nodes to work better when the full pacemaker package is
not installed.

* Have you ever wondered why a resource is not starting when you think
it should? crm_mon will now show why a resource is stopped, for example,
because it is unmanaged, or disabled in the configuration.

* Three significant regressions have been fixed. Compressed CIBs larger
than 1MB are again supported (a regression since 1.1.14), fenced unseen
nodes are properly not marked as unclean (also a regression since
1.1.14), and failures of multiple-level monitor checks should again
cause the resource to fail (a regression since 1.1.10).

As usual, the release includes many bugfixes and minor enhancements. For
a more detailed list of changes, see the change log:

https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog

Everyone is encouraged to download, compile and test the new release. We
do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release,
including Andrew Beekhof, Bin Liu, David Shane Holden, Ferenc Wágner,
Gao Yan, Hideo Yamauchi, Jan Pokorný, Ken Gaillot, Klaus Wenninger,
Kristoffer Grönlund, Lars Ellenberg, Michal Koutný, Nakahira Kazutomo,
Ruben Kerkhof, and Yusuke Iida. Apologies if I have overlooked anyone.
-- 
Ken Gaillot 



Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-24 Thread Ken Gaillot
On 04/22/2016 01:13 PM, Dimitri Maziuk wrote:
> On 04/22/2016 12:58 PM, Ken Gaillot wrote:
> 
>>> Consider that monitoring - at least as part of the action -
>>> should check if what your service is actually providing is
>>> working according to some functional and nonfunctional
>>> constraints as to simulate the experience of the consumer of
>>> your services.
> 
> Goedel and Turing say the only one who can answer that is the
> actual consumer. So a simple check for what you *can* check would
> be very nice indeed.
> 
>> Also, you can provide multiple levels of monitoring:
>> 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_multiple_monitor_operations
>>
>>
>> 
>> For example, you could provide a very simple check that just makes sure
>> MySQL is responding on its port, and run that frequently with a
>> low timeout. And your existing thorough monitor could be run less
>> frequently with a high timeout.
> 
> Looking at this, it seems you have to actually rewrite the RA to
> switch on $OCF_CHECK_LEVEL -- unless the stock RA already provides
> the "simple check" you need, is that correct?
> 
> E.g. this page:
> http://linux-ha.org/doc/man-pages/re-ra-apache.html suggests that
> apache RA does not and all you can do in practice is run the same
> curl http://localhost/server-status check with different 
> frequencies. Would that be what we actually have ATM?

Correct, you would need to customize the RA. Given how long you said a
check can take, I assumed you already had a custom check that did
something more detailed than the stock mysql RA.



Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-25 Thread Ken Gaillot
On 04/25/2016 10:23 AM, Dmitri Maziuk wrote:
> On 2016-04-24 16:20, Ken Gaillot wrote:
> 
>> Correct, you would need to customize the RA.
> 
> Well, you wouldn't because your custom RA will be overwritten by the
> next RPM update.

Correct again :)

I should have mentioned that the convention is to copy the script to a
different name before editing it. The recommended approach is to create
a new provider for your organization. For example, copy the RA to a new
directory /usr/lib/ocf/resource.d/local, so it would be used in
pacemaker as ocf:local:mysql. You can use anything in place of "local".

> Dimitri
> 
> 
> 




Re: [ClusterLabs] operation parallelism

2016-04-25 Thread Ken Gaillot
On 04/22/2016 09:05 AM, Ferenc Wágner wrote:
> Hi,
> 
> Are recurring monitor operations constrained by the batch-limit cluster
> option?  I ask because I'd like to limit the number of parallel start
> and stop operations (because they are resource hungry and potentially
> take long) without starving other operations, especially monitors.

No, they are not. The batch-limit only affects actions initiated by the
DC. The DC will initiate the first run of a monitor, so that will be
affected, but the local resource manager (lrmd) on the target node will
remember the monitor and run it on schedule without further prompting
by the DC.
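
If you do want to cap how many actions the DC dispatches at once (starts and
stops included, but not the recurring monitors, for the reason above),
batch-limit is an ordinary cluster option; a sketch with an arbitrary value:

   <cluster_property_set id="cib-bootstrap-options">
      <nvpair id="cib-bootstrap-options-batch-limit" name="batch-limit" value="10"/>
   </cluster_property_set>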



Re: [ClusterLabs] [ClusterLab] : Unable to bring up pacemaker

2016-04-27 Thread Ken Gaillot
On 04/27/2016 11:25 AM, emmanuel segura wrote:
> you need to use pcs to do everything, pcs cluster setup and pcs
> cluster start, try to use the redhat docs for more information.

Agreed -- pcs cluster setup will create a proper corosync.conf for you.
Your corosync.conf below uses corosync 1 syntax, and there were
significant changes in corosync 2. In particular, you don't need the
file created in step 4, because pacemaker is no longer launched via a
corosync plugin.

> 2016-04-27 17:28 GMT+02:00 Sriram :
>> Dear All,
>>
>> I'm trying to use pacemaker and corosync for the clustering requirement that
>> came up recently.
>> We have cross compiled corosync, pacemaker and pcs(python) for ppc
>> environment (Target board where pacemaker and corosync are supposed to run)
>> I'm having trouble bringing up pacemaker in that environment, though I could
>> successfully bring up corosync.
>> Any help is welcome.
>>
>> I'm using these versions of pacemaker and corosync
>> [root@node_cu pacemaker]# corosync -v
>> Corosync Cluster Engine, version '2.3.5'
>> Copyright (c) 2006-2009 Red Hat, Inc.
>> [root@node_cu pacemaker]# pacemakerd -$
>> Pacemaker 1.1.14
>> Written by Andrew Beekhof
>>
>> For running corosync, I did the following.
>> 1. Created the following directories,
>> /var/lib/pacemaker
>> /var/lib/corosync
>> /var/lib/pacemaker
>> /var/lib/pacemaker/cores
>> /var/lib/pacemaker/pengine
>> /var/lib/pacemaker/blackbox
>> /var/lib/pacemaker/cib
>>
>>
>> 2. Created a file called corosync.conf under /etc/corosync folder with the
>> following contents
>>
>> totem {
>>
>> version: 2
>> token:  5000
>> token_retransmits_before_loss_const: 20
>> join:   1000
>> consensus:  7500
>> vsftype:none
>> max_messages:   20
>> secauth:off
>> cluster_name:   mycluster
>> transport:  udpu
>> threads:0
>> clear_node_high_bit: yes
>>
>> interface {
>> ringnumber: 0
>> # The following three values need to be set based on your
>> environment
>> bindnetaddr: 10.x.x.x
>> mcastaddr: 226.94.1.1
>> mcastport: 5405
>> }
>>  }
>>
>>  logging {
>> fileline: off
>> to_syslog: yes
>> to_stderr: no
>> to_syslog: yes
>> logfile: /var/log/corosync.log
>> syslog_facility: daemon
>> debug: on
>> timestamp: on
>>  }
>>
>>  amf {
>> mode: disabled
>>  }
>>
>>  quorum {
>> provider: corosync_votequorum
>>  }
>>
>> nodelist {
>>   node {
>> ring0_addr: node_cu
>> nodeid: 1
>>}
>> }
>>
>> 3.  Created authkey under /etc/corosync
>>
>> 4.  Created a file called pcmk under /etc/corosync/service.d and contents as
>> below,
>>   cat pcmk
>>   service {
>>  # Load the Pacemaker Cluster Resource Manager
>>  name: pacemaker
>>  ver:  1
>>   }
>>
>> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>
>> 6. ./corosync -f -p & --> this step started corosync
>>
>> [root@node_cu pacemaker]# netstat -alpn | grep -i coros
>> udp0  0 10.X.X.X:61841 0.0.0.0:*
>> 9133/corosync
>> udp0  0 10.X.X.X:5405  0.0.0.0:*
>> 9133/corosync
>> unix  2  [ ACC ] STREAM LISTENING 14 9133/corosync
>> @quorum
>> unix  2  [ ACC ] STREAM LISTENING 148884 9133/corosync
>> @cmap
>> unix  2  [ ACC ] STREAM LISTENING 148887 9133/corosync
>> @votequorum
>> unix  2  [ ACC ] STREAM LISTENING 148885 9133/corosync
>> @cfg
>> unix  2  [ ACC ] STREAM LISTENING 148886 9133/corosync
>> @cpg
>> unix  2  [ ] DGRAM148840 9133/corosync
>>
>> 7. ./pacemakerd -f & gives the following error and exits.
>> [root@node_cu pacemaker]# pacemakerd -f
>> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
>> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
>> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
>> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>> Could not connect to Cluster Configuration Database API, error 6
>>
>> Can you please point me, what is missing in these steps ?
>>
>> Before trying these steps, I tried running "pcs cluster start", but that
>> command fails with "service" script not found. As the root filesystem
>> doesn't contain either /etc/init.d/ or /sbin/service
>>
>> So, the plan is to bring up corosync and pacemaker manually, later do the
>> cluster configuration using "pcs" commands.
>>
>> Regards,
>> Sriram
>>

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-29 Thread Ken Gaillot
On 04/28/2016 10:24 AM, Lars Marowsky-Bree wrote:
> On 2016-04-27T12:10:10, Klaus Wenninger  wrote:
> 
>>> Having things in ARGV[] is always risky due to them being exposed more
>>> easily via ps. Environment variables or stdin appear better.
>> What made you assume the recipient is being passed as argument?
>>
>> The environment variable CRM_alert_recipient is being used to pass it.
> 
> Ah, excellent! But what made me think that this would be passed as
> arguments is that your announcement said: "Each alert may have any
> number of recipients configured. These values will simply be passed to
> the script as *arguments*." ;-)

Yes, that was my mistake in the original announcement. :-/

An early design had the script called once with all recipients as
arguments, but in the implementation we went with the script being
called once per recipient (and using only the environment variable).




Re: [ClusterLabs] libqb 0.17.1 - segfault at 1b8

2016-05-02 Thread Ken Gaillot
On 05/02/2016 03:45 PM, Jan Pokorný wrote:
> Hello Radoslaw,
> 
> On 02/05/16 11:47 -0500, Radoslaw Garbacz wrote:
>> When testing pacemaker I encountered a start error, which seems to be
>> related to reported libqb segmentation fault.
>> - cluster started and acquired quorum
>> - some nodes failed to connect to CIB, and lost membership as a result
>> - restart solved the problem
>>
>> Segmentation fault reports libqb library in version 0.17.1, a standard
>> package provided for CentOS.6.
> 
> Chances are that you are running into this nasty bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1114852
> 
>> Please let me know if the problem is known, and if  there is a remedy (e.g.
>> using the latest libqb).
> 
> Try libqb >= 0.17.2.
> 
> [...]
> 
>> Logs from /var/log/messages:
>>
>> Apr 22 15:46:41 (...) pacemakerd[90]:   notice: Additional logging
>> available in /var/log/pacemaker.log
>> Apr 22 15:46:41 (...) pacemakerd[90]:   notice: Configured corosync to
>> accept connections from group 498: Library error (2)
> 
> IIRC, that last line ^ was one of the symptoms.

Yes, that does look like the culprit. The root cause is libqb being
unable to handle 6-digit PIDs, which we can see in the above logs --
"[90]".

As a workaround, you can lower /proc/sys/kernel/pid_max (aka
kernel.pid_max sysctl variable), if you don't want to install a newer
libqb before CentOS 6.8 is released, which will have the fix.



Re: [ClusterLabs] Running several instances of a Corosync/Pacemaker cluster on a node

2016-05-02 Thread Ken Gaillot
On 04/26/2016 03:33 AM, Bogdan Dobrelya wrote:
> Is it possible to run several instances of a Corosync/Pacemaker clusters
> on a node? Can a node be a member of several clusters, so they could put
> resources there? I'm sure it's doable with separate nodes or containers,
> but that's not the case.
> 
> My case is to separate data-critical resources, like storage or VIPs,
> from the complex resources like DB or MQ clusters.
> 
> The latter should run with no-quorum-policy=ignore as they know how to
> deal with network partitions/split-brain, use own techniques to protect
> data and don't want external fencing from a Pacemaker, which
> no-quorum-policy/STONITH is.
> 
> The former must use STONITH (or a stop policy, if it's only a VIP), as
> they don't know how to deal with split-brain, for example.

I don't think it's possible, though I could be wrong; it might work if separate
IPs/ports, chroots and node names are used (just shy of a container ...).

However I suspect it would not meet your goal in any case. DB and MQ
software generally do NOT have sufficient techniques to deal with a
split-brain situation -- either you lose high availability or you
corrupt data. Using no-quorum-policy=stop is fine for handling network
splits, but it does not help if a node becomes unresponsive.

Also note that pacemaker does provide the ability to treat different
resources differently with respect to quorum and fencing, without
needing to run separate clusters. See the "requires" meta-attribute:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_resource_meta_attributes
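
For example (a sketch; the resource here is just a stand-in), a resource that
handles its own data protection and should be allowed to start without quorum
could set:

   <primitive id="my-mq" class="ocf" provider="pacemaker" type="Dummy">
      <meta_attributes id="my-mq-meta">
         <nvpair id="my-mq-requires" name="requires" value="nothing"/>
      </meta_attributes>
   </primitive>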

I suspect your motive for this is to be able to run a cluster without
fencing. There are certain failure scenarios that simply are not
recoverable without fencing, regardless of what the application software
can do. There is really only one case in which doing without fencing is
reasonable: when you're willing to lose your data and/or have downtime
when a situation arises that requires fencing.



Re: [ClusterLabs] Antw: Re: kvm live migration, resource moving

2016-05-02 Thread Ken Gaillot
On 04/27/2016 09:08 AM, Klechomir wrote:
> Hi List,
> I have two major problems related to the VirtualDomain live migration
> and failover in general.
> I'm using Pacemaker 1.1.8, which is very stable and used to do
> everything right to me.

I completely understand staying on an older version that works in your
use case -- but be aware that the people who can help you have moved on
to newer versions and won't be familiar with the differences.

> 1. Live migration during failover (node standby or shutdown) is ok, but
> during failback, the VMs (which prefer the returning node) are trying to
> migrate too early and ignore the order/colocation constraints.

What are the scores on the various constraints and stickiness? Pacemaker
will sum up all the scores relevant to each node and pick the node with
the highest score.

This might be the issue fixed in commit 96d6ecf, which was part of 1.1.12.

> 2. To be able to handle the internal VM shutdown, I have a libvirt hook,
> which simply does:
> [ "${2}" = release ] && crm resource stop ${1}
> The problem is that this hook always stops all the migrating/moving VMs
> in case of failover.

I'm not sure what your goal is. Do you want the cluster to ignore VMs
that shut down outside cluster control?

> Any suggestions are welcome.
> 
> Thanks in advance,
> Klecho



Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-05-02 Thread Ken Gaillot
On 04/22/2016 05:55 PM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 04/21/2016 06:09 PM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
>>>> Hello everybody,
>>>>
>>>> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>>>>
>>>> The most prominent feature will be Klaus Wenninger's new implementation
>>>> of event-driven alerts -- the ability to call scripts whenever
>>>> interesting events occur (nodes joining/leaving, resources
>>>> starting/stopping, etc.).
>>>
>>> Ooh, that sounds cool!  Can it call scripts after fencing has
>>> completed?  And how is it determined which node the script runs on,
>>> and can that be limited via constraints or similar?
>>
>> Yes, it is called after all "interesting" events (including fencing), and
>> the script can use the provided environment variables to determine what
>> type of event it was.
> 
> Great.  Does the script run on the DC, or is that configurable somehow?

The script runs on all cluster nodes, to give maximum flexibility and
resiliency (during partitions etc.). Scripts must handle ordering and
de-duplication themselves, if needed.

A script that isn't too concerned about partitions might simply check
whether the local node is the DC, and only take action if so, to avoid
duplicates.

We're definitely interested in hearing how people approach these issues.
The possibilities for what an alert script might do are wide open, and
we want to be as flexible as possible at this stage. If the community
settles on certain approaches or finds certain gaps, we can enhance the
support in those areas as needed.

>> We don't notify before events, because at that moment we don't know
>> whether the event will really happen or not. We might try but fail.
> 
> You lost me here ;-)

We only call alert scripts after an event occurs, because we can't
predict the future. :-) For example, we don't know whether a node is
about to join or leave the cluster. Or for fencing, we might try to
fence but be unsuccessful -- and the part of pacemaker that calls the
alert scripts won't even know about fencing initiated outside cluster
control, such as by DLM or a human running stonith_admin.

>>> I'm wondering if it could replace the current fencing_topology hack we
>>> use to invoke fence_compute which starts the workflow for recovering
>>> VMs off dead OpenStack nova-compute nodes.
>>
>> Yes, that is one of the reasons we did this!
> 
> Haha, at this point can I say great minds think alike? ;-)
> 
>> The initial implementation only allowed for one script to be called (the
>> "notification-agent" property), but we quickly found out that someone
>> might need to email an administrator, notify nova-compute, and do other
>> types of handling as well. Making someone write one script that did
>> everything would be too complicated and error-prone (and unsupportable).
>> So we abandoned "notification-agent" and went with this new approach.
>>
>> Coordinate with Andrew Beekhof for the nova-compute alert script, as he
>> already has some ideas for that.
> 
> OK.  I'm sure we'll be able to talk about this more next week in Austin!
> 
>>> Although even if that's possible, maybe there are good reasons to stay
>>> with the fencing_topology approach?
>>>
>>> Within the same OpenStack compute node HA scenario, it strikes me that
>>> this could be used to invoke "nova service-disable" when the
>>> nova-compute service crashes on a compute node and then fails to
>>> restart.  This would eliminate the window in between the crash and the
>>> nova server timing out the nova-compute service - during which it
>>> would otherwise be possible for nova-scheduler to attempt to schedule
>>> new VMs on the compute node with the crashed nova-compute service.
>>>
>>> IIUC, this is one area where masakari is currently more sophisticated
>>> than the approach based on OCF RAs:
>>>
>>> https://github.com/ntt-sic/masakari/blob/master/docs/evacuation_patterns.md#evacuation-patterns
>>>
>>> Does that make sense?
>>
>> Maybe. The script would need to be able to determine based on the
>> provided environment variables whether it's in that situation or not.
> 
> Yep.



Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-05-02 Thread Ken Gaillot
On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote:
> Hi,
> 
> Just found an issue where a node is silently unfenced.
> 
> That is quite a large setup (2 cluster nodes and 8 remote ones) with
> plenty of slowly starting resources (lustre filesystem).
> 
> Fencing was initiated due to resource stop failure.
> lustre often starts very slowly due to internal recovery, and some such
> resources were starting in that transition where another resource failed to 
> stop.
> And, as transition did not finish in time specified by the
> "failure-timeout" (set to 9 min), and was not aborted, that stop failure was 
> successfully cleaned.
> There were transition aborts due to attribute changes, after that stop 
> failure happened, but fencing
> was not initiated for some reason.

Unfortunately, that makes sense with the current code. Failure timeout
changes the node attribute, which aborts the transition, which causes a
recalculation based on the new state, and the fencing is no longer
needed. I'll make a note to investigate a fix, but feel free to file a
bug report at bugs.clusterlabs.org for tracking purposes.

> Node where stop failed was a DC.
> pacemaker is 1.1.14-5a6cdd1 (from fedora, built on EL7)
> 
> Here is log excerpt illustrating the above:
> Apr 19 14:57:56 mds1 pengine[3452]:   notice: Movemdt0-es03a-vg
> (Started mds1 -> mds0)
> Apr 19 14:58:06 mds1 pengine[3452]:   notice: Movemdt0-es03a-vg
> (Started mds1 -> mds0)
> Apr 19 14:58:10 mds1 crmd[3453]:   notice: Initiating action 81: monitor 
> mdt0-es03a-vg_monitor_0 on mds0
> Apr 19 14:58:11 mds1 crmd[3453]:   notice: Initiating action 2993: stop 
> mdt0-es03a-vg_stop_0 on mds1 (local)
> Apr 19 14:58:11 mds1 LVM(mdt0-es03a-vg)[6228]: INFO: Deactivating volume 
> group vg_mdt0_es03a
> Apr 19 14:58:12 mds1 LVM(mdt0-es03a-vg)[6541]: ERROR: Logical volume 
> vg_mdt0_es03a/mdt0 contains a filesystem in use. Can't deactivate volume 
> group "vg_mdt0_es03a" with 1 open logical volume(s)
> [...]
> Apr 19 14:58:30 mds1 LVM(mdt0-es03a-vg)[9939]: ERROR: LVM: vg_mdt0_es03a did 
> not stop correctly
> Apr 19 14:58:30 mds1 LVM(mdt0-es03a-vg)[9943]: WARNING: vg_mdt0_es03a still 
> Active
> Apr 19 14:58:30 mds1 LVM(mdt0-es03a-vg)[9947]: INFO: Retry deactivating 
> volume group vg_mdt0_es03a
> Apr 19 14:58:31 mds1 lrmd[3450]:   notice: mdt0-es03a-vg_stop_0:5865:stderr [ 
> ocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctly ]
> [...]
> Apr 19 14:58:31 mds1 lrmd[3450]:   notice: mdt0-es03a-vg_stop_0:5865:stderr [ 
> ocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctly ]
> Apr 19 14:58:31 mds1 crmd[3453]:   notice: Operation mdt0-es03a-vg_stop_0: 
> unknown error (node=mds1, call=324, rc=1, cib-update=1695, confirmed=true)
> Apr 19 14:58:31 mds1 crmd[3453]:   notice: mds1-mdt0-es03a-vg_stop_0:324 [ 
> ocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
> correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctl
> Apr 19 14:58:31 mds1 crmd[3453]:  warning: Action 2993 (mdt0-es03a-vg_stop_0) 
> on mds1 failed (target: 0 vs. rc: 1): Error
> Apr 19 14:58:31 mds1 crmd[3453]:  warning: Action 2993 (mdt0-es03a-vg_stop_0) 
> on mds1 failed (target: 0 vs. rc: 1): Error
> Apr 19 15:02:03 mds1 pengine[3452]:  warning: Processing failed op stop for 
> mdt0-es03a-vg on mds1: unknown error (1)
> Apr 19 15:02:03 mds1 pengine[3452]:  warning: Processing failed op stop for 
> mdt0-es03a-vg on mds1: unknown error (1)
> Apr 19 15:02:03 mds1 pengine[3452]:  warning: Node mds1 will be fenced 
> because of resource failure(s)
> Apr 19 15:02:03 mds1 pengine[3452]:  warning: Forcing mdt0-es03a-vg away from 
> mds1 after 100 failures (max=100)
> Apr 19 15:02:03 mds1 pengine[3452]:  warning: Scheduling Node mds1 for STONITH
> Apr 19 15:02:03 mds1 pengine[3452]:   notice: Stop of failed resource 
> mdt0-es03a-vg is implicit after mds1 is fenced
> Apr 19 15:02:03 mds1 pengine[3452]:   notice: Recover mdt0-es03a-vg
> (Started mds1 -> mds0)
> [... many of these ]
> Apr 19 15:07:22 mds1 pengine[3452]:  warning: Processing failed op stop for 
> mdt0-es03a-vg on mds1: unknown error (1)
> Apr 19 15:07:22 mds1 pengine[3452]:  warning: Processing failed op stop for 
> mdt0-es03a-vg on mds1: unknown error (1)
> Apr 19 15:07:22 mds1 pengine[3452]:  warning: Node mds1 will be fenced 
> because of resource failure(s)
> Apr 19 15:07:22 mds1 pengine[3452]:  warning: Forcing mdt0-es03a-vg away from 
> mds1 after 100 failures (max=100)
> Apr 19 15:07:23 mds1 pengine[3452]:  warning: Scheduling Node mds1 for STONITH
> Apr 19 15:07:23 mds1 pengine[3452]:   notice: Stop of failed resource 
> md

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-05-02 Thread Ken Gaillot
On 04/25/2016 07:28 AM, Lars Ellenberg wrote:
> On Thu, Apr 21, 2016 at 12:50:43PM -0500, Ken Gaillot wrote:
>> Hello everybody,
>>
>> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>>
>> The most prominent feature will be Klaus Wenninger's new implementation
>> of event-driven alerts -- the ability to call scripts whenever
>> interesting events occur (nodes joining/leaving, resources
>> starting/stopping, etc.).
> 
> What exactly is "etc." here?
> What is the comprehensive list
> of which "events" will trigger "alerts"?

The exact list should be documented in Pacemaker Explained before the
final 1.1.15 release. I think it's comparable to what crm_mon -E does
currently. The basic categories are node events, fencing events, and
resource events.

> My guess would be
>  DC election/change
>which does not necessarily imply membership change
>  change in membership
>which includes change in quorum
>  fencing events
>(even failed fencing?)
>  resource start/stop/promote/demote
>   (probably) monitor failure?
>maybe only if some fail-count changes to/from infinity?
>or above a certain threshold?
> 
>  change of maintenance-mode?
>  node standby/online (maybe)?
>  maybe "resource cannot be run anywhere"?

It would certainly be possible to expand alerts to more situations if
there is a need. I think the existing ones will be sufficient for common
use cases though.

> would it be useful to pass in the "transaction ID"
> or other pointer to the recorded cib input at the time
> the "alert" was triggered?

Possibly, though it isn't currently. We do pass a node-local counter and
a subsecond-resolution timestamp, to help with ordering.

> can an alert "observer" (alert script) "register"
> for only a subset of the "alerts"?

Not explicitly, but the alert type is passed in as an environment
variable, so the script can simply exit for "uninteresting" event types.
That's not as efficient since the process must still be spawned, but it
simplifies things.
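For illustration, a filtering script could be as simple as this (treat the
variable names as placeholders -- the exact names will be in the final 1.1.15
documentation and sample alert scripts):

    #!/bin/sh
    # Hypothetical alert script that only cares about fencing events.
    # CRM_alert_kind / CRM_alert_node / CRM_alert_desc are assumed names.
    case "${CRM_alert_kind:-}" in
        fencing)
            logger -t pcmk-alert "fencing: ${CRM_alert_node:-?} ${CRM_alert_desc:-}"
            ;;
        *)
            exit 0   # uninteresting event type, bail out immediately
            ;;
    esac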

> if so, can this filter be per alert script,
> or per "recipient", or both?
> 
> Thanks,
> 
> Lars
> 



[ClusterLabs] OpenStack Summit - Austin recap

2016-05-03 Thread Ken Gaillot
Hi all,

Last week's OpenStack Summit in Austin, Texas, was quite an event --
equal parts spectacle and substance. ;-)

OpenStack -- the FOSS world's paradigm shifter for cloud infrastructure
-- is a growing part of the Pacemaker user base.

Thanks to the users who showed up for the ClusterLabs lunch meet-up. We
had people from SuSE, Red Hat, and NTT, and folks from LINBIT were
around the conference as well. Additionally, there were technical
meetings with much the same participants to discuss OpenStack's emerging
"instance HA" efforts.

My photography skills could use some improvement, but I did get a couple
of snapshots: http://people.redhat.com/kgaillot/atx-2016/

For those who couldn't make it (and even for those who did but could use
a refresher), videos of every presentation at the summit are available
at https://www.openstack.org/videos/summits/show/6

In particular, anyone interested in HA and OpenStack should check out
Adam Spiers and Dawid Deja's excellent presentation on the state of
instance HA:
https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation

I was a bit surprised by how many presenters claimed "HA" as part of
their topic's features, but when pressed, said their HA solution was
either planned for the future, or couldn't handle split-brain. It seems
we still have a lot of work to do to raise awareness of what true HA means.
-- 
Ken Gaillot 



Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Ken Gaillot
On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>> Hi all,
>>
>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>> would like Pacemaker to be extended to export the current failcount as
>> an environment variable to OCF RA scripts when they are invoked with
>> 'start' or 'stop' actions.  This would mean that if you have
>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>> would be able to implement a different behaviour for the third and
>> final 'stop' of a service executed on a node, which is different to
>> the previous 'stop' actions executed just prior to attempting a
>> restart of the service.  (In the non-clone case, this would happen
>> just before migrating the service to another node.)
> So what you actually want to know is how much headroom
> there still is till the resource would be migrated.
> So wouldn't it then be much more catchy if we don't pass
> the failcount but rather the headroom?

Yes, that's the plan: pass a new environment variable with
(migration-threshold - fail-count) when recovering a resource. I haven't
worked out the exact behavior yet, but that's the idea. I do hope to get
this in 1.1.15 since it's a small change.

The advantage over using crm_failcount is that it will be limited to the
current recovery attempt, and it will calculate the headroom as you say,
rather than the raw failcount.

>> One use case for this is to invoke "nova service-disable" if Pacemaker
>> fails to restart the nova-compute service on an OpenStack compute
>> node.
>>
>> Is it feasible to squeeze this in before the 1.1.15 release?
>>
>> Thanks a lot!
>> Adam



Re: [ClusterLabs] ringid interface FAULTY no resource move

2016-05-04 Thread Ken Gaillot
On 05/04/2016 07:14 AM, Rafał Sanocki wrote:
> Hello,
> I can't find what I did wrong. I have a 2-node cluster with Corosync,
> Pacemaker, and DRBD. When I unplug a cable, nothing happens.
> 
> Corosync.conf
> 
> # Please read the corosync.conf.5 manual page
> totem {
> version: 2
> crypto_cipher: none
> crypto_hash: none
> rrp_mode: passive
> 
> interface {
> ringnumber: 0
> bindnetaddr: 172.17.10.0
> mcastport: 5401
> ttl: 1
> }
> interface {
> ringnumber: 1
> bindnetaddr: 255.255.255.0
> mcastport: 5409
> ttl: 1
> }

255.255.255.0 is not a valid bindnetaddr. bindnetaddr should be the IP
network address (not netmask) of the desired interface.

Also, the point of rrp is to have two redundant network links. So
unplugging one shouldn't cause problems, if the other is still up.
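For example, if the second ring were on a 192.168.10.0/24 network (addresses
made up for illustration), the interface block would look something like:

    interface {
        ringnumber: 1
        # the network address of the second ring, not a netmask
        bindnetaddr: 192.168.10.0
        mcastport: 5409
        ttl: 1
    }

and ring1_addr in the nodelist would be each node's actual IP on that network
(e.g. 192.168.10.81 and 192.168.10.89), not 255.255.255.x.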

> 
> transport: udpu
> }
> 
> logging {
> fileline: off
> to_logfile: yes
> to_syslog: yes
> logfile: /var/log/cluster/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: QUORUM
> debug: off
> }
> }
> 
> nodelist {
> node {
> ring0_addr: 172.17.10.81
> ring1_addr: 255.255.255.1
> nodeid: 1
> }
> node {
> ring0_addr: 172.17.10.89
> ring1_addr: 255.255.255.9
> nodeid: 2
> }
> 
> }
> quorum {
> # Enable and configure quorum subsystem (default: off)
> # see also corosync.conf.5 and votequorum.5
> provider: corosync_votequorum
> }
> 
> crm config
> 
> crm(live)configure# show
> node 1: cs01A
> node 2: cs01B
> primitive p_drbd2dev ocf:linbit:drbd \
> params drbd_resource=b1 \
> op monitor interval=29s role=Master \
> op monitor interval=31s role=Slave \
> meta target-role=Started
> primitive p_exportfs_fs2 exportfs \
> params fsid=101 directory="/data1/b1"
> options="rw,sync,no_root_squash,insecure,anonuid=100,anongid=101,nohide"
> clientspec="172.17.10.0/255.255.255.0" wait_for_leasetime_on_stop=false \
> op monitor interval=30s \
> op start interval=0 timeout=240s \
> op stop interval=0 timeout=100s \
> meta target-role=Started
> primitive p_ip_2 IPaddr2 \
> params ip=172.17.10.97 nic=neteth0 cidr_netmask=24 \
> op monitor interval=30s timeout=5s \
> meta target-role=Started
> primitive p_mount_fs2 Filesystem \
> params fstype=reiserfs options="noatime,nodiratime,notail"
> directory="/data1" device="/dev/drbd2" \
> op start interval=0 timeout=400s \
> op stop interval=0 timeout=100s \
> op monitor interval=30s \
> meta target-role=Started
> group g_nfs2 p_ip_2 p_mount_fs2 p_exportfs_fs2
> ms ms_drbd2 p_drbd2dev \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true is-managed=true target-role=Slave
> colocation co_drbd2 inf: g_nfs2 ms_drbd2:Master
> order ms_drbd2_order Mandatory: ms_drbd2:promote g_nfs2:start
> property cib-bootstrap-options: \
> stonith-enabled=false \
> have-watchdog=true \
> dc-version=1.1.14-535193a \
> cluster-infrastructure=corosync \
> maintenance-mode=false \
> no-quorum-policy=ignore \
> last-lrm-refresh=1460627538
> 
> 
> # ip addr show
> neteth1:  mtu 1500 qdisc mq portid
> d8d385bda90c state DOWN group default qlen 1000
> link/ether d8:d3:85:aa:aa:aa brd ff:ff:ff:ff:ff:ff
> inet 255.255.255.1/24 brd 255.255.255.255 scope global neteth1
>valid_lft forever preferred_lft forever
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
> id  = 172.17.10.81
> status  = ring 0 active with no faults
> RING ID 1
> id  = 255.255.255.1
> status  = Marking ringid 1 interface 255.255.255.1 FAULTY
> 
> #crm_mon -A
> 
> Stack: corosync
> Current DC: csb01A (version 1.1.14-535193a) - partition with quorum
> Last updated: Wed May  4 14:11:34 2016  Last change: Thu Apr 14
> 13:06:15 2016 by root via crm_resource on csb01B
> 
> 2 nodes and 5 resources configured: 2 resources DISABLED and 0 BLOCKED
> from being started due to failures
> 
> Online: [ cs01A cs01B ]
> 
>  Resource Group: g_nfs2
>  p_ip_2 (ocf::heartbeat:IPaddr2):   Started csb01A
>  p_mount_fs2(ocf::heartbeat:Filesystem):Started csb01A
>  p_exportfs_fs2 (ocf::heartbeat:exportfs):  Started csb01A
>  Master/Slave Set: ms_drbd2 [p_drbd2dev]
>  Masters: [ csb01A ]
>  Slaves (target-role): [ csb01B ]
> 
> Node Attributes:
> * Node csb01A:
> + master-p_drbd2dev : 1
> * Node csb01B:
> + master-p_drbd2dev : 1000
> 


_

Re: [ClusterLabs] why and when a call of crm_attribute can be delayed ?

2016-05-04 Thread Ken Gaillot
On 04/25/2016 05:02 AM, Jehan-Guillaume de Rorthais wrote:
> Hi all,
> 
> I am facing a strange issue with attrd while doing some testing on a three 
> node
> cluster with the pgsqlms RA [1].
> 
> pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave
> setup on top of pgsqld.
> 
> Before triggering a failure, here was the situation:
> 
>   * centos1: pgsql-ha slave
>   * centos2: pgsql-ha slave
>   * centos3: pgsql-ha master
> 
> Then we triggered a failure: the node centos3 has been kill using 
> 
>   echo c > /proc/sysrq-trigger
> 
> In this situation, PEngine provide a transition where :
> 
>   * centos3 is fenced 
>   * pgsql-ha on centos2 is promoted
> 
> During the pre-promote notify action in the pgsqlms RA, each remaining slave 
> are
> setting a node attribute called lsn_location, see: 
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504
> 
>   crm_attribute -l reboot -t status --node "$nodename" \
> --name lsn_location --update "$node_lsn"
> 
> During the promotion action in the pgsqlms RA, the RA check the lsn_location 
> of
> the all the nodes to make sure the local one is higher or equal to all others.
> See:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292
> 
> This is where we face a attrd behavior we don't understand.
> 
> Despite we can see in the log the RA was able to set its local
> "lsn_location", during the promotion action, the RA was unable to read its
> local lsn_location":
> 
>   pgsqlms(pgsqld)[9003]:  2016/04/22_14:46:16  
> INFO: pgsql_notify: promoting instance on node "centos2" 
> 
>   pgsqlms(pgsqld)[9003]:  2016/04/22_14:46:16  
> INFO: pgsql_notify: current node LSN: 0/1EE24000 
> 
>   [...]
> 
>   pgsqlms(pgsqld)[9023]:  2016/04/22_14:46:16
> CRIT: pgsql_promote: can not get current node LSN location
> 
>   Apr 22 14:46:16 [5864] centos2   lrmd:
> notice: operation_finished: pgsqld_promote_0:9023:stderr 
> [ Error performing operation: No such device or address ] 
> 
>   Apr 22 14:46:16 [5864] centos2   lrmd: 
> info: log_finished:  finished - rsc:pgsqld
> action:promote call_id:211 pid:9023 exit-code:1 exec-time:107ms
> queue-time:0ms
> 
> The error comes from:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320
> 
> **After** this error, we can see in the log file attrd set the "lsn_location" 
> of
> centos2:
> 
>   Apr 22 14:46:16 [5865] centos2
> attrd: info: attrd_peer_update:
> Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2 
> 
>   Apr 22 14:46:16 [5865] centos2
> attrd: info: write_attribute:   
> Write out of 'lsn_location' delayed:update 189 in progress
> 
> 
> As I understand it, the call of crm_attribute during pre-promote notification
> has been taken into account AFTER the "promote" action, leading to this error.
> Am I right?
> 
> Why and how this could happen? Could it comes from the dampen parameter? We 
> did
> not set any dampen anywhere, is there a default value in the cluster setup?
> Could we avoid this behavior?

Unfortunately, that is expected. Both the cluster's call of the RA's
notify action, and the RA's call of crm_attribute, are asynchronous. So
there is no guarantee that anything done by the pre-promote notify will
be complete (or synchronized across other cluster nodes) by the time the
promote action is called.

There would be no point in the pre-promote notify waiting for the
attribute value to be retrievable, because the cluster isn't going to
wait for the pre-promote notify to finish before calling promote.

Maybe someone else can come up with a better idea, but I'm thinking
maybe the attribute could be set as timestamp:lsn, and the promote
action could poll attrd repeatedly (for a small duration lower than the
typical promote timeout) until it gets lsn's with a recent timestamp
from all nodes. One error condition to handle would be if one of the
other slaves happens to fail or be unresponsive at that time.
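Roughly, the idea would be something like this in the promote action, repeated
for each node it expects a value from (untested sketch; $nodename and
$transition_start are whatever the RA already keeps track of, not anything
Pacemaker provides):

    # Poll attrd for a sufficiently fresh "timestamp:lsn" value, giving up
    # well before the promote timeout.
    tries=0
    while [ $tries -lt 25 ]; do
        val=$(crm_attribute -l reboot -t status --node "$nodename" \
                  --name lsn_location --query --quiet 2>/dev/null)
        ts=${val%%:*}
        [ -n "$ts" ] && [ "$ts" -ge "$transition_start" ] && break
        sleep 1
        tries=$((tries + 1))
    done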

> Please, find in attachment a tarball with :
>   * all cluster logfiles from the three nodes
>   * the content of /var/lib/pacemaker from the three nodes:
> * CIBs
> * PEngine transitions
> 
> 
> Regards,
> 
> [1] https://github.com/dalibo/PAF
> 




Re: [ClusterLabs] Antw: ringid interface FAULTY no resource move

2016-05-05 Thread Ken Gaillot
On 05/05/2016 07:43 AM, Rafał Sanocki wrote:
> for what?
> 
> cluster working good when i stop pacemaker, resources go to second node,
> but when connection is lost nothing happend.
> 
> Failed Actions:
> * p_ip_2_monitor_3 on csb01B 'not running' (7): call=47,
> status=complete, exitreason='none',
> last-rc-change='Wed May  4 17:34:50 2016', queued=0ms, exec=0ms
> 
> I just want to move resources when connection on one ring is lost.

If you have two working rings, and one is lost, then the other will be
used automatically, with no resource failover required. "Nothing
happened" in that case is the desired outcome.

If you lose your only ring (or all rings), then fencing is required for
the cluster to recover resources safely.

As an example, let's say we have two nodes, and two networks, one used
for public-facing services and one used for cluster communication, with
an IP address as the highly available resource. If the cluster
communication link fails but the public-facing link is still up, without
fencing both nodes could bring up the IP address. That would lead to
some packets going to one node and some to the other, rendering the
service completely unusable. Fencing allows one node to be sure the
other is not using the IP. Once you use shared storage, databases, and
such, the risk is much greater -- a reconciliation nightmare at best,
complete data loss at worst.

> On 2016-05-04 at 15:50, emmanuel segura wrote:
>> use fencing and drbd fencing handler
>>
>> 2016-05-04 14:46 GMT+02:00 Rafał Sanocki :
>>> Resources should move to the second node when any interface is down.
>>>
>>>
>>>
>>>
>>> On 2016-05-04 at 14:41, Ulrich Windl wrote:
>>>
>>> Rafal Sanocki wrote on 04.05.2016 at 14:14 in message
>>> <78d882b1-a407-31e0-2b9e-b5f8406d4...@gmail.com>:
> Hello,
> I can't find what I did wrong. I have a 2-node cluster with Corosync,
> Pacemaker, and DRBD. When I unplug a cable, nothing happens.
 "nothing"? The wrong cable?

 [...]

 Regards,
 Ulrich



Re: [ClusterLabs] Unable to run Pacemaker: pcmk_child_exit

2016-05-05 Thread Ken Gaillot
On 05/05/2016 08:36 AM, Nikhil Utane wrote:
> Hi,
> 
> Continuing with my adventure to run Pacemaker & Corosync on our
> big-endian system, I managed to get past the corosync issue for now. But
> facing an issue in running Pacemaker.
> 
> Seeing following messages in corosync.log.
>  pacemakerd:  warning: pcmk_child_exit:  The cib process (2) can no
> longer be respawned, shutting the cluster down.
>  pacemakerd:  warning: pcmk_child_exit:  The stonith-ng process (20001)
> can no longer be respawned, shutting the cluster down.
>  pacemakerd:  warning: pcmk_child_exit:  The lrmd process (20002) can no
> longer be respawned, shutting the cluster down.
>  pacemakerd:  warning: pcmk_child_exit:  The attrd process (20003) can
> no longer be respawned, shutting the cluster down.
>  pacemakerd:  warning: pcmk_child_exit:  The pengine process (20004) can
> no longer be respawned, shutting the cluster down.
>  pacemakerd:  warning: pcmk_child_exit:  The crmd process (20005) can no
> longer be respawned, shutting the cluster down.
> 
> I see following error before these messages. Not sure if this is the cause.
> May 05 11:26:24 [19998] airv_cu pacemakerd:error:
> cluster_connect_quorum:   Corosync quorum is not configured
> 
> I tried removing the quorum block (which is anyways blank) from the conf
> file but still had the same error.

Yes, that is the issue. Pacemaker can't do anything if it can't ask
corosync about quorum. I don't know what the issue is at the corosync
level, but your corosync.conf should have:

quorum {
provider: corosync_votequorum
}


> Attaching the log and conf files. Please let me know if there is any
> obvious mistake or how to investigate it further.
> 
> I am using pcs cluster start command to start the cluster
> 
> -Thanks
> Nikhil



Re: [ClusterLabs] Unable to run Pacemaker: pcmk_child_exit

2016-05-05 Thread Ken Gaillot
On 05/05/2016 11:25 AM, Nikhil Utane wrote:
> Thanks Ken for your quick response as always.
> 
> But what if I don't want to use quorum? I just want to bring up
> pacemaker + corosync on 1 node to check that it all comes up fine.
> I added corosync_votequorum as you suggested. Additionally I also added
> these 2 lines:
> 
> expected_votes: 2
> two_node: 1

There's actually nothing wrong with configuring a single-node cluster.
You can list just one node in corosync.conf and leave off the above.

> However still pacemaker is not able to run.

There must be other issues involved. Even if pacemaker doesn't have
quorum, it will still run, it just won't start resources.

> [root@airv_cu root]# pcs cluster start
> Starting Cluster...
> Starting Pacemaker Cluster Manager[FAILED]
> 
> Error: unable to start pacemaker
> 
> Corosync.log:
> *May 05 16:15:20 [16294] airv_cu pacemakerd: info:
> pcmk_quorum_notification: Membership 240: quorum still lost (1)*
> May 05 16:15:20 [16259] airv_cu corosync debug   [QB] Free'ing
> ringbuffer: /dev/shm/qb-cmap-request-16259-16294-21-header
> May 05 16:15:20 [16294] airv_cu pacemakerd:   notice:
> crm_update_peer_state_iter:   pcmk_quorum_notification: Node
> airv_cu[181344357] - state is now member (was (null))
> May 05 16:15:20 [16294] airv_cu pacemakerd: info:
> pcmk_cpg_membership:  Node 181344357 joined group pacemakerd
> (counter=0.0)
> May 05 16:15:20 [16294] airv_cu pacemakerd: info:
> pcmk_cpg_membership:  Node 181344357 still member of group
> pacemakerd (peer=airv_cu, counter=0.0)
> May 05 16:15:20 [16294] airv_cu pacemakerd:  warning: pcmk_child_exit:
>  The cib process (16353) can no longer be respawned, shutting the
> cluster down.
> May 05 16:15:20 [16294] airv_cu pacemakerd:   notice:
> pcmk_shutdown_worker:     Shutting down Pacemaker
> 
> The log and conf file is attached.
> 
> -Regards
> Nikhil
> 
> On Thu, May 5, 2016 at 8:04 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> On 05/05/2016 08:36 AM, Nikhil Utane wrote:
> > Hi,
> >
> > Continuing with my adventure to run Pacemaker & Corosync on our
> > big-endian system, I managed to get past the corosync issue for now. But
> > facing an issue in running Pacemaker.
> >
> > Seeing following messages in corosync.log.
> >  pacemakerd:  warning: pcmk_child_exit:  The cib process (2) can no
> > longer be respawned, shutting the cluster down.
> >  pacemakerd:  warning: pcmk_child_exit:  The stonith-ng process (20001)
> > can no longer be respawned, shutting the cluster down.
> >  pacemakerd:  warning: pcmk_child_exit:  The lrmd process (20002) can no
> > longer be respawned, shutting the cluster down.
> >  pacemakerd:  warning: pcmk_child_exit:  The attrd process (20003) can
> > no longer be respawned, shutting the cluster down.
> >  pacemakerd:  warning: pcmk_child_exit:  The pengine process (20004) can
> > no longer be respawned, shutting the cluster down.
> >  pacemakerd:  warning: pcmk_child_exit:  The crmd process (20005) can no
> > longer be respawned, shutting the cluster down.
> >
> > I see following error before these messages. Not sure if this is the 
> cause.
> > May 05 11:26:24 [19998] airv_cu pacemakerd:error:
> > cluster_connect_quorum:   Corosync quorum is not configured
> >
> > I tried removing the quorum block (which is anyways blank) from the conf
> > file but still had the same error.
> 
> Yes, that is the issue. Pacemaker can't do anything if it can't ask
> corosync about quorum. I don't know what the issue is at the corosync
> level, but your corosync.conf should have:
> 
> quorum {
> provider: corosync_votequorum
> }
> 
> 
> > Attaching the log and conf files. Please let me know if there is any
> > obvious mistake or how to investigate it further.
> >
> > I am using pcs cluster start command to start the cluster
> >
> > -Thanks
> > Nikhil



Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-09 Thread Ken Gaillot
On 05/04/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
>>> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>>>> Hi all,
>>>>
>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>>>> would like Pacemaker to be extended to export the current failcount as
>>>> an environment variable to OCF RA scripts when they are invoked with
>>>> 'start' or 'stop' actions.  This would mean that if you have
>>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>>>> would be able to implement a different behaviour for the third and
>>>> final 'stop' of a service executed on a node, which is different to
>>>> the previous 'stop' actions executed just prior to attempting a
>>>> restart of the service.  (In the non-clone case, this would happen
>>>> just before migrating the service to another node.)
>>> So what you actually want to know is how much headroom
>>> there still is till the resource would be migrated.
>>> So wouldn't it then be much more catchy if we don't pass
>>> the failcount but rather the headroom?
>>
>> Yes, that's the plan: pass a new environment variable with
>> (migration-threshold - fail-count) when recovering a resource. I haven't
>> worked out the exact behavior yet, but that's the idea. I do hope to get
>> this in 1.1.15 since it's a small change.
>>
>> The advantage over using crm_failcount is that it will be limited to the
>> current recovery attempt, and it will calculate the headroom as you say,
>> rather than the raw failcount.
> 
> Headroom sounds more usable, but if it's not significant extra work,
> why not pass both?  It could come in handy, even if only for more
> informative logging from the RA.
> 
> Thanks a lot!

Here is what I'm testing currently:

- When the cluster recovers a resource, the resource agent's stop action
will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
migration-threshold - fail-count on the local node.

- The variable is not added for any action other than stop.

- I'm preferring simplicity over flexibility by providing only a single
variable. The RA theoretically can already get the migration-threshold
from the CIB and fail-count from attrd -- what we're adding is the
knowledge that the stop is part of a recovery.

- If the stop is final (the cluster does not plan to start the resource
anywhere), the variable may be set to 0, or unset. The RA should treat 0
and unset as equivalent.

- So, the variable will be 1 for the stop before the last time the
cluster will try to start the resource on the same node, and 0 or unset
for the last stop on this node before trying to start on another node.

- The variable will be set only in situations when the cluster will
consider migration-threshold. This makes sense, but some situations may
be unintuitive:

-- If a resource is being recovered, but the fail-count is being cleared
in the same transition, the cluster will ignore migration-threshold (and
the variable will not be set). The RA might see recovery_left=5, 4, 3,
then someone clears the fail-count, and it won't see recovery_left even
though there is a stop and start being attempted.

-- Migration-threshold will be considered (and the variable will be set)
only if the resource is being recovered due to failure, not if the
resource is being restarted or moved for some other reason (constraints,
node standby, etc.).

-- The previous point is true even if the resource is restarting/moving
because it is part of a group with another member being recovered due to
failure. Only the failed resource will get the variable. I can see this
might be problematic for interested RAs, because the resource may be
restarted several times on the local node then forced away, without the
variable ever being present -- but the resource will be forced away
because it is part of a group that is moving, not because it is being
recovered (its own fail-count stays 0).
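For concreteness, an RA that wants to act on this would do something like the
following in its stop path (sketch only -- the variable name is the current
working name and may still change, and disable_service/normal_stop stand in
for whatever the agent actually does):

    stop)
        if [ "${OCF_RESKEY_CRM_meta_recovery_left:-0}" -eq 0 ]; then
            # final stop on this node (or not a failure recovery at all):
            # hand the work off elsewhere before shutting down
            disable_service
        fi
        normal_stop
        ;;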

Let me know if you see any problems or have any suggestions.



Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-10 Thread Ken Gaillot
On 05/10/2016 02:29 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 10.05.2016 at 00:40 in message
> <573111d3.7060...@redhat.com>:
>> On 05/04/2016 11:47 AM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
>>>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
>>>>> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>>>>>> would like Pacemaker to be extended to export the current failcount as
>>>>>> an environment variable to OCF RA scripts when they are invoked with
>>>>>> 'start' or 'stop' actions.  This would mean that if you have
>>>>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>>>>>> would be able to implement a different behaviour for the third and
>>>>>> final 'stop' of a service executed on a node, which is different to
>>>>>> the previous 'stop' actions executed just prior to attempting a
>>>>>> restart of the service.  (In the non-clone case, this would happen
>>>>>> just before migrating the service to another node.)
>>>>> So what you actually want to know is how much headroom
>>>>> there still is till the resource would be migrated.
>>>>> So wouldn't it then be much more catchy if we don't pass
>>>>> the failcount but rather the headroom?
>>>>
>>>> Yes, that's the plan: pass a new environment variable with
>>>> (migration-threshold - fail-count) when recovering a resource. I haven't
>>>> worked out the exact behavior yet, but that's the idea. I do hope to get
>>>> this in 1.1.15 since it's a small change.
>>>>
>>>> The advantage over using crm_failcount is that it will be limited to the
>>>> current recovery attempt, and it will calculate the headroom as you say,
>>>> rather than the raw failcount.
>>>
>>> Headroom sounds more usable, but if it's not significant extra work,
>>> why not pass both?  It could come in handy, even if only for more
>>> informative logging from the RA.
>>>
>>> Thanks a lot!
>>
>> Here is what I'm testing currently:
>>
>> - When the cluster recovers a resource, the resource agent's stop action
>> will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
>> migration-threshold - fail-count on the local node.
> 
> With that mechanism RA testing will be more complicated than it is now, and I
> cannot see the benefit yet.

Testing will be more complicated for RAs that choose to behave
differently depending on the variable value, but the vast, vast majority
won't, so it will have no effect on most users. No pacemaker behavior
changes.

BTW I should have explicitly mentioned that the variable name is up for
discussion; I had a hard time coming up with something meaningful that
didn't span an entire line of text.

>>
>> - The variable is not added for any action other than stop.
>>
>> - I'm preferring simplicity over flexibility by providing only a single
>> variable. The RA theoretically can already get the migration-threshold
>> from the CIB and fail-count from attrd -- what we're adding is the
>> knowledge that the stop is part of a recovery.
>>
>> - If the stop is final (the cluster does not plan to start the resource
>> anywhere), the variable may be set to 0, or unset. The RA should treat 0
>> and unset as equivalent.
>>
>> - So, the variable will be 1 for the stop before the last time the
>> cluster will try to start the resource on the same node, and 0 or unset
>> for the last stop on this node before trying to start on another node.
> 
> Be aware that the node could be fenced (for reasons ouside of your RA) even 
> before all these attempts are carried out.

Yes, by listing such scenarios and the ones below, I am hoping the
potential users of this feature can think through whether it will be
sufficient for their use cases.

>>
>> - The variable will be set only in situations when the cluster will
>> consider migration-threshold. This makes sense, but some situations may
>> be unintuitive:
>>
>> -- If a resource is being recovered, but the fail-count is being cleared
>> in the same transition, the cluster will ignore migration-threshold (and
>> the variable will not be set). The RA might see recovery_left=5, 4, 3,
>> then someone clears t

Re: [ClusterLabs] pacemaker and fence_sanlock

2016-05-12 Thread Ken Gaillot
On 05/11/2016 09:14 PM, Da Shi Cao wrote:
> Dear all,
> 
> I'm just beginning to use pacemaker+corosync as our HA solution on
> Linux, but I got stuck at the stage of configuring fencing.
> 
> Pacemaker 1.1.15,  Corosync Cluster Engine, version '2.3.5.46-d245', and
> sanlock 3.3.0 (built May 10 2016 05:13:12)
> 
> I have the following questions:
> 
> 1. stonith_admin --list-installed will only list two agents: fence_pcmk,
> fence_legacy before sanlock is compiled and installed under /usr/local.
> But after "make install" of sanlock, stonith_admin --list-installed will
> list: 
> 
>  fence_sanlockd
>  fence_sanlock
>  fence_pcmk
>  fence_legacy
>  It is weird and I wonder what makes stonith_admin know about fence_sanlock?

I'm guessing you also installed pacemaker under /usr/local;
stonith_admin will simply list $installdir/sbin/fence_*

> 2. How do I configure fencing with fence_sanlock in Pacemaker? I've
> tried to create a new resource to do the unfencing for each node, but
> the resource start fails because the fence_sanlock agent has no monitor
> operation, and the resource manager fires a monitor once after the start
> to make sure it has started OK.

I'm not familiar with fence_sanlock, but it should be fine to do what
you describe. There's probably an issue with your configuration. What
command did you use to configure the resource?

> 3. How to create a fencing resource to do the fencing by sanlock. This
> I've not tried yet. But I wonder which node/nodes of the majority will
> initiate the fence operations to the nodes without quorum.

Once you've defined the resource in the pacemaker configuration, the
cluster will intelligently decide when and how to call it.

When you check the cluster status, you'll see that the fence device is
"running" on one node. In fact, any node can use the fence device
(assuming the configuration doesn't specifically ban it); the listed
node is the one running the recurring monitor on the resource. The
cluster considers that node to have "verified" access to the device, so
it will prefer that node when fencing using the device -- but it may
decide to choose another node when appropriate.

You may be interested to know that pacemaker has recently gained native
support for watchdog-based fencing via the "sbd" software package. See:

  http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit/
  http://clusterlabs.org/wiki/Using_SBD_with_Pacemaker

Some discussion of common configuration issues can be seen at:

  https://bugzilla.redhat.com/show_bug.cgi?id=1221680

If you have a Red Hat subscription, Red Hat has a simple walk-through
for configuring sbd with pacemaker on RHEL 6.8+/7.1+ (using watchdog
only, no "poison pill" shared storage):

  https://access.redhat.com/articles/2212861

> Thank you very much.
> Dashi Cao



Re: [ClusterLabs] Q: monitor and probe result codes and consequences

2016-05-12 Thread Ken Gaillot
On 05/12/2016 02:56 AM, Ulrich Windl wrote:
> Hi!
> 
> I have a question regarding an RA written by myself and pacemaker 
> 1.1.12-f47ea56 (SLES11 SP4):
> 
> During "probe" all resources' "monitor" actions are executed (regardless of 
> any ordering constraints). Therefore my RA considers a parameter as invalid 
> ("file does not exist") (the file will be provided once some supplying 
> resource is up) and returns rc=2.
> OK, this may not be optimal, but pacemaker makes it worse: It does not repeat 
> the probe once the resource would start, but keeps the state, preventing a 
> resource start:
> 
>  primitive_monitor_0 on h05 'invalid parameter' (2): call=73, 
> status=complete, exit-reason='none', last-rc-change='Wed May 11 17:03:39 
> 2016', queued=0ms, exec=82ms

Correct, OCF_ERR_CONFIGURED is a "fatal" error:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_how_are_ocf_return_codes_interpreted

> So you would say that monitor may only return "success" or "not running", but 
> I feel the RA should detect the condition that the resource could not run at 
> all at the present state.

OCF_ERR_CONFIGURED is meant to indicate that the resource could not
possibly run *as configured*, regardless of the system's current state.
So for example, a required parameter is missing or invalid.

You could possibly use OCF_ERR_ARGS in this case (a "hard" error that
bans the particular node, and means that the resource's configuration is
not valid on this particular node).

But, I suspect the right answer here is simply an order constraint
between the supplying resource and this resource. This resource's start
action, not monitor, should be the one that checks for the existence of
the supplied file.
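Roughly, the split would look like this (sketch only; OCF_RESKEY_config is a
made-up parameter name for the file, and which error code fits the start
failure best depends on whether the file could exist on another node):

    start)
        # static validation of the parameter's syntax happens elsewhere;
        # the environment check belongs here, after ordering is satisfied
        if [ ! -f "$OCF_RESKEY_config" ]; then
            ocf_log err "$OCF_RESKEY_config not found"
            exit $OCF_ERR_INSTALLED
        fi
        start_it
        ;;
    monitor|status)
        # during a probe, a missing file simply means "not running here"
        check_resource || exit $OCF_NOT_RUNNING
        exit $OCF_SUCCESS
        ;;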

> Shouldn't pacemaker reprobe resources before it tries to start them?

Probes are meant to check whether the resource is already active
anywhere. The decision of whether and where to start the resource takes
into account the result of the probes, so it doesn't make sense to
re-probe -- that's what the initial probe was for.

> Before my RA had passed all the ocf-tester checks, so this situation is hard 
> to test (unless you have a test cluster you can restart any time).
> 
> (After manual resource cleanup the resource started as usual)
> 
> My monitor uses the following logic:
> ---
> monitor|status)
> if validate; then
> set_variables
> check_resource || exit $OCF_NOT_RUNNING
> status=$OCF_SUCCESS
> else # cannot check status with invalid parameters
> status=$?
> fi
> exit $status
> ;;
> ---
> 
> Should I mess with ocf_is_probe?
> 
> Regards,
> Ulrich



Re: [ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-12 Thread Ken Gaillot
On 05/12/2016 06:21 AM, Adam Spiers wrote:
> Hi Ken,
> 
> Firstly thanks a lot not just for working on this, but also for being
> so proactive in discussing the details.  A perfect example of
> OpenStack's "Open Design" philosophy in action :-)
> 
> Ken Gaillot  wrote:
>> On 05/10/2016 02:29 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 10.05.2016 at 00:40 in message
>>> <573111d3.7060...@redhat.com>:
> 
> [snipped]
> 
>>>> Here is what I'm testing currently:
>>>>
>>>> - When the cluster recovers a resource, the resource agent's stop action
>>>> will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
>>>> migration-threshold - fail-count on the local node.
>>>
>>> With that mechanism RA testing will be more complicated than it is
>>> now, and I cannot see the benefit yet.
>>
>> Testing will be more complicated for RAs that choose to behave
>> differently depending on the variable value, but the vast, vast majority
>> won't, so it will have no effect on most users. No pacemaker behavior
>> changes.
>>
>> BTW I should have explicitly mentioned that the variable name is up for
>> discussion; I had a hard time coming up with something meaningful that
>> didn't span an entire line of text.
> 
> I'd prefer plural (OCF_RESKEY_CRM_meta_recoveries_left) but other than
> that I think it's good.  OCF_RESKEY_CRM_meta_retries_left is shorter;
> not sure whether it's marginally worse or better though.

I'm now leaning to restart_remaining (restarts_remaining would be just
as good).

>>>> - The variable is not added for any action other than stop.
>>>>
>>>> - I'm preferring simplicity over flexibility by providing only a single
>>>> variable. The RA theoretically can already get the migration-threshold
>>>> from the CIB and fail-count from attrd -- what we're adding is the
>>>> knowledge that the stop is part of a recovery.
>>>>
>>>> - If the stop is final (the cluster does not plan to start the resource
>>>> anywhere), the variable may be set to 0, or unset. The RA should treat 0
>>>> and unset as equivalent.
>>>>
>>>> - So, the variable will be 1 for the stop before the last time the
>>>> cluster will try to start the resource on the same node, and 0 or unset
>>>> for the last stop on this node before trying to start on another node.
> 
> OK, so the RA code would typically be something like this?
> 
> if [ ${OCF_RESKEY_CRM_meta_retries_left:-0} = 0 ]; then
> # This is the final stop, so tell the external service
> # not to send any more work our way.
> disable_service
> fi

I'd use -eq :) but yes

>>> Be aware that the node could be fenced (for reasons ouside of your
>>> RA) even before all these attempts are carried out.
>>
>> Yes, by listing such scenarios and the ones below, I am hoping the
>> potential users of this feature can think through whether it will be
>> sufficient for their use cases.
> 
> That's a good point, but I think it's OK because if the node gets
> fenced, we have one and shortly two different mechanisms for achieving
> the same thing:
> 
>   1. add another custom fencing agent to fencing_topology
>   2. use the new events mechanism
> 
>>>> - The variable will be set only in situations when the cluster will
>>>> consider migration-threshold. This makes sense, but some situations may
>>>> be unintuitive:
>>>>
>>>> -- If a resource is being recovered, but the fail-count is being cleared
>>>> in the same transition, the cluster will ignore migration-threshold (and
>>>> the variable will not be set). The RA might see recovery_left=5, 4, 3,
>>>> then someone clears the fail-count, and it won't see recovery_left even
>>>> though there is a stop and start being attempted.
> 
> Hmm.  So how would the RA distinguish that case from the one where
> the stop is final?

That's the main question in all this. There are quite a few scenarios
where there's no meaningful distinction between 0 and unset. With the
current implementation at least, the ideal approach is for the RA to
treat the last stop before a restart the same as a final stop.

>>>> -- Migration-threshold will be considered (and the variable will be set)
>>>> only if the resource is being recovered due to failure, not if the
>>>> resource is being restarted or moved

Re: [ClusterLabs] notify action asynchronous ?

2016-05-12 Thread Ken Gaillot
On 05/12/2016 04:37 AM, Jehan-Guillaume de Rorthais wrote:
> Le Sun, 8 May 2016 16:35:25 +0200,
> Jehan-Guillaume de Rorthais  a écrit :
> 
>> Le Sat, 7 May 2016 00:27:04 +0200,
>> Jehan-Guillaume de Rorthais  a écrit :
>>
>>> Le Wed, 4 May 2016 09:55:34 -0500,
>>> Ken Gaillot  a écrit :
>> ...
>>>> There would be no point in the pre-promote notify waiting for the
>>>> attribute value to be retrievable, because the cluster isn't going to
>>>> wait for the pre-promote notify to finish before calling promote.
>>>
>>> Oh, this is surprising. I thought the pseudo action
>>> "*_confirmed-pre_notify_demote_0" in the transition graph was a wait for
>>> each resource clone return code before going on with the transition. The
>>> graph is confusing, if the cluster isn't going to wait for the pre-promote
>>> notify to finish before calling promote, I suppose some arrows should point
>>> directly from start (or post-start-notify?) action directly to the promote
>>> action then, isn't it?
>>>
>>> This is quite worrying as our RA relies a lot on notifications. For instance,
>>> we try to recover a PostgreSQL instance during pre-start or pre-demote if we
>>> detect a recover action...
>>
>> I'm coming back on this point.
>>
>> Looking at this documentation page:
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-config-testing-changes.html
>>
>> I can read "Arrows indicate ordering dependencies".
>>
>> Looking at the transition graph I am studying (see attachment, a simple
>> master resource move), I still don't understand how the cluster isn't going 
>> to
>> wait for a pre-promote notify to finish before calling promote.
>>
>> So either I misunderstood your words or I miss something else important, 
>> which
>> is quite possible as I am fairly new to this world. Anyway, I try to make a
>> RA as robust as possible and any lights/docs are welcome!
> 
> I tried to trigger this potential asynchronous behavior of the notify action,
> but couldn't observe it.
> 
> I added a different sleep period in the notify action for each node of my 
> cluster:
>   * 10s for hanode1
>   * 15s for hanode2
>   * 20s for hanode3
> 
> The master was on hanode1 and  the DC was hanode1. While moving the master
> resource to hanode2, I can see in the log files that the DC is always
> waiting for the rc of hanode3 before triggering the next action in the
> transition.
> 
> So, **in practice**, it seems the notify action is synchronous. In theory now, 
> I
> still wonder if I misunderstood your words...

I think you're right, and I was mistaken. The asynchronicity most likely
comes purely from crm_attribute not waiting for the value to be set and
propagated to all nodes.

I think I was confusing clone notifications with the new alerts feature,
which is asynchronous. We named that "alerts" to try to avoid such
confusion, but my brain hasn't gotten the memo yet ;)



Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-12 Thread Ken Gaillot
On 05/09/2016 06:36 PM, Jehan-Guillaume de Rorthais wrote:
> Le Mon, 9 May 2016 17:40:19 -0500,
> Ken Gaillot  a écrit :
> 
>> On 05/04/2016 11:47 AM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
>>>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
>>>>> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>>>>>> would like Pacemaker to be extended to export the current failcount as
>>>>>> an environment variable to OCF RA scripts when they are invoked with
>>>>>> 'start' or 'stop' actions.  This would mean that if you have
>>>>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>>>>>> would be able to implement a different behaviour for the third and
>>>>>> final 'stop' of a service executed on a node, which is different to
>>>>>> the previous 'stop' actions executed just prior to attempting a
>>>>>> restart of the service.  (In the non-clone case, this would happen
>>>>>> just before migrating the service to another node.)
>>>>> So what you actually want to know is how much headroom
>>>>> there still is till the resource would be migrated.
>>>>> So wouldn't it then be much more catchy if we don't pass
>>>>> the failcount but rather the headroom?
>>>>
>>>> Yes, that's the plan: pass a new environment variable with
>>>> (migration-threshold - fail-count) when recovering a resource. I haven't
>>>> worked out the exact behavior yet, but that's the idea. I do hope to get
>>>> this in 1.1.15 since it's a small change.
>>>>
>>>> The advantage over using crm_failcount is that it will be limited to the
>>>> current recovery attempt, and it will calculate the headroom as you say,
>>>> rather than the raw failcount.
>>>
>>> Headroom sounds more usable, but if it's not significant extra work,
>>> why not pass both?  It could come in handy, even if only for more
>>> informative logging from the RA.
>>>
>>> Thanks a lot!
>>
>> Here is what I'm testing currently:
>>
>> - When the cluster recovers a resource, the resource agent's stop action
>> will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
>> migration-threshold - fail-count on the local node.
>>
>> - The variable is not added for any action other than stop.
> 
> If the resource is a multistate one, the recover action will do a
> demote->stop->start->promote. What if the failure occurs during the first
> demote call and a new transition will try to demote first again? I suppose 
> this
> new variable should appears at least in demote and stop action to cover such
> situation, isn't it?

Good question. I can easily imagine a "lightweight stop", but I can't
think of a practical use for a "lightweight demote". If someone has a
scenario where that would be useful, I can look at adding it.



Re: [ClusterLabs] Antw: Re: Antw: Re: Q: monitor and probe result codes and consequences

2016-05-13 Thread Ken Gaillot
On 05/13/2016 06:00 AM, Ulrich Windl wrote:
>>>> Dejan Muhamedagic wrote on 13.05.2016 at 12:16 in
> message <20160513101626.GA12493@walrus.homenet>:
>> Hi,
>>
>> On Fri, May 13, 2016 at 09:05:54AM +0200, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 12.05.2016 at 16:41 in message
>>> <57349629.40...@redhat.com>:
>>>> On 05/12/2016 02:56 AM, Ulrich Windl wrote:
>>>>> Hi!
>>>>>
>>>>> I have a question regarding an RA written by myself and pacemaker
>>>>> 1.1.12-f47ea56 (SLES11 SP4):
>>>>>
>>>>> During "probe" all resources' "monitor" actions are executed (regardless
>>>>> of any ordering constraints). Therefore my RA considers a parameter as
>>>>> invalid ("file does not exist") (the file will be provided once some
>>>>> supplying resource is up) and returns rc=2.
>>>>> OK, this may not be optimal, but pacemaker makes it worse: It does not
>>>>> repeat the probe once the resource would start, but keeps the state,
>>>>> preventing a resource start:
>>>>>
>>>>>  primitive_monitor_0 on h05 'invalid parameter' (2): call=73,
>>>>> status=complete, exit-reason='none',
>>>>> last-rc-change='Wed May 11 17:03:39 2016', queued=0ms, exec=82ms
>>>>
>>>> Correct, OCF_ERR_CONFIGURED is a "fatal" error:
>>>>
>>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_how_are_ocf_return_codes_interpreted
>>>
>>> I think the mistake here is assuming that OCF_ERR_CONFIGURED only depends on
>>> the RA parameters, when in fact the validity of RA params may depend on the
>>> environment found at the time of checking. And as we all know, the environment
>>> changes, especially when resources are started and stopped.

It's not a mistake; it's the definition of OCF_ERR_CONFIGURED. The
various documentation could be more clear, but the intent is: the
resource's configuration *in the cluster configuration* is inherently
invalid, and could never work, regardless of what system it was on.
Examples listed include a required parameter not being specified, or a
nonnumeric value supplied for an integer parameter.

OCF_ERR_CONFIGURED is the *only* fatal error (fatal = the operation will
not be retried anywhere).

Of course, as you mentioned later, here you actually have OCF_ERR_ARGS
(2) -- I saw "invalid parameter" and mistakenly thought
OCF_ERR_CONFIGURED ...

>>>>> So you would say that monitor may only return "success" or "not running",
>>>>> but I feel the RA should detect the condition that the resource could not
>>>>> run at all at the present state.
>>>>
>>>> OCF_ERR_CONFIGURED is meant to indicate that the resource could not
>>>> possibly run *as configured*, regardless of the system's current state.
>>>
>>> But how do you handle parameters that describe file or host names (which may
>>> exist or not independently of a change in the param's value)?
>>
>> The RA should've exited with OCF_ERR_INSTALLED. That's the code
>> which means that there's something wrong with the environment on
>> this node, but that the resource could be started on another one.
> 
> Really, besides implementation, I don't see why OCF_ERR_INSTALLED is less 
> permanent than  OCF_ERR_ARGS.
> 
>>
>>>> So for example, a required parameter is missing or invalid.
>>>>
>>>> You could possibly use OCF_ERR_ARGS in this case (a "hard" error that
>>>> bans the particular node, and means that the resource's configuration is
>>>> not valid on this particular node).
>>>
>>> ("rc=2" _is_ OCF_ERR_ARGS)

OCF_ERR_ARGS and OCF_ERR_INSTALLED are both hard errors (hard = the
operation will not be retried on this node).

Looking at the code, OCF_ERR_INSTALLED is treated differently for probes
(it's not recorded as a failed op), but I think the node would still get
banned from running the resource.

I totally understand your point now: The probe may hit one of these
conditions only because it is called before depended-on resources are
up, but the cluster doesn't really care -- it just wants to know whether
the resource is running. Using ocf_is_probe to mask these errors works,
bu

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-05-13 Thread Ken Gaillot
On 05/12/2016 02:26 AM, Nguyen Xuan. Hai wrote:
> Hi,
> 
> I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is
> it possible?
> Has anyone looked into that already?
> 
> Thanks for your help!
> Hai Nguyen

It is not currently possible, and there are no plans to support it.

I'm not familiar enough with the details to know whether it would be a
suitable replacement. I don't think the current developers working on
Pacemaker have the time to investigate the possibility, but we always
welcome anyone interested in contributing :-)

Pacemaker already has the ability to work with either Heartbeat or
Corosync as the cluster communication layer, so there is already some
abstraction in place. Much of the layer-specific code is in Pacemaker's
libcluster.



Re: [ClusterLabs] start a resource

2016-05-13 Thread Ken Gaillot
On 05/06/2016 01:01 PM, Dimitri Maziuk wrote:
> On 05/06/2016 12:05 PM, Ian wrote:
>> Are you getting any other errors now that you've fixed the
>> config?
> 
> It's running now that I did the cluster stop/start, but no: I
> wasn't getting any other errors. I did have a symlink resource
> "stopped" for no apparent reason and with no errors logged.
> 
> The cluster is a basic active-passive pair. The relevant part of
> the setup is:
> 
> drbd filesystem
> floating ip colocated with drbd filesystem +inf
> order drbd filesystem then floating ip
> 
> ocf:heartbeat:symlink resource that does /etc/rsyncd.conf ->
> /drbd/etc/rsyncd.conf colocated with drbd filesystem +inf order
> drbd filesystem then the symlink
> 
> ocf:heartbeat:rsyncd resource that is colocated with the symlink 
> order symlink then rsyncd order floating ip then rsyncd
> 
> (Looking at this, maybe I should also colocate rsyncd with floating
> ip to avoid any confusion in pacemaker's little brain.)

Not strictly necessary, since rsync is colocated with symlink which is
colocated with filesystem, and ip is also colocated with filesystem.

But it is a good idea to model all logical dependencies, since you
don't know what changes you might make to the configuration in the
future. If you want rsyncd to always be with the floating ip, then by
all means add a colocation constraint.
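For example, with made-up resource IDs:

    pcs constraint colocation add rsyncd with floating_ip INFINITY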

> But this is not specific to rsyncd: the behaviour was exactly the
> same when a co-worker made a typo in apache config (which is
> another resource on the same cluster). The only way to restart
> apache was to "pcs cluster stop ; pcs cluster start" and that
> randomly killed ssh connections to the nodes' "proper" IPs.

That is definitely not a properly functioning cluster. Something is
going wrong at some level.

When you say that "pcs resource cleanup" didn't fix the issue, what
happened after that? Did "pcs status" still show an error for the
resource? If so, there was an additional failure.




Re: [ClusterLabs] Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-13 Thread Ken Gaillot
On 05/08/2016 11:19 PM, Nikhil Utane wrote:
> Moving these questions to a different thread.
> 
> Hi,
> 
> We have limited storage capacity in our system for different folders. 
> How can I configure Pacemaker to use a different folder for /var/lib/pacemaker?

./configure --localstatedir=/wherever (defaults to /var or ${prefix}/var)

That will change everything that normally is placed or looked for under
/var (/var/lib/pacemaker, /var/lib/heartbeat, /var/run, etc.).

Note that while ./configure lets you change the location of nearly
everything, /usr/lib/ocf/resource.d is an exception, because it is
specified in the OCF standard.
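
As a minimal sketch (the paths here are only examples, not a recommendation):

./configure --prefix=/usr --localstatedir=/opt/pacemaker-state
make
make install

Everything that would normally live under /var/lib/pacemaker then ends up
under /opt/pacemaker-state/lib/pacemaker instead.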

> 
> 
> Also, in /dev/shm I see that it created around 300+ files of around
> 250 MB.
> 
> For e.g.
> -rw-rw1 hacluste hacluste  8232 May  6 13:03
> qb-cib_rw-response-25035-25038-10-header
> -rw-rw1 hacluste hacluste540672 May  6 13:03
> qb-cib_rw-response-25035-25038-10-data
> -rw---1 hacluste hacluste  8232 May  6 13:03
> qb-cib_rw-response-25035-25036-12-header
> -rw---1 hacluste hacluste540672 May  6 13:03
> qb-cib_rw-response-25035-25036-12-data
> And many more..
> 
> We have limited space in /dev/shm and all these files are filling it
> up. Are these all needed? Any way to limit? Do we need to do any
> clean-up if pacemaker termination was not graceful? 
> 
> -Thanks
> Nikhil
> 
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resource failure-timeout does not reset when resource fails to connect to both nodes

2016-05-13 Thread Ken Gaillot
On 03/28/2016 11:44 AM, Sam Gardner wrote:
> I have a simple resource defined:
> 
> [root@ha-d1 ~]# pcs resource show dmz1
>  Resource: dmz1 (class=ocf provider=internal type=ip-address)
>   Attributes: address=172.16.10.192 monitor_link=true
>   Meta Attrs: migration-threshold=3 failure-timeout=30s
>   Operations: monitor interval=7s (dmz1-monitor-interval-7s)
> 
> This is a custom resource which provides an ethernet alias to one of the
> interfaces on our system.
> 
> I can unplug the cable on either node and failover occurs as expected,
> and 30s after re-plugging it I can repeat the exercise on the opposite
> node and failover will happen as expected.
> 
> However, if I unplug the cable from both nodes, the failcount goes up,
> and the 30s failure-timeout does not reset the failcounts, meaning that
> pacemaker never tries to start the failed resource again.

Apologies for the late response, but:

Time-based actions in Pacemaker, including failure-timeout, are not
guaranteed to be checked more frequently than the value of the
cluster-recheck-interval cluster property, which defaults to 15 minutes.

If the cluster responds to an event (node joining/leaving, monitor
failure, etc.), it will check time-based actions at that point, but
otherwise it doesn't. So cluster-recheck-interval acts as a maximum time
between such checks.

Try lowering your cluster-recheck-interval. Personally, I would think
30s for a failure-timeout is awfully quick; it would lead to continuous
retries. And it would require setting cluster-recheck-interval to
something similar, which would add a lot of computational overhead to
the cluster.
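
As a rough sketch with pcs (the values are only illustrative):

pcs property set cluster-recheck-interval=5min
pcs resource meta dmz1 failure-timeout=300s

With something like that, an expired failure gets noticed within roughly
failure-timeout plus cluster-recheck-interval of the last failure, without
forcing the cluster to recompute its state every few seconds.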

I'm curious what values of cluster-recheck-interval and failure-timeout
people are commonly using "in the wild". On a small, underutilized
cluster, you could probably get away with setting them quite low, but on
larger clusters, I would expect it would be too much overhead.

> Full list of resources:
> 
>  Resource Group: network
>  inif   (off::internal:ip.sh):   Started ha-d1.dev.com
>  outif  (off::internal:ip.sh):   Started ha-d2.dev.com
>  dmz1   (off::internal:ip.sh):   Stopped
>  Master/Slave Set: DRBDMaster [DRBDSlave]
>  Masters: [ ha-d1.dev.com ]
>  Slaves: [ ha-d2.dev.com ]
>  Resource Group: filesystem
>  DRBDFS (ocf::heartbeat:Filesystem):Stopped
>  Resource Group: application
>  service_failover   (off::internal:service_failover):Stopped
> 
> Failcounts for dmz1
>  ha-d1.dev.com: 4
>  ha-d2.dev.com: 4
> 
> Is there any way to automatically recover from this scenario, other than
> setting an obnoxiously high migration-threshold? 
> 
> -- 
> 
> *Sam Gardner   *
> 
> Software Engineer
> 
> *Trustwave** *| SMART SECURITY ON DEMAND

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] unable to start fence_scsi

2016-05-16 Thread Ken Gaillot
On 05/14/2016 08:54 AM, Marco A. Carcano wrote:
> I hope to find here someone who can help me:
> 
> I have a 3 node cluster and I’m struggling to create a GFSv2 shared storage. 
> The  weird thing is that despite cluster seems OK, I’m not able to have the 
> fence_scsi stonith device managed, and this prevent CLVMD and GFSv2 to start.
> 
> I’m using CentOS 7.1, selinux and firewall disabled
> 
> I created the stonith device with the following command
> 
> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
> apache-up002.ring0 apache-up003.ring0 apache-up001.ring1 apache-up002.ring1 
> apache-up003.ring1” 
>  pcmk_reboot_action="off" 
> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
> provides="unfencing" —force
> 
> Notice that is a 3 node cluster with a redundant ring: hosts with .ring1 
> suffix are the same of the ones with .ring0 suffix, but with a different IP 
> address

pcmk_host_list only needs the names of the nodes as specified in the
Pacemaker configuration. It allows the cluster to answer the question,
"What device can I use to fence this particular node?"

Sometimes the fence device itself needs to identify the node by a
different name than the one used by Pacemaker. In that case, use
pcmk_host_map, which maps each cluster node name to a fence device node
name.

The one thing your command is missing is an "op monitor". I'm guessing
that's why it required "--force" (which shouldn't be necessary) and why
the cluster is treating it as unmanaged.
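
As a sketch, keeping your device but trimming the host list to the Pacemaker
node names and adding a monitor (the 60s interval is arbitrary):

pcs stonith create scsi fence_scsi \
  pcmk_host_list="apache-up001.ring0 apache-up002.ring0 apache-up003.ring0" \
  pcmk_reboot_action="off" \
  devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" \
  op monitor interval=60s \
  meta provides="unfencing"

and no --force should be needed.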

> /dev/mapper/36001405973e201b3fdb4a999175b942f is a multipath device for 
> /dev/sda and /dev/sdb
> 
> in log files everything seems right. However pcs status reports the following:
> 
> Cluster name: apache-0
> Last updated: Sat May 14 15:35:56 2016Last change: Sat May 14 
> 15:18:17 2016 by root via cibadmin on apache-up001.ring0
> Stack: corosync
> Current DC: apache-up003.ring0 (version 1.1.13-10.el7_2.2-44eb2dd) - 
> partition with quorum
> 3 nodes and 7 resources configured
> 
> Online: [ apache-up001.ring0 apache-up002.ring0 apache-up003.ring0 ]
> 
> Full list of resources:
> 
>  scsi (stonith:fence_scsi):   Stopped (unmanaged)
> 
> PCSD Status:
>   apache-up001.ring0: Online
>   apache-up002.ring0: Online
>   apache-up003.ring0: Online
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> However SCSI fencing and persistent id reservation seems right:
> 
> sg_persist -n -i -r -d /dev/mapper/36001405973e201b3fdb4a999175b942f 
>   PR generation=0x37, Reservation follows:
> Key=0x9b0e
> scope: LU_SCOPE,  type: Write Exclusive, registrants only
> 
> sg_persist -n -i -k -d /dev/mapper/36001405973e201b3fdb4a999175b942f 
>   PR generation=0x37, 6 registered reservation keys follow:
> 0x9b0e
> 0x9b0e
> 0x9b0e0001
> 0x9b0e0001
> 0x9b0e0002
> 0x9b0e0002
> 
> if I manually fence the second node:
> 
> pcs stonith fence apache-up002.ring0
> 
> I got as expected
> 
> sg_persist -n -i -k -d /dev/mapper/36001405973e201b3fdb4a999175b942f 
>   PR generation=0x38, 4 registered reservation keys follow:
> 0x9b0e
> 0x9b0e
> 0x9b0e0002
> 0x9b0e0002
> 
> Cluster configuration seems OK
> 
> crm_verify -L -V reports no errors neither warnings, 
> 
> corosync-cfgtool -s
> 
> Printing ring status.
> Local node ID 1
> RING ID 0
>   id  = 192.168.15.9
>   status  = ring 0 active with no faults
> RING ID 1
>   id  = 192.168.16.9
>   status  = ring 1 active with no faults
> 
> corosync-quorumtool -s
> 
> Quorum information
> --
> Date: Sat May 14 15:42:38 2016
> Quorum provider:  corosync_votequorum
> Nodes:3
> Node ID:  1
> Ring ID:  820
> Quorate:  Yes
> 
> Votequorum information
> --
> Expected votes:   3
> Highest expected: 3
> Total votes:  3
> Quorum:   2  
> Flags:Quorate 
> 
> Membership information
> --
> Nodeid  Votes Name
>  3  1 apache-up003.ring0
>  2  1 apache-up002.ring0
>  1  1 apache-up001.ring0 (local)
> 
> 
> corosync-cmapctl  | grep members
> runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.15.9) r(1) 
> ip(192.168.16.9) 
> runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.1.status (str) = joined
> runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.15.8) r(1) 
> ip(192.168.16.8) 
> runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.2.status (str) = joined
> runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.15.7) r(1) 
> ip(192.168.16.7) 
> runtime.totem.pg.mrp.srp.members.3.join_co

Re: [ClusterLabs] Help with service banning on a node

2016-05-16 Thread Ken Gaillot
On 05/16/2016 01:36 AM, Leon Botes wrote:
> Hi List.
> 
> I have the following configuration:
> 
> pcs -f ha_config property set symmetric-cluster="true"
> pcs -f ha_config property set no-quorum-policy="stop"
> pcs -f ha_config property set stonith-enabled="false"
> pcs -f ha_config resource defaults resource-stickiness="200"
> 
> pcs -f ha_config resource create drbd ocf:linbit:drbd drbd_resource=r0
> op monitor interval=60s
> pcs -f ha_config resource master drbd master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1 notify=true
> pcs -f ha_config resource create vip-blue ocf:heartbeat:IPaddr2
> ip=192.168.101.100 cidr_netmask=32 nic=blue op monitor interval=20s
> pcs -f ha_config resource create vip-green ocf:heartbeat:IPaddr2
> ip=192.168.102.100 cidr_netmask=32 nic=blue op monitor interval=20s
> 
> pcs -f ha_config constraint colocation add vip-blue drbd-master INFINITY
> with-rsc-role=Master
> pcs -f ha_config constraint colocation add vip-green drbd-master
> INFINITY with-rsc-role=Master
> 
> pcs -f ha_config constraint location drbd-master prefers stor-san1=50
> pcs -f ha_config constraint location drbd-master avoids stor-node1=INFINITY
> pcs -f ha_config constraint location vip-blue prefers stor-san1=50
> pcs -f ha_config constraint location vip-blue avoids stor-node1=INFINITY
> pcs -f ha_config constraint location vip-green prefers stor-san1=50
> pcs -f ha_config constraint location vip-green avoids stor-node1=INFINITY
> 
> pcs -f ha_config constraint order promote drbd-master then start vip-blue
> pcs -f ha_config constraint order start vip-blue then start vip-green
> 
> Which results in:
> 
> [root@san1 ~]# pcs status
> Cluster name: ha_cluster
> Last updated: Mon May 16 08:21:28 2016  Last change: Mon May 16
> 08:21:25 2016 by root via crm_resource on iscsiA-san1
> Stack: corosync
> Current DC: iscsiA-node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition
> with quorum
> 3 nodes and 4 resources configured
> 
> Online: [ iscsiA-node1 iscsiA-san1 iscsiA-san2 ]
> 
> Full list of resources:
> 
>  Master/Slave Set: drbd-master [drbd]
>  drbd   (ocf::linbit:drbd): FAILED iscsiA-node1 (unmanaged)
>  Masters: [ iscsiA-san1 ]
>  Stopped: [ iscsiA-san2 ]
>  vip-blue   (ocf::heartbeat:IPaddr2):   Started iscsiA-san1
>  vip-green  (ocf::heartbeat:IPaddr2):   Started iscsiA-san1
> 
> Failed Actions:
> * drbd_stop_0 on iscsiA-node1 'not installed' (5): call=18,
> status=complete, exitreason='none',
> last-rc-change='Mon May 16 08:20:16 2016', queued=0ms, exec=45ms
> 
> 
> PCSD Status:
>   iscsiA-san1: Online
>   iscsiA-san2: Online
>   iscsiA-node1: Online
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> 
> 
> Is there any way in the configuration to make the drbd sections
> completely be ignored on  iscsiA-node1  to avoide this:
>  drbd (ocf::linbit:drbd): FAILED iscsiA-node1 (unmanaged)
> and
> Failed Actions:
> * drbd_stop_0 on iscsiA-node1 'not installed' (5): call=18,
> status=complete, exitreason='none',
> last-rc-change='Mon May 16 08:20:16 2016', queued=0ms, exec=45ms
> 
> Tried the ban statements but that seesm to have the same result.

Yes, see "resource-discovery":

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_location_properties
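
For instance (sketch only; the constraint id is made up, and your pcs version
needs to accept the resource-discovery option), replacing the INFINITY
"avoids" rule for drbd-master with something like:

pcs constraint location add ban-drbd-node1 drbd-master iscsiA-node1 -INFINITY resource-discovery=never

keeps the resource off iscsiA-node1 and also stops the cluster from probing
the drbd agent on that node at all.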

> Also is there any better way to write the configuration so that the drbd
> starts first then the vip's and colocate together. Also ensure that they
> run on only san1 or san2. Tried grouping but that seems to fail with
> Master / Slave resourcess.

Your constraints are fine. I wouldn't add location preferences for the
VIPs; the mandatory colocation constraint with drbd-master takes care of
that better. But it doesn't really hurt.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 2

2016-05-16 Thread Ken Gaillot
The second release candidate for Pacemaker version 1.1.15 is now
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc2

The most interesting changes since 1.1.15-rc1 are:

* With the new "alerts" feature, the "tstamp_format" attribute has been
renamed to "timestamp-format" and properly defaults to "%H:%M:%S.%06N".

* A regression introduced in 1.1.15-rc1 has been fixed; it could leave node
attribute values out of sync among nodes after a cluster partition.

* The SysInfo resource now automatically sets the #health_disk node
attribute back to "green" if free disk space recovers after becoming too
low.

* Other minor bug fixes.

Everyone is encouraged to download, compile and test the new release.
Your feedback is important and appreciated. I am aiming for one or two
more release candidates, with the final release expected in mid- to late June.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] start a resource

2016-05-17 Thread Ken Gaillot
On 05/16/2016 12:22 PM, Dimitri Maziuk wrote:
> On 05/13/2016 04:31 PM, Ken Gaillot wrote:
> 
>> That is definitely not a properly functioning cluster. Something
>> is going wrong at some level.
> 
> Yeah, well... how do I find out what/where?

What happens after "pcs resource cleanup"? "pcs status" reports the
time associated with each failure, so you can check whether you are
seeing the same failure or a new one.

The system log is usually the best starting point, as it will have
messages from pacemaker, corosync and the resource agents. Check around
the time of the failure(s) for details or anything unusual.

Pacemaker also has a detail log (by default, /var/log/pacemaker.log).
In general, this is more useful to developers than administrators, but
if the system log doesn't help, it can sometimes shed a little more light.
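
As a concrete sketch (log paths vary by distribution; on CentOS the system
log is typically /var/log/messages):

pcs resource cleanup                        # clear the recorded failures
pcs status                                  # see whether a failure reappears, and when
grep -Ei 'error|warn' /var/log/messages     # system log around that timestamp
less /var/log/pacemaker.log                 # detail log, if the above is not enough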

> One question: in corosync.conf I have nodelist { node { ring0_addr:
> node1_name nodeid: 1 } node { ring0_addr: node2_name nodeid: 2 } }
> 
> Could 'pcs cluster stop/start' reset the interface that resolves
> to nodeX_name? If so, that would answer why ssh connections get
> killed.

No, Pacemaker and pcs don't touch the interfaces (unless of course you
explicitly add a cluster resource to do so, which wouldn't work anyway
for the interface(s) that corosync itself needs to use).


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


<    1   2   3   4   5   6   7   8   9   10   >