Re: [ClusterLabs] Previous DC fenced prior to integration

2016-07-22 Thread Andrei Borzenkov
23.07.2016 01:37, Nate Clark wrote:
> Hello,
> 
> I am running pacemaker 1.1.13 with corosync and think I may have
> encountered a start-up timing issue on a two-node cluster. I didn't
> notice anything similar to this in the changelogs for 1.1.14 or 1.1.15,
> or in the open bugs.
> 
> The rough outline of what happened:
> 
> Module 1 and 2 running
> Module 1 is DC
> Module 2 shuts down
> Module 1 updates node attributes used by resources
> Module 1 shuts down
> Module 2 starts up
> Module 2 votes itself as DC
> Module 1 starts up
> Module 2 sees module 1 in corosync and notices it has quorum
> Module 2 enters policy engine state.
> Module 2 policy engine decides to fence 1
> Module 2 then continues and starts resources on itself based upon the old state
> 
> For some reason the integration never occurred and module 2 starts to
> perform actions based on stale state.
> 
> Here are the full logs:
> Jul 20 16:29:06.376805 module-2 crmd[21969]:   notice: Connecting to
> cluster infrastructure: corosync
> Jul 20 16:29:06.386853 module-2 crmd[21969]:   notice: Could not
> obtain a node name for corosync nodeid 2
> Jul 20 16:29:06.392795 module-2 crmd[21969]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.403611 module-2 crmd[21969]:   notice: Quorum lost
> Jul 20 16:29:06.409237 module-2 stonith-ng[21965]:   notice: Watching
> for stonith topology changes
> Jul 20 16:29:06.409474 module-2 stonith-ng[21965]:   notice: Added
> 'watchdog' to the device list (1 active devices)
> Jul 20 16:29:06.413589 module-2 stonith-ng[21965]:   notice: Relying
> on watchdog integration for fencing
> Jul 20 16:29:06.416905 module-2 cib[21964]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.417044 module-2 crmd[21969]:   notice:
> pcmk_quorum_notification: Node module-2[2] - state is now member (was
> (null))
> Jul 20 16:29:06.421821 module-2 crmd[21969]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.422121 module-2 crmd[21969]:   notice: Notifications disabled
> Jul 20 16:29:06.422149 module-2 crmd[21969]:   notice: Watchdog
> enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:06.422286 module-2 crmd[21969]:   notice: The local CRM
> is operational
> Jul 20 16:29:06.422312 module-2 crmd[21969]:   notice: State
> transition S_STARTING -> S_PENDING [ input=I_PENDING
> cause=C_FSA_INTERNAL origin=do_started ]
> Jul 20 16:29:07.416871 module-2 stonith-ng[21965]:   notice: Added
> 'fence_sbd' to the device list (2 active devices)
> Jul 20 16:29:08.418567 module-2 stonith-ng[21965]:   notice: Added
> 'ipmi-1' to the device list (3 active devices)
> Jul 20 16:29:27.423578 module-2 crmd[21969]:  warning: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jul 20 16:29:27.424298 module-2 crmd[21969]:   notice: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jul 20 16:29:27.460834 module-2 crmd[21969]:  warning: FSA: Input
> I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jul 20 16:29:27.463794 module-2 crmd[21969]:   notice: Notifications disabled
> Jul 20 16:29:27.463824 module-2 crmd[21969]:   notice: Watchdog
> enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:27.473285 module-2 attrd[21967]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:27.498464 module-2 pengine[21968]:   notice: Relying on
> watchdog integration for fencing
> Jul 20 16:29:27.498536 module-2 pengine[21968]:   notice: We do not
> have quorum - fencing and resource management disabled
> Jul 20 16:29:27.502272 module-2 pengine[21968]:  warning: Node
> module-1 is unclean!
> Jul 20 16:29:27.502287 module-2 pengine[21968]:   notice: Cannot fence
> unclean nodes until quorum is attained (or no-quorum-policy is set to
> ignore)
> Jul 20 16:29:27.503521 module-2 pengine[21968]:   notice: Start
> fence_sbd(module-2 - blocked)
> Jul 20 16:29:27.503539 module-2 pengine[21968]:   notice: Start
> ipmi-1(module-2 - blocked)
> Jul 20 16:29:27.503559 module-2 pengine[21968]:   notice: Start
> SlaveIP(module-2 - blocked)
> Jul 20 16:29:27.503582 module-2 pengine[21968]:   notice: Start
> postgres:0(module-2 - blocked)
> Jul 20 16:29:27.503597 module-2 pengine[21968]:   notice: Start
> ethmonitor:0(module-2 - blocked)
> Jul 20 16:29:27.503618 module-2 pengine[21968]:   notice: Start
> tomcat-instance:0(module-2 - blocked)
> Jul 20 16:29:27.503629 module-2 pengine[21968]:   notice: Start
> ClusterMonitor:0(module-2 - blocked)
> Jul 20 16:29:27.506945 module-2 pengine[21968]:  warning: Calculated
> Transition 0: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Jul 20 16:29:27.507976 module-2 crmd[21969]:   notice: Initiating
> action 4: monitor fence_sbd_monitor_0 on module-2 (local)
> Jul 20 16:29:27.509282 module-2 crmd[21969]:   
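
Incidentally, the saved policy-engine input named in that last transition
(pe-warn-0.bz2) can be replayed offline to see exactly what decisions were
made from the stale CIB; a minimal sketch with crm_simulate (path taken from
the log above, which reads the compressed file directly):

    # replay the saved policy-engine input and show the planned actions
    crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-warn-0.bz2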

Re: [ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues

2016-07-22 Thread Andrei Borzenkov
23.07.2016 00:07, TEG AMJG wrote:
...

>  Master: kamailioetcclone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
>   Resource: kamailioetc (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=kamailioetc
>Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
>promote interval=0s timeout=90
> (kamailioetc-promote-interval-0s)
>demote interval=0s timeout=90
> (kamailioetc-demote-interval-0s)
>stop interval=0s timeout=100 (kamailioetc-stop-interval-0s)
>monitor interval=10s (kamailioetc-monitor-interval-10s)
...

> 
> The problem is that when I have only one node online in corosync and start
> the other node to rejoin the cluster, all my resources restart and
> sometimes even migrate to the other node

Try adding interleave=true to your clone resource.
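
With pcs that would look something like this (a minimal sketch, using the
master resource name from the configuration below):

    # set interleave=true on the DRBD master/slave resource's meta attributes
    pcs resource meta kamailioetcclone interleave=true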

> (starting by changing, in the promotion, which node is master and which is
> slave) even though the first node is healthy and I use resource-stickiness=200
> as a default for all resources inside the cluster.
> 
> I do believe it has something to do with the constraint of promotion that
> happens with DRBD.
> 
> Thank you very much in advance.
> 
> Regards.
> 
> Alejandro
> 
> 
> 


[ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues

2016-07-22 Thread TEG AMJG
Hi

I am having a problem with a very simple Active/Passive cluster using DRBD.

This is my configuration:

Cluster Name: kamcluster
Corosync Nodes:
 kam1vs3 kam2vs3
Pacemaker Nodes:
 kam1vs3 kam2vs3

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.206 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
  stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
  monitor interval=10s (ClusterIP-monitor-interval-10s)
 Resource: ClusterIP2 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.207 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP2-start-interval-0s)
  stop interval=0s timeout=20s (ClusterIP2-stop-interval-0s)
  monitor interval=10s (ClusterIP2-monitor-interval-10s)
 Resource: rtpproxycluster (class=systemd type=rtpproxy)
  Operations: monitor interval=10s (rtpproxycluster-monitor-interval-10s)
  stop interval=0s on-fail=fence
(rtpproxycluster-stop-interval-0s)
 Resource: kamailioetcfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/etc/kamailio fstype=ext4
  Operations: start interval=0s timeout=60 (kamailioetcfs-start-interval-0s)
  monitor interval=10s on-fail=fence
(kamailioetcfs-monitor-interval-10s)
  stop interval=0s on-fail=fence
(kamailioetcfs-stop-interval-0s)
 Clone: fence_kam2_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam2_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam2 pcmk_host_list=kam2vs3
   Operations: monitor interval=60s (fence_kam2_xvm-monitor-interval-60s)
 Master: kamailioetcclone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true
  Resource: kamailioetc (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=kamailioetc
   Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
   promote interval=0s timeout=90
(kamailioetc-promote-interval-0s)
   demote interval=0s timeout=90
(kamailioetc-demote-interval-0s)
   stop interval=0s timeout=100 (kamailioetc-stop-interval-0s)
   monitor interval=10s (kamailioetc-monitor-interval-10s)
 Resource: kamailiocluster (class=ocf provider=heartbeat type=kamailio)
  Attributes: listen_address=10.0.1.206 conffile=/etc/kamailio/kamailio.cfg
pidfile=/var/run/kamailio.pid monitoring_ip=10.0.1.206
monitoring_ip2=10.0.1.207 port=5060 proto=udp
kamctlrc=/etc/kamailio/kamctlrc
  Operations: start interval=0s timeout=60
(kamailiocluster-start-interval-0s)
  stop interval=0s on-fail=fence
(kamailiocluster-stop-interval-0s)
  monitor interval=5s (kamailiocluster-monitor-interval-5s)
 Clone: fence_kam1_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam1_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam1 pcmk_host_list=kam1vs3
   Operations: monitor interval=60s (fence_kam1_xvm-monitor-interval-60s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: kamailiocluster
Enabled on: kam1vs3 (score:INFINITY) (role: Started)
(id:cli-prefer-kamailiocluster)
Ordering Constraints:
  start ClusterIP then start ClusterIP2 (kind:Mandatory)
(id:order-ClusterIP-ClusterIP2-mandatory)
  start ClusterIP2 then start rtpproxycluster (kind:Mandatory)
(id:order-ClusterIP2-rtpproxycluster-mandatory)
  start fence_kam2_xvm-clone then promote kamailioetcclone (kind:Mandatory)
(id:order-fence_kam2_xvm-clone-kamailioetcclone-mandatory)
  promote kamailioetcclone then start kamailioetcfs (kind:Mandatory)
(id:order-kamailioetcclone-kamailioetcfs-mandatory)
  start kamailioetcfs then start ClusterIP (kind:Mandatory)
(id:order-kamailioetcfs-ClusterIP-mandatory)
  start rtpproxycluster then start kamailiocluster (kind:Mandatory)
(id:order-rtpproxycluster-kamailiocluster-mandatory)
  start fence_kam1_xvm-clone then start fence_kam2_xvm-clone
(kind:Mandatory)
(id:order-fence_kam1_xvm-clone-fence_kam2_xvm-clone-mandatory)
Colocation Constraints:
  rtpproxycluster with ClusterIP2 (score:INFINITY)
(id:colocation-rtpproxycluster-ClusterIP2-INFINITY)
  ClusterIP2 with ClusterIP (score:INFINITY)
(id:colocation-ClusterIP2-ClusterIP-INFINITY)
  ClusterIP with kamailioetcfs (score:INFINITY)
(id:colocation-ClusterIP-kamailioetcfs-INFINITY)
  kamailioetcfs with kamailioetcclone (score:INFINITY)
(with-rsc-role:Master)
(id:colocation-kamailioetcfs-kamailioetcclone-INFINITY)
  kamailioetcclone with fence_kam2_xvm-clone (score:INFINITY)
(id:colocation-kamailioetcclone-fence_kam2_xvm-clone-INFINITY)
  kamailiocluster with rtpproxycluster (score:INFINITY)
(id:colocation-kamailiocluster-rtpproxycluster-INFINITY)
  fence_kam2_xvm-clone with fence_kam1_xvm-clone (score:INFINITY)
(id:colocation-fence_kam2_xvm-clone-fence_kam1_xvm-clone-INFINITY)

Resources Defaults:
 

Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Jason A Ramsey
Great! Thanks for the pointer! Any ideas on the other stuff I was asking about
(i.e. how to use a backstore other than block with Pacemaker)?

--
 
[ jR ]
  @: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/22/16, 12:24 PM, "Andrei Borzenkov"  wrote:

22.07.2016 18:29, Jason A Ramsey wrote:
> From the command line parameters for the pcs resource create or is it
> something internal (not exposed to the user)? If the former, what
> parameter?
> 


http://www.linux-ha.org/doc/dev-guides/_literal_ocf_resource_instance_literal.html
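
In short, ${OCF_RESOURCE_INSTANCE} is simply the resource id you give to
"pcs resource create" (clone instances may carry an :N suffix); pacemaker
exports it into the agent's environment. A minimal sketch with a hypothetical
resource name:

    # the agent then sees OCF_RESOURCE_INSTANCE=mylun in its environment,
    # so the lio-t branch would create /backstores/block/mylun
    pcs resource create mylun ocf:heartbeat:iSCSILogicalUnit \
        target_iqn=iqn.2016-07.org.example:tgt1 lun=1 path=/dev/drbd1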

> --
> 
> [ jR ] @: ja...@eramsey.org
> 
> there is no path to greatness; greatness is the path
> 
> On 7/22/16, 11:08 AM, "Andrei Borzenkov" 
> wrote:
> 
> 22.07.2016 17:43, Jason A Ramsey wrote:
>> Additionally (and this is just a failing on my part), I’m unclear
>> as to where the resource agent is fed the value for 
>> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters
>> one is permitted to supply with “pcs resource create…”
>> 
> 
> It is supplied automatically by pacemaker.
> 
> 





Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Jason A Ramsey
From the command line parameters for the pcs resource create or is it something 
internal (not exposed to the user)? If the former, what parameter?

--
 
[ jR ]
  @: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/22/16, 11:08 AM, "Andrei Borzenkov"  wrote:

22.07.2016 17:43, Jason A Ramsey wrote:
> Additionally (and this is just a failing on my part), I’m
> unclear as to where the resource agent is fed the value for
> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one
> is permitted to supply with “pcs resource create…”
>

It is supplied automatically by pacemaker.




Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-22 Thread Andrei Borzenkov
22.07.2016 09:52, Ulrich Windl wrote:
> That could be. Should there be a node list to configure, or can't the agent
> find out itself (for SBD)?
> 

It apparently does it already


gethosts)
echo `sbd -d $sbd_device list | cut -f2 | sort | uniq`
exit 0




Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Andrei Borzenkov
22.07.2016 17:43, Jason A Ramsey wrote:
> Additionally (and this is just a failing on my part), I’m
> unclear as to where the resource agent is fed the value for
> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one
> is permitted to supply with “pcs resource create…”
>

It is supplied automatically by pacemaker.



[ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Jason A Ramsey
I’m struggling to understand how to fully exploit the capabilities of targetcli 
using the Pacemaker resource agent for iSCSILogicalUnit. From this block of 
code:

lio-t)
    # For lio, we first have to create a target device, then
    # add it to the Target Portal Group as an LU.
    ocf_run targetcli /backstores/block create name=${OCF_RESOURCE_INSTANCE} dev=${OCF_RESKEY_path} || exit $OCF_ERR_GENERIC
    if [ -n "${OCF_RESKEY_scsi_sn}" ]; then
        echo ${OCF_RESKEY_scsi_sn} > /sys/kernel/config/target/core/iblock_${OCF_RESKEY_lio_iblock}/${OCF_RESOURCE_INSTANCE}/wwn/vpd_unit_serial
    fi
    ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/luns create /backstores/block/${OCF_RESOURCE_INSTANCE} ${OCF_RESKEY_lun} || exit $OCF_ERR_GENERIC

    if [ -n "${OCF_RESKEY_allowed_initiators}" ]; then
        for initiator in ${OCF_RESKEY_allowed_initiators}; do
            ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/acls create ${initiator} add_mapped_luns=False || exit $OCF_ERR_GENERIC
            ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/acls/${initiator} create ${OCF_RESKEY_lun} ${OCF_RESKEY_lun} || exit $OCF_ERR_GENERIC
        done
    fi
    ;;
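
(For concreteness, with a hypothetical resource id "lun1", path /dev/drbd1,
target IQN iqn.2016-07.org.example:tgt1, LUN 1 and one allowed initiator,
those ocf_run calls reduce to roughly:

    targetcli /backstores/block create name=lun1 dev=/dev/drbd1
    targetcli /iscsi/iqn.2016-07.org.example:tgt1/tpg1/luns create /backstores/block/lun1 1
    targetcli /iscsi/iqn.2016-07.org.example:tgt1/tpg1/acls create iqn.2016-07.org.example:init1 add_mapped_luns=False
    targetcli /iscsi/iqn.2016-07.org.example:tgt1/tpg1/acls/iqn.2016-07.org.example:init1 create 1 1
)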

it looks like I’m only permitted to create a block backstore. Critically 
missing, in this scenario, is the ability to create fileio backstores on things 
like mounted filesystems abstracted by things like drbd. Additionally (and this 
is just a failing on my part), I’m unclear as to where the resource agent is 
fed the value for “${OCF_RESOURCE_INSTANCE}” given the limited number of 
parameters one is permitted to supply with “pcs resource create…”

Can anyone provide any insight please? Thank you in advance!


--
 
[ jR ]
  @: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path



Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-22 Thread Eric Ren

Hello,

On 07/21/2016 09:31 PM, Da Shi Cao wrote:

I've built the dlm_tool suite using the source from
https://git.fedorahosted.org/cgit/dlm.git/log/. The resource using
ocf:pacemaker:controld will always fail to start because of a timeout, even if
the start timeout is set to 120s! But if dlm_controld is first started outside
of cluster management, then the resource will show up and stay healthy!
1. Why do you suppose it's because of a timeout? Any logs from when the DLM RA
failed to start?
"ocf:pacemaker:controld" is a bash script
(/usr/lib/ocf/resource.d/pacemaker/controld).
If you take a look at this script, you'll find it assumes that
dlm_controld is installed in a certain place (/usr/sbin/dlm_controld on
openSUSE). So, how would the DLM RA find your dlm daemon?
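
A quick way to check is to compare where your build installed the daemon with
the path the RA expects; a minimal sketch (paths are common defaults and may
differ on your system):

    # where did your build install dlm_controld?
    command -v dlm_controld

    # which path does the controld resource agent expect?
    grep -n dlm_controld /usr/lib/ocf/resource.d/pacemaker/controld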

Another question is: what's the difference between dlm_controld and gfs_controld?
Must they both be present if a cluster gfs file system is mounted?
2. dlm_controld is a userland daemon for the dlm kernel module, while
gfs2_controld is for gfs2, I think. However, on recent releases
(Red Hat and SUSE, AFAIK), gfs_controld is no longer needed. But I don't know
much of the history behind this change. Hope someone could elaborate on this a
bit more ;-)
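
On those recent pacemaker+corosync stacks the documented pattern (e.g. in the
GFS2 chapter of "Clusters from Scratch") is a cloned ocf:pacemaker:controld
resource ordered before the GFS2 Filesystem resource, with no gfs_controld at
all. A minimal pcs sketch, with hypothetical device and mount point:

    # clone dlm_controld across the cluster
    pcs resource create dlm ocf:pacemaker:controld \
        op monitor interval=30s clone interleave=true ordered=true
    # clone the GFS2 mount and tie it to the dlm clone
    pcs resource create gfs2fs ocf:heartbeat:Filesystem \
        device=/dev/vg0/lv_gfs2 directory=/mnt/gfs2 fstype=gfs2 clone interleave=true
    pcs constraint order start dlm-clone then gfs2fs-clone
    pcs constraint colocation add gfs2fs-clone with dlm-clone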


Cheers,
Eric



Thanks a lot!
Dashi Cao

From: Da Shi Cao 
Sent: Wednesday, July 20, 2016 4:47:31 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

Thank you all for the information about dlm_controld. I will give it a try using
https://git.fedorahosted.org/cgit/dlm.git/log/ .

Dashi Cao


From: Jan Pokorný 
Sent: Monday, July 18, 2016 8:47:50 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld


On 18/07/16 07:59, Da Shi Cao wrote:

dlm_controld is very tightly coupled with cman.

Wrong assumption.

In fact, support for shipping ocf:pacemaker:controld has been
explicitly restricted to cases when CMAN logic (specifically the
respective handle-all initscript that is in turn, in that limited use
case, triggered from pacemaker's proper one and, moreover, takes
care of dlm_controld management on its own so any subsequent attempts
to do the same would be ineffective) is _not_ around:

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)


I have built a cluster purely with
pacemaker+corosync+fence_sanlock. But if agent
ocf:pacemaker:controld is desired, dlm_controld must exist! I can
only find it in cman.
Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:

There should be a package called 'dlm' that has a dlm_controld suitable
for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:

DLM upstream hosted here:
   https://git.fedorahosted.org/cgit/dlm.git/log/

The name of DLM on openSUSE is libdlm.

--
Jan (Poki)



Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-22 Thread Jan Pokorný
On 21/07/16 21:51 +0200, Jan Pokorný wrote:
> Yes, it's counterintuitive to have this asymmetry and it could be
> made to work with some added effort at the side of pcs with
> the original, disapproved, sequence as-is, but that's perhaps
> sound of the future per the referenced pcs bug.
> So take this idiom as a rule of thumb not to be questioned
> any time soon.

...at least until something better is around:
https://bugzilla.redhat.com/1359057 (open for comments)

-- 
Jan (Poki)




[ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-22 Thread Ulrich Windl
>>> Andrei Borzenkov  wrote on 21.07.2016 at 18:39 in
message :
> 21.07.2016 09:49, Ulrich Windl wrote:
> Ken Gaillot  wrote on 19.07.2016 at 16:17 in
message
>> :
>> 
>> [...]
>>> You're right -- if not told otherwise, Pacemaker will query the device
>>> for the target list. In this case, the output of "stonith_admin -l"
>> 
>> In sles11 SP4 I see the following (surprising) output:
>> "stonith_admin -l" shows the usage message
> 
> That's correct.
> 
>> "stonith_admin -l any" shows the configured devices, independently
>> whether the given name is part of the cluster or no. Even if that
>> host does not exist at all the same list is displayed:
>>  prm_stonith_sbd:0
>>  prm_stonith_sbd
>> 
>> Is that the way it's meant to be?
>> 
> 
> Well, SBD can in principle fence any node, so yes, I'd say it is. In my

I'd like to object: Don't you have to reserve an SBD slot for every host, and
apart from that, if a host doesn't mount the shared storage, SBD cannot fence it
;-) Saying "SBD can fence any node" is equivalent to saying any fence agent can
fence any node. So why mess with node lists then?

> case (stonith:external/ipmi) it returns correct information. I am a bit
> surprised that it also does it for a non-existing node, but as far as I
> understand, if the agent returns nothing the host is not even checked and it
> is assumed the agent can fence anything.

That could be. Should there be a node list to configure, or can't the agent
find out itself (for SBD)?
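
If one did want to make it explicit, the generic fencing attributes can be set
on the stonith resource itself; a minimal crm-shell sketch (hypothetical node
names; pcmk_host_check/pcmk_host_list are standard pacemaker fencing options,
not SBD-specific):

    primitive prm_stonith_sbd stonith:external/sbd \
        params pcmk_host_check="static-list" pcmk_host_list="node1 node2"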

Regards,
Ulrich

> 
> 
> 


Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-22 Thread Da Shi Cao

The manual "Pacemaker 1.1 Clusters from Scratch" gives the false impression
that gfs2 relies only on dlm, but I cannot make it work without gfs_controld.
Again, this little daemon is heavily coupled with cman. I think it is quite hard
to use gfs2 in a cluster built using only "pacemaker+corosync"! Am I wrong?

Thanks a lot!
Dashi Cao

From: Da Shi Cao 
Sent: Thursday, July 21, 2016 9:31:51 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

I've built the dlm_tool suite using the source from
https://git.fedorahosted.org/cgit/dlm.git/log/. The resource using
ocf:pacemaker:controld will always fail to start because of a timeout, even if
the start timeout is set to 120s! But if dlm_controld is first started outside
of cluster management, then the resource will show up and stay healthy!

Another question is: what's the difference between dlm_controld and gfs_controld?
Must they both be present if a cluster gfs file system is mounted?

Thanks a lot!
Dashi Cao

From: Da Shi Cao 
Sent: Wednesday, July 20, 2016 4:47:31 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

Thank you all for the information about dlm_controld. I will give it a try using
https://git.fedorahosted.org/cgit/dlm.git/log/ .

Dashi Cao


From: Jan Pokorný 
Sent: Monday, July 18, 2016 8:47:50 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

> On 18/07/16 07:59, Da Shi Cao wrote:
>> dlm_controld is very tightly coupled with cman.

Wrong assumption.

In fact, support for shipping ocf:pacemaker:controld has been
explicitly restricted to cases when CMAN logic (specifically the
respective handle-all initscript that is in turn, in that limited use
case, triggered from pacemaker's proper one and, moreover, takes
care of dlm_controld management on its own so any subsequent attempts
to do the same would be ineffective) is _not_ around:

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)

>> I have built a cluster purely with
>> pacemaker+corosync+fence_sanlock. But if agent
>> ocf:pacemaker:controld is desired, dlm_controld must exist! I can
>> only find it in cman.
>> Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:
> There should be a package called 'dlm' that has a dlm_controld suitable
> for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:
> DLM upstream hosted here:
>   https://git.fedorahosted.org/cgit/dlm.git/log/
>
> The name of DLM on openSUSE is libdlm.

--
Jan (Poki)
