Re: [ClusterLabs] Previous DC fenced prior to integration
23.07.2016 01:37, Nate Clark wrote:
> Hello,
>
> I am running pacemaker 1.1.13 with corosync and think I may have
> encountered a start up timing issue on a two node cluster. I didn't
> notice anything in the changelog for 1.1.14 or 1.1.15 that looked
> similar to this, or any open bugs.
>
> The rough outline of what happened:
>
> Module 1 and 2 running
> Module 1 is DC
> Module 2 shuts down
> Module 1 updates node attributes used by resources
> Module 1 shuts down
> Module 2 starts up
> Module 2 votes itself as DC
> Module 1 starts up
> Module 2 sees module 1 in corosync and notices it has quorum
> Module 2 enters policy engine state.
> Module 2 policy engine decides to fence 1
> Module 2 then continues and starts resources on itself based upon the old state
>
> For some reason the integration never occurred and module 2 starts to
> perform actions based on stale state.
>
> Here are the full logs:
> Jul 20 16:29:06.376805 module-2 crmd[21969]: notice: Connecting to cluster infrastructure: corosync
> Jul 20 16:29:06.386853 module-2 crmd[21969]: notice: Could not obtain a node name for corosync nodeid 2
> Jul 20 16:29:06.392795 module-2 crmd[21969]: notice: Defaulting to uname -n for the local corosync node name
> Jul 20 16:29:06.403611 module-2 crmd[21969]: notice: Quorum lost
> Jul 20 16:29:06.409237 module-2 stonith-ng[21965]: notice: Watching for stonith topology changes
> Jul 20 16:29:06.409474 module-2 stonith-ng[21965]: notice: Added 'watchdog' to the device list (1 active devices)
> Jul 20 16:29:06.413589 module-2 stonith-ng[21965]: notice: Relying on watchdog integration for fencing
> Jul 20 16:29:06.416905 module-2 cib[21964]: notice: Defaulting to uname -n for the local corosync node name
> Jul 20 16:29:06.417044 module-2 crmd[21969]: notice: pcmk_quorum_notification: Node module-2[2] - state is now member (was (null))
> Jul 20 16:29:06.421821 module-2 crmd[21969]: notice: Defaulting to uname -n for the local corosync node name
> Jul 20 16:29:06.422121 module-2 crmd[21969]: notice: Notifications disabled
> Jul 20 16:29:06.422149 module-2 crmd[21969]: notice: Watchdog enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:06.422286 module-2 crmd[21969]: notice: The local CRM is operational
> Jul 20 16:29:06.422312 module-2 crmd[21969]: notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jul 20 16:29:07.416871 module-2 stonith-ng[21965]: notice: Added 'fence_sbd' to the device list (2 active devices)
> Jul 20 16:29:08.418567 module-2 stonith-ng[21965]: notice: Added 'ipmi-1' to the device list (3 active devices)
> Jul 20 16:29:27.423578 module-2 crmd[21969]: warning: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jul 20 16:29:27.424298 module-2 crmd[21969]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jul 20 16:29:27.460834 module-2 crmd[21969]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jul 20 16:29:27.463794 module-2 crmd[21969]: notice: Notifications disabled
> Jul 20 16:29:27.463824 module-2 crmd[21969]: notice: Watchdog enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:27.473285 module-2 attrd[21967]: notice: Defaulting to uname -n for the local corosync node name
> Jul 20 16:29:27.498464 module-2 pengine[21968]: notice: Relying on watchdog integration for fencing
> Jul 20 16:29:27.498536 module-2 pengine[21968]: notice: We do not have quorum - fencing and resource management disabled
> Jul 20 16:29:27.502272 module-2 pengine[21968]: warning: Node module-1 is unclean!
> Jul 20 16:29:27.502287 module-2 pengine[21968]: notice: Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)
> Jul 20 16:29:27.503521 module-2 pengine[21968]: notice: Start fence_sbd (module-2 - blocked)
> Jul 20 16:29:27.503539 module-2 pengine[21968]: notice: Start ipmi-1 (module-2 - blocked)
> Jul 20 16:29:27.503559 module-2 pengine[21968]: notice: Start SlaveIP (module-2 - blocked)
> Jul 20 16:29:27.503582 module-2 pengine[21968]: notice: Start postgres:0 (module-2 - blocked)
> Jul 20 16:29:27.503597 module-2 pengine[21968]: notice: Start ethmonitor:0 (module-2 - blocked)
> Jul 20 16:29:27.503618 module-2 pengine[21968]: notice: Start tomcat-instance:0 (module-2 - blocked)
> Jul 20 16:29:27.503629 module-2 pengine[21968]: notice: Start ClusterMonitor:0 (module-2 - blocked)
> Jul 20 16:29:27.506945 module-2 pengine[21968]: warning: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Jul 20 16:29:27.507976 module-2 crmd[21969]: notice: Initiating action 4: monitor fence_sbd_monitor_0 on module-2 (local)
> Jul 20 16:29:27.509282 module-2 crmd[21969]:
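The pe-warn file named in the "Calculated Transition 0" line above captures the (possibly stale) CIB the policy engine acted on; it can be replayed offline with crm_simulate to see exactly why module-2 decided to fence module-1 (standard crm_simulate usage; the path is taken from the log above):

    # Replay the saved transition that decided to fence module-1
    crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-warn-0.bz2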
Re: [ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues
23.07.2016 00:07, TEG AMJG wrote:
...
>  Master: kamailioetcclone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>   Resource: kamailioetc (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=kamailioetc
>    Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
>                promote interval=0s timeout=90 (kamailioetc-promote-interval-0s)
>                demote interval=0s timeout=90 (kamailioetc-demote-interval-0s)
>                stop interval=0s timeout=100 (kamailioetc-stop-interval-0s)
>                monitor interval=10s (kamailioetc-monitor-interval-10s)
...
> The problem is that when i have only one node online in corosync and start
> the other node to rejoin the cluster, all my resources restart and
> sometimes even migrate to the other node

Try adding interleave=true to your clone resource.

> (starting by changing in
> promotion who is master and who is slave) even though the first node is
> healthy and i use resource-stickiness=200 as a default in all resources
> inside the cluster.
>
> I do believe it has something to do with the constraint of promotion that
> happens with DRBD.
>
> Thank you very much in advance.
>
> Regards.
>
> Alejandro
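With pcs, that could look like the following (a sketch; the resource name is taken from the configuration quoted above, and exact pcs syntax may vary by version):

    # Set interleave=true on the DRBD master/clone resource
    pcs resource meta kamailioetcclone interleave=true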
[ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues
Hi

I am having a problem with a very simple Active/Passive cluster using DRBD. This is my configuration:

Cluster Name: kamcluster
Corosync Nodes:
 kam1vs3 kam2vs3
Pacemaker Nodes:
 kam1vs3 kam2vs3

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.206 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
              monitor interval=10s (ClusterIP-monitor-interval-10s)
 Resource: ClusterIP2 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.207 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP2-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP2-stop-interval-0s)
              monitor interval=10s (ClusterIP2-monitor-interval-10s)
 Resource: rtpproxycluster (class=systemd type=rtpproxy)
  Operations: monitor interval=10s (rtpproxycluster-monitor-interval-10s)
              stop interval=0s on-fail=fence (rtpproxycluster-stop-interval-0s)
 Resource: kamailioetcfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/etc/kamailio fstype=ext4
  Operations: start interval=0s timeout=60 (kamailioetcfs-start-interval-0s)
              monitor interval=10s on-fail=fence (kamailioetcfs-monitor-interval-10s)
              stop interval=0s on-fail=fence (kamailioetcfs-stop-interval-0s)
 Clone: fence_kam2_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam2_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam2 pcmk_host_list=kam2vs3
   Operations: monitor interval=60s (fence_kam2_xvm-monitor-interval-60s)
 Master: kamailioetcclone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: kamailioetc (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=kamailioetc
   Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
               promote interval=0s timeout=90 (kamailioetc-promote-interval-0s)
               demote interval=0s timeout=90 (kamailioetc-demote-interval-0s)
               stop interval=0s timeout=100 (kamailioetc-stop-interval-0s)
               monitor interval=10s (kamailioetc-monitor-interval-10s)
 Resource: kamailiocluster (class=ocf provider=heartbeat type=kamailio)
  Attributes: listen_address=10.0.1.206 conffile=/etc/kamailio/kamailio.cfg pidfile=/var/run/kamailio.pid monitoring_ip=10.0.1.206 monitoring_ip2=10.0.1.207 port=5060 proto=udp kamctlrc=/etc/kamailio/kamctlrc
  Operations: start interval=0s timeout=60 (kamailiocluster-start-interval-0s)
              stop interval=0s on-fail=fence (kamailiocluster-stop-interval-0s)
              monitor interval=5s (kamailiocluster-monitor-interval-5s)
 Clone: fence_kam1_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam1_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam1 pcmk_host_list=kam1vs3
   Operations: monitor interval=60s (fence_kam1_xvm-monitor-interval-60s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: kamailiocluster
    Enabled on: kam1vs3 (score:INFINITY) (role: Started) (id:cli-prefer-kamailiocluster)
Ordering Constraints:
  start ClusterIP then start ClusterIP2 (kind:Mandatory) (id:order-ClusterIP-ClusterIP2-mandatory)
  start ClusterIP2 then start rtpproxycluster (kind:Mandatory) (id:order-ClusterIP2-rtpproxycluster-mandatory)
  start fence_kam2_xvm-clone then promote kamailioetcclone (kind:Mandatory) (id:order-fence_kam2_xvm-clone-kamailioetcclone-mandatory)
  promote kamailioetcclone then start kamailioetcfs (kind:Mandatory) (id:order-kamailioetcclone-kamailioetcfs-mandatory)
  start kamailioetcfs then start ClusterIP (kind:Mandatory) (id:order-kamailioetcfs-ClusterIP-mandatory)
  start rtpproxycluster then start kamailiocluster (kind:Mandatory) (id:order-rtpproxycluster-kamailiocluster-mandatory)
  start fence_kam1_xvm-clone then start fence_kam2_xvm-clone (kind:Mandatory) (id:order-fence_kam1_xvm-clone-fence_kam2_xvm-clone-mandatory)
Colocation Constraints:
  rtpproxycluster with ClusterIP2 (score:INFINITY) (id:colocation-rtpproxycluster-ClusterIP2-INFINITY)
  ClusterIP2 with ClusterIP (score:INFINITY) (id:colocation-ClusterIP2-ClusterIP-INFINITY)
  ClusterIP with kamailioetcfs (score:INFINITY) (id:colocation-ClusterIP-kamailioetcfs-INFINITY)
  kamailioetcfs with kamailioetcclone (score:INFINITY) (with-rsc-role:Master) (id:colocation-kamailioetcfs-kamailioetcclone-INFINITY)
  kamailioetcclone with fence_kam2_xvm-clone (score:INFINITY) (id:colocation-kamailioetcclone-fence_kam2_xvm-clone-INFINITY)
  kamailiocluster with rtpproxycluster (score:INFINITY) (id:colocation-kamailiocluster-rtpproxycluster-INFINITY)
  fence_kam2_xvm-clone with fence_kam1_xvm-clone (score:INFINITY) (id:colocation-fence_kam2_xvm-clone-fence_kam1_xvm-clone-INFINITY)

Resources Defaults:
Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit
Great! Thanks for the pointer! Any ideas on the other stuff I was asking about (i.e. how to use any other backstore other than block with Pacemaker)?

--

[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path

On 7/22/16, 12:24 PM, "Andrei Borzenkov" wrote:

22.07.2016 18:29, Jason A Ramsey wrote:
> From the command line parameters for the pcs resource create or is it
> something internal (not exposed to the user)? If the former, what
> parameter?

http://www.linux-ha.org/doc/dev-guides/_literal_ocf_resource_instance_literal.html

> --
>
> [ jR ]
> @: ja...@eramsey.org
>
> there is no path to greatness; greatness is the path
>
> On 7/22/16, 11:08 AM, "Andrei Borzenkov" wrote:
>
> 22.07.2016 17:43, Jason A Ramsey wrote:
>> Additionally (and this is just a failing on my part), I’m unclear
>> as to where the resource agent is fed the value for
>> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters
>> one is permitted to supply with “pcs resource create…”
>
> It is supplied automatically by pacemaker.
Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit
From the command line parameters for the pcs resource create or is it something internal (not exposed to the user)? If the former, what parameter?

--

[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path

On 7/22/16, 11:08 AM, "Andrei Borzenkov" wrote:

22.07.2016 17:43, Jason A Ramsey wrote:
> Additionally (and this is just a failing on my part), I’m
> unclear as to where the resource agent is fed the value for
> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one
> is permitted to supply with “pcs resource create…”

It is supplied automatically by pacemaker.
Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device
22.07.2016 09:52, Ulrich Windl wrote:
> That could be. Should there be a node list to configure, or can't the agent
> find out itself (for SBD)?

It apparently does it already:

gethosts)
    echo `sbd -d $sbd_device list | cut -f2 | sort | uniq`
    exit 0
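For context, `sbd -d <device> list` prints one tab-separated line per slot, so the pipeline above reduces the output to the unique node names (a sketch; the device path, node names, and slot states are illustrative):

    # sbd -d /dev/disk/by-id/scsi-shared list
    0	node-1	clear
    1	node-2	clear
    # cut -f2 keeps the node-name column, yielding: node-1 node-2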
Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit
22.07.2016 17:43, Jason A Ramsey wrote:
> Additionally (and this is just a failing on my part), I’m
> unclear as to where the resource agent is fed the value for
> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one
> is permitted to supply with “pcs resource create…”

It is supplied automatically by pacemaker.
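That is, pacemaker exports the resource ID into the agent's environment before each invocation, so no extra parameter is needed (a minimal illustrative fragment; the resource name "mylun" is hypothetical):

    # Created with: pcs resource create mylun ocf:heartbeat:iSCSILogicalUnit ...
    # Pacemaker then invokes the agent with OCF_RESOURCE_INSTANCE=mylun set:
    echo "operating on instance ${OCF_RESOURCE_INSTANCE}"   # prints: operating on instance mylun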
[ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit
I’m struggling to understand how to fully exploit the capabilities of targetcli using the Pacemaker resource agent for iSCSILogicalUnit. From this block of code:

    lio-t)
        # For lio, we first have to create a target device, then
        # add it to the Target Portal Group as an LU.
        ocf_run targetcli /backstores/block create name=${OCF_RESOURCE_INSTANCE} dev=${OCF_RESKEY_path} || exit $OCF_ERR_GENERIC
        if [ -n "${OCF_RESKEY_scsi_sn}" ]; then
            echo ${OCF_RESKEY_scsi_sn} > /sys/kernel/config/target/core/iblock_${OCF_RESKEY_lio_iblock}/${OCF_RESOURCE_INSTANCE}/wwn/vpd_unit_serial
        fi
        ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/luns create /backstores/block/${OCF_RESOURCE_INSTANCE} ${OCF_RESKEY_lun} || exit $OCF_ERR_GENERIC
        if [ -n "${OCF_RESKEY_allowed_initiators}" ]; then
            for initiator in ${OCF_RESKEY_allowed_initiators}; do
                ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/acls create ${initiator} add_mapped_luns=False || exit $OCF_ERR_GENERIC
                ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/acls/${initiator} create ${OCF_RESKEY_lun} ${OCF_RESKEY_lun} || exit $OCF_ERR_GENERIC
            done
        fi
        ;;

it looks like I’m only permitted to create a block backstore. Critically missing, in this scenario, is the ability to create fileio backstores on things like mounted filesystems abstracted by things like drbd.

Additionally (and this is just a failing on my part), I’m unclear as to where the resource agent is fed the value for “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one is permitted to supply with “pcs resource create…”

Can anyone provide any insight please? Thank you in advance!

--

[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path
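To frame the question: outside the agent, a fileio backstore can be created with stock targetcli roughly as follows (a hand-run sketch of what a fileio-capable agent might do, not something the current agent supports; names, IQN, paths, and sizes are made up):

    # Hypothetical manual equivalent using a fileio backstore:
    targetcli /backstores/fileio create name=mylun file_or_dev=/mnt/drbd/mylun.img size=10G
    targetcli /iscsi/iqn.2016-07.org.example:tgt1/tpg1/luns create /backstores/fileio/mylun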
Re: [ClusterLabs] agent ocf:pacemaker:controld
Hello,

On 07/21/2016 09:31 PM, Da Shi Cao wrote:
> I've built the dlm_tool suite using the source from
> https://git.fedorahosted.org/cgit/dlm.git/log/. The resource using
> ocf:pacemaker:controld will always fail to start because of timeout,
> even if start timeout is set to 120s! But if dlm_controld is first
> started outside the cluster management, then the resource will show
> up and stay well!

1. Why do you suppose it's because of a timeout? Any logs from when the DLM RA failed to start?

"ocf:pacemaker:controld" is a bash script (/usr/lib/ocf/resource.d/pacemaker/controld). If you take a look at this script, you'll find it assumes that dlm_controld is installed in a certain place (/usr/sbin/dlm_controld for openSUSE). So, how would the DLM RA find your dlm daemon?

> Another question is what's the difference of dlm_controld and
> gfs_controld? Must they both be present if a cluster gfs file system
> is mounted?

2. dlm_controld is a daemon in userland for the dlm kernel module, while gfs2_controld is for gfs2, I think. However, on recent releases (Red Hat and SUSE, AFAIK), gfs_controld is no longer needed. But I don't know much history about this change. Hope someone could elaborate on this a bit more ;-)

Cheers,
Eric

> Thanks a lot!
> Dashi Cao
>
> From: Da Shi Cao
> Sent: Wednesday, July 20, 2016 4:47:31 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld
>
> Thank you all for the information about dlm_controld. I will make a
> try using https://git.fedorahosted.org/cgit/dlm.git/log/ .
> Dashi Cao
>
> From: Jan Pokorný
> Sent: Monday, July 18, 2016 8:47:50 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld
>
> On 18/07/16 07:59, Da Shi Cao wrote:
>> dlm_controld is very tightly coupled with cman.
>
> Wrong assumption. In fact, support for shipping ocf:pacemaker:controld
> has been explicitly restricted to cases when CMAN logic (specifically
> the respective handle-all initscript that is in turn, in that limited
> use case, triggered from pacemaker's proper one and, moreover, takes
> care of dlm_controld management on its own so any subsequent attempts
> to do the same would be ineffective) is _not_ around:
> https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
> (accidental syntactical typos were fixed later on:
> https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)
>
>> I have built a cluster purely with pacemaker+corosync+fence_sanlock.
>> But if agent ocf:pacemaker:controld is desired, dlm_controld must
>> exist! I can only find it in cman.
>> Can the command dlm_controld be obtained without bringing in cman?
>
> To recap what others have suggested:
>
> On 18/07/16 08:57 +0100, Christine Caulfield wrote:
>> There should be a package called 'dlm' that has a dlm_controld
>> suitable for use with pacemaker.
>
> On 18/07/16 17:26 +0800, Eric Ren wrote:
>> DLM upstream hosted here:
>> https://git.fedorahosted.org/cgit/dlm.git/log/
>>
>> The name of DLM on openSUSE is libdlm.
>
> --
> Jan (Poki)
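One quick check on point 1 above (plain shell, nothing distribution-specific): compare where the self-built daemon was installed with the path the RA script expects:

    # Where did the self-built dlm_controld land?
    which dlm_controld
    # Which path does the resource agent assume?
    grep -n "dlm_controld" /usr/lib/ocf/resource.d/pacemaker/controld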
Re: [ClusterLabs] Pacemaker in puppet with cib.xml?
On 21/07/16 21:51 +0200, Jan Pokorný wrote:
> Yes, it's counterintuitive to have this asymmetry and it could be
> made to work with some added effort at the side of pcs with
> the original, disapproved, sequence as-is, but that's perhaps
> sound of the future per the referenced pcs bug.
> So take this idiom as a rule of thumb not to be questioned
> any time soon.

...at least until something better is around:
https://bugzilla.redhat.com/1359057 (open for comments)

--
Jan (Poki)
[ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device
>>> Andrei Borzenkov wrote on 21.07.2016 at 18:39 in message:
> 21.07.2016 09:49, Ulrich Windl wrote:
>>>>> Ken Gaillot wrote on 19.07.2016 at 16:17 in message:
>> [...]
>>> You're right -- if not told otherwise, Pacemaker will query the device
>>> for the target list. In this case, the output of "stonith_admin -l"
>>
>> In SLES11 SP4 I see the following (surprising) output:
>> "stonith_admin -l" shows the usage message
>
> That's correct.
>
>> "stonith_admin -l any" shows the configured devices, regardless of
>> whether the given name is part of the cluster or not. Even if that
>> host does not exist at all, the same list is displayed:
>> prm_stonith_sbd:0
>> prm_stonith_sbd
>>
>> Is that the way it's meant to be?
>
> Well, SBD can in principle fence any node, so yes, I'd say it is. In my

I'd like to object: Don't you have to reserve an SBD slot for every host? And despite that, if a host doesn't mount the shared storage, SBD cannot fence it ;-) Saying "SBD can fence any node" is equivalent to saying any fence agent can fence any node. So why mess with nodes then?

> case (stonith:external/ipmi) it returns correct information. I am a bit
> surprised that it also does it for a non-existing node, but as far as I
> understand, if the agent returns nothing the host is not even checked
> and it is assumed the agent can fence anything.

That could be. Should there be a node list to configure, or can't the agent find out itself (for SBD)?

Regards,
Ulrich
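For reference, the query under discussion asks stonithd which devices claim to be able to fence a given target (the device names below are the ones Ulrich reported; the target name is deliberately arbitrary, per the behaviour described above):

    # List devices that claim they can fence the named host:
    stonith_admin -l some-nonexistent-host
     prm_stonith_sbd:0
     prm_stonith_sbd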
Re: [ClusterLabs] agent ocf:pacemaker:controld
The manual "Pacemaker 1.1 Clusters from Scratch" gives the false impression that gfs2 relies only on dlm, but I cannot make it work without gfs_controld. Again this little daemon is heavily coupled with cman. I think it is quite hard to use gfs2 in a cluster build only using "pacemaker+corosync"! Am I wrong? Thanks a lot! Dashi Cao From: Da Shi CaoSent: Thursday, July 21, 2016 9:31:51 PM To: Cluster Labs - All topics related to open-source clustering welcomed Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld I've built the dlm_tool suite using the source from https://git.fedorahosted.org/cgit/dlm.git/log/. The resource uisng ocf:pacemaker:controld will always fail to start because of timeout, even if start timeout is set to 120s! But if dlm_controld is first started outside the cluster management, then the resource will show up and stay well! Another question is what's the difference of dlm_controld and gfs_controld? Must they both be present if a cluster gfs file system is mounted? Thanks a lot! Dashi Cao From: Da Shi Cao Sent: Wednesday, July 20, 2016 4:47:31 PM To: Cluster Labs - All topics related to open-source clustering welcomed Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld Thank you all for the information about dlm_controld. I will make a try using https://git.fedorahosted.org/cgit/dlm.git/log/ . Dashi Cao From: Jan Pokorný Sent: Monday, July 18, 2016 8:47:50 PM To: Cluster Labs - All topics related to open-source clustering welcomed Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld > On 18/07/16 07:59, Da Shi Cao wrote: >> dlm_controld is very tightly coupled with cman. Wrong assumption. In fact, support for shipping ocf:pacemaker:controld has been explicitly restricted to cases when CMAN logic (specifically the respective handle-all initscript that is in turn, in that limited use case, triggered from pacemaker's proper one and, moreover, takes care of dlm_controld management on its own so any subsequent attempts to do the same would be ineffective) is _not_ around: https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3 (accidental syntactical typos were fixed later on: https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77) >> I have built a cluster purely with >> pacemaker+corosync+fence_sanlock. But if agent >> ocf:pacemaker:controld is desired, dlm_controld must exist! I can >> only find it in cman. >> Can the command dlm_controld be obtained without bringing in cman? To recap what others have suggested: On 18/07/16 08:57 +0100, Christine Caulfield wrote: > There should be a package called 'dlm' that has a dlm_controld suitable > for use with pacemaker. On 18/07/16 17:26 +0800, Eric Ren wrote: > DLM upstream hosted here: > https://git.fedorahosted.org/cgit/dlm.git/log/ > > The name of DLM on openSUSE is libdlm. 
-- Jan (Poki) ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
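For completeness, the stack that Clusters from Scratch implies for GFS2 on pacemaker+corosync is roughly the following (a sketch; resource names, the device path, and the mount point are illustrative):

    # DLM control daemon, cloned on every node:
    pcs resource create dlm ocf:pacemaker:controld op monitor interval=60s clone interleave=true ordered=true
    # GFS2 mount, cloned and ordered after dlm:
    pcs resource create sharedfs ocf:heartbeat:Filesystem device=/dev/vdb1 directory=/mnt/shared fstype=gfs2 clone interleave=true
    pcs constraint order start dlm-clone then sharedfs-clone
    pcs constraint colocation add sharedfs-clone with dlm-clone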