[ClusterLabs] Antw: Re: Antw: Re: snapshots in a clvm environment - some questions for proceeding
>>> "Lentes, Bernd" schrieb am 01.03.2017 um 23:13 in Nachricht
<5769d607-e3f8-4c7d-bd70-f72e3a994...@helmholtz-muenchen.de>:
>> Actually that's what we are doing, but at some point in the past I had
>> ruined the directory with the VM images (user error). The problem then was
>> that you need a running VM to restore files inside the VM. This is when
>> you would like to have a crash-consistent backup image of your VM.
>> However I found no working solution yet (we have VM images hosted on
>> OCFS2, hosted on a cLVM LV).
>>
>> Regards,
>> Ulrich
>
> With OCFS2 you could snapshot (I think they call it reflink) the image file.

I didn't find the proper tools to do so in SLES11, and the manual page is
quite vague on using the REFLINK feature. How do you do it?

> Bernd
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
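[Editor's note: for readers following this thread, a sketch of the reflink operation being discussed — not verified on SLES11, which is exactly the platform where Ulrich could not find the tooling. ocfs2-tools (1.6 and later) ships a reflink(1) utility that creates a copy-on-write snapshot of a file on an OCFS2 mount; newer coreutils can request the same clone via cp. The paths below are placeholders, and --reflink=auto falls back to a plain copy on filesystems without CoW support, so the demonstration runs anywhere:

```shell
# On an OCFS2 mount with ocfs2-tools >= 1.6, the dedicated tool would be:
#     reflink /ocfs2/vm.img /ocfs2/vm-snap.img
# The coreutils equivalent (CoW where supported, plain copy otherwise):
src=/tmp/vm.img
snap=/tmp/vm-snap.img
dd if=/dev/zero of="$src" bs=1M count=1 status=none   # stand-in for a VM image
cp --reflink=auto "$src" "$snap"                      # snapshot (or copy) it
cmp -s "$src" "$snap" && echo "snapshot matches source"
```

For a crash-consistent image the guest would typically need to be paused, or its disk quiesced, for the instant the reflink is taken.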
[ClusterLabs] Antw: Cannot clone clvmd resource
Hi!

What about colocation and ordering?

Regards,
Ulrich

>>> Anne Nicolas schrieb am 01.03.2017 um 22:49 in Nachricht
<0b585272-1c5b-0f07-1f01-747c003c6...@gmail.com>:
> Hi there
>
> I'm testing quite an easy configuration to work on clvm. I'm just
> getting crazy as it seems clvmd cannot be cloned on other nodes.
>
> clvmd starts well on node1 but fails on both node2 and node3.
>
> In pacemaker journalctl I get the following messages:
>
> Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd:
> No such file or directory
> Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat
> /cmirrord: No such file or directory
> Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd
> action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms queue-time:0ms
> Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok
> (node=node3, call=233, rc=0, cib-update=541, confirmed=true)
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop
> p-dlm_stop_0 on node3 (local)
> Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm
> action:stop call_id:235
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop
> p-dlm_stop_0 on node2
>
> Here is my configuration
>
> node 739312139: node1
> node 739312140: node2
> node 739312141: node3
> primitive admin_addr IPaddr2 \
>     params ip=172.17.2.10 \
>     op monitor interval=10 timeout=20 \
>     meta target-role=Started
> primitive p-clvmd ocf:lvm2:clvmd \
>     op start timeout=90 interval=0 \
>     op stop timeout=100 interval=0 \
>     op monitor interval=30 timeout=90
> primitive p-dlm ocf:pacemaker:controld \
>     op start timeout=90 interval=0 \
>     op stop timeout=100 interval=0 \
>     op monitor interval=60 timeout=90
> primitive stonith-sbd stonith:external/sbd
> group g-clvm p-dlm p-clvmd
> clone c-clvm g-clvm meta interleave=true
> property cib-bootstrap-options: \
>     have-watchdog=true \
>     dc-version=1.1.13-14.7-6f22ad7 \
>     cluster-infrastructure=corosync \
>     cluster-name=hacluster \
>     stonith-enabled=true \
>     placement-strategy=balanced \
>     no-quorum-policy=freeze \
>     last-lrm-refresh=1488404073
> rsc_defaults rsc-options: \
>     resource-stickiness=1 \
>     migration-threshold=10
> op_defaults op-options: \
>     timeout=600 \
>     record-pending=true
>
> Thanks in advance for your input
>
> Cheers
>
> --
> Anne Nicolas
> http://mageia.org
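[Editor's note: the ordering and colocation Ulrich asks about are already implied by the g-clvm group (members start in listed order and run on the same node). If the group were dropped, the explicit equivalent might look like this crm-shell sketch — the constraint names are made up:

```
order o-dlm-before-clvmd inf: p-dlm p-clvmd
colocation col-clvmd-with-dlm inf: p-clvmd p-dlm
```

Either form ensures dlm is up before clvmd starts on a node.]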
[ClusterLabs] Antw: Expected recovery behavior of remote-node guest when corosync ring0 is lost in a passive mode RRP config?
>>> "Scott Greenlese" schrieb am 01.03.2017 um 22:07 in Nachricht:
> Hi..
>
> I am running a few corosync "passive mode" Redundant Ring Protocol (RRP)
> failure scenarios, where my cluster has several remote-node VirtualDomain
> resources running on each node in the cluster, which have been configured
> to allow Live Guest Migration (LGM) operations.
>
> While both corosync rings are active, if I drop ring0 on a given node
> where I have remote nodes (guests) running, I noticed that the guest will
> be shut down / re-started on the same host, after which the connection is
> re-established and the guest proceeds to run on that same cluster node.

Could it be you forgot "allow-migrate=true" at the resource level or some
migration IP address at the node level? I only have SLES11 here...

> I am wondering why pacemaker doesn't try to "live" migrate the remote
> node (guest) to a different node, instead of rebooting the guest? Is
> there some way to configure the remote nodes such that the recovery
> action is LGM instead of reboot when the host-to-remote_node connection
> is lost in an RRP situation? I guess the next question is, is it even
> possible to LGM a remote node guest if the corosync ring fails over from
> ring0 to ring1 (or vice versa)?
>
> # For example, here's a remote node's VirtualDomain resource definition.
>
> [root@zs95kj]# pcs resource show zs95kjg110102_res
>  Resource: zs95kjg110102_res (class=ocf provider=heartbeat
> type=VirtualDomain)
>   Attributes: config=/guestxml/nfs1/zs95kjg110102.xml
> hypervisor=qemu:///system migration_transport=ssh
>   Meta Attrs: allow-migrate=true remote-node=zs95kjg110102
> remote-addr=10.20.110.102
>   Operations: start interval=0s timeout=480
> (zs95kjg110102_res-start-interval-0s)
>               stop interval=0s timeout=120
> (zs95kjg110102_res-stop-interval-0s)
>               monitor interval=30s (zs95kjg110102_res-monitor-interval-30s)
>               migrate-from interval=0s timeout=1200
> (zs95kjg110102_res-migrate-from-interval-0s)
>               migrate-to interval=0s timeout=1200
> (zs95kjg110102_res-migrate-to-interval-0s)
> [root@zs95kj VD]#
>
> # My RRP rings are active, and configured rrp_mode="passive"
>
> [root@zs95kj ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 2
> RING ID 0
>         id      = 10.20.93.12
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 10.20.94.212
>         status  = ring 1 active with no faults
>
> # Here's the corosync.conf ..
>
> [root@zs95kj ~]# cat /etc/corosync/corosync.conf
> totem {
>     version: 2
>     secauth: off
>     cluster_name: test_cluster_2
>     transport: udpu
>     rrp_mode: passive
> }
>
> nodelist {
>     node {
>         ring0_addr: zs95kjpcs1
>         ring1_addr: zs95kjpcs2
>         nodeid: 2
>     }
>     node {
>         ring0_addr: zs95KLpcs1
>         ring1_addr: zs95KLpcs2
>         nodeid: 3
>     }
>     node {
>         ring0_addr: zs90kppcs1
>         ring1_addr: zs90kppcs2
>         nodeid: 4
>     }
>     node {
>         ring0_addr: zs93KLpcs1
>         ring1_addr: zs93KLpcs2
>         nodeid: 5
>     }
>     node {
>         ring0_addr: zs93kjpcs1
>         ring1_addr: zs93kjpcs2
>         nodeid: 1
>     }
> }
>
> quorum {
>     provider: corosync_votequorum
> }
>
> logging {
>     to_logfile: yes
>     logfile: /var/log/corosync/corosync.log
>     timestamp: on
>     syslog_facility: daemon
>     to_syslog: yes
>     debug: on
>     logger_subsys {
>         debug: off
>         subsys: QUORUM
>     }
> }
>
> # Here's the vlan / route situation on cluster node zs95kj:
>
> ring0 is on vlan1293
> ring1 is on vlan1294
>
> [root@zs95kj ~]# route -n
> Kernel IP routing table
> Destination    Gateway        Genmask         Flags Metric Ref Use Iface
> 0.0.0.0        10.20.93.254   0.0.0.0         UG    400    0   0   vlan1293   << default route to guests from ring0
> 9.0.0.0        9.12.23.1      255.0.0.0       UG    400    0   0   vlan508
> 9.12.23.0      0.0.0.0        255.255.255.0   U     400    0   0   vlan508
> 10.20.92.0     0.0.0.0        255.255.255.0   U     400    0   0   vlan1292
> 10.20.93.0     0.0.0.0        255.255.255.0   U     0      0   0   vlan1293   << ring0 IPs
> 10.20.93.0     0.0.0.0        255.255.255.0   U     400    0   0   vlan1293
> 10.20.94.0     0.0.0.0        255.255.255.0   U     0      0   0   vlan1294   << ring1 IPs
> 10.20.94.0     0.0.0.0        255.255.255.0   U     400    0   0   vlan1294
> 10.20.101.0    0.0.0.0        255.255.255.0   U     400    0   0   vlan1298
> 10.20.109.0    10.20.94.254   255.255.255.0   UG    400    0   0   vlan1294   << route to guests on 10.20.109 from ring1
> 10.20.110.0    10.20.94.254   255.255.255.0   UG    400
Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg wrote:
> When I recently tried to make use of the DEGRADED monitoring results,
> I found out that it does still not work.
>
> Because LRMD chooses to filter them in ocf2uniform_rc(),
> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>
> See patch suggestion below.
>
> It also filters away the other "special" rc values.
> Do we really not want to see them in crmd/pengine? I would think we do.
> Why does LRMD think it needs to outsmart the pengine?

Because the person that implemented the feature incorrectly assumed the rc
would be passed back unmolested.

> Note: I did build it, but did not use this yet,
> so I have no idea if the rest of the implementation of the DEGRADED
> stuff works as intended or if there are other things missing as well.

failcount might be the other place that needs some massaging —
specifically, not incrementing it when a degraded rc comes through.

> Thoughts?

looks good to me

> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
> index 724edb7..39a7dd1 100644
> --- a/lrmd/lrmd.c
> +++ b/lrmd/lrmd.c
> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char *stdout_data)
>  static int
>  ocf2uniform_rc(int rc)
>  {
> -    if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
> -        return PCMK_OCF_UNKNOWN_ERROR;
> +    switch (rc) {
> +    default:
> +        return PCMK_OCF_UNKNOWN_ERROR;
> +
> +    case PCMK_OCF_OK:
> +    case PCMK_OCF_UNKNOWN_ERROR:
> +    case PCMK_OCF_INVALID_PARAM:
> +    case PCMK_OCF_UNIMPLEMENT_FEATURE:
> +    case PCMK_OCF_INSUFFICIENT_PRIV:
> +    case PCMK_OCF_NOT_INSTALLED:
> +    case PCMK_OCF_NOT_CONFIGURED:
> +    case PCMK_OCF_NOT_RUNNING:
> +    case PCMK_OCF_RUNNING_MASTER:
> +    case PCMK_OCF_FAILED_MASTER:
> +
> +    case PCMK_OCF_DEGRADED:
> +    case PCMK_OCF_DEGRADED_MASTER:
> +        return rc;
> +
> +#if 0
> +    /* What about these?? */

yes, these should get passed back as-is too

> +    /* 150-199 reserved for application use */
> +    PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by
> disconnection of the LRM API to a local or remote node */
> +
> +    PCMK_OCF_EXEC_ERROR    = 192, /* Generic problem invoking the agent */
> +    PCMK_OCF_UNKNOWN       = 193, /* State of the service is unknown -
> used for recording in-flight operations */
> +    PCMK_OCF_SIGNAL        = 194,
> +    PCMK_OCF_NOT_SUPPORTED = 195,
> +    PCMK_OCF_PENDING       = 196,
> +    PCMK_OCF_CANCELLED     = 197,
> +    PCMK_OCF_TIMEOUT       = 198,
> +    PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
> +#endif
>      }
> -
> -    return rc;
>  }
Re: [ClusterLabs] R: Re: Antw: Re: Ordering Sets of Resources
On 03/01/2017 03:22 PM, iva...@libero.it wrote:
> You are right, but I had to use option symmetrical=false because I need
> to stop, when all resources are running, even a single primitive with no
> impact to the other resources.
>
> I have also used symmetrical=false with kind=Optional.
> The stop of the individual resource does not stop the other resources,
> but if during the startup or shutdown of the resources a list of
> primitives without any order is used, the resources will start or stop
> without respecting the constraint strictly.
>
> Regards
> Ivan

If I understand, you want to be able to specify resources A B C such that
they always start in that order, but stopping can be in any combination:

* just A
* just B
* just C
* just A and B (in which case B stops, then A)
* just A and C (in which case C stops, then A)
* just B and C (in which case C stops, then B)
* or all (in which case C stops, then B, then A)

There may be a fancy way to do it with sets, but my first thought is:

* Keep the start constraint you have
* Use individual ordering constraints between each resource pair with
  kind=Optional and action=stop

>> Messaggio originale
>> Da: "Ken Gaillot"
>> Data: 01/03/2017 15.57
>> A: "Ulrich Windl"
>> Ogg: Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources
>>
>> On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot schrieb am 26.02.2017 um 20:04 in Nachricht:
>>>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>>>> Hi all,
>>>>> I have configured a two node cluster on redhat 7.
>>>>>
>>>>> Because I need to manage resources stopping and starting singularly
>>>>> when they are running, I have configured the cluster using ordered
>>>>> set constraints.
>>>>>
>>>>> Here is the example
>>>>>
>>>>> Ordering Constraints:
>>>>>   Resource Sets:
>>>>>     set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>>>>     require-all=true set MYIP_3 MYIP_4 MYSMTP action=start
>>>>>     sequential=true require-all=true setoptions symmetrical=false
>>>>>     set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>>>>     require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>>>>     sequential=true require-all=true setoptions symmetrical=false
>>>>>     kind=Mandatory
>>>>>
>>>>> The constraint works as expected on start, but when stopping the
>>>>> resources don't respect the order.
>>>>> Any help is appreciated
>>>>>
>>>>> Thanks and regards
>>>>> Ivan
>>>>
>>>> symmetrical=false means the order only applies for starting
>>>
>>> From the name (symmetrical) alone it could also mean that it only
>>> applies for stopping ;-)
>>> (Another example where better names would be nice)
>>
>> Well, more specifically, it only applies to the action specified in the
>> constraint. I hadn't noticed before that the second constraint here has
>> action=stop, so yes, that one would only apply for stopping.
>>
>> In the above example, the two constraints are identical to a single
>> constraint with symmetrical=true, since the second constraint is just
>> the reverse of the first.
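[Editor's note: Ken's pairwise suggestion above might look like the following pcs sketch, using placeholder resource names A, B, C; a kind=Optional ordering only influences stops that happen to be scheduled in the same transition, which is exactly the "order only when both are stopping" behavior requested:

```
# Keep the existing start ordering, then add stop-only pairwise orderings:
pcs constraint order stop B then stop A kind=Optional
pcs constraint order stop C then stop B kind=Optional
pcs constraint order stop C then stop A kind=Optional
```

Stopping any single resource then has no effect on the others, while a full stop proceeds C, B, A.]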
Re: [ClusterLabs] Cannot clone clvmd resource
On 03/01/2017 03:49 PM, Anne Nicolas wrote:
> Hi there
>
> I'm testing quite an easy configuration to work on clvm. I'm just
> getting crazy as it seems clvmd cannot be cloned on other nodes.
>
> clvmd starts well on node1 but fails on both node2 and node3.

Your config looks fine, so I'm going to guess there's some local
difference on the nodes.

> In pacemaker journalctl I get the following messages:
> Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd:
> No such file or directory
> Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat
> /cmirrord: No such file or directory

I have no idea where the above is coming from. pidofproc is an LSB
function, but (given journalctl) I'm assuming you're using systemd. I
don't think anything in pacemaker or resource-agents uses pidofproc (at
least not currently; not sure about the older version you're using).

> Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd
> action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms queue-time:0ms
> Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok
> (node=node3, call=233, rc=0, cib-update=541, confirmed=true)
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop
> p-dlm_stop_0 on node3 (local)
> Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm
> action:stop call_id:235
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop
> p-dlm_stop_0 on node2
>
> Here is my configuration
>
> node 739312139: node1
> node 739312140: node2
> node 739312141: node3
> primitive admin_addr IPaddr2 \
>     params ip=172.17.2.10 \
>     op monitor interval=10 timeout=20 \
>     meta target-role=Started
> primitive p-clvmd ocf:lvm2:clvmd \
>     op start timeout=90 interval=0 \
>     op stop timeout=100 interval=0 \
>     op monitor interval=30 timeout=90
> primitive p-dlm ocf:pacemaker:controld \
>     op start timeout=90 interval=0 \
>     op stop timeout=100 interval=0 \
>     op monitor interval=60 timeout=90
> primitive stonith-sbd stonith:external/sbd
> group g-clvm p-dlm p-clvmd
> clone c-clvm g-clvm meta interleave=true
> property cib-bootstrap-options: \
>     have-watchdog=true \
>     dc-version=1.1.13-14.7-6f22ad7 \
>     cluster-infrastructure=corosync \
>     cluster-name=hacluster \
>     stonith-enabled=true \
>     placement-strategy=balanced \
>     no-quorum-policy=freeze \
>     last-lrm-refresh=1488404073
> rsc_defaults rsc-options: \
>     resource-stickiness=1 \
>     migration-threshold=10
> op_defaults op-options: \
>     timeout=600 \
>     record-pending=true
>
> Thanks in advance for your input
>
> Cheers
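[Editor's note: a quick way to look for the "local difference" Ken suspects is to verify on each node that the daemons the agents expect are actually installed — a sketch, with the binary names being an assumption based on the agents in the config (ocf:lvm2:clvmd and ocf:pacemaker:controld):

```shell
# Report which cluster daemons this node's PATH can or cannot find.
for bin in clvmd cmirrord dlm_controld; do
    if path=$(command -v "$bin" 2>/dev/null); then
        echo "$bin: $path"
    else
        echo "$bin: MISSING"
    fi
done
```

Run on every node and compare: a daemon present on node1 but missing on node2/node3 would explain a clone that only starts on node1.]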
Re: [ClusterLabs] Antw: Re: snapshots in a clvm environment - some questions for proceeding
> Actually that's what we are doing, but at some point in the past I had
> ruined the directory with the VM images (user error). The problem then was
> that you need a running VM to restore files inside the VM. This is when
> you would like to have a crash-consistent backup image of your VM.
> However I found no working solution yet (we have VM images hosted on
> OCFS2, hosted on a cLVM LV).
>
> Regards,
> Ulrich

With OCFS2 you could snapshot (I think they call it reflink) the image file.

Bernd
[ClusterLabs] Cannot clone clvmd resource
Hi there

I'm testing quite an easy configuration to work on clvm. I'm just getting
crazy as it seems clvmd cannot be cloned on other nodes.

clvmd starts well on node1 but fails on both node2 and node3.

In pacemaker journalctl I get the following messages:

Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd: No such file or directory
Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat /cmirrord: No such file or directory
Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms queue-time:0ms
Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok (node=node3, call=233, rc=0, cib-update=541, confirmed=true)
Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop p-dlm_stop_0 on node3 (local)
Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm action:stop call_id:235
Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop p-dlm_stop_0 on node2

Here is my configuration:

node 739312139: node1
node 739312140: node2
node 739312141: node3
primitive admin_addr IPaddr2 \
    params ip=172.17.2.10 \
    op monitor interval=10 timeout=20 \
    meta target-role=Started
primitive p-clvmd ocf:lvm2:clvmd \
    op start timeout=90 interval=0 \
    op stop timeout=100 interval=0 \
    op monitor interval=30 timeout=90
primitive p-dlm ocf:pacemaker:controld \
    op start timeout=90 interval=0 \
    op stop timeout=100 interval=0 \
    op monitor interval=60 timeout=90
primitive stonith-sbd stonith:external/sbd
group g-clvm p-dlm p-clvmd
clone c-clvm g-clvm meta interleave=true
property cib-bootstrap-options: \
    have-watchdog=true \
    dc-version=1.1.13-14.7-6f22ad7 \
    cluster-infrastructure=corosync \
    cluster-name=hacluster \
    stonith-enabled=true \
    placement-strategy=balanced \
    no-quorum-policy=freeze \
    last-lrm-refresh=1488404073
rsc_defaults rsc-options: \
    resource-stickiness=1 \
    migration-threshold=10
op_defaults op-options: \
    timeout=600 \
    record-pending=true

Thanks in advance for your input

Cheers

--
Anne Nicolas
http://mageia.org
[ClusterLabs] R: Re: Antw: Re: Ordering Sets of Resources
You are right, but I had to use option symmetrical=false because I need to
stop, when all resources are running, even a single primitive with no
impact to the other resources.

I have also used symmetrical=false with kind=Optional.
The stop of the individual resource does not stop the other resources, but
if during the startup or shutdown of the resources a list of primitives
without any order is used, the resources will start or stop without
respecting the constraint strictly.

Regards
Ivan

> Messaggio originale
> Da: "Ken Gaillot"
> Data: 01/03/2017 15.57
> A: "Ulrich Windl"
> Ogg: Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources
>
> On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot schrieb am 26.02.2017 um 20:04 in Nachricht:
>>>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>>>> Hi all,
>>>>> I have configured a two node cluster on redhat 7.
>>>>>
>>>>> Because I need to manage resources stopping and starting singularly
>>>>> when they are running, I have configured the cluster using ordered
>>>>> set constraints.
>>>>>
>>>>> Here is the example
>>>>>
>>>>> Ordering Constraints:
>>>>>   Resource Sets:
>>>>>     set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>>>>     require-all=true set MYIP_3 MYIP_4 MYSMTP action=start
>>>>>     sequential=true require-all=true setoptions symmetrical=false
>>>>>     set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>>>>     require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>>>>     sequential=true require-all=true setoptions symmetrical=false
>>>>>     kind=Mandatory
>>>>>
>>>>> The constraint works as expected on start, but when stopping the
>>>>> resources don't respect the order.
>>>>> Any help is appreciated
>>>>>
>>>>> Thanks and regards
>>>>> Ivan
>>>>
>>>> symmetrical=false means the order only applies for starting
>>>
>>> From the name (symmetrical) alone it could also mean that it only
>>> applies for stopping ;-)
>>> (Another example where better names would be nice)
>>
>> Well, more specifically, it only applies to the action specified in the
>> constraint. I hadn't noticed before that the second constraint here has
>> action=stop, so yes, that one would only apply for stopping.
>>
>> In the above example, the two constraints are identical to a single
>> constraint with symmetrical=true, since the second constraint is just
>> the reverse of the first.
[ClusterLabs] Expected recovery behavior of remote-node guest when corosync ring0 is lost in a passive mode RRP config?
Hi..

I am running a few corosync "passive mode" Redundant Ring Protocol (RRP)
failure scenarios, where my cluster has several remote-node VirtualDomain
resources running on each node in the cluster, which have been configured
to allow Live Guest Migration (LGM) operations.

While both corosync rings are active, if I drop ring0 on a given node where
I have remote nodes (guests) running, I noticed that the guest will be shut
down / re-started on the same host, after which the connection is
re-established and the guest proceeds to run on that same cluster node.

I am wondering why pacemaker doesn't try to "live" migrate the remote node
(guest) to a different node, instead of rebooting the guest? Is there some
way to configure the remote nodes such that the recovery action is LGM
instead of reboot when the host-to-remote_node connection is lost in an RRP
situation? I guess the next question is, is it even possible to LGM a
remote node guest if the corosync ring fails over from ring0 to ring1 (or
vice versa)?

# For example, here's a remote node's VirtualDomain resource definition.

[root@zs95kj]# pcs resource show zs95kjg110102_res
 Resource: zs95kjg110102_res (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/guestxml/nfs1/zs95kjg110102.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true remote-node=zs95kjg110102 remote-addr=10.20.110.102
  Operations: start interval=0s timeout=480 (zs95kjg110102_res-start-interval-0s)
              stop interval=0s timeout=120 (zs95kjg110102_res-stop-interval-0s)
              monitor interval=30s (zs95kjg110102_res-monitor-interval-30s)
              migrate-from interval=0s timeout=1200 (zs95kjg110102_res-migrate-from-interval-0s)
              migrate-to interval=0s timeout=1200 (zs95kjg110102_res-migrate-to-interval-0s)
[root@zs95kj VD]#

# My RRP rings are active, and configured rrp_mode="passive"

[root@zs95kj ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 10.20.93.12
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.20.94.212
        status  = ring 1 active with no faults

# Here's the corosync.conf ..

[root@zs95kj ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: test_cluster_2
    transport: udpu
    rrp_mode: passive
}

nodelist {
    node {
        ring0_addr: zs95kjpcs1
        ring1_addr: zs95kjpcs2
        nodeid: 2
    }
    node {
        ring0_addr: zs95KLpcs1
        ring1_addr: zs95KLpcs2
        nodeid: 3
    }
    node {
        ring0_addr: zs90kppcs1
        ring1_addr: zs90kppcs2
        nodeid: 4
    }
    node {
        ring0_addr: zs93KLpcs1
        ring1_addr: zs93KLpcs2
        nodeid: 5
    }
    node {
        ring0_addr: zs93kjpcs1
        ring1_addr: zs93kjpcs2
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    timestamp: on
    syslog_facility: daemon
    to_syslog: yes
    debug: on
    logger_subsys {
        debug: off
        subsys: QUORUM
    }
}

# Here's the vlan / route situation on cluster node zs95kj:

ring0 is on vlan1293
ring1 is on vlan1294

[root@zs95kj ~]# route -n
Kernel IP routing table
Destination    Gateway        Genmask         Flags Metric Ref Use Iface
0.0.0.0        10.20.93.254   0.0.0.0         UG    400    0   0   vlan1293   << default route to guests from ring0
9.0.0.0        9.12.23.1      255.0.0.0       UG    400    0   0   vlan508
9.12.23.0      0.0.0.0        255.255.255.0   U     400    0   0   vlan508
10.20.92.0     0.0.0.0        255.255.255.0   U     400    0   0   vlan1292
10.20.93.0     0.0.0.0        255.255.255.0   U     0      0   0   vlan1293   << ring0 IPs
10.20.93.0     0.0.0.0        255.255.255.0   U     400    0   0   vlan1293
10.20.94.0     0.0.0.0        255.255.255.0   U     0      0   0   vlan1294   << ring1 IPs
10.20.94.0     0.0.0.0        255.255.255.0   U     400    0   0   vlan1294
10.20.101.0    0.0.0.0        255.255.255.0   U     400    0   0   vlan1298
10.20.109.0    10.20.94.254   255.255.255.0   UG    400    0   0   vlan1294   << route to guests on 10.20.109 from ring1
10.20.110.0    10.20.94.254   255.255.255.0   UG    400    0   0   vlan1294   << route to guests on 10.20.110 from ring1
169.254.0.0    0.0.0.0        255.255.0.0     U     1007   0   0   enccw0.0.02e0
169.254.0.0    0.0.0.0        255.255.0.0     U     1016   0   0   ovsbridge1
192.168.122.0  0.0.0.0        255.255.255.0   U     0      0   0   virbr0

# On the remote node, you can see we have a connection back to the host.

Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: info: qb_ipcs_us_publish: server name: lrmd
Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources
On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>> Ken Gaillot schrieb am 26.02.2017 um 20:04 in Nachricht:
>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>> Hi all,
>>> I have configured a two node cluster on redhat 7.
>>>
>>> Because I need to manage resources stopping and starting singularly
>>> when they are running, I have configured the cluster using ordered set
>>> constraints.
>>>
>>> Here is the example
>>>
>>> Ordering Constraints:
>>>   Resource Sets:
>>>     set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>>     require-all=true set MYIP_3 MYIP_4 MYSMTP action=start
>>>     sequential=true require-all=true setoptions symmetrical=false
>>>     set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>>     require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>>     sequential=true require-all=true setoptions symmetrical=false
>>>     kind=Mandatory
>>>
>>> The constraint works as expected on start, but when stopping the
>>> resources don't respect the order.
>>> Any help is appreciated
>>>
>>> Thanks and regards
>>> Ivan
>>
>> symmetrical=false means the order only applies for starting
>
> From the name (symmetrical) alone it could also mean that it only applies
> for stopping ;-)
> (Another example where better names would be nice)

Well, more specifically, it only applies to the action specified in the
constraint. I hadn't noticed before that the second constraint here has
action=stop, so yes, that one would only apply for stopping.

In the above example, the two constraints are identical to a single
constraint with symmetrical=true, since the second constraint is just
the reverse of the first.
Re: [ClusterLabs] Antw: using IPMI for fencing - configuring IPMI with ipmitool - SOLVED
----- On Mar 1, 2017, at 1:41 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote:

Hi,

I managed it:

stonith -t external/ipmi hostname=ha-idg-1 ipaddr=146.107.235.15 userid=root passwd=xx passwd_method=param interface=lanplus -S
info: external/ipmi device OK.

I had to use IPMI v2.0, so I had to use the lanplus interface to connect.
IPMI v1.5 is deactivated, so trying to connect via the lan interface can't
succeed.

ha-idg-1:~ # ipmitool channel authcap 2 4
Channel number             : 2
IPMI v1.5 auth types       :                      <==
KG status                  : default (all zeroes)
Per message authentication : disabled
User level authentication  : enabled
Non-null user names exist  : yes
Null user names exist      : no
Anonymous login enabled    : no
Channel supports IPMI v1.5 : no                   <==
Channel supports IPMI v2.0 : yes

Bernd
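[Editor's note: once the manual stonith call works, the corresponding cluster resource might look roughly like this crm-shell sketch — the resource and constraint names are made up, the parameters mirror the command above, and the location rule reflects the common practice of not running a fencing device on the node it fences:

```
primitive stonith-ipmi-ha-idg-1 stonith:external/ipmi \
    params hostname=ha-idg-1 ipaddr=146.107.235.15 userid=root \
        passwd=xx passwd_method=param interface=lanplus \
    op monitor interval=3600 timeout=60
location l-stonith-ipmi-ha-idg-1 stonith-ipmi-ha-idg-1 -inf: ha-idg-1
```

Storing the password in the CIB is visible to anyone who can read it; passwd_method variants that read from a file may be preferable.]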
Re: [ClusterLabs] Insert delay between the statup of VirtualDomain
Hi Dejan, It is possible to launch the check from the hypervisor in my environment. A simple telnet against a specific port may be enough to check if the service is ready. In this simple scenario (and check), how can I instruct the second server to wait until the MySQL server is up? Thanks a lot. On Mar 1, 2017, 1:08 PM, "Dejan Muhamedagic" wrote: > Hi, > > On Sat, Feb 25, 2017 at 09:58:01PM +0100, Oscar Segarra wrote: > > Hi, > > > > Yes, > > > > Database server can be considered started up when it accepts mysql client > > connections > > Applications server can be considered started as soon as the listening > port > > is up and accepting connections > > > > Can you provide any example about how to achieve this? > > Is it possible to connect to the database from the supervisor? > Then something like this would do: > > mysql -h vm_ip_address ... < /dev/null > > If not, then if ssh works: > > echo mysql ... | ssh vm_ip_address > > I'm afraid I cannot help you more with mysql details and what to > put in '...' stead above, but it should do whatever is necessary > to test if the database reached the functional state. You can > find an example in ocf:heartbeat:mysql: just look for the > "test_table" parameter. Of course, you'll need to put that in a > script and test output and so on. I guess that there's enough > information on the internet on how to do that. > > Good luck! > > Dejan > > > Thanks a lot. > > > > > > 2017-02-25 19:35 GMT+01:00 Dejan Muhamedagic : > > > > > Hi, > > > > > > On Thu, Feb 23, 2017 at 08:51:20PM +0100, Oscar Segarra wrote: > > > > Hi, > > > > > > > > In my environment I have 5 guests that have to be started up in a > > > > specified order starting with the MySQL database server. 
> > > > > > > > I have set the order constraints and VirtualDomains start in the > right > > > > order but, the problem I have, is that the second host starts up > faster > > > > than the database server and therefore applications running on the > second > > > > host raise errors due to database connectivity problems. > > > > > > > > I'd like to introduce a delay between the startup of the > VirtualDomain of > > > > the database server and the startup of the second guest. > > > > > > Do you have a way to check if this server is up? If so... > > > The start action of VirtualDomain won't exit until the monitor > > > action returns success. And there's a parameter called > > > monitor_scripts (see the meta-data). Note that these programs > > > (scripts) are run at the supervisor host and not in the guest. > > > It's all a bit involved, but should be doable. > > > > > > Thanks, > > > > > > Dejan > > > > > > > ¿Is it any way to get this? > > > > > > > > Thanks a lot. > > > > > > > ___ > > > > Users mailing list: Users@clusterlabs.org > > > > http://lists.clusterlabs.org/mailman/listinfo/users > > > > > > > > Project Home: http://www.clusterlabs.org > > > > Getting started: http://www.clusterlabs.org/ > doc/Cluster_from_Scratch.pdf > > > > Bugs: http://bugs.clusterlabs.org > > > > > > > > > ___ > > > Users mailing list: Users@clusterlabs.org > > > http://lists.clusterlabs.org/mailman/listinfo/users > > > > > > Project Home: http://www.clusterlabs.org > > > Getting started: http://www.clusterlabs.org/ > doc/Cluster_from_Scratch.pdf > > > Bugs: http://bugs.clusterlabs.org > > > > > > ___ > > Users mailing list: Users@clusterlabs.org > > http://lists.clusterlabs.org/mailman/listinfo/users > > > > Project Home: http://www.clusterlabs.org > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > Bugs: http://bugs.clusterlabs.org > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: 
http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
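The telnet-style check discussed in this thread can be sketched as a small helper run on the hypervisor (a hypothetical script, not part of any resource agent; it assumes only bash and its /dev/tcp pseudo-device):

```shell
# Poll HOST:PORT until a TCP connect succeeds, or give up after
# TIMEOUT seconds (default 60). Returns 0 once the port answers.
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-60} i
    for ((i = 0; i < timeout; i++)); do
        # bash opens a TCP connection for redirections on /dev/tcp/...
        if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

Something like `wait_for_port vm-db 3306 120` could then gate the start of the second guest, keeping in mind that an open listener only proves the port is up, not that MySQL is already accepting queries.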
Re: [ClusterLabs] Antw: using IPMI for fencing - configuring IPMI with ipmitool - HELP
- On Mar 1, 2017, at 10:20 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >> Hi, >> >> i have a HP server ML 350 G9 with an ILO4 card. The riloe stonith agent does >> not work, i read in a book the recommendation to use the ipmi resource agent >> instead. > > Why don't you use SBD (as recommended)? > Hi, I want to have two independent fencing possibilities. Bernd
Re: [ClusterLabs] Antw: Re: snapshots in a clvm environment - some questions for proceeding
- On Mar 1, 2017, at 8:11 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Digimer wrote on 24.02.2017 at 19:20 in message > <8762afa9-0f45-04c7-2404-565dcabf9...@alteeve.ca>: > > [...] >> Aside from this, I strongly recommend against snapshots as a backup >> mechanism anyway. There is no way to ensure that the operating system >> and applications are in a clean state when you take the snapshot, so >> using the image is like recovering from sudden power loss. If data was >> in cache but not flushed out, you could have corruption. >> >> If you can't stop your VMs, I'd recommend using a backup application >> inside the VM that knows how to ensure that your apps and the OS are in >> a clean state, particularly for DBs. > > Actually that's what we are doing, but at some point in the past I had ruined > the directory with the VM images (user error). The problem then was that you > need a running VM to restore files inside the VM. This is when you would like > to have a crash-consistent backup image of your VM. However I found no working > solution yet (we have VM images hosted on OCFS2, hosted on a cLVM LV). > I changed the approach. Inside the VM I also have LVs. When changing configuration inside the VM I will snapshot there. Bernd
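On Bernd's earlier reflink question: a reflink clone of an image file can be made either with the reflink(1) utility that ships alongside ocfs2-tools on some distributions, or with cp --reflink. Whether either is available on SLES11 is exactly the open question in this thread, so treat this as a sketch. With --reflink=auto the command degrades to an ordinary copy on filesystems without reflink support, so it is safe to try anywhere:

```shell
# Demonstrated against a temp file so it runs on any filesystem; on
# OCFS2 the same cp invocation creates a copy-on-write clone of the
# image (near-instant, block-sharing) instead of a full byte copy.
src=$(mktemp)                        # stand-in for e.g. /ocfs2/vm.img
echo "image data" > "$src"
snap="${src}.snap"
cp --reflink=auto -- "$src" "$snap"  # clone if supported, else plain copy
cmp -s "$src" "$snap" && echo "snapshot matches source"
```

As Digimer notes above, such a clone is still only crash-consistent; quiescing the guest first (for example with fsfreeze inside the VM) is needed for a clean image.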
Re: [ClusterLabs] using IPMI for fencing - configuring IPMI with ipmitool - HELP
- On Mar 1, 2017, at 5:03 AM, Andrei Borzenkov arvidj...@gmail.com wrote: > 28.02.2017 20:39, Lentes, Bernd writes: >> Hi, >> >> i have a HP server ML 350 G9 with an ILO4 card. The riloe stonith agent does >> not work, i read in a book the recommendation to use the ipmi resource agent >> instead. >> I'm trying to configure the respective ILO adapter with ipmitool. > > Why don't you simply go to the iLO4 web interface and configure users there? I have users configured there. But are they visible to IPMI? Bernd
Re: [ClusterLabs] Oralsnr/Oracle resources agents
Hi, On Sun, Feb 26, 2017 at 09:51:47AM +0300, Andrei Borzenkov wrote: > 25.02.2017 23:18, Jihed M'selmi пишет: > > [DM] I thought that oracle listener is not consuming that many resources. > > At any rate, ocf:heartbeat:oralsnr doesn't support single listener for > > multiple instances. Do you have an idea how to do that? How to deal with > > the tnsping then? Maybe you're better off with the system start script in > > this case. > > > > [JM] According to the dba, it could lead some memory issue when the > > listener serves many instances at the same time (in my experience, I have > > never faced this issue). > > > > What "it" means in the above sentence? "Running single listener for > multiple instances" or "running each instance with own listener"? > > How many instances are we talking about? > > > Let's take a case when the listener is serving multiple instance, and one > > of the instance fails => ocf:heartbeat:oracle will relocate it to another > > node, the listener should follow (especially, when we use > > collocation constraint between RA oracle and oralsnr) this will have a bad > > impact on the rest of instances. > > > > One of the option is to have two listeners (one per node) and configured > > outside the cluster to host the all instance. But, I keep looking for a > > better solution. > > > > [DM] Hmm, what should then the RA do? Skip the instance and report it > > started? > > I'm not sure I follow. > > [JM] The DBA use a flag Y/N to tell if this instance should run or no. It > > could be better, for RA to use this flag too: when it's Y start the > > instance and when It's N, the RA should not start the instance and suitable > > message in log will be usefull to describe the situation. Now, the > > challenge is how to monitor this flag. > > > > DBA still needs to remember to change this flag on each node in the > cluster. In which case it can just as well remember to use different way > to disable automatic startup. Indeed. 
I wonder what is the difference between editing a file and running say "crm rsc stop db". At any rate, I doubt that there is a sane way for the RA to handle such a case. > > One of the issue that I faced when the DBA when to shutdown the listener > > and the instance (for launch the cold backup) but, the RA keep pushing them > > ON. -- Note the dba team usually don't have an access to pcs to disable the > > resource during this type of operation. > > > > There is nothing new. Once you put application under HA control, you in > general cannot use native application tools to manage it. That is why > SAP introduced "cluster glue" layer that intercepts native requests to > start/stop application and forwards them to cluster for actual processing. > > The solution here is really to make it possible to delegate control of > individual resources to different users, so that DBA can > start/stop/disable/unmanage individual resources (s)he owns. > > If nothing else, it can be done as SUID scripts that implement this check. Users other than root/hacluster can be given access to cluster (essentially editing the CIB). Details escape me now, but there is a concept of roles and then one can define rules on what roles are allowed to do. Thanks, Dejan > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
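Dejan's point about roles can be made concrete. Assuming a Pacemaker version with ACL support and crmsh, a sketch might look like the following (the user name dba and resource id p-oracle are hypothetical; the user must also exist on every node and belong to the haclient group):

```
property cib-bootstrap-options: enable-acl=true
role r-dba \
        write meta:p-oracle:target-role \
        read xpath:"/cib"
acl_target dba r-dba
```

With something like that in place, the DBA could run e.g. `crm resource stop p-oracle` for the cold backup window without having full cluster rights.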
Re: [ClusterLabs] Insert delay between the statup of VirtualDomain
Hi, On Mon, Feb 27, 2017 at 12:38:07PM +0100, Ferenc Wágner wrote: > Oscar Segarra writes: > > > In my environment I have 5 guests that have to be started up in a > > specified order starting with the MySQL database server. > > We use a somewhat redesigned resource agent, which connects to the guest > using a virtio channel and waits for a signal before exiting from the > start operation. The signal is sent by an appropriately placed startup > script from the guest. This is fully independent from regular network > traffic and does not need any channel configuration. Cool. Maybe you'd like to share the code or, best, do a pull request at github. This is certainly very useful. Thanks, Dejan > -- > Feri
Re: [ClusterLabs] Insert delay between the statup of VirtualDomain
Hi, On Sat, Feb 25, 2017 at 09:58:01PM +0100, Oscar Segarra wrote: > Hi, > > Yes, > > Database server can be considered started up when it accepts mysql client > connections > Applications server can be considered started as soon as the listening port > is up al accepting connections > > ¿Can you provide any example about how to achieve this? Is it possible to connect to the database from the supervisor? Then something like this would do: mysql -h vm_ip_address ... < /dev/null If not, then if ssh works: echo mysql ... | ssh vm_ip_address I'm afraid I cannot help you more with mysql details and what to put in '...' stead above, but it should do whatever is necessary to test if the database reached the functional state. You can find an example in ocf:heartbeat:mysql: just look for the "test_table" parameter. Of course, you'll need to put that in a script and test output and so on. I guess that there's enough information in internet on how to do that. Good luck! Dejan > Thanks a lot. > > > 2017-02-25 19:35 GMT+01:00 Dejan Muhamedagic: > > > Hi, > > > > On Thu, Feb 23, 2017 at 08:51:20PM +0100, Oscar Segarra wrote: > > > Hi, > > > > > > In my environment I have 5 guestes that have to be started up in a > > > specified order starting for the MySQL database server. > > > > > > I have set the order constraints and VirtualDomains start in the right > > > order but, the problem I have, is that the second host starts up faster > > > than the database server and therefore applications running on the second > > > host raise errors due to database connectivity problems. > > > > > > I'd like to introduce a delay between the startup of the VirtualDomain of > > > the database server and the startup of the second guest. > > > > Do you have a way to check if this server is up? If so... > > The start action of VirtualDomain won't exit until the monitor > > action returns success. And there's a parameter called > > monitor_scripts (see the meta-data). 
Note that these programs > > (scripts) are run at the supervisor host and not in the guest. > > It's all a bit involved, but should be doable. > > > > Thanks, > > > > Dejan > > > > > ¿Is it any way to get this? > > > > > > Thanks a lot. > > > > > ___ > > > Users mailing list: Users@clusterlabs.org > > > http://lists.clusterlabs.org/mailman/listinfo/users > > > > > > Project Home: http://www.clusterlabs.org > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > > Bugs: http://bugs.clusterlabs.org > > > > > > ___ > > Users mailing list: Users@clusterlabs.org > > http://lists.clusterlabs.org/mailman/listinfo/users > > > > Project Home: http://www.clusterlabs.org > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > Bugs: http://bugs.clusterlabs.org > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
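The monitor_scripts mechanism Dejan describes hooks in roughly like this (a sketch; the domain name, script path and timeouts are assumptions). Because VirtualDomain's start action does not return until the monitor, including any monitor_scripts, succeeds, an ordinary order constraint is then enough to delay the second guest:

```
primitive vm-db VirtualDomain \
        params config="/etc/libvirt/qemu/vm-db.xml" \
               monitor_scripts="/usr/local/bin/check-db-ready.sh" \
        op start timeout=300 interval=0 \
        op stop timeout=120 interval=0 \
        op monitor interval=30 timeout=60
order o-db-before-app Mandatory: vm-db vm-app
```

Here check-db-ready.sh is a hypothetical host-side script that exits 0 only once the database inside the guest answers.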
Re: [ClusterLabs] Never join a list without a problem...
Jeffrey Westgate writes: > We use Nagios to monitor, and once every 20 to 40 hours - sometimes > longer, and we cannot set a clock by it - while the machine is 95% > idle (or more according to 'top'), the host load shoots up to 50 or > 60%. It takes about 20 minutes to peak, and another 30 to 45 minutes > to come back down to baseline, which is mostly 0.00. (attached > hostload.pdf) This happens to both machines, randomly, and is > concerning, as we'd like to find what's causing it and resolve it. Try running atop (http://www.atoptool.nl/). It collects and logs process accounting info, allowing you to step back in time and check resource usage in the past. -- Feri
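A sketch of the replay workflow Feri suggests (assuming atop's packaged cron job is enabled and writes daily raw logs under /var/log/atop/; the exact file name format varies by distribution):

```
# open the log for Feb 27 and replay the window around the 1 AM spike
atop -r /var/log/atop/atop_20170227 -b 00:45 -e 02:00
# inside atop: 't' steps forward one sample, 'T' steps back,
# 'm' shows memory details, 'd' shows disk activity
```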
[ClusterLabs] Antw: Re: Never join a list without a problem...
>>> Kai Dupke wrote on 01.03.2017 at 09:55: > On 02/27/2017 02:26 PM, Jeffrey Westgate wrote: >> We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer, > and we cannot set a clock by it - while the machine is 95% idle (or more > according to 'top'), the host load shoots up to 50 or 60%. It takes about 20 > minutes to peak, and another 30 to 45 minutes to come back down to baseline, > which is mostly 0.00. > > So, you have a time window of ~1h where the system is under load, right? > This is somewhat different from what Ulrich had, but his approach might be > useful for you, too. > > Anything against running some monitoring and capturing the processes, > process states and load, say, every 5 minutes? > > Of course, the peaks might correlate to something in the logs - like > cron, logins, logrotates or whatever. The main issue is "expected load" vs. "unexpected load". In my case the system was expected to be completely idle at night, so I had set the thresholds rather low. Other systems can use different approaches. I hope to hear what caused the problem in your case. Ulrich
[ClusterLabs] Antw: using IPMI for fencing - configuring IPMI with ipmitool - HELP
>>> "Lentes, Bernd"schrieb am 28.02.2017 um 18:39 in Nachricht <476524732.41182296.1488303562492.javamail.zim...@helmholtz-muenchen.de>: > Hi, > > i have a HP server ML 350 G9 with an ILO4 card. The riloe stonith agent does > not work, i read in a book the recommendation to use the ipmi ressource agent > instead. Why don't you use SBD (as recommended)? > I'm trying to configure the respective ILO adapter with ipmitool. OMG. > Ipmitool drives me crazy. > It's a SLES 11 SP4 node. I did "/etc/init.d/ipmi start", some modules are > loaded: > > ha-idg-1:~ # lsmod|grep -i ipmi > ipmi_devintf 17560 0 > ipmi_si53422 0 > ipmi_msghandler49979 2 ipmi_devintf,ipmi_si > > I have a device file: > > ha-idg-1:~ # ll /dev/ipm* > crw-rw 1 root root 246, 0 Feb 28 13:51 /dev/ipmi0 > > What i found out/did already: > > For channel 2 i have two users configured: > > ipmitool> user list 2 > 1 Administratortruefalse true ADMINISTRATOR > 2 root truefalse true ADMINISTRATOR > 3 (Empty User) truefalse false NO ACCESS > 4 (Empty User) truefalse false NO ACCESS > 5 (Empty User) truefalse false NO ACCESS > 6 (Empty User) truefalse false NO ACCESS > 7 (Empty User) truefalse false NO ACCESS > 8 (Empty User) truefalse false NO ACCESS > 9 (Empty User) truefalse false NO ACCESS > 10 (Empty User) truefalse false NO ACCESS > 11 (Empty User) truefalse false NO ACCESS > 12 (Empty User) truefalse false NO ACCESS > > User root has a passsword which i tested via "user test" and it was ok. 
> > Channel 2: > > ipmitool> channel info 2 > Channel 0x2 info: > Channel Medium Type : 802.3 LAN > Channel Protocol Type : IPMB-1.0 > Session Support : multi-session > Active Session Count : 0 > Protocol Vendor ID: 7154 > Volatile(active) Settings > Alerting: enabled > Per-message Auth: disabled > User Level Auth : enabled > Access Mode : always available > Non-Volatile Settings > Alerting: enabled > Per-message Auth: disabled > User Level Auth : enabled > Access Mode : always available > > ipmitool> lan print 2 > Set in Progress : Set Complete > Auth Type Support : > Auth Type Enable: Callback : > : User : > : Operator : > : Admin: > : OEM : > IP Address Source : DHCP Address > IP Address : 146.107.235.15 > Subnet Mask : 255.255.255.0 > MAC Address : 70:10:6f:47:0c:48 > SNMP Community String : > BMC ARP Control : ARP Responses Enabled, Gratuitous ARP Disabled > Default Gateway IP : 146.107.235.1 > 802.1q VLAN ID : Disabled > 802.1q VLAN Priority: 0 > RMCP+ Cipher Suites : 0,1,2,3 > Cipher Suite Priv Max : XuuaXXX > : X=Cipher Suite Unused > : c=CALLBACK > : u=USER > : o=OPERATOR > : a=ADMIN > : O=OEM > > How can i grant principal access to channel 2 ? > I tried: > > ipmitool> lan set 2 access on > Set Channel Access for channel 2 failed: Unknown (0x83) > ipmitool> lan set 2 access ON > lan set access > ipmitool> lan set 2 access=ON > lan set access > > Does not seem to work. > > I did "lan set user 2", do not know if it's helpful. > > Also: > > ipmitool> channel authcap 2 4 > Channel number : 2 > IPMI v1.5 auth types : > KG status : default (all zeroes) > Per message authentication : disabled > User level authentication : enabled > Non-null user names exist : yes > Null user names exist : no > Anonymous login enabled: no > Channel supports IPMI v1.5 : no > Channel supports IPMI v2.0 : yes > > Don't know if it helps. 
> > I found > https://www.thomas-krenn.com/de/wiki/IPMI_Konfiguration_unter_Linux_mittels_i > pmitool (sorry, only in german): > > I did, as proposed: > > ha-idg-1:~ # ipmitool lan set 2 auth ADMIN MD5 > ha-idg-1:~ # ipmitool lan set 2 access on > Set Channel Access for channel 2 failed: Unknown (0x83) <= ??? > > ha-idg-1:~ # ipmitool lan print 2 > Set in Progress : Set Complete > Auth Type Support : > Auth Type Enable: Callback : > : User : > : Operator : > : Admin: > : OEM : > IP Address Source : DHCP Address > IP Address : 146.107.235.15 > Subnet Mask : 255.255.255.0 > MAC
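On the failing `lan set 2 access on` (error 0x83): many BMCs only accept per-channel access settings through the `channel setaccess` subcommand rather than `lan set`, which may be worth trying here (a sketch; user ID 2 is root in the listing above, and privilege level 4 is ADMINISTRATOR):

```
# grant user 2 access on channel 2 with administrator privilege
ipmitool channel setaccess 2 2 callin=on ipmi=on link=on privilege=4
ipmitool user enable 2
# verify the result
ipmitool channel getaccess 2 2
```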
Re: [ClusterLabs] Never join a list without a problem...
On 02/27/2017 02:26 PM, Jeffrey Westgate wrote: > We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer, > and we cannot set a clock by it - while the machine is 95% idle (or more > according to 'top'), the host load shoots up to 50 or 60%. It takes about 20 > minutes to peak, and another 30 to 45 minutes to come back down to baseline, > which is mostly 0.00. So, you have a time window of ~1h where the system is under load, right? This is somewhat different to what Ulrich had, but his approach might be useful for you, too. Something against running some monitoring and capturing the processes, process states and load say, every 5 minutes? Of course, the peaks might correlate to something in the logs - like cron, logins, logrotates or whatever. regards, Kai Dupke Senior Product Manager SUSE Linux Enterprise 13 -- Sell not virtue to purchase wealth, nor liberty to purchase power. Phone: +49-(0)5102-9310828 Mail: kdu...@suse.com Mobile: +49-(0)173-5876766 WWW: www.suse.com SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg) ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Antw: Re: Never join a list without a problem...
>>> Jeffrey Westgate wrote on 27.02.2017 at 14:26: > Thanks, Ken. > > Our late guru was the admin who set all this up, and it's been rock solid > until recent oddities started cropping up. They still function fine - they've > just developed some... quirks. > > I found the solution before I got your reply, which was essentially what we > did; update all but pacemaker, reboot, stop pacemaker, update pacemaker, > reboot. That process was necessary because they've been running sooo long, > pacemaker would not stop. It would try, then seemingly stall after several > minutes. > > We're good now, up-to-date-wise, and stuck only with the initial issue we were > hoping to eliminate by updating/patching EVERYthing. And we honestly don't > know what may be causing it. > > We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer, > and we cannot set a clock by it - while the machine is 95% idle (or more > according to 'top'), the host load shoots up to 50 or 60%. It takes about 20 > minutes to peak, and another 30 to 45 minutes to come back down to baseline, > which is mostly 0.00. (attached hostload.pdf) This happens to both > machines, randomly, and is concerning, as we'd like to find what's causing it > and resolve it. We use SLES11 here, and it took me a really long time to find out what was causing nightly load peaks on our servers. It turned out to be the rebuild of the manual page database (mandb). It didn't show in Nagios load statistics, but in monit alerts (on some machines we use both). In monit you can run a script when some condition is met. So I constructed a "capture script" to find the guilty parties ;-) However the peaks were so short that it took many runs to find it. 
Here the load was back to normal already, but monit had reported an event like "cpu system usage of 30.2% matches resource limit [cpu system usage>20.0%]": Sat May 11 01:31:13 CEST 2013 top - 01:31:14 up 2 days, 9:31, 0 users, load average: 0.91, 0.31, 0.15 Tasks: 114 total, 2 running, 112 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1065628k total, 1055292k used,10336k free, 143708k buffers Swap: 2097148k total,0k used, 2097148k free, 578736k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 2832 root 20 0 8916 1060 776 R 0 0.1 0:00.00 top 2910 man 30 10 840 R 0 0.0 0:00.00 mandb Maybe this helps. Regards, Ulrich > > We were hoping "uptime kernel bug", but patching has not helped. There > seems to be no increase in the number of processes running, and the processes > running do not take any more cpu time. They are DNS forwarding resolvers, > but there is no correlation between dns requests and load increase - > sometimes > (like this morning) it rises around 1 AM when the dns load is minimal. > > The oddity is - these are the only two boxes with this issue, and we have a > couple dozen at the same OS and level. Only these two, with this role and > this particular package set have the issue. > > -- > Jeff ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
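Ulrich's capture trick can be reproduced with a monit rule plus a tiny script (a sketch; the threshold, paths and host name are assumptions):

```
# monit fragment: run a capture whenever 1-minute load exceeds 2
check system myhost
    if loadavg (1min) > 2 then exec "/usr/local/bin/capture.sh"
```

where capture.sh appends a timestamped process snapshot, e.g. `{ date; top -b -n 1 | head -n 20; } >> /var/log/loadcap.log`, so that repeated triggers eventually catch the culprit in the act.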