Re: [ClusterLabs] trace of Filesystem RA does not log

2019-10-14 Thread Lentes, Bernd
- On Oct 14, 2019, at 6:27 AM, Roger Zhou zz...@suse.com wrote: > The stop failure is very bad, and is crucial for an HA system. Yes, that's true. > You can try the o2locktop CLI to find the potential INODE to be blamed[1]. > > `o2locktop --help` gives you more usage details I will try that.
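
For archive readers, a hedged sketch of what such an investigation might look like (the exact arguments depend on the o2locktop version; the node names and mount point below are made-up assumptions):

    o2locktop --help                          # authoritative usage for your version
    o2locktop -n node1 -n node2 /mnt/ocfs2    # watch which inodes hold or wait on DLM locks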

[ClusterLabs] trace of Filesystem RA does not log

2019-10-11 Thread Lentes, Bernd
Hi, occasionally the stop of a Filesystem resource for an OCFS2 partition fails. I'm currently tracing this RA hoping to find the culprit. I'm putting one of the two nodes into standby, hoping the error appears. Afterwards i set it online again and do the same procedure with the other

[ClusterLabs] Why is node fenced ?

2019-10-10 Thread Lentes, Bernd
Hi, i have a two-node cluster running on SLES 12 SP4. I did some testing on it. I put one into standby (ha-idg-2); the other (ha-idg-1) got fenced a few minutes later because i made a mistake. ha-idg-2 was DC. ha-idg-1 made a fresh boot and i started corosync/pacemaker on it. It seems ha-idg-1 d

[ClusterLabs] change of the configuration of a resource which is part of a clone

2019-10-09 Thread Lentes, Bernd
Hi, i finally managed to find out how i can simulate configuration changes and see their results before committing them. OMG. That makes life much more relaxed. I need to change the configuration of a resource which is part of a group; the group is running as a clone on all nodes. Unfortunately
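
For readers finding this thread: crmsh's shadow CIB feature supports exactly this workflow. A minimal sketch of an interactive session (the sandbox name is made up):

    crm(live)# cib new sandbox        # copy the live CIB into a shadow
    crm(sandbox)# configure edit      # changes land in the shadow only
    crm(sandbox)# cib diff            # review the changes against the live CIB
    crm(sandbox)# cib commit sandbox  # push to the live cluster once it looks right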

Re: [ClusterLabs] cleanup of a resource leads to restart of Virtual Domains

2019-10-02 Thread Lentes, Bernd
- On Oct 1, 2019, at 14:29, Yan Gao y...@suse.com wrote: > On 10/1/19 1:37 PM, Lentes, Bernd wrote: >> >> - On Oct 1, 2019, at 12:26 PM, Yan Gao y...@suse.com wrote: >> >> Currently i'm running SLES 12 SP4. Is it worth thinking about updating t

Re: [ClusterLabs] cleanup of a resource leads to restart of Virtual Domains

2019-10-01 Thread Lentes, Bernd
- On Oct 1, 2019, at 12:26 PM, Yan Gao y...@suse.com wrote: > On 9/30/19 6:45 PM, Lentes, Bernd wrote: > The behavior/idea about cleanup makes more sense in pacemaker-2.0 > (SLE-HA 15 releases). It does *real* cleanup only if a resource has any > failures. Currently i'

Re: [ClusterLabs] cleanup of a resource leads to restart of Virtual Domains

2019-09-30 Thread Lentes, Bernd
>> >> Hi Yan, >> I had a look in the logs and what happened when i issued a "resource >> cleanup" of >> the GFS2 resource is >> that the cluster deleted an entry in the status section: >> >> Sep 26 14:52:52 [9317] ha-idg-2cib: info: cib_process_request: >> Completed cib_delete operat

Re: [ClusterLabs] cleanup of a resource leads to restart of Virtual Domains

2019-09-27 Thread Lentes, Bernd
- On Sep 26, 2019, at 5:19 PM, Yan Gao y...@suse.com wrote: > Hi, > > On 9/26/19 3:25 PM, Lentes, Bernd wrote: >> HI, >> >> i had two errors with a GSF2 Partition several days ago: >> gfs2_share_monitor_3 on ha-idg-2 'unknown error' (1): ca

[ClusterLabs] cleanup of a resource leads to restart of Virtual Domains

2019-09-26 Thread Lentes, Bernd
Hi, i had two errors with a GFS2 partition several days ago: gfs2_share_monitor_3 on ha-idg-2 'unknown error' (1): call=103, status=Timed Out, exitreason='', last-rc-change='Thu Sep 19 13:44:22 2019', queued=0ms, exec=0ms gfs2_share_monitor_3 on ha-idg-1 'unknown error' (1): call=103

Re: [ClusterLabs] why is node fenced ?

2019-08-15 Thread Lentes, Bernd
- On Aug 14, 2019, at 19:07, kgaillot kgail...@redhat.com wrote: >> That's my setting: >> >> expected_votes: 2 >> two_node: 1 >> wait_for_all: 0 >> >> no-quorum-policy=ignore >> >> I did that because i want to be able to start the cluster although one >> node has e.g. a hardware pr
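
For context, the quoted settings live in the quorum section of corosync.conf; a minimal sketch:

    quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
        # two_node normally implies wait_for_all: 1; it is overridden here
        wait_for_all: 0
    }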

Re: [ClusterLabs] why is node fenced ?

2019-08-14 Thread Lentes, Bernd
- On Aug 13, 2019, at 1:19 AM, kgaillot kgail...@redhat.com wrote: > > The key messages are: > > Aug 09 17:43:27 [6326] ha-idg-1 crmd: info: crm_timer_popped: > Election > Trigger (I_DC_TIMEOUT) just popped (2ms) > Aug 09 17:43:27 [6326] ha-idg-1 crmd: warning: do_l

Re: [ClusterLabs] Antw: Re: Antw: Re: why is node fenced ?

2019-08-14 Thread Lentes, Bernd
- On Aug 14, 2019, at 8:25 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > But why do the eth interfaces on both nodes come up the same second > (2019-08-09T17:42:19)? > The respective eth's of the two bonds of the two hosts are connected directly to each other. Just a wir

Re: [ClusterLabs] Antw: Re: why is node fenced ?

2019-08-13 Thread Lentes, Bernd
- On Aug 13, 2019, at 3:14 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > You said you booted the hosts sequentially. From the logs they were starting > in > parallel. > No. last says: ha-idg-1: reboot system boot 4.12.14-95.29-de Fri Aug 9 17:42 - 15:56 (3+22:14) ha-i

Re: [ClusterLabs] why is node fenced ?

2019-08-13 Thread Lentes, Bernd
- On Aug 13, 2019, at 3:34 PM, Matthias Ferdinand m...@14v.de wrote: >> 17:26:35 crm node standby ha-idg1- > > if that is not a copy&paste error (ha-idg1- vs. ha-idg-1), then ha-idg-1 > was not set to standby, and installing updates may have done some > meddling with corosync/pacemaker (li

Re: [ClusterLabs] why is node fenced ?

2019-08-13 Thread Lentes, Bernd
- On Aug 12, 2019, at 7:47 PM, Chris Walker cwal...@cray.com wrote: > When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2, for > example, > > Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: > pcmk_quorum_notification: > Quorum retained | membership=1320 members=1 > >

Re: [ClusterLabs] Antw: why is node fenced ?

2019-08-13 Thread Lentes, Bernd
- On Aug 13, 2019, at 9:00 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > Personally I feel safer with updates when the whole cluster node is > offline, not standby. When you are going to boot anyway, it won't make much of > a difference. Also you don't have to remember to

[ClusterLabs] why is node fenced ?

2019-08-12 Thread Lentes, Bernd
Hi, last Friday (9th of August) i had to install patches on my two-node cluster. I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2), patched it, rebooted, started the cluster (systemctl start pacemaker) again, put the node online again, everything fine. Then i wanted to
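
The sequence described above, rendered as a shell sketch (node name from the post; crmsh and systemd as on SLES 12):

    crm node standby ha-idg-2    # move resources off the node
    # ... install patches, reboot ...
    systemctl start pacemaker    # corosync is started as a dependency
    crm node online ha-idg-2     # let the node host resources again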

Re: [ClusterLabs] "resource cleanup" - but error message does not dissapear

2019-07-30 Thread Lentes, Bernd
- On Jul 30, 2019, at 21:07, kgaillot kgail...@redhat.com wrote: > > There was a regression in 1.1.20 and 2.0.0 (fixed in the next versions) > where cleanups of multiple errors would miss some of them. Any chance > you're using one of those? I'm afraid not. Is there another way to get rid o

[ClusterLabs] "resource cleanup" - but error message does not dissapear

2019-07-30 Thread Lentes, Bernd
Hi, i always have "crm_mon -nfrALm 3" running in an ssh session on one of my cluster nodes, which gives a good, short overview of the status of the cluster. I just had some problems live-migrating some VirtualDomains. These are the errors i see: Failed Resource Actions: * vm_genetrap_migrate

[ClusterLabs] tracing stop of clvm resource - but only having one logfile

2019-07-30 Thread Lentes, Bernd
Hi, sometimes my clvm resource does not stop cleanly, so the respective node is fenced. To investigate that further i set a "trace stop" on that resource: ha-idg-2:~ # cibadmin -Q |grep -i trace Is that correct ? But i have now already set both nodes several times into standby mod
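
For reference, crmsh can manage such a trace without hand-editing the CIB; a sketch assuming the resource is called clvm:

    crm resource trace clvm stop     # sets trace_ra=1 on the stop operation
    # on SLES the trace files typically appear under /var/lib/heartbeat/trace_ra/
    crm resource untrace clvm stop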

Re: [ClusterLabs] two virtual domains start and stop every 15 minutes

2019-07-05 Thread Lentes, Bernd
- On Jul 4, 2019, at 1:25 AM, kgaillot kgail...@redhat.com wrote: > On Wed, 2019-06-19 at 18:46 +0200, Lentes, Bernd wrote: >> - On Jun 15, 2019, at 4:30 PM, Bernd Lentes >> bernd.len...@helmholtz-muenchen.de wrote: >> >> > - Am 14. Jun 2019 u

Re: [ClusterLabs] Two node cluster goes into split brain scenario during CPU intensive tasks

2019-06-24 Thread Lentes, Bernd
- On Jun 23, 2019, at 1:40 PM, Somanath Jeeva somanath.je...@ericsson.com wrote: > Hi All, > I have a two-node cluster with multicast (udp) transport. The multicast IP > used > is 224.1.1.1. > Whenever there is a CPU-intensive task the pcs cluster goes into a split-brain > scenario and doe
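
One mitigation often discussed for load-induced token loss is a larger totem token timeout in corosync.conf; a hedged sketch (the value is an example, not a recommendation):

    totem {
        # give a busy node more time to answer before it is declared dead
        token: 10000    # milliseconds; corosync 2.x defaults to 1000
    }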

Re: [ClusterLabs] two virtual domains start and stop every 15 minutes

2019-06-19 Thread Lentes, Bernd
- On Jun 15, 2019, at 4:30 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > - Am 14. Jun 2019 um 21:20 schrieb kgaillot kgail...@redhat.com: > >> On Fri, 2019-06-14 at 18:27 +0200, Lentes, Bernd wrote: >>> Hi, >>> >>> i had that

Re: [ClusterLabs] two virtual domains start and stop every 15 minutes

2019-06-15 Thread Lentes, Bernd
- On Jun 14, 2019, at 21:20, kgaillot kgail...@redhat.com wrote: > On Fri, 2019-06-14 at 18:27 +0200, Lentes, Bernd wrote: >> Hi, >> >> i had that problem already once but it's still not clear to me what >> really happens. >> I had this problem som

[ClusterLabs] two virtual domains start and stop every 15 minutes

2019-06-14 Thread Lentes, Bernd
Hi, i had that problem already once but it's still not clear to me what really happens. I had this problem some days ago: I have a 2-node cluster with several virtual domains as resources. I put one node (ha-idg-2) into standby, and two running virtual domains were migrated to the other node (

Re: [ClusterLabs] Antw: why is node fenced ?

2019-05-23 Thread Lentes, Bernd
- On May 20, 2019, at 8:28 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >>>> "Lentes, Bernd" wrote on 16.05.2019 > at > 17:10 in message > <1151882511.6631123.1558019430655.javamail.zim...@helmholtz-muenchen.de>: >> Hi, >&g

[ClusterLabs] why is node fenced ?

2019-05-16 Thread Lentes, Bernd
Hi, my two-node HA cluster fenced one node on the 14th of May. ha-idg-1 was the DC, ha-idg-2 was fenced. It happened around 11:30 am. The log from the fenced one isn't really informative: == 2019-05-14T11:22:09.948980+02:00 ha-idg-2 liblogging-stdlog: -- MARK --

Re: [ClusterLabs] crm_mon output to html-file - is there a way to manipulate the html-file ?

2019-05-03 Thread Lentes, Bernd
- On May 3, 2019, at 22:32, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: >> >> For now, I guess you'll have to post-process it with sed or something. > I don't know much about cgi-scripts. With -w i can write the output from > crm_mon > to a cgi-script. > Wouldn't it be possib

Re: [ClusterLabs] crm_mon output to html-file - is there a way to manipulate the html-file ?

2019-05-03 Thread Lentes, Bernd
- On May 3, 2019, at 19:37, Christopher Lumens clum...@redhat.com wrote: >> The output of the webpage is quite nice, you get a short and quick overview. >> Only some inactive resources are displayed in yellow, which is impossible to >> read. >> >> Is there a way to configure the output of th

[ClusterLabs] crm_mon output to html-file - is there a way to manipulate the html-file ?

2019-05-03 Thread Lentes, Bernd
Hi, on my cluster nodes i established a systemd service which starts crm_mon, which writes cluster information into an html-file so i can see the state of my cluster in a web browser. crm_mon is started this way: /usr/sbin/crm_mon -d -i 10 -h /srv/www/hawk/public/crm_mon.html -m3 -nrfotAL -p /va
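
A sketch of such a unit file, built around the exact command line above (unit name and ordering are assumptions):

    [Unit]
    Description=crm_mon HTML status page
    After=pacemaker.service

    [Service]
    Type=forking
    PIDFile=/var/run/crm_mon.pid
    ExecStart=/usr/sbin/crm_mon -d -i 10 -h /srv/www/hawk/public/crm_mon.html -m3 -nrfotAL -p /var/run/crm_mon.pid

    [Install]
    WantedBy=multi-user.target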

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-21 Thread Lentes, Bernd
- On Apr 21, 2019, at 6:51, Andrei Borzenkov arvidj...@gmail.com wrote: > On 20.04.2019 22:29, Lentes, Bernd wrote: >> >> >> - On Apr 18, 2019, at 16:21, kgaillot kgail...@redhat.com wrote: >> >>> >>> Simply stopping pacemaker and corosync by

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-20 Thread Lentes, Bernd
- On Apr 18, 2019, at 16:21, kgaillot kgail...@redhat.com wrote: > > Simply stopping pacemaker and corosync by whatever mechanism your > distribution uses (e.g. systemctl) should be sufficient. That works. But strangely, after a reboot both nodes are shown as UNCLEAN. Does the clus

[ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-18 Thread Lentes, Bernd
Hi, i have a two-node cluster; both servers are buffered by a UPS. If the power is gone, the UPS sends a signal via the network after a configurable time to shut down the servers. The UPS software (APC PowerChute Network Shutdown) gives me the possibility to run scripts on the host before it shuts down.
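
Per the answers later in this thread, cleanly stopping the cluster stack is sufficient; a minimal sketch of such a pre-shutdown hook (PowerChute performs the actual poweroff afterwards):

    #!/bin/sh
    # stop resources and leave the cluster cleanly before the UPS cuts power
    systemctl stop pacemaker
    systemctl stop corosync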

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Lentes, Bernd
- On Jan 23, 2019, at 3:20 PM, Klaus Wenninger kwenn...@redhat.com wrote: >> I have corosync-2.3.6-9.13.1.x86_64. >> Where can i configure this value ? > > speaking of two_node & wait_for_all? > That is configured in the quorum-section of corosync.conf: > > quorum { > ... >   wait_for_all: 1

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Lentes, Bernd
- On Jan 22, 2019, at 6:35 PM, Andrei Borzenkov arvidj...@gmail.com wrote: > This is another problem - if the cluster requires stonith, it won't start > resources with another node UNCLEAN and the fencing attempt apparently failed. Let's assume a running two-node cluster. Now node1 needs to be fe

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Lentes, Bernd
- On Jan 22, 2019, at 9:24 PM, kgaillot kgail...@redhat.com wrote: >> > Good plan, though perhaps there should be some allowance for the >> > case >> > in which only node1 is running when the power dies. Yes, i will take care of that. > Good point, I missed that. If you're sure the target

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-23 Thread Lentes, Bernd
- On Jan 22, 2019, at 6:00 PM, kgaillot kgail...@redhat.com wrote: > On Tue, 2019-01-22 at 16:52 +0100, Lentes, Bernd wrote: >> Now the restart, which gives me trouble. >> Currently i want to restart the cluster manually, because i'm not >> completely familia

[ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-22 Thread Lentes, Bernd
Hi, we have a new UPS which has enough charge to supply our 2-node cluster and its periphery (SAN, switches ...) for a reasonable time. I'm currently thinking about the shutdown and restart procedure for the complete cluster when the power is lost and does not come back soon. Then cluster is provi

[ClusterLabs] live migration rarely fails seemingly without reason

2018-12-03 Thread Lentes, Bernd
Hi, i have a two-node cluster with several VirtualDomains as resources. Normally live migration is no problem. But rarely it fails, without giving any reasonable message in the logs. I tried to migrate several VirtualDomains concurrently from ha-idg-2 to ha-idg-1. One VirtualDomain failed, the

Re: [ClusterLabs] crm resource stop VirtualDomain - how to know when/if VirtualDomain is really stopped ?

2018-10-11 Thread Lentes, Bernd
- On Oct 11, 2018, at 4:26 PM, Kristoffer Grönlund kgronl...@suse.de wrote: > On Thu, 2018-10-11 at 13:59 +0200, Lentes, Bernd wrote: >> Hi, >> >> i'm trying to write a script which shuts down my VirtualDomains in the >> night for a short period to take a cl

[ClusterLabs] crm resource stop VirtualDomain - how to know when/if VirtualDomain is really stopped ?

2018-10-11 Thread Lentes, Bernd
Hi, i'm trying to write a script which shuts down my VirtualDomains in the night for a short period to take a clean snapshot with libvirt. To shut them down i can use "crm resource stop VirtualDomain". But when i do a "crm resource stop VirtualDomain" in my script, the command returns immediately
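
One way to make the script block until the domain is really down is crm_resource --wait, which returns only when the cluster has no pending actions; a sketch (the resource name is made up):

    crm resource stop vm_example
    crm_resource --wait     # blocks until the transition has finished
    # the domain is now stopped; take the snapshot here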

Re: [ClusterLabs] migrating VirtualDomain starts migration of second VirtualDomain in reverse direction

2018-09-21 Thread Lentes, Bernd
- On Sep 20, 2018, at 6:58 PM, kgaillot kgail...@redhat.com wrote: > OK, drop "default-" and it should work. The names in rsc_defaults are > identical to what you'd use in the resource meta-data. Now it's working. Thanks Ken. Bernd Helmholtz Zentrum München __

Re: [ClusterLabs] migrating VirtualDomain starts migration of second VirtualDomain in reverse direction

2018-09-20 Thread Lentes, Bernd
Ken wrote: > > I think you meant default-resource-stickiness ... and even that's > deprecated in 1.1 and gone in 2.0. :-) > > The proper way is to set resource-stickiness in the rsc_defaults > section (however that's done using your tools of choice). > -- > Ken Gaillot Hi, i did it that way:
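
In crmsh that boils down to a one-liner (the value is an example); crm configure show should then list it under rsc_defaults:

    crm configure rsc_defaults resource-stickiness=200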

[ClusterLabs] migrating VirtualDomain starts migration of second VirtualDomain in reverse direction

2018-09-20 Thread Lentes, Bernd
Hi, i have a two-node cluster with several VirtualDomain resources. Scenario: Two VirtualDomains, running on different nodes. Migrating one VirtualDomain from node 1 to node 2 migrates the other VirtualDomain from node 2 to node 1. These are the scores AFTER the migration: ... vm_mausdb

Re: [ClusterLabs] Antw: VirtualDomain as resources and OCFS2

2018-09-11 Thread Lentes, Bernd
- On Sep 11, 2018, at 8:54 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > Hi Bernd, > > the disappointing answer is this: With cLVM you cannot make snapshots of the > LVs (easily), and in SLES11 SP4 at least the tool to make snapshots of OCFS2 > also isn't provided. So the m

Re: [ClusterLabs] VirtualDomain as resources and OCFS2

2018-09-11 Thread Lentes, Bernd
- On Sep 11, 2018, at 4:29 AM, Gang He g...@suse.com wrote: > Hello Lentes, > > It does not look like an OCFS2 or pacemaker problem, more like a virtualization > problem. > From the OCFS2/LVM2 perspective, if you use one LV for one VirtualDomain, that > means > the guest VMs on that VirtualDomai

[ClusterLabs] VirtualDomain as resources and OCFS2

2018-09-10 Thread Lentes, Bernd
Hi, i'm establishing a cluster with virtual guests as resources which should reside in raw files on OCFS2-formatted logical volumes. My first idea was to create a dedicated logical volume for each VirtualDomain; i thought that would be well-structured. But now i realize that my cluster configurat

Re: [ClusterLabs] SAN, pacemaker, KVM: live-migration with ext3 ?

2018-09-06 Thread Lentes, Bernd
- On Sep 5, 2018, at 6:58 PM, FeldHost™ Admin ad...@feldhost.cz wrote: > Why do you use a FS for the raw image, when you can use the LV directly as a block device for > your VM? > Because i want to make snapshots with virsh or qemu-img. I think i can't do that with a naked block device. Bernd Helmhol

Re: [ClusterLabs] SAN, pacemaker, KVM: live-migration with ext3 ?

2018-09-05 Thread Lentes, Bernd
- On Sep 5, 2018, at 6:28 PM, FeldHost™ Admin ad...@feldhost.cz wrote: > hello, yes, you need ocfs2 or gfs2, but in your case (raw image) probably > better > to use lvm I use cLVM. The fs for the raw image resides on a clustered VG/LV. But nevertheless i still need a cluster fs because of

[ClusterLabs] SAN, pacemaker, KVM: live-migration with ext3 ?

2018-09-05 Thread Lentes, Bernd
Hi guys, just to be sure. I thought (maybe i'm wrong) that having a VM on shared storage (FC SAN), e.g. in a raw file on an ext3 fs on that SAN, allows live migration because pacemaker ensures that the ext3 fs is only mounted on one node at any time. I tried it, but "live"-migration wasn't

Re: [ClusterLabs] snapshotting of running VirtualDomain resources - OCFS2 ?

2018-03-16 Thread Lentes, Bernd
- On Mar 15, 2018, at 3:47 AM, Gang He g...@suse.com wrote: > Hello Lentes, > > >> Hi, >> >> i have a 2-node-cluster with my services (web, db) running in VirtualDomain >> resources. >> I have a SAN with cLVM, each guest lies in a dedicated logical volume with >> an ext3 fs. >> >> C

Re: [ClusterLabs] snapshotting of running VirtualDomain resources - OCFS2 ?

2018-03-16 Thread Lentes, Bernd
- On Mar 15, 2018, at 3:47 AM, Gang He g...@suse.com wrote: > Just one comment, you have to ensure the integrity of the VM file before calling > reflink. > Hi Gang, how could i achieve that ? sync ? The disks of the VM's are configured without cache, otherwise they can't be live migrated.
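
A hedged sketch of one way to get a quiesced file before reflinking (domain name and paths are made up; reflink is the OCFS2 reflink utility):

    virsh suspend vm_example       # pause the guest so no writes are in flight
    sync                           # flush host-side buffers
    reflink vm_example.raw vm_example.raw.backup
    virsh resume vm_example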

Re: [ClusterLabs] Antw: snapshotting of running VirtualDomain resources - OCFS2 ?

2018-03-14 Thread Lentes, Bernd
- On Mar 14, 2018, at 11:54 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > Hi! > > IMHO the only clean solution would be this procedure: > 1) pause the VMs and cause them to flush their disk buffers, or at least make > sure the writes of the VM guest arrived at the VM host's buf

[ClusterLabs] snapshotting of running VirtualDomain resources - OCFS2 ?

2018-03-14 Thread Lentes, Bernd
Hi, i have a 2-node cluster with my services (web, db) running in VirtualDomain resources. I have a SAN with cLVM; each guest lies in a dedicated logical volume with an ext3 fs. Currently i'm thinking about snapshotting the guests to make a backup in the background. With cLVM that's not possibl

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 10:57 PM, kgaillot kgail...@redhat.com wrote: >> from the Changelog: >> >> Changes since Pacemaker-1.1.15 >>   ... >>   + pengine: do not fence a node in maintenance mode if it shuts down >> cleanly >>   ... >> >> just saying ... may or may not be what you are seeing.

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 9:27 PM, Digimer li...@alteeve.ca wrote: > > I understood what you meant about it getting fenced after stopping > corosync. What I am not clear on is if you are stopping corosync on the > normal node, or the node that is in maintenance mode. > > In either case, as I

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-16 Thread Lentes, Bernd
- On Oct 16, 2017, at 7:37 PM, emmanuel segura emi2f...@gmail.com wrote: > I put a node in maintenance mode? > do you mean you put the cluster in maintenance mode I did "crm node maintenance ". From my understanding that means that i put the node in maintenance mode. Bernd Helmholtz

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-16 Thread Lentes, Bernd
- On Oct 16, 2017, at 7:38 PM, Digimer li...@alteeve.ca wrote: > On 2017-10-16 01:24 PM, Lentes, Bernd wrote: >> Hi, >> >> i have the following behavior: I put a node in maintenance mode, afterwards >> stop >> corosync on that node with /etc/ini

[ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-16 Thread Lentes, Bernd
Hi, i see the following behavior: I put a node in maintenance mode, afterwards stop corosync on that node with /etc/init.d/openais stop. This node is immediately fenced. Is that expected behavior ? I thought putting a node into maintenance means the cluster no longer cares about that

[ClusterLabs] DRBD, dual-primary, Live-Migration - Cluster FS necessary ?

2017-09-06 Thread Lentes, Bernd
Hi, i just want to be sure. I created a DRBD partition in a dual primary setup. I have a VirtualDomain (KVM) resource which resides in the naked DRBD (without FS), and i can live migrate. Are there situations where in this setup a cluster fs is necessary/recommended ? I'd like to avoid it, it c
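
For archive readers: in DRBD 8.4 dual-primary is switched on in the resource's net section; a minimal sketch (the resource name is an example, fencing deliberately omitted):

    resource vm0 {
        net {
            protocol C;               # synchronous replication, required here
            allow-two-primaries yes;  # both nodes may be Primary at once
        }
    }

Dual-primary setups are generally paired with fencing at the DRBD level as well.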

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-16 Thread Lentes, Bernd
> Hi, > > > What happened: > I tried to configure a simple drbd resource following > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296 > I used this simple snip from the doc: > configure primitive WebData ocf:linbit:drbd param

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-10 Thread Lentes, Bernd
- On Aug 10, 2017, at 2:11 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: > > if you use crmsh "interactively", > crmsh does implicitly use a shadow cib, > and will only commit changes once you "commit", > see "crm configure help commit" > Hi, i tested it: First try: crm(live)# con

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-09 Thread Lentes, Bernd
- Am 8. Aug 2017 um 15:36 schrieb Lars Ellenberg lars.ellenb...@linbit.com: > crm shell in "auto-commit"? > never seen that. i googled for "crmsh autocommit pacemaker" and found that: https://github.com/ClusterLabs/crmsh/blob/master/ChangeLog See line 650. Don't know what that means. > >

Re: [ClusterLabs] Antw: Re: Antw: Re: big trouble with a DRBD resource

2017-08-08 Thread Lentes, Bernd
- On Aug 8, 2017, at 9:42 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > Maybe just be concrete with your questions, so it's much easier to provide > useful answers. > Which question is not concrete ? Bernd Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer G

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-08 Thread Lentes, Bernd
- On Aug 7, 2017, at 10:43 PM, kgaillot kgail...@redhat.com wrote: > > The logs are very useful, but not particularly easy to follow. It takes > some practice and experience, but I think it's worth it if you have to > troubleshoot cluster events often. I will do my best. > > It's on the

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-08 Thread Lentes, Bernd
- On Aug 7, 2017, at 10:26 PM, kgaillot kgail...@redhat.com wrote: > > Unmanaging doesn't stop monitoring a resource, it only prevents starting > and stopping of the resource. That lets you see the current status, even > if you're in the middle of maintenance or what not. You can disable >

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-07 Thread Lentes, Bernd
- On Aug 4, 2017, at 10:19 PM, kgaillot kgail...@redhat.com wrote: > The cluster reacted promptly: > crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params > drbd_resource=idcc-devel \ >> op monitor interval=60 > WARNING: prim_drbd_idcc_devel: default timeout 20s for s

Re: [ClusterLabs] Antw: Re: big trouble with a DRBD resource

2017-08-07 Thread Lentes, Bernd
- On Aug 7, 2017, at 3:43 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >> >>> >>> The "ERROR" message is coming from the DRBD resource agent itself, not >>> pacemaker. Between that message and the two separate monitor operations, >>> it looks like the agent will only run as

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-07 Thread Lentes, Bernd
- On Aug 4, 2017, at 10:19 PM, kgaillot kgail...@redhat.com wrote: > > The "ERROR" message is coming from the DRBD resource agent itself, not > pacemaker. Between that message and the two separate monitor operations, > it looks like the agent will only run as a master/slave clone. Btw: Does
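
For readers of this thread, the master/slave wrapper the agent insists on looks roughly like this in crmsh (resource names from the thread; timeouts left at their defaults):

    crm configure primitive prim_drbd_idcc_devel ocf:linbit:drbd \
        params drbd_resource=idcc-devel \
        op monitor interval=60 role=Master \
        op monitor interval=30 role=Slave
    crm configure ms ms_drbd_idcc_devel prim_drbd_idcc_devel \
        meta master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true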

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-07 Thread Lentes, Bernd
- On Aug 6, 2017, at 12:05 PM, Kristoffer Grönlund kgronl...@suse.com wrote: >> What happened: >> I tried to configure a simple drbd resource following >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296 >> I used this simpl

Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-07 Thread Lentes, Bernd
- On Aug 4, 2017, at 10:19 PM, Ken Gaillot kgail...@redhat.com wrote: > > Unfortunately no -- logging, and troubleshooting in general, is an area > we are continually striving to improve, but there are more to-do's than > time to do them. sad but comprehensible. Is it worth trying to unders

[ClusterLabs] big trouble with a DRBD resource

2017-08-04 Thread Lentes, Bernd
Hi, first: is there a tutorial or something else which helps in understanding what pacemaker logs in syslog and /var/log/cluster/corosync.log ? I try hard to find out what's going wrong, but the logs are difficult to understand, also because of the amount of information. Or should i deal more with "crm

Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Lentes, Bernd
- On Aug 2, 2017, at 10:42 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > I thought the cluster does not perform actions that are not defined in the > configuration (e.g. "monitor"). I think the cluster automatically configures and performs start/stop operations if not d

[ClusterLabs] from where does the default value for start/stop op of a resource come ?

2017-08-01 Thread Lentes, Bernd
Hi, i'm wondering where the default values for operations of a resource come from. I tried to configure: crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params drbd_resource=idcc-devel \ > op monitor interval=60 WARNING: prim_drbd_idcc_devel: default timeout 20s for

Re: [ClusterLabs] Antw: DRBD AND cLVM ???

2017-08-01 Thread Lentes, Bernd
- On Aug 1, 2017, at 8:06 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >>>> "Lentes, Bernd" wrote on 31.07.2017 > at > 18:51 in message > <641329685.12981098.1501519915026.javamail.zim...@helmholtz-muenchen.de>: >> Hi, >>

[ClusterLabs] DRBD AND cLVM ???

2017-07-31 Thread Lentes, Bernd
Hi, i'm currently a bit confused. I have several resources running as VirtualDomains; the vm's reside on plain logical volumes without a fs, and these lv's reside themselves on a FC SAN. In that scenario i need cLVM to distribute the lvm metadata between the nodes. For playing around a bit and getting us

[ClusterLabs] resources do not migrate although node is going to standby

2017-07-24 Thread Lentes, Bernd
Hi, just to be sure: i have a VirtualDomain resource (called prim_vm_servers_alive) running on one node (ha-idg-2). For reasons i don't remember i have a location constraint: location cli-prefer-prim_vm_servers_alive prim_vm_servers_alive role=Started inf: ha-idg-2 Now i try to set this node i

[ClusterLabs] timeout for stop VirtualDomain running Windows 7

2017-07-24 Thread Lentes, Bernd
Hi, i have a VirtualDomain resource running a Windows 7 client. This is the respective configuration: primitive prim_vm_servers_alive VirtualDomain \ params config="/var/lib/libvirt/images/xml/Server_Monitoring.xml" \ params hypervisor="qemu:///system" \ params migration_
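
Windows guests can take minutes to shut down, so the usual answer is a generous stop timeout on the operation; a sketch continuing the configuration above (the 300s value is an example):

    primitive prim_vm_servers_alive VirtualDomain \
        params config="/var/lib/libvirt/images/xml/Server_Monitoring.xml" \
        params hypervisor="qemu:///system" \
        op stop interval=0 timeout=300 \
        op monitor interval=30 timeout=30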

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Lentes, Bernd
> > If you have DRBD (PV) -> Clustered VG -> LV per VM, and you have > dual-primary DRBD, you can already do a live migration. > What about PV -> clustered VG -> LV -> DRBD ? Bernd Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter La

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Lentes, Bernd
>> >> On SLES 11 SP4 HAE DRBD 8.4 is available. Do i need a cluster fs on top of a >> dual primary DRBD ? >> I assume. >> >> >> Bernd > > Depends. > > If you want to have a shared FS, yes. If you want to back VMs though, we > use clustered LVM to manage the DRBD space, creating per-VM LVs, an

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Lentes, Bernd
>>> >> >> Is with DRBD and Virtual Machines live migration possible ? >> >> >> Bernd > > yes, but dual-primary is needed (this is how the Anvil! does live > migration). With DRBD 9, you can set it up to momentarily do > dual-primary to support live migration, though I have not used this > my

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Lentes, Bernd
- On Jul 17, 2017, at 11:51 AM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > Hi, > > i established a two node cluster with two HP servers and SLES 11 SP4. I'd like > to start now with a test period. Resources are virtual machines. The vm's > reside on a FC SAN. The SAN has two

Re: [ClusterLabs] did anyone manage to combine ClusterMon RA with HP systems insight manager ? - SOLVED

2017-07-17 Thread Lentes, Bernd
- On Jul 11, 2017, at 7:25 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > Hi, > > i established a two-node cluster and i'd like to start a test period now with > some not very important resources. > I'd like to monitor the cluster via SNMP, so i notice if it's e.g. migrating

[ClusterLabs] DRBD or SAN ?

2017-07-17 Thread Lentes, Bernd
Hi, i established a two-node cluster with two HP servers and SLES 11 SP4. I'd like to start now with a test period. Resources are virtual machines. The vm's reside on a FC SAN. The SAN has two power supplies, two storage controllers, and two network interfaces for configuration. Each storage control

[ClusterLabs] did anyone manage to combine ClusterMon RA with HP systems insight manager ?

2017-07-11 Thread Lentes, Bernd
Hi, i established a two-node cluster and i'd like to start a test period now with some not very important resources. I'd like to monitor the cluster via SNMP, so i notice if it's e.g. migrating. I followed http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/ind
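
The ClusterMon approach from that document amounts to a cloned resource driving crm_mon; a hedged sketch (the SNMP target and community are made up, and -S requires a crm_mon built with SNMP support):

    crm configure primitive snmp_notify ocf:pacemaker:ClusterMon \
        params user=root update=30 \
        extra_options="-S snmp-host.example.com -C public"
    crm configure clone clone_snmp_notify snmp_notify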

[ClusterLabs] order configured - restart of the cluster - order dissapeared ???

2017-06-27 Thread Lentes, Bernd
Hi, i configured an order so that a simple virtual machine is started after some other resources are started. That was how i configured it: configure order order_clone_group_prim_dlm_prim_vm_idcc-devel clone_group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml prim_vm_idcc_devel I did this twice

[ClusterLabs] what is the best practice for removing a node temporarily (e.g. for installing updates) ?

2017-06-19 Thread Lentes, Bernd
Hi, what would you consider the best way to remove a node temporarily from the cluster, e.g. for installing updates ? I thought "crm node maintenance node" would be the right way, but i was astonished that the resources keep running on it. I would have expected the resources to stop. I

[ClusterLabs] SOLVED: Re: how to set a dedicated fence delay for a stonith agent ?

2017-05-22 Thread Lentes, Bernd
- On May 17, 2017, at 4:24 PM, Dmitri Maziuk dmitri.maz...@gmail.com wrote: > On 2017-05-17 06:24, Lentes, Bernd wrote: >> > ... >> I'd like to know what the software is use is doing. Am i the only one having >> that opinion ? > > No. > >> How

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Lentes, Bernd
- On May 17, 2017, at 2:58 PM, Klaus Wenninger kwenn...@redhat.com wrote: >> I don't see that. > > fence_* are the RHCS-style fence-agents coming mainly from > https://github.com/ClusterLabs/fence-agents. > Ah. Ok, i see that. Do you know if they cooperate with a SuSE HAE ? I found rpm'

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Lentes, Bernd
- On May 17, 2017, at 2:11 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: > 08.05.2017 22:20, Lentes, Bernd wrote: >> Hi, >> >> i remember that digimer often campaigns for a fence delay in a 2-node >> cluster. >> E.g. here: >> http://oss.cluste

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Lentes, Bernd
- On May 10, 2017, at 9:15 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: > On 05/10/2017 01:54 PM, Ken Gaillot wrote: >> On 05/10/2017 12:26 PM, Dimitri Maziuk wrote: > >>> - fencing in 2-node clusters does not work reliably without fixed delay >> >> Not quite. Fixed delay allows a parti

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-09 Thread Lentes, Bernd
- On May 8, 2017, at 9:20 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > Hi, > > i remember that digimer often campaigns for a fence delay in a 2-node > cluster. > E.g. here: > http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html > In my eyes it makes sense

[ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-08 Thread Lentes, Bernd
Hi, i remember that digimer often campaigns for a fence delay in a 2-node cluster. E.g. here: http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html In my eyes it makes sense, so i try to establish that. I have two HP servers, each with an ILO card. I have to use the stonith:extern
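
For archive readers: independent of the agent, pacemaker offers a per-device random delay via pcmk_delay_max (and, in later releases, a fixed pcmk_delay_base); a hedged sketch for one ILO device (agent choice, addresses and credentials are placeholders):

    crm configure primitive fence_ilo_1 stonith:external/ipmi \
        params hostname=ha-idg-1 ipaddr=192.168.100.10 \
        userid=admin passwd=secret interface=lanplus \
        pcmk_delay_max=15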

Re: [ClusterLabs] crm_mon -h (writing to an html-file) not showing all desired information and having trouble with the -d option

2017-05-08 Thread Lentes, Bernd
- On May 8, 2017, at 6:44 PM, Ken Gaillot kgail...@redhat.com wrote: >> >> This is the file without -d: >> >> ha-idg-2:/srv/www/hawk/public # stat crm_mon.html >> File: `crm_mon.html' >> Size: 1963  Blocks: 8  IO Block: 4096  regular file >> Device: 1fh/31d Inode: 7

[ClusterLabs] crm_mon -h (writing to an html-file) not showing all desired information and having trouble with the -d option

2017-05-08 Thread Lentes, Bernd
Hi, playing around with my cluster i always have a shell with crm_mon running because it provides a lot of useful, current information concerning the cluster, nodes, resources ... Normally i have a "crm_mon -nrfRAL" running. I'd like to have that output as a web page too. So i tried the option

[ClusterLabs] SOLVED: Antw: Re: Antw: can't live migrate VirtualDomain which is part of a group

2017-05-08 Thread Lentes, Bernd
- On Apr 25, 2017, at 1:37 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: >>>> "Lentes, Bernd" wrote on 25.04.2017 >>>> at > 11:02 in message > <406563603.26964612.1493110931994.javamail.zim...@helmholtz-muenchen.de>: >

Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-25 Thread Lentes, Bernd
- On Apr 24, 2017, at 11:11 PM, Ken Gaillot kgail...@redhat.com wrote: > On 04/24/2017 02:33 PM, Lentes, Bernd wrote: >> >> - On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote: >> >>>>> primitive prim_vnc_ip_mausdb IPaddr \ >&g

Re: [ClusterLabs] Antw: can't live migrate VirtualDomain which is part of a group

2017-04-25 Thread Lentes, Bernd
- On Apr 25, 2017, at 8:08 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > Bernd, > > you have been on this list long enough to know that the reason for your failure is > most likely to be found in the logs, which you did not provide. Couldn't you > find that out yourself from the logs? >

Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Lentes, Bernd
- On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote: >>> >>> >>> primitive prim_vnc_ip_mausdb IPaddr \ >>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \ >>>meta is-managed=true > > I don't see allow-migrate on the IP. Is this a modified IPaddr? The > st

Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Lentes, Bernd
- On Apr 24, 2017, at 8:26 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > Hi, > > i have a primitive VirtualDomain resource which i can live migrate without any > problem. > Additionally i have an IP as a resource which i can live migrate easily too. > If i combine them in a
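
One commonly suggested alternative (a sketch, names from the thread): a group can only live-migrate if every member is migratable, so keep the VM and the IP as separate primitives tied together with constraints instead:

    crm configure colocation col_ip_with_vm inf: prim_vnc_ip_mausdb prim_vm_mausdb
    crm configure order ord_vm_before_ip Mandatory: prim_vm_mausdb prim_vnc_ip_mausdb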
