Re: [ClusterLabs] crmsh 3.0 availability for RHEL 7.4?

2018-08-02 Thread FeldHostAdmin
https://software.opensuse.org/download.html?project=network%3Aha-clustering%3AStable&package=crmsh
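
On RHEL/CentOS 7 that usually means adding the network:ha-clustering:Stable
repository from the openSUSE Build Service and installing from it. A rough
sketch (the exact repository path is an assumption on my side; pick the
directory matching your release from the download page above):

[root@node2 ~]# yum install -y yum-utils
[root@node2 ~]# yum-config-manager --add-repo \
    https://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@node2 ~]# yum install -y crmsh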

Best regards,
Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – We tailor hosting services to you. Do you have
specific requirements? We can handle them.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
File no. C 200350, registered with the Municipal Court in Prague

Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 2 Aug 2018, at 20:55, Ron Kerry <ron.ke...@hpe.com> wrote:
> 
> Is there a pre-packaged RPM available that will work on RHEL 7.4?
> 
> Right now all I have is a much older crmsh-2.1+git98 package. This does not 
> work for configuration purposes on RHEL 7.4. I get these sorts of errors 
> trying to configure resources.
> 
> [root@node2 ~]# crm configure edit
> ERROR: CIB not supported: validator 'pacemaker-2.10', release '3.0.14'
> ERROR: You may try the upgrade command
> ERROR: configure: Missing requirements
> 
> The upgrade command, of course, does not work. I can configure things with 
> pcs, but I already have a pre-defined crm style configuration text file with 
> all my primitives, groups, clones and constraints that I would like to be 
> able to use directly.
> 
> -- 
> 
> Ron Kerry
> ron.ke...@hpe.com

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker confirm that node was fenced successfully

2018-08-11 Thread FeldHostAdmin
Hi all, I have a question:

We have a Corosync/Pacemaker cluster running KVM virtualisation. The VM
instances are managed by external software (OpenNebula). To achieve automatic
migration of running VMs off a failed node, the external software needs to
fence the node and confirm that it was fenced successfully. When a node fails
it is fenced by the cluster stack anyway, so I don't want the external software
to fence it a second time; instead I want to connect to the surviving nodes and
query the fencing status. Is there an API in Pacemaker for this, or do I need
to parse the output of pcs/crmsh somehow?

Thank you for your reply.

Best regards,
Kristián Feldsam



Re: [ClusterLabs] Pacemaker confirm that node was fenced successfully

2018-08-13 Thread FeldHostAdmin
Hello, thanks for the reply. So basically, can I leverage the existing CLI
tools and, for example, call "crm node fence xyz"?
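
Something like this is what I have in mind (a rough sketch, assuming the
Pacemaker 1.1 stonith_admin CLI; host and node names are illustrative):

[root@node1 ~]# stonith_admin --history node2   # was node2 fenced, and when?
[root@node1 ~]# stonith_admin --fence node2     # request fencing of node2

(and, as you note below, an identical request that is already in progress is
not executed a second time)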

Best regards,
Kristián Feldsam

> On 13 Aug 2018, at 17:15, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> Hi,
> 
> You can call stonith_admin, or use the pacemaker fencing client API
> (used for example by DLM). The API documentation is rudimentary:
> 
> http://clusterlabs.org/pacemaker/doxygen/Pacemaker-1.1.18/stonith-ng_8h.html
> 
> Be sure you click on "stonith_api_operations_s" for descriptions of the
> API functions.
> 
> Whether you use the API or stonith_admin, pacemaker will not fence
> twice if a request arrives while another identical request is in
> progress.
> 
> -- 
> Ken Gaillot <kgail...@redhat.com>


Re: [ClusterLabs] 2 Node Active-Passive DRBD , fallback fencing issues.

2018-08-17 Thread FeldHostAdmin
Could you share your corosync.conf and Pacemaker configuration (pcs cluster cib)?
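
Something like this on any cluster node would do (a sketch; the host name
"master" just matches your output below):

[root@master ~]# cat /etc/corosync/corosync.conf
[root@master ~]# pcs cluster cib > /tmp/cib.xml
[root@master ~]# pcs config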

Best regards,
Kristián Feldsam

> On 17 Aug 2018, at 17:43, Jayesh Shinde wrote:
> 
> Hello All , 
> 
> I have configured a 2-node active-passive DRBD cluster on HP ProLiant DL380 
> Gen9 servers with CentOS 7.3, Pacemaker + corosync + pcsd + fence_ilo4_ssh. 
> 
> When I reboot the "Active server", all resources move to the "Slave server" 
> and all the respective services start properly. 
> But when the "Active server" boots again, it fences the "Slave server" (i.e. 
> reboots it) and all resources fall back to the "Active server". 
> 
> I want to know whether this is the default behaviour or an issue. 
> 
> My requirement is that once the "Master" has switched to the "Slave", all 
> services should keep running on the "Slave" only. 
> If the "Slave server" goes down or reboots, then the "Master server" should 
> take over all resources, and vice versa. 
> 
> Please guide. 
> 
> Below are config details 
> 
> [root@master ~]# pcs property list --all|grep stonith
>  stonith-action: reboot
>  stonith-enabled: true
>  stonith-max-attempts: 10
>  stonith-timeout: 60s
>  stonith-watchdog-timeout: (null)
> 
> [root@master ~]# pcs property list --all | grep no-quorum-policy
>  no-quorum-policy: ignore
> 
> [root@master ~]# pcs resource defaults
> resource-stickiness: 100
> 
> Below is Package Information :-- 
> 
> kernel-3.10.0-862.9.1.el7.x86_64
> 
> pacemaker-libs-1.1.18-11.el7_5.3.x86_64
> corosync-2.4.3-2.el7_5.1.x86_64
> pacemaker-cli-1.1.18-11.el7_5.3.x86_64
> pacemaker-cluster-libs-1.1.18-11.el7_5.3.x86_64
> pacemaker-1.1.18-11.el7_5.3.x86_64
> corosynclib-2.4.3-2.el7_5.1.x86_64
> 
> drbd90-utils-9.3.1-1.el7.elrepo.x86_64
> kmod-drbd90-9.0.14-1.el7_5.elrepo.x86_64
> 
> 
> Regards
> Jayesh Shinde



Re: [ClusterLabs] SAN, pacemaker, KVM: live-migration with ext3 ?

2018-09-05 Thread FeldHostAdmin
Hello, yes, you need OCFS2 or GFS2, but in your case (a raw image) it is
probably better to use LVM directly.

> On 5 Sep 2018, at 18:13, Lentes, Bernd <bernd.len...@helmholtz-muenchen.de> wrote:
> 
> Hi guys,
> 
> just to be sure: I thought (maybe I'm wrong) that having a VM on shared 
> storage (FC SAN), e.g. in a raw file on an ext3 fs on that SAN, allows 
> live-migration because pacemaker takes care that the ext3 fs is only mounted 
> on one node at any time. I tried it, but "live" migration wasn't possible. 
> The VM was always shut down before migration. Or do I need OCFS2?
> Could anyone clarify this?
> 
> 
> Bernd
> 
> -- 
> 
> Bernd Lentes 
> Systemadministration 
> Institut für Entwicklungsgenetik 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum münchen 
> bernd.len...@helmholtz-muenchen.de 
> phone: +49 89 3187 1241 
> fax: +49 89 3187 2294 
> http://www.helmholtz-muenchen.de/idg 
> 
> He who makes mistakes can learn something; 
> he who does nothing can learn nothing.
> 
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias H. Tschoep, Heinrich 
> Bassler, Dr. rer. nat. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 



Re: [ClusterLabs] SAN, pacemaker, KVM: live-migration with ext3 ?

2018-09-05 Thread FeldHostAdmin
Why do you use a filesystem for a raw image at all, when you can use an LV
directly as the block device for your VM?
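
A rough sketch of what I mean (VG, LV and domain names are illustrative, and
with cLVM the volume group has to be clustered):

# create a logical volume and hand it to the guest as a raw block device
lvcreate -n vm1-disk -L 20G cluster_vg

# in the libvirt domain XML the disk then points at the LV, not at an image file:
#   <disk type='block' device='disk'>
#     <driver name='qemu' type='raw' cache='none'/>
#     <source dev='/dev/cluster_vg/vm1-disk'/>
#     <target dev='vda' bus='virtio'/>
#   </disk>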

> On 5 Sep 2018, at 18:34, Lentes, Bernd <bernd.len...@helmholtz-muenchen.de> wrote:
> 
> 
> 
> - On Sep 5, 2018, at 6:28 PM, FeldHost™ Admin <ad...@feldhost.cz> wrote:
> 
>> hello, yes, you need ocfs2 or gfs2, but in your case (raw image) probably 
>> better
>> to use lvm
> 
> I use cLVM. The fs for the raw image resides on a clustered VG/LV.
> But nevertheless I still need a cluster fs because of the concurrent access?
> 
> Bernd
> 
> 



Re: [ClusterLabs] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread FeldHostAdmin
Hello Gang He,

which corosync rrp_mode do you use, passive or active? Did you test both?
Also, which kernel version do you use? I see some SCTP fixes in the latest
kernels.

> On 8 Mar 2018, at 08:52, Gang He wrote:
> 
> Hello list and David Teigland,
> 
> I hit a problem on a two-ring cluster; it can be reproduced with the steps 
> below.
> 1) setup a two rings cluster with two nodes.
> e.g. 
> clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
> clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103
> 
> 2) the whole cluster works well, then I put eth0 down on node clvm2, and 
> restart pacemaker service on that node.
> ifconfig eth0 down
> rcpacemaker restart
> 
> 3) the whole cluster still works well (that means corosync switches to the 
> other ring very smoothly).
> Then, I can quickly mount the ocfs2 file system on node clvm2 with the command 
> mount /dev/sda /mnt/ocfs2 
> 
> 4) Next, I do the same mount on node clvm1; the mount command hangs for 
> about 5 mins before it finally completes.
> But if we set up an ocfs2 file system resource in pacemaker,
> the resource agent will consider the ocfs2 file system resource start-up a 
> failure before this command returns,
> and pacemaker will fence node clvm1. 
> This problem is hurting our customer's evaluation, since they expect the two 
> rings to switch over smoothly.
> 
> Looking into this problem, I can see the mount command hanging with the 
> backtrace below:
> clvm1:/ # cat /proc/6688/stack
> [] new_lockspace+0x92d/0xa70 [dlm]
> [] dlm_new_lockspace+0x69/0x160 [dlm]
> [] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
> [] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
> [] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
> [] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
> [] mount_bdev+0x1a0/0x1e0
> [] mount_fs+0x3a/0x170
> [] vfs_kern_mount+0x62/0x110
> [] do_mount+0x213/0xcd0
> [] SyS_mount+0x85/0xd0
> [] entry_SYSCALL_64_fastpath+0x1e/0xb6
> [] 0x
> 
> The root cause is in sctp_connect_to_sock() function in lowcomms.c,
> 1075
> 1076 log_print("connecting to %d", con->nodeid);
> 1077
> 1078 /* Turn off Nagle's algorithm */
> 1079 kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
> 1080                   sizeof(one));
> 1081
> 1082 result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
> 1083                             O_NONBLOCK);  <<= here, this call 
> takes > 5 mins before returning ETIMEDOUT(-110).
> 1084 printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
> 1085
> 1086 if (result == -EINPROGRESS)
> 1087 result = 0;
> 1088 if (result == 0)
> 1089 goto out;
> 
> Then, I want to know whether this problem has been found/fixed before.
> It looks like DLM cannot switch to the second ring very quickly, and this 
> prevents the applications above (e.g. CLVM, ocfs2) from creating a new 
> lockspace during their startup.
> 
> Thanks
> Gang
> 
> 



Re: [ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread FeldHostAdmin
Hi, so try to use active mode.

https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installation_terms.html

The fixes I saw were in 4.14.* kernels.
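
For reference, that is just the rrp_mode value in the totem section of
/etc/corosync/corosync.conf on every node (a sketch; leave your other totem
options as they are and restart corosync afterwards):

totem {
        rrp_mode: active
}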

> On 8 Mar 2018, at 09:12, Gang He wrote:
> 
> Hi Feldhost,
> 
> 
 
>> Hello Gang He,
>> 
>> which type of corosync rrp_mode you use? Passive or Active? 
> clvm1:/etc/corosync # cat corosync.conf  | grep rrp_mode
>rrp_mode:   passive
> 
>> Did you try test both?
> No, only this mode. 
>> Also, what kernel version you use? I see some SCTP fixes in latest kernels.
> clvm1:/etc/corosync # uname -r
> 4.4.114-94.11-default
> It looks like the sock->ops->connect() function blocks for too long before 
> returning when the network is broken. 
> On a normal network, sock->ops->connect() returns very quickly.
> 
> Thanks
> Gang
> 



Re: [ClusterLabs] GFS2 + KVM with qcow2 recommended IO mode

2018-02-28 Thread FeldHostAdmin
Hello, thank you for the reply. I use GFS2 on the compute nodes, and the VMs'
qcow2 images sit on it.

I just found this: https://bugzilla.redhat.com/show_bug.cgi?id=1356632

but no explanation.
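
For anyone following along, the knob in question is the io= attribute of the
disk driver element in the libvirt domain XML (a sketch; host and domain names
are illustrative):

[root@compute1 ~]# virsh dumpxml vm01 | grep "driver name"
      <driver name='qemu' type='qcow2' cache='none' io='native'/>

io='native' submits I/O via Linux AIO (io_submit()), while io='threads' uses
QEMU's own thread pool, as Andy explains below.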

> On 28 Feb 2018, at 16:14, Andrew Price <anpr...@redhat.com> wrote:
> 
> On 28/02/18 14:03, FeldHost™ Admin wrote:
>> Hello all,
>> I have a GFS2 cluster with KVM using qcow2 images. What is the recommended 
>> IO mode, threads or native? I would be happy for a recommendation and an explanation.
> 
> You'd likely get a better answer from the QEMU folks, but:
> 
> As I understand it, with io=native qemu submits IO using OS-based aio 
> (io_submit()) and with io=threads it handles its own async io using 
> threading. To see what works best for you, it might be a good idea to do some 
> perf testing of your intended workload with each i/o mode.
> 
> This talk covers the subject https://www.youtube.com/watch?v=Jx93riUF5_I
> 
> You didn't mention whether you're using qcow2 _for_ gfs2 but be aware that 
> recent versions of libvirt will prevent non-raw images being made shareable 
> due to the potential for corruption. See 
> https://bugzilla.redhat.com/show_bug.cgi?id=1511480 for details.
> 
> Hope this helps,
> 
> Andy



Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread FeldHostAdmin
Probably you need enable_startup_fencing = 0 instead of enable_fencing = 0.
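
Something like this in /etc/dlm/dlm.conf (a sketch, see dlm.conf(5); it keeps
DLM's fencing integration but skips fencing of nodes that are simply not there
yet when dlm_controld starts):

enable_fencing=1
enable_startup_fencing=0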

Best regards,
Kristián Feldsam

> On 1 Oct 2018, at 18:55, Patrick Whitney <pwhit...@luminoso.com> wrote:
> 
> Fencing in clustering is always required, but unlike pacemaker that lets
> you turn it off and take your chances, DLM doesn't.
> 
> As a matter of fact, DLM has a setting "enable_fencing=0|1" for what that's 
> worth.   
>  
> You must have
> working fencing for DLM (and anything using it) to function correctly.
> 
> We do have fencing enabled in the cluster; we've tested both node level 
> fencing and resource fencing; DLM behaved identically in both scenarios, 
> until we set it to 'enable_fencing=0' in the dlm.conf file. 
>  
> Basically, cluster config changes (node declared lost), dlm informed and
> blocks, fence attempt begins and loops until it succeeds, on success,
> informs DLM, dlm reaps locks held by the lost node and normal operation
> continues.
> This isn't quite what I was seeing in the logs.  The "failed" node would be 
> fenced off, pacemaker appeared to be sane, reporting services running on the 
> running nodes, but once the failed node was seen as missing by dlm 
> (dlm_controld), dlm would request fencing, from what I can tell by the log 
> entry.  Here is an example of the suspect log entry:
> Sep 26 09:41:35 pcmk-test-1 dlm_controld[837]: 38 fence request 2 pid 1446 
> startup time 1537969264 fence_all dlm_stonith
>  
> This isn't a question of node count or other configuration concerns.
> It's simply that you must have proper fencing for DLM.
> 
> Can you speak more to what "proper fencing" is for DLM? 
> 
> Best,
> -Pat
> 
>   
> 
> On Mon, Oct 1, 2018 at 12:30 PM Digimer wrote:
> On 2018-10-01 12:04 PM, Ferenc Wágner wrote:
> > Patrick Whitney <pwhit...@luminoso.com> writes:
> > 
> >> I have a two node (test) cluster running corosync/pacemaker with DLM
> >> and CLVM.
> >>
> >> I was running into an issue where when one node failed, the remaining node
> >> would appear to do the right thing, from the pcmk perspective, that is.
> >> It would  create a new cluster (of one) and fence the other node, but
> >> then, rather surprisingly, DLM would see the other node offline, and it
> >> would go offline itself, abandoning the lockspace.
> >>
> >> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> >> our tests are now working as expected.
> > 
> > I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that
> > is, they are started by systemd, not by Pacemaker).  I've seen weird DLM
> > fencing behavior, but not what you describe above (though I ran with
> > more than two nodes from the very start).  Actually, I don't even
> > understand how it occurred to you to disable DLM fencing to fix that...
> 
> Fencing in clustering is always required, but unlike pacemaker that lets
> you turn it off and take your chances, DLM doesn't. You must have
> working fencing for DLM (and anything using it) to function correctly.
> 
> Basically, cluster config changes (node declared lost), dlm informed and
> blocks, fence attempt begins and loops until it succeeds, on success,
> informs DLM, dlm reaps locks held by the lost node and normal operation
> continues.
> 
> This isn't a question of node count or other configuration concerns.
> It's simply that you must have proper fencing for DLM.
> 
> >> I'm a little concern I have masked an issue by doing this, as in all
> >> of the tutorials and docs I've read, there is no mention of having to
> >> configure DLM whatsoever.
> > 
> > Unfortunately it's very hard to come by any reliable info about DLM.  I
> > had a couple of enlightening exchanges with David Teigland (its primary
> > author) on this list, he is very helpful indeed, but I'm still very far
> > from having a working understanding of it.
> > 
> > But I've been running with --enable_fencing=0 for years without issues,
> > leaving all fencing to Pacemaker.  Note that manual cLVM operations are
> > the only users of DLM here, so delayed fencing does not cause any
> > problems, the cluster services do not depend on DLM being operational (I
> > mean it can stay frozen for several days -- as it happened in a couple
> > of pathological cases).  GFS2 would be a very different thing, I guess.
> > 
> 
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/ 
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and