Re: [ClusterLabs] Antw: Re: [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-12 Thread Guoqing Jiang
On 03/08/2018 07:24 PM, Ulrich Windl wrote: Hi! What surprises me most is that a connect(...O_NONBLOCK) actually blocks: EINPROGRESS The socket is non-blocking and the connection cannot be completed immediately. Maybe it is because the socket is cre
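For readers unfamiliar with the EINPROGRESS behaviour quoted above, here is a minimal userspace sketch of a non-blocking connect. It is only an illustration and is not the DLM code discussed in the thread; the function name `nonblocking_connect` and the 5-second default timeout are assumptions made for this example. It assumes a POSIX system, where `connect_ex` reports `EINPROGRESS` for a pending connection.

```python
import errno
import select
import socket


def nonblocking_connect(host, port, timeout=5.0):
    """Start a non-blocking TCP connect and wait for its outcome.

    Returns 0 on success, or an errno value on failure or timeout.
    (Hypothetical helper for illustration only.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex returns an error code instead of raising an exception.
        rc = sock.connect_ex((host, port))
        if rc == errno.EINPROGRESS:
            # The handshake is still running; the socket becomes writable
            # once the connection attempt has succeeded or failed.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                return errno.ETIMEDOUT
            # SO_ERROR holds the final result of the connection attempt.
            rc = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return rc
    finally:
        sock.close()
```

The point of the sketch is that `EINPROGRESS` is not a failure: the caller returns immediately and has to wait for writability, then read `SO_ERROR`, to learn whether the connection was established.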

Re: [ClusterLabs] Resources stopped due to unmanage

2018-03-12 Thread Ken Gaillot
On Mon, 2018-03-12 at 22:36 +0300, Pavel Levshin wrote: > Hello. > > > I've just experienced a fault in my pacemaker-based cluster. > Seriously, > I'm completely disoriented after this. Hopefully someone can give me > a > hint... > > > Two-node cluster runs a few VirtualDomains along with their

[ClusterLabs] Resources stopped due to unmanage

2018-03-12 Thread Pavel Levshin
Hello. I've just experienced a fault in my pacemaker-based cluster. Seriously, I'm completely disoriented after this. Hopefully someone can give me a hint... Two-node cluster runs a few VirtualDomains along with their common infrastructure (libvirtd, NFS and so on). It is Pacemaker 1.1.16 c

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Valentin Vidic
On Mon, Mar 12, 2018 at 04:31:46PM +0100, Klaus Wenninger wrote: > Nope. Whenever the cluster is completely down... > Otherwise nodes would come up - if not seeing each other - > happily with both starting all services because they don't > know what already had been running on the other node. > Tec

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 04:17 PM, Valentin Vidic wrote: > On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote: >> But isn't dlm directly interfering with corosync so >> that it would get the quorum state from there? >> As you have 2-node set probably on a 2-node-cluster >> this would - after both

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Valentin Vidic
On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote: > But isn't dlm directly interfering with corosync so > that it would get the quorum state from there? > As you have 2-node set probably on a 2-node-cluster > this would - after both nodes down - wait for all > nodes up first. Isn't

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 01:44 PM, Muhammad Sharfuddin wrote: > Hi Klaus, > > primitive sbd-stonith stonith:external/sbd \ >     op monitor interval=3000 timeout=20 \ >     op start interval=0 timeout=240 \ >     op stop interval=0 timeout=100 \ >     params sbd_device="/dev/mapper/sbd" \ >   

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hi Klaus, primitive sbd-stonith stonith:external/sbd \     op monitor interval=3000 timeout=20 \     op start interval=0 timeout=240 \     op stop interval=0 timeout=100 \     params sbd_device="/dev/mapper/sbd" \     meta target-role=Started property cib-bootstrap-options: \

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
Hi Muhammad! Could you elaborate a little more on your fencing setup? I read about you using SBD but I don't see any sbd-fencing-resource. In case you wanted to use watchdog-fencing with SBD, this would require the stonith-watchdog-timeout property to be set. But watchdog-fencing relies on

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
@Ulrich, the issue I am facing is that when both nodes crash and I then keep one node offline, the online node doesn't start the ocfs2 resources. -- Regards, Muhammad Sharfuddin On 3/12/2018 4:51 PM, Muhammad Sharfuddin wrote: Hello Gang, as informed, previously cluster was fixed to st

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hello Gang, as informed, previously cluster was fixed to start the ocfs2 resources by a) crm resource start dlm b) mount/umount the ocfs2 file system manually. (this step was the fix) and then starting the clone group (which includes dlm, ocfs2 file systems) worked fine: c) crm resource star

[ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-12 Thread Ulrich Windl
Hi! I didn't read the logs carefully, but I remember one pitfall (SLES 11): If I formatted the filesystem when the OCFS services were not running, I was unable to mount it; I had to reformat the filesystem when the OCFS services were running. Maybe that helps. Regards, Ulrich >>> "Gang He"

Re: [ClusterLabs] corosync 2.4 CPG config change callback

2018-03-12 Thread Thomas Lamprecht
Hi, On 3/9/18 5:26 PM, Jan Friesse wrote: > ... > >> TotemConfchgCallback: ringid (1.1436) >> active processors 3: 1 2 3 >> EXIT >> Finalize  result is 1 (should be 1) >> >> >> Hope I did both tests right, but as it reproduces multiple times >> with testcpg, our cpg usage in our filesystem, this s

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He
>>> > Hello Gang, > > to follow your instructions, I started the dlm resource via: > > crm resource start dlm > > then mount/unmount the ocfs2 file system manually..(which seems to be > the fix of the situation). > > Now resources are getting started properly on a single node.. I am h

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hello Gang, to follow your instructions, I started the dlm resource via:     crm resource start dlm then mount/unmount the ocfs2 file system manually..(which seems to be the fix of the situation). Now resources are getting started properly on a single node.. I am happy as the issue is fixed