Re: [ClusterLabs] Antw: Re: [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-12 Thread Guoqing Jiang



On 03/08/2018 07:24 PM, Ulrich Windl wrote:

Hi!

What surprises me most is that a connect(...O_NONBLOCK) actually blocks:

EINPROGRESS
   The socket is non-blocking and the connection cannot be
   completed immediately.



Maybe it is because the socket is created by sock_create_kern(), so the
O_NONBLOCK flag has no effect, since __sctp_connect() contains the
following:

    /* in-kernel sockets don't generally have a file allocated to them
     * if all they do is call sock_create_kern().
     */
    if (sk->sk_socket->file)
            f_flags = sk->sk_socket->file->f_flags;

    timeo = sock_sndtimeo(sk, f_flags & O_NONBLOCK);
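
For reference, sock_sndtimeo() in include/net/sock.h is essentially
(modulo kernel version):

    static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
    {
            return noblock ? 0 : sk->sk_sndtimeo;
    }

So with no struct file attached, f_flags stays 0, noblock is false, and
the timeout falls back to sk->sk_sndtimeo, whose default is
MAX_SCHEDULE_TIMEOUT; the "non-blocking" connect effectively blocks until
SCTP itself gives up on the association setup.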

Thanks,
Guoqing


Re: [ClusterLabs] Antw: Re: [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread Gang He



>>> "Ulrich Windl"  03/08/18 7:24 PM >>>
Hi!

What surprises me most is that a connect(...O_NONBLOCK) actually blocks:

EINPROGRESS
   The socket is non-blocking and the connection cannot be
   completed immediately.
Yes, the behavior does not follow the O_NONBLOCK flag; about 5 minutes is
far too long.
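
For what it's worth, ~5 minutes is roughly what the default SCTP
association-setup timers would predict (assuming the RFC 4960 defaults,
not verified against this exact kernel): with RTO.Initial = 3 s,
RTO.Max = 60 s and Max.Init.Retransmits = 8, the INIT chunk is retried
with exponential backoff capped at 60 s, so the worst case is about
3 + 6 + 12 + 24 + 48 + 60 + 60 + 60 + 60 = 333 s, i.e. roughly 5.5
minutes before association setup gives up and connect() returns
ETIMEDOUT.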

Thanks
Gang

Regards,
Ulrich


>>> "Gang He"  wrote on 08.03.2018 at 10:48 in message
<5aa1776502f9000ad...@prv-mh.provo.novell.com>:
> Hi Feldhost,
> 
> I use the active rrp_mode in corosync.conf and rebooted the cluster to
> make the configuration take effect.
> But the roughly 5-minute hang in the new_lockspace() function is still
> there.
> 
> Thanks
> Gang
> 
>> Hi, so try to use active mode.
>> 
>> https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installation_terms.html
>> 
>> Those are fixes I saw in 4.14.*.
>> 
>>> On 8 Mar 2018, at 09:12, Gang He  wrote:
>>> 
>>> Hi Feldhost,
>>> 
>>> 
>>>> Hello Gang He,
>>>> 
>>>> which type of corosync rrp_mode do you use? Passive or Active? 
>>> clvm1:/etc/corosync # cat corosync.conf  | grep rrp_mode
>>>    rrp_mode: passive
>>> 
>>>> Did you try to test both?
>>> No, only this mode. 
>>>> Also, what kernel version do you use? I see some SCTP fixes in the
>>>> latest kernels.
>>> clvm1:/etc/corosync # uname -r
>>> 4.4.114-94.11-default
>>> It looks like the sock->ops->connect() function is blocked for too long
>>> before returning, under a broken network situation. 
>>> In a normal network, sock->ops->connect() returns very quickly.
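>>> 
>>> A minimal sketch (an assumption on my side, not something this kernel
>>> is verified to do): the association-setup retries could be capped on
>>> the socket before connect() via the standard SCTP_INITMSG socket
>>> option (RFC 6458); the values below are illustrative only.
>>> 
>>>     struct sctp_initmsg init;
>>> 
>>>     memset(&init, 0, sizeof(init));
>>>     init.sinit_max_attempts = 4;      /* default Max.Init.Retransmits is 8 */
>>>     init.sinit_max_init_timeo = 5000; /* cap the INIT RTO at 5 s (in ms) */
>>>     kernel_setsockopt(sock, SOL_SCTP, SCTP_INITMSG,
>>>                       (char *)&init, sizeof(init));
>>> 
>>> With something like this, a broken path would fail the handshake in
>>> seconds rather than minutes.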
>>> 
>>> Thanks
>>> Gang
>>> 
> On 8 Mar 2018, at 08:52, Gang He  wrote:
> 
> Hello list and David Teigland,
> 
> I got a problem on a two-ring cluster; the problem can be reproduced
> with the steps below.
> 1) Set up a two-ring cluster with two nodes.
> e.g. 
> clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
> clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103
> 
> 2) The whole cluster works well; then I put eth0 down on node clvm2 and
> restart the pacemaker service on that node.
> ifconfig eth0 down
> rcpacemaker restart
> 
> 3) The whole cluster still works well (that means corosync switches to
> the other ring very smoothly).
> Then, I can mount the ocfs2 file system on node clvm2 quickly with the
> command 
> mount /dev/sda /mnt/ocfs2 
> 
> 4) Next, I do the same mount on node clvm1; the mount command hangs
> for about 5 mins, and finally the mount command completes.
> But if we set up an ocfs2 file system resource in pacemaker,
> the pacemaker resource agent will consider the ocfs2 file system resource
> to have failed to start before this command returns,
> and pacemaker will fence node clvm1. 
> This problem is impacting our customer's evaluation, since they expect
> the two rings to switch over smoothly.
> 
> Looking into this problem, I can see the mount command is hung with the
> backtrace below,
> clvm1:/ # cat /proc/6688/stack
> [] new_lockspace+0x92d/0xa70 [dlm]
> [] dlm_new_lockspace+0x69/0x160 [dlm]
> [] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
> [] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
> [] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
> [] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
> [] mount_bdev+0x1a0/0x1e0
> [] mount_fs+0x3a/0x170
> [] vfs_kern_mount+0x62/0x110
> [] do_mount+0x213/0xcd0
> [] SyS_mount+0x85/0xd0
> [] entry_SYSCALL_64_fastpath+0x1e/0xb6
> [] 0x
> 
> The root cause is in the sctp_connect_to_sock() function in lowcomms.c,
> 1075
> 1076         log_print("connecting to %d", con->nodeid);
> 1077
> 1078         /* Turn off Nagle's algorithm */
> 1079         kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
> 1080                           sizeof(one));
> 1081
> 1082         result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
> 1083                                     O_NONBLOCK);  <<= here, this invocation
> will cost > 5 mins before returning ETIMEDOUT(-110).
> 1084         printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
> 1085
> 1086         if (result == -EINPROGRESS)
> 1087                 result = 0;
> 1088         if (result == 0)
> 1089                 goto out;
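> 
> One possible workaround, sketched under the assumption that the send
> timeout is what __sctp_connect() falls back to for file-less kernel
> sockets: bracket the connect with a bounded SO_SNDTIMEO and restore the
> default afterwards.
> 
>         struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
> 
>         /* make the in-kernel connect() return within ~5 s, since
>          * O_NONBLOCK is ignored without a struct file */
>         kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO,
>                           (char *)&tv, sizeof(tv));
>         result = sock->ops->connect(sock, (struct sockaddr *)&daddr,
>                                     addr_len, 0);
>         /* restore the default (infinite) send timeout */
>         memset(&tv, 0, sizeof(tv));
>         kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO,
>                           (char *)&tv, sizeof(tv));
> 
> This is only a sketch against the 4.x kernel_setsockopt() interface,
> not a tested fix.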
> 
> Then, I want to know whether this problem was found/fixed before. 
> It looks like DLM cannot switch to the second ring very quickly; this
> will prevent the applications above (e.g. CLVM, ocfs2) from creating a
> new lockspace during their startup.
> 
> Thanks
> Gang
> 
> 