Re: [Cluster-devel] How to enable daemon_debug for dlm_controld
On 06/27/2018 09:49 PM, David Teigland wrote:
> On Wed, Jun 27, 2018 at 04:07:18PM +0800, Guoqing Jiang wrote:
>> But by default, it seems dlm_controld just runs with "-s 0". And when I
>> tried to add "daemon_debug=1" to /etc/dlm/dlm.conf, the dlm resource
>> could not start at all. Could you tell me how to enable this option?
>> Thanks in advance!
>
> That option doesn't work in dlm.conf, see
> https://pagure.io/dlm/c/9fe973004d8dbffc07bbecbdbb6c828bfb832137?branch=master
>
> What you want instead is:
> log_debug=1
> debug_logfile=1

Thanks for the help, I was misled by the previous man page.

> Then look in /var/log/dlm_controld/dlm_controld.log

However, I can't find that file after making the above change during a split-brain test, so perhaps I am still missing something. Thanks.

linux-u121:~ # cat /etc/dlm/dlm.conf
log_debug=1
debug_logfile=1
linux-u121:~ # ls /var/log/dlm_controld/dlm_controld.log
ls: cannot access '/var/log/dlm_controld/dlm_controld.log': No such file or directory
linux-u121:~ # find / -name dlm_controld
/run/dlm_controld
/usr/sbin/dlm_controld

Regards,
Guoqing
[Cluster-devel] How to enable daemon_debug for dlm_controld
Hi David,

I want to enable the option to see why dlm_controld gets repeatedly launched by systemd; the following logs appeared in a split-brain case.

Jun 27 15:40:24 gqj-bsc1098449-2 systemd[1]: Started Process Core Dump (PID 3149/UID 0).
Jun 27 15:40:24 gqj-bsc1098449-2 systemd-coredump[3150]: Process 3148 (dlm_stonith) of user 0 dumped core.
Jun 27 15:40:25 gqj-bsc1098449-2 dlm_controld[2487]: 392 fence result 172204701 pid 3148 result -1 term signal 6
Jun 27 15:40:25 gqj-bsc1098449-2 dlm_controld[2487]: 392 fence status 172204701 receive -1 from 172204758 walltime 1530085225 local 392
Jun 27 15:40:25 gqj-bsc1098449-2 dlm_controld[2487]: 392 fence request 172204701 pid 3154 nodedown time 1530085207 fence_all dlm_stonith
[snip]
Jun 27 15:40:29 gqj-bsc1098449-2 systemd[1]: Started Process Core Dump (PID 3207/UID 0).
Jun 27 15:40:29 gqj-bsc1098449-2 systemd-coredump[3208]: Process 3206 (dlm_stonith) of user 0 dumped core.
Jun 27 15:40:30 gqj-bsc1098449-2 dlm_controld[2487]: 398 fence result 172204701 pid 3206 result -1 term signal 6
Jun 27 15:40:30 gqj-bsc1098449-2 dlm_controld[2487]: 398 fence status 172204701 receive -1 from 172204758 walltime 1530085230 local 398
Jun 27 15:40:30 gqj-bsc1098449-2 dlm_controld[2487]: 398 fence request 172204701 pid 3212 nodedown time 1530085207 fence_all dlm_stonith

But by default, it seems dlm_controld just runs with "-s 0". And when I tried to add "daemon_debug=1" to /etc/dlm/dlm.conf, the dlm resource could not start at all. Could you tell me how to enable this option? Thanks in advance!

Regards,
Guoqing
Re: [Cluster-devel] About locking granularity of gfs2
On 04/24/2018 08:54 PM, Steven Whitehouse wrote:
> Hi,
>
> On 24/04/18 04:54, Guoqing Jiang wrote:
>> Hi Steve,
>>
>> Thanks for your reply.
>>
>> On 04/24/2018 11:03 AM, Steven Whitehouse wrote:
>>> Hi,
>>>
>>> On 24/04/18 03:52, Guoqing Jiang wrote:
>>>> Hi,
>>>>
>>>> Since gfs2 can "allow parallel allocation from different nodes
>>>> simultaneously as the locking granularity is one lock per resource
>>>> group" per section 3.2 of [1], would it be possible to apply the same
>>>> locking granularity to R/W I/O as well? Then, with the help of "sunit"
>>>> and "swidth", we could basically lock a stripe, so all nodes could
>>>> write to different stripes in parallel, and the basic I/O unit would
>>>> be one stripe. Since I don't know gfs2 well, I am wondering whether
>>>> this is possible or whether the idea doesn't make sense for some
>>>> reason. Any thoughts would be appreciated, thanks.
>>>>
>>>> I am asking because if people want to add cluster support for
>>>> md/raid5, it would be better to get help from the filesystem level to
>>>> ensure that only one node can access a stripe at a time; otherwise we
>>>> have to lock each stripe in the md layer, which could cause
>>>> performance issues.
>>>>
>>>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf
>>>>
>>>> Regards,
>>>> Guoqing
>>>
>>> It is not just performance, it would be correctness too, since there is
>>> no guarantee that two nodes are not writing to the same stripe at the
>>> same time.
>>
>> Yes, no fs can guarantee it. I am wondering: if GFS2 is used as a local
>> filesystem and runs on top of raid5, is it possible that gfs2 writes to
>> two places simultaneously while the two places belong to one stripe?
>
> Yes

Based on that possibility, I guess it is not recommended to run gfs2 on raid5.

>>> The locking granularity is per-inode generally, but also per-rgrp in
>>> the case of rgrps; however that refers only to the header/bitmap, since
>>> the allocated blocks are subject to the per-inode glocks in general.
>>
>> Please correct me if I am wrong: does this mean there are two types of
>> locking granularity, per-rgrp for rgrp allocation and per-inode for R/W
>> I/O? Thanks.
>
> It depends what operation is being undertaken. The per-inode glock covers
> all the blocks related to the inode, but during allocation and
> deallocation, the responsibility for the allocated and deallocated blocks
> passes between the rgrp and the inode to which they relate. So the
> situation is more complicated than when no allocation/deallocation is
> involved.

Thanks a lot for your explanation.

Regards,
Guoqing
Re: [Cluster-devel] About locking granularity of gfs2
On 04/24/2018 01:13 PM, Gang He wrote:
> Stripe unit is a logical volume concept; the file system should not know
> about it. For a file system, the access unit is a block (a power of the
> disk sector size).

IMHO, that is true for a typical fs, but I think zfs and btrfs know about it well, though no cluster fs supports it now.

Thanks,
Guoqing
Re: [Cluster-devel] About locking granularity of gfs2
Hi Steve,

Thanks for your reply.

On 04/24/2018 11:03 AM, Steven Whitehouse wrote:
> Hi,
>
> On 24/04/18 03:52, Guoqing Jiang wrote:
>> Hi,
>>
>> Since gfs2 can "allow parallel allocation from different nodes
>> simultaneously as the locking granularity is one lock per resource
>> group" per section 3.2 of [1], would it be possible to apply the same
>> locking granularity to R/W I/O as well? Then, with the help of "sunit"
>> and "swidth", we could basically lock a stripe, so all nodes could
>> write to different stripes in parallel, and the basic I/O unit would be
>> one stripe. Since I don't know gfs2 well, I am wondering whether this
>> is possible or whether the idea doesn't make sense for some reason. Any
>> thoughts would be appreciated, thanks.
>>
>> I am asking because if people want to add cluster support for md/raid5,
>> it would be better to get help from the filesystem level to ensure that
>> only one node can access a stripe at a time; otherwise we have to lock
>> each stripe in the md layer, which could cause performance issues.
>>
>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf
>>
>> Regards,
>> Guoqing
>
> It is not just performance, it would be correctness too, since there is
> no guarantee that two nodes are not writing to the same stripe at the
> same time.

Yes, no fs can guarantee it. I am wondering: if GFS2 is used as a local filesystem and runs on top of raid5, is it possible that gfs2 writes to two places simultaneously while the two places belong to one stripe?

> The locking granularity is per-inode generally, but also per-rgrp in the
> case of rgrps; however that refers only to the header/bitmap, since the
> allocated blocks are subject to the per-inode glocks in general.

Please correct me if I am wrong: does this mean there are two types of locking granularity, per-rgrp for rgrp allocation and per-inode for R/W I/O? Thanks.

> I don't think it would be easy to try and make them correspond with raid
> stripes, and getting gfs2 to work with md would be non-trivial.

Totally agree :-).

Regards,
Guoqing
[Cluster-devel] About locking granularity of gfs2
Hi,

Since gfs2 can "allow parallel allocation from different nodes simultaneously as the locking granularity is one lock per resource group" per section 3.2 of [1], would it be possible to apply the same locking granularity to R/W I/O as well? Then, with the help of "sunit" and "swidth", we could basically lock a stripe, so all nodes could write to different stripes in parallel, and the basic I/O unit would be one stripe. Since I don't know gfs2 well, I am wondering whether this is possible or whether the idea doesn't make sense for some reason. Any thoughts would be appreciated, thanks.

I am asking because if people want to add cluster support for md/raid5, it would be better to get help from the filesystem level to ensure that only one node can access a stripe at a time; otherwise we have to lock each stripe in the md layer, which could cause performance issues.

[1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf

Regards,
Guoqing
[Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()
Calling schedule() here could make the thread miss the wake-up from kthread_stop(), so it is better to recheck kthread_should_stop() before calling schedule(). The symptom showed up when I ran an indefinite test of clustered raid (which mostly created a clustered raid1, assembled it on other nodes, then stopped it):

linux175:~ # ps aux|grep md|grep D
root      4211  0.0  0.0  19760  2220 ?  Ds   02:58   0:00 mdadm -Ssq
linux175:~ # cat /proc/4211/stack
[] kthread_stop+0x4d/0x150
[] dlm_recoverd_stop+0x15/0x20 [dlm]
[] dlm_release_lockspace+0x2ab/0x460 [dlm]
[] leave+0xbf/0x150 [md_cluster]
[] md_cluster_stop+0x18/0x30 [md_mod]
[] bitmap_free+0x12e/0x140 [md_mod]
[] bitmap_destroy+0x7f/0x90 [md_mod]
[] __md_stop+0x21/0xa0 [md_mod]
[] do_md_stop+0x15f/0x5c0 [md_mod]
[] md_ioctl+0xa65/0x18a0 [md_mod]
[] blkdev_ioctl+0x49e/0x8d0
[] block_ioctl+0x41/0x50
[] do_vfs_ioctl+0x96/0x5b0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1e/0xad

This may not resolve the issue completely, since the KTHREAD_SHOULD_STOP flag could still be set between the "break" check and the "schedule", but at least the chance of the symptom occurring is reduced a lot (the indefinite test ran for more than 20 hours without problems, while the symptom happens easily without the change).

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 fs/dlm/recoverd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index 6859b4b..9fab490 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -290,8 +290,11 @@ static int dlm_recoverd(void *arg)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!test_bit(LSFL_RECOVER_WORK, &ls->ls_flags) &&
-		    !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags))
+		    !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
+			if (kthread_should_stop())
+				break;
 			schedule();
+		}
 		set_current_state(TASK_RUNNING);

 		if (test_and_clear_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
--
2.6.6
Re: [Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock
On 08/08/2017 03:10 AM, Bob Peterson wrote:
>> Signed-off-by: Guoqing Jiang <gqji...@suse.com>
>> ---
>>  fs/dlm/lowcomms.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
>> index 9382db9..4813d0e 100644
>> --- a/fs/dlm/lowcomms.c
>> +++ b/fs/dlm/lowcomms.c
>> @@ -729,7 +729,7 @@ static int tcp_accept_from_sock(struct connection *con)
>>  	mutex_unlock(&connections_lock);
>>
>>  	memset(&peeraddr, 0, sizeof(peeraddr));
>> -	result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
>> +	result = sock_create_lite(dlm_local_addr[0]->ss_family,
>>  				  SOCK_STREAM, IPPROTO_TCP, &newsock);
>>  	if (result < 0)
>>  		return -ENOMEM;
>
> Isn't this also a problem for the sctp equivalent, sctp_connect_to_sock?
>
> Regards,
> Bob Peterson
> Red Hat File Systems
>
> In fact, I see 5 different calls to sock_create_kern in DLM. Shouldn't it
> be done to all of them?

Only this one calls accept() immediately after the socket is created, so the others are probably safe. Besides, in the other cases the sock is used right after sock_create_kern, so I am not sure it could be replaced with sock_create_lite:

result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
			  SOCK_STREAM, IPPROTO_SCTP, &sock);
...
sock->sk->sk_user_data = con;

One could also argue that sock_create_kern itself should be fixed rather than its callers; please see https://patchwork.ozlabs.org/patch/780356/ for more info.

Thanks,
Guoqing
[Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock
With commit 0ffdaf5b41cf ("net/sock: add WARN_ON(parent->sk) in sock_graft()"), a calltrace happens as follows:

[  457.018340] WARNING: CPU: 0 PID: 15623 at ./include/net/sock.h:1703 inet_accept+0x135/0x140
...
[  457.018381] RIP: 0010:inet_accept+0x135/0x140
[  457.018381] RSP: 0018:c90001727d18 EFLAGS: 00010286
[  457.018383] RAX: 0001 RBX: 880012413000 RCX: 0001
[  457.018384] RDX: 018a RSI: fe01 RDI: 8156fae8
[  457.018384] RBP: c90001727d38 R08: R09: 4305
[  457.018385] R10: 0001 R11: 4304 R12: 880035ae7a00
[  457.018386] R13: 88001282af10 R14: 880034e4e200 R15:
[  457.018387] FS: () GS:88003fc0() knlGS:
[  457.018388] CS: 0010 DS: ES: CR0: 80050033
[  457.018389] CR2: 7fdec22f9000 CR3: 02b5a000 CR4: 06f0
[  457.018395] Call Trace:
[  457.018402]  tcp_accept_from_sock.part.8+0x12d/0x449 [dlm]
[  457.018405]  ? vprintk_emit+0x248/0x2d0
[  457.018409]  tcp_accept_from_sock+0x3f/0x50 [dlm]
[  457.018413]  process_recv_sockets+0x3b/0x50 [dlm]
[  457.018415]  process_one_work+0x138/0x370
[  457.018417]  worker_thread+0x4d/0x3b0
[  457.018419]  kthread+0x109/0x140
[  457.018421]  ? rescuer_thread+0x320/0x320
[  457.018422]  ? kthread_park+0x60/0x60
[  457.018424]  ret_from_fork+0x25/0x30

The newsock created by sock_create_kern has its sk set up via the path:

sock_create_kern -> __sock_create -> pf->create => inet_create -> sock_init_data

so the WARN_ON is triggered by "con->sock->ops->accept => inet_accept -> sock_graft". It also means newsock->sk is leaked, since sock_graft replaces it with a new sk.

To resolve the issue, we need to use sock_create_lite instead of sock_create_kern, like commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket") did.
Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 fs/dlm/lowcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 9382db9..4813d0e 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -729,7 +729,7 @@ static int tcp_accept_from_sock(struct connection *con)
 	mutex_unlock(&connections_lock);

 	memset(&peeraddr, 0, sizeof(peeraddr));
-	result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
+	result = sock_create_lite(dlm_local_addr[0]->ss_family,
 				  SOCK_STREAM, IPPROTO_TCP, &newsock);
 	if (result < 0)
 		return -ENOMEM;
--
2.10.0
Re: [Cluster-devel] About dlm_unlock (kernel space)
On 06/13/2016 10:56 PM, David Teigland wrote:
> On Mon, Jun 13, 2016 at 07:15:09AM -0400, Guoqing Jiang wrote:
>> Hi,
>>
>> In case we have set the DLM_LKF_CONVERT flag for dlm_lock, is it
>> possible that the convert queue could be NULL or not NULL when we
>> perform the unlock? I think two different cases can appear when calling
>> dlm_unlock: 1. the lock is in the convert stage; 2. the convert queue
>> is empty. For 1, it seems we need to cancel the lock request first
>> (dlm_unlock+CANCEL), then call dlm_unlock; for case 2 we can just call
>> dlm_unlock directly. Please correct me if I am wrong. Also, what could
>> happen if we cancel a lock whose convert queue is empty, i.e. call
>> dlm_unlock+CANCEL in case 2; could something go wrong? Last question:
>> is there a dlm_unlock_* variant that can perform the final unlock for
>> both case 1 and case 2 (or would such a variant make sense)?
>
> Convert is not a stable state, which means that cancel always involves a
> race: the convert could complete before the cancel is processed. After
> convert has finished, or after cancel has finished, then you know that
> the lock is not converting and a simple unlock will work. I suggest you
> test these combinations to see how they behave in practice. Doing unlock
> with FORCEUNLOCK is also an option (that works even if the lock is on
> the waiting or convert queue.) I'd be judicious about using either
> CANCEL or FORCEUNLOCK.

Thanks a lot for the detailed info and suggestions! The FORCEUNLOCK flag looks perfect since it covers both case 1 and case 2; I will use it and see what happens in practice.

Cheers,
Guoqing
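For the archives, a minimal sketch (kernel DLM API; "ls", "lkid" and "lksb" are placeholders assumed to be set up elsewhere, and error handling is elided) of the force-unlock discussed above:

```c
/* DLM_LKF_FORCEUNLOCK unlocks the lkb regardless of whether it is
 * granted, still on the waiting queue, or mid-convert, so the caller
 * does not need to distinguish case 1 from case 2 above. */
error = dlm_unlock(ls, lkid, DLM_LKF_FORCEUNLOCK, &lksb, NULL);
```

Per David's caution above, this is a shape to test in practice rather than a blanket recommendation, since forcing an unlock discards any in-flight conversion.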
[Cluster-devel] About dlm_unlock (kernel space)
Hi,

In case we have set the DLM_LKF_CONVERT flag for dlm_lock, is it possible that the convert queue could be NULL or not NULL when we perform the unlock? I think two different cases can appear when calling dlm_unlock:

1. the lock is in the convert stage;
2. the convert queue is empty.

For 1, it seems we need to cancel the lock request first (dlm_unlock+CANCEL), then call dlm_unlock; for case 2 we can just call dlm_unlock directly. Please correct me if I am wrong.

Also, what could happen if we cancel a lock whose convert queue is empty, i.e. call dlm_unlock+CANCEL in case 2; could something go wrong?

Last question: is there a dlm_unlock_* variant that can perform the final unlock for both case 1 and case 2 (or would such a variant make sense)?

Thanks & Regards,
Guoqing
Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check
Hi David,

David Teigland wrote:
> On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
>> Do you consider taking the following cleanup? If yes, I will send a
>> formal patch; otherwise please ignore it.
>
> On first glance, the old and new code do not appear to do the same
> thing, so let's leave it as it is.
>
> -	to_nodeid = dlm_dir_nodeid(r);

Sorry, it seems that is the only difference; if the previous change is combined with the modification below, the behavior is the same:

@@ -3644,7 +3644,10 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
 	struct dlm_mhandle *mh;
 	int to_nodeid, error;

-	to_nodeid = r->res_nodeid;
+	if (mstype == DLM_MSG_LOOKUP)
+		to_nodeid = dlm_dir_nodeid(r);
+	else
+		to_nodeid = r->res_nodeid;

And for create_message, the second parameter (lkb) has no effect when creating the three message types (REQUEST/LOOKUP/REMOVE).

Thanks,
Guoqing
Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check
Hi David,

David Teigland wrote:
> On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote:
>> The remove_from_waiters could only be invoked after create_message
>> fails, right? Since send_message always returns 0, this patch doesn't
>> touch anything on the failure path, and it also doesn't change the
>> original semantics.
>
> I'm not inclined to take any patches unless there's a problem
> identified.

Do you consider taking the following cleanup? If yes, I will send a formal patch; otherwise please ignore it.

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..7c822f7 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3747,30 +3747,7 @@ static int send_bast(struct dlm_rsb *r, struct dlm_lkb *lkb, int mode)

 static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb)
 {
-	struct dlm_message *ms;
-	struct dlm_mhandle *mh;
-	int to_nodeid, error;
-
-	to_nodeid = dlm_dir_nodeid(r);
-
-	error = add_to_waiters(lkb, DLM_MSG_LOOKUP, to_nodeid);
-	if (error)
-		return error;
-
-	error = create_message(r, NULL, to_nodeid, DLM_MSG_LOOKUP, &ms, &mh);
-	if (error)
-		goto fail;
-
-	send_args(r, lkb, ms);
-
-	error = send_message(mh, ms);
-	if (error)
-		goto fail;
-	return 0;
-
- fail:
-	remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
-	return error;
+	return send_common(r, lkb, DLM_MSG_LOOKUP);
 }

Thanks,
Guoqing
Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check
Bob Peterson wrote:
> ----- Original Message -----
>> We don't need the redundant logic since send_message always returns 0.
>>
>> Signed-off-by: Guoqing Jiang <gqji...@suse.com>
>> ---
>>  fs/dlm/lock.c | 10 ++--------
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
>> index 35502d4..6fc3de9 100644
>> --- a/fs/dlm/lock.c
>> +++ b/fs/dlm/lock.c
>> @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
>>
>>  	send_args(r, lkb, ms);
>>
>> -	error = send_message(mh, ms);
>> -	if (error)
>> -		goto fail;
>> -	return 0;
>> +	return send_message(mh, ms);
>
> Hi Guoqing,
>
> Sorry, I was momentarily confused. I think you misunderstood what I was
> saying. What I meant was: instead of doing
>
> +	return send_message(mh, ms);
>
> where send_message returns 0, it might be better to have:
>
> static void send_message(struct dlm_mhandle *mh, struct dlm_message *ms)
> {
> 	dlm_message_out(ms);
> 	dlm_lowcomms_commit_buffer(mh);
> }
>
> ...and in send_common, do (in both places):
>
> +	send_message(mh, ms);
> +	return 0;
>
> Since it's so short, it might even be better to code send_message as a
> macro, or at least an inline function.

Hi Bob,

Got it, thanks. It is a better solution, but it is not a bug fix or similar, so maybe just leave it as it is.

Regards,
Guoqing
[Cluster-devel] [PATCH] dlm: remove unnecessary error check
We don't need the redundant logic since send_message always returns 0.

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 fs/dlm/lock.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..6fc3de9 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)

 	send_args(r, lkb, ms);

-	error = send_message(mh, ms);
-	if (error)
-		goto fail;
-	return 0;
+	return send_message(mh, ms);

  fail:
 	remove_from_waiters(lkb, msg_reply_type(mstype));
@@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb)

 	send_args(r, lkb, ms);

-	error = send_message(mh, ms);
-	if (error)
-		goto fail;
-	return 0;
+	return send_message(mh, ms);

  fail:
 	remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
--
1.7.12.4
Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check
Hi Bob,

Bob Peterson wrote:
> ----- Original Message -----
>> We don't need the redundant logic since send_message always returns 0.
>>
>> Signed-off-by: Guoqing Jiang <gqji...@suse.com>
>> ---
>>  fs/dlm/lock.c | 10 ++--------
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> [patch body snipped]
>
> Hi,
>
> The patch looks okay, but if remove_from_waiters() always returns 0,
> wouldn't it be better to change the function from int to void and return
> 0 here? The advantage is that code spelunkers wouldn't need to
> back-track one more level (not to mention the instruction or two it
> might save).

It seems remove_from_waiters does not always return 0; the return value can be -1 or 0, depending on _remove_from_waiters.

BTW, I found that there is no big difference between send_common and send_lookup; since send_common can also be used to send a lookup message, I guess send_lookup could be removed as well.

Thanks,
Guoqing
Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check
Bob Peterson wrote:
> ----- Original Message -----
>> Hi Bob,
>>
>> Bob Peterson wrote:
>>> ----- Original Message -----
>>>> We don't need the redundant logic since send_message always returns 0.
>>>>
>>>> Signed-off-by: Guoqing Jiang <gqji...@suse.com>
>>>> [patch body snipped]
>>>
>>> Hi,
>>>
>>> The patch looks okay, but if remove_from_waiters() always returns 0,
>>> wouldn't it be better to change the function from int to void and
>>> return 0 here? The advantage is that code spelunkers wouldn't need to
>>> back-track one more level (not to mention the instruction or two it
>>> might save).
>>
>> It seems remove_from_waiters does not always return 0; the return value
>> can be -1 or 0, depending on _remove_from_waiters.
>>
>> BTW, I found that there is no big difference between send_common and
>> send_lookup; since send_common can also be used to send a lookup
>> message, I guess send_lookup could be removed as well.
>>
>> Thanks,
>> Guoqing
>
> Hi Guoqing,
>
> If remove_from_waiters can return -1, then the patch would prevent the
> code from calling remove_from_waiters. So the patch still doesn't look
> right to me.

Hi Bob,

remove_from_waiters could only be invoked after create_message fails, right? Since send_message always returns 0, this patch doesn't touch anything on the failure path, and it also doesn't change the original semantics.

Thanks,
Guoqing