Re: [Cluster-devel] How to enable daemon_debug for dlm_controld

2018-06-27 Thread Guoqing Jiang




On 06/27/2018 09:49 PM, David Teigland wrote:

On Wed, Jun 27, 2018 at 04:07:18PM +0800, Guoqing Jiang wrote:

But by default, it seems dlm_controld just runs with "-s 0". I tried to add
"daemon_debug=1" to /etc/dlm/dlm.conf, but then the dlm resource can't start
at all. Could you tell me how to enable this option? Thanks in advance!

That option doesn't work in dlm.conf, see
https://pagure.io/dlm/c/9fe973004d8dbffc07bbecbdbb6c828bfb832137?branch=master

What you want instead is:
log_debug=1
debug_logfile=1


Thanks for the help; I was misled by the old man page.


Then look in /var/log/dlm_controld/dlm_controld.log


However, with the above change I still can't find that file during the
split-brain test; perhaps I am missing something. Thanks.

linux-u121:~ # cat /etc/dlm/dlm.conf
log_debug=1
debug_logfile=1
linux-u121:~ # ls /var/log/dlm_controld/dlm_controld.log
ls: cannot access '/var/log/dlm_controld/dlm_controld.log': No such file 
or directory

linux-u121:~ # find / -name dlm_controld
/run/dlm_controld
/usr/sbin/dlm_controld
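
In the meantime, maybe I can grab the daemon debug output directly, either by
running the daemon in the foreground with debugging on stderr, or by dumping
its in-memory debug buffer (just a sketch; this assumes the dlm_controld build
here still takes the --daemon_debug command-line option that the removed
dlm.conf keyword corresponded to, and that this dlm_tool has the dump
subcommand):

linux-u121:~ # dlm_controld --daemon_debug
linux-u121:~ # dlm_tool dump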

Regards,
Guoqing



[Cluster-devel] How to enable daemon_debug for dlm_controld

2018-06-27 Thread Guoqing Jiang

Hi David,

I want to enable the option to see why dlm_controld is repeatedly launched
by systemd; these logs happened in a split-brain case.

Jun 27 15:40:24 gqj-bsc1098449-2 systemd[1]: Started Process Core Dump 
(PID 3149/UID 0).
Jun 27 15:40:24 gqj-bsc1098449-2 systemd-coredump[3150]: Process 3148 
(dlm_stonith) of user 0 dumped core.
Jun 27 15:40:25 gqj-bsc1098449-2 dlm_controld[2487]: 392 fence result 
172204701 pid 3148 result -1 term signal 6
Jun 27 15:40:25 gqj-bsc1098449-2 dlm_controld[2487]: 392 fence status 
172204701 receive -1 from 172204758 walltime 1530085225 local 392
Jun 27 15:40:25 gqj-bsc1098449-2 dlm_controld[2487]: 392 fence request 
172204701 pid 3154 nodedown time 1530085207 fence_all dlm_stonith

[snip]
Jun 27 15:40:29 gqj-bsc1098449-2 systemd[1]: Started Process Core Dump 
(PID 3207/UID 0).
Jun 27 15:40:29 gqj-bsc1098449-2 systemd-coredump[3208]: Process 3206 
(dlm_stonith) of user 0 dumped core.
Jun 27 15:40:30 gqj-bsc1098449-2 dlm_controld[2487]: 398 fence result 
172204701 pid 3206 result -1 term signal 6
Jun 27 15:40:30 gqj-bsc1098449-2 dlm_controld[2487]: 398 fence status 
172204701 receive -1 from 172204758 walltime 1530085230 local 398
Jun 27 15:40:30 gqj-bsc1098449-2 dlm_controld[2487]: 398 fence request 
172204701 pid 3212 nodedown time 1530085207 fence_all dlm_stonith


But by default, it seems dlm_controld just runs with "-s 0". I tried to add
"daemon_debug=1" to /etc/dlm/dlm.conf, but then the dlm resource can't start
at all. Could you tell me how to enable this option? Thanks in advance!


Regards,
Guoqing



Re: [Cluster-devel] About locking granularity of gfs2

2018-04-24 Thread Guoqing Jiang



On 04/24/2018 08:54 PM, Steven Whitehouse wrote:

Hi,


On 24/04/18 04:54, Guoqing Jiang wrote:

Hi Steve,

Thanks for your reply.

On 04/24/2018 11:03 AM, Steven Whitehouse wrote:

Hi,


On 24/04/18 03:52, Guoqing Jiang wrote:

Hi,

Since gfs2 can "allow parallel allocation from different nodes 
simultaneously
as the locking granularity is one lock per resource group" per 
section 3.2 of [1].


Could it possible to make the locking granularity also applies to 
R/W IO? Then,
with the help of "sunit" and "swidth", we basically can lock a 
stripe, so all nodes
can write to different stripes in parallel, so the basic IO unit is 
one stripe.
Since I don't know gfs2 well,  I am wondering it is possible to do 
it or it doesn't
make sense at all for the idea due to some reasons. Any thoughts 
would be

appreciated, thanks.

I am asking the question because if people want to add the cluster 
support for
md/raid5, then it is better to get the help from filesystem level 
to ensure only one
node can access a stripe at a time, otherwise we have to locking a 
stripe in md

layer which could cause performance issue.

[1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf

Regards,
Guoqing



It is not just performance, it would be correctness too, since there 
is no guarantee that two nodes are not writing to the same stripe at 
the same time.


Yes, no fs can guarantee that. I am wondering: if GFS2 is used as a local
filesystem on top of raid5, is it possible for gfs2 to write to two places
simultaneously when both places belong to the same stripe?


Yes



Given that possibility, I guess it is not recommended to run gfs2 on raid5.

The locking granularity is generally per-inode, and also per-rgrp for the
resource groups, but the rgrp glock covers only the header/bitmap, since the
allocated blocks are in general subject to the per-inode glocks.


Please correct me if I am wrong: does that mean there are two types of
locking granularity, per-rgrp for allocation within an rgrp and per-inode
for R/W IO? Thanks.

It depends what operation is being undertaken. The per-inode glock covers
all the blocks related to the inode, but during allocation and deallocation
the responsibility for the allocated and deallocated blocks passes between
the rgrp and the inode to which they relate. So the situation is more
complicated than when no allocation/deallocation is involved.
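
To restate that as a rough sketch of a write that allocates blocks (the
helper names only approximate the fs/gfs2 internals of that era; treat this
as an illustration rather than the real code path):

static int write_with_allocation(struct gfs2_inode *ip,
				 struct gfs2_alloc_parms *ap)
{
	struct gfs2_holder gh;
	int error;

	/* per-inode glock: serialises R/W access to this inode's blocks */
	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
	error = gfs2_glock_nq(&gh);
	if (error) {
		gfs2_holder_uninit(&gh);
		return error;
	}

	/* per-rgrp glock: taken inside the reservation path, and it only
	 * protects the rgrp header/bitmaps while blocks are handed from
	 * the rgrp to the inode */
	error = gfs2_inplace_reserve(ip, ap);
	if (error)
		goto out_dq;

	/* ... allocate and write blocks; once allocated, the blocks are
	 * covered by the inode glock held above ... */

	gfs2_inplace_release(ip);
out_dq:
	gfs2_glock_dq_uninit(&gh);
	return error;
}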


Thanks a lot for your explanation.

Regards,
Guoqing



Re: [Cluster-devel] About locking granularity of gfs2

2018-04-24 Thread Guoqing Jiang



On 04/24/2018 01:13 PM, Gang He wrote:

The stripe unit is a logical-volume concept; a file system should not need to
know about it. For a file system, the access unit is the block (a power-of-two
multiple of the disk sector size).


IMHO, that is true for a typical fs, but I think zfs and btrfs are well aware
of stripe geometry, though no cluster fs supports it now.

Thanks,
Guoqing



Re: [Cluster-devel] About locking granularity of gfs2

2018-04-23 Thread Guoqing Jiang

Hi Steve,

Thanks for your reply.

On 04/24/2018 11:03 AM, Steven Whitehouse wrote:

Hi,


On 24/04/18 03:52, Guoqing Jiang wrote:

Hi,

Since gfs2 can "allow parallel allocation from different nodes 
simultaneously
as the locking granularity is one lock per resource group" per 
section 3.2 of [1].


Could it possible to make the locking granularity also applies to R/W 
IO? Then,
with the help of "sunit" and "swidth", we basically can lock a 
stripe, so all nodes
can write to different stripes in parallel, so the basic IO unit is 
one stripe.
Since I don't know gfs2 well,  I am wondering it is possible to do it 
or it doesn't
make sense at all for the idea due to some reasons. Any thoughts 
would be

appreciated, thanks.

I am asking the question because if people want to add the cluster 
support for
md/raid5, then it is better to get the help from filesystem level to 
ensure only one
node can access a stripe at a time, otherwise we have to locking a 
stripe in md

layer which could cause performance issue.

[1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf

Regards,
Guoqing



It is not just performance, it would be correctness too, since there 
is no guarantee that two nodes are not writing to the same stripe at 
the same time.


Yes, no fs can guarantee that. I am wondering: if GFS2 is used as a local
filesystem on top of raid5, is it possible for gfs2 to write to two places
simultaneously when both places belong to the same stripe?

The locking granularity is generally per-inode, and also per-rgrp for the
resource groups, but the rgrp glock covers only the header/bitmap, since the
allocated blocks are in general subject to the per-inode glocks.


Please correct me if I am wrong: does that mean there are two types of
locking granularity, per-rgrp for allocation within an rgrp and per-inode
for R/W IO? Thanks.

I don't think it would be easy to make them correspond with raid stripes,
and getting gfs2 to work with md would be non-trivial.


Totally agree :-).

Regards,
Guoqing



[Cluster-devel] About locking granularity of gfs2

2018-04-23 Thread Guoqing Jiang

Hi,

GFS2 can "allow parallel allocation from different nodes simultaneously as
the locking granularity is one lock per resource group", per section 3.2 of [1].

Would it be possible to make that locking granularity also apply to R/W IO?
Then, with the help of "sunit" and "swidth", we could basically lock a stripe,
so all nodes could write to different stripes in parallel and the basic IO
unit would be one stripe. Since I don't know gfs2 well, I am wondering whether
this is possible or whether the idea doesn't make sense for some reason. Any
thoughts would be appreciated, thanks.

I am asking because if people want to add cluster support for md/raid5, it
would be better to get help from the filesystem level to ensure that only one
node can access a stripe at a time; otherwise we have to lock each stripe in
the md layer, which could cause a performance issue.

[1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf

Regards,
Guoqing



[Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()

2017-09-25 Thread Guoqing Jiang
Calling schedule() here could make the thread miss the wake-up from
kthread_stop(), so it is better to recheck kthread_should_stop() before
calling schedule(). The symptom showed up when I ran an indefinite
clustered-raid test (which mostly creates a clustered raid1, assembles it
on other nodes, then stops it).

linux175:~ # ps aux|grep md|grep D
root  4211  0.0  0.0  19760  2220 ?Ds   02:58   0:00 mdadm -Ssq
linux175:~ # cat /proc/4211/stack
[] kthread_stop+0x4d/0x150
[] dlm_recoverd_stop+0x15/0x20 [dlm]
[] dlm_release_lockspace+0x2ab/0x460 [dlm]
[] leave+0xbf/0x150 [md_cluster]
[] md_cluster_stop+0x18/0x30 [md_mod]
[] bitmap_free+0x12e/0x140 [md_mod]
[] bitmap_destroy+0x7f/0x90 [md_mod]
[] __md_stop+0x21/0xa0 [md_mod]
[] do_md_stop+0x15f/0x5c0 [md_mod]
[] md_ioctl+0xa65/0x18a0 [md_mod]
[] blkdev_ioctl+0x49e/0x8d0
[] block_ioctl+0x41/0x50
[] do_vfs_ioctl+0x96/0x5b0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1e/0xad

This may not resolve the issue completely, since the KTHREAD_SHOULD_STOP
flag could still be set between the "break" check and "schedule", but at
least the chance of hitting the symptom is reduced a lot (the indefinite
test ran for more than 20 hours without problems, while the symptom is
easy to hit without the change).

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 fs/dlm/recoverd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index 6859b4b..9fab490 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -290,8 +290,11 @@ static int dlm_recoverd(void *arg)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!test_bit(LSFL_RECOVER_WORK, &ls->ls_flags) &&
-		    !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags))
+		    !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
+			if (kthread_should_stop())
+				break;
 			schedule();
+		}
 		set_current_state(TASK_RUNNING);
 
 		if (test_and_clear_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
-- 
2.6.6
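
For reference, the general kthread wait idiom this patch moves toward looks
roughly like the following (a sketch only; work_queued() and handle_work()
are hypothetical placeholders, not dlm functions):

static int example_thread(void *arg)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		/* re-check the stop flag after setting TASK_INTERRUPTIBLE;
		 * together with the wake_up_process() done by kthread_stop(),
		 * this closes the window in which the stop request could
		 * otherwise be slept through */
		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}
		if (!work_queued(arg))		/* hypothetical condition */
			schedule();
		__set_current_state(TASK_RUNNING);

		handle_work(arg);		/* hypothetical work function */
	}
	return 0;
}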



Re: [Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock

2017-08-07 Thread Guoqing Jiang



On 08/08/2017 03:10 AM, Bob Peterson wrote:

| | Signed-off-by: Guoqing Jiang <gqji...@suse.com>
| | ---
| |  fs/dlm/lowcomms.c | 2 +-
| |  1 file changed, 1 insertion(+), 1 deletion(-)
| |
| | diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
| | index 9382db9..4813d0e 100644
| | --- a/fs/dlm/lowcomms.c
| | +++ b/fs/dlm/lowcomms.c
| | @@ -729,7 +729,7 @@ static int tcp_accept_from_sock(struct connection *con)
| | 	mutex_unlock(&connections_lock);
| |
| | 	memset(&peeraddr, 0, sizeof(peeraddr));
| | -	result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
| | +	result = sock_create_lite(dlm_local_addr[0]->ss_family,
| | 				  SOCK_STREAM, IPPROTO_TCP, &newsock);
| | if (result < 0)
| | return -ENOMEM;
|
| Isn't this also a problem for the sctp equivalent, sctp_connect_to_sock?
|
| Regards,
|
| Bob Peterson
| Red Hat File Systems
|

In fact, I see 5 different calls to sock_create_kern in DLM.
Shouldn't it be done to all of them?


Only this one calls accept() immediately after the socket is created, so the
others are probably safe. Plus, the sock is used right after sock_create_kern(),
so I am not sure it can be replaced with sock_create_lite():

result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
			  SOCK_STREAM, IPPROTO_SCTP, &sock);
...
sock->sk->sk_user_data = con;


One could also argue that sock_create_kern should itself be fixed,
not its callers.


Please see https://patchwork.ozlabs.org/patch/780356/ for more info.

Thanks,
Guoqing



[Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock

2017-08-07 Thread Guoqing Jiang
With commit 0ffdaf5b41cf ("net/sock: add WARN_ON(parent->sk)
in sock_graft()"), a calltrace happened as follows:

[  457.018340] WARNING: CPU: 0 PID: 15623 at ./include/net/sock.h:1703 
inet_accept+0x135/0x140
...
[  457.018381] RIP: 0010:inet_accept+0x135/0x140
[  457.018381] RSP: 0018:c90001727d18 EFLAGS: 00010286
[  457.018383] RAX: 0001 RBX: 880012413000 RCX: 0001
[  457.018384] RDX: 018a RSI: fe01 RDI: 8156fae8
[  457.018384] RBP: c90001727d38 R08:  R09: 4305
[  457.018385] R10: 0001 R11: 4304 R12: 880035ae7a00
[  457.018386] R13: 88001282af10 R14: 880034e4e200 R15: 
[  457.018387] FS:  () GS:88003fc0() 
knlGS:
[  457.018388] CS:  0010 DS:  ES:  CR0: 80050033
[  457.018389] CR2: 7fdec22f9000 CR3: 02b5a000 CR4: 06f0
[  457.018395] Call Trace:
[  457.018402]  tcp_accept_from_sock.part.8+0x12d/0x449 [dlm]
[  457.018405]  ? vprintk_emit+0x248/0x2d0
[  457.018409]  tcp_accept_from_sock+0x3f/0x50 [dlm]
[  457.018413]  process_recv_sockets+0x3b/0x50 [dlm]
[  457.018415]  process_one_work+0x138/0x370
[  457.018417]  worker_thread+0x4d/0x3b0
[  457.018419]  kthread+0x109/0x140
[  457.018421]  ? rescuer_thread+0x320/0x320
[  457.018422]  ? kthread_park+0x60/0x60
[  457.018424]  ret_from_fork+0x25/0x30

Since the newsock created by sock_create_kern gets its sock->sk
set by the path:

sock_create_kern -> __sock_create
 -> pf->create => inet_create
 -> sock_init_data

the WARN_ON is triggered by "con->sock->ops->accept =>
inet_accept -> sock_graft". It also means newsock->sk
is leaked, since sock_graft will replace it with a new
sk.

To resolve the issue, we need to use sock_create_lite
instead of sock_create_kern, like commit 0933a578cd55
("rds: tcp: use sock_create_lite() to create the accept
socket") did.

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 fs/dlm/lowcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 9382db9..4813d0e 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -729,7 +729,7 @@ static int tcp_accept_from_sock(struct connection *con)
 	mutex_unlock(&connections_lock);
 
 	memset(&peeraddr, 0, sizeof(peeraddr));
-	result = sock_create_kern(&init_net, dlm_local_addr[0]->ss_family,
+	result = sock_create_lite(dlm_local_addr[0]->ss_family,
 				  SOCK_STREAM, IPPROTO_TCP, &newsock);
if (result < 0)
return -ENOMEM;
-- 
2.10.0
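
To spell out the reasoning in the commit message (a sketch using only the
functions already named above, not code from the tree):

/*
 *  sock_create_kern(&init_net, ...) -> inet_create() -> sock_init_data()
 *      => newsock->sk is already populated before accept runs
 *  con->sock->ops->accept(...) -> inet_accept() -> sock_graft(newsk, newsock)
 *      => WARN_ON(parent->sk) fires and the original newsock->sk is leaked
 *
 *  sock_create_lite() allocates only the struct socket, leaving newsock->sk
 *  NULL, so sock_graft() installs the accepted sk into an empty socket:
 *  no warning and nothing to leak.
 */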



Re: [Cluster-devel] About dlm_unlock (kernel space)

2016-06-13 Thread Guoqing Jiang



On 06/13/2016 10:56 PM, David Teigland wrote:

On Mon, Jun 13, 2016 at 07:15:09AM -0400, Guoqing Jiang wrote:

Hi,

In the case where we have set the DLM_LKF_CONVERT flag for dlm_lock, is it
possible for the convert queue to be either empty or non-empty when we
perform the unlock? I think two different cases can appear when calling
dlm_unlock:

1. the lock is in the convert stage.
2. the convert queue is empty.

For case 1, it seems we need to cancel the lock request first
(dlm_unlock+CANCEL) and then call dlm_unlock, while for case 2 we just call
dlm_unlock directly. Please correct me if I am wrong.

Also, what could happen if we cancel a lock whose convert queue is empty,
i.e. call dlm_unlock+CANCEL in case 2; could something go wrong?

The last question: is there a dlm_unlock_* variant which can perform the
final unlock in both case 1 and case 2 (or would such a variant even make
sense)?

Convert is not a stable state, which means that cancel always involves a
race:  the convert could complete before the cancel is processed.  After
convert has finished, or after cancel has finished, then you know that the
lock is not converting and a simple unlock will work.  I suggest you test
these combinations to see how they behave in practice.  Doing unlock with
FORCEUNLOCK is also an option (that works even if the lock is on the
waiting or convert queue.)  I'd be judicious about using either CANCEL or
FORCEUNLOCK.
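
For reference, a minimal sketch of the two operations being compared here,
using the in-kernel API (the lockspace handle ls, lkid, lksb and cb_arg are
assumed to come from an earlier dlm_lock() call; the completion callback and
its error handling are omitted):

#include <linux/dlm.h>

/* option 1: try to cancel a conversion that may still be in progress;
 * as noted above this can race with the convert completing first */
error = dlm_unlock(ls, lkid, DLM_LKF_CANCEL, lksb, cb_arg);

/* option 2: force the unlock regardless of whether the lock is on the
 * grant, convert or waiting queue */
error = dlm_unlock(ls, lkid, DLM_LKF_FORCEUNLOCK, lksb, cb_arg);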




Thanks a lot for the detailed info and suggestions! The FORCEUNLOCK flag
looks perfect since it covers both cases 1 and 2; I will use it and see how
it behaves in practice.

Cheers,
Guoqing



[Cluster-devel] About dlm_unlock (kernel space)

2016-06-13 Thread Guoqing Jiang

Hi,

In the case where we have set the DLM_LKF_CONVERT flag for dlm_lock, is it
possible for the convert queue to be either empty or non-empty when we
perform the unlock? I think two different cases can appear when calling
dlm_unlock:

1. the lock is in the convert stage.
2. the convert queue is empty.

For case 1, it seems we need to cancel the lock request first
(dlm_unlock+CANCEL) and then call dlm_unlock, while for case 2 we just call
dlm_unlock directly. Please correct me if I am wrong.

Also, what could happen if we cancel a lock whose convert queue is empty,
i.e. call dlm_unlock+CANCEL in case 2; could something go wrong?

The last question: is there a dlm_unlock_* variant which can perform the
final unlock in both case 1 and case 2 (or would such a variant even make
sense)?

Thanks & Regards,
Guoqing



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-16 Thread Guoqing Jiang
Hi David,

David Teigland wrote:
 On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
   
 Would you consider taking the following cleanup? If yes, I will send a
 formal patch; otherwise please ignore it.
 

 On first glance, the old and new code do not appear to do the same thing,
 so let's leave it as it is.

   
 -   to_nodeid = dlm_dir_nodeid(r);
 
Sorry, it seems that is the only difference; if the previous change is
combined with the modification below, the behavior is the same.

@@ -3644,7 +3644,10 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
 	struct dlm_mhandle *mh;
 	int to_nodeid, error;
 
-	to_nodeid = r->res_nodeid;
+	if (mstype == DLM_MSG_LOOKUP)
+		to_nodeid = dlm_dir_nodeid(r);
+	else
+		to_nodeid = r->res_nodeid;

And for create_message(), the second parameter (lkb) is not needed for
creating those three message types (REQUEST/LOOKUP/REMOVE).

Thanks,
Guoqing



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-11 Thread Guoqing Jiang
Hi David,

David Teigland wrote:
 On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote:
   
 The remove_from_waiters() call can only be reached after create_message()
 fails, right? Since send_message() always returns 0, this patch doesn't
 touch anything on the failure path, and it doesn't change the original
 semantics.
 

 I'm not inclined to take any patches unless there's a problem identified.

 .  
   
Would you consider taking the following cleanup? If yes, I will send a
formal patch; otherwise please ignore it.

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..7c822f7 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3747,30 +3747,7 @@ static int send_bast(struct dlm_rsb *r, struct dlm_lkb *lkb, int mode)
 
 static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb)
 {
-	struct dlm_message *ms;
-	struct dlm_mhandle *mh;
-	int to_nodeid, error;
-
-	to_nodeid = dlm_dir_nodeid(r);
-
-	error = add_to_waiters(lkb, DLM_MSG_LOOKUP, to_nodeid);
-	if (error)
-		return error;
-
-	error = create_message(r, NULL, to_nodeid, DLM_MSG_LOOKUP, &ms, &mh);
-	if (error)
-		goto fail;
-
-	send_args(r, lkb, ms);
-
-	error = send_message(mh, ms);
-	if (error)
-		goto fail;
-	return 0;
-
- fail:
-	remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
-	return error;
+	return send_common(r, lkb, DLM_MSG_LOOKUP);
 }

Thanks,
Guoqing



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-10 Thread Guoqing Jiang
Bob Peterson wrote:

 
 
 - Original Message -
   
   
   
 We don't need the redundant logic since send_message always returns 0.

 Signed-off-by: Guoqing Jiang gqji...@suse.com
 ---
  fs/dlm/lock.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
 index 35502d4..6fc3de9 100644
 --- a/fs/dlm/lock.c
 +++ b/fs/dlm/lock.c
 @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct
 dlm_lkb *lkb, int mstype)
  
  send_args(r, lkb, ms);
  
 -error = send_message(mh, ms);
 -if (error)
 -goto fail;
 -return 0;
 +return send_message(mh, ms);
 

 Hi Guoqing,

 Sorry, I was momentarily confused. I think you misunderstood what I was 
 saying.
 What I meant was: Instead of doing:

 + return send_message(mh, ms);
 ...where send_message returns 0, it might be better to have:

 static void send_message(struct dlm_mhandle *mh, struct dlm_message *ms)
 {
   dlm_message_out(ms);
   dlm_lowcomms_commit_buffer(mh);
 }

 ...And in send_common, do (in both places):
 + send_message(mh, ms);
 + return 0;

 Since it's so short, it might even be better to code send_message as a macro,
 or at least an inline function.

   
Hi Bob,

Got it, thanks. That would be a better solution, but it is not a bug fix or
anything similar, so maybe we should just leave it as it is.

Regards,
Guoqing




[Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-09 Thread Guoqing Jiang
We don't need the redundant logic since send_message always returns 0.

Signed-off-by: Guoqing Jiang gqji...@suse.com
---
 fs/dlm/lock.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..6fc3de9 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
 
send_args(r, lkb, ms);
 
-   error = send_message(mh, ms);
-   if (error)
-   goto fail;
-   return 0;
+   return send_message(mh, ms);
 
  fail:
remove_from_waiters(lkb, msg_reply_type(mstype));
@@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb)
 
send_args(r, lkb, ms);
 
-   error = send_message(mh, ms);
-   if (error)
-   goto fail;
-   return 0;
+   return send_message(mh, ms);
 
  fail:
remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
-- 
1.7.12.4



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-09 Thread Guoqing Jiang
Hi Bob,

Bob Peterson wrote:
 - Original Message -
   
 We don't need the redundant logic since send_message always returns 0.

 Signed-off-by: Guoqing Jiang gqji...@suse.com
 ---
  fs/dlm/lock.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
 index 35502d4..6fc3de9 100644
 --- a/fs/dlm/lock.c
 +++ b/fs/dlm/lock.c
 @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct
 dlm_lkb *lkb, int mstype)
  
  send_args(r, lkb, ms);
  
 -error = send_message(mh, ms);
 -if (error)
 -goto fail;
 -return 0;
 +return send_message(mh, ms);
  
   fail:
  remove_from_waiters(lkb, msg_reply_type(mstype));
 @@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct
 dlm_lkb *lkb)
  
  send_args(r, lkb, ms);
  
 -error = send_message(mh, ms);
 -if (error)
 -goto fail;
 -return 0;
 +return send_message(mh, ms);
  
   fail:
  remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
 --
 1.7.12.4
 

 Hi,

 The patch looks okay, but if remove_from_waiters() always returns 0,
 wouldn't it be better to change the function from int to void and
 return 0 here? The advantage is that code spelunkers wouldn't need
 to back-track one more level (not to mention the instruction or two
 it might save).

   
It seems remove_from_waiters() does not always return 0; the return value
could be -1 or 0, depending on _remove_from_waiters().

BTW, I found that there is no big difference between send_common() and
send_lookup(); since send_common() can also be used to send the lookup
message, I guess send_lookup() could be removed as well.

Thanks,
Guoqing



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-09 Thread Guoqing Jiang
Bob Peterson wrote:
 - Original Message -
   
 Hi Bob,

 Bob Peterson wrote:
 
 - Original Message -
   
   
 We don't need the redundant logic since send_message always returns 0.

 Signed-off-by: Guoqing Jiang gqji...@suse.com
 ---
  fs/dlm/lock.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
 index 35502d4..6fc3de9 100644
 --- a/fs/dlm/lock.c
 +++ b/fs/dlm/lock.c
 @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct
 dlm_lkb *lkb, int mstype)
  
send_args(r, lkb, ms);
  
 -  error = send_message(mh, ms);
 -  if (error)
 -  goto fail;
 -  return 0;
 +  return send_message(mh, ms);
  
   fail:
remove_from_waiters(lkb, msg_reply_type(mstype));
 @@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct
 dlm_lkb *lkb)
  
send_args(r, lkb, ms);
  
 -  error = send_message(mh, ms);
 -  if (error)
 -  goto fail;
 -  return 0;
 +  return send_message(mh, ms);
  
   fail:
remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
 --
 1.7.12.4
 
 
 Hi,

 The patch looks okay, but if remove_from_waiters() always returns 0,
 wouldn't it be better to change the function from int to void and
 return 0 here? The advantage is that code spelunkers wouldn't need
 to back-track one more level (not to mention the instruction or two
 it might save).

   
   
 It seems remove_from_waiters() does not always return 0; the return value
 could be -1 or 0, depending on _remove_from_waiters().

 BTW, I found that there is no big difference between send_common() and
 send_lookup(); since send_common() can also be used to send the lookup
 message, I guess send_lookup() could be removed as well.

 Thanks,
 Guoqing
 

 Hi Guoqing,

 If remove_from_waiters can return -1, then the patch would prevent the
 code from calling remove_from_waiters. So the patch still doesn't look
 right to me.

   
Hi Bob,

The remove_from_waiters() call can only be reached after create_message()
fails, right? Since send_message() always returns 0, this patch doesn't
touch anything on the failure path, and it doesn't change the original
semantics.

Thanks,
Guoqing