Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-12-01 Thread Eric Ren
Hi David, On 12/01/2016 12:16 AM, David Teigland wrote: On Wed, Nov 30, 2016 at 05:07:22PM +0800, Eric Ren wrote: a. Should we put recover_lvb() even before recover_conversion()? if not, why? Yes, I think you're right. The lvb decision should be made using the original lock modes, no

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-30 Thread Eric Ren
Hi David, On 11/16/2016 11:08 PM, David Teigland wrote: convert(R1, EX) get LVB Question: what is the LVB then? x or y? == Is this a valid question? Or am I missing something? It's a good question, and it's been enough years that the details are now hazy. I think the current behavior emula

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-17 Thread Eric Ren
Hi! On 11/16/2016 11:08 PM, David Teigland wrote: convert(R1, EX) get LVB Question: what is the LVB then? x or y? == Is this a valid question? Or am I missing something? It's a good question, and it's been enough years that the details are now hazy. I think the current behavior emula

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-16 Thread Eric Ren
On 11/16/2016 04:29 PM, Eric Ren wrote: Hi David and all, I am debugging an ocfs2 issue that relates to the LVB value. I will try to make it a pure DLM question: Two nodes (N1, N2) try to truncate the same file (R1) concurrently. N1 N2 lock(R1, EX

[Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-16 Thread Eric Ren
Hi David and all, I am debugging an ocfs2 issue that relates to the LVB value. I will try to make it a pure DLM question: Two nodes (N1, N2) try to truncate the same file (R1) concurrently. N1 N2 lock(R1, EX) changing LVB: x

Re: [Cluster-devel] [PATCH -next] dlm: fix error return code in sctp_accept_from_sock()

2016-10-25 Thread Eric Ren
Hi, On 10/25/2016 01:03 PM, weiyongjun (A) wrote: Hi Eric, -Original Message- From: Eric Ren [mailto:z...@suse.com] Sent: Tuesday, October 25, 2016 10:52 AM To: Wei Yongjun ; Christine Caulfield ; David Teigland ; cluster- de...@redhat.com; weiyongjun (A) Subject: Re: [Cluster-devel

Re: [Cluster-devel] [PATCH -next] dlm: fix error return code in sctp_accept_from_sock()

2016-10-24 Thread Eric Ren
Hi, Coding style patch is always rejected;-) Eric On 10/22/2016 10:37 PM, Wei Yongjun wrote: From: Wei Yongjun Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun --- fs/dlm/lowcomms.c | 3 ++- 1 fi

[Cluster-devel] [PATCH] dlm: fix malfunction of dlm_tool caused by debugfs changes

2016-08-25 Thread Eric Ren
_create_file() with debugfs_create_file_unsafe(); 2nd, make different table_open#() accordingly. The 1st one is neat, but I don't thoroughly understand its risk. Maybe someone has a better one. Signed-off-by: Eric Ren --- fs/dlm/debug_fs.c | 62 ++

Re: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

2016-05-20 Thread Eric Ren
Hi David, On 05/19/2016 02:50 AM, David Teigland wrote: On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote: Q1: what's a stateful merged node? Q2: what if we add the stateful merged nodes to the dlm_controld daemon cpg instead of fencing them? The details here are fundamental to th

[Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful merging

2016-05-20 Thread Eric Ren
" becomes true. David advised me to do the right thing;-) Thanks a lot! Signed-off-by: Eric Ren --- dlm_controld/daemon_cpg.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c index 356e80d..0d55027 10064

Re: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

2016-05-17 Thread Eric Ren
node, it looks like every node owns this volume; so corruption may happen? Thanks a lot, Eric On 05/17/2016 08:10 PM, Eric Ren wrote: Hi David, This is just a draft patch for you to review;-) There's an issue I'm not sure about: where should we clear "stateful_merge_wait"?

Re: [Cluster-devel] [DLM PATCH] dlm_controld: add option of enable_force_kick

2016-05-17 Thread Eric Ren
Hello David: On 05/17/2016 01:12 AM, David Teigland wrote: This looks good. Would you still use this patch if we add the new dlm_tool output from the other email? Please hold this back for now;-) I prefer to drop this method if the latter one works better. And I'm trying to work this out

[Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful merging

2016-05-17 Thread Eric Ren
" becomes true. David advised me to do the right thing;-) Thanks a lot! Signed-off-by: Eric Ren --- dlm_controld/daemon_cpg.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c index 356e80d..8f6434f 100644 ---

[Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

2016-05-17 Thread Eric Ren
Hi David, This is just a draft patch for you to review;-) There's an issue I'm not sure about: where should we clear "stateful_merge_wait"? And I need more communication with the pacemaker guys and more time for testing. I will send you the formal patch once things get done;-) Thanks, Eric

[Cluster-devel] [DLM PATCH] dlm_controld: add option of enable_force_kick

2016-05-16 Thread Eric Ren
e merged partitions are kicking the other out of the cluster at the same time. Signed-off-by: Eric Ren --- dlm_controld/daemon_cpg.c | 6 +- dlm_controld/dlm.conf.5 | 2 ++ dlm_controld/dlm_controld.8 | 5 + dlm_controld/dlm_daemon.h | 1 + dlm_controld/main.c | 6 +++

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-16 Thread Eric Ren
Hi David, On 05/13/2016 11:49 PM, David Teigland wrote: If both sides of the merged partition are kicking the other out of the cluster at the same time, it's hard to predict which nodes will remain (and it could be none). To resolve an even partition merge, you need to remove/restart the nodes

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-12 Thread Eric Ren
Hi David, Thanks very very much for explaining this to me in such a nice way;-) On 05/13/2016 12:51 AM, David Teigland wrote: T = time in seconds, A,B,C = cluster nodes. At T=1 A,B,C become members and have quorum. At T=10 a partition creates A,B | C. At T=11 it merges and creates A,B,C. At T=

[Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-12 Thread Eric Ren
time". We now skip this chance of telling corosync to kill cluster for stateful merge. As a result, any fencing cannot proceed further. Signed-off-by: Eric Ren --- dlm_controld/daemon_cpg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dlm_controld/daemon_cpg.c b/dlm_contro

[Cluster-devel] [PATCH] dlm: make dlm_posix_lock comply with posix file lock semanteme

2015-10-14 Thread Eric Ren
lable with wait_event_interruptible can fix this issue. Signed-off-by: Eric Ren Acked-by: David Teigland --- fs/dlm/plock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c index 5532f09..88f1036 100644 --- a/fs/dlm/plock.c +++ b/fs/dlm/plock.c

[Cluster-devel] [PATCH] dlm: make dlm_posix_lock comply with posix file lock semanteme

2015-10-14 Thread Eric Ren
lable with wait_event_interruptible can fix this issue. Signed-off-by: Eric Ren Acked-by: David Teigland --- fs/dlm/plock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c index 5532f09..88f1036 100644 --- a/fs/dlm/plock.c +++ b/fs/dlm/plock.c

[Cluster-devel] dlm: make dlm_posix_lock comply with posix file lock semanteme

2015-10-14 Thread Eric Ren
Hi David and all, With this patch applied, the deadlock test for posix file locks, in both local and cluster mode on ocfs2, now behaves the same as on other local fs like ext4. But I have no gfs2 environment to verify whether this issue would happen there and can be fixed by this patch. Hope anyone working on gfs2

Re: [Cluster-devel] problem about dlm posix file lock (sorry for missing subject)

2015-10-13 Thread Eric Ren
Hi David, Please see comments inline;-) David Teigland wrote: > >> On Tue, Oct 13, 2015 at 04:30:53AM -0600, Zhen Ren wrote: >> It expects alarm timeout to send SIGALRM, and wake up the sleeping process, >> as "man fcntl" says: "If a signal is caught while waiting, then >> the call is inter