[Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early

2016-01-19 Thread Eric Ren
BAST dc thread requeue=yes R1(clear OCFS2_LOCK_UPCONVERT_FINISHING,wait) R2(wait) ... dlmglue deadlock until dc thread woken up by others This fix defers clearing OCFS2_LOCK_UPCONVERT_FINISHING until OCFS2_LOCK_BUSY has been cleared and every waiter has been looped over. Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 4 ++--

Re: [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early

2016-01-19 Thread Eric Ren
ED being cleared and OCFS2_LOCK_BUSY being set, we should do things like that. But is there any chance that both OCFS2_LOCK_BUSY and OCFS2_LOCK_BLOCKED are set at the same time? If not, I prefer this one. What do you think? Any comment would be appreciated. Thanks, Eric On Wed, Jan 20, 2016 at 12:46:

Re: [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early

2016-01-21 Thread Eric Ren
nline (at least 4.4), right? I have found this patch on the mailing list and it looks good! I'd like to test it right now and give feedback! Thanks again, Eric > > Thanks, > Junxiao. > On 01/20/2016 12:46 AM, Eric Ren wrote: > > This problem was introduced by co

Re: [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early

2016-01-21 Thread Eric Ren
Hi all, On Thu, Jan 21, 2016 at 03:05:58PM -0800, Andrew Morton wrote: > On Thu, 21 Jan 2016 16:18:38 +0800 Junxiao Bi wrote: > > > On 01/21/2016 04:10 PM, Eric Ren wrote: > > > Hi Junxiao, > > > > > > On Thu, Jan 21, 2016 at 03:10:20PM +

Re: [Ocfs2-devel] [PATCH 2/6] ocfs2: o2hb: add NEGO_TIMEOUT message

2016-01-24 Thread Eric Ren
On Wed, Jan 20, 2016 at 11:13:35AM +0800, Junxiao Bi wrote: > This message is sent to master node when non-master nodes' > negotiate timer expired. Master node records these nodes in > a bitmap which is used to do write timeout timer re-queue > decision. > > Signed-off-by: Junxiao Bi > Reviewed

Re: [Ocfs2-devel] [PATCH 4/6] ocfs2: o2hb: add some user/debug log

2016-01-24 Thread Eric Ren
Hi Junxiao, On Wed, Jan 20, 2016 at 11:13:37AM +0800, Junxiao Bi wrote: > Signed-off-by: Junxiao Bi > Reviewed-by: Ryan Ding > --- > fs/ocfs2/cluster/heartbeat.c | 39 --- > 1 file changed, 32 insertions(+), 7 deletions(-) > > diff --git a/fs/ocfs2/cluste

Re: [Ocfs2-devel] [PATCH 2/6] ocfs2: o2hb: add NEGO_TIMEOUT message

2016-01-24 Thread Eric Ren
On Mon, Jan 25, 2016 at 12:28:08PM +0800, Junxiao Bi wrote: > On 01/25/2016 11:18 AM, Eric Ren wrote: > >> > >> > @@ -2039,13 +2086,30 @@ static struct config_item > >> > *o2hb_heartbeat_group_make_item(struct config_group *g > >> > > >

Re: [Ocfs2-devel] [PATCH 4/6] ocfs2: o2hb: add some user/debug log

2016-01-24 Thread Eric Ren
Hi Junxiao, On Mon, Jan 25, 2016 at 12:29:05PM +0800, Junxiao Bi wrote: > On 01/25/2016 11:28 AM, Eric Ren wrote: > >> @@ -449,7 +470,11 @@ static int o2hb_nego_timeout_handler(struct o2net_msg > >> *msg, u32 len, void *data, > >> > static int o2hb_nego_appr

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: move lock to the tail of grant queue while doing in-place convert

2016-01-29 Thread Eric Ren
Hello jiufei, On Wed, Jan 27, 2016 at 05:52:05PM +0800, xuejiufei wrote: > We have found a bug when two nodes umount one after another. > 1) Node 1 migrates a lockres that has 3 locks in grant queue such as > N2(PR)<->N3(NL)<->N4(PR) to N2. After migration, lvb of the lock N3(NL) > and N4(PR

Re: [Ocfs2-devel] ocfs2-test for v4.3 done

2016-02-16 Thread Eric Ren
Hi Junxiao, > >> I have set up a test env to build and automatically run the ocfs2 tests. With it, Ocfs2 > >> for mainline and linux-next will be tested regularly, the test status and > >> bugs will be reported to ocfs2-devel. Feel free to take any bug if you > >> are interested, it will be a good start point with

Re: [Ocfs2-devel] ocfs2-test for v4.3 done

2016-02-18 Thread Eric Ren
boot up even if we've built kernel RPM and installed it. Did you have this problem? Any suggestion;-) What I can think of is to try opensuse tumbleweed distribution(a rolling release). > > On 02/16/2016 05:54 PM, Eric Ren wrote: > > Hi Junxiao, > > > >> Fou

Re: [Ocfs2-devel] ocfs2-test for v4.3 done

2016-02-23 Thread Eric Ren
Hi Junxiao, On 02/24/2016 09:48 AM, Junxiao Bi wrote: > Hi Eric, > > On 02/19/2016 11:01 AM, Eric Ren wrote: >> Hi Junxiao, >> >> On Wed, Feb 17, 2016 at 10:15:56AM +0800, Junxiao Bi wrote: >>> Hi Eric, >>> >>> I remember i described it befo

Re: [Ocfs2-devel] [PATCH v4 2/5] ocfs2: sysfile interfaces for online file check

2016-03-08 Thread Eric Ren
On 02/29/2016 01:17 PM, Gang He wrote: > Implement online file check sysfile interfaces, e.g. > how to create the related sysfile according to device name, > how to display/handle file check request from the sysfile. > > Signed-off-by: Gang He Tested-by: Eric Ren > --- >

Re: [Ocfs2-devel] [PATCH v4 4/5] ocfs2: check/fix inode block for online file check

2016-03-08 Thread Eric Ren
On 02/29/2016 01:18 PM, Gang He wrote: > Implement online check or fix inode block during > reading an inode block to memory. > > Signed-off-by: Gang He Tested-by: Eric Ren > --- > fs/ocfs2/inode.c | 225 > +++-- >

Re: [Ocfs2-devel] [Ocfs2-tools-devel] ocfs2 tools

2016-03-18 Thread Eric Ren
> Hello Mark, > > Could you help to add these accounts in ocfs2-tools project on github? > > Gang He - SUSE: ganghe > Junxiao - Oracle: biger410 > Joseph Qi - Huawei: josephhz Maybe, we can also create an ocfs2 organization on github;-) And I'm wondering whether we can have an IRC channe

[Ocfs2-devel] Welcome to join in: an ocfs2 IRC channel is setup for quick communications

2016-04-01 Thread Eric Ren
Hello Mark and all, Last week, I proposed an IRC channel for OCFS2 on the tools mailing list [1]. I'm afraid most people probably didn't even notice it. So I'm bringing it up here with a proper subject. Really hope it helps: #ocfs2 on freenode; you can log in through the website[1], or Thunderbird[2]. Welcom

Re: [Ocfs2-devel] Welcome to join in: an ocfs2 IRC channel is setup for quick communications

2016-04-04 Thread Eric Ren
Hello Goldwyn, On 04/04/2016 07:39 PM, Goldwyn Rodrigues wrote: > Hello Eric, > > On 04/01/2016 08:50 PM, Eric Ren wrote: >> Hello Mark and all, >> >> Last week, I proposed to have an IRC channel for OCFS2 in tools maillist >> [1]. I'm afraid most

Re: [Ocfs2-devel] Dead lock and cluster blocked, any advices will be appreciated.

2016-05-08 Thread Eric Ren
Hello Zhonghua, Thanks for reporting this. On 05/07/2016 07:30 PM, Guozhonghua wrote: > Hi, we have found one deadlock scenario. > > Suddenly, Node 2 is rebooted (fenced) because of an IO error accessing storage. So > its slot 2 remains valid on the storage disk. > The node 1 which is in the same cluste

Re: [Ocfs2-devel] Reflink hangs with kernel 4.4

2016-05-09 Thread Eric Ren
Hello: On 05/09/2016 09:20 PM, 서정우 wrote: Hi all. I built up ocfs2 on drbd dual primary. Each node has 12 disks of Raid 10 with mdadm chunk size 4096k. Cluster size of filesystem is 1048576 bytes. Main purpose of use is reflink files on drbd. I reflinked files from 1TB file and exported them

[Ocfs2-devel] [PATCH] ocfs2: fix a redundant re-initialization

2016-05-22 Thread Eric Ren
Obviously, memset() has zeroed the whole struct locking_max_version. So there is no need to zero its two fields individually. Signed-off-by: Eric Ren --- fs/ocfs2/stackglue.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c index 5d965e8..85
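
The point of the patch above can be illustrated with a small userspace sketch. The struct and function names below are hypothetical stand-ins (the field names are assumptions, not taken from fs/ocfs2/stackglue.c); the sketch only shows that a single memset() over the whole struct already leaves every field zero, so per-field zeroing afterwards is redundant.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the struct named in the patch; the field
 * names are assumptions made for illustration only. */
struct version_example {
	unsigned char major;
	unsigned char minor;
};

/* Zero the whole struct once; every field is zero afterwards. */
void version_example_reset(struct version_example *v)
{
	assert(v != NULL);
	memset(v, 0, sizeof(*v));
	/* v->major = 0; v->minor = 0;  would be redundant here */
}

/* Returns 1 when both fields read back as zero after the reset. */
int version_example_demo(void)
{
	struct version_example v = { .major = 1, .minor = 2 };

	version_example_reset(&v);
	return v.major == 0 && v.minor == 0;
}
```

Calling version_example_demo() exercises the reset on a struct that starts with non-zero fields.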

[Ocfs2-devel] [PATCH] ocfs2: fix improper handling of return errno

2016-05-22 Thread Eric Ren
Signed-off-by: Eric Ren --- fs/ocfs2/inode.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c index ba495be..fee5ec6 100644 --- a/fs/ocfs2/inode.c +++ b/fs/ocfs2/inode.c @@ -176,12 +176,7 @@ struct inode *ocfs2_iget(struct ocfs2_super

[Ocfs2-devel] [PATCH] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-06-22 Thread Eric Ren
g when ENOSPC is returned. And we cannot reuse the deleted blocks before the transaction committed. Fortunately, we already have a function to do this - ocfs2_try_to_free_truncate_log(). Just need to remove the "static" modifier and put it into the right place. Sign
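
The retry idea described in this patch can be sketched in plain C as follows. This is a simplified userspace model under stated assumptions, not the kernel code: every demo_* name is hypothetical, and locking of the allocator inode (which the later v2/v3 revisions deal with) is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator state: clusters free right now, plus clusters
 * that become reusable once the pending truncate log is flushed. */
struct demo_alloc {
	unsigned int free_clusters;
	unsigned int truncate_log_clusters;
};

/* Reserve `want` clusters, or fail with -ENOSPC. */
int demo_reserve(struct demo_alloc *a, unsigned int want)
{
	assert(a != NULL);
	if (a->free_clusters < want)
		return -ENOSPC;
	a->free_clusters -= want;
	return 0;
}

/* Flush the truncate log; returns the number of clusters released. */
unsigned int demo_flush_truncate_log(struct demo_alloc *a)
{
	unsigned int freed;

	assert(a != NULL);
	freed = a->truncate_log_clusters;
	a->free_clusters += freed;
	a->truncate_log_clusters = 0;
	return freed;
}

/* Reserve with a single retry after reclaiming the truncate log. */
int demo_reserve_with_retry(struct demo_alloc *a, unsigned int want)
{
	int ret = demo_reserve(a, want);

	if (ret != -ENOSPC)
		return ret;
	if (demo_flush_truncate_log(a) == 0)
		return -ENOSPC;	/* nothing reclaimed: really out of space */
	return demo_reserve(a, want);
}
```

The single retry keeps the failure path bounded: if flushing releases nothing, the caller still sees -ENOSPC instead of looping.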

Re: [Ocfs2-devel] [PATCH] ocfs2: remove obscure BUG_ON in dlmglue

2016-07-03 Thread Eric Ren
Good catch, thanks! Reviewed-by: Eric Ren On 07/01/2016 05:10 PM, Joseph Qi wrote: > These BUG_ON(!inode) are obscure because we have already used inode to > get osb. And actually we can guarantee here inode is valid in the > context. So we can safely remove them. > > Signed-of

Re: [Ocfs2-devel] ocfs2: cleanup implemented prototypes

2016-07-03 Thread Eric Ren
Hi Joseph, Please see comments inline;-) On 07/01/2016 05:27 PM, Joseph Qi wrote: > Several prototypes in inode.h are just defined but not actually > implemented and used, so remove them. > > Signed-off-by: Joseph Qi > --- > fs/ocfs2/inode.h | 7 --- > fs/ocfs2/super.c | 1 - > 2 files c

Re: [Ocfs2-devel] ocfs2: cleanup implemented prototypes

2016-07-03 Thread Eric Ren
anks, > Joseph > > On 2016/7/4 11:36, Eric Ren wrote: >> Hi Joseph, >> >> Please see comments inline;-) >> >> On 07/01/2016 05:27 PM, Joseph Qi wrote: >>> Several prototypes in inode.h are just defined but not actually >>> implemented and us

Re: [Ocfs2-devel] [PATCH] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-06 Thread Eric Ren
Hi Joseph, On 07/06/2016 12:21 PM, Joseph Qi wrote: > NAK, if ocfs2_try_to_free_truncate_log fails, it will lead to double > ocfs2_inode_unlock and then BUG. Thanks for pointing this out! Will fix this and resend. Eric > > On 2016/6/22 17:07, Eric Ren wrote: >> The testcase

[Ocfs2-devel] [PATCH v2] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-06 Thread Eric Ren
code isn't elegant, but there seems to be no better option. v2: 1. Lock allocator inode again if ocfs2_schedule_truncate_log_flush() fails. -- spotted by Joseph Qi Signed-off-by: Eric Ren --- fs/ocfs2/alloc.c| 37 + fs/ocfs2/alloc.h| 2 ++ fs/ocfs2/aops

Re: [Ocfs2-devel] [PATCH v2] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-07 Thread Eric Ren
Hi Joseph, On 07/07/2016 09:00 AM, Joseph Qi wrote: >> @@ -1164,7 +1164,8 @@ static int ocfs2_reserve_clusters_with_limit(struct >> ocfs2_super *osb, >> int flags, >> struct ocfs2_alloc_context **ac) >> { >> -

[Ocfs2-devel] [PATCH v3] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-08 Thread Eric Ren
e isn't elegant, but there seems to be no better option. v3: 1. Also need to lock allocator inode when "= 0" is returned from ocfs2_schedule_truncate_log_flush(), which means no space really. -- spotted by Joseph Qi v2: 1. Lock allocator inode again if ocfs2_schedule_truncate_log_flush() fa

Re: [Ocfs2-devel] A question on AST

2016-07-25 Thread Eric Ren
Hi, On 07/23/2016 04:54 PM, Gechangwei wrote: > Hi, > I have a question on AST related procedure. > If a lock request is sent to the lock resource’s owner node right before > this owner node crashes, > then no one will send an AST back to the requesting node; this will cause > the requested n

Re: [Ocfs2-devel] [PATCH] A bug in the end of DLM recovery

2016-08-07 Thread Eric Ren
Hi, On 08/06/2016 01:58 PM, Gechangwei wrote: > Hi, > > I found an issue in the end of DLM recovery. What are the detailed steps to reproduce it? > When DLM recovery comes to the end of recovery procedure, it will remaster > all locks in other nodes. > Right after a request message is sent to a n

[Ocfs2-devel] OCFS2 test report for linux vanilla kernel V4.7.0

2016-08-21 Thread Eric Ren
Hi, The test report below is against vanilla kernel v4.7.0. Some highlights: 1. As you can see from logs attached, pcmk stack is used with "blocksize=4096, clustersize=32768"; 2. "inline" testcase on multiple nodes failed 3/3 times so far; seems to be a regression issue; 3. Two cases are s

Re: [Ocfs2-devel] OCFS2 test report for linux vanilla kernel V4.8.0-rc2

2016-08-22 Thread Eric Ren
Sorry, actually it's already: 4.8.0-rc2-173-g184ca82-1.gacbdb4b-vanilla Eric On 08/22/2016 10:42 AM, Eric Ren wrote: > Hi, > > > The test report below is against vanilla kernel v4.7.0. Some highlights: > > 1. As you can see from logs attached, pcmk stack is used

Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()

2016-08-28 Thread Eric Ren
Hi, Thanks for this fix. I'd like to reproduce this issue locally and test this patch; could you elaborate on the detailed steps to reproduce it? Thanks, Eric On 08/27/2016 07:04 AM, Ashish Samant wrote: > If we punch a hole on a reflink such that following conditions are met: > > 1. start offset

Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()

2016-08-29 Thread Eric Ren
1048576 count=1 | hexdump -C cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd || * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s 0010 3. debugfs.ocfs2 -R stats /dev/sdb ... Block Size Bits: 12 Cluster Size Bits: 20 ...

Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()

2016-08-30 Thread Eric Ren
d cd || I'm not familiar with this code. So why is the output "cd ..."? because we didn't write anything into "10MBfile". Is it a magic number when reading from a hole? Eric > * > 1+0 records in > 1+0 records out > 0010 > > Th

Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()

2016-08-30 Thread Eric Ren
cd cd cd cd cd cd cd cd || * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0933082 s, 11.2 MB/s 0010 > > On 08/30/2016 12:38 AM, Eric Ren wrote: >> Hi, >> >> I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-) >>

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-10 Thread Eric Ren
in ocfs2_free_write_ctxt() the target page isn't unlocked, so we will deadlock when trying to grab the target page again. Fix this issue by unlocking the target page after we fail to allocate enough space the first time. Jan Kara helped me clear up the JBD2 part and suggested the hint for ro

Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-11 Thread Eric Ren
Hi Joseph, On 09/12/2016 09:37 AM, Joseph Qi wrote: Hi Eric, On 2016/9/10 17:55, Eric Ren wrote: The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally. In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it; there are 2 processes repeatedly performing the

Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-11 Thread Eric Ren
; Eric > >> >> Thanks, >> Joseph >> >>> Fix this issue by unlocking the target page after we fail to allocate >>> enough space the first time. >>> >>> Jan Kara helped me clear up the JBD2 part and suggested the hint for root >

Re: [Ocfs2-devel] [Ocfs2-users] ocf2 mount point hangs

2016-09-13 Thread Eric Ren
Hi, On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote: > Hi All, > > I have an ocfs2 mount point of 3 ceph cluster nodes and suddenly I > cannot read and write to the mount point although the cluster is clean > and showing no errors. 1. What is your ocfs2 shared disk? I mean it's a shared disk export

Re: [Ocfs2-devel] [PATCH] ocfs2: free the mle while the res had one, to avoid mle memory leak.

2016-09-13 Thread Eric Ren
Hi, On 09/13/2016 03:52 PM, Guozhonghua wrote: > In the function dlm_migrate_request_handler, while the ret is -EEXIST, the > mle should be freed, otherwise the memory will be leaked. Keep your commit comments within 75 or 78 (I don't remember clearly but git will warn if you don't keep its rule

Re: [Ocfs2-devel] [PATCH] ocfs2: fix double unlock in case retry after free truncate log

2016-09-14 Thread Eric Ren
Hello Joseph, Thanks for fixing this up. On 09/14/2016 12:15 PM, Joseph Qi wrote: > If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to > free truncate log and then retry. Since ocfs2_try_to_free_truncate_log > will lock/unlock global bitmap inode, we have to unlock it before >

Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Eric Ren
rror code? Thanks, Eric Thanks, Joseph Fix this issue by unlocking the target page after we fail to allocate enough space the first time. Jan Kara helped me clear up the JBD2 part and suggested the hint for the root cause. Signed-off-by: Eric Ren --- fs/ocfs2/aops.c | 7 +++ 1 file change

Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Eric Ren
Hi, On 09/12/2016 11:06 AM, Eric Ren wrote: > Hi, >>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and >>> w_target_locked is set to true, and then will be unlocked by >>> ocfs2_unlock_pages in ocfs2_free_write_ctxt. >>> So I

Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Eric Ren
he mmapped page should be unlocked as long as we cannot return VM_FAULT_LOCKED to do_page_mkwrite(). Otherwise, the deadlock will happen in do_page_mkwrite(). Please see the recent 2 mails;-) Eric > > Thanks, > Joseph > > On 2016/9/14 16:04, Eric Ren wrote: >> Hi Joseph, >>>

[Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-17 Thread Eric Ren
and along with a locked target page. These two errors fail on the same path, so fix them by unlocking the target page manually before ocfs2_free_write_ctxt(). Jan Kara helped me clear up the JBD2 part and suggested the hint for the root cause. Changes since v1: 1. Also put ENOMEM error case into considera

Re: [Ocfs2-devel] [PATCH] ocfs2: fix double unlock in case retry after free truncate log

2016-09-17 Thread Eric Ren
Hello Joseph, On 09/14/2016 04:13 PM, Joseph Qi wrote: > Hi Eric, > > On 2016/9/14 15:57, Eric Ren wrote: >> Hello Joseph, >> >> Thanks for fixing up this. >> >> On 09/14/2016 12:15 PM, Joseph Qi wrote: >>> If ocfs2_reserve_cluster_bitmap_bits fai

Re: [Ocfs2-devel] [PATCH v2 RESEND] ocfs2: fix double unlock in case retry after free truncate log

2016-09-17 Thread Eric Ren
id return value overwritten issue. > > Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in > truncate log" > Signed-off-by: Joseph Qi > Signed-off-by: Jiufei Xue LGTM Reviewed-by: Eric Ren > --- > fs/ocfs2/suballoc.c | 14 -- > 1

Re: [Ocfs2-devel] [PATCH] ocfs2: Fix double put of recount tree in ocfs2_lock_refcount_tree()

2016-09-17 Thread Eric Ren
ode_for_refcount+0x115/0x200 [ocfs2] > [] ? ocfs2_inode_unlock+0xd4/0x140 [ocfs2] > [] ocfs2_prepare_inode_for_write+0x33b/0x470 [ocfs2] > [] ? ocfs2_rw_lock+0x80/0x190 [ocfs2] > [] ocfs2_file_write_iter+0x220/0x8c0 [ocfs2] > [] ? mempool_free_slab+0x17/0x20 > [] ? bio_free+0x61/0x

Re: [Ocfs2-devel] [PATCH] ocfs2: fix undefined struct variable in inode.h

2016-09-21 Thread Eric Ren
LGTM Reviewed-by: Eric Ren --- fs/ocfs2/inode.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h index 50cc550..5af68fc 100644 --- a/fs/ocfs2/inode.h +++ b/fs/ocfs2/inode.h @@ -123,8 +123,6 @@ static inline struct ocfs2_inode_info *OCFS2_I(struct inode

[Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Eric Ren
Hi Junxiao, As the subject says, the testing hung on a kernel without your patches: "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang" and "ocfs2: fix posix_acl_create deadlock" The stack trace is: ``` ocfs2cts1:~ # pstree -pl 24133 discontig_runne(24133)───activate_discon(

Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Eric Ren
Hi Junxiao, On 10/11/2016 10:58 AM, Junxiao Bi wrote: >> Do you think this issue can be fixed by your patches? > Looks not. Those two patches are to fix recursive locking deadlock. But > from above call trace, there is no recursive lock. OK, thanks a lot! Eric > > Thanks, > Junxiao. >> I will try

Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-11 Thread Eric Ren
Hi Junxiao, > Hi Eric, > > On 10/11/2016 10:42 AM, Eric Ren wrote: >> Hi Junxiao, >> >> As the subject, the testing hung there on a kernel without your patches: >> >> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang" >>

Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-11 Thread Eric Ren
among multiple nodes at the same time. Thanks, Eric On 10/12/2016 09:23 AM, Eric Ren wrote: > Hi Junxiao, > >> Hi Eric, >> >> On 10/11/2016 10:42 AM, Eric Ren wrote: >>> Hi Junxiao, >>> >>> As the subject, the testing hung there on a kernel without y

Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Eric Ren
Hi Junxiao, On 10/12/2016 02:47 PM, Junxiao Bi wrote: > On 10/12/2016 10:36 AM, Eric Ren wrote: >> Hi, >> >> When backporting those patches, I find that they are already in our >> product kernel, maybe >> via "stable kernel" policy, although our prod

Re: [Ocfs2-devel] ocfs2-test passed on linux-next/next-20161006

2016-10-12 Thread Eric Ren
Hi Junxiao, On 10/12/2016 02:54 PM, Junxiao Bi wrote: > Hi all, > > I just finished a full ocfs2 test(single/multiple/discontig) on > linux-next/next-20161006. All test case passed. That's a good sign of > quality. Thank you for your effort. Great! Thanks for your efforts! Eric > > Thanks, > Ju

Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Eric Ren
Hi, On 10/12/2016 05:45 PM, Junxiao Bi wrote: > On 10/12/2016 05:34 PM, Eric Ren wrote: >> Hi Junxiao, >> >> On 10/12/2016 02:47 PM, Junxiao Bi wrote: >>> On 10/12/2016 10:36 AM, Eric Ren wrote: >>>> Hi, >>>> >>>> When backporti

Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-14 Thread Eric Ren
nks, Eric On 10/12/2016 09:23 AM, Eric Ren wrote: > Hi Junxiao, > >> Hi Eric, >> >> On 10/11/2016 10:42 AM, Eric Ren wrote: >>> Hi Junxiao, >>> >>> As the subject, the testing hung there on a kernel without your patches: >>> >>>

[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-18 Thread Eric Ren
Hi all! Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") results in another deadlock as we have discussed in the recent thread: https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html Before this one, a similar deadlock has been fixed by Junxiao: co

[Ocfs2-devel] [DRAFT 1/2] ocfs2/dlmglue: keep track of the processes who take/put a cluster lock

2016-10-18 Thread Eric Ren
o can help debug cluster locking issue. Unfortunately, this may incur some performance loss. Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 60 ++ fs/ocfs2/dlmglue.h | 13 fs/ocfs2/ocfs2.h | 1 + 3 files changed, 74 inser

[Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-10-18 Thread Eric Ren
EX request comes between two ocfs2_inode_lock() Fix by checking if the cluster lock has been acquired already in the call-chain path. Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren --- fs/ocfs2/acl.c | 39 +++
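
The approach named in the two drafts (record who holds the cluster lock, then skip the lock request when the same caller already holds it further up the call chain) might look roughly like the userspace sketch below. It is not the code from the patches: all demo_* names, the fixed-size holder table, and the integer task ids are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HOLDERS 8

/* Hypothetical lock resource that remembers which tasks hold it. */
struct demo_lockres {
	int holders[DEMO_MAX_HOLDERS];
	size_t nr_holders;
	int lock_calls;	/* times the real (blocking) lock request was issued */
};

/* Returns 1 if task_id is already recorded as a holder, else 0. */
int demo_is_holder(const struct demo_lockres *res, int task_id)
{
	for (size_t i = 0; i < res->nr_holders; i++)
		if (res->holders[i] == task_id)
			return 1;
	return 0;
}

/*
 * Take the lock unless this task already holds it higher in the call
 * chain. Returns 1 if the lock was taken here (this level must unlock),
 * 0 if an outer caller owns it, and -1 if the holder table is full.
 */
int demo_lock(struct demo_lockres *res, int task_id)
{
	if (demo_is_holder(res, task_id))
		return 0;
	if (res->nr_holders == DEMO_MAX_HOLDERS)
		return -1;
	res->lock_calls++;	/* stands in for the blocking cluster lock */
	res->holders[res->nr_holders++] = task_id;
	return 1;
}

/* Drop the lock only if this call level was the one that took it. */
void demo_unlock(struct demo_lockres *res, int task_id, int took_lock)
{
	if (took_lock != 1)
		return;
	for (size_t i = 0; i < res->nr_holders; i++) {
		if (res->holders[i] == task_id) {
			res->holders[i] = res->holders[--res->nr_holders];
			return;
		}
	}
}

/* Nested use by one task: the inner request must not lock again. */
int demo_nested_calls(void)
{
	struct demo_lockres res = { 0 };
	int outer = demo_lock(&res, 42);	/* e.g. a permission check */
	int inner = demo_lock(&res, 42);	/* e.g. an ACL lookup it calls */

	demo_unlock(&res, 42, inner);
	demo_unlock(&res, 42, outer);
	return outer == 1 && inner == 0 &&
	       res.lock_calls == 1 && res.nr_holders == 0;
}
```

Because the inner level never issues a second lock request, a remote EX request arriving between the two calls has nothing to block against in this model.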

Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-19 Thread Eric Ren
Hi Junxiao, On 10/19/2016 02:57 PM, Junxiao Bi wrote: > I once implemented generic recursive locking support, please check the > patch at > https://oss.oracle.com/pipermail/ocfs2-devel/2015-December/011408.html > , >

Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-24 Thread Eric Ren
Hi all, On 10/19/2016 01:19 PM, Eric Ren wrote: > The third one is to revert that problematic commit! It looks like > get/set_acl() > are always called by other vfs callbacks like ocfs2_permission(). I think > we can do this if it's true, right? Anyway, I'll try to wo

[Ocfs2-devel] BUG_ON(le64_to_cpu(fe->i_size) != i_size_read(inode)) is triggered in ocfs2_truncate_file()

2016-10-27 Thread Eric Ren
Hi all, Has anyone ever seen this BUG_ON() assertion (https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L460) triggered? (The log message is pasted at the end.) I cannot reproduce it so far. fallocate with the FALLOC_FL_KEEP_SIZE flag (man 2 fallocate) can result in "le64_to_cpu(fe->i_size) != i

Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-28 Thread Eric Ren
Hi Christoph! Thanks for your attention. On 10/28/2016 02:20 PM, Christoph Hellwig wrote: Hi Eric, I've added linux-fsdevel to the cc list as this should get a bit broader attention. On Wed, Oct 19, 2016 at 01:19:40PM +0800, Eric Ren wrote: Mostly, we can avoid recursive locking by wr

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-10-31 Thread Eric Ren
Hi, On 10/31/2016 06:55 PM, piaojun wrote: > Hi Eric, > > On 2016-10-19 13:19, Eric Ren wrote: >> The deadlock issue happens when running discontiguous block >> group testing on multiple nodes. The easier way to reproduce >> is to do "chmod -R 777 /mnt/ocfs2"

[Ocfs2-devel] what is g_f_a_w_n() short for? thanks

2016-11-07 Thread Eric Ren
Hello Mark, There is a piece of comment that confused me, please correct me: https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L2274 ``` ocfs2_file_write_iter() { ... /* * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io * function pointer which i

Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-11-08 Thread Eric Ren
Hi all, On 10/19/2016 01:19 PM, Eric Ren wrote: ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock(). The problem is that the call chain of ocfs2_permission() includes *_acl(). Possibly, there are three solutions I can think of. The first one is to implement the inode

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-08 Thread Eric Ren
Hi all, On 10/19/2016 01:19 PM, Eric Ren wrote: diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c index bed1fcb..7e3544e 100644 --- a/fs/ocfs2/acl.c +++ b/fs/ocfs2/acl.c @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle, int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int

Re: [Ocfs2-devel] [PATCH 1/6] ocfs2: convert inode refcount test to a helper

2016-11-09 Thread Eric Ren
On 11/10/2016 06:51 AM, Darrick J. Wong wrote: > Replace the open-coded inode refcount flag test with a helper function > to reduce the potential for bugs. Thanks for this series;-) Some comments inline below: > > Signed-off-by: Darrick J. Wong > --- > fs/ocfs2/refcounttree.c | 28

Re: [Ocfs2-devel] ocfs2: A race about mle is unlinked and freed for the dead node, BUG

2016-11-09 Thread Eric Ren
Hi, I am not familiar with ocfs2/dlm code, but I am trying to... On 11/09/2016 06:17 PM, Zhangguanghui wrote: > Hi All, > > when the mle have been used in dlm_get_lock_resource, other nodes dead at the > same time, > the mle that is block type may be unlinked and freed repeatedly for dead > node

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-10 Thread Eric Ren
Hi, On 11/10/2016 06:49 PM, piaojun wrote: > Hi Eric, > > On 2016-11-1 9:45, Eric Ren wrote: >> Hi, >> >> On 10/31/2016 06:55 PM, piaojun wrote: >>> Hi Eric, >>> >>> On 2016-10-19 13:19, Eric Ren wrote: >>>> The deadlock is

Re: [Ocfs2-devel] [PATCH 0/6] ocfs2: wire up {clone, copy, dedupe}_range

2016-11-10 Thread Eric Ren
Hi, On 11/10/2016 06:51 AM, Darrick J. Wong wrote: > Hi all, > > These patches wire up the existing ocfs2 reflinking capabilities to > the new(ish) VFS {copy,clone,dedupe}_range interface. The first few > patches clean up some minor bugs that I found; the last kernel patch > contains the new code

Re: [Ocfs2-devel] [PATCH 6/6] ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features

2016-11-10 Thread Eric Ren
Hi, A few issues obvious to me: On 11/10/2016 06:51 AM, Darrick J. Wong wrote: > Connect the new VFS clone_range, copy_range, and dedupe_range features > to the existing reflink capability of ocfs2. Compared to the existing > ocfs2 reflink ioctl We have to do things a little differently to suppo

Re: [Ocfs2-devel] [PATCH 6/6] ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features

2016-11-10 Thread Eric Ren
On 11/11/2016 02:20 PM, Darrick J. Wong wrote: > On Fri, Nov 11, 2016 at 01:49:48PM +0800, Eric Ren wrote: >> Hi, >> >> A few issues obvious to me: >> >> On 11/10/2016 06:51 AM, Darrick J. Wong wrote: >>> Connect the new VFS clone_range, copy_range, and

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-14 Thread Eric Ren
Hi, On 11/14/2016 01:42 PM, piaojun wrote: > Hi Eric, > > > OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST > (ocfs2_generic_handle_bast) when downconvert is needed > on behalf of remote lock request. > > The recursive cluster lock (the second one) will be blocked in > __ocfs2_cluster_loc

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-14 Thread Eric Ren
Hi, > Thanks for your attention. Actually, I tried different versions of draft > patch locally. > Either of them can satisfy myself so far. Sorry, I meant "neither of them". Eric > Some rules I'd like to follow: > 1) check and avoid recursive cluster locking, rather than allow it which > Junxiao

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io

2016-11-15 Thread Eric Ren
Hi Dan, On 11/15/2016 06:36 PM, Dan Carpenter wrote: > Ryan's email is dead. But this is buggy. Someone please fix it. > > regards, > dan carpenter > > On Tue, Nov 15, 2016 at 01:33:30PM +0300, Dan Carpenter wrote: >> I never got a response on this. I was looking at it today and it still >> loo

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io

2016-11-16 Thread Eric Ren
Hi, On 11/16/2016 06:45 PM, Dan Carpenter wrote: > On Wed, Nov 16, 2016 at 10:33:49AM +0800, Eric Ren wrote: > That silences the warning, of course, but I feel like the code is buggy. > How do we know that we don't hit that exit path? Sorry, I missed your point. Do you mean the belo

[Ocfs2-devel] [Bug Report] multiple node reflink: kernel BUG at ../fs/ocfs2/suballoc.c:1989!

2016-11-23 Thread Eric Ren
Hi all, FYI, Reflink testcase in multiple nodes mode failed with the backtrace below: --- 2016-11-02T16:43:41.862247+08:00 ocfs2cts2 kernel: [25429.622914] [ cut here ] 2016-11-02T16:43:41.862273+08:00 ocfs2cts2 kernel: [25429.622979] kernel BUG at ../fs/ocfs2/suballoc.

Re: [Ocfs2-devel] [PATCH] ocfs2: Optimization of code while free dead locks.

2016-11-27 Thread Eric Ren
Hi, On 11/26/2016 08:15 PM, Guozhonghua wrote: > The three loops can be optimized into one loop and its sub loops, so as small > code can do the same work. > The patch is based on the linux-4.9-rc6. > > Signed-off-by: Guozhonghua > > > --- ocfs2.orig/dlm/dlmrecovery.c2016-11-26 19:13:04.

Re: [Ocfs2-devel] [PATCH] ocfs2: Optimization of code while free dead locks, changed for reviews.

2016-11-28 Thread Eric Ren
Hi, I am tired of telling you things about patch format... I won't respond until you really model your patch after a correct one. Eric On 11/28/2016 05:05 PM, Guozhonghua wrote: > Changed the free order and code styles with reviews. Based on Linux-4.9-rc6. > Thanks. > > Signed-off-by: guozhonghua

Re: [Ocfs2-devel] [PATCH 0/7] quota: Use s_umount for quota on/off serialization

2016-11-30 Thread Eric Ren
some multi-node setup (I have tested just with a single node), especially > whether quota file recovery for other nodes still works as expected. Thanks. With this patch set, the quota file recovery works well for ocfs2 on multiple nodes. Tested-by: Eric Ren Thanks, Eric > >

[Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren
dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is used. Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 10 ++ fs/ocfs2/stackglue.c | 6 ++ fs/ocfs2/stackglue.h | 3 +++ 3 files changed, 19 insertions(+) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c ind

[Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren
dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is used. Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 10 ++ fs/ocfs2/stackglue.c | 6 ++ fs/ocfs2/stackglue.h | 3 +++ 3 files changed, 19 insertions(+) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c ind

Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren
Sorry, this email was not delivered to Mark successfully because one weird character somehow trailed his email address. So, I will resend later... Thanks, Eric On 12/09/2016 05:24 PM, Eric Ren wrote: > The crash happens rather often when we reset some cluster > nodes while nodes contend fi

Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-11 Thread Eric Ren
RSB1: EX >>reset Node2 >> dlm_recover_rsbs() >>recover_lvb() >> >> /* The LVB is not trustable if the node with EX fails and >> * no lock >= PR is left. We should set RSB_VALNOTVALID f

[Ocfs2-devel] [PATCH 06/17] multi_mmap: make log messages go to right place

2016-12-12 Thread Eric Ren
The option "--logfile" is missing now. Thus, log messages go into "o2t.log", which is an apparent mistake. Signed-off-by: Eric Ren --- programs/python_common/multiple_run.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/programs/python_common/multipl

[Ocfs2-devel] [PATCH 01/17] ocfs2 test: correct the check on testcase if supported

2016-12-12 Thread Eric Ren
Signed-off-by: Eric Ren --- programs/python_common/multiple_run.sh | 2 +- programs/python_common/single_run-WIP.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/programs/python_common/multiple_run.sh b/programs/python_common/multiple_run.sh index dd9603f..c4a7da9

[Ocfs2-devel] [PATCH 00/17] ocfs2-test: misc improvements and trivial fixes

2016-12-12 Thread Eric Ren
- Misc trivial fixes: [PATCH 01/17] ocfs2 test: correct the check on testcase if supported [PATCH 02/17] Single Run: kernel building is little broken now [PATCH 03/17] Trivial: better not to depend on where we issue testing [PATCH 04/17] Trivial: fix a typo mistake [PATCH 05/17] Trivial: fix check

[Ocfs2-devel] [PATCH 02/17] Single Run: kernel building is little broken now

2016-12-12 Thread Eric Ren
Only check kernel source if we specify "buildkernel" test case. The original kernel source web-link cannot be reached, so give a new link instead but the md5sum check is missing now. Signed-off-by: Eric Ren --- programs/python_common/single_run-WIP.sh | 56 -

[Ocfs2-devel] [PATCH 13/17] Save punch_hole details into logfile for debugging convenience

2016-12-12 Thread Eric Ren
Signed-off-by: Eric Ren --- programs/discontig_bg_test/discontig_runner.sh | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/programs/discontig_bg_test/discontig_runner.sh b/programs/discontig_bg_test/discontig_runner.sh index 3be39c8..4c13adb 100755 --- a

[Ocfs2-devel] [PATCH 05/17] Trivial: fix checking empty return value

2016-12-12 Thread Eric Ren
We now get the below error even if "reserve space" testcase succeeds: "Error in log_end()" This is because we passed Nil to log_end. Signed-off-by: Eric Ren --- programs/python_common/single_run-WIP.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/programs/pyth

[Ocfs2-devel] [PATCH 17/17] discontig bg: give single and multiple node test different log file name

2016-12-12 Thread Eric Ren
Signed-off-by: Eric Ren --- programs/discontig_bg_test/discontig_runner.sh | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/programs/discontig_bg_test/discontig_runner.sh b/programs/discontig_bg_test/discontig_runner.sh index 182ca3a..f3a69f5 100755 --- a/programs

[Ocfs2-devel] [PATCH 09/17] Single run: make blocksize and clustersize as parameters

2016-12-12 Thread Eric Ren
It takes too long to get the result of a round of testing. We can save a lot of time by eliminating the two-level loops over blocksize and clustersize. Now blocksize defaults to 4096, and clustersize to 32768, if not specified. Signed-off-by: Eric Ren --- programs/backup_super/test_backup_super.sh

[Ocfs2-devel] [PATCH 10/17] Multiple run: make blocksize and clustersize as parameters

2016-12-12 Thread Eric Ren
It takes too long to get the result of a round of testing. We can save a lot of time by eliminating the two-level loops over blocksize and clustersize. Now blocksize defaults to 4096, and clustersize to 32768, if not specified. Signed-off-by: Eric Ren --- programs/inline-data/multi-inline-run.sh

[Ocfs2-devel] [PATCH 11/17] discontig bg: make blocksize and clustersize as parameters

2016-12-12 Thread Eric Ren
Add "-b blocksize" and "-c clustersize" as optional parameters. It will keep the original behavior if we don't specify their values. Signed-off-by: Eric Ren --- programs/discontig_bg_test/discontig_runner.sh | 51 +- 1 file changed, 33 in

[Ocfs2-devel] [PATCH 03/17] Trivial: better not to depend on where we issue testing

2016-12-12 Thread Eric Ren
If we run tests from outside the directory where the executables are, an error like the one below may occur: "./config.sh No such file or directory". So let's depend on the PATH environment variable instead. Signed-off-by: Eric Ren --- programs/dirop_fileop_racer/racer.sh

[Ocfs2-devel] [PATCH 04/17] Trivial: fix a typo mistake

2016-12-12 Thread Eric Ren
Signed-off-by: Eric Ren --- programs/mkfs-tests/mkfs-test.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/programs/mkfs-tests/mkfs-test.sh b/programs/mkfs-tests/mkfs-test.sh index 3fc93a4..8fdd02a 100755 --- a/programs/mkfs-tests/mkfs-test.sh +++ b/programs/mkfs-tests/mkfs
