[Ocfs2-devel] FYI: all testcases of ocfs2-test passed on 4.15.0-rc8-1.g05e4405-vanilla

2018-01-18 Thread Eric Ren

Hi,

As the subject says, ocfs2-test, run with the "-b 4096 -c 32768" parameters
and the fsdlm plugin, passed all cases on the recent upstream kernel. The
overall results are attached.

Eric


2018-01-18-20-33-09-discontig-bg-single-run.log
Description: Binary data


single_run.log
Description: Binary data


2018-01-19-04-28-18-discontig-bg-multiple-run.log
Description: Binary data


multiple-run-x86_64-2018-01-18-18-03-21.log
Description: Binary data
___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread

2018-01-12 Thread Eric Ren
Hi,

On 01/12/2018 11:43 AM, Shichangkuo wrote:
> Hi all,
>   Now we are testing ocfs2 with the 4.14 kernel, and we found a deadlock between
> umount and the ocfs2 workqueue, triggered by the ocfs2rec thread. The stacks are as follows:
> journal recovery work:
> [] call_rwsem_down_read_failed+0x14/0x30
> [] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> [] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> [] process_one_work+0x130/0x350
> [] worker_thread+0x46/0x3b0
> [] kthread+0x101/0x140
> [] ret_from_fork+0x1f/0x30
> [] 0x
>
> /bin/umount:
> [] flush_workqueue+0x104/0x3e0
> [] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> [] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> [] ocfs2_put_super+0x31/0xa0 [ocfs2]
> [] generic_shutdown_super+0x6d/0x120
> [] kill_block_super+0x2d/0x60
> [] deactivate_locked_super+0x51/0x90
> [] cleanup_mnt+0x3b/0x70
> [] task_work_run+0x86/0xa0
> [] exit_to_usermode_loop+0x6d/0xa9
> [] do_syscall_64+0x11d/0x130
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
>   
> Function ocfs2_finish_quota_recovery tries to get sb->s_umount, which was
> already locked by the umount thread, resulting in a deadlock.

Good catch, thanks for reporting.  Is it reproducible? Can you please 
share the steps for reproducing this issue?
> This issue was introduced by c3b004460d77bf3f980d877be539016f2df4df12 and 
> 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex was already
> removed.
> Shall we add a new mutex?

@Jan, I haven't looked into the code yet; could you help me understand why
we need to take sb->s_umount in ocfs2_finish_quota_recovery?
Is it because the quota recovery process can start at umount time, or
somewhere else?

Thanks,
Eric




Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Eric Ren
Hi,


On 12/27/2017 05:29 PM, Gang He wrote:
> If we can't get the inode lock immediately in the function
> ocfs2_inode_lock_with_page() when reading a page, we should not
> return directly here, since this will lead to a softlockup problem.
> The method is to get a blocking lock and immediately unlock before
> returning; this avoids CPU resource waste due to lots of retries,
> benefits fairness in getting the lock among multiple nodes, and
> increases efficiency when the same file is modified frequently from
> multiple nodes.
> The softlockup problem looks like,
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>
>dump_stack+0x5c/0x82
>panic+0xd5/0x21e
>watchdog_timer_fn+0x208/0x210
>? watchdog_park_threads+0x70/0x70
>__hrtimer_run_queues+0xcc/0x200
>hrtimer_interrupt+0xa6/0x1f0
>smp_apic_timer_interrupt+0x34/0x50
>apic_timer_interrupt+0x96/0xa0
>
>   RIP: 0010:unlock_page+0x17/0x30
>   RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>   RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>   RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>   RBP:  R08:  R09: af154080bb00
>   R10: af154080bc30 R11: 0040 R12: 993749a39518
>   R13:  R14: f21e009f5300 R15: f21e009f5300
>ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>ocfs2_readpage+0x41/0x2d0 [ocfs2]
>? pagecache_get_page+0x30/0x200
>filemap_fault+0x12b/0x5c0
>? recalc_sigpending+0x17/0x50
>? __set_task_blocked+0x28/0x70
>? __set_current_blocked+0x3d/0x60
>ocfs2_fault+0x29/0xb0 [ocfs2]
>__do_fault+0x1a/0xa0
>__handle_mm_fault+0xbe8/0x1090
>handle_mm_fault+0xaa/0x1f0
>__do_page_fault+0x235/0x4b0
>trace_do_page_fault+0x3c/0x110
>async_page_fault+0x28/0x30
>   RIP: 0033:0x7fa75ded638e
>   RSP: 002b:7ffd6657db18 EFLAGS: 00010287
>   RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
>   RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
>   RBP: 0003 R08: 000e R09: 
>   R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
>   R13: 000e R14: 1770 R15: 
>
> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
> Signed-off-by: Gang He 

On most Linux servers, CONFIG_PREEMPT is not set, for better system-wide
throughput. The long-running retry logic for getting the page lock and the
inode lock can easily cause a softlockup, so that a real-time task like
corosync (when using the pcmk stack) cannot be scheduled on time.

When multiple nodes write the same file concurrently, performance cannot be
good anyway, and that case is also unlikely.

The trick for avoiding the busy loop looks good to me.

Reviewed-by: z...@suse.com

Thanks,
Eric

> ---
>   fs/ocfs2/dlmglue.c | 9 +
>   1 file changed, 9 insertions(+)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5193218 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>   ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>   if (ret == -EAGAIN) {
>   unlock_page(page);
> + /*
> +  * If we can't get inode lock immediately, we should not return
> +  * directly here, since this will lead to a softlockup problem.
> +  * The method is to get a blocking lock and immediately unlock
> +  * before returning, this can avoid CPU resource waste due to
> +  * lots of retries, and benefits fairness in getting lock.
> +  */
> + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
> + ocfs2_inode_unlock(inode, ex);
>   ret = AOP_TRUNCATED_PAGE;
>   }
>   




Re: [Ocfs2-devel] OCFS2 cluster debian8 / debian9

2017-12-14 Thread Eric Ren

Hi,


On 12/05/2017 11:19 PM, BASSAGET Cédric wrote:

Hello,
I retried from scratch, and I still get an error when trying to bring up
the second cluster:


root@LAB-virtm6:/# o2cb register-cluster ocfs2new
o2cb: Internal logic failure while registering cluster 'ocfs2new'

root@LAB-virtm6:/mnt/vol1_iscsi_san1# o2cb list-clusters
ocfs2
ocfs2new

Can anybody help me please ?


I don't know whether the o2cb stack can have multiple clusters on the same
node. With the pacemaker stack, once we set up a pacemaker cluster, we can
have multiple ocfs2 instances, i.e. mkfs and mount multiple ocfs2
filesystems on the same node.

Why do you want to set up multiple o2cb clusters on the same node? I have
never come across this usage so far :)

Maybe, others can help on your question.

Thanks,
Eric




2017-11-29 8:28 GMT+01:00 BASSAGET Cédric:


Hello,
I guess I did something wrong the first time.
I retried three times, and it worked three times. So I guess ocfs2
1.6 and 1.8 are compatible :)

Now it's time to set up a second ocfs2 cluster on my debian 9
server (ocfs2 1.8), and I get this error when trying to mkfs.ocfs2 :

root@LAB-virtm6:~# mkfs.ocfs2 /dev/mapper/data_san_2
mkfs.ocfs2 1.8.4
On disk cluster (o2cb,ocfs2new,0) does not match the active
cluster (o2cb,ocfs2,0).
mkfs.ocfs2 will not be able to determine if this operation can be
done safely.
To skip this check, use --force or -F


The running cluster on this host is :
root@LAB-virtm6:~# o2cluster -r
o2cb,ocfs2,local

I'm trying to add an "ocfs2new" cluster :
root@LAB-virtm6:~# o2cb add-cluster ocfs2new
root@LAB-virtm6:~# o2cb add-node --ip 192.168.0.12 --port 
--number 1 ovfs2new LAB-virtm6
root@LAB-virtm6:~# o2cb add-node --ip 192.168.0.13 --port 
--number 2 ovfs2new LAB-virtm7

root@LAB-virtm6:~# o2cb list-clusters
ocfs2
ocfs2new

root@LAB-virtm6:~# o2cb cluster-status
Cluster 'ocfs2' is online

Even if I restart services or reboot, cluster 'ocfs2new' never
goes online.

What am I doing wrong ?




Re: [Ocfs2-devel] OCFS2 cluster debian8 / debian9

2017-11-23 Thread Eric Ren

Hi,


On 11/23/2017 09:42 PM, BASSAGET Cédric wrote:

Tried to mkfs.ocfs2 on the debian 9 side (ocfs 1.8) :

root@LAB-virtm6:~# o2cluster -o /dev/mapper/data_san_2
o2cb,ocfs2new,local


Sorry, my fault. For a mount failure, the first thing is to check whether
your o2cb stack is running, via the o2cb init service (`rco2cb status`,
IIRC). Then it's probably a compatibility issue: see the "FEATURE FLAGS"
section in `man ocfs2` for more instructions.



root@LAB-virtm6:~# mount /dev/mapper/data_san_2 /mnt/vol1_iscsi_san2
*mount.ocfs2: Cluster name is invalid while trying to join the group*


Quote from `man ocfs2`:
"""
DETECTING FEATURE INCOMPATIBILITY

      Say one tries to mount a volume with an incompatible feature. What
      happens then? How does one detect the problem? How does one know the
      name of that incompatible feature?

      To begin with, one should look for error messages in dmesg(8). Mount
      failures that are due to an incompatible feature will always result
      in an error message like the following:

          ERROR: couldn't mount because of unsupported optional features (200).

      Here the file system is unable to mount the volume due to an
      unsupported optional feature. That means that that feature is an
      Incompat feature. By referring to the table above, one can then
      deduce that the user failed to mount a volume with the xattr feature
      enabled. (The value in the error message is in hexadecimal.)
"""

Please show your dmesg.

Eric


root@LAB-virtm6:~# cat /etc/ocfs2/cluster.conf
node:
        ip_port = 
        ip_address = 192.168.0.11
        number = 1
        name = LAB-virtm5
        cluster = ocfs2new
node:
        ip_port = 
        ip_address = 192.168.0.12
        number = 2
        name = LAB-virtm6
        cluster = ocfs2new
node:
        ip_port = 
        ip_address = 192.168.0.13
        number = 3
        name = LAB-virtm7
        cluster = ocfs2new
cluster:
        node_count = 5
        name = ocfs2new




2017-11-23 14:02 GMT+01:00 BASSAGET Cédric <cedric.bassaget...@gmail.com>:


Hi Eric
on debian 9 (ocfs2 v1.8) :

# o2cluster -o /dev/mapper/data_san_2
default

on debian 8 (ocfs2 v1.6), I don't have the "o2cluster" tool

:(



2017-11-23 13:41 GMT+01:00 Eric Ren <z...@suse.com>:

Hi,

On 11/23/2017 06:23 PM, BASSAGET Cédric wrote:

hello,
I'm trying to set-up an OCFS2 cluster between hosts running
debian8 and debian9

2*debian 8 : ocfs2-tools 1.6.4-3
1*debian 9 : ocfs2-tools 1.8.4-4

I created the FS on debian 8 node :
 mkfs.ocfs2 -L "ocfs2_new" -N 5 /dev/mapper/data_san_2

then mounted it without problem
mount /dev/mapper/data_san_2 /mnt/vol1_iscsi_san2/

I mounted it on second debian 8 host too, without problem.

Trying to mount in on debian9 returns :
mount.ocfs2: Cluster name is invalid while trying to join the
group

I saw in "man mkfs.ocfs2" that debian9 version
has --cluster-stack and --cluster-name options.

Is this option mandatory on ocfs2 1.8 ? That would say that
ocfs2 1.6 and 1.8 are not compatible ? Nothing is said about
1.8 on https://oss.oracle.com/projects/ocfs2/


Not sure if they're compatible. So can you try again with
--cluster-stack and --cluster-name?

# o2cluster -o /dev/sda1
pcmk,cluster,none

pcmk is the cluster-stack, cluster is the name.

Usually, these two options are optional; the tools will detect
the right cluster stack automatically.

Eric






Re: [Ocfs2-devel] OCFS2 cluster debian8 / debian9

2017-11-23 Thread Eric Ren

Hi,

On 11/23/2017 06:23 PM, BASSAGET Cédric wrote:

hello,
I'm trying to set-up an OCFS2 cluster between hosts running debian8 
and debian9


2*debian 8 : ocfs2-tools 1.6.4-3
1*debian 9 : ocfs2-tools 1.8.4-4

I created the FS on debian 8 node :
 mkfs.ocfs2 -L "ocfs2_new" -N 5 /dev/mapper/data_san_2

then mounted it without problem
mount /dev/mapper/data_san_2 /mnt/vol1_iscsi_san2/

I mounted it on second debian 8 host too, without problem.

Trying to mount in on debian9 returns :
mount.ocfs2: Cluster name is invalid while trying to join the group

I saw in "man mkfs.ocfs2" that debian9 version has --cluster-stack and 
--cluster-name options.


Is this option mandatory on ocfs2 1.8 ? That would say that ocfs2 1.6
and 1.8 are not compatible ? Nothing is said about 1.8 on
https://oss.oracle.com/projects/ocfs2/




Not sure if they're compatible. So can you try again with 
--cluster-stack and --cluster-name?


# o2cluster -o /dev/sda1
pcmk,cluster,none

pcmk is the cluster-stack, cluster is the name.

Usually, these two options are optional; the tools will detect the right
cluster stack automatically.


Eric

Re: [Ocfs2-devel] Adding new node to an online OCFS2 cluster

2017-11-09 Thread Eric Ren
Hi,

On 11/09/2017 06:56 PM, BASSAGET Cédric wrote:
> Hello,
> As I did not get help on users mailing list, I allow myself to post my 
> question here. Sorry if it's not the right place, but I can't find any 
> documentation.

Not at all. Actually, the other mailing lists see very low traffic now. I
think it's fine to ask questions on this devel list.

Eric


Re: [Ocfs2-devel] How to share my notes about ocfs2 source code?

2017-11-09 Thread Eric Ren
Hi Larry,

Awesome, can you share the charts in this thread as attachments?


On 11/08/2017 04:48 PM, Larry Chen wrote:
> Hi everyone,
>
> Recently, I have read a lot of ocfs2 source code and made several notes
> along the way. Since I found that there are not enough docs describing
> how ocfs2 works inside, I would like to share my notes, hoping that
> they could help other beginners. The notes are made with draw.io
> (a good drawing website), and I think it does well in illustrating
> internal data structures and their relationships.
> The examples could be found attached.
> But I have no idea where and how to put them and how to ask everyone 
> to review them.
> Could anyone give me some instructions?

You can just share the plain charts with us, or write a blog or an article
on lwn.net like [1] :)

[1] https://lwn.net/Articles/402287/

Thanks,
Eric


Re: [Ocfs2-devel] ocfs2-test result on 4.14.0-rc7-1.gdbf3e9b-vanilla kernel

2017-11-06 Thread Eric Ren
Hi,

On 11/07/2017 10:43 AM, Changwei Ge wrote:
> Hi Eric,
>
> On 2017/11/7 10:33, Eric Ren wrote:
>> Hi,
>>
>> The testing result against the recent kernel looks good. The attachments are
>> overall results. If the detailed logs are needed, please let me know.
>>
>> Pattern                Failed  Passed  Skipped  Total
>> DiscontigBgMultiNode   0       4       0        4
>> DiscontigBgSingleNode  0       5       0        5
>> MultipleNodes          0       9       1        10
>> SingleNode             0       18      1        19
> 18 SingleNode cases failed. Is that normal?

It's 18 passed. Sorry, the format is a little messy; please see the
attachment.

>
>> Notes:
>> - This testing only use blocksize=4096 and clustersize=32768 to reduce the 
>> time;
>> - The 2 skipped cases are on purpose: filecheck and lvb_torture.
>>
> I wonder why case - 'lvb_torture' is skipped on purpose?
We are using fsdlm; unfortunately, this test case uses some user-space
APIs that the fsdlm plugin doesn't implement :)

Thanks,
Eric



Re: [Ocfs2-devel] [PATCH] ocfs2: mknod: fix recursive locking hung

2017-10-23 Thread Eric Ren
Hi,

On 10/18/2017 12:44 PM, Junxiao Bi wrote:
> On 10/18/2017 12:41 PM, Gang He wrote:
>> Hi Junxiao,
>>
>> The problem looks easy to reproduce?
>> Could you share the trigger script/code for this issue?
> Please run ocfs2-test multiple reflink test.
Hmm, strange, we do run ocfs2-test quite often.

Eric



Re: [Ocfs2-devel] [PATCH] ocfs2: the ip_alloc_sem should be taken in ocfs2_get_block()

2017-10-22 Thread Eric Ren
Hi,

On 10/20/2017 05:03 PM, alex chen wrote:
> The ip_alloc_sem should be taken in ocfs2_get_block() when reading a file
> in DIRECT mode, to prevent concurrent access to the extent tree with
> ocfs2_dio_end_io_write(), which may trigger a BUG_ON in
> ocfs2_get_clusters_nocache()->BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))

This may seem an obvious fix, but it would be great if you could
write a more detailed commit log, e.g. paste the crash backtrace
here, so that people can pick this fix easily when they see the same issue.

Thanks,
Eric
>
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
>
> ---
>   fs/ocfs2/aops.c | 21 +++--
>   1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 88a31e9..5cb939f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -134,6 +134,19 @@ static int ocfs2_symlink_get_block(struct inode *inode, 
> sector_t iblock,
>   return err;
>   }
>
> +static int ocfs2_get_block_lock(struct inode *inode, sector_t iblock,
> + struct buffer_head *bh_result, int create)
> +{
> + int ret;
> + struct ocfs2_inode_info *oi = OCFS2_I(inode);
> +
> + down_read(&oi->ip_alloc_sem);
> + ret = ocfs2_get_block(inode, iblock, bh_result, create);
> + up_read(&oi->ip_alloc_sem);
> +
> + return ret;
> +}
> +
>   int ocfs2_get_block(struct inode *inode, sector_t iblock,
>   struct buffer_head *bh_result, int create)
>   {
> @@ -2154,12 +2167,8 @@ static int ocfs2_dio_get_block(struct inode *inode, 
> sector_t iblock,
>* while file size will be changed.
>*/
>   if (pos + total_len <= i_size_read(inode)) {
> - down_read(&oi->ip_alloc_sem);
>   /* This is the fast path for re-write. */
> - ret = ocfs2_get_block(inode, iblock, bh_result, create);
> -
> - up_read(&oi->ip_alloc_sem);
> -
> + ret = ocfs2_get_block_lock(inode, iblock, bh_result, create);
>   if (buffer_mapped(bh_result) &&
>   !buffer_new(bh_result) &&
>   ret == 0)
> @@ -2424,7 +2433,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, 
> struct iov_iter *iter)
>   return 0;
>
>   if (iov_iter_rw(iter) == READ)
> - get_block = ocfs2_get_block;
> + get_block = ocfs2_get_block_lock;
>   else
>   get_block = ocfs2_dio_get_block;
>
> -- 1.9.5.msysgit.1
>
>
>




Re: [Ocfs2-devel] ocfs2-test supports Python 3

2017-09-27 Thread Eric Ren
Hi,

> As you know, some Linux distributions (e.g. SUSE Enterprise Linux 15) will
> introduce Python 3 as the default. Our Python scripts in ocfs2-test still
> use Python 2, so we will have to make the proper modifications to migrate
> to Python 3 (Larry Chen has worked on this investigation).
> But the problem is how to maintain the ocfs2-test code between Python 2
> and Python 3, since some existing Linux distributions (including their
> future SPs) use Python 2.
> There are two options:
> 1) one code branch, in which the code is compatible with both Python 2 and
> Python 3, but I feel that is difficult after talking with Larry Chen;
> 2) two code branches, one for Python 2 and the other for Python 3.
> What are your thoughts? Please let us know, then we can select a way to
> work on this task.
I think we can make a "snapshot" branch of the current master, e.g. name it
"snapshot-2017.9", and not change that branch anymore.
Users who want to test ocfs2 on an OS with Python 2 can use it
without problems, and we should document this in README.txt.

The Python 3 changes and other changes can be merged into the master
branch, so that master keeps moving forward...

Eric


Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-22 Thread Eric Ren
Hi Andrew,

On 06/23/2017 05:24 AM, Andrew Morton wrote:
> On Thu, 22 Jun 2017 14:10:38 +0800 Joseph Qi <jiangqi...@gmail.com> wrote:
>
>> Looks good.
>> Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
> Should this fix be backported into -stable kernels?

No, I think not, because the previous patches that this one depends on:

- commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to avoid
recursive cluster lock")
- commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking inode lock at
vfs entry points")

are not in -stable either.

I don't know if it's possible to get them all into stable.

Thanks,
Eric

>
>> On 17/6/22 09:47, Eric Ren wrote:
>>> Another deadlock path caused by recursive locking is reported.
>>> This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2:
>>> take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
>>> have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
>>> taking inode lock at vfs entry points"). Yes, we intend to fix this
>>> kind of case in incremental way, because it's hard to find out all
>>> possible paths at once.
>>>
>>> This one can be reproduced like this. On node1, cp a large file from
>>> home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
>>> Both nodes will hang up there. The backtraces:
>>>
>>> On node1:
>>> [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
>>> [] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
>>> [] generic_perform_write+0xa9/0x180
>>> [] __generic_file_write_iter+0x1aa/0x1d0
>>> [] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
>>> [] __vfs_write+0xc3/0x130
>>> [] vfs_write+0xb1/0x1a0
>>> [] SyS_write+0x46/0xa0
>>>
>>> On node2:
>>> [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
>>> [] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
>>> [] ocfs2_set_acl+0x22d/0x260 [ocfs2]
>>> [] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
>>> [] set_posix_acl+0x75/0xb0
>>> [] posix_acl_xattr_set+0x49/0xa0
>>> [] __vfs_setxattr+0x69/0x80
>>> [] __vfs_setxattr_noperm+0x72/0x1a0
>>> [] vfs_setxattr+0xa7/0xb0
>>> [] setxattr+0x12d/0x190
>>> [] path_setxattr+0x9f/0xb0
>>> [] SyS_setxattr+0x14/0x20
>>>
>>> Fixes this one by using ocfs2_inode_{lock|unlock}_tracker, which is
>>> exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking
>>> logic to avoid recursive cluster lock").
>>>
>>> Changes since v1:
>>> - Revised git commit description style in commit log.
>>>
>>> Reported-by: Thomas Voegtle <t...@lio96.de>
>>> Tested-by: Thomas Voegtle <t...@lio96.de>
>>> Signed-off-by: Eric Ren <z...@suse.com>
>>> ---
>>>   fs/ocfs2/dlmglue.c |  4 
>>>   fs/ocfs2/xattr.c   | 23 +--
>>>   2 files changed, 17 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 3b7c937..4689940 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
>>> struct ocfs2_lock_res *lockres;
>>>   
>>> lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>> +   /* had_lock means that the current process already took the cluster
>>> +* lock previously. If had_lock is 1, we have nothing to do here, and
>>> +* it will get unlocked where we got the lock.
>>> +*/
>>> if (!had_lock) {
>>> ocfs2_remove_holder(lockres, oh);
>>> ocfs2_inode_unlock(inode, ex);
>>> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
>>> index 3c5384d..f70c377 100644
>>> --- a/fs/ocfs2/xattr.c
>>> +++ b/fs/ocfs2/xattr.c
>>> @@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
>>>void *buffer,
>>>size_t buffer_size)
>>>   {
>>> -   int ret;
>>> +   int ret, had_lock;
>>> struct buffer_head *di_bh = NULL;
>>> +   struct ocfs2_lock_holder oh;
>>>   
>>> -   ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> -   if (ret < 0) {
>>> -   mlog_errno(ret);
>>> -   return ret;
>>> +   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);

[Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-21 Thread Eric Ren
Another deadlock path caused by recursive locking is reported.
This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2:
take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
taking inode lock at vfs entry points"). Yes, we intend to fix this
kind of case in incremental way, because it's hard to find out all
possible paths at once.

This one can be reproduced like this. On node1, cp a large file from
home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
Both nodes will hang up there. The backtraces:

On node1:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
[] generic_perform_write+0xa9/0x180
[] __generic_file_write_iter+0x1aa/0x1d0
[] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
[] __vfs_write+0xc3/0x130
[] vfs_write+0xb1/0x1a0
[] SyS_write+0x46/0xa0

On node2:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
[] ocfs2_set_acl+0x22d/0x260 [ocfs2]
[] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
[] set_posix_acl+0x75/0xb0
[] posix_acl_xattr_set+0x49/0xa0
[] __vfs_setxattr+0x69/0x80
[] __vfs_setxattr_noperm+0x72/0x1a0
[] vfs_setxattr+0xa7/0xb0
[] setxattr+0x12d/0x190
[] path_setxattr+0x9f/0xb0
[] SyS_setxattr+0x14/0x20

Fixes this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking
logic to avoid recursive cluster lock").

Changes since v1:
- Revised git commit description style in commit log.

Reported-by: Thomas Voegtle <t...@lio96.de>
Tested-by: Thomas Voegtle <t...@lio96.de>
Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c |  4 
 fs/ocfs2/xattr.c   | 23 +--
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 3b7c937..4689940 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
struct ocfs2_lock_res *lockres;
 
lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   /* had_lock means that the current process already took the cluster
+* lock previously. If had_lock is 1, we have nothing to do here, and
+* it will get unlocked where we got the lock.
+*/
if (!had_lock) {
ocfs2_remove_holder(lockres, oh);
ocfs2_inode_unlock(inode, ex);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3c5384d..f70c377 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
   void *buffer,
   size_t buffer_size)
 {
-   int ret;
+   int ret, had_lock;
struct buffer_head *di_bh = NULL;
+   struct ocfs2_lock_holder oh;
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   mlog_errno(ret);
-   return ret;
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0) {
+   mlog_errno(had_lock);
+   return had_lock;
}
down_read(&OCFS2_I(inode)->ip_xattr_sem);
ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
 name, buffer, buffer_size);
up_read(&OCFS2_I(inode)->ip_xattr_sem);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
 
brelse(di_bh);
 
@@ -3537,11 +3538,12 @@ int ocfs2_xattr_set(struct inode *inode,
 {
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
-   int ret, credits, ref_meta = 0, ref_credits = 0;
+   int ret, credits, had_lock, ref_meta = 0, ref_credits = 0;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct inode *tl_inode = osb->osb_tl_inode;
struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, NULL, };
struct ocfs2_refcount_tree *ref_tree = NULL;
+   struct ocfs2_lock_holder oh;
 
struct ocfs2_xattr_info xi = {
.xi_name_index = name_index,
@@ -3572,8 +3574,9 @@ int ocfs2_xattr_set(struct inode *inode,
return -ENOMEM;
}
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 1);
-   if (ret < 0) {
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 1, &oh);
+   if (had_lock < 0) {
+   ret = had_lock;
mlog_errno(ret);
goto cleanup_nolock;
}
@@ -3670,7 +3673,7 @@ int ocfs2_xattr_set(struct inode *inode,
if (ret)
mlog_errno(ret);
}
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
 clea

Re: [Ocfs2-devel] [PATCH] ocfs2: get rid of ocfs2_is_o2cb_active function

2017-06-21 Thread Eric Ren
On 05/22/17 16:17, Gang He wrote:
> This patch gets rid of the ocfs2_is_o2cb_active() function. Why?
> First, we already have similar functions to identify which cluster
> stack is being used, via osb->osb_cluster_stack. Second, the current
> implementation of ocfs2_is_o2cb_active() is not totally safe:
> based on the design of stackglue, we need to take the ocfs2_stack_lock
> before using ocfs2_stack related data structures, and the
> active_stack pointer can be NULL in case of mount failure.
>
> Signed-off-by: Gang He <g...@suse.com>
Looks good.
Reviewed-by: Eric Ren <z...@suse.com>

Eric

> ---
>   fs/ocfs2/dlmglue.c   | 2 +-
>   fs/ocfs2/stackglue.c | 6 --
>   fs/ocfs2/stackglue.h | 3 ---
>   3 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 3b7c937..a54196a 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -3409,7 +3409,7 @@ static int ocfs2_downconvert_lock(struct ocfs2_super 
> *osb,
>* we can recover correctly from node failure. Otherwise, we may get
>* invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
>*/
> - if (!ocfs2_is_o2cb_active() &&
> + if (ocfs2_userspace_stack(osb) &&
>   lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
>   lvb = 1;
>   
> diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
> index 8203590..52c07346b 100644
> --- a/fs/ocfs2/stackglue.c
> +++ b/fs/ocfs2/stackglue.c
> @@ -48,12 +48,6 @@
>*/
>   static struct ocfs2_stack_plugin *active_stack;
>   
> -inline int ocfs2_is_o2cb_active(void)
> -{
> - return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
> -}
> -EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
> -
>   static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
>   {
>   struct ocfs2_stack_plugin *p;
> diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
> index e3036e1..f2dce10 100644
> --- a/fs/ocfs2/stackglue.h
> +++ b/fs/ocfs2/stackglue.h
> @@ -298,9 +298,6 @@ int ocfs2_plock(struct ocfs2_cluster_connection *conn, 
> u64 ino,
>   int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
>   void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
>   
> -/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
> -int ocfs2_is_o2cb_active(void);
> -
>   extern struct kset *ocfs2_kset;
>   
>   #endif  /* STACKGLUE_H */





[Ocfs2-devel] [PATCH] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-21 Thread Eric Ren
Another deadlock path caused by recursive locking has been reported.
This kind of issue was introduced by commit 743b5f1434f5 ("ocfs2:
take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
have already been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue
when taking inode lock at vfs entry points"). Yes, we intend to fix this
kind of case in an incremental way, because it's hard to find out all
possible paths at once.

This one can be reproduced like this. On node1, cp a large file from
home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
Both nodes will hang up there. The backtraces:

On node1:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
[] generic_perform_write+0xa9/0x180
[] __generic_file_write_iter+0x1aa/0x1d0
[] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
[] __vfs_write+0xc3/0x130
[] vfs_write+0xb1/0x1a0
[] SyS_write+0x46/0xa0

On node2:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
[] ocfs2_set_acl+0x22d/0x260 [ocfs2]
[] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
[] set_posix_acl+0x75/0xb0
[] posix_acl_xattr_set+0x49/0xa0
[] __vfs_setxattr+0x69/0x80
[] __vfs_setxattr_noperm+0x72/0x1a0
[] vfs_setxattr+0xa7/0xb0
[] setxattr+0x12d/0x190
[] path_setxattr+0x9f/0xb0
[] SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which was
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to
avoid recursive cluster lock").

Reported-by: Thomas Voegtle <t...@lio96.de>
Tested-by: Thomas Voegtle <t...@lio96.de>
Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c |  4 ++++
 fs/ocfs2/xattr.c   | 23 +++++++++++++----------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 3b7c937..4689940 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
struct ocfs2_lock_res *lockres;
 
	lockres = &OCFS2_I(inode)->ip_inode_lockres;
+	/* had_lock means that the current process already took the cluster
+	 * lock previously. If had_lock is 1, we have nothing to do here, and
+	 * it will get unlocked where we got the lock.
+	 */
if (!had_lock) {
ocfs2_remove_holder(lockres, oh);
ocfs2_inode_unlock(inode, ex);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3c5384d..f70c377 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
   void *buffer,
   size_t buffer_size)
 {
-   int ret;
+   int ret, had_lock;
struct buffer_head *di_bh = NULL;
+   struct ocfs2_lock_holder oh;
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   mlog_errno(ret);
-   return ret;
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0) {
+   mlog_errno(had_lock);
+   return had_lock;
}
	down_read(&OCFS2_I(inode)->ip_xattr_sem);
ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
 name, buffer, buffer_size);
	up_read(&OCFS2_I(inode)->ip_xattr_sem);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
 
brelse(di_bh);
 
@@ -3537,11 +3538,12 @@ int ocfs2_xattr_set(struct inode *inode,
 {
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
-   int ret, credits, ref_meta = 0, ref_credits = 0;
+   int ret, credits, had_lock, ref_meta = 0, ref_credits = 0;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct inode *tl_inode = osb->osb_tl_inode;
struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, NULL, };
struct ocfs2_refcount_tree *ref_tree = NULL;
+   struct ocfs2_lock_holder oh;
 
struct ocfs2_xattr_info xi = {
.xi_name_index = name_index,
@@ -3572,8 +3574,9 @@ int ocfs2_xattr_set(struct inode *inode,
return -ENOMEM;
}
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 1);
-   if (ret < 0) {
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 1, &oh);
+   if (had_lock < 0) {
+   ret = had_lock;
mlog_errno(ret);
goto cleanup_nolock;
}
@@ -3670,7 +3673,7 @@ int ocfs2_xattr_set(struct inode *inode,
if (ret)
mlog_errno(ret);
}
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
 cleanup_nolock:
brelse(di_bh);
brelse(xbs.xattr_bh);
-- 
2.10.2


__

Re: [Ocfs2-devel] deadlock with setfacl

2017-06-21 Thread Eric Ren

Hi Thomas,

I'm attaching a patch for the issue you reported. I've tested myself.
Could you please also try it out?

If it's OK, I'll submit a formal patch later.

Thanks,
Eric

On 06/20/2017 04:38 PM, Eric Ren wrote:

Hi!

Thanks for reporting! I will get to this issue quickly.

Eric

Sent from my iPhone


On 20 Jun 2017, at 16:02, Thomas Voegtle <t...@lio96.de> wrote:


Hello,


We see a deadlock with setfacl on 4.4.70 and on 4.12-rc5, too.

node1: copies a big file from /home/user to the ocfs2 mountpoint
node2: runs setfacl on that file in the ocfs2 mountpoint while cp still running
=> both jobs never end.


When we revert
743b5f1434f57a147226c747fe228cadeb7b05ed ocfs2: take inode lock in
ocfs2_iop_set/get_acl()
and the other two follow-up fixes (5ee0fbd50fdf1c132 and b891fa5024a95c77)
we see no deadlock anymore.

commit b891fa5024a95c77 fixed it for getacl (we can confirm this) but not
for setacl, as we encounter?

Reference:
https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html

Thanks,

Thomas



This gets printed in the dmesg on node1:

[  484.345226] INFO: task cp:10633 blocked for more than 120 seconds.
[  484.345230]   Not tainted 4.12.0-rc5 #1
[  484.345230] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  484.345232] cp  D0 10633   5594 0x
[  484.345235] Call Trace:
[  484.345295]  __schedule+0x2e8/0x5f7
[  484.345298]  schedule+0x35/0x80
[  484.345300]  schedule_timeout+0x1a7/0x230
[  484.345326]  ? check_preempt_curr+0x61/0x90
[  484.345358]  ? ocfs2_control_read+0x60/0x60 [ocfs2_stack_user]
[  484.345360]  wait_for_completion+0x9b/0x100
[  484.345361]  ? try_to_wake_up+0x250/0x250
[  484.345447]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
[  484.345463]  ? radix_tree_tag_set+0x7e/0xf0
[  484.345475]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
[  484.345486]  ? ocfs2_wake_downconvert_thread+0x4d/0x60 [ocfs2]
[  484.345497]  ocfs2_write_begin+0x4a/0x190 [ocfs2]
[  484.345509]  generic_perform_write+0xa7/0x190
[  484.345516]  __generic_file_write_iter+0x191/0x1e0
[  484.345528]  ocfs2_file_write_iter+0x1a5/0x490 [ocfs2]
[  484.345541]  ? ext4_file_read_iter+0xae/0xf0
[  484.345550]  new_sync_write+0xc0/0x100
[  484.345552]  __vfs_write+0x27/0x40
[  484.345553]  vfs_write+0xc4/0x1b0
[  484.34]  SyS_write+0x4a/0xa0
[  484.345561]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  484.345563] RIP: 0033:0x7fb5111a8150
[  484.345564] RSP: 002b:7fff125140d8 EFLAGS: 0246 ORIG_RAX: 
0001
[  484.345566] RAX: ffda RBX: 0001 RCX: 7fb5111a8150
[  484.345567] RDX: 0002 RSI: 7fb511c3f000 RDI: 0004
[  484.345678] RBP: 7fff125141d0 R08:  R09: 7fff12515c82
[  484.345680] R10: 7fff12513e70 R11: 0246 R12: 004030b0
[  484.345681] R13: 7fff12514ca0 R14:  R15: 


This gets printed in the dmesg on node2:

[  484.483726] INFO: task setfacl:10279 blocked for more than 120 seconds.
[  484.483729]   Not tainted 4.12.0-rc5 #1
[  484.483730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  484.483731] setfacl D0 10279  10278 0x
[  484.483734] Call Trace:
[  484.483793]  __schedule+0x2e8/0x5f7
[  484.483797]  schedule+0x35/0x80
[  484.483799]  schedule_timeout+0x1a7/0x230
[  484.483825]  ? default_wake_function+0xd/0x10
[  484.483832]  ? autoremove_wake_function+0x11/0x40
[  484.483834]  ? __wake_up_common+0x4f/0x80
[  484.483835]  wait_for_completion+0x9b/0x100
[  484.483837]  ? try_to_wake_up+0x250/0x250
[  484.483973]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
[  484.483993]  ? radix_tree_lookup_slot+0x13/0x30
[  484.484005]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
[  484.484018]  ocfs2_xattr_set+0x143/0x740 [ocfs2]
[  484.484035]  ? jbd2_journal_cancel_revoke+0xbf/0xf0
[  484.484049]  ocfs2_set_acl+0x177/0x190 [ocfs2]
[  484.484061]  ? ocfs2_inode_lock_tracker+0xee/0x180 [ocfs2]
[  484.484074]  ocfs2_iop_set_acl+0x60/0xa0 [ocfs2]
[  484.484084]  set_posix_acl+0x84/0xc0
[  484.484090]  posix_acl_xattr_set+0x4c/0xb0
[  484.484099]  __vfs_setxattr+0x71/0x90
[  484.484102]  __vfs_setxattr_noperm+0x70/0x1b0
[  484.484104]  vfs_setxattr+0xae/0xb0
[  484.484106]  setxattr+0x160/0x190
[  484.484112]  ? strncpy_from_user+0x43/0x140
[  484.484118]  ? getname_flags.part.41+0x56/0x1c0
[  484.484121]  ? __mnt_want_write+0x4d/0x60
[  484.484123]  path_setxattr+0x85/0xb0
[  484.484125]  SyS_setxattr+0xf/0x20
[  484.484131]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  484.484133] RIP: 0033:0x7f203f2b23f9
[  484.484134] RSP: 002b:7ffd7d8585d8 EFLAGS: 0246 ORIG_RAX: 
00bc
[  484.484136] RAX: ffda RBX: 01d9d3a0 RCX: 7f203f2b23f9
[  484.484137] RDX: 01d9d5a0 RSI: 7f203f782b5f RDI: 7ffd7d858890
[  484.484137] RBP:  R08: 000

Re: [Ocfs2-devel] deadlock with setfacl

2017-06-20 Thread Eric Ren
Hi!

Thanks for reporting! I will get to this issue quickly.

Eric

Sent from my iPhone

> On 20 Jun 2017, at 16:02, Thomas Voegtle  wrote:
> 
> 
> Hello,
> 
> 
> We see a deadlock with setfacl on 4.4.70 and on 4.12-rc5, too.
> 
> node1: copies a big file from /home/user to the ocfs2 mountpoint
> node2: runs setfacl on that file in the ocfs2 mountpoint while cp still 
> running
> => both jobs never end.
> 
> 
> When we revert
> 743b5f1434f57a147226c747fe228cadeb7b05ed ocfs2: take inode lock in
> ocfs2_iop_set/get_acl()
> and the other two follow-up fixes (5ee0fbd50fdf1c132 and b891fa5024a95c77)
> we see no deadlock anymore.
> 
> commit b891fa5024a95c77 fixed it for getacl (we can confirm this) but not
> for setacl, as we encounter?
> 
> Reference:
> https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html
> 
> Thanks,
> 
> Thomas
> 
> 
> 
> This gets printed in the dmesg on node1:
> 
> [  484.345226] INFO: task cp:10633 blocked for more than 120 seconds.
> [  484.345230]   Not tainted 4.12.0-rc5 #1
> [  484.345230] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  484.345232] cp  D0 10633   5594 0x
> [  484.345235] Call Trace:
> [  484.345295]  __schedule+0x2e8/0x5f7
> [  484.345298]  schedule+0x35/0x80
> [  484.345300]  schedule_timeout+0x1a7/0x230
> [  484.345326]  ? check_preempt_curr+0x61/0x90
> [  484.345358]  ? ocfs2_control_read+0x60/0x60 [ocfs2_stack_user]
> [  484.345360]  wait_for_completion+0x9b/0x100
> [  484.345361]  ? try_to_wake_up+0x250/0x250
> [  484.345447]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
> [  484.345463]  ? radix_tree_tag_set+0x7e/0xf0
> [  484.345475]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
> [  484.345486]  ? ocfs2_wake_downconvert_thread+0x4d/0x60 [ocfs2]
> [  484.345497]  ocfs2_write_begin+0x4a/0x190 [ocfs2]
> [  484.345509]  generic_perform_write+0xa7/0x190
> [  484.345516]  __generic_file_write_iter+0x191/0x1e0
> [  484.345528]  ocfs2_file_write_iter+0x1a5/0x490 [ocfs2]
> [  484.345541]  ? ext4_file_read_iter+0xae/0xf0
> [  484.345550]  new_sync_write+0xc0/0x100
> [  484.345552]  __vfs_write+0x27/0x40
> [  484.345553]  vfs_write+0xc4/0x1b0
> [  484.34]  SyS_write+0x4a/0xa0
> [  484.345561]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [  484.345563] RIP: 0033:0x7fb5111a8150
> [  484.345564] RSP: 002b:7fff125140d8 EFLAGS: 0246 ORIG_RAX: 
> 0001
> [  484.345566] RAX: ffda RBX: 0001 RCX: 
> 7fb5111a8150
> [  484.345567] RDX: 0002 RSI: 7fb511c3f000 RDI: 
> 0004
> [  484.345678] RBP: 7fff125141d0 R08:  R09: 
> 7fff12515c82
> [  484.345680] R10: 7fff12513e70 R11: 0246 R12: 
> 004030b0
> [  484.345681] R13: 7fff12514ca0 R14:  R15: 
> 
> 
> 
> This gets printed in the dmesg on node2:
> 
> [  484.483726] INFO: task setfacl:10279 blocked for more than 120 seconds.
> [  484.483729]   Not tainted 4.12.0-rc5 #1
> [  484.483730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  484.483731] setfacl D0 10279  10278 0x
> [  484.483734] Call Trace:
> [  484.483793]  __schedule+0x2e8/0x5f7
> [  484.483797]  schedule+0x35/0x80
> [  484.483799]  schedule_timeout+0x1a7/0x230
> [  484.483825]  ? default_wake_function+0xd/0x10
> [  484.483832]  ? autoremove_wake_function+0x11/0x40
> [  484.483834]  ? __wake_up_common+0x4f/0x80
> [  484.483835]  wait_for_completion+0x9b/0x100
> [  484.483837]  ? try_to_wake_up+0x250/0x250
> [  484.483973]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
> [  484.483993]  ? radix_tree_lookup_slot+0x13/0x30
> [  484.484005]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
> [  484.484018]  ocfs2_xattr_set+0x143/0x740 [ocfs2]
> [  484.484035]  ? jbd2_journal_cancel_revoke+0xbf/0xf0
> [  484.484049]  ocfs2_set_acl+0x177/0x190 [ocfs2]
> [  484.484061]  ? ocfs2_inode_lock_tracker+0xee/0x180 [ocfs2]
> [  484.484074]  ocfs2_iop_set_acl+0x60/0xa0 [ocfs2]
> [  484.484084]  set_posix_acl+0x84/0xc0
> [  484.484090]  posix_acl_xattr_set+0x4c/0xb0
> [  484.484099]  __vfs_setxattr+0x71/0x90
> [  484.484102]  __vfs_setxattr_noperm+0x70/0x1b0
> [  484.484104]  vfs_setxattr+0xae/0xb0
> [  484.484106]  setxattr+0x160/0x190
> [  484.484112]  ? strncpy_from_user+0x43/0x140
> [  484.484118]  ? getname_flags.part.41+0x56/0x1c0
> [  484.484121]  ? __mnt_want_write+0x4d/0x60
> [  484.484123]  path_setxattr+0x85/0xb0
> [  484.484125]  SyS_setxattr+0xf/0x20
> [  484.484131]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [  484.484133] RIP: 0033:0x7f203f2b23f9
> [  484.484134] RSP: 002b:7ffd7d8585d8 EFLAGS: 0246 ORIG_RAX: 
> 00bc
> [  484.484136] RAX: ffda RBX: 01d9d3a0 RCX: 
> 7f203f2b23f9
> [  484.484137] RDX: 01d9d5a0 RSI: 7f203f782b5f RDI: 
> 7ffd7d858890
> [  484.484137] RBP:  R08:  R09: 

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io. review

2017-06-02 Thread Eric Ren
Hi Guanghui,

Please organize your mail better. It looks really messy! Rework your 
mail by asking yourself questions like:

- What is the problem you are facing?
Looks like a BUG_ON() is triggered, but which BUG_ON()? What is the 
backtrace? How can this be reproduced?

- What help or answer do you hope for?
You didn't ask any question below!


On 05/26/2017 11:46 AM, Zhangguanghui wrote:
> This patch replace that function ocfs2_direct_IO_get_blocks with
Which patch? Don't analyze code before telling the problem.

>
> this function ocfs2_get_blocks  in ocfs2_direct_IO, and remove the  
> ip_alloc_sem.
>
> but i think ip_alloc_sem is still needed because protect  allocation changes 
> is very correct.
"still needed" - so, which commit dropped it?

>
> Now, BUG_ON have been tiggered  in the process of testing direct-io.
>
> Comments and questions are, as always, welcome. Thanks

Comments on what?

>
>
> As wangww631 described
A mail thread link is useful for people to know the discussion and background.

>
> In ocfs2, ip_alloc_sem is used to protect allocation changes on the node.
> In direct IO, we add ip_alloc_sem to protect date consistent between
> direct-io and ocfs2_truncate_file race (buffer io use ip_alloc_sem
> already).  Although inode->i_mutex lock is used to avoid concurrency of
> above situation, i think ip_alloc_sem is still needed because protect
> allocation changes is significant.
>
> Other filesystem like ext4 also uses rw_semaphore to protect data
> consistent between get_block-vs-truncate race by other means, So
> ip_alloc_sem in ocfs2 direct io is needed.
>
>
> Date: Fri, 11 Sep 2015 16:19:18 +0800
> From: Ryan Ding 
> Subject: [Ocfs2-devel] [PATCH 7/8] ocfs2: fix sparse file & data
>  orderingissue in direct io.
Your email subject is almost the same as this patch's, which brings confusion...

> To: ocfs2-devel@oss.oracle.com
> Cc: mfas...@suse.de
> Message-ID: <1441959559-29947-8-git-send-email-ryan.d...@oracle.com>
Don't copy & paste the patch content; it only makes your mail too long and 
scares readers away.

Eric

>
> There are mainly 3 issue in the direct io code path after commit 24c40b329e03 
> ("ocfs2: implement ocfs2_direct_IO_write"):
>* Do not support sparse file.
>* Do not support data ordering. eg: when write to a file hole, it will 
> alloc
>  extent first. If system crashed before io finished, data will corrupt.
>* Potential risk when doing aio+dio. The -EIOCBQUEUED return value is 
> likely
>  to be ignored by ocfs2_direct_IO_write().
>
> To resolve above problems, re-design direct io code with following ideas:
>* Use buffer io to fill in holes. And this will make better performance 
> also.
>* Clear unwritten after direct write finished. So we can make sure meta 
> data
>  changes after data write to disk. (Unwritten extent is invisible to user,
>  from user's view, meta data is not changed when allocate an unwritten
>  extent.)
>* Clear ocfs2_direct_IO_write(). Do all ending work in end_io.
>
> This patch has passed 
> fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4
> test cases of ltp.
>
> Signed-off-by: Ryan Ding 
> Reviewed-by: Junxiao Bi 
> cc: Joseph Qi 
> ---
>   fs/ocfs2/aops.c |  851 ++-
>   1 files changed, 342 insertions(+), 509 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index b4ec600..4bb9921 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -499,152 +499,6 @@ bail:
>  return status;
>   }
>
> -/*
> - * TODO: Make this into a generic get_blocks function.
> - *
> - * From do_direct_io in direct-io.c:
> - *  "So what we do is to permit the ->get_blocks function to populate
> - *   bh.b_size with the size of IO which is permitted at this offset and
> - *   this i_blkbits."
> - *
> - * This function is called directly from get_more_blocks in direct-io.c.
> - *
> - * called like this: dio->get_blocks(dio->inode, fs_startblk,
> - * fs_count, map_bh, dio->rw == WRITE);
> - */
> -static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock,
> -struct buffer_head *bh_result, int 
> create)
> -{
> -   int ret;
> -   u32 cpos = 0;
> -   int alloc_locked = 0;
> -   u64 p_blkno, inode_blocks, contig_blocks;
> -   unsigned int ext_flags;
> -   unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
> -   unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
> -   unsigned long len = bh_result->b_size;
> -   unsigned int clusters_to_alloc = 0, contig_clusters = 0;
> -
> -   cpos = ocfs2_blocks_to_clusters(inode->i_sb, iblock);
> -
> -   /* This function won't even be called if the request isn't all
> -* nicely aligned and of the right size, so there's no need
> -* for us 

Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a static checker warning

2017-05-24 Thread Eric Ren
On 05/23/2017 01:17 PM, Gang He wrote:
> This patch will fix a static code checker warning, which looks
> like below,
> fs/ocfs2/inode.c:179 ocfs2_iget()
> warn: passing zero to 'ERR_PTR'
>
> this warning was caused by the
> commit d56a8f32e4c6 ("ocfs2: check/fix inode block for online file check").
> after apply this patch, the error return value will not be NULL(zero).
>
> Signed-off-by: Gang He <g...@suse.com>
Looks good to me.

Reviewed-by: Eric Ren <z...@suse.com>
> ---
>   fs/ocfs2/inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 382401d..1a1e007 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -136,7 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 blkno)
>   struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>int sysfile_type)
>   {
> - int rc = 0;
> + int rc = -ESTALE;
>   struct inode *inode = NULL;
>   struct super_block *sb = osb->sb;
>   struct ocfs2_find_inode_args args;





[Ocfs2-devel] Fwd: OCFS2 test report against kernel 4.11.0-rc8-2.g540c429-vanilla

2017-05-03 Thread Eric Ren

FYI,


The testing result is good. Refer to the attached testing logs for more 
information.


run-dev-test
*BUILD SUCCESS*
Build URL   http://147.2.207.231:8080/job/zren-testing/job/run-dev-test/81/
Work Space  
http://147.2.207.231:8080/job/zren-testing/job/run-dev-test//ws/81
Build Log   
http://147.2.207.231:8080/job/zren-testing/job/run-dev-test/81//console
Project:run-dev-test
Date of build:  Wed, 03 May 2017 09:38:08 +0800
Build duration: 17 hr
Build cause:Started by user eric
Build description:  
Built on:   HA-236


 Health Report

W   Description Score
Test Result: 0 tests failing out of a total of 38 tests.100
Build stability: No recent builds failed.   100


 Tests Reports




   Test Trend

[Test result trend chart]


   JUnit Tests

Package Failed  Passed  Skipped Total
DiscontigBgMultiNode0   4   0   *4*
DiscontigBgSingleNode   0   5   0   *5*
MultipleNodes   0   9   1   *10*
SingleNode  0   18  1   *19*


 Changes



No Changes




single_run.log
Description: Binary data


multiple-run-x86_64-2017-05-03-12-47-17.log
Description: Binary data

Re: [Ocfs2-devel] [PATCH 02/17] Single Run: kernel building is little broken now

2017-03-13 Thread Eric Ren
Hi Junxiao,

On 03/13/2017 04:12 PM, Junxiao Bi wrote:
> On 12/13/2016 01:29 PM, Eric Ren wrote:
>> Only check kernel source if we specify "buildkernel" test case.
>> The original kernel source web-link cannot be reached,
>> so give a new link instead but the md5sum check is missing
>> now.
>>
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   programs/python_common/single_run-WIP.sh | 56 ++++++++++++++++++++++++++++----------------------------
>>   1 file changed, 28 insertions(+), 28 deletions(-)
>>
>> diff --git a/programs/python_common/single_run-WIP.sh b/programs/python_common/single_run-WIP.sh
>> index fe0056c..61008d8 100755
>> --- a/programs/python_common/single_run-WIP.sh
>> +++ b/programs/python_common/single_run-WIP.sh
>> @@ -20,9 +20,9 @@ WGET=`which wget`
>>   WHOAMI=`which whoami`
>>   SED=`which sed`
>>   
>> -DWNLD_PATH="http://oss.oracle.com/~smushran/ocfs2-test;
>> -KERNEL_TARBALL="linux-kernel.tar.gz"
>> -KERNEL_TARBALL_CHECK="${KERNEL_TARBALL}.md5sum"
>> +DWNLD_PATH="https://cdn.kernel.org/pub/linux/kernel/v3.x/;
>> +KERNEL_TARBALL="linux-3.2.80.tar.xz"
>> +#KERNEL_TARBALL_CHECK="${KERNEL_TARBALL}.md5sum"
> Can we compute the md5sum manually and put it here?

OK.

Thanks for your review.

Thanks,
Eric
>
> Thanks,
> Junxiao.
>
>>   USERID=`${WHOAMI}`
>>   
>>   DEBUGFS_BIN="${SUDO} `which debugfs.ocfs2`"
>> @@ -85,7 +85,7 @@ get_bits()
>>   # get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
>> $KERNEL_TARBALL_CHECK
>>   get_kernel_source()
>>   {
>> -if [ "$#" -lt "4" ]; then
>> +if [ "$#" -lt "3" ]; then
>>  ${ECHO} "Error in get_kernel_source()"
>>  exit 1
>>  fi
>> @@ -93,18 +93,18 @@ get_kernel_source()
>>  logdir=$1
>>  dwnld_path=$2
>>  kernel_tarball=$3
>> -kernel_tarball_check=$4
>> +#kernel_tarball_check=$4
>>   
>>  cd ${logdir}
>>   
>>  outlog=get_kernel_source.log
>>   
>> -${WGET} -o ${outlog} ${dwnld_path}/${kernel_tarball_check}
>> -if [ $? -ne 0 ]; then
>> -${ECHO} "ERROR downloading 
>> ${dwnld_path}/${kernel_tarball_check}"
>> -cd -
>> -exit 1
>> -fi
>> +#   ${WGET} -o ${outlog} ${dwnld_path}/${kernel_tarball_check}
>> +#   if [ $? -ne 0 ]; then
>> +#   ${ECHO} "ERROR downloading 
>> ${dwnld_path}/${kernel_tarball_check}"
>> +#   cd -
>> +#   exit 1
>> +#   fi
>>   
>>  ${WGET} -a ${outlog} ${dwnld_path}/${kernel_tarball}
>>  if [ $? -ne 0 ]; then
>> @@ -113,13 +113,13 @@ get_kernel_source()
>>  exit 1
>>  fi
>>   
>> -${MD5SUM} -c ${kernel_tarball_check} >>${outlog} 2>&1
>> -if [ $? -ne 0 ]; then
>> -${ECHO} "ERROR ${kernel_tarball_check} check failed"
>> -cd -
>> -exit 1
>> -fi
>> -cd -
>> +#   ${MD5SUM} -c ${kernel_tarball_check} >>${outlog} 2>&1
>> +#   if [ $? -ne 0 ]; then
>> +#   ${ECHO} "ERROR ${kernel_tarball_check} check failed"
>> +#   cd -
>> +#   exit 1
>> +#   fi
>> +#   cd -
>>   }
>>   
>>   # do_format() ${BLOCKSIZE} ${CLUSTERSIZE} ${FEATURES} ${DEVICE}
>> @@ -1012,16 +1012,6 @@ LOGFILE=${LOGDIR}/single_run.log
>>   
>>   do_mkdir ${LOGDIR}
>>   
>> -if [ -z ${KERNELSRC} ]; then
>> -get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
>> $KERNEL_TARBALL_CHECK
>> -KERNELSRC=${LOGDIR}/${KERNEL_TARBALL}
>> -fi
>> -
>> -if [ ! -f ${KERNELSRC} ]; then
>> -${ECHO} "No kernel source"
>> -usage
>> -fi
>> -
>>   STARTRUN=$(date +%s)
>>   log_message "*** Start Single Node test ***"
>>   
>> @@ -1058,6 +1048,16 @@ for tc in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; 
>> do
>>  fi
>>   
>>  if [ "$tc"X = "buildkernel"X -o "$tc"X = "all"X ];then
>> +if [ -z ${KERNELSRC} ]; then
>> +get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
>> #$KERNEL_TARBALL_CHECK
>> +KERNELSRC=${LOGDIR}/${KERNEL_TARBALL}
>> +fi
>> +
>> +if [ ! -f ${KERNELSRC} ]; then
>> +${ECHO} "No kernel source"
>> +usage
>> +fi
>> +
>>  run_buildkernel ${LOGDIR} ${DEVICE} ${MOUNTPOINT} ${KERNELSRC}
>>  fi
>>   
>>
>




Re: [Ocfs2-devel] [PATCH v3] ocfs2/dlm: Optimization of code while free dead node locks.

2017-01-18 Thread Eric Ren
Hi,

On 01/17/2017 07:22 PM, Guozhonghua wrote:
> Three loops can be optimized into one and its sub loops, so as small code can 
> do the same work.  ===> (1)
>
>  From 8a1e682503f4e5a5299fe8316cbf559f9b9701f1 Mon Sep 17 00:00:00 2001
> From: Guozhonghua <guozhong...@h3c.com>
> Date: Fri, 13 Jan 2017 11:27:32 +0800
> Subject: [PATCH] Optimization of code while free dead locks, changed for
>   reviews.
>   
>===> (2)
>
> Signed-off-by: Guozhonghua <guozhong...@h3c.com>
The patch looks good to me, except some formatting issues:
1. The commit message at (1) should be placed at (2);
2. Change log is still missing;

I think it's not a big deal, though. The fix is quite simple. I hope your 
patch will have good formatting next time ;-)

Reviewed-by: Eric Ren <z...@suse.com>

Eric

> ---
>   fs/ocfs2/dlm/dlmrecovery.c |   39 ++++++++++++++-------------------------
>   1 file changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index dd5cb8b..93b71dd 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -2268,6 +2268,8 @@ static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
>   {
>  struct dlm_lock *lock, *next;
>  unsigned int freed = 0;
> +   struct list_head *queue = NULL;
> +   int i;
>
>  /* this node is the lockres master:
>   * 1) remove any stale locks for the dead node
> @@ -2280,31 +2282,18 @@ static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
>   * to force the DLM_UNLOCK_FREE_LOCK action so as to free the locks 
> */
>
>  /* TODO: check pending_asts, pending_basts here */
> -   list_for_each_entry_safe(lock, next, &res->granted, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->converting, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->blocked, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> +   for (i = DLM_GRANTED_LIST; i <= DLM_BLOCKED_LIST; i++) {
> +   queue = dlm_list_idx_to_ptr(res, i);
> +   list_for_each_entry_safe(lock, next, queue, list) {
> +   if (lock->ml.node == dead_node) {
> +   list_del_init(&lock->list);
> +   dlm_lock_put(lock);
> +   /* Can't schedule DLM_UNLOCK_FREE_LOCK
> +* do manually
> +*/
> +   dlm_lock_put(lock);
> +   freed++;
> +   }
>  }
>  }
>
> --
> 1.7.9.5
> -
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel




[Ocfs2-devel] [PATCH v4 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-17 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking the cluster inode lock at vfs entry points opens up
a self-deadlock window is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
    may_open
     inode_permission
      ocfs2_permission
       ocfs2_inode_lock() <=== take PR
        generic_permission
         get_acl
          ocfs2_iop_get_acl
           ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== take PR
        ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (introduced in the previous patch)
to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl() and
ocfs2_setattr().

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren <z...@suse.com>
Reviewed-by: Junxiao Bi <junxiao...@oracle.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
---
 fs/ocfs2/acl.c  | 29 +
 fs/ocfs2/file.c | 58 -
 2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
brelse(bh);
return status;
 }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
brelse(di_bh);
return acl;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the 

[Ocfs2-devel] [PATCH v4 0/2] fix deadlock caused by recursive cluster locking

2017-01-17 Thread Eric Ren
Hi Andrew,

This version of the patch set has been reviewed by Joseph Qi and Junxiao Bi.
I think it's good to be queued up now.

Thanks for all of you!
Eric

This is the formal patch set to solve the deadlock issue on which I
previously started an RFC (draft patch); the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor imposes a performance penalty on
the whole cluster locking system. Instead, it is only used in the places where
such recursive cluster locking may happen.
 
Changes since v1: 
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.
 
- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi. 
 
- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi. 
 
- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell <s...@canb.auug.org.au>.

Changes since v2: 
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Change since v3:
- Fixes redundant space, spotted by: Joseph Qi.
 
Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c |  29 +++
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/file.c|  58 ++---
 fs/ocfs2/ocfs2.h   |   1 +
 5 files changed, 179 insertions(+), 32 deletions(-)

-- 
2.10.2


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH v4 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-17 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has already been taken by
a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when a downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes,
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.
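The self-blocking nature of a non-reentrant lock - the core of the deadlock
above - can be reproduced in userspace. The Python sketch below is purely
illustrative (it is not ocfs2 code): a plain threading.Lock, like a cluster
lock level held without any holder tracking, cannot be re-acquired by its
current owner.

```python
import threading

def recursive_acquire_blocks():
    """Show that a non-reentrant lock self-blocks on a recursive acquire."""
    lock = threading.Lock()
    lock.acquire()                       # first acquire (think ocfs2_permission)
    # A second acquire by the same thread (think ocfs2_iop_get_acl) would
    # block forever; a timeout bounds the wait so the self-block is visible.
    got_again = lock.acquire(timeout=0.2)
    lock.release()
    return got_again

print(recursive_acquire_blocks())        # prints: False
```

In the kernel there is no timeout: the second __ocfs2_cluster_lock() waits on
a downconvert that can never finish, hence the deadlock.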

The idea to fix this issue is mostly taken from the gfs2 code:
1. introduce a new field, struct ocfs2_lock_res.l_holders, to
keep track of the pid of each process that has taken the cluster lock
on this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH;
it means just get the disk inode bh back for us if we've already got the
cluster lock;
3. export a helper, ocfs2_is_locked_by_me(), used to check whether we
have already got the cluster lock in the upper code path.

The tracking logic should be used by some of ocfs2's VFS callbacks,
to solve the recursive locking issue caused by the fact that VFS routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask: what if the first time we got a PR lock, and the second time
we want an EX lock? Fortunately, this case never happens in the real world,
as far as I can see - including permission check and (get|set)_(acl|attr) -
and the gfs2 code does the same.
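The holder-tracking idea in points 1 and 3 can be sketched in userspace. The
Python model below is hypothetical (LockRes and its methods loosely mirror
l_holders, ocfs2_is_locked_by_me() and the lock/unlock pattern, but none of
this is the kernel API): a per-lockres holder list keyed by task id lets a
caller detect that it already owns the lock and skip the second acquire.

```python
import threading

class LockRes:
    """Toy model of a lock resource with a holder list (not the kernel API)."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the cluster lock
        self._spin = threading.Lock()   # like lockres->l_lock
        self._holders = []              # like l_holders, storing task ids

    def is_locked_by_me(self):
        """Like ocfs2_is_locked_by_me(): is the current task a holder?"""
        with self._spin:
            return threading.get_ident() in self._holders

    def lock_tracker(self):
        """Return True ('had_lock') if already held by us; otherwise take
        the lock, record ourselves as a holder, and return False."""
        if self.is_locked_by_me():
            return True
        self._lock.acquire()
        with self._spin:
            self._holders.append(threading.get_ident())
        return False

    def unlock_tracker(self, had_lock):
        """Only drop the lock if this call level actually took it."""
        if had_lock:
            return
        with self._spin:
            self._holders.remove(threading.get_ident())
        self._lock.release()

res = LockRes()
outer = res.lock_tracker()      # False: we really took the lock
inner = res.lock_tracker()      # True: recursive call, no self-deadlock
res.unlock_tracker(inner)       # no-op, the outer frame still holds it
res.unlock_tracker(outer)       # actually releases
```

The design point is that only the outermost frame, the one that saw
had_lock == False, ever touches the real lock; inner frames are pure no-ops.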

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell <s...@canb.auug.org.au>.

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

Change since v3:
- Fixes redundant space, spotted by: Joseph Qi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren <z...@suse.com>
Reviewed-by: Junxiao Bi <junxiao...@oracle.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
---
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/ocfs2.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..8dce409 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
	init_waitqueue_head(&res->l_event);
	INIT_LIST_HEAD(&res->l_blocked_list);
	INIT_LIST_HEAD(&res->l_mask_waiters);
+	INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only used for checking recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+				    struct ocfs2_lock_holder *oh)
+{
+	INIT_LIST_HEAD(&oh->oh_list);
+	oh->oh_owner_pid = get_pid(task_pid(current));
+
+	spin_lock(&lockres->l_lock);
+	list_add_tail(&oh->oh_list, &lockres->l_holders);
+	spin_unlock(&lockres->l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+				       struct ocfs2_lock_holder *oh)
+{
+	spin_lock(&lockres->l_lock);
+	list_del(&oh->oh_list);
+	spin_unlock(&lockres->l_lock);
+
+	put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *o

Re: [Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-17 Thread Eric Ren
Hi!

On 01/17/2017 04:43 PM, Joseph Qi wrote:
> On 17/1/17 15:55, Eric Ren wrote:
>> Hi!
>>
>> On 01/17/2017 03:39 PM, Joseph Qi wrote:
>>>
>>> On 17/1/17 14:30, Eric Ren wrote:
>>>> We are in the situation that we have to avoid recursive cluster locking,
>>>> but there is no way to check if a cluster lock has been taken by a
>>>> precess already.
>>>>
>>>> Mostly, we can avoid recursive locking by writing code carefully.
>>>> However, we found that it's very hard to handle the routines that
>>>> are invoked directly by vfs code. For instance:
>>>>
>>>> const struct inode_operations ocfs2_file_iops = {
>>>>  .permission = ocfs2_permission,
>>>>  .get_acl= ocfs2_iop_get_acl,
>>>>  .set_acl= ocfs2_iop_set_acl,
>>>> };
>>>>
>>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>>>> do_sys_open
>>>>   may_open
>>>>inode_permission
>>>> ocfs2_permission
>>>>  ocfs2_inode_lock() <=== first time
>>>>   generic_permission
>>>>get_acl
>>>> ocfs2_iop_get_acl
>>>> ocfs2_inode_lock() <=== recursive one
>>>>
>>>> A deadlock will occur if a remote EX request comes in between two
>>>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>>>
>>>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>>>> on behalf of the remote EX lock request. Another hand, the recursive
>>>> cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
>>>> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
>>>> because there is no chance for the first cluster lock on this node to be
>>>> unlocked - we block ourselves in the code path.
>>>>
>>>> The idea to fix this issue is mostly taken from gfs2 code.
>>>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>>>> keep track of the processes' pid  who has taken the cluster lock
>>>> of this lock resource;
>>>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>>>> it means just getting back disk inode bh for us if we've got cluster lock.
>>>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>>>> have got the cluster lock in the upper code path.
>>>>
>>>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>>>> to solve the recursive locking issue cuased by the fact that vfs routines
>>>> can call into each other.
>>>>
>>>> The performance penalty of processing the holder list should only be seen
>>>> at a few cases where the tracking logic is used, such as get/set acl.
>>>>
>>>> You may ask what if the first time we got a PR lock, and the second time
>>>> we want a EX lock? fortunately, this case never happens in the real world,
>>>> as far as I can see, including permission check, (get|set)_(acl|attr), and
>>>> the gfs2 code also do so.
>>>>
>>>> Changes since v1:
>>>> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>>>> process gets the cluster lock - suggested by: Joseph Qi 
>>>> <jiangqi...@gmail.com>
>>>> and Junxiao Bi <junxiao...@oracle.com>.
>>>>
>>>> - Change "struct ocfs2_holder" to a more meaningful name 
>>>> "ocfs2_lock_holder",
>>>> suggested by: Junxiao Bi.
>>>>
>>>> - Do not inline functions whose bodies are not in scope, changed by:
>>>> Stephen Rothwell <s...@canb.auug.org.au>.
>>>>
>>>> Changes since v2:
>>>> - Wrap the tracking logic code of recursive locking into functions,
>>>> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
>>>> suggested by: Junxiao Bi.
>>>>
>>>> [s...@canb.auug.org.au remove some inlines]
>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>> ---
>>>>   fs/ocfs2/dlmglue.c | 105 
>>>> +++--
>>>>   fs/ocfs2/dlmglue.h |  18 +
>>>>   fs/ocfs2/ocfs2.h   |   1 +
>>>>   3 files changed, 121 insertions(+), 3 deletion

Re: [Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren
Hi!

On 01/17/2017 03:39 PM, Joseph Qi wrote:
>
> On 17/1/17 14:30, Eric Ren wrote:
>> We are in the situation that we have to avoid recursive cluster locking,
>> but there is no way to check if a cluster lock has been taken by a
>> precess already.
>>
>> Mostly, we can avoid recursive locking by writing code carefully.
>> However, we found that it's very hard to handle the routines that
>> are invoked directly by vfs code. For instance:
>>
>> const struct inode_operations ocfs2_file_iops = {
>>  .permission = ocfs2_permission,
>>  .get_acl= ocfs2_iop_get_acl,
>>  .set_acl= ocfs2_iop_set_acl,
>> };
>>
>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>> do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== first time
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>> ocfs2_inode_lock() <=== recursive one
>>
>> A deadlock will occur if a remote EX request comes in between two
>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>
>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>> on behalf of the remote EX lock request. Another hand, the recursive
>> cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
>> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
>> because there is no chance for the first cluster lock on this node to be
>> unlocked - we block ourselves in the code path.
>>
>> The idea to fix this issue is mostly taken from gfs2 code.
>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>> keep track of the processes' pid  who has taken the cluster lock
>> of this lock resource;
>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>> it means just getting back disk inode bh for us if we've got cluster lock.
>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>> have got the cluster lock in the upper code path.
>>
>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>> to solve the recursive locking issue cuased by the fact that vfs routines
>> can call into each other.
>>
>> The performance penalty of processing the holder list should only be seen
>> at a few cases where the tracking logic is used, such as get/set acl.
>>
>> You may ask what if the first time we got a PR lock, and the second time
>> we want a EX lock? fortunately, this case never happens in the real world,
>> as far as I can see, including permission check, (get|set)_(acl|attr), and
>> the gfs2 code also do so.
>>
>> Changes since v1:
>> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>> process gets the cluster lock - suggested by: Joseph Qi 
>> <jiangqi...@gmail.com>
>> and Junxiao Bi <junxiao...@oracle.com>.
>>
>> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
>> suggested by: Junxiao Bi.
>>
>> - Do not inline functions whose bodies are not in scope, changed by:
>> Stephen Rothwell <s...@canb.auug.org.au>.
>>
>> Changes since v2:
>> - Wrap the tracking logic code of recursive locking into functions,
>> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
>> suggested by: Junxiao Bi.
>>
>> [s...@canb.auug.org.au remove some inlines]
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/dlmglue.c | 105 
>> +++--
>>   fs/ocfs2/dlmglue.h |  18 +
>>   fs/ocfs2/ocfs2.h   |   1 +
>>   3 files changed, 121 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 77d1632..c75b9e9 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>>   init_waitqueue_head(>l_event);
>>   INIT_LIST_HEAD(>l_blocked_list);
>>   INIT_LIST_HEAD(>l_mask_waiters);
>> +INIT_LIST_HEAD(>l_holders);
>>   }
>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>> @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>   res->l_flags = 0UL;
>>   }
>>   +/*
>> + * Keep a list of processes who have interest in a lockres.
>> + * Note: this is now 

[Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has already been taken by
a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when a downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes,
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from the gfs2 code:
1. introduce a new field, struct ocfs2_lock_res.l_holders, to
keep track of the pid of each process that has taken the cluster lock
on this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH;
it means just get the disk inode bh back for us if we've already got the
cluster lock;
3. export a helper, ocfs2_is_locked_by_me(), used to check whether we
have already got the cluster lock in the upper code path.

The tracking logic should be used by some of ocfs2's VFS callbacks,
to solve the recursive locking issue caused by the fact that VFS routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask: what if the first time we got a PR lock, and the second time
we want an EX lock? Fortunately, this case never happens in the real world,
as far as I can see - including permission check and (get|set)_(acl|attr) -
and the gfs2 code does the same.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell <s...@canb.auug.org.au>.

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/ocfs2.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
	init_waitqueue_head(&res->l_event);
	INIT_LIST_HEAD(&res->l_blocked_list);
	INIT_LIST_HEAD(&res->l_mask_waiters);
+	INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only used for checking recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+				    struct ocfs2_lock_holder *oh)
+{
+	INIT_LIST_HEAD(&oh->oh_list);
+	oh->oh_owner_pid = get_pid(task_pid(current));
+
+	spin_lock(&lockres->l_lock);
+	list_add_tail(&oh->oh_list, &lockres->l_holders);
+	spin_unlock(&lockres->l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+				       struct ocfs2_lock_holder *oh)
+{
+	spin_lock(&lockres->l_lock);
+	list_del(&oh->oh_list);
+	spin_unlock(&lockres->l_lock);
+
+	put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);

[Ocfs2-devel] [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== take PR
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
chmod_common
 notify_change
  ocfs2_setattr <=== take EX
   posix_acl_chmod
get_acl
 ocfs2_iop_get_acl <=== take PR
ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (introduced in the previous patch)
to these functions: ocfs2_permission(), ocfs2_iop_[set|get]_acl(), and
ocfs2_setattr().
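The resulting entry-point shape - lock through the tracker, run the
operation, and unlock only when this frame actually took the lock - can be
modelled in userspace. The Python sketch below is purely illustrative
(guarded_entry and the state dict are hypothetical names, not kernel code,
and it ignores the PR/EX lock-level distinction):

```python
import threading

def guarded_entry(state, op):
    """Control-flow shape of the fixed VFS entry points: detect re-entry by
    the same thread, skip the second lock, and unlock only in the frame
    that actually locked (had_lock plays the tracker's return value)."""
    me = threading.get_ident()
    had_lock = state["holder"] == me     # like ocfs2_is_locked_by_me()
    if not had_lock:
        state["lock"].acquire()          # like ocfs2_inode_lock()
        state["holder"] = me
    try:
        return op()
    finally:
        if not had_lock:                 # only the outermost frame unlocks
            state["holder"] = None
            state["lock"].release()

state = {"lock": threading.Lock(), "holder": None}
# Nesting as in the fchmod path (setattr -> get_acl): this completes
# instead of deadlocking on the second acquire.
result = guarded_entry(state, lambda: guarded_entry(state, lambda: "acl"))
```

With a plain non-reentrant lock the inner call would block forever; here the
inner frame sees had_lock truthy, runs the operation, and leaves the unlock
to the outer frame.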

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/acl.c  | 29 +
 fs/ocfs2/file.c | 58 -
 2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0)
+   return had_lock;
	status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
brelse(bh);
return status;
 }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
brelse(di_bh);
return acl;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* locking issue.
+*
+* For instance:

[Ocfs2-devel] [PATCH v3 0/2] fix deadlock caused by recursive cluster locking

2017-01-16 Thread Eric Ren
This is the formal patch set to solve the deadlock issue on which I
previously started an RFC (draft patch); the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor imposes a performance penalty on
the whole cluster locking system. Instead, it is only used in the places where
such recursive cluster locking may happen.
 
Changes since v1: 
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.
 
- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi. 
 
- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi. 
 
- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell <s...@canb.auug.org.au>.

Changes since v2: 
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
 
Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c |  29 +++
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/file.c|  58 ++---
 fs/ocfs2/ocfs2.h   |   1 +
 5 files changed, 179 insertions(+), 32 deletions(-)

-- 
2.10.2


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
Hi!

On 01/16/2017 02:58 PM, Junxiao Bi wrote:
> On 01/16/2017 02:42 PM, Eric Ren wrote:
>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>> after the patch was merged. The discussion happened here
>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>
>> The reason why taking cluster inode lock at vfs entry points opens up
>> a self deadlock window, is explained in the previous patch of this
>> series.
>>
>> So far, we have seen two different code paths that have this issue.
>> 1. do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== take PR
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>>  ocfs2_inode_lock() <=== take PR
>> 2. fchmod|fchmodat
>>  chmod_common
>>   notify_change
>>ocfs2_setattr <=== take EX
>> posix_acl_chmod
>>  get_acl
>>   ocfs2_iop_get_acl <=== take PR
>>  ocfs2_iop_set_acl <=== take EX
>>
>> Fixes them by adding the tracking logic (in the previous patch) for
>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>> ocfs2_setattr().
>>
>> Changes since v1:
>> 1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>> process gets the cluster lock - suggested by: Joseph Qi 
>> <jiangqi...@gmail.com>
>> and Junxiao Bi <junxiao...@oracle.com>.
>>
>> 2. Change "struct ocfs2_holder" to a more meaningful name 
>> "ocfs2_lock_holder",
>> suggested by: Junxiao Bi <junxiao...@oracle.com>.
>>
>> 3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
>> catch exceptional cases, suggested by: Junxiao Bi <junxiao...@oracle.com>.
>>
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/acl.c  | 39 +
>>   fs/ocfs2/file.c | 76 
>> +
>>   2 files changed, 100 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..3e47262 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>> posix_acl *acl, int type)
>>   {
>>  struct buffer_head *bh = NULL;
>>  int status = 0;
>> -
>> -status = ocfs2_inode_lock(inode, , 1);
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_lock_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> +
>> +lockres = _I(inode)->ip_inode_lockres;
>> +has_locked = ocfs2_is_locked_by_me(lockres);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, , 1, arg_flags);
>>  if (status < 0) {
>>  if (status != -ENOENT)
>>  mlog_errno(status);
>>  return status;
>>  }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, );
>> +
> Same code pattern showed here and *get_acl, can it be abstracted to one
> function?
> The same issue for *setattr and *permission. Sorry for not mention that
> in last review.

Good idea! I will do it in the next version;-)

Thanks,
Eric

>
> Thanks,
> Junxiao.
>>  status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, );
>> +ocfs2_inode_unlock(inode, 1);
>> +}
>>  brelse(bh);
>> +
>>  return status;
>>   }
>>   
>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>  struct buffer_head *di_bh = NULL;
>>  struct posix_acl *acl;
>>  int ret;
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_lock_holder oh;
>> +struct ocfs2_lock_res *lockres;
>>   
>>  osb = OCFS2_SB(inode->i_sb);
>>  if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>  return NULL;
>> -ret = ocfs2_inode_lock(inode, _bh, 0);
>> +
>> +lockres = _I(inode)->ip_inode_lockres;
>> +has_locked = ocfs2_is_locked_by_me(lockres);
>> +if (has_locked)
>> +arg_flags = OC

[Ocfs2-devel] [PATCH v2 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-15 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has already been taken by
a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when a downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes,
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the pid of each process that has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back the disk inode bh for us if we've already got the
cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
in a few cases where the tracking logic is used, such as get/set acl.

You may ask: what if the first time we got a PR lock, and the second time
we want an EX lock? Fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr); and
the gfs2 code also does so.
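
For readers unfamiliar with the pattern, the holder-tracking idea above can be modeled in plain user-space C. This is only an illustrative sketch, not the kernel code: the singly linked list, pthread mutex, and integer pid are simplified stand-ins for l_holders, l_lock, and struct pid, and all names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct holder {
	int pid;                 /* stand-in for oh->oh_owner_pid */
	struct holder *next;
};

struct lock_res {
	pthread_mutex_t lock;    /* stand-in for lockres->l_lock */
	struct holder *holders;  /* stand-in for lockres->l_holders */
};

/* register the current "process" as a holder of the lock resource */
void add_holder(struct lock_res *res, struct holder *oh, int pid)
{
	oh->pid = pid;
	pthread_mutex_lock(&res->lock);
	oh->next = res->holders;
	res->holders = oh;
	pthread_mutex_unlock(&res->lock);
}

/* drop the holder record once the lock is released */
void remove_holder(struct lock_res *res, struct holder *oh)
{
	struct holder **pp;

	pthread_mutex_lock(&res->lock);
	for (pp = &res->holders; *pp; pp = &(*pp)->next) {
		if (*pp == oh) {
			*pp = oh->next;
			break;
		}
	}
	pthread_mutex_unlock(&res->lock);
}

/* the check the vfs entry points would use to detect a recursive lock */
int is_locked_by_me(struct lock_res *res, int pid)
{
	struct holder *oh;
	int found = 0;

	pthread_mutex_lock(&res->lock);
	for (oh = res->holders; oh != NULL; oh = oh->next) {
		if (oh->pid == pid) {
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&res->lock);
	return found;
}
```

The holder record lives on the caller's stack (as in the patch), so add/remove need no allocation; only list insertion and lookup happen under the spinlock-like mutex.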

Changes since v1:
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi <junxiao...@oracle.com>.

3. Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell <s...@canb.auug.org.au>.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c | 48 +---
 fs/ocfs2/dlmglue.h | 18 ++
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..b045f02 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,46 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_owner_pid = get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid) {
+   spin_unlock(&lockres->l_lock);
+   return 1;
+   }
+   }
+   spin_unlock(&lockres->l_lock);
+
+   return 0;
+}
+
 static inline void ocfs2_inc_

[Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== take PR
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
chmod_common
 notify_change
  ocfs2_setattr <=== take EX
   posix_acl_chmod
get_acl
 ocfs2_iop_get_acl <=== take PR
ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (in the previous patch) for
these functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().
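
The had_lock pattern these fixes apply can be shown with a minimal single-threaded user-space model. This is an illustrative sketch only, not the kernel API: the outer call takes the cluster lock and registers itself as holder; a nested call sees the holder and skips the real lock/unlock, and only the outermost caller releases. All names are hypothetical:

```c
#include <assert.h>

/* One lock resource, modeled by two globals. */
static int cluster_locked;   /* stand-in for the DLM-level lock state */
static int holder_pid = -1;  /* stand-in for the lockres->l_holders list */

/* Returns had_lock: 1 if this pid already held the lock (nested call). */
int inode_lock_tracked(int pid)
{
	int had_lock = (holder_pid == pid);

	if (!had_lock) {
		/* in the real code, this is where __ocfs2_cluster_lock()
		 * would block forever if we already held the lock ourselves
		 * and a remote EX request had set OCFS2_LOCK_BLOCKED */
		assert(!cluster_locked);
		cluster_locked = 1;
		holder_pid = pid;    /* like ocfs2_add_holder() */
	}
	return had_lock;
}

void inode_unlock_tracked(int pid, int had_lock)
{
	(void)pid;
	if (!had_lock) {             /* only the outermost caller unlocks */
		holder_pid = -1;     /* like ocfs2_remove_holder() */
		cluster_locked = 0;
	}
}
```

Nesting ocfs2_permission() -> ocfs2_iop_get_acl() then maps to two calls of inode_lock_tracked() with the same pid: the inner one returns had_lock == 1 and never touches the DLM-level lock.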

Changes since v1:
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi <junxiao...@oracle.com>.

3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi <junxiao...@oracle.com>.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/acl.c  | 39 +
 fs/ocfs2/file.c | 76 +
 2 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..3e47262 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
int status = 0;
-
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   int arg_flags = 0, has_locked;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = ocfs2_is_locked_by_me(lockres);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
return status;
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 1);
+   }
brelse(bh);
+
return status;
 }
 
@@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
int ret;
+   int arg_flags = 0, has_locked;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = ocfs2_is_locked_by_me(lockres);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
if (ret < 0) {
if (ret != -ENOENT)
mlog_errno(ret);
return ERR_PTR(ret);
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 0);
+   }
brelse(di_bh);
+
return acl;
 }
 
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..b620c25 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int arg_flags = 0, had_lock;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,13 +1176,41 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}
}
 
-   status = ocfs2_i

[Ocfs2-devel] [PATCH v2 0/2] fix deadlock caused by recursive cluster locking

2017-01-15 Thread Eric Ren
This is a formal patch set v2 to solve the deadlock issue on which I
previously started a RFC (draft patch), and the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor imposes a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
and Junxiao Bi <junxiao...@oracle.com>.
 
2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi <junxiao...@oracle.com>.
 
3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi <junxiao...@oracle.com>.
 
4. Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell <s...@canb.auug.org.au>.
 
Your comments and feedbacks are always welcomed.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c | 39 
 fs/ocfs2/dlmglue.c | 48 +++---
 fs/ocfs2/dlmglue.h | 18 +
 fs/ocfs2/file.c| 76 +++---
 fs/ocfs2/ocfs2.h   |  1 +
 5 files changed, 164 insertions(+), 18 deletions(-)

-- 
2.10.2


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
On 01/16/2017 11:13 AM, Junxiao Bi wrote:
> On 01/16/2017 11:06 AM, Eric Ren wrote:
>> Hi Junxiao,
>>
>> On 01/16/2017 10:46 AM, Junxiao Bi wrote:
>>>>> If had_lock==true, it is a bug? I think we should BUG_ON for it, that
>>>>> can help us catch bug at the first time.
>>>> Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
>>>> who takes the cluster lock.
>>>> It's harder for me to name all the possible paths;-/
>>> The BUG_ON() can help catch the path where ocfs2_setattr is not the
>>> first one.
>> Yes, I understand. But, the problem is that the vfs entries calling
>> order is out of our control.
>> I don't want to place an assertion where I'm not 100% sure it's
>> absolutely right;-)
> If it is not the first one, is it another recursive locking bug? In this
> case, if you don't like BUG_ON(), you can dump the call trace and print
> some warning message.

Yes! I like this idea, will add it in next version, thanks!

Eric
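
The softer check agreed on here - warn and continue instead of BUG_ON() - can be sketched in user-space C. This is an illustrative sketch with hypothetical names, not the kernel code; in the kernel it would be an mlog(ML_ERROR, ...) plus dump_stack() at the top of ocfs2_setattr():

```c
#include <assert.h>
#include <stdio.h>

static int recursive_warnings;  /* counts unexpected recursive locks seen */

/* Returns nonzero when the lock was unexpectedly already held; the
 * caller reports it and carries on instead of crashing the machine. */
int warn_on_recursive_lock(int had_lock)
{
	if (had_lock) {
		recursive_warnings++;
		fprintf(stderr, "unexpected recursive cluster lock\n");
		/* in the kernel: dump_stack() to capture the offending path */
	}
	return had_lock;
}
```

The design choice being debated is exactly this: a warning keeps the node running while still surfacing the call trace, whereas BUG_ON() would panic on a path the author cannot prove never happens.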

>
> Thanks,
> Junxiao.
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>
>>>>>> +if (had_lock)
>>>>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>> if (status < 0) {
>>>>>> if (status != -ENOENT)
>>>>>> mlog_errno(status);
>>>>>> goto bail_unlock_rw;
>>>>>> }
>>>>>> -inode_locked = 1;
>>>>>> +if (!had_lock) {
>>>>>> +ocfs2_add_holder(lockres, &oh);
>>>>>> +inode_locked = 1;
>>>>>> +}
>>>>>>   if (size_change) {
>>>>>> status = inode_newsize_ok(inode, attr->ia_size);
>>>>>> @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
>>>>>> iattr *attr)
>>>>>> bail_commit:
>>>>>> ocfs2_commit_trans(osb, handle);
>>>>>> bail_unlock:
>>>>>> -if (status) {
>>>>>> +if (status && inode_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> ocfs2_inode_unlock(inode, 1);
>>>>>> inode_locked = 0;
>>>>>> }
>>>>>> @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
>>>>>> struct iattr *attr)
>>>>>> if (status < 0)
>>>>>> mlog_errno(status);
>>>>>> }
>>>>>> -if (inode_locked)
>>>>>> +if (inode_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> ocfs2_inode_unlock(inode, 1);
>>>>>> +}
>>>>>>   brelse(bh);
>>>>>> return status;
>>>>>> @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
>>>>>> int ocfs2_permission(struct inode *inode, int mask)
>>>>>> {
>>>>>> int ret;
>>>>>> +int has_locked;
>>>>>> +struct ocfs2_holder oh;
>>>>>> +struct ocfs2_lock_res *lockres;
>>>>>>   if (mask & MAY_NOT_BLOCK)
>>>>>> return -ECHILD;
>>>>>> -ret = ocfs2_inode_lock(inode, NULL, 0);
>>>>>> -if (ret) {
>>>>>> -if (ret != -ENOENT)
>>>>>> -mlog_errno(ret);
>>>>>> -goto out;
>>>>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>> The same thing as ocfs2_setattr.
>>>> OK. I will think over your suggestions!
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>> Thanks,
>>>>> Junxiao.
>>>>>> +if (!has_locked) {
>>>>>> +ret = ocfs2_inode_lock(inode, NULL, 0);
>>>>>> +if (ret) {
>>>>>> +if (ret != -ENOENT)
>>>>>> +mlog_errno(ret);
>>>>>> +goto out;
>>>>>> +}
>>>>>> +ocfs2_add_holder(lockres, &oh);
>>>>>> }
>>>>>>   ret = generic_permission(inode, mask);
>>>>>> -ocfs2_inode_unlock(inode, 0);
>>>>>> +if (!has_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> +ocfs2_inode_unlock(inode, 0);
>>>>>> +}
>>>>>> out:
>>>>>> return ret;
>>>>>> }
>>>>>>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
Hi Junxiao,

On 01/16/2017 10:46 AM, Junxiao Bi wrote:
>>> If had_lock==true, it is a bug? I think we should BUG_ON for it, that
>>> can help us catch bug at the first time.
>> Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
>> who takes the cluster lock.
>> It's harder for me to name all the possible paths;-/
> The BUG_ON() can help catch the path where ocfs2_setattr is not the
> first one.
Yes, I understand. But the problem is that the order in which the vfs entry
points are called is out of our control.
I don't want to place an assertion where I'm not 100% sure it's absolutely 
right;-)

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>>>
 +if (had_lock)
 +arg_flags = OCFS2_META_LOCK_GETBH;
 +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
goto bail_unlock_rw;
}
 -inode_locked = 1;
 +if (!had_lock) {
 +ocfs2_add_holder(lockres, &oh);
 +inode_locked = 1;
 +}
  if (size_change) {
status = inode_newsize_ok(inode, attr->ia_size);
 @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
 iattr *attr)
bail_commit:
ocfs2_commit_trans(osb, handle);
bail_unlock:
 -if (status) {
 +if (status && inode_locked) {
 +ocfs2_remove_holder(lockres, &oh);
ocfs2_inode_unlock(inode, 1);
inode_locked = 0;
}
 @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
 struct iattr *attr)
if (status < 0)
mlog_errno(status);
}
 -if (inode_locked)
 +if (inode_locked) {
 +ocfs2_remove_holder(lockres, &oh);
ocfs2_inode_unlock(inode, 1);
 +}
  brelse(bh);
return status;
 @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
int ocfs2_permission(struct inode *inode, int mask)
{
int ret;
 +int has_locked;
 +struct ocfs2_holder oh;
 +struct ocfs2_lock_res *lockres;
  if (mask & MAY_NOT_BLOCK)
return -ECHILD;
-ret = ocfs2_inode_lock(inode, NULL, 0);
 -if (ret) {
 -if (ret != -ENOENT)
 -mlog_errno(ret);
 -goto out;
 +lockres = &OCFS2_I(inode)->ip_inode_lockres;
 +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>> The same thing as ocfs2_setattr.
>> OK. I will think over your suggestions!
>>
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
 +if (!has_locked) {
 +ret = ocfs2_inode_lock(inode, NULL, 0);
 +if (ret) {
 +if (ret != -ENOENT)
 +mlog_errno(ret);
 +goto out;
 +}
 +ocfs2_add_holder(lockres, &oh);
}
  ret = generic_permission(inode, mask);
-ocfs2_inode_unlock(inode, 0);
 +if (!has_locked) {
 +ocfs2_remove_holder(lockres, &oh);
 +ocfs2_inode_unlock(inode, 0);
 +}
out:
return ret;
}

>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after flushing journal failure

2017-01-13 Thread Eric Ren
On 01/13/2017 10:52 AM, Changwei Ge wrote:
> Hi Joseph,
>
> Do you think my last version of patch to fix umount hang after journal
> flushing failure is OK?
>
> If so, I 'd like to ask Andrew's help to merge this patch into his test
> tree.
>
>
> Thanks,
>
> Br.
>
> Changwei

The message above should not appear in a formal patch. It should go in a
"cover letter" if you want to say something to the other developers. See
"git format-patch --cover-letter".

>
>
>
>  From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
> From: Changwei Ge 
> Date: Wed, 11 Jan 2017 09:05:35 +0800
> Subject: [PATCH] fix umount hang after journal flushing failure

A commit message is needed here! It should describe what the problem is,
how to reproduce it, and what your solution is, things like that.

>
> Signed-off-by: Changwei Ge 
> ---
>   fs/ocfs2/journal.c |   18 ++
>   1 file changed, 18 insertions(+)
>
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index a244f14..5f3c862 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
>   "commit_thread: %u transactions pending on "
>   "shutdown\n",
>   atomic_read(&journal->j_num_trans));
> +
> +   if (status < 0) {
> +   mlog(ML_ERROR, "journal is already abort
> and cannot be "
> +"flushed any more. So ignore
> the pending "
> +"transactions to avoid blocking
> ocfs2 unmount.\n");

Can you find any example in the kernel source that prints messages like that?!

I saw Joseph showed you the right way in previous email:
"

if (status < 0) {

	mlog(ML_ERROR, "journal is already abort and cannot be "
	     "flushed any more. So ignore the pending "
	     "transactions to avoid blocking ocfs2 unmount.\n");

"
So, please be careful and learn from the kernel source and from the way
other developers do their patch work. Otherwise, we just waste others'
time on such basic issues.

> +   /*
> +    * This may be a little hacky. However, there is no chance
> +    * for ocfs2/journal to decrease this variable through
> +    * commit-thread. I have to do so to avoid umount hang after
> +    * journal flushing failure. Since journal has been marked
> +    * ABORT within jbd2_journal_flush, commit cache will never
> +    * do any real work to flush journal to disk. Set it to ZERO
> +    * so that umount will continue during shutting down journal.
> +    */
> +   atomic_set(&journal->j_num_trans, 0);
It's possible to corrupt data this way. Why not just crash the kernel when
jbd2 aborts, and let the other node do the journal recovery? That's the
strength of a cluster filesystem.

Anyway, it's glad to see you guys making contributions!

Thanks,
Eric


> +   }
>  }
>  }
>
> --
> 1.7.9.5
>
> -
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-12 Thread Eric Ren
Hi Joseph,

On 01/09/2017 10:13 AM, Eric Ren wrote:
>>>> So you are trying to fix it by making phase3 finish without really doing
>>> Phase3 can go ahead because this node is already under protection of 
>>> cluster lock.
>> You said it was blocked...
> Oh, sorry, I meant phase3 can go ahead if this patch set is applied;-)
>
>> "Another hand, the recursive cluster lock (the second one) will be blocked in
>> __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED."
>>>> __ocfs2_cluster_lock, then Process B can continue either.
>>>> Let us bear in mind that phase1 and phase3 are in the same context and
>>>> executed in order. That's why I think there is no need to check if locked
>>>> by myself in phase1.
> Sorry, I still cannot see it. Without keeping track of the first cluster 
> lock, how can we
> know if
> we are under a context that has already been in the protecting of cluster 
> lock? How can we
> handle
> the recursive locking (the second cluster lock) if we don't have this 
> information?
>>>> If phase1 finds it is already locked by myself, that means the holder
>>>> is left by last operation without dec holder. That's why I think it is a 
>>>> bug
>>>> instead of a recursive lock case.
> I think I got your point here. Do you mean that we should just add the lock 
> holder at the
> first locking position
> without checking before that? Unfortunately, it's tricky here to know exactly 
> which ocfs2
> routine will be the first vfs
> entry point, such as ocfs2_get_acl() which can be both the first vfs entry 
> point and the
> second vfs entry point after
> ocfs2_permission(), right?
>
> It will be a coding bug if the problem you concern about happens. I think we 
> don't need to
> worry about this much because
> the code logic here is quite simple;-)
Ping...

Did my last email clear your doubts? If not, I really want to understand
your point.

If there's any problem, I will fix them in the next version;-)

Thanks,
Eric

>
> Thanks for your patience!
> Eric
>
>>> D


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Eric Ren
Hi!

On 01/06/2017 05:55 PM, Joseph Qi wrote:
> On 17/1/6 17:13, Eric Ren wrote:
>> Hi,
>>
>>>>>>>>
>>>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>>>> ocfs2_setattr().
>>>>>>> As described cases above, shall we just add the tracking logic only for 
>>>>>>> set/get_acl()?
>>>>>>
>>>>>> The idea is to detect recursive locking on the running task stack. Take 
>>>>>> case 1) for 
>>>>>> example if ocfs2_permisssion()
>>>>>> is not changed:
>>>>>>
>>>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>>>ocfs2_iop_get_acl <=== still take PR, because there is no lock holder 
>>>>>> on the 
>>>>>> tracking list
>>>>> I mean we have no need to check if locked by me, just do inode lock and 
>>>>> add holder.
>>>>> This will make code more clean, IMO.
>>>> Oh, sorry, I get your point this time. I think we need to check it if 
>>>> there are more 
>>>> than one processes that hold
>>>> PR lock on the same resource.  If I don't understand you correctly, please 
>>>> tell me why 
>>>> you think it's not neccessary
>>>> to check before getting lock?
>>> The code logic can only check if it is locked by myself. In the case
>> Why only...?
>>> described above, ocfs2_permission is the first entry to take inode lock.
>>> And even if check succeeds, it is a bug without unlock, but not the case
>>> of recursive lock.
>>
>> By checking succeeds, you mean it's locked by me, right? If so, this flag
>>   "arg_flags = OCFS2_META_LOCK_GETBH"
>> will be passed down to ocfs2_inode_lock_full(), which gets back buffer head 
>> of
>> the disk inode for us if necessary, but doesn't take cluster locking again. 
>> So, there is
>> no need to unlock in such case.
> I am trying to state my point more clearly...

Thanks a lot!

> The issue case you are trying to fix is:
> Process A
> take inode lock (phase1)
> ...
> <<< race window (phase2, Process B)

The deadlock only happens if process B is on a remote node and requests an EX lock.

Quote the patch[1/2]'s commit message:

A deadlock will occur if a remote EX request comes in between the two
ocfs2_inode_lock() calls.  Briefly, the deadlock forms like this:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  On the other hand, the recursive cluster lock
(the second one) will be blocked in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But the downconvert never completes. Why?  Because
there is no chance for the first cluster lock on this node to be unlocked
- we block ourselves in the code path.
---

> ...
> take inode lock again (phase3)
>
> Deadlock happens because Process B in phase2 and Process A in phase3
> are waiting for each other.
It's the local lock's (like i_mutex) responsibility to protect the critical
section from races among processes on the same node.

> So you are trying to fix it by making phase3 finish without really doing

Phase3 can go ahead because this node is already under the protection of the
cluster lock.

> __ocfs2_cluster_lock, then Process B can continue either.
> Let us bear in mind that phase1 and phase3 are in the same context and
> executed in order. That's why I think there is no need to check if locked
> by myself in phase1.
> If phase1 finds it is already locked by myself, that means the holder
> is left by last operation without dec holder. That's why I think it is a bug
> instead of a recursive lock case.

Did I answer your question?

Thanks!
Eric

>
> Thanks,
> Joseph
>>
>> Thanks,
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>>
>>>> Thanks,
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Thanks for your review;-)
>>>>>> Eric
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joseph
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>>>>>> ---
>>>>>>>>   fs/ocfs2/acl.c  | 39 +

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Eric Ren
Hi,

>>>>>>
>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>> ocfs2_setattr().
>>>>> As described cases above, shall we just add the tracking logic only for 
>>>>> set/get_acl()?
>>>>
>>>> The idea is to detect recursive locking on the running task stack. Take 
>>>> case 1) for 
>>>> example if ocfs2_permisssion()
>>>> is not changed:
>>>>
>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>ocfs2_iop_get_acl <=== still take PR, because there is no lock holder 
>>>> on the 
>>>> tracking list
>>> I mean we have no need to check if locked by me, just do inode lock and add 
>>> holder.
>>> This will make code more clean, IMO.
>> Oh, sorry, I get your point this time. I think we need to check it if there 
>> are more than 
>> one processes that hold
>> PR lock on the same resource.  If I don't understand you correctly, please 
>> tell me why 
>> you think it's not neccessary
>> to check before getting lock?
> The code logic can only check if it is locked by myself. In the case
Why only...?
> described above, ocfs2_permission is the first entry to take inode lock.
> And even if check succeeds, it is a bug without unlock, but not the case
> of recursive lock.

By "check succeeds", you mean it's locked by me, right? If so, this flag
   "arg_flags = OCFS2_META_LOCK_GETBH"
will be passed down to ocfs2_inode_lock_full(), which gets back buffer head of
the disk inode for us if necessary, but doesn't take cluster locking again. So, 
there is
no need to unlock in such case.

Thanks,
Eric

>
> Thanks,
> Joseph
>>
>> Thanks,
>> Eric
>>>
>>> Thanks,
>>> Joseph
>>>>
>>>> Thanks for your review;-)
>>>> Eric
>>>>
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>>>> ---
>>>>>>   fs/ocfs2/acl.c  | 39 ++-
>>>>>>   fs/ocfs2/file.c | 44 ++--
>>>>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>>> index bed1fcb..c539890 100644
>>>>>> --- a/fs/ocfs2/acl.c
>>>>>> +++ b/fs/ocfs2/acl.c
>>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>>>>>   {
>>>>>>   struct buffer_head *bh = NULL;
>>>>>>   int status = 0;
>>>>>> -
>>>>>> -status = ocfs2_inode_lock(inode, &bh, 1);
>>>>>> +int arg_flags = 0, has_locked;
>>>>>> +struct ocfs2_holder oh;
>>>>>> +struct ocfs2_lock_res *lockres;
>>>>>> +
>>>>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>>> +if (has_locked)
>>>>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>>   if (status < 0) {
>>>>>>   if (status != -ENOENT)
>>>>>>   mlog_errno(status);
>>>>>>   return status;
>>>>>>   }
>>>>>> +if (!has_locked)
>>>>>> +ocfs2_add_holder(lockres, &oh);
>>>>>> +
>>>>>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>>>> -ocfs2_inode_unlock(inode, 1);
>>>>>> +
>>>>>> +if (!has_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> +ocfs2_inode_unlock(inode, 1);
>>>>>> +}
>>>>>>   brelse(bh);
>>>>>> +
>>>>>>   return status;
>>>>>>   }
>>>>>>   @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
>>>>>>   stru

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-05 Thread Eric Ren
On 01/06/2017 02:07 PM, Joseph Qi wrote:
> Hi Eric,
>
>
> On 17/1/5 23:31, Eric Ren wrote:
>> We are in the situation that we have to avoid recursive cluster locking,
>> but there is no way to check if a cluster lock has been taken by a
>> process already.
>>
>> Mostly, we can avoid recursive locking by writing code carefully.
>> However, we found that it's very hard to handle the routines that
>> are invoked directly by vfs code. For instance:
>>
>> const struct inode_operations ocfs2_file_iops = {
>>  .permission = ocfs2_permission,
>>  .get_acl= ocfs2_iop_get_acl,
>>  .set_acl= ocfs2_iop_set_acl,
>> };
>>
>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>> do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== first time
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>> ocfs2_inode_lock() <=== recursive one
>>
>> A deadlock will occur if a remote EX request comes in between the two
>> ocfs2_inode_lock() calls. Briefly, the deadlock forms like this:
>>
>> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>> on behalf of the remote EX lock request. On the other hand, the recursive
>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>> Because there is no chance for the first cluster lock on this node to be
>> unlocked - we block ourselves in the code path.
>>
>> The idea to fix this issue is mostly taken from gfs2 code.
>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>> keep track of the pid of each process that has taken the cluster lock
>> of this lock resource;
>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>> it means just getting back disk inode bh for us if we've got cluster lock.
>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>> have got the cluster lock in the upper code path.
>>
>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>> to solve the recursive locking issue caused by the fact that vfs routines
>> can call into each other.
>>
>> The performance penalty of processing the holder list should only be seen
>> at a few cases where the tracking logic is used, such as get/set acl.
>>
>> You may ask: what if the first time we got a PR lock, and the second time
>> we want an EX lock? Fortunately, this case never happens in the real world,
>> as far as I can see, including permission check, (get|set)_(acl|attr); and
>> the gfs2 code also does so.
>>
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/dlmglue.c | 47 ---
>>   fs/ocfs2/dlmglue.h | 18 ++
>>   fs/ocfs2/ocfs2.h   |  1 +
>>   3 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 83d576f..500bda4 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>>   init_waitqueue_head(&res->l_event);
>>   INIT_LIST_HEAD(&res->l_blocked_list);
>>   INIT_LIST_HEAD(&res->l_mask_waiters);
>> +INIT_LIST_HEAD(&res->l_holders);
>>   }
>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>   res->l_flags = 0UL;
>>   }
>>   +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>> +   struct ocfs2_holder *oh)
>> +{
>> +INIT_LIST_HEAD(&oh->oh_list);
>> +oh->oh_owner_pid = get_pid(task_pid(current));
>> +
>> +spin_lock(&lockres->l_lock);
>> +list_add_tail(&oh->oh_list, &lockres->l_holders);
>> +spin_unlock(&lockres->l_lock);
>> +}
>> +
>> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
>> +   struct ocfs2_holder *oh)
>> +{
>> +spin_lock(&lockres->l_lock);
>> +list_del(&oh->oh_list);
>> +spin_unlock(&lockres->l_lock);
>> +
>> +put_pid(oh->oh_owner_pid);
>> +}
>> +
>> +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
>> +{
>> +struct ocfs2_holder *oh;
>> +struct pid *pid;
>> +
>&

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-05 Thread Eric Ren
On 01/06/2017 02:09 PM, Joseph Qi wrote:
> Hi Eric,
>
>
> On 17/1/5 23:31, Eric Ren wrote:
>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>> after the patch was merged. The discussion happened here
>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>
>> The reason why taking cluster inode lock at vfs entry points opens up
>> a self deadlock window, is explained in the previous patch of this
>> series.
>>
>> So far, we have seen two different code paths that have this issue.
>> 1. do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== take PR
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>>  ocfs2_inode_lock() <=== take PR
>> 2. fchmod|fchmodat
>>  chmod_common
>>   notify_change
>>ocfs2_setattr <=== take EX
>> posix_acl_chmod
>>  get_acl
>>   ocfs2_iop_get_acl <=== take PR
>>  ocfs2_iop_set_acl <=== take EX
>>
>> Fix them by adding the tracking logic (introduced in the previous patch)
>> to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>> ocfs2_setattr().
> As described cases above, shall we just add the tracking logic only for 
> set/get_acl()?

The idea is to detect recursive locking on the running task's stack. Take
case 1) for example, if ocfs2_permission() is not changed:

ocfs2_permission() <=== take PR, ocfs2_holder is not added
ocfs2_iop_get_acl <=== still take PR, because there is no lock holder on
the tracking list

Thanks for your review;-)
Eric

>
> Thanks,
> Joseph
>>
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/acl.c  | 39 ++-
>>   fs/ocfs2/file.c | 44 ++--
>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..c539890 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>> posix_acl *acl, 
>> int type)
>>   {
>>   struct buffer_head *bh = NULL;
>>   int status = 0;
>> -
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>   if (status < 0) {
>>   if (status != -ENOENT)
>>   mlog_errno(status);
>>   return status;
>>   }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> +
>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 1);
>> +}
>>   brelse(bh);
>> +
>>   return status;
>>   }
>>   @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>   struct buffer_head *di_bh = NULL;
>>   struct posix_acl *acl;
>>   int ret;
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> osb = OCFS2_SB(inode->i_sb);
>>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>   return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>   if (ret < 0) {
>>   if (ret != -ENOENT)
>>   mlog_errno(ret);
>>   return ERR_PTR(ret);
>>   }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>   -ocfs2_inode_unlock(inode, 0);
>> +if (!has_locked) {
>> +o

[Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-05 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== take PR
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
chmod_common
 notify_change
  ocfs2_setattr <=== take EX
   posix_acl_chmod
get_acl
 ocfs2_iop_get_acl <=== take PR
ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (introduced in the previous patch)
to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/acl.c  | 39 ++-
 fs/ocfs2/file.c | 44 ++--
 2 files changed, 68 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..c539890 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
int status = 0;
-
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   int arg_flags = 0, has_locked;
+   struct ocfs2_holder oh;
+   struct ocfs2_lock_res *lockres;
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
return status;
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 1);
+   }
brelse(bh);
+
return status;
 }
 
@@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
int ret;
+   int arg_flags = 0, has_locked;
+   struct ocfs2_holder oh;
+   struct ocfs2_lock_res *lockres;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
if (ret < 0) {
if (ret != -ENOENT)
mlog_errno(ret);
return ERR_PTR(ret);
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 0);
+   }
brelse(di_bh);
+
return acl;
 }
 
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..62be75d 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int arg_flags = 0, had_lock;
+   struct ocfs2_holder oh;
+   struct ocfs2_lock_res *lockres;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,13 +1176,20 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);
+   if (had_lock)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
goto bail_unlock_rw;
}
-   inode_locked = 1;
+   if (!had_lock) {
+   ocfs2_add_holder(lockres, 

[Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-05 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
process already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes,
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from the gfs2 code.
1. introduce a new field, struct ocfs2_lock_res.l_holders, to
keep track of the pids of the processes that have taken the cluster
lock of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back the disk inode bh for us if we've already got
the cluster lock.
3. export a helper, ocfs2_is_locked_by_me(), to check if we have
already got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs callbacks,
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
in the few cases where the tracking logic is used, such as get/set acl.

You may ask: what if the first time we got a PR lock, and the second time
we want an EX lock? Fortunately, as far as I can see, this case never
happens in the paths involved (permission check, (get|set)_(acl|attr)),
and the gfs2 code makes the same assumption.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c | 47 ---
 fs/ocfs2/dlmglue.h | 18 ++
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 83d576f..500bda4 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_holder *oh)
+{
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_owner_pid = get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid)
+   goto out;
+   }
+   oh = NULL;
+out:
+   spin_unlock(&lockres->l_lock);
+   return oh;
+}
+
 static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
 int level)
 {
@@ -2333,8 +2373,9 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
goto getbh;
}
 
-   if (ocfs2_mount_local(osb))
-   goto local;
+   if ((arg_flags & OCFS2_META_LOCK_GETBH) ||
+   ocfs2_mount_local(osb))
+   goto update;
 
if (!(arg_flags & OCFS2_META_LOCK_RECOVERY))
ocfs2_wait_for_recovery(osb);
@@ -2363,7 +2404,7 @@ int ocfs2_inode_lock_full_n

Re: [Ocfs2-devel] [PATCH 00/17] ocfs2-test: misc improvements and trivial fixes

2017-01-04 Thread Eric Ren
Hi all,

I will push these patches to the "suse" branch at Mark's github repo,
considering no review has come in for more than 2 weeks.
According to Mark's advice, a patch can be merged only when it has a review;-)

Thanks,
Eric

On 12/13/2016 01:29 PM, Eric Ren wrote:
> - Misc trivial fixes:
>
> [PATCH 01/17] ocfs2 test: correct the check on testcase if supported
> [PATCH 02/17] Single Run: kernel building is little broken now
> [PATCH 03/17] Trivial: better not to depend on where we issue testing
> [PATCH 04/17] Trivial: fix a typo mistake
> [PATCH 05/17] Trivial: fix checking empty return value
> [PATCH 06/17] multi_mmap: make log messages go to right place
> [PATCH 07/17] lvb_torture: failed when pcmk is used as cluster stack
> [PATCH 08/17] multiple node: pass cross_delete the right log file
>
> - This patches add two more parameters: blocksize and clustersize when we
> kick off a testing, which shortens the run time of a testing round.
> It will keep the old behaviors if they are not specified.
>
> [PATCH 09/17] Single run: make blocksize and clustersize as
> [PATCH 10/17] Multiple run: make blocksize and clustersize as
> [PATCH 11/17] discontig bg: make blocksize and clustersize as
>
> - This patch reflects the mkfs.ocfs2 changes that "--cluster-stack" and
> "--cluster-name" were added.
>
> [PATCH 12/17] Add two cluster-aware parameters: cluster stack and cluster name
>
> - More misc trival fixes:
>
> [PATCH 13/17] Save punch_hole details into logfile for debugging
> [PATCH 14/17] Fix openmpi warning by specifying proper slot number
> [PATCH 15/17] Handle the case when a symbolic link device is given
> [PATCH 16/17] inline data: fix build error
> [PATCH 17/17] discontig bg: give single and multiple node test
>
> Comments and questions are, as always, welcome.
>
> Thanks,
> Eric
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH 01/17] ocfs2 test: correct the check on testcase if supported

2016-12-12 Thread Eric Ren
Signed-off-by: Eric Ren <z...@suse.com>
---
 programs/python_common/multiple_run.sh   | 2 +-
 programs/python_common/single_run-WIP.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/programs/python_common/multiple_run.sh 
b/programs/python_common/multiple_run.sh
index dd9603f..c4a7da9 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -201,7 +201,7 @@ f_setup()
fi
 
SUPPORTED_TESTCASES="all xattr inline reflink write_append_truncate 
multi_mmap create_racer flock_unit cross_delete open_delete lvb_torture"
-   for cas in ${TESTCASES}; do
+   for cas in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; do
echo ${SUPPORTED_TESTCASES} | grep -sqw $cas
if [ $? -ne 0 ]; then
echo "testcase [${cas}] not supported."
diff --git a/programs/python_common/single_run-WIP.sh 
b/programs/python_common/single_run-WIP.sh
index 5a8fae1..fe0056c 100755
--- a/programs/python_common/single_run-WIP.sh
+++ b/programs/python_common/single_run-WIP.sh
@@ -997,7 +997,7 @@ fi
 SUPPORTED_TESTCASES="all create_and_open directaio fillverifyholes 
renamewriterace aiostress\
   filesizelimits mmaptruncate buildkernel splice sendfile mmap reserve_space 
inline xattr\
   reflink mkfs tunefs backup_super filecheck"
-for cas in ${TESTCASES}; do
+for cas in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; do
echo ${SUPPORTED_TESTCASES} | grep -sqw $cas
if [ $? -ne 0 ]; then
echo "testcase [${cas}] not supported."
-- 
2.6.6




Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-11 Thread Eric Ren
b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
>> [  247.834748] RIP  [] ocfs2_truncate_file+0x640/0x6c0
>> [ocfs2]
>> [  247.834774]  RSP 
>> "
>>
>> It's because ocfs2_inode_lock() gets us a stale LVB in which the i_size
>> is not equal to the disk i_size. We mistakenly trust the LVB because the
>> underlying fsdlm dlm_lock() doesn't set lkb_sbflags with
>> DLM_SBF_VALNOTVALID properly for us. But why?
>>
>> The current code tries to downconvert lock without DLM_LKF_VALBLK
>> flag to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion,
>> even if the lock resource type needs LVB. This is not the right way for
>> fsdlm.
>>
>> The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
>> DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If
>> DLM_LKF_VALBLK
>> is not set, fsdlm will skip recovering RSB's LVB from this lkb and set the
>> right
>> DLM_SBF_VALNOTVALID appropriately when node failure happens.
>>
>> The following diagram briefly illustrates how this crash happens:
>>
>> RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;
>>
>> The 1st round:
>>
>>   Node1Node2
>> RSB1: PR
>>RSB1(master): NULL->EX
>> ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
>>ocfs2_dlm_lock(no DLM_LKF_VALBLK)
>>
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>> dlm_lock(no DLM_LKF_VALBLK)
>>convert_lock(overwrite lkb->lkb_exflags
>> with no DLM_LKF_VALBLK)
>>
>> RSB1: NULLRSB1: EX
>>reset Node2
>> dlm_recover_rsbs()
>>recover_lvb()
>>
>> /* The LVB is not trustable if the node with EX fails and
>>   * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
>>   */
>>
>>   if (!(lkb->lkb_exflags & DLM_LKF_VALBLK))
>>   return;  /* we miss the chance to invalidate the LVB here */
>>
>> The 2nd round:
>>
>>   Node 1Node2
>> RSB1(become master from recovery)
>>
>> ocfs2_setattr()
>>ocfs2_inode_lock(NULL->EX)
>>  /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID
>> */
>>  ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from
>> disk */
>>ocfs2_truncate_file()
>>mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */
>>
>> The fix is quite straightforward. We always set the DLM_LKF_VALBLK flag
>> for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
>> is used.
>>
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/dlmglue.c   | 10 ++
>>   fs/ocfs2/stackglue.c |  6 ++
>>   fs/ocfs2/stackglue.h |  3 +++
>>   3 files changed, 19 insertions(+)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 83d576f..77d1632 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -3303,6 +3303,16 @@ static int ocfs2_downconvert_lock(struct ocfs2_super
>> *osb,
>>  mlog(ML_BASTS, "lockres %s, level %d => %d\n", lockres->l_name,
>>   lockres->l_level, new_level);
>>   
>> +/*
>> + * On DLM_LKF_VALBLK, fsdlm behaves differently with o2cb. It always
>> + * expects DLM_LKF_VALBLK being set if the LKB has LVB, so that
>> + * we can recover correctly from node failure. Otherwise, we may get
>> + * invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
>> + */
>> +if (!ocfs2_is_o2cb_active() &&
>> +lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
>> +lvb = 1;
>> +
>>  if (lvb)
>>  dlm_flags |= DLM_LKF_VALBLK;
>>   
>> diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
>> index 52c07346b..8203590 100644
>> --- a/fs/ocfs2/stackglue.c
>> +++ b/fs/ocfs2/stackglue.c
>> @@ -48,6 +48,12 @@ static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] =
>> "/sbin/ocfs2_hb_ctl";
>>*/
>>   static struct ocfs2_stack_plugin *active_stack;
>>   
>> +inline int ocfs2_is_o2cb_active(void)
>> +{
>> +return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
>> +}
>> +EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
>> +
>>   static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
>>   {
>>  struct ocfs2_stack_plugin *p;
>> diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
>> index f2dce10..e3036e1 100644
>> --- a/fs/ocfs2/stackglue.h
>> +++ b/fs/ocfs2/stackglue.h
>> @@ -298,6 +298,9 @@ void ocfs2_stack_glue_set_max_proto_version(struct
>> ocfs2_protocol_version *max_p
>>   int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
>>   void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
>>   
>> +/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
>> +int ocfs2_is_o2cb_active(void);
>> +
>>   extern struct kset *ocfs2_kset;
>>   
>>   #endif  /* STACKGLUE_H */
>> -- 
>> 2.6.6




Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren
Sorry, this email was not delivered to Mark successfully because of a weird
character trailing his email address somehow.

So, resend later...

Thanks,
Eric

On 12/09/2016 05:24 PM, Eric Ren wrote:
> The crash happens rather often when we reset some cluster
> nodes while nodes contend fiercely to do truncate and append.
>
> The crash backtrace is below:
> "
> [  245.197849] dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 
> locks on 971 resources
> [  245.197859] dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 
> generation 5 done: 4 ms
> [  245.198379] ocfs2: Begin replay journal (node 318952601, slot 2) on device 
> (253,18)
> [  247.272338] ocfs2: End replay journal (node 318952601, slot 2) on device 
> (253,18)
> [  247.547084] ocfs2: Beginning quota recovery on device (253,18) for slot 2
> [  247.683263] ocfs2: Finishing quota recovery on device (253,18) for slot 2
> [  247.833022] (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug 
> expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
> [  247.833029] (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 
> 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
> [  247.833074] [ cut here ]
> [  247.833077] kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
> [  247.833079] invalid opcode:  [#1] SMP
> [  247.833081] Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) 
> ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse 
> sd_modiscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet 
> iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr 
> parport  joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button 
> processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix  
>  drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt 
> fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio 
> ehci_hcd   usbcore serio_raw usb_common sg dm_multipath dm_mod 
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> [  247.833107] Supported: No, Unsupported modules are loaded
> [  247.833110] CPU: 1 PID: 30154 Comm: truncate Tainted: G   OE   N  
> 4.4.21-69-default #1
> [  247.833111] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
> [  247.833112] task: 88004ff6d240 ti: 880074e68000 task.ti: 
> 880074e68000
> [  247.833113] RIP: 0010:[]  [] 
> ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
> [  247.833151] RSP: 0018:880074e6bd50  EFLAGS: 00010282
> [  247.833152] RAX: 0074 RBX: 029e RCX: 
> 
> [  247.833153] RDX: 0001 RSI: 0246 RDI: 
> 0246
> [  247.833154] RBP: 880074e6bda8 R08: 3675dc7a R09: 
> 82013414
> [  247.833155] R10: 00034c50 R11:  R12: 
> 88003aab3448
> [  247.833156] R13: 02dc R14: 00046e11 R15: 
> 0020
> [  247.833157] FS:  7f839f965700() GS:88007fc8() 
> knlGS:
> [  247.833158] CS:  0010 DS:  ES:  CR0: 8005003b
> [  247.833159] CR2: 7f839f97e000 CR3: 36723000 CR4: 
> 06e0
> [  247.833164] Stack:
> [  247.833165]  03a9 0001 880060554000 
> 88004fcaf000
> [  247.833167]  88003aa7b090 1000 88003aab3448 
> 880074e6beb0
> [  247.833169]  0001 2068 0020 
> 
> [  247.833171] Call Trace:
> [  247.833208]  [] ocfs2_setattr+0x698/0xa90 [ocfs2]
> [  247.833225]  [] notify_change+0x1ae/0x380
> [  247.833242]  [] do_truncate+0x5e/0x90
> [  247.833246]  [] do_sys_ftruncate.constprop.11+0x108/0x160
> [  247.833257]  [] entry_SYSCALL_64_fastpath+0x12/0x6d
> [  247.834724] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
> [  247.834725]
> [  247.834726] Leftover inexact backtrace:
>
> [  247.834728] Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 
> 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 
> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
> [  247.834748] RIP  [] ocfs2_truncate_file+0x640/0x6c0 
> [ocfs2]
> [  247.834774]  RSP 
> "
>
It's because ocfs2_inode_lock() gets us a stale LVB in which the i_size is not
equal to the disk i_size. We mistakenly trust the LVB because the underlying
fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for
us. But why?
>
> The current code tries to downconvert lock without DLM_LKF_VALBLK
> flag to tell o2cb don't update RSB's LVB if it's a PR

[Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren
   RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
  ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
  convert_lock(overwrite lkb->lkb_exflags
   with no DLM_LKF_VALBLK)

RSB1: NULLRSB1: EX
  reset Node2
dlm_recover_rsbs()
  recover_lvb()

/* The LVB is not trustable if the node with EX fails and
 * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
 */

 if (!(lkb->lkb_exflags & DLM_LKF_VALBLK))
   return;  /* we miss the chance to invalidate the LVB here */

The 2nd round:

 Node 1Node2
RSB1(become master from recovery)

ocfs2_setattr()
  ocfs2_inode_lock(NULL->EX)
/* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from 
disk */
  ocfs2_truncate_file()
  mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */

The fix is quite straightforward. We always set the DLM_LKF_VALBLK flag for
dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is used.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c   | 10 ++
 fs/ocfs2/stackglue.c |  6 ++
 fs/ocfs2/stackglue.h |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 83d576f..77d1632 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3303,6 +3303,16 @@ static int ocfs2_downconvert_lock(struct ocfs2_super 
*osb,
mlog(ML_BASTS, "lockres %s, level %d => %d\n", lockres->l_name,
 lockres->l_level, new_level);
 
+   /*
+* On DLM_LKF_VALBLK, fsdlm behaves differently with o2cb. It always
+* expects DLM_LKF_VALBLK being set if the LKB has LVB, so that
+* we can recover correctly from node failure. Otherwise, we may get
+* invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
+*/
+   if (!ocfs2_is_o2cb_active() &&
+   lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
+   lvb = 1;
+
if (lvb)
dlm_flags |= DLM_LKF_VALBLK;
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 52c07346b..8203590 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -48,6 +48,12 @@ static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] = 
"/sbin/ocfs2_hb_ctl";
  */
 static struct ocfs2_stack_plugin *active_stack;
 
+inline int ocfs2_is_o2cb_active(void)
+{
+   return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
+}
+EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
+
 static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
 {
struct ocfs2_stack_plugin *p;
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index f2dce10..e3036e1 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -298,6 +298,9 @@ void ocfs2_stack_glue_set_max_proto_version(struct 
ocfs2_protocol_version *max_p
 int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
 void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
 
+/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
+int ocfs2_is_o2cb_active(void);
+
 extern struct kset *ocfs2_kset;
 
 #endif  /* STACKGLUE_H */
-- 
2.6.6



Re: [Ocfs2-devel] [PATCH 0/7] quota: Use s_umount for quota on/off serialization

2016-11-30 Thread Eric Ren
Hello,

On 11/24/2016 04:12 PM, Jan Kara wrote:
> Hello,
>
> this patch set changes quota code to use s_umount semaphore for serialization
> of quota on/off operations among each other and with other quotactl and
> quota writeback operations. So far we have used dedicated dqonoff_mutex but
> that triggered lockdep warnings during fs freezing and also unnecessarily
> serialized some quotactl operations.
>
> Al, any objections to patch 1/7 exporting functionality to get superblock with
> s_umount in exclusive mode? Alternatively I could add a wrapper around
> get_super_thawed() in quota code to drop s_umount & get it in exclusive mode
> and recheck that superblock didn't get unmounted / frozen but what I did here
> looked cleaner to me.
>
> OCFS2 guys, it would be good if you could test ocfs2 quotas with this patch 
> set
> in some multi-node setup (I have tested just with a single node), especially
> whether quota file recovery for other nodes still works as expected. Thanks.

With this patch set, the quota file recovery works well for ocfs2 on multiple 
nodes.

Tested-by: Eric Ren <z...@suse.com>

Thanks,
Eric
>
> If nobody objects, I'll push these changes through my tree to Linus.
>
>   Honza
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>




Re: [Ocfs2-devel] [PATCH] ocfs2: Optimization of code while free dead locks, changed for reviews.

2016-11-28 Thread Eric Ren
Hi,

I am tired of telling you things about patch format... I won't respond until
you really model your submission after a correct patch.

Eric

On 11/28/2016 05:05 PM, Guozhonghua wrote:
> Changed the free order and code styles with reviews. Based on Linux-4.9-rc6. 
> Thanks.
>
> Signed-off-by: guozhonghua 
>
> diff -uprN ocfs2.orig/dlm/dlmrecovery.c ocfs2/dlm/dlmrecovery.c
> --- ocfs2.orig/dlm/dlmrecovery.c2016-11-28 16:26:45.890934481 +0800
> +++ ocfs2/dlm/dlmrecovery.c 2016-11-28 16:32:04.982940629 +0800
> @@ -2268,6 +2268,9 @@ static void dlm_free_dead_locks(struct d
>   {
>  struct dlm_lock *lock, *next;
>  unsigned int freed = 0;
> +   struct list_head *queue = NULL;
> +   int i;
> +
>
>  /* this node is the lockres master:
>   * 1) remove any stale locks for the dead node
> @@ -2280,33 +2283,19 @@ static void dlm_free_dead_locks(struct d
>   * to force the DLM_UNLOCK_FREE_LOCK action so as to free the locks 
> */
>
>  /* TODO: check pending_asts, pending_basts here */
> -	list_for_each_entry_safe(lock, next, &res->granted, list) {
> -		if (lock->ml.node == dead_node) {
> -			list_del_init(&lock->list);
> -			dlm_lock_put(lock);
> -			/* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually */
> -			dlm_lock_put(lock);
> -			freed++;
> -		}
> -	}
> -	list_for_each_entry_safe(lock, next, &res->converting, list) {
> -		if (lock->ml.node == dead_node) {
> -			list_del_init(&lock->list);
> -			dlm_lock_put(lock);
> -			/* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually */
> -			dlm_lock_put(lock);
> -			freed++;
> -		}
> -	}
> -	list_for_each_entry_safe(lock, next, &res->blocked, list) {
> -		if (lock->ml.node == dead_node) {
> -			list_del_init(&lock->list);
> -			dlm_lock_put(lock);
> -			/* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually */
> -			dlm_lock_put(lock);
> -			freed++;
> -		}
> -	}
> +	for (i = DLM_GRANTED_LIST; i <= DLM_BLOCKED_LIST; i++) {
> +		queue = dlm_list_idx_to_ptr(res, i);
> +		list_for_each_entry_safe(lock, next, queue, list) {
> +			if (lock->ml.node == dead_node) {
> +				list_del_init(&lock->list);
> +				dlm_lock_put(lock);
> +
> +				/* Can't schedule DLM_UNLOCK_FREE_LOCK
> +				 * - do manually
> +				 */
> +				dlm_lock_put(lock);
> +				freed++;
> +			}
> +		}
> +	}
>
>  if (freed) {
>  mlog(0, "%s:%.*s: freed %u locks for dead node %u, "
> -
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!




[Ocfs2-devel] [Bug Report] multiple node reflink: kernel BUG at ../fs/ocfs2/suballoc.c:1989!

2016-11-23 Thread Eric Ren
Hi all,

FYI,

Reflink testcase in multiple nodes mode failed with the backtrace below:

---
2016-11-02T16:43:41.862247+08:00 ocfs2cts2 kernel: [25429.622914] ------------[ cut here ]------------
2016-11-02T16:43:41.862273+08:00 ocfs2cts2 kernel: [25429.622979] kernel BUG at 
../fs/ocfs2/suballoc.c:1989!
2016-11-02T16:43:41.862274+08:00 ocfs2cts2 kernel: [25429.623024] invalid 
opcode:  [#1] SMP
2016-11-02T16:43:41.862276+08:00 ocfs2cts2 kernel: [25429.623064] Modules 
linked in: ocfs2_stack_user ocfs2 ocfs2_nodemanager ocfs2_stackglue jbd2 
quota_tree dlm configfs softdog sd_mod iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs ppdev acpi_cpufreq 
pvpanic virtio_net parport_pc joydev serio_raw i2c_piix4 pcspkr parport button 
virtio_balloon processor btrfs ata_generic xor raid6_pq ata_piix ahci libahci 
cirrus virtio_blk drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
ttm uhci_hcd libata ehci_hcd usbcore drm virtio_pci floppy virtio_ring 
usb_common virtio sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
scsi_mod autofs4
2016-11-02T16:43:41.862277+08:00 ocfs2cts2 kernel: [25429.623590] Supported: Yes
2016-11-02T16:43:41.862278+08:00 ocfs2cts2 kernel: [25429.623624] CPU: 0 PID: 
1923 Comm: multi_reflink_t Not tainted 4.4.21-69-default #1
2016-11-02T16:43:41.862279+08:00 ocfs2cts2 kernel: [25429.623684] Hardware 
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
2016-11-02T16:43:41.862280+08:00 ocfs2cts2 kernel: [25429.623744] task: 
880080010480 ti: 8800806cc000 task.ti: 8800806cc000
2016-11-02T16:43:41.862281+08:00 ocfs2cts2 kernel: [25429.623801] RIP: 
0010:[]  [] 
ocfs2_claim_metadata+0x148/0x150 [ocfs2]
2016-11-02T16:43:41.862282+08:00 ocfs2cts2 kernel: [25429.623961] RSP: 
0018:8800806cf838  EFLAGS: 00010297
2016-11-02T16:43:41.862282+08:00 ocfs2cts2 kernel: [25429.624010] RAX: 
0003 RBX: 88008989f4c0 RCX: 8800806cf8b0
2016-11-02T16:43:41.862284+08:00 ocfs2cts2 kernel: [25429.624064] RDX: 
0001 RSI: 88008989f4c0 RDI: 88006e2313b8
2016-11-02T16:43:41.862284+08:00 ocfs2cts2 kernel: [25429.624119] RBP: 
 R08: 8800806cf8aa R09: 8800806cf8ac
2016-11-02T16:43:41.862285+08:00 ocfs2cts2 kernel: [25429.624173] R10: 
0001d5e0 R11: 88011f9912c0 R12: 88008076e000
2016-11-02T16:43:41.862286+08:00 ocfs2cts2 kernel: [25429.624226] R13: 
88006e2313b8 R14: 88003693cae8 R15: 88008989f4c0
2016-11-02T16:43:41.862287+08:00 ocfs2cts2 kernel: [25429.624281] FS:  
7f6a35621740() GS:88013fc0() knlGS:
2016-11-02T16:43:41.862288+08:00 ocfs2cts2 kernel: [25429.624342] CS:  0010 DS: 
 ES:  CR0: 8005003b
2016-11-02T16:43:41.862289+08:00 ocfs2cts2 kernel: [25429.624389] CR2: 
0114e000 CR3: 53597000 CR4: 06f0
2016-11-02T16:43:41.862297+08:00 ocfs2cts2 kernel: [25429.624448] Stack:
2016-11-02T16:43:41.862299+08:00 ocfs2cts2 kernel: [25429.624476]  
810b7a80   
2016-11-02T16:43:41.862300+08:00 ocfs2cts2 kernel: [25429.624534]  
 0001  88008076e000
2016-11-02T16:43:41.862301+08:00 ocfs2cts2 kernel: [25429.624592]  
88006e2313b8 88003693cae8 a051bcb9 8800806cf8b8
2016-11-02T16:43:41.862302+08:00 ocfs2cts2 kernel: [25429.624649] Call Trace:
2016-11-02T16:43:41.862303+08:00 ocfs2cts2 kernel: [25429.624719]  
[] ocfs2_create_new_meta_bhs.isra.49+0x69/0x330 [ocfs2]
2016-11-02T16:43:41.862304+08:00 ocfs2cts2 kernel: [25429.624797]  
[] ocfs2_add_branch+0x1fd/0x830 [ocfs2]
2016-11-02T16:43:41.862306+08:00 ocfs2cts2 kernel: [25429.624878]  
[] ocfs2_grow_tree+0x350/0x710 [ocfs2]
2016-11-02T16:43:41.862307+08:00 ocfs2cts2 kernel: [25429.624943]  
[] ocfs2_split_and_insert+0x2e1/0x450 [ocfs2]
2016-11-02T16:43:41.862308+08:00 ocfs2cts2 kernel: [25429.625012]  
[] ocfs2_split_extent+0x3e4/0x540 [ocfs2]
2016-11-02T16:43:41.862309+08:00 ocfs2cts2 kernel: [25429.625082]  
[] ocfs2_clear_ext_refcount+0x1c9/0x2b0 [ocfs2]
2016-11-02T16:43:41.862310+08:00 ocfs2cts2 kernel: [25429.625155]  
[] ocfs2_make_clusters_writable+0x3b8/0x8d0 [ocfs2]
2016-11-02T16:43:41.862311+08:00 ocfs2cts2 kernel: [25429.625229]  
[] ocfs2_replace_cow+0x87/0x1c0 [ocfs2]
2016-11-02T16:43:41.862312+08:00 ocfs2cts2 kernel: [25429.626825]  
[] ocfs2_refcount_cow+0x3ea/0x4f0 [ocfs2]
2016-11-02T16:43:41.862314+08:00 ocfs2cts2 kernel: [25429.626825]  
[] ocfs2_file_write_iter+0xb8b/0xdf0 [ocfs2]
2016-11-02T16:43:41.862315+08:00 ocfs2cts2 kernel: [25429.626825]  
[] __vfs_write+0xa9/0xf0
2016-11-02T16:43:41.862316+08:00 ocfs2cts2 kernel: [25429.626825]  
[] vfs_write+0x9d/0x190
2016-11-02T16:43:41.862317+08:00 ocfs2cts2 kernel: [25429.626825]  
[] SyS_pwrite64+0x62/0x90
2016-11-02T16:43:41.862318+08:00 ocfs2cts2 kernel: [25429.626825]  
[] entry_SYSCALL_64_fastpath+0x12/0x6d

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io

2016-11-16 Thread Eric Ren
Hi,

On 11/16/2016 06:45 PM, Dan Carpenter wrote:
> On Wed, Nov 16, 2016 at 10:33:49AM +0800, Eric Ren wrote:
> That silences the warning, of course, but I feel like the code is buggy.
> How do we know that we don't hit that exit path?
Sorry, I missed your point. Do you mean the below?

"1817 goto out_quota;" will free (*wc), but with "ret = 0". Thus, the caller
thinks it's OK to use (*wc), but...

Do I understand you correctly?

Eric
>
> fs/ocfs2/aops.c
>1808  /*
>1809   * ocfs2_grab_pages_for_write() returns -EAGAIN if it could 
> not lock
>1810   * the target page. In this case, we exit with no error and 
> no target
>1811   * page. This will trigger the caller, page_mkwrite(), to 
> re-try
>1812   * the operation.
>1813   */
>1814  if (ret == -EAGAIN) {
>1815  BUG_ON(wc->w_target_page);
>1816  ret = 0;
>1817  goto out_quota;
>1818  }
>
> regards,
> dan carpenter
>
>
>




Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io

2016-11-15 Thread Eric Ren
Hi Dan,

On 11/15/2016 06:36 PM, Dan Carpenter wrote:
> Ryan's email is dead.  But this is buggy.  Someone please fix it.
>
> regards,
> dan carpenter
>
> On Tue, Nov 15, 2016 at 01:33:30PM +0300, Dan Carpenter wrote:
>> I never got a response on this.  I was looking at it today and it still
>> looks buggy to me.
>>
>> regards,
>> dan carpenter
>>
>> On Wed, Mar 09, 2016 at 01:25:05PM +0300, Dan Carpenter wrote:
>>> Hello Ryan Ding,
>>>
>>> The patch fbe25fb91af5: "ocfs2: fix sparse file & data ordering issue
>>> in direct io" from Feb 25, 2016, leads to the following static
>>> checker warning:
>>>
>>> fs/ocfs2/aops.c:2242 ocfs2_dio_get_block()
>>> error: potentially dereferencing uninitialized 'wc'.
>>>
>>> fs/ocfs2/aops.c
>>>2235
>>>2236  ret = ocfs2_write_begin_nolock(inode->i_mapping, pos, len,
>>>2237 OCFS2_WRITE_DIRECT, NULL,
>>>2238 (void **)&wc, di_bh, NULL);
>>> 
How do you run the static checker? Please teach me ;-)

Regarding this warning, please try to make this line 
(https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L2128)
into:

struct ocfs2_write_ctxt *wc = NULL;

It should work, and shouldn't have any side effects.

Eric
>>>
>>> See commit 5c9e2986 ('ocfs2: Fix ocfs2_page_mkwrite()') for an
>>> explanation why a zero return here does not imply that "wc" has been
>>> initialized.
>>>
>>>2239  if (ret) {
>>>2240  mlog_errno(ret);
>>>2241  goto unlock;
>>>2242  }
>>>2243
>>>2244  desc = &wc->w_desc[0];
>>>2245
>>>2246  p_blkno = ocfs2_clusters_to_blocks(inode->i_sb, 
>>> desc->c_phys);
>>>
>>> regards,
>>> dan carpenter
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>




Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-14 Thread Eric Ren
Hi,
> Thanks for your attention. Actually, I tried different versions of draft 
> patch locally.
> Either of them can satisfy myself so far.

Sorry, I meant "neither of them".

Eric
> Some rules I'd like to follow:
> 1) check and avoid recursive cluster locking, rather than allow it which 
> Junxiao had tried
> before;
> 2) Just keep track of lock resource that meets the following requirements:
>a. normal inodes (non systemfile);
>b. inode metadata lockres (not open, rw lockres);
> why? to avoid more special cluster locking usecases, like journal systemfile, 
> "LOST+FOUND"
> open lockres, that lock/unlock
> operations are performed by different processes, making tracking task more 
> tricky.
> 3) There is another problem if we follow "check + avoid" pattern, which I 
> have mentioned in
> this thread:
> """
> This is wrong. We also depend on ocfs2_inode_lock() to pass out "bh" for later use.
>
> So, we may need another function, something like ocfs2_inode_getbh():
>	if (!oh)
>		ocfs2_inode_lock();
>	else
>		ocfs2_inode_getbh();
> """
>
> Hope we can work out a nice solution for this tricky issue ;-)
>
> Eric
>
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>




Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-14 Thread Eric Ren
Hi,

On 11/14/2016 01:42 PM, piaojun wrote:
> Hi Eric,
>
>
> OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST 
> (ocfs2_generic_handle_bast) when downconvert is needed
> on behalf of remote lock request.
>
> The recursive cluster lock (the second one) will be blocked in 
> __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED.
> But the downconvert cannot be done, why? because there is no chance for the 
> first cluster lock on this node to be unlocked -
> we blocked ourselves in the code path.
>
> Eric
> You clear my doubt. I will look through your solution.

Thanks for your attention. Actually, I tried different versions of draft patch 
locally. 
Either of them can satisfy myself so far.
Some rules I'd like to follow:
1) check and avoid recursive cluster locking, rather than allow it which 
Junxiao had tried 
before;
2) Just keep track of lock resource that meets the following requirements:
  a. normal inodes (non systemfile);
  b. inode metadata lockres (not open, rw lockres);
why? to avoid more special cluster locking usecases, like journal systemfile, 
"LOST+FOUND" 
open lockres, that lock/unlock
operations are performed by different processes, making tracking task more 
tricky.
3) There is another problem if we follow "check + avoid" pattern, which I have 
mentioned in 
this thread:
"""
This is wrong. We also depend on ocfs2_inode_lock() to pass out "bh" for later use.

So, we may need another function, something like ocfs2_inode_getbh():
	if (!oh)
		ocfs2_inode_lock();
	else
		ocfs2_inode_getbh();
"""

Hope we can work out a nice solution for this tricky issue ;-)

Eric

>




Re: [Ocfs2-devel] [PATCH 6/6] ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features

2016-11-10 Thread Eric Ren
On 11/11/2016 02:20 PM, Darrick J. Wong wrote:
> On Fri, Nov 11, 2016 at 01:49:48PM +0800, Eric Ren wrote:
>> Hi,
>>
>> A few issues obvious to me:
>>
>> On 11/10/2016 06:51 AM, Darrick J. Wong wrote:
>>> Connect the new VFS clone_range, copy_range, and dedupe_range features
>>> to the existing reflink capability of ocfs2.  Compared to the existing
>>> ocfs2 reflink ioctl, we have to do things a little differently to support
>>> the VFS semantics (we can clone subranges of a file but we don't clone
>>> xattrs), but the VFS ioctls are more broadly supported.
>> How can I test the new ocfs2 reflink (with this patch) manually? What
>> commands should I use to do xxx_range things?
> See the 'reflink', 'dedupe', and 'copy_range' commands in xfs_io.
>
> The first two were added in xfsprogs 4.3, and copy_range in 4.7.

OK, thanks. I think you are missing the following two inline comments:

>>> +   spin_lock(&OCFS2_I(dest)->ip_lock);
>>> +   if (newlen > i_size_read(dest)) {
>>> +   i_size_write(dest, newlen);
>>> +   di->i_size = newlen;
>> di->i_size = cpu_to_le64(newlen);
>>
>>> +   }
>>> +   spin_unlock(&OCFS2_I(dest)->ip_lock);
>>> +
>> Add ocfs2_update_inode_fsync_trans() here? Looks this function was
>> introduced by you to improve efficiency.
>> Just want to awake your memory about this, though I don't know about the
>> details why it should be.
>>
>> Eric
Thanks,
Eric



Re: [Ocfs2-devel] [PATCH 6/6] ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features

2016-11-10 Thread Eric Ren
Hi,

A few issues obvious to me:

On 11/10/2016 06:51 AM, Darrick J. Wong wrote:
> Connect the new VFS clone_range, copy_range, and dedupe_range features
> to the existing reflink capability of ocfs2.  Compared to the existing
> ocfs2 reflink ioctl, we have to do things a little differently to support
> the VFS semantics (we can clone subranges of a file but we don't clone
> xattrs), but the VFS ioctls are more broadly supported.

How can I test the new ocfs2 reflink (with this patch) manually? What commands 
should I
use to do xxx_range things?

>
> Signed-off-by: Darrick J. Wong 
> ---
>   fs/ocfs2/file.c |   62 -
>   fs/ocfs2/file.h |3
>   fs/ocfs2/refcounttree.c |  619 
> +++
>   fs/ocfs2/refcounttree.h |7 +
>   4 files changed, 688 insertions(+), 3 deletions(-)
>
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 000c234..d5a022d 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1667,9 +1667,9 @@ static void ocfs2_calc_trunc_pos(struct inode *inode,
>   *done = ret;
>   }
>   
> -static int ocfs2_remove_inode_range(struct inode *inode,
> - struct buffer_head *di_bh, u64 byte_start,
> - u64 byte_len)
> +int ocfs2_remove_inode_range(struct inode *inode,
> +  struct buffer_head *di_bh, u64 byte_start,
> +  u64 byte_len)
>   {
>   int ret = 0, flags = 0, done = 0, i;
>   u32 trunc_start, trunc_len, trunc_end, trunc_cpos, phys_cpos;
> @@ -2440,6 +2440,56 @@ static loff_t ocfs2_file_llseek(struct file *file, 
> loff_t offset, int whence)
>   return offset;
>   }
>   
> +static ssize_t ocfs2_file_copy_range(struct file *file_in,
> +  loff_t pos_in,
> +  struct file *file_out,
> +  loff_t pos_out,
> +  size_t len,
> +  unsigned int flags)
> +{
> + int error;
> +
> + error = ocfs2_reflink_remap_range(file_in, pos_in, file_out, pos_out,
> +   len, false);
> + if (error)
> + return error;
> + return len;
> +}
> +
> +static int ocfs2_file_clone_range(struct file *file_in,
> +   loff_t pos_in,
> +   struct file *file_out,
> +   loff_t pos_out,
> +   u64 len)
> +{
> + return ocfs2_reflink_remap_range(file_in, pos_in, file_out, pos_out,
> +  len, false);
> +}
> +
> +#define OCFS2_MAX_DEDUPE_LEN (16 * 1024 * 1024)
> +static ssize_t ocfs2_file_dedupe_range(struct file *src_file,
> +u64 loff,
> +u64 len,
> +struct file *dst_file,
> +u64 dst_loff)
> +{
> + int error;
> +
> + /*
> +  * Limit the total length we will dedupe for each operation.
> +  * This is intended to bound the total time spent in this
> +  * ioctl to something sane.
> +  */
> + if (len > OCFS2_MAX_DEDUPE_LEN)
> + len = OCFS2_MAX_DEDUPE_LEN;
> +
> + error = ocfs2_reflink_remap_range(src_file, loff, dst_file, dst_loff,
> +   len, true);
> + if (error)
> + return error;
> + return len;
> +}
> +
>   const struct inode_operations ocfs2_file_iops = {
>   .setattr= ocfs2_setattr,
>   .getattr= ocfs2_getattr,
> @@ -2479,6 +2529,9 @@ const struct file_operations ocfs2_fops = {
>   .splice_read= generic_file_splice_read,
>   .splice_write   = iter_file_splice_write,
>   .fallocate  = ocfs2_fallocate,
> + .copy_file_range = ocfs2_file_copy_range,
> + .clone_file_range = ocfs2_file_clone_range,
> + .dedupe_file_range = ocfs2_file_dedupe_range,
>   };
>   
>   const struct file_operations ocfs2_dops = {
> @@ -2524,6 +2577,9 @@ const struct file_operations ocfs2_fops_no_plocks = {
>   .splice_read= generic_file_splice_read,
>   .splice_write   = iter_file_splice_write,
>   .fallocate  = ocfs2_fallocate,
> + .copy_file_range = ocfs2_file_copy_range,
> + .clone_file_range = ocfs2_file_clone_range,
> + .dedupe_file_range = ocfs2_file_dedupe_range,
>   };
>   
>   const struct file_operations ocfs2_dops_no_plocks = {
> diff --git a/fs/ocfs2/file.h b/fs/ocfs2/file.h
> index e8c62f2..897fd9a 100644
> --- a/fs/ocfs2/file.h
> +++ b/fs/ocfs2/file.h
> @@ -82,4 +82,7 @@ int ocfs2_change_file_space(struct file *file, unsigned int 
> cmd,
>   
>   int ocfs2_check_range_for_refcount(struct inode *inode, loff_t pos,
>  size_t count);
> +int ocfs2_remove_inode_range(struct inode *inode,
> +  

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-10 Thread Eric Ren
Hi,

On 11/10/2016 06:49 PM, piaojun wrote:
> Hi Eric,
>
> On 2016-11-1 9:45, Eric Ren wrote:
>> Hi,
>>
>> On 10/31/2016 06:55 PM, piaojun wrote:
>>> Hi Eric,
>>>
>>> On 2016-10-19 13:19, Eric Ren wrote:
>>>> The deadlock issue happens when running discontiguous block
>>>> group testing on multiple nodes. The easier way to reproduce
>>>> is to do "chmod -R 777 /mnt/ocfs2" things like this on multiple
>>>> nodes at the same time by pssh.
>>>>
>>>> This is indeed another deadlock caused by: commit 743b5f1434f5
>>>> ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason
>>>> had been explained well by Tariq Saeed in this thread:
>>>>
>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html
>>>>
>>>> For this case, the ocfs2_inode_lock() is misused recursively as below:
>>>>
>>>> do_sys_open
>>>>do_filp_open
>>>> path_openat
>>>>  may_open
>>>>   inode_permission
>>>>__inode_permission
>>>> ocfs2_permission  <== ocfs2_inode_lock()
>>>>  generic_permission
>>>>   get_acl
>>>>ocfs2_iop_get_acl  <== ocfs2_inode_lock()
>>>> ocfs2_inode_lock_full_nested <= deadlock if a remote EX 
>>>> request
>>> Do you mean another node wants to get ex of the inode? or another process?
>> Remote EX request means "another node wants to get ex of the inode";-)
>>
>> Eric
> If another node wants to get ex, it will get blocked as this node has
> got pr. Why will the ex request make this node get blocked? Expect your
> detailed description.
Did you look at this link I mentioned above?

OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST 
(ocfs2_generic_handle_bast) when 
downconvert is needed
on behalf of remote lock request.

The recursive cluster lock (the second one) will be blocked in 
__ocfs2_cluster_lock() 
because of OCFS2_LOCK_BLOCKED.
But the downconvert cannot be done, why? because there is no chance for the 
first cluster 
lock on this node to be unlocked -
we blocked ourselves in the code path.

Eric
>
> thanks,
> Jun
>>>> comes between two ocfs2_inode_lock()
>>>>
>>>> Fix by checking if the cluster lock has been acquired already in the
>>>> call-chain path.
>>>>
>>>> Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in 
>>>> ocfs2_iop_set/get_acl()")
>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>> ---
>>>>fs/ocfs2/acl.c | 39 +++
>>>>1 file changed, 27 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>> index bed1fcb..7e3544e 100644
>>>> --- a/fs/ocfs2/acl.c
>>>> +++ b/fs/ocfs2/acl.c
>>>> @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
>>>>int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int 
>>>> type)
>>>>{
>>>>struct buffer_head *bh = NULL;
>>>> +struct ocfs2_holder *oh;
>>>> +struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>int status = 0;
>>>> -status = ocfs2_inode_lock(inode, &bh, 1);
>>>> -if (status < 0) {
>>>> -if (status != -ENOENT)
>>>> -mlog_errno(status);
>>>> -return status;
>>>> +oh = ocfs2_is_locked_by_me(lockres);
>>>> +if (!oh) {
>>>> +status = ocfs2_inode_lock(inode, &bh, 1);
>>>> +if (status < 0) {
>>>> +if (status != -ENOENT)
>>>> +mlog_errno(status);
>>>> +return status;
>>>> +}
>>>>}
>>>> +
>>>>status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>> -ocfs2_inode_unlock(inode, 1);
>>>> +
>>>> +if (!oh)
>>>> +ocfs2_inode_unlock(inode, 1);
>>>>brelse(bh);
>>>>return status;
>>>>}
>>>> @@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>>>> *inode, int type)
>>>>struct ocfs2_super *osb;
>>>>struct buffer_head *di_bh = NULL;
>>>>   

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-08 Thread Eric Ren

Hi all,

On 10/19/2016 01:19 PM, Eric Ren wrote:

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..7e3544e 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
  {
struct buffer_head *bh = NULL;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int status = 0;
  
-	status = ocfs2_inode_lock(inode, &bh, 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   status = ocfs2_inode_lock(inode, &bh, 1);
+   if (status < 0) {
+   if (status != -ENOENT)
+   mlog_errno(status);
+   return status;
+   }
}

This is wrong. We also depend on ocfs2_inode_lock() to pass out "bh" for later use.

So, we may need another function, something like ocfs2_inode_getbh():
	if (!oh)
		ocfs2_inode_lock();
	else
		ocfs2_inode_getbh();

Eric

+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!oh)
+   ocfs2_inode_unlock(inode, 1);
brelse(bh);
return status;
  }
@@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int ret;
  
  	osb = OCFS2_SB(inode->i_sb);

if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
+
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+   if (ret < 0) {
+   if (ret != -ENOENT)
+   mlog_errno(ret);
+   return ERR_PTR(ret);
+   }
}
  
  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
  
-	ocfs2_inode_unlock(inode, 0);

+   if (!oh)
+   ocfs2_inode_unlock(inode, 0);
brelse(di_bh);
return acl;
  }




Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-11-08 Thread Eric Ren

Hi all,

On 10/19/2016 01:19 PM, Eric Ren wrote:

ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are three solutions I can think of.  The first one is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code. Frown on this.

The second one is, what I am trying now, to keep track of the processes that
lock/unlock a cluster lock with the following draft patches. But I quickly
found out that a cluster lock taken by process A can be unlocked by process B.
For example, system files like the journal are locked during mount and
unlocked during umount.

We can avoid the problem above by:

1) not keeping track of system file inode:

	if (!(OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE)) {
		...
	}

2) only keeping track of inode metadata lockres:

   OCFS2_I(inode)->ip_inode_lockres;

because inode open lockres can also be get/release by different processes.

Eric


The third one is to revert that problematic commit! It looks like get/set_acl()
are always called by other vfs callbacks like ocfs2_permission(). I think
we can do this if that's true, right? Anyway, I'll try to work out whether it is;-)

Hope for your input to solve this problem;-)

Thanks,
Eric






[Ocfs2-devel] what is g_f_a_w_n() short for? thanks

2016-11-07 Thread Eric Ren

Hello Mark,

There is a piece of comment that confused me, please correct me:

https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L2274
```
ocfs2_file_write_iter() {
...
 /*
 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
 * function pointer which is called when o_direct io completes so that
 * it can unlock our rw lock.
...
}
```
Should g_f_a_w_n() be g_f_a_w_i() instead? Because grep only hits 
(__)generic_file_write_iter() by pattern.


Thanks,
Eric


Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-10-31 Thread Eric Ren
Hi,

On 10/31/2016 06:55 PM, piaojun wrote:
> Hi Eric,
>
> On 2016-10-19 13:19, Eric Ren wrote:
>> The deadlock issue happens when running discontiguous block
>> group testing on multiple nodes. The easier way to reproduce
>> is to do "chmod -R 777 /mnt/ocfs2" things like this on multiple
>> nodes at the same time by pssh.
>>
>> This is indeed another deadlock caused by: commit 743b5f1434f5
>> ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason
>> had been explained well by Tariq Saeed in this thread:
>>
>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html
>>
>> For this case, the ocfs2_inode_lock() is misused recursively as below:
>>
>> do_sys_open
>>   do_filp_open
>>path_openat
>> may_open
>>  inode_permission
>>   __inode_permission
>>ocfs2_permission  <== ocfs2_inode_lock()
>> generic_permission
>>  get_acl
>>   ocfs2_iop_get_acl  <== ocfs2_inode_lock()
>>ocfs2_inode_lock_full_nested <= deadlock if a remote EX 
>> request
> Do you mean another node wants to get ex of the inode? or another process?
Remote EX request means "another node wants to get ex of the inode";-)

Eric
>> comes between two ocfs2_inode_lock()
>>
>> Fix by checking if the cluster lock has been acquired already in the
>> call-chain path.
>>
>> Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in 
>> ocfs2_iop_set/get_acl()")
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/acl.c | 39 +++
>>   1 file changed, 27 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..7e3544e 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
>>   int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>   {
>>  struct buffer_head *bh = NULL;
>> +struct ocfs2_holder *oh;
>> +struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>  int status = 0;
>>   
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> -if (status < 0) {
>> -if (status != -ENOENT)
>> -mlog_errno(status);
>> -return status;
>> +oh = ocfs2_is_locked_by_me(lockres);
>> +if (!oh) {
>> +status = ocfs2_inode_lock(inode, &bh, 1);
>> +if (status < 0) {
>> +if (status != -ENOENT)
>> +mlog_errno(status);
>> +return status;
>> +}
>>  }
>> +
>>  status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!oh)
>> +ocfs2_inode_unlock(inode, 1);
>>  brelse(bh);
>>  return status;
>>   }
>> @@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>  struct ocfs2_super *osb;
>>  struct buffer_head *di_bh = NULL;
>>  struct posix_acl *acl;
>> +struct ocfs2_holder *oh;
>> +struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>  int ret;
>>   
>>  osb = OCFS2_SB(inode->i_sb);
>>  if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>  return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> -if (ret < 0) {
>> -if (ret != -ENOENT)
>> -mlog_errno(ret);
>> -return ERR_PTR(ret);
>> +
>> +oh = ocfs2_is_locked_by_me(lockres);
>> +if (!oh) {
>> +ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +if (ret < 0) {
>> +if (ret != -ENOENT)
>> +mlog_errno(ret);
>> +return ERR_PTR(ret);
>> +}
>>  }
>>   
>>  acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>   
>> -ocfs2_inode_unlock(inode, 0);
>> +if (!oh)
>> +ocfs2_inode_unlock(inode, 0);
>>  brelse(di_bh);
>>  return acl;
>>   }
>>
>




Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-28 Thread Eric Ren

Hi Christoph!

Thanks for your attention.

On 10/28/2016 02:20 PM, Christoph Hellwig wrote:

Hi Eric,

I've added linux-fsdevel to the cc list as this should get a bit
broader attention.

On Wed, Oct 19, 2016 at 01:19:40PM +0800, Eric Ren wrote:

Mostly, we can avoid recursive locking by writing code carefully. However, as
the deadlock issues have proved out, it's very hard to handle the routines
that are called directly by vfs. For instance:

 const struct inode_operations ocfs2_file_iops = {
 .permission = ocfs2_permission,
 .get_acl= ocfs2_iop_get_acl,
 .set_acl= ocfs2_iop_set_acl,
 };


ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

What do you actually protect in ocfs2_permission?  It's a trivial
wrapper around generic_permission which just looks at the VFS inode.

Yes, it is.

https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L1321
---
ocfs2_permission
  ocfs2_inode_lock()
generic_permission
ocfs2_inode_unlock


I think the right fix is to remove ocfs2_permission entirely and use
the default VFS implementation.  That both solves your locking problem,
and it will also get you RCU lookup instead of dropping out of
RCU mode all the time.
But, from my understanding, the pair of ocfs2_inode_lock/unlock() is used to
prevent any concurrent changes to the permission of the inode on another
cluster node while we are checking it. It's a common case for cluster
filesystems, such as GFS2:
https://github.com/torvalds/linux/blob/master/fs/gfs2/inode.c#L1777

Thanks for your suggestion again!
Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Ocfs2-devel] BUG_ON(le64_to_cpu(fe->i_size) != i_size_read(inode)) is triggered in ocfs2_truncate_file()

2016-10-27 Thread Eric Ren

Hi all,

Has anyone ever seen this BUG_ON() assertion
(https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L460)
triggered? (Log message pasted at the end.) I cannot reproduce it so far.

fallocate with the FALLOC_FL_KEEP_SIZE flag (man 2 fallocate) can result in
"le64_to_cpu(fe->i_size) != i_size_read(inode)",
as explained in this commit message:

```
commit d62e74be1270c89fbaf7aada8218bfdf62d00a58
Author: Younger Liu
Date:   Mon Feb 10 14:25:51 2014 -0800

ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size

The issue scenario is as following:

- Create a small file and fallocate a large disk space for a file with
  FALLOC_FL_KEEP_SIZE option.

- ftruncate the file back to the original size again.  but the disk free
  space is not changed back.  This is a real bug that be fixed in this
  patch.

In order to solve the issue above, we modified ocfs2_setattr(), if
attr->ia_size != i_size_read(inode), It calls ocfs2_truncate_file(), and
truncate disk space to attr->ia_size.
```

I was thinking of removing this BUG_ON() assertion. But the following steps
cannot trigger it:

$dd if=/dev/zero of=/mnt/ocfs2/test2 bs=512 count=1
$fallocate --keep-size --length 102400 /mnt/ocfs2/test2
$truncate --size=512 /mnt/ocfs2/test2

I'm wondering why the test result goes against what I expected. Finally, I
found that:

ocfs2_inode_lock(inode, &bh, 1) in ocfs2_setattr() will update the
inode->i_size field from the LVB value or from the ocfs2_dinode on disk.
---
vfs_truncate
 do_truncate
  inode_lock()
  notify_change
   ocfs2_setattr
    ocfs2_rw_lock()
    ocfs2_inode_lock()
    ocfs2_truncate_file
    ocfs2_rw_unlock()
    ocfs2_inode_unlock()
  inode_unlock()

---
ocfs2_inode_lock()
 ocfs2_inode_lock_full_nested()
  ocfs2_inode_lock_update() ==> https://github.com/torvalds/linux/blob/master/fs/ocfs2/dlmglue.c#L2204
   if (ocfs2_meta_lvb_is_trustable())
       ocfs2_refresh_inode_from_lvb()
   else {
       ocfs2_read_inode_block()
       ocfs2_refresh_inode()
   }

Since we update inode->i_size under the protection of these locks, how was the assertion triggered?

As always, any comments and suggestion will be appreciated!

Thanks,
Eric

Log message:
```
kernel-default-3.0.101-80
ocfs2-kmp-default-1.6_3.0.101_68-0.25.6

[239098.534619] (dsmc,8244,6):ocfs2_truncate_file:466 ERROR: bug expression: 
le64_to_cpu(fe->i_size) != i_size_read(inode)
[239098.534633] (dsmc,8244,6):ocfs2_truncate_file:466 ERROR: Inode 10812, inode 
i_size = 677 != di i_size = 738, i_flags = 0x1

...

[239098.534724] kernel BUG at 
/usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!

PID: 8244   TASK: 8801dfb862c0  CPU: 6   COMMAND: "dsmc"
 #0 [8801f59618d0] machine_kexec at 8102c54e
 #1 [8801f5961920] crash_kexec at 810ae858
 #2 [8801f59619f0] oops_end at 8146b558
 #3 [8801f5961a10] do_invalid_op at 810036c4
 #4 [8801f5961ab0] invalid_op at 81472d5b
[exception RIP: ocfs2_truncate_file+165]
RIP: a0929ba5  RSP: 8801f5961b68  RFLAGS: 00010296
RAX: 0085  RBX: 8801dfb862c0  RCX: 7335
RDX:   RSI: 0007  RDI: 0246
RBP: 1000   R8: 81da3ac0   R9: 
R10: 0003  R11:   R12: 8801cbb864f8
R13: 8801dfb86930  R14: 8801bf5ad000  R15: 02a5
ORIG_RAX:   CS: 0010  SS: 0018
 #5 [8801f5961bd0] ocfs2_setattr at a092c43e [ocfs2]
 #6 [8801f5961c90] notify_change at 8117a0cf
 #7 [8801f5961cf0] do_truncate at 8115e177
 #8 [8801f5961d60] do_last at 8116d0c3
 #9 [8801f5961dc0] path_openat at 8116df29
#10 [8801f5961e50] do_filp_open at 8116e39c
#11 [8801f5961f20] do_sys_open at 8115eabf
#12 [8801f5961f80] system_call_fastpath at 81471d72
RIP: 7f7da23f67b0  RSP: 7fffae4a3470  RFLAGS: 00010246
RAX: 0002  RBX: 81471d72  RCX: 
RDX: 01b6  RSI: 0241  RDI: 00cff3df
RBP: 7fffae4a4ab0   R8: 0004   R9: 0001
R10: 0241  R11: 0246  R12: 
R13: 00d1d2f0  R14: 00930a68  R15: 0004
ORIG_RAX: 0002  CS: 0033  SS: 002b
```


Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-24 Thread Eric Ren
Hi all,


On 10/19/2016 01:19 PM, Eric Ren wrote:
> The third one is to revert that problematic commit! It looks like
> get/set_acl() are always called by other vfs callbacks like
> ocfs2_permission(). I think we can do this if it's true, right?
> Anyway, I'll try to work out if it's true;-)
After looking into it more, I got to know that get/set_acl() can be invoked
directly from vfs, for instance:

fsetxattr()
  setxattr()
vfs_setxattr()
  __vfs_setxattr()
handler->set(handler, dentry, inode, name, value, size) // 
posix_acl_access_xattr_handler.set = posix_acl_xattr_set
  posix_acl_xattr_set()
 set_posix_acl()
   inode->i_op->set_acl()


So, this problem looks really hard to solve:-/

Eric

>
> Hope for your input to solve this problem;-)
>
> Thanks,
> Eric
>
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>




Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-19 Thread Eric Ren
Hi Junxiao,

On 10/19/2016 02:57 PM, Junxiao Bi wrote:
> I had ever implemented generic recursive locking support, please check the 
> patch at 
> https://oss.oracle.com/pipermail/ocfs2-devel/2015-December/011408.html 
>  , 
> the issue that locking and unlocking in different processes was considered. 
> But it was rejected by Mark as recursive locking is not allowed in 
> ocfs2/kernel .
Yes, I remember it. The difference is that I just want a function to check
for recursive locking, rather than to support recursive locking;-)

Honestly, I still cannot understand your patch thoroughly. Back then, it was
the complexity of your patch that concerned me. Besides, it looks like the
"PR+EX" + "non-block" request case cannot be handled well?
>> The third one is to revert that problematic commit! It looks like
>> get/set_acl() are always called by other vfs callbacks like
>> ocfs2_permission(). I think we can do this if it's true, right?
>> Anyway, I'll try to work out if it's true;-)
> Not sure whether get/set_acl() will be called directly by vfs. Even not now, 
> we can’t make sure that in the future. So revert it may be a little risky. 
> But if refactor is complicated, then this maybe the only way we can do.
Agree. Let's investigate more into it;-)

Thanks,
Eric
>
> Thanks,
> Junxiao.
>> Hope for your input to solve this problem;-)
>>
>> Thanks,
>> Eric
>>
>



[Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-10-18 Thread Eric Ren
The deadlock issue happens when running discontiguous block
group testing on multiple nodes. The easiest way to reproduce it
is to run something like "chmod -R 777 /mnt/ocfs2" on multiple
nodes at the same time via pssh.

This is indeed another deadlock caused by: commit 743b5f1434f5
("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason
had been explained well by Tariq Saeed in this thread:

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

For this case, the ocfs2_inode_lock() is misused recursively as below:

do_sys_open
 do_filp_open
  path_openat
   may_open
inode_permission
 __inode_permission
  ocfs2_permission  <== ocfs2_inode_lock()
   generic_permission
get_acl
 ocfs2_iop_get_acl  <== ocfs2_inode_lock()
  ocfs2_inode_lock_full_nested <= deadlock if a remote EX request
comes between two ocfs2_inode_lock()

Fix by checking if the cluster lock has already been acquired in the call-chain
path.

Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/acl.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..7e3544e 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int status = 0;
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   status = ocfs2_inode_lock(inode, &bh, 1);
+   if (status < 0) {
+   if (status != -ENOENT)
+   mlog_errno(status);
+   return status;
+   }
}
+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!oh)
+   ocfs2_inode_unlock(inode, 1);
brelse(bh);
return status;
 }
@@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int ret;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
+
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+   if (ret < 0) {
+   if (ret != -ENOENT)
+   mlog_errno(ret);
+   return ERR_PTR(ret);
+   }
}
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   if (!oh)
+   ocfs2_inode_unlock(inode, 0);
brelse(di_bh);
return acl;
 }
-- 
2.6.6




[Ocfs2-devel] [DRAFT 1/2] ocfs2/dlmglue: keep track of the processes who take/put a cluster lock

2016-10-18 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has already been taken by a
process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are two solutions I can think of. One way is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code.

Another way is to keep track of the processes who lock/unlock a cluster
lock. This patch provides ocfs2_is_locked_by_me() for process to check
if the cluster lock has been requested before. This is now only used for
avoiding recursive locking, though it also can help debug cluster locking
issue. Unfortunately, this may incur some performance loss.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/dlmglue.c | 60 ++
 fs/ocfs2/dlmglue.h | 13 
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 74 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 83d576f..9f91884 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,48 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_holder *oh = kmalloc(sizeof(struct ocfs2_holder), GFP_KERNEL);
+
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_lockres = lockres;
+   oh->oh_owner_pid = get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+   kfree(oh);
+}
+
+struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_holder *oh;
+   struct pid *pid;
+
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid)
+   goto out;
+   }
+   oh = NULL;
+out:
+   spin_unlock(&lockres->l_lock);
+
+   return oh;
+}
+
 static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
 int level)
 {
@@ -1392,6 +1435,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
int noqueue_attempted = 0;
int dlm_locked = 0;
int kick_dc = 0;
+   struct ocfs2_holder *oh;
 
if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED)) {
mlog_errno(-EINVAL);
@@ -1403,6 +1447,14 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
lkm_flags |= DLM_LKF_VALBLK;
 
+   /* This block is just used to check recursive locking now */
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (unlikely(oh))
+   mlog_bug_on_msg(1, "PID(%d) locks on lockres(%s) recursively\n",
+   pid_nr(oh->oh_owner_pid), lockres->l_name);
+   else
+   ocfs2_add_holder(lockres);
+
 again:
wait = 0;
 
@@ -1596,6 +1648,14 @@ static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
   unsigned long caller_ip)
 {
unsigned long flags;
+   struct ocfs2_holder *oh = ocfs2_is_locked_by_me(lockres);
+
+   /* This block is just used to check recursive locking now */
+   if (unlikely(!oh))
+   mlog_bug_on_msg(1, "PID(%d) unlock lockres(%s) unnecessarily\n",
+   pid_nr(task_pid(current)), lockres->l_name);
+   else
+   ocfs2_remove_holder(lockres, oh);
 
spin_lock_irqsave(&lockres->l_lock, flags);
ocfs2_dec_holders(lockres, level);
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index d293a22..3b1d4e7 100644

[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-18 Thread Eric Ren
Hi all!

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in another deadlock as we have discussed in the recent thread:
https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html

Before this one, a similiar deadlock has been fixed by Junxiao:
commit c25a1e0671fb ("ocfs2: fix posix_acl_create deadlock")
commit 5ee0fbd50fdf ("ocfs2: revert using ocfs2_acl_chmod to avoid inode 
cluster lock hang")

We are in the situation that we have to avoid recursive cluster locking, but
there is no way to check if a cluster lock has already been taken by a process.

Mostly, we can avoid recursive locking by writing code carefully. However, as
the deadlock issues have proved out, it's very hard to handle the routines
that are called directly by vfs. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};


ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are three solutions I can think of.  The first one is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code. Frown on this.

The second one is, what I am trying now, to keep track of the processes that
lock/unlock a cluster lock, via the following draft patches. But I quickly
found out that a cluster lock taken by process A can be unlocked by process B.
For example, system files like the journal are locked during mount and
unlocked during umount.

The third one is to revert that problematic commit! It looks like get/set_acl()
are always called by other vfs callbacks like ocfs2_permission(). I think
we can do this if it's true, right? Anyway, I'll try to work out if it's true;-)

Hope for your input to solve this problem;-)

Thanks,
Eric




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-14 Thread Eric Ren
Hello Guys,

This is indeed another deadlock caused by:

Commit 743b5f1434f5 ("ocfs2: take inode lock in 
ocfs2_iop_set/get_acl()")

The reason had been explained well by Tariq Saeed in this thread:

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

For this case, the ocfs2_inode_lock() is misused recursively as below:

do_sys_open
do_filp_open
  path_openat
may_open
   inode_permission
  __inode_permission
 ocfs2_permission  <== ocfs2_inode_lock()
generic_permission
get_acl
 ocfs2_iop_get_acl  <== ocfs2_inode_lock()
  ocfs2_inode_lock_full_nested <== deadlock if a remote EX request comes between two ocfs2_inode_lock()

Welcome any thoughts to deal with this issue!

Thanks,
Eric

On 10/12/2016 09:23 AM, Eric Ren wrote:
> Hi Junxiao,
>
>> Hi Eric,
>>
>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>> As the subject, the testing hung there on a kernel without your patches:
>>>
>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>> and
>>> "ocfs2: fix posix_acl_create deadlock"
>>>
>>> The stack trace is:
>>> ```
>>> ocfs2cts1:~ # pstree -pl 24133
>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>
>>> ocfs2cts1:~ # pgrep -a chmod
>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>
>>> ocfs2cts1:~ # cat /proc/15232/stack
>>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>> [] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>> [] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>> [] iterate_dir+0x9c/0x110
>>> [] SyS_getdents+0x83/0xf0
>>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>>> [] 0x
>>> ```
>>>
>>> Do you think this issue can be fixed by your patches?
>> Looks not. Those two patches are to fix recursive locking deadlock. But
>> from above call trace, there is no recursive lock.
> Sorry, the call trace on another node was missing.  Here it is:
>
> ocfs2cts2:~ # pstree -lp
> sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>
> ocfs2cts2:~ # cat /proc/4865/stack
> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
> [] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
> [] generic_permission+0x166/0x1c0
> [] ocfs2_permission+0xaa/0xd0 [ocfs2]
> [] __inode_permission+0x56/0xb0
> [] link_path_walk+0x29a/0x560
> [] path_lookupat+0x7f/0x110
> [] filename_lookup+0x9c/0x150
> [] SyS_fchmodat+0x33/0x90
> [] entry_SYSCALL_64_fastpath+0x12/0x6d
> [] 0x
>
> Thanks,
> Eric
>
>
>> Thanks,
>> Junxiao.
>>> I will try your patches later, but I am little worried the possibility
>>> of reproduction may not be 100%.
>>> So ask you to confirm;-)
>>>
>>> Eric
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Eric Ren
Hi,

On 10/12/2016 05:45 PM, Junxiao Bi wrote:
> On 10/12/2016 05:34 PM, Eric Ren wrote:
>> Hi Junxiao,
>>
>> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>>> On 10/12/2016 10:36 AM, Eric Ren wrote:
>>>> Hi,
>>>>
>>>> When backporting those patches, I find that they are already in our
>>>> product kernel, maybe
>>>> via "stable kernel" policy, although our product kernel is 4.4 while the
>>>> patches were merged
>>>> into 4.6.
>>>>
>>>> Seems it's another deadlock that happens when doing `chmod -R 777
>>>> /mnt/ocfs2`
>>>> among mutilple nodes at the same time.
>>> Yes, but i just finish running ocfs2 full test on linux next-20161006
>>> and didn't find any issue.
>> Thanks a lot, really!
>>
>> 1. What's the size of your ocfs2 disk? My disk is 200G.
> 212G
>
>> 2. Did you run discontig block group test with multiple nodes? with this
>> option:
> Yes, but i don't know what that option is.
>
>>  " -m ocfs2cts1,ocfs2cts2"

ocfs2ctsX are the host names of the cluster nodes. The discontig bg testcase
will run in local mode without this option.

Thanks
Eric

>>
>> 3. Then, I am using fs/dlm. That's a different point.
> Yes, that deserve a look since your issue is cluster locking hung.
>
> Thanks,
> Junxiao.
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>
>>>> Thanks,
>>>> Eric
>>>> On 10/12/2016 09:23 AM, Eric Ren wrote:
>>>>> Hi Junxiao,
>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>>>>>> Hi Junxiao,
>>>>>>>
>>>>>>> As the subject, the testing hung there on a kernel without your
>>>>>>> patches:
>>>>>>>
>>>>>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock
>>>>>>> hang"
>>>>>>> and
>>>>>>> "ocfs2: fix posix_acl_create deadlock"
>>>>>>>
>>>>>>> The stack trace is:
>>>>>>> ```
>>>>>>> ocfs2cts1:~ # pstree -pl 24133
>>>>>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ocfs2cts1:~ # pgrep -a chmod
>>>>>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>>>>>
>>>>>>> ocfs2cts1:~ # cat /proc/15232/stack
>>>>>>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>>>>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>>>>> [] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>>>>>> [] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>>>>>> [] iterate_dir+0x9c/0x110
>>>>>>> [] SyS_getdents+0x83/0xf0
>>>>>>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>>>>> [] 0x
>>>>>>> ```
>>>>>>>
>>>>>>> Do you think this issue can be fixed by your patches?
>>>>>> Looks not. Those two patches are to fix recursive locking deadlock.
>>>>>> But
>>>>>> from above call trace, there is no recursive lock.
>>>>> Sorry, the call trace on another node was missing.  Here it is:
>>>>>
>>>>> ocfs2cts2:~ # pstree -lp
>>>>> sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>>>>>
>>>>>
>>>>>
>>>>> ocfs2cts2:~ # cat /proc/4865/stack
>>>>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>>> [] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
>>>>> [] generic_permission+0x166/0x1c0
>>>>> [] ocfs2_permission+0xaa/0xd0 [ocfs2]
>>>>> [] __inode_permission+0x56/0xb0
>>>>> [] link_path_walk+0x29a/0x560
>>>>> [] path_lookupat+0x7f/0x110
>>>>> [] filename_lookup+0x9c/0x150
>>>>> [] SyS_fchmodat+0x33/0x90
>>>>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>>> [] 0x
>>>>>
>>>>> Thanks,
>>>>> Eric
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>> I will try your patches later, but I am little worried the
>>>>>>> possibility
>>>>>>> of reproduction may not be 100%.
>>>>>>> So ask you to confirm;-)
>>>>>>>
>>>>>>> Eric
>>>>> ___
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel@oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>



Re: [Ocfs2-devel] ocfs2-test passed on linux-next/next-20161006

2016-10-12 Thread Eric Ren
Hi Junxiao,

On 10/12/2016 02:54 PM, Junxiao Bi wrote:
> Hi all,
>
> I just finished a full ocfs2 test(single/multiple/discontig) on
> linux-next/next-20161006. All test case passed. That's a good sign of
> quality. Thank you for your effort.

Great! Thanks for your efforts!

Eric

>
> Thanks,
> Junxiao.
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Eric Ren
Hi Junxiao,

On 10/12/2016 02:47 PM, Junxiao Bi wrote:
> On 10/12/2016 10:36 AM, Eric Ren wrote:
>> Hi,
>>
>> When backporting those patches, I find that they are already in our
>> product kernel, maybe
>> via "stable kernel" policy, although our product kernel is 4.4 while the
>> patches were merged
>> into 4.6.
>>
>> Seems it's another deadlock that happens when doing `chmod -R 777
>> /mnt/ocfs2`
>> among mutilple nodes at the same time.
> Yes, but i just finish running ocfs2 full test on linux next-20161006
> and didn't find any issue.

Thanks a lot, really!

1. What's the size of your ocfs2 disk? My disk is 200G.

2. Did you run the discontig block group test with multiple nodes, with this option:

 " -m ocfs2cts1,ocfs2cts2"

3. Then, I am using fs/dlm. That's a different point.

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>> Thanks,
>> Eric
>> On 10/12/2016 09:23 AM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>>> Hi Eric,
>>>>
>>>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>>>> Hi Junxiao,
>>>>>
>>>>> As the subject, the testing hung there on a kernel without your
>>>>> patches:
>>>>>
>>>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>>>> and
>>>>> "ocfs2: fix posix_acl_create deadlock"
>>>>>
>>>>> The stack trace is:
>>>>> ```
>>>>> ocfs2cts1:~ # pstree -pl 24133
>>>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>>>
>>>>>
>>>>> ocfs2cts1:~ # pgrep -a chmod
>>>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>>>
>>>>> ocfs2cts1:~ # cat /proc/15232/stack
>>>>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>>> [] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>>>> [] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>>>> [] iterate_dir+0x9c/0x110
>>>>> [] SyS_getdents+0x83/0xf0
>>>>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>>> [] 0x
>>>>> ```
>>>>>
>>>>> Do you think this issue can be fixed by your patches?
>>>> Looks not. Those two patches are to fix recursive locking deadlock. But
>>>> from above call trace, there is no recursive lock.
>>> Sorry, the call trace on another node was missing.  Here it is:
>>>
>>> ocfs2cts2:~ # pstree -lp
>>> sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>>>
>>>
>>> ocfs2cts2:~ # cat /proc/4865/stack
>>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>> [] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
>>> [] generic_permission+0x166/0x1c0
>>> [] ocfs2_permission+0xaa/0xd0 [ocfs2]
>>> [] __inode_permission+0x56/0xb0
>>> [] link_path_walk+0x29a/0x560
>>> [] path_lookupat+0x7f/0x110
>>> [] filename_lookup+0x9c/0x150
>>> [] SyS_fchmodat+0x33/0x90
>>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>>> [] 0x
>>>
>>> Thanks,
>>> Eric
>>>
>>>
>>>> Thanks,
>>>> Junxiao.
>>>>> I will try your patches later, but I am little worried the possibility
>>>>> of reproduction may not be 100%.
>>>>> So ask you to confirm;-)
>>>>>
>>>>> Eric
>>> ___
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel@oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
>



Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Eric Ren
Hi Junxiao,

On 10/11/2016 10:58 AM, Junxiao Bi wrote:
>> Do you think this issue can be fixed by your patches?
> Looks not. Those two patches are to fix recursive locking deadlock. But
> from above call trace, there is no recursive lock.
OK, thanks a lot!

Eric
>
> Thanks,
> Junxiao.
>> I will try your patches later, but I am little worried the possibility
>> of reproduction may not be 100%.
>> So ask you to confirm;-)
>>
>> Eric
>




[Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Eric Ren

Hi Junxiao,

As the subject, the testing hung there on a kernel without your patches:

"ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
and
"ocfs2: fix posix_acl_create deadlock"

The stack trace is:
```
ocfs2cts1:~ # pstree -pl 24133
discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)

ocfs2cts1:~ # pgrep -a chmod
15232 /bin/chmod -R 777 /mnt/ocfs2

ocfs2cts1:~ # cat /proc/15232/stack
[] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
[] ocfs2_readdir+0x41/0x1b0 [ocfs2]
[] iterate_dir+0x9c/0x110
[] SyS_getdents+0x83/0xf0
[] entry_SYSCALL_64_fastpath+0x12/0x6d
[] 0x
```

Do you think this issue can be fixed by your patches?

I will try your patches later, but I am a little worried that the possibility
of reproduction may not be 100%.

So I am asking you to confirm;-)

Eric

Re: [Ocfs2-devel] [PATCH] ocfs2: fix undefined struct variable in inode.h

2016-09-21 Thread Eric Ren

Hi,

On 09/21/2016 09:38 AM, Joseph Qi wrote:

The extern struct variable ocfs2_inode_cache is not defined. It was meant to
be ocfs2_inode_cachep, defined in super.c, I think. Fortunately it is
not used anywhere now, so there is no actual impact. Clean it up to fix this
mistake.

Signed-off-by: Joseph Qi <joseph...@huawei.com>


LGTM
Reviewed-by: Eric Ren<z...@suse.com>


---
  fs/ocfs2/inode.h | 2 --
  1 file changed, 2 deletions(-)

diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index 50cc550..5af68fc 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -123,8 +123,6 @@ static inline struct ocfs2_inode_info *OCFS2_I(struct inode *inode)
  #define INODE_JOURNAL(i) (OCFS2_I(i)->ip_flags & OCFS2_INODE_JOURNAL)
  #define SET_INODE_JOURNAL(i) (OCFS2_I(i)->ip_flags |= OCFS2_INODE_JOURNAL)

-extern struct kmem_cache *ocfs2_inode_cache;
-
  extern const struct address_space_operations ocfs2_aops;
  extern const struct ocfs2_caching_operations ocfs2_inode_caching_ops;



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: Fix double put of recount tree in ocfs2_lock_refcount_tree()

2016-09-17 Thread Eric Ren
Hi,

On 09/16/2016 08:06 AM, Ashish Samant wrote:
> In ocfs2_lock_refcount_tree, if ocfs2_read_refcount_block() returns error,
> we do ocfs2_refcount_tree_put twice (once in ocfs2_unlock_refcount_tree
> and once outside it), thereby reducing the refcount of the refcount tree
> twice, but we don't delete the tree in this case. This will make refcnt
> of the tree = 0 and the ocfs2_refcount_tree_put will eventually call
> ocfs2_mark_lockres_freeing, setting OCFS2_LOCK_FREEING for the
> refcount_tree->rf_lockres.
>
> The error returned by ocfs2_read_refcount_block is propagated all the way
> back and for next iteration of write, ocfs2_lock_refcount_tree gets the
> same tree back from ocfs2_get_refcount_tree because we haven't deleted the
> tree. Now we have the same tree, but OCFS2_LOCK_FREEING is set for
> rf_lockres and eventually, when _ocfs2_lock_refcount_tree is called in
> this iteration, BUG_ON( __ocfs2_cluster_lock:1395 ERROR: Cluster lock
> called on freeing lockres T0386019775b08d! flags 0x81) is
> triggered.
>
> Call stack:
>
> (loop16,11155,0):ocfs2_lock_refcount_tree:482 ERROR: status = -5
> (loop16,11155,0):ocfs2_refcount_cow_hunk:3497 ERROR: status = -5
> (loop16,11155,0):ocfs2_refcount_cow:3560 ERROR: status = -5
> (loop16,11155,0):ocfs2_prepare_inode_for_refcount:2111 ERROR: status = -5
> (loop16,11155,0):ocfs2_prepare_inode_for_write:2190 ERROR: status = -5
> (loop16,11155,0):ocfs2_file_write_iter:2331 ERROR: status = -5
> (loop16,11155,0):__ocfs2_cluster_lock:1395 ERROR: bug expression:
> lockres->l_flags & OCFS2_LOCK_FREEING
>
> (loop16,11155,0):__ocfs2_cluster_lock:1395 ERROR: Cluster lock called on
> freeing lockres T0386019775b08d! flags 0x81
>
> [ cut here ]
> kernel BUG at fs/ocfs2/dlmglue.c:1395!
>
> invalid opcode:  [#1] SMP  CPU 0
> Modules linked in: tun ocfs2 jbd2 xen_blkback xen_netback xen_gntdev ..
> sd_mod crc_t10dif ext3 jbd mbcache
>
> RIP: e030:[]  []
> __ocfs2_cluster_lock+0x31c/0x740 [ocfs2]
> RSP: e02b:88017c0138a0  EFLAGS: 00010086
> Process loop16 (pid: 11155, threadinfo 88017c01, task
> 8801b5374300)
> Stack:
>   88017bd25880 0081 00017c013920 88017c013960
>   001d 0001 88017bd258b4 
>   880172006000 a07fa410 88017bd202b4 
> Call Trace:
>   [] ocfs2_refcount_lock+0xae/0x130 [ocfs2]
>   [] ? __ocfs2_lock_refcount_tree+0x29/0xe0 [ocfs2]
>   [] ? _raw_spin_lock+0xe/0x20
>   [] __ocfs2_lock_refcount_tree+0x29/0xe0 [ocfs2]
>   [] ocfs2_lock_refcount_tree+0xdd/0x320 [ocfs2]
>   [] ocfs2_refcount_cow_hunk+0x1cb/0x440 [ocfs2]
>   [] ocfs2_refcount_cow+0xa9/0x1d0 [ocfs2]
>   [] ? ocfs2_prepare_inode_for_refcount+0x67/0x200 [ocfs2]
>   [] ocfs2_prepare_inode_for_refcount+0x115/0x200 [ocfs2]
>   [] ? ocfs2_inode_unlock+0xd4/0x140 [ocfs2]
>   [] ocfs2_prepare_inode_for_write+0x33b/0x470 [ocfs2]
>   [] ? ocfs2_rw_lock+0x80/0x190 [ocfs2]
>   [] ocfs2_file_write_iter+0x220/0x8c0 [ocfs2]
>   [] ? mempool_free_slab+0x17/0x20
>   [] ? bio_free+0x61/0x70
>   [] ? aio_kernel_free+0xe/0x10
>   [] aio_write_iter+0x2e/0x30
>
> Fix this by avoiding the second call to ocfs2_refcount_tree_put()
>
> Signed-off-by: Ashish Samant <ashish.sam...@oracle.com>
LGTM
Reviewed-by: Eric Ren <z...@suse.com>
> ---
>   fs/ocfs2/refcounttree.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> index 92bbe93..a9d4102 100644
> --- a/fs/ocfs2/refcounttree.c
> +++ b/fs/ocfs2/refcounttree.c
> @@ -478,7 +478,6 @@ again:
>   if (ret) {
>   mlog_errno(ret);
>   ocfs2_unlock_refcount_tree(osb, tree, rw);
> - ocfs2_refcount_tree_put(tree);
>   goto out;
>   }
>   



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2 RESEND] ocfs2: fix double unlock in case retry after free truncate log

2016-09-17 Thread Eric Ren
Hi,

On 09/14/2016 05:32 PM, Joseph Qi wrote:
> If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to
> free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
> will lock/unlock global bitmap inode, we have to unlock it before
> calling this function. But if the retried reserve fails with no
> global bitmap inode lock taken, it will unlock again in the error handling
> branch and BUG.
> This issue also exists if no need retry and then ocfs2_inode_lock fails.
> So fix it.
>
> Changes since v1:
> Use ret instead of status to avoid return value overwritten issue.
>
> Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
> truncate log")
> Signed-off-by: Joseph Qi <joseph...@huawei.com>
> Signed-off-by: Jiufei Xue <xuejiu...@huawei.com>
LGTM
Reviewed-by: Eric Ren <z...@suse.com>
> ---
>   fs/ocfs2/suballoc.c | 14 --
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index ea47120..6ad3533 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -1199,14 +1199,24 @@ retry:
>   inode_unlock((*ac)->ac_inode);
>
>   ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
> - if (ret == 1)
> + if (ret == 1) {
> + iput((*ac)->ac_inode);
> + (*ac)->ac_inode = NULL;
>   goto retry;
> + }
>
>   if (ret < 0)
>   mlog_errno(ret);
>
>   inode_lock((*ac)->ac_inode);
> - ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
> + ret = ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
> + if (ret < 0) {
> + mlog_errno(ret);
> + inode_unlock((*ac)->ac_inode);
> + iput((*ac)->ac_inode);
> + (*ac)->ac_inode = NULL;
> + goto bail;
> + }
>   }
>   if (status < 0) {
>   if (status != -ENOSPC)
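The unlock-balance hazard this patch closes can be illustrated with a toy
lock-depth counter (a made-up model, not the ocfs2 API): the common error
path unlocks the global bitmap inode, so any flow that reaches it without
holding the lock drives the balance negative, which corresponds to the BUG
in the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock balance: >0 held, 0 free, <0 double unlock (a BUG in the kernel). */
static int lock_depth;

static void inode_lock(void)   { lock_depth++; }
static void inode_unlock(void) { lock_depth--; }

/* Models the ENOSPC path: drop the lock, free the truncate log, then fail
 * to re-take the lock.  With `patched`, we bail out before the common
 * error path's unlock; without it, we fall through and unlock twice. */
static int reserve_bits(bool patched)
{
    lock_depth = 0;
    inode_lock();               /* initial ocfs2_inode_lock() */
    /* ... hit -ENOSPC; must drop the lock before freeing the truncate log */
    inode_unlock();
    /* ... ocfs2_try_to_free_truncate_log() runs here ... */
    int status = -1;            /* re-taking the inode lock fails */
    if (status < 0 && patched)
        return status;          /* the fix: bail without the common unlock */
    /* Buggy flow falls through to the common error path: */
    inode_unlock();             /* unlocks a lock we do not hold */
    return status;
}
```

After the buggy flow `lock_depth` ends at -1; the patched flow leaves it
balanced at 0.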



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix double unlock in case retry after free truncate log

2016-09-17 Thread Eric Ren
Hello Joseph,

On 09/14/2016 04:13 PM, Joseph Qi wrote:
> Hi Eric,
>
> On 2016/9/14 15:57, Eric Ren wrote:
>> Hello Joseph,
>>
>> Thanks for fixing up this.
>>
>> On 09/14/2016 12:15 PM, Joseph Qi wrote:
>>> If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to
>>> free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
>>> will lock/unlock global bitmap inode, we have to unlock it before
>>> calling this function. But when retry reserve and it fails with no
>> You mean the retry succeeds by "retry reserve", right? I fail to understand 
>> in which situation
>> the retry will fail to get global bitmap inode lock. Because I didn't see 
>> this problem when I
>> tested my patch, could you explain a bit more?
>>
>> Eric
> Before retry it has inode unlocked, but ac inode is still valid. And
> if inode lock fails this time, it will goto bail and do inode unlock
> again.
Yeah, I see this point, thanks. I am also wondering when we will fail to
lock the global bitmap inode?

BTW, I'm guessing you mean "retry deserves" by "retry reserve"?

Eric
>
> Thanks,
> Joseph
>
>>> global bitmap inode lock taken, it will unlock again in error handling
>>> branch and BUG.
>>> This issue also exists if no need retry and then ocfs2_inode_lock fails.
>>> So fix it.
>>>
>>> Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
>>> truncate log")
>>> Signed-off-by: Joseph Qi <joseph...@huawei.com>
>>> Signed-off-by: Jiufei Xue <xuejiu...@huawei.com>
>>> ---
>>>fs/ocfs2/suballoc.c | 13 +++--
>>>1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
>>> index ea47120..041453b 100644
>>> --- a/fs/ocfs2/suballoc.c
>>> +++ b/fs/ocfs2/suballoc.c
>>> @@ -1199,14 +1199,23 @@ retry:
>>>inode_unlock((*ac)->ac_inode);
>>>
>>>ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
>>> -if (ret == 1)
>>> +if (ret == 1) {
>>> +iput((*ac)->ac_inode);
>>> +(*ac)->ac_inode = NULL;
>>>goto retry;
>>> +}
>>>
>>>if (ret < 0)
>>>mlog_errno(ret);
>>>
>>>inode_lock((*ac)->ac_inode);
>>> -ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
>>> +status = ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
>>> +if (status < 0) {
>>> +inode_unlock((*ac)->ac_inode);
>>> +iput((*ac)->ac_inode);
>>> +(*ac)->ac_inode = NULL;
>>> +goto bail;
>>> +}
>>>}
>>>if (status < 0) {
>>>if (status != -ENOSPC)
>>
>>
>> .
>>
>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-17 Thread Eric Ren
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
there are 2 processes repeatedly performing the following operations:
one keeps doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1),
while the other keeps doing ftruncate(fd, 2*CLUSTER_SIZE) and then
ftruncate(fd, CLUSTER_SIZE) again and again.
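
The racing pattern can be sketched as a plain single-process loop (the path
and the 4096 CLUSTER_SIZE are assumptions for illustration; the real test
races two processes on an ocfs2 mount, where the buggy kernel deadlocks):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CLUSTER_SIZE 4096   /* assumed for the sketch; the test uses the fs cluster size */

/* Runs a few rounds of the mmaptruncate pattern on `path`; returns 0 on success. */
static int mmaptruncate_rounds(const char *path, int rounds)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 2 * CLUSTER_SIZE) < 0)
        return -1;
    char *addr = mmap(NULL, 2 * CLUSTER_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return -1;
    for (int i = 0; i < rounds; i++) {
        memset(addr + 2 * CLUSTER_SIZE - 1, 'a', 1);   /* dirty the last byte */
        (void)ftruncate(fd, CLUSTER_SIZE);             /* truncate the tail away */
        (void)ftruncate(fd, 2 * CLUSTER_SIZE);         /* and restore it */
    }
    munmap(addr, 2 * CLUSTER_SIZE);
    close(fd);
    return 0;
}
```

On a healthy filesystem this simply completes; the deadlock only shows up
when the memset and the ftruncate pair race from separate processes on ocfs2.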

This is the backtrace when the deadlock happens:
[] __wait_on_bit_lock+0x50/0xa0
[] __lock_page+0xb7/0xc0
[] ? autoremove_wake_function+0x40/0x40
[] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
[] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
[] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
[] do_page_mkwrite+0x66/0xc0
[] handle_mm_fault+0x685/0x1350
[] ? __fpu__restore_sig+0x70/0x530
[] __do_page_fault+0x1d8/0x4d0
[] trace_do_page_fault+0x37/0xf0
[] do_async_page_fault+0x19/0x70
[] async_page_fault+0x28/0x30

In ocfs2_write_begin_nolock(), we first grab the pages and then
allocate disk space for this write; ocfs2_try_to_free_truncate_log()
will be called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
the target page isn't unlocked, so we will deadlock when trying to grab
the target page again.

Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write(). Another
deadlock will happen in __do_page_mkwrite() if ocfs2_page_mkwrite()
returns non-VM_FAULT_LOCKED along with a locked target page.

These two errors fail on the same path, so fix them by unlocking the target
page manually before ocfs2_free_write_ctxt().
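
The fix can be illustrated with a tiny lock-state model (the helpers below
are stand-ins I made up, not the kernel's page-lock API): the buggy error
path returns with the target page still locked, so the retry can never grab
it again, while unlocking on the error path lets the retry proceed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the target page lock; trylock_page()/unlock_page() here are
 * stand-ins, not the kernel helpers. */
static bool page_locked;

static bool trylock_page(void)
{
    if (page_locked)
        return false;           /* the real lock_page() would sleep forever */
    page_locked = true;
    return true;
}

static void unlock_page(void) { page_locked = false; }

/* Models ocfs2_write_begin_nolock() failing after grabbing the page. */
static int write_begin(bool patched)
{
    bool got = trylock_page();  /* ocfs2_grab_pages_for_write() */
    assert(got);
    /* ... cluster allocation fails with -ENOSPC or -ENOMEM ... */
    if (patched)
        unlock_page();          /* the fix: unlock before freeing the ctxt */
    return -1;                  /* caller frees the write ctxt and retries */
}

/* Returns true if the retry can re-acquire the page lock. */
static bool retry_succeeds(bool patched)
{
    page_locked = false;        /* fresh state for each experiment */
    write_begin(patched);
    return trylock_page();      /* the retry's attempt to grab the page */
}
```

Without the fix the retry's trylock fails, i.e. the real code would sleep in
lock_page() forever; with it, the retry proceeds.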

Jan Kara helped me clarify the JBD2 part and suggested the hint for the root cause.

Changes since v1:
1. Also put ENOMEM error case into consideration.

Signed-off-by: Eric Ren <z...@suse.com>
Reviewed-by: He Gang <g...@suse.com>
Acked-by: Joseph Qi <joseph...@huawei.com>
---
 fs/ocfs2/aops.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 98d3654..bbb4b3e 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1842,6 +1842,16 @@ int ocfs2_write_begin_nolock(struct address_space *mapping,
ocfs2_commit_trans(osb, handle);
 
 out:
+   /*
+* The mmapped page won't be unlocked in ocfs2_free_write_ctxt(),
+* even in case of error here like ENOSPC and ENOMEM. So, we need
+* to unlock the target page manually to prevent deadlocks when
+* retrying again on ENOSPC, or when returning non-VM_FAULT_LOCKED
+* to VM code.
+*/
+   if (wc->w_target_locked)
+   unlock_page(mmap_page);
+
ocfs2_free_write_ctxt(inode, wc);
 
if (data_ac) {
-- 
2.6.6


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Eric Ren
Hi Joseph,

On 09/14/2016 04:25 PM, Joseph Qi wrote:
> Hi Eric,
> Sorry for the delayed response.
> I have got your explanation. So we have to unlock the page only in case
> of retry, right?
> If so, I think the unlock should be right before "goto try_again".
No, the mmapped page should be unlocked whenever we cannot return
VM_FAULT_LOCKED to do_page_mkwrite(). Otherwise, the deadlock will happen in
do_page_mkwrite(). Please see the recent 2 mails ;-)

Eric
>
> Thanks,
> Joseph
>
> On 2016/9/14 16:04, Eric Ren wrote:
>> Hi Joseph,
>>>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>>>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>>>>> will be called if -ENOSPC is returned; if we're lucky to get enough clusters,
>>>>> which is usually the case, we start over again. But in 
>>>>> ocfs2_free_write_ctxt()
>>>>> the target page isn't unlocked, so we will deadlock when trying to grab
>>>>> the target page again.
>>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>>> w_target_locked is set to true, and then will be unlocked by
>>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>>> it in more detail?
>>> Thanks for review;-) Follow up the calling chain:
>>>
>>> ocfs2_free_write_ctxt()
>>>   ->ocfs2_unlock_pages()
>>>
>>> in ocfs2_unlock_pages 
>>> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>>> can see the code just put_page(target_page), but not unlock it.
>> Did this answer your question?
>>
>> Thanks,
>> Eric
>>> Yeah, I will think this a bit more like:
>>> why not unlock the target_page there? Are there other potential problems if
>>> the "ret" is not "-ENOSPC" but some other possible error code?
>>>
>>> Thanks,
>>> Eric
>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>>> enough space at the first time.
>>>>>
>>>>> Jan Kara helped me clarify the JBD2 part and suggested the hint for the
>>>>> root cause.
>>>>>
>>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>>> ---
>>>>>fs/ocfs2/aops.c | 7 +++
>>>>>1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index 98d3654..78d1d67 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -1860,6 +1860,13 @@ out:
>>>>> */
>>>>>try_free = 0;
>>>>>+/*
>>>>> + * Unlock mmap_page because the page has been locked when we
>>>>> + * are here.
>>>>> + */
>>>>> +if (mmap_page)
>>>>> +unlock_page(mmap_page);
>>>>> +
>>>>>ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>>if (ret1 == 1)
>>>>>goto try_again;
>>>>>
>>>>
>>>
>>>
>>>
>>> ___
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel@oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Eric Ren
Hi,

On 09/12/2016 11:06 AM, Eric Ren wrote:
> Hi,
>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>> w_target_locked is set to true, and then will be unlocked by
>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>> it in more detail?
>> Thanks for review;-) Follow up the calling chain:
>>
>> ocfs2_free_write_ctxt()
>>   ->ocfs2_unlock_pages()
>>
>> in ocfs2_unlock_pages
>> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>> can see the code just put_page(target_page), but not unlock it.
>>
>> Yeah, I will think this a bit more like:
>> why not unlock the target_page there? Are there other potential problems if
>> the "ret" is not "-ENOSPC" but some other possible error code?
> 1. ocfs2_unlock_pages() will be called in ocfs2_write_end_nolock(); in this
> case, we definitely want to return a locked mmapped page to VM code
> (do_page_mkwrite) when VM_FAULT_LOCKED is set.
>
> 2. But there's indeed a potential existing deadlock situation:
>   ocfs2_grab_pages_for_write()  ==> returns -ENOMEM with the mmapped page locked
>   ocfs2_free_write_ctxt()       ==> leaves the mmapped page locked
>   ocfs2_write_begin_nolock()    ==> returns -ENOMEM
>   __ocfs2_page_mkwrite()        ==> returns VM_FAULT_OOM
>   __do_page_mkwrite()           ==> deadlock here
> (https://github.com/torvalds/linux/blob/master/mm/memory.c#L2054)
> This is another corner case, right?
>
> Anyway, I think this patch is good for the -ENOSPC case. And another patch 
> should be
> proposed for -ENOMEM case?
Yes, I think we can catch both the -ENOSPC and -ENOMEM cases in the failure
path by unlocking the mmapped page after ocfs2_free_write_ctxt(), right?

Eric
>
> Thanks,
> Eric
>
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Joseph
>>>
>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>> enough space at the first time.
>>>>
>>>> Jan Kara helped me clarify the JBD2 part and suggested the hint for the
>>>> root cause.
>>>>
>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>> ---
>>>>fs/ocfs2/aops.c | 7 +++
>>>>1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index 98d3654..78d1d67 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -1860,6 +1860,13 @@ out:
>>>> */
>>>>try_free = 0;
>>>>+/*
>>>> + * Unlock mmap_page because the page has been locked when we
>>>> + * are here.
>>>> + */
>>>> +if (mmap_page)
>>>> +unlock_page(mmap_page);
>>>> +
>>>>ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>if (ret1 == 1)
>>>>goto try_again;
>>>>
>>>
>>
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Eric Ren

Hi Joseph,

In ocfs2_write_begin_nolock(), we first grab the pages and then
allocate disk space for this write; ocfs2_try_to_free_truncate_log()
will be called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
the target page isn't unlocked, so we will deadlock when trying to grab
the target page again.

IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
w_target_locked is set to true, and then will be unlocked by
ocfs2_unlock_pages in ocfs2_free_write_ctxt.
So I'm not getting the case "page isn't unlock". Could you please explain
it in more detail?

Thanks for review;-) Follow up the calling chain:

ocfs2_free_write_ctxt()
 ->ocfs2_unlock_pages()

in ocfs2_unlock_pages
(https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
can see the code just does put_page(target_page), but does not unlock it.

Did this answer your question?

Thanks,
Eric


Yeah, I will think this a bit more like:
why not unlock the target_page there? Are there other potential problems if
the "ret" is not "-ENOSPC" but some other possible error code?

Thanks,
Eric



Thanks,
Joseph


Fix this issue by unlocking the target page after we fail to allocate
enough space at the first time.

Jan Kara helped me clarify the JBD2 part and suggested the hint for the root cause.

Signed-off-by: Eric Ren <z...@suse.com>
---
  fs/ocfs2/aops.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 98d3654..78d1d67 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1860,6 +1860,13 @@ out:
   */
  try_free = 0;
  +/*
+ * Unlock mmap_page because the page has been locked when we
+ * are here.
+ */
+if (mmap_page)
+unlock_page(mmap_page);
+
  ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
  if (ret1 == 1)
  goto try_again;









___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: free the mle while the res had one, to avoid mle memory leak.

2016-09-13 Thread Eric Ren
Hi,
On 09/13/2016 03:52 PM, Guozhonghua wrote:
> In the function dlm_migrate_request_handler, when the ret is -EEXIST, the
> mle should be freed, otherwise the memory will be leaked.
Keep your commit message lines within 75 (or 78, I don't remember exactly,
but git will warn you if you break its rule) characters.
>
> Signed-off-by: Guozhonghua 
>
> --- ocfs2.orig/dlm/dlmmaster.c  2016-09-13 15:18:13.602684325 +0800
Please use `git format-patch` to create patch. FYI:
http://wiki.openhatch.org/How_to_generate_patches_with_git_format-patch
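
As a quick illustration of the workflow (run in a throwaway repository; the
commit subject below is made up for the demo), `git format-patch` turns a
commit into a properly formatted patch file:

```shell
# Demonstrate git format-patch in a disposable repository.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "ocfs2/dlm: free mle on error in dlm_migrate_request_handler"
# Emits 0001-<subject>.patch, ready for git send-email.
patchfile=$(git format-patch -1 HEAD)
echo "$patchfile"
```

The generated file carries the "Subject: [PATCH] ..." header and the diff,
which is what the list expects instead of a hand-pasted diff.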

Sorry, I'm not familiar with the ocfs2/dlm code, so I cannot review this patch.

Eric
> +++ ocfs2/dlm/dlmmaster.c   2016-09-13 15:27:05.014675736 +0800
> @@ -3188,6 +3188,9 @@ int dlm_migrate_request_handler(struct o
>  migrate->new_master,
>  migrate->master);
>
> +   if (ret < 0)
> +   kmem_cache_free(dlm_mle_cache, mle);
> +
 spin_unlock(&dlm->master_lock);
  unlock:
 spin_unlock(&dlm->spinlock);
>
>
> -
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-11 Thread Eric Ren
Hi,
>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>> w_target_locked is set to true, and then will be unlocked by
>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>> So I'm not getting the case "page isn't unlock". Could you please explain
>> it in more detail?
> Thanks for review;-) Follow up the calling chain:
>
> ocfs2_free_write_ctxt()
>  ->ocfs2_unlock_pages()
>
> in ocfs2_unlock_pages 
> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
> can see the code just put_page(target_page), but not unlock it.
>
> Yeah, I will think this a bit more like:
> why not unlock the target_page there? Are there other potential problems if
> the "ret" is not "-ENOSPC" but some other possible error code?
1. ocfs2_unlock_pages() will be called in ocfs2_write_end_nolock(); in this
case, we definitely want to return a locked mmapped page to VM code
(do_page_mkwrite) when VM_FAULT_LOCKED is set.

2. But there's indeed a potential existing deadlock situation:
  ocfs2_grab_pages_for_write()  ==> returns -ENOMEM with the mmapped page locked
  ocfs2_free_write_ctxt()       ==> leaves the mmapped page locked
  ocfs2_write_begin_nolock()    ==> returns -ENOMEM
  __ocfs2_page_mkwrite()        ==> returns VM_FAULT_OOM
  __do_page_mkwrite()           ==> deadlock here
(https://github.com/torvalds/linux/blob/master/mm/memory.c#L2054)
This is another corner case, right?

Anyway, I think this patch is good for the -ENOSPC case. And another patch
should be proposed for the -ENOMEM case?

Thanks,
Eric

>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>
>>> Fix this issue by unlocking the target page after we fail to allocate
>>> enough space at the first time.
>>>
>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root 
>>> cause.
>>>
>>> Signed-off-by: Eric Ren <z...@suse.com>
>>> ---
>>>   fs/ocfs2/aops.c | 7 +++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 98d3654..78d1d67 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -1860,6 +1860,13 @@ out:
>>>*/
>>>   try_free = 0;
>>>   +/*
>>> + * Unlock mmap_page because the page has been locked when we
>>> + * are here.
>>> + */
>>> +if (mmap_page)
>>> +unlock_page(mmap_page);
>>> +
>>>   ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>   if (ret1 == 1)
>>>   goto try_again;
>>>
>>
>>
>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()

2016-08-30 Thread Eric Ren
Hi Ashish,

On 08/31/2016 07:17 AM, Ashish Samant wrote:
> Hi Eric,
>
> I am able to reproduce this on 4.8.0-rc3 as well. Can you try again and issue 
> a sync 
> between fallocate and dd?

It works!
ocfs2dev2 is not patched:

ocfs2dev2:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
ocfs2dev2:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
ocfs2dev2:/mnt/ocfs2 # sync
ocfs2dev2:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ||
*
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0936888 s, 11.2 MB/s
0010

while ocfs2dev1 is patched:
==
ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
wrote 10485760/10485760 bytes at offset 0
10 MiB, 2560 ops; 0. sec (183.137 MiB/sec and 46883.0122 ops/sec)
ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
ocfs2dev1:/mnt/ocfs2 # sync
ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
  cd cd cd cd cd cd cd cd  cd cd cd cd cd cd cd cd ||
*
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0933082 s, 11.2 MB/s
0010

>
> On 08/30/2016 12:38 AM, Eric Ren wrote:
>> Hi,
>>
>> I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-)
>>
>> On 08/30/2016 12:11 PM, Ashish Samant wrote:
>>> Hmm, that's weird. I see this on 4.7 kernel without the patch:
>>>
>>> # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>> wrote 10485760/10485760 bytes at offset 0
>>> 10 MiB, 2560 ops; 0. sec (683.995 MiB/sec and 175102.5992 ops/sec)
>>> # reflink -f 10MBfile reflnktest
>>> # fallocate -p -o 0 -l 1048615 reflnktest
>>> # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>>   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
>>> ||
>>> *
>>> 1+0 records in
>>> 1+0 records out
>>> 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s
>>> 0010
>>>
>>> and with patch
>>> 
>>> # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C
>>>   cd cd cd cd cd cd cd cd  cd cd cd cd cd cd cd cd 
>>> ||
>>
>> I'm not familiar with this code. So why is the output "cd ..."? Because we
>> didn't write anything into "10MBfile". Is it a magic number when reading
>> from a hole?
> No, "cd" is what xfs_io wrote into the file. Those are the original contents 
> of the file 
> which are overwritten by 0 in the first cluster because of this bug.

Ah, gotcha, thanks!

Eric

>
> Thanks,
> Ashish
>>
>> Eric
>>
>>> *
>>> 1+0 records in
>>> 1+0 records out
>>> 0010
>>
>>
>>
>>>
>>> Thanks,
>>> Ashish
>>>
>>>
>>> On 08/29/2016 08:33 PM, Eric Ren wrote:
>>>> Hello,
>>>>
>>>> On 08/30/2016 03:23 AM, Ashish Samant wrote:
>>>>> Hi Eric,
>>>>>
>>>>> The easiest way to reproduce this is :
>>>>>
>>>>> 1. Create a random file of say 10 MB
>>>>> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>>>> 2. Reflink  it
>>>>> reflink -f 10MBfile reflnktest
>>>>> 3. Punch a hole at starting at cluster boundary  with range greater that 
>>>>> 1MB. You can 
>>>>> also use a range that will put the end offset in another extent.
>>>>> fallocate -p -o 0 -l 1048615 reflnktest
>>>>> 4. sync
>>>>> 5. Check the  first cluster in the source file. (It will be zeroed out).
>>>>>dd if=10MBfile iflag=direct bs= count=1 | hexdump -C
>>>>
>>>> Thanks! I tried it myself, but I'm not sure what the expected output is
>>>> and whether the test result meets it:
>>>>
>>>> 1. After applying this patch:
>>>> ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>>> wrote 10485760/10485760 bytes at offset 0
>>>> 10 MiB, 2560 ops; 0. sec (1.089 GiB/sec and 285427.5839 ops/sec)
>>>> ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | 
>>>> hexdump

Re: [Ocfs2-devel] OCFS2 test report for linux vanilla kernel V4.8.0-rc2

2016-08-22 Thread Eric Ren
Sorry, actually it's already:

4.8.0-rc2-173-g184ca82-1.gacbdb4b-vanilla

Eric

On 08/22/2016 10:42 AM, Eric Ren wrote:
> Hi,
>
>
> The test report below is against vanilla kernel v4.7.0. Some highlights:
>
> 1.  As you can see from logs attached, pcmk stack is used with 
> "blocksize=4096, 
> clustersize=32768";
>
> 2. "inline" testcase on multiple nodes failed 3/3 times so far; seems to be a 
> regression 
> issue;
>
> 3. Two cases are skipped:
> o lvb_torture:  libo2dlm [1] doesn't correctly support LVB operations for 
> fs/dlm;
>
>   Actually, ocfs2 tools (mkfs and fsck) don't even need LVB operations,
> so it is not worth touching libo2dlm right now, I think;
>
> o filecheck: is not scheduled this time;
>
> 4. "discontig test" is missing now:-/
>
>
> If anyone is interested in the detailed test logs, I can upload them
> somewhere ;-) I will schedule a test for v4.8-rc2 soon.
>
> [1] https://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-May/000761.html
>
> Eric
>
> run-dev-test
> *BUILD UNSTABLE*
> Project: run-dev-test
> Date of build: Sat, 20 Aug 2016 20:11:19 +0800
> Build duration: 19 hr
> Build cause: Started by upstream project "zren-testing/ocfs2-dlm-dev" 
> build number 16
> Build description:  linux vanilla kernel V4.7.0
> Built on: HA-232
>
>
>  Health Report
>
> W Description Score
> Build stability: 2 out of the last 5 builds failed. 60
> Test Result: 1 test failing out of a total of 29 tests. 96
>
>
>  Tests Reports
>
> <http://147.2.207.237:8080/job/zren-testing/job/run-dev-test/28//testReport>
> Package Failed Passed Skipped Total
> MultipleNodes 1 8 1 *10*
> MultipleNodes.inline.inline-test
> SingleNode 0 18 1 *19*
>
>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] OCFS2 test report for linux vanilla kernel V4.7.0

2016-08-21 Thread Eric Ren

Hi,


The test report below is against vanilla kernel v4.7.0. Some highlights:

1.  As you can see from logs attached, pcmk stack is used with "blocksize=4096, 
clustersize=32768";


2. "inline" testcase on multiple nodes failed 3/3 times so far; seems to be a 
regression issue;

3. Two cases are skipped:
o lvb_torture:  libo2dlm [1] doesn't correctly support LVB operations for 
fs/dlm;

  Actually, ocfs2 tools (mkfs and fsck) don't even need LVB operations, so it
is not worth touching libo2dlm right now, I think;

o filecheck: is not scheduled this time;

4. "discontig test" is missing now:-/


If anyone is interested in the detailed test logs, I can upload them
somewhere ;-) I will schedule a test for v4.8-rc2 soon.


[1] https://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-May/000761.html

Eric

run-dev-test
*BUILD UNSTABLE*
Project:run-dev-test
Date of build:  Sat, 20 Aug 2016 20:11:19 +0800
Build duration: 19 hr
Build cause:Started by upstream project "zren-testing/ocfs2-dlm-dev" build 
number 16
Build description:   linux vanilla kernel V4.7.0
Built on:   HA-232


 Health Report

W   Description Score
Build stability: 2 out of the last 5 builds failed. 60
Test Result: 1 test failing out of a total of 29 tests. 96


 Tests Reports


Package Failed  Passed  Skipped Total
MultipleNodes   1   8   1   *10*
MultipleNodes.inline.inline-test
SingleNode  0   18  1   *19*




single_run.log
Description: Binary data


multiple-run-x86_64-2016-08-21-01-11-40.log
Description: Binary data

Re: [Ocfs2-devel] [PATCH] A bug in the end of DLM recovery

2016-08-07 Thread Eric Ren
Hi,

On 08/06/2016 01:58 PM, Gechangwei wrote:
> Hi,
>
> I found an issue in the end of DLM recovery.

What are the detailed steps to reproduce it?

> When DLM recovery comes to the end of recovery procedure, it will remaster 
> all locks in other nodes.
> Right after a request message is sent to a node A (say), the new master node 
> will wait for node A’s response forever.
> But node A may die just after receiving the remaster request, not responses 
> to new master node yet.
> That causes new master node waiting forever.
> I think below patch can solve this problem. Please have a review!

Sorry, I cannot understand your problem. Could you give a more specific
description, in the style of this patch from Piaojun a couple of days ago:

ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before 
dlm_deref_lockres_done_handler

Also, a patch should address a real bug that can be reproduced, and the patch
must also be tested. I'm a little worried because this patch seems to be based
on an assumption.


BTW, the format of your patch isn't formal;-) Please
go through the docs below:

[1] 
https://github.com/torvalds/linux/blob/master/Documentation/SubmittingPatches
[2] https://github.com/torvalds/linux/blob/master/Documentation/SubmitChecklist

Eric

>
>
> Subject: [PATCH] interrupt waiting for node's response if node dies
>
> Signed-off-by: gechangwei 
> ---
> dlm/dlmrecovery.c | 4 
> 1 file changed, 4 insertions(+)
>
> diff --git a/dlm/dlmrecovery.c b/dlm/dlmrecovery.c
> index 3d90ad7..5e455cb 100644
> --- a/dlm/dlmrecovery.c
> +++ b/dlm/dlmrecovery.c
> @@ -679,6 +679,10 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>  dlm->name, ndata->node_num,
>  ndata->state==DLM_RECO_NODE_DATA_RECEIVING ?
>  "receiving" : "requested");
> +if (dlm_is_node_dead(dlm, ndata->node_num)) {
> +  mlog(0, "%s: node %u died after requesting all locks.\n");
> +  ndata->state = DLM_RECO_NODE_DATA_DONE;
> +}
> all_nodes_done = 0;
> break;
>case DLM_RECO_NODE_DATA_DONE:
> --
>
> BR.
>
> Chauncey
>
>
> -
> 本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出
> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
> 邮件!
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
>
>
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>



[Ocfs2-devel] [PATCH v3] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-08 Thread Eric Ren
The testcase "mmaptruncate" in the ocfs2 test suite always fails with
an ENOSPC error on a small volume (say less than 10G). This testcase
repeatedly performs "extend" and "truncate" on a file: it continuously
truncates the file to 1/2 of its size, and then extends it back to
100%. The main bitmap will quickly run out of space because the
"truncate" code prevents the truncate log from being flushed by
ocfs2_schedule_truncate_log_flush(osb, 1), while the truncate log may
have cached lots of clusters.

So retry the allocation after flushing the truncate log when ENOSPC is
returned. We cannot reuse the deleted blocks before the transaction is
committed. Fortunately, we already have a function to do this -
ocfs2_try_to_free_truncate_log(). We just need to remove the "static"
modifier and move it to the right place.

The "unlock"/"lock" code isn't elegant, but there seems to be no better
option.

v3:
1. Also need to lock the allocator inode when "= 0" is returned from
ocfs2_schedule_truncate_log_flush(), which means there is really no
space. -- spotted by Joseph Qi

v2:
1. Lock allocator inode again if ocfs2_schedule_truncate_log_flush()
fails. -- spotted by Joseph Qi <joseph...@huawei.com>

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/alloc.c| 37 +
 fs/ocfs2/alloc.h|  2 ++
 fs/ocfs2/aops.c | 37 -
 fs/ocfs2/suballoc.c | 20 +++-
 4 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 460c0ce..7dabbc3 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6106,6 +6106,43 @@ void ocfs2_schedule_truncate_log_flush(struct 
ocfs2_super *osb,
}
 }
 
+/*
+ * Try to flush truncate logs if we can free enough clusters from it.
+ * As for return value, "< 0" means error, "0" no space and "1" means
+ * we have freed enough spaces and let the caller try to allocate again.
+ */
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+   unsigned int needed)
+{
+   tid_t target;
+   int ret = 0;
+   unsigned int truncated_clusters;
+
+   inode_lock(osb->osb_tl_inode);
+   truncated_clusters = osb->truncated_clusters;
+   inode_unlock(osb->osb_tl_inode);
+
+   /*
+* Check whether we can succeed in allocating if we free
+* the truncate log.
+*/
+   if (truncated_clusters < needed)
+   goto out;
+
+   ret = ocfs2_flush_truncate_log(osb);
+   if (ret) {
+   mlog_errno(ret);
+   goto out;
+   }
+
+   if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
+   jbd2_log_wait_commit(osb->journal->j_journal, target);
+   ret = 1;
+   }
+out:
+   return ret;
+}
+
 static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
   int slot_num,
   struct inode **tl_inode,
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index f3dc1b0..4a5152e 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -188,6 +188,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
  u64 start_blk,
  unsigned int num_clusters);
 int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+  unsigned int needed);
 
 /*
  * Process local structure which describes the block unlinks done
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index c034edf..1802aef 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1645,43 +1645,6 @@ static int ocfs2_zero_tail(struct inode *inode, struct 
buffer_head *di_bh,
return ret;
 }
 
-/*
- * Try to flush truncate logs if we can free enough clusters from it.
- * As for return value, "< 0" means error, "0" no space and "1" means
- * we have freed enough spaces and let the caller try to allocate again.
- */
-static int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
- unsigned int needed)
-{
-   tid_t target;
-   int ret = 0;
-   unsigned int truncated_clusters;
-
-   inode_lock(osb->osb_tl_inode);
-   truncated_clusters = osb->truncated_clusters;
-   inode_unlock(osb->osb_tl_inode);
-
-   /*
-* Check whether we can succeed in allocating if we free
-* the truncate log.
-*/
-   if (truncated_clusters < needed)
-   goto out;
-
-   ret = ocfs2_flush_truncate_log(osb);
-   if (ret) {
-   mlog_errno(ret);
-   goto out;
-   }
-
-   if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
-   jbd2_log_wait_commit(osb->journa

[Ocfs2-devel] [PATCH v2] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-06 Thread Eric Ren
The testcase "mmaptruncate" in ocfs2 test suite always fails with
ENOSPC error on small volume (say less than 10G). This testcase
repeatedly performs "extend" and "truncate" on a file. Continuously,
it truncates the file to 1/2 of the size, and then extends to 100% of
the size. The main bitmap will quickly run out of space because the
"truncate" code prevents the truncate log from being flushed by
ocfs2_schedule_truncate_log_flush(osb, 1), while the truncate log may
have cached lots of clusters.

So retry the allocation after flushing the truncate log when ENOSPC is
returned. We cannot reuse the deleted blocks before the transaction is
committed. Fortunately, we already have a function to do this -
ocfs2_try_to_free_truncate_log(). We just need to remove the "static"
modifier and move it to the right place.

The "unlock"/"lock" code isn't elegant, but there seems to be no better
option.

v2:
1. Lock allocator inode again if ocfs2_schedule_truncate_log_flush()
fails. -- spotted by Joseph Qi <joseph...@huawei.com>

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/alloc.c| 37 +
 fs/ocfs2/alloc.h|  2 ++
 fs/ocfs2/aops.c | 37 -
 fs/ocfs2/suballoc.c | 20 +++-
 4 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 460c0ce..7dabbc3 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6106,6 +6106,43 @@ void ocfs2_schedule_truncate_log_flush(struct 
ocfs2_super *osb,
}
 }
 
+/*
+ * Try to flush truncate logs if we can free enough clusters from it.
+ * As for return value, "< 0" means error, "0" no space and "1" means
+ * we have freed enough spaces and let the caller try to allocate again.
+ */
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+   unsigned int needed)
+{
+   tid_t target;
+   int ret = 0;
+   unsigned int truncated_clusters;
+
+   inode_lock(osb->osb_tl_inode);
+   truncated_clusters = osb->truncated_clusters;
+   inode_unlock(osb->osb_tl_inode);
+
+   /*
+* Check whether we can succeed in allocating if we free
+* the truncate log.
+*/
+   if (truncated_clusters < needed)
+   goto out;
+
+   ret = ocfs2_flush_truncate_log(osb);
+   if (ret) {
+   mlog_errno(ret);
+   goto out;
+   }
+
+   if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
+   jbd2_log_wait_commit(osb->journal->j_journal, target);
+   ret = 1;
+   }
+out:
+   return ret;
+}
+
 static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
   int slot_num,
   struct inode **tl_inode,
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index f3dc1b0..4a5152e 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -188,6 +188,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
  u64 start_blk,
  unsigned int num_clusters);
 int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+  unsigned int needed);
 
 /*
  * Process local structure which describes the block unlinks done
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index c034edf..1802aef 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1645,43 +1645,6 @@ static int ocfs2_zero_tail(struct inode *inode, struct 
buffer_head *di_bh,
return ret;
 }
 
-/*
- * Try to flush truncate logs if we can free enough clusters from it.
- * As for return value, "< 0" means error, "0" no space and "1" means
- * we have freed enough spaces and let the caller try to allocate again.
- */
-static int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
- unsigned int needed)
-{
-   tid_t target;
-   int ret = 0;
-   unsigned int truncated_clusters;
-
-   inode_lock(osb->osb_tl_inode);
-   truncated_clusters = osb->truncated_clusters;
-   inode_unlock(osb->osb_tl_inode);
-
-   /*
-* Check whether we can succeed in allocating if we free
-* the truncate log.
-*/
-   if (truncated_clusters < needed)
-   goto out;
-
-   ret = ocfs2_flush_truncate_log(osb);
-   if (ret) {
-   mlog_errno(ret);
-   goto out;
-   }
-
-   if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
-   jbd2_log_wait_commit(osb->journal->j_journal, target);
-   ret = 1;
-   }
-out:
-   return ret;
-}
-
 int ocfs2_write_begin_nolock(struct address_space *mapping,
  

Re: [Ocfs2-devel] [PATCH] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-06 Thread Eric Ren
Hi Joseph,

On 07/06/2016 12:21 PM, Joseph Qi wrote:
> NAK, if ocfs2_try_to_free_truncate_log fails, it will lead to double
> ocfs2_inode_unlock and then BUG.

Thanks for pointing out this! Will fix this and resend.

Eric

>
> On 2016/6/22 17:07, Eric Ren wrote:
>> The testcase "mmaptruncate" in ocfs2 test suite always fails with
>> ENOSPC error on small volume (say less than 10G). This testcase
>> creates 2 threads T1/T2 which race to "truncate"/"extend" a same
>> file repeatedly. Specifically, T1 truncates 1/2 size of a small file
>> while T2 extend to 100% size. The main bitmap will quickly run out
>> of space because the "truncate" code prevent truncate log from being
>> flushed by ocfs2_schedule_truncate_log_flush(osb, 1), while truncate
>> log may have cached lots of clusters.
>>
>> So retry to allocate after flushing truncate log when ENOSPC is
>> returned. And we cannot reuse the deleted blocks before the transaction
>> committed. Fortunately, we already have a function to do this -
>> ocfs2_try_to_free_truncate_log(). Just need to remove the "static"
>> modifier and put it into a right place.
>>
>> Signed-off-by: Eric Ren <z...@suse.com>
>> ---
>>   fs/ocfs2/alloc.c| 37 +
>>   fs/ocfs2/alloc.h|  2 ++
>>   fs/ocfs2/aops.c | 37 -
>>   fs/ocfs2/suballoc.c | 17 -
>>   4 files changed, 55 insertions(+), 38 deletions(-)
>>
>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>> index 460c0ce..7dabbc3 100644
>> --- a/fs/ocfs2/alloc.c
>> +++ b/fs/ocfs2/alloc.c
>> @@ -6106,6 +6106,43 @@ void ocfs2_schedule_truncate_log_flush(struct 
>> ocfs2_super *osb,
>>  }
>>   }
>>
>> +/*
>> + * Try to flush truncate logs if we can free enough clusters from it.
>> + * As for return value, "< 0" means error, "0" no space and "1" means
>> + * we have freed enough spaces and let the caller try to allocate again.
>> + */
>> +int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
>> +unsigned int needed)
>> +{
>> +tid_t target;
>> +int ret = 0;
>> +unsigned int truncated_clusters;
>> +
>> +inode_lock(osb->osb_tl_inode);
>> +truncated_clusters = osb->truncated_clusters;
>> +inode_unlock(osb->osb_tl_inode);
>> +
>> +/*
>> + * Check whether we can succeed in allocating if we free
>> + * the truncate log.
>> + */
>> +if (truncated_clusters < needed)
>> +goto out;
>> +
>> +ret = ocfs2_flush_truncate_log(osb);
>> +if (ret) {
>> +mlog_errno(ret);
>> +goto out;
>> +}
>> +
>> +if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
>> +jbd2_log_wait_commit(osb->journal->j_journal, target);
>> +ret = 1;
>> +}
>> +out:
>> +return ret;
>> +}
>> +
>>   static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
>> int slot_num,
>> struct inode **tl_inode,
>> diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
>> index f3dc1b0..4a5152e 100644
>> --- a/fs/ocfs2/alloc.h
>> +++ b/fs/ocfs2/alloc.h
>> @@ -188,6 +188,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
>>u64 start_blk,
>>unsigned int num_clusters);
>>   int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
>> +int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
>> +   unsigned int needed);
>>
>>   /*
>>* Process local structure which describes the block unlinks done
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index c034edf..1802aef 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -1645,43 +1645,6 @@ static int ocfs2_zero_tail(struct inode *inode, 
>> struct buffer_head *di_bh,
>>  return ret;
>>   }
>>
>> -/*
>> - * Try to flush truncate logs if we can free enough clusters from it.
>> - * As for return value, "< 0" means error, "0" no space and "1" means
>> - * we have freed enough spaces and let the caller try to allocate again.
>> - */
>> -static int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
>> - 

Re: [Ocfs2-devel] ocfs2: cleanup implemented prototypes

2016-07-03 Thread Eric Ren
Hi Joseph,

On 07/04/2016 11:48 AM, Joseph Qi wrote:
> It's comments of ocfs2_read_dir_block, I have kept it as I think it
> still makes sense.

Yes, I see it. Well, perhaps rewording the comment would be better? A
comment that refers to a function that no longer exists is confusing, anyway;-)

> Thanks,
> Joseph
>
> On 2016/7/4 11:36, Eric Ren wrote:
>> Hi Joseph,
>>
>> Please see comments inline;-)
>>
>> On 07/01/2016 05:27 PM, Joseph Qi wrote:
>>> Several prototypes in inode.h are just defined but not actually
>>> implemented and used, so remove them.
>>>
>>> Signed-off-by: Joseph Qi <joseph...@huawei.com>
>>> ---
>>>fs/ocfs2/inode.h | 7 ---
>>>fs/ocfs2/super.c | 1 -
>>>2 files changed, 8 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
>>> index d8f3fc8..50cc550 100644
>>> --- a/fs/ocfs2/inode.h
>>> +++ b/fs/ocfs2/inode.h
>>> @@ -145,22 +145,15 @@ int ocfs2_drop_inode(struct inode *inode);
>>>struct inode *ocfs2_ilookup(struct super_block *sb, u64 feoff);
>>>struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 feoff, unsigned 
>>> flags,
>>> int sysfile_type);
>>> -int ocfs2_inode_init_private(struct inode *inode);
>>>int ocfs2_inode_revalidate(struct dentry *dentry);
>>>void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
>>>  int create_ino);
>>> -void ocfs2_read_inode(struct inode *inode);
>>> -void ocfs2_read_inode2(struct inode *inode, void *opaque);
>>> -ssize_t ocfs2_rw_direct(int rw, struct file *filp, char *buf,
>>> -size_t size, loff_t *offp);
>>>void ocfs2_sync_blockdev(struct super_block *sb);
>>>void ocfs2_refresh_inode(struct inode *inode,
>>> struct ocfs2_dinode *fe);
>>>int ocfs2_mark_inode_dirty(handle_t *handle,
>>>   struct inode *inode,
>>>   struct buffer_head *bh);
>>> -struct buffer_head *ocfs2_bread(struct inode *inode,
>>> -int block, int *err, int reada);
>>
>> grep shows "ocfs2_bread" also appears in dir.c:511, could you take a
>> look at?
>>
>> Eric
>>
>>>
>>>void ocfs2_set_inode_flags(struct inode *inode);
>>>void ocfs2_get_inode_flags(struct ocfs2_inode_info *oi);
>>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>>> index d7cae33..d97de21 100644
>>> --- a/fs/ocfs2/super.c
>>> +++ b/fs/ocfs2/super.c
>>> @@ -2072,7 +2072,6 @@ static int ocfs2_initialize_super(struct super_block 
>>> *sb,
>>>osb->osb_dx_seed[3] = le32_to_cpu(di->id2.i_super.s_uuid_hash);
>>>
>>>osb->sb = sb;
>>> -/* Save off for ocfs2_rw_direct */
>>>osb->s_sectsize_bits = blksize_bits(sector_size);
>>>BUG_ON(!osb->s_sectsize_bits);
>>>
>>
>>
>> .
>>
>
>
>




Re: [Ocfs2-devel] ocfs2: cleanup implemented prototypes

2016-07-03 Thread Eric Ren
Hi Joseph,

Please see comments inline;-)

On 07/01/2016 05:27 PM, Joseph Qi wrote:
> Several prototypes in inode.h are just defined but not actually
> implemented and used, so remove them.
>
> Signed-off-by: Joseph Qi 
> ---
>   fs/ocfs2/inode.h | 7 ---
>   fs/ocfs2/super.c | 1 -
>   2 files changed, 8 deletions(-)
>
> diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
> index d8f3fc8..50cc550 100644
> --- a/fs/ocfs2/inode.h
> +++ b/fs/ocfs2/inode.h
> @@ -145,22 +145,15 @@ int ocfs2_drop_inode(struct inode *inode);
>   struct inode *ocfs2_ilookup(struct super_block *sb, u64 feoff);
>   struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 feoff, unsigned flags,
>int sysfile_type);
> -int ocfs2_inode_init_private(struct inode *inode);
>   int ocfs2_inode_revalidate(struct dentry *dentry);
>   void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
> int create_ino);
> -void ocfs2_read_inode(struct inode *inode);
> -void ocfs2_read_inode2(struct inode *inode, void *opaque);
> -ssize_t ocfs2_rw_direct(int rw, struct file *filp, char *buf,
> - size_t size, loff_t *offp);
>   void ocfs2_sync_blockdev(struct super_block *sb);
>   void ocfs2_refresh_inode(struct inode *inode,
>struct ocfs2_dinode *fe);
>   int ocfs2_mark_inode_dirty(handle_t *handle,
>  struct inode *inode,
>  struct buffer_head *bh);
> -struct buffer_head *ocfs2_bread(struct inode *inode,
> - int block, int *err, int reada);

grep shows "ocfs2_bread" also appears in dir.c:511; could you take a
look?

Eric

>
>   void ocfs2_set_inode_flags(struct inode *inode);
>   void ocfs2_get_inode_flags(struct ocfs2_inode_info *oi);
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index d7cae33..d97de21 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -2072,7 +2072,6 @@ static int ocfs2_initialize_super(struct super_block 
> *sb,
>   osb->osb_dx_seed[3] = le32_to_cpu(di->id2.i_super.s_uuid_hash);
>
>   osb->sb = sb;
> - /* Save off for ocfs2_rw_direct */
>   osb->s_sectsize_bits = blksize_bits(sector_size);
>   BUG_ON(!osb->s_sectsize_bits);
>




Re: [Ocfs2-devel] [PATCH] ocfs2: remove obscure BUG_ON in dlmglue

2016-07-03 Thread Eric Ren
Good catch, thanks!
Reviewed-by: Eric Ren <z...@suse.com>

On 07/01/2016 05:10 PM, Joseph Qi wrote:
> These BUG_ON(!inode) are obscure because we have already used inode to
> get osb. And actually we can guarantee here inode is valid in the
> context. So we can safely remove them.
>
> Signed-off-by: Joseph Qi <joseph...@huawei.com>
> ---
>   fs/ocfs2/dlmglue.c | 9 -
>   1 file changed, 9 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 1cfa1b6bf..2108960 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1634,7 +1634,6 @@ int ocfs2_create_new_inode_locks(struct inode *inode)
>   int ret;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>
> - BUG_ON(!inode);
>   BUG_ON(!ocfs2_inode_is_new(inode));
>
>   mlog(0, "Inode %llu\n", (unsigned long long)OCFS2_I(inode)->ip_blkno);
> @@ -1677,8 +1676,6 @@ int ocfs2_rw_lock(struct inode *inode, int write)
>   struct ocfs2_lock_res *lockres;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>
> - BUG_ON(!inode);
> -
>   mlog(0, "inode %llu take %s RW lock\n",
>(unsigned long long)OCFS2_I(inode)->ip_blkno,
>write ? "EXMODE" : "PRMODE");
> @@ -1721,8 +1718,6 @@ int ocfs2_open_lock(struct inode *inode)
>   struct ocfs2_lock_res *lockres;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>
> - BUG_ON(!inode);
> -
>   mlog(0, "inode %llu take PRMODE open lock\n",
>(unsigned long long)OCFS2_I(inode)->ip_blkno);
>
> @@ -1746,8 +1741,6 @@ int ocfs2_try_open_lock(struct inode *inode, int write)
>   struct ocfs2_lock_res *lockres;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>
> - BUG_ON(!inode);
> -
>   mlog(0, "inode %llu try to take %s open lock\n",
>(unsigned long long)OCFS2_I(inode)->ip_blkno,
>write ? "EXMODE" : "PRMODE");
> @@ -2325,8 +2318,6 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>   struct buffer_head *local_bh = NULL;
>
> - BUG_ON(!inode);
> -
>   mlog(0, "inode %llu, take %s META lock\n",
>(unsigned long long)OCFS2_I(inode)->ip_blkno,
>ex ? "EXMODE" : "PRMODE");
>




[Ocfs2-devel] [PATCH] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-06-22 Thread Eric Ren
The testcase "mmaptruncate" in the ocfs2 test suite always fails with
an ENOSPC error on a small volume (say less than 10G). This testcase
creates 2 threads, T1/T2, which race to "truncate"/"extend" the same
file repeatedly. Specifically, T1 truncates a small file to 1/2 of its
size while T2 extends it back to 100%. The main bitmap will quickly
run out of space because the "truncate" code prevents the truncate log
from being flushed by ocfs2_schedule_truncate_log_flush(osb, 1), while
the truncate log may have cached lots of clusters.

So retry the allocation after flushing the truncate log when ENOSPC is
returned. We cannot reuse the deleted blocks before the transaction is
committed. Fortunately, we already have a function to do this -
ocfs2_try_to_free_truncate_log(). We just need to remove the "static"
modifier and move it to the right place.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/alloc.c| 37 +
 fs/ocfs2/alloc.h|  2 ++
 fs/ocfs2/aops.c | 37 -
 fs/ocfs2/suballoc.c | 17 -
 4 files changed, 55 insertions(+), 38 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 460c0ce..7dabbc3 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6106,6 +6106,43 @@ void ocfs2_schedule_truncate_log_flush(struct 
ocfs2_super *osb,
}
 }
 
+/*
+ * Try to flush truncate logs if we can free enough clusters from it.
+ * As for return value, "< 0" means error, "0" no space and "1" means
+ * we have freed enough spaces and let the caller try to allocate again.
+ */
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+   unsigned int needed)
+{
+   tid_t target;
+   int ret = 0;
+   unsigned int truncated_clusters;
+
+   inode_lock(osb->osb_tl_inode);
+   truncated_clusters = osb->truncated_clusters;
+   inode_unlock(osb->osb_tl_inode);
+
+   /*
+* Check whether we can succeed in allocating if we free
+* the truncate log.
+*/
+   if (truncated_clusters < needed)
+   goto out;
+
+   ret = ocfs2_flush_truncate_log(osb);
+   if (ret) {
+   mlog_errno(ret);
+   goto out;
+   }
+
+   if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
+   jbd2_log_wait_commit(osb->journal->j_journal, target);
+   ret = 1;
+   }
+out:
+   return ret;
+}
+
 static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
   int slot_num,
   struct inode **tl_inode,
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index f3dc1b0..4a5152e 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -188,6 +188,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
  u64 start_blk,
  unsigned int num_clusters);
 int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+  unsigned int needed);
 
 /*
  * Process local structure which describes the block unlinks done
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index c034edf..1802aef 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1645,43 +1645,6 @@ static int ocfs2_zero_tail(struct inode *inode, struct 
buffer_head *di_bh,
return ret;
 }
 
-/*
- * Try to flush truncate logs if we can free enough clusters from it.
- * As for return value, "< 0" means error, "0" no space and "1" means
- * we have freed enough spaces and let the caller try to allocate again.
- */
-static int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
- unsigned int needed)
-{
-   tid_t target;
-   int ret = 0;
-   unsigned int truncated_clusters;
-
-   inode_lock(osb->osb_tl_inode);
-   truncated_clusters = osb->truncated_clusters;
-   inode_unlock(osb->osb_tl_inode);
-
-   /*
-* Check whether we can succeed in allocating if we free
-* the truncate log.
-*/
-   if (truncated_clusters < needed)
-   goto out;
-
-   ret = ocfs2_flush_truncate_log(osb);
-   if (ret) {
-   mlog_errno(ret);
-   goto out;
-   }
-
-   if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
-   jbd2_log_wait_commit(osb->journal->j_journal, target);
-   ret = 1;
-   }
-out:
-   return ret;
-}
-
 int ocfs2_write_begin_nolock(struct address_space *mapping,
 loff_t pos, unsigned len, ocfs2_write_type_t type,
 struct page **pagep, void **fsdata,
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
ind

[Ocfs2-devel] [PATCH] ocfs2: fix improper handling of return errno

2016-05-22 Thread Eric Ren
Previously, if a bad inode was found in ocfs2_iget(), -ESTALE was
returned to the caller regardless of the actual error. Since commit
d2b9d71a2da7 ("ocfs2: check/fix inode block for online file check")
now handles the return value from ocfs2_read_locked_inode(), we know
the exact errno and can return it to the caller.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/inode.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index ba495be..fee5ec6 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -176,12 +176,7 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 
blkno, unsigned flags,
}
if (is_bad_inode(inode)) {
iput(inode);
-   if ((flags & OCFS2_FI_FLAG_FILECHECK_CHK) ||
-   (flags & OCFS2_FI_FLAG_FILECHECK_FIX))
-   /* Return OCFS2_FILECHECK_ERR_XXX related errno */
-   inode = ERR_PTR(rc);
-   else
-   inode = ERR_PTR(-ESTALE);
+   inode = ERR_PTR(rc);
goto bail;
}
 
-- 
2.6.6



[Ocfs2-devel] [PATCH] ocfs2: fix a redundant re-initialization

2016-05-22 Thread Eric Ren
Obviously, memset() has already zeroed the whole struct
locking_max_version, so there is no need to zero its two fields
individually.

Signed-off-by: Eric Ren <z...@suse.com>
---
 fs/ocfs2/stackglue.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 5d965e8..855fb44 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -734,8 +734,6 @@ static void __exit ocfs2_stack_glue_exit(void)
 {
memset(&locking_max_version, 0,
   sizeof(struct ocfs2_protocol_version));
-   locking_max_version.pv_major = 0;
-   locking_max_version.pv_minor = 0;
ocfs2_sysfs_exit();
if (ocfs2_table_header)
unregister_sysctl_table(ocfs2_table_header);
-- 
2.6.6



