from:"Eric Ren"

[Ocfs2-devel] FYI: all testcases of ocfs2-test passed on 4.15.0-rc8-1.g05e4405-vanilla

2018-01-18 Thread Eric Ren


Hi,

As the subject says, ocfs2-test, run with the "-b 4096 -c 32768" parameters and
the fsdlm plugin, passed all cases on the recent upstream kernel. The overall
results are attached.

Eric


Attachments:
2018-01-18-20-33-09-discontig-bg-single-run.log
single_run.log
2018-01-19-04-28-18-discontig-bg-multiple-run.log
multiple-run-x86_64-2018-01-18-18-03-21.log
___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread

2018-01-12 Thread Eric Ren

Hi,

On 01/12/2018 11:43 AM, Shichangkuo wrote:
> Hi all,
> Now we are testing ocfs2 with the 4.14 kernel, and we found a deadlock between
> umount and the ocfs2 workqueue, triggered by the ocfs2rec thread. The stacks are as follows:
> journal recovery work:
> [] call_rwsem_down_read_failed+0x14/0x30
> [] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> [] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> [] process_one_work+0x130/0x350
> [] worker_thread+0x46/0x3b0
> [] kthread+0x101/0x140
> [] ret_from_fork+0x1f/0x30
> [] 0x
>
> /bin/umount:
> [] flush_workqueue+0x104/0x3e0
> [] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> [] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> [] ocfs2_put_super+0x31/0xa0 [ocfs2]
> [] generic_shutdown_super+0x6d/0x120
> [] kill_block_super+0x2d/0x60
> [] deactivate_locked_super+0x51/0x90
> [] cleanup_mnt+0x3b/0x70
> [] task_work_run+0x86/0xa0
> [] exit_to_usermode_loop+0x6d/0xa9
> [] do_syscall_64+0x11d/0x130
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
>
> The function ocfs2_finish_quota_recovery() tries to take sb->s_umount, which is
> already held by the umount thread, so we get a deadlock.

Good catch, thanks for reporting.  Is it reproducible? Can you please 
share the steps for reproducing this issue?
> This issue was introduced by c3b004460d77bf3f980d877be539016f2df4df12 and 
> 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex was already
> removed.
> Shall we add a new mutex?

@Jan, I haven't looked into the code yet. Could you help me understand why
we need to take sb->s_umount in ocfs2_finish_quota_recovery()?
Is it because the quota recovery process can start during umount,
or somewhere else?

Thanks,
Eric




Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Eric Ren

Hi,


On 12/27/2017 05:29 PM, Gang He wrote:
> If we can't get inode lock immediately in the function
> ocfs2_inode_lock_with_page() when reading a page, we should not
> return directly here, since this will lead to a softlockup problem.
> The method is to get a blocking lock and immediately unlock before
> returning, this can avoid CPU resource waste due to lots of retries,
> and benefits fairness in getting lock among multiple nodes, increase
> efficiency in case modifying the same file frequently from multiple
> nodes.
> The softlockup problem looks like,
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>
>dump_stack+0x5c/0x82
>panic+0xd5/0x21e
>watchdog_timer_fn+0x208/0x210
>? watchdog_park_threads+0x70/0x70
>__hrtimer_run_queues+0xcc/0x200
>hrtimer_interrupt+0xa6/0x1f0
>smp_apic_timer_interrupt+0x34/0x50
>apic_timer_interrupt+0x96/0xa0
>
>   RIP: 0010:unlock_page+0x17/0x30
>   RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>   RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>   RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>   RBP:  R08:  R09: af154080bb00
>   R10: af154080bc30 R11: 0040 R12: 993749a39518
>   R13:  R14: f21e009f5300 R15: f21e009f5300
>ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>ocfs2_readpage+0x41/0x2d0 [ocfs2]
>? pagecache_get_page+0x30/0x200
>filemap_fault+0x12b/0x5c0
>? recalc_sigpending+0x17/0x50
>? __set_task_blocked+0x28/0x70
>? __set_current_blocked+0x3d/0x60
>ocfs2_fault+0x29/0xb0 [ocfs2]
>__do_fault+0x1a/0xa0
>__handle_mm_fault+0xbe8/0x1090
>handle_mm_fault+0xaa/0x1f0
>__do_page_fault+0x235/0x4b0
>trace_do_page_fault+0x3c/0x110
>async_page_fault+0x28/0x30
>   RIP: 0033:0x7fa75ded638e
>   RSP: 002b:7ffd6657db18 EFLAGS: 00010287
>   RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
>   RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
>   RBP: 0003 R08: 000e R09: 
>   R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
>   R13: 000e R14: 1770 R15: 
>
> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
> Signed-off-by: Gang He 

On most Linux servers, CONFIG_PREEMPT is not set, for better system-wide
throughput.
The long-running retry logic for getting the page lock and inode lock can
easily cause a softlockup, with the result that real-time tasks such as
corosync (when using the pcmk stack) cannot be scheduled on time.

When multiple nodes write the same file concurrently, the performance
cannot be good anyway, and that case is also less likely.

The trick for avoiding the busy loop looks good to me.
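
For readers who want to see the pattern outside the kernel, here is a minimal user-space sketch of it, assuming a pthread mutex in place of the cluster inode lock. The names lock_or_wait_then_retry() and RETRY_PAGE are hypothetical stand-ins, not kernel symbols. In the patch, the page is also unlocked before the blocking wait, which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the kernel's AOP_TRUNCATED_PAGE return code. */
#define RETRY_PAGE 1

/*
 * User-space model of the pattern, not the kernel code itself.
 * Returns 0 with the lock held, or RETRY_PAGE with the lock released
 * after having slept until the previous holder let go.
 */
int lock_or_wait_then_retry(pthread_mutex_t *inode_lock)
{
	int ret;

	assert(inode_lock != NULL);

	ret = pthread_mutex_trylock(inode_lock);
	if (ret == EBUSY) {
		/*
		 * Contended: block once instead of spinning through retries,
		 * then drop the lock right away so the caller restarts cleanly.
		 */
		if (pthread_mutex_lock(inode_lock) == 0)
			pthread_mutex_unlock(inode_lock);
		return RETRY_PAGE;
	}
	return ret;
}
```

The caller still has to retry after RETRY_PAGE, but it only comes back once the contending holder has released the lock, so the retry loop no longer burns CPU.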

Reviewed-by: z...@suse.com

Thanks,
Eric

> ---
>   fs/ocfs2/dlmglue.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5193218 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>  	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>  	if (ret == -EAGAIN) {
>  		unlock_page(page);
> +		/*
> +		 * If we can't get inode lock immediately, we should not return
> +		 * directly here, since this will lead to a softlockup problem.
> +		 * The method is to get a blocking lock and immediately unlock
> +		 * before returning, this can avoid CPU resource waste due to
> +		 * lots of retries, and benefits fairness in getting lock.
> +		 */
> +		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
> +			ocfs2_inode_unlock(inode, ex);
>  		ret = AOP_TRUNCATED_PAGE;
>  	}
>   



Re: [Ocfs2-devel] OCFS2 cluster debian8 / debian9

2017-12-14 Thread Eric Ren


Hi,


On 12/05/2017 11:19 PM, BASSAGET Cédric wrote:

Hello
Retried from scratch; and still have an error when trying to bring up 
the second cluster :


root@LAB-virtm6:/# o2cb register-cluster ocfs2new
o2cb: Internal logic failure while registering cluster 'ocfs2new'

root@LAB-virtm6:/mnt/vol1_iscsi_san1# o2cb list-clusters
ocfs2
ocfs2new

Can anybody help me please ?


I don't know whether the o2cb stack can have multiple clusters on the same node.
With the pacemaker stack, once we set up a pacemaker cluster, we can have multiple
ocfs2 instances, i.e. mkfs and mount multiple ocfs2 file systems on the same node.

Why do you want to set up multiple o2cb clusters on the same node? I haven't
come across this use case so far :)

Maybe others can help with your question.

Thanks,
Eric




2017-11-29 8:28 GMT+01:00 BASSAGET Cédric <cedric.bassaget...@gmail.com>:


Hello,
I guess I did something wrong the first time.
I retried three times, and it worked three times. So I guess ocfs
1.6 and 1.8 are compatible :)

Now it's time to set up a second ocfs2 cluster on my debian 9
server (ocfs 1.8), and I get this error when trying to mkfs.ocfs2:

root@LAB-virtm6:~# mkfs.ocfs2 /dev/mapper/data_san_2
mkfs.ocfs2 1.8.4
On disk cluster (o2cb,ocfs2new,0) does not match the active
cluster (o2cb,ocfs2,0).
mkfs.ocfs2 will not be able to determine if this operation can be
done safely.
To skip this check, use --force or -F


The running cluster on this host is :
root@LAB-virtm6:~# o2cluster -r
o2cb,ocfs2,local

I'm trying to add an "ocfs2new" cluster :
root@LAB-virtm6:~# o2cb add-cluster ocfs2new
root@LAB-virtm6:~# o2cb add-node --ip 192.168.0.12 --port 
--number 1 ovfs2new LAB-virtm6
root@LAB-virtm6:~# o2cb add-node --ip 192.168.0.13 --port 
--number 2 ovfs2new LAB-virtm7

root@LAB-virtm6:~# o2cb list-clusters
ocfs2
ocfs2new

root@LAB-virtm6:~# o2cb cluster-status
Cluster 'ocfs2' is online

Even if I restart services or reboot, cluster 'ocfs2new' never
goes online.

What am I doing wrong ?




Re: [Ocfs2-devel] OCFS2 cluster debian8 / debian9

2017-11-23 Thread Eric Ren


Hi,


On 11/23/2017 09:42 PM, BASSAGET Cédric wrote:

Tried to mkfs.ocfs2 on the debian 9 side (ocfs 1.8) :

root@LAB-virtm6:~# o2cluster -o /dev/mapper/data_san_2
o2cb,ocfs2new,local


Sorry, my fault. For a mount failure, the first thing is to check whether your
o2cb stack is running, via the o2cb init service (`rco2cb status`, IIRC). Then,
it's probably a compatibility issue: see the "FEATURE FLAGS" section in `man ocfs2`
for more instructions.



root@LAB-virtm6:~# mount /dev/mapper/data_san_2 /mnt/vol1_iscsi_san2
mount.ocfs2: Cluster name is invalid while trying to join the group


Quote from `man ocfs2`:
"""
DETECTING FEATURE INCOMPATIBILITY

  Say one tries to mount a volume with an incompatible feature. What
  happens then? How does one detect the problem? How does one know the
  name of that incompatible feature?

  To begin with, one should look for error messages in dmesg(8). Mount
  failures that are due to an incompatible feature will always result in
  an error message like the following:

  ERROR: couldn't mount because of unsupported optional features (200).

  Here the file system is unable to mount the volume due to an unsupported
  optional feature. That means that that feature is an Incompat feature.
  By referring to the table above, one can then deduce that the user
  failed to mount a volume with the xattr feature enabled. (The value in
  the error message is in hexadecimal.)
"""

Please show your dmesg.
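
To make the decoding step from the man page quote concrete, here is a small sketch that parses the hexadecimal value from such a message and checks it against a feature table. Only the xattr entry (0x0200) is confirmed by the quote above; the other entries are written from memory of the ocfs2 headers and should be treated as assumptions to verify against the table in `man ocfs2`. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct incompat_flag {
	unsigned long bit;
	const char *name;
};

/* Partial table; only the xattr value is confirmed by the man page quote. */
static const struct incompat_flag incompat_flags[] = {
	{ 0x0010, "sparse" },        /* assumption, verify in man ocfs2 */
	{ 0x0040, "inline-data" },   /* assumption, verify in man ocfs2 */
	{ 0x0200, "xattr" },         /* the case quoted from the man page */
	{ 0x0400, "indexed-dirs" },  /* assumption, verify in man ocfs2 */
	{ 0x1000, "refcount" },      /* assumption, verify in man ocfs2 */
};

/* The number in the error message is hexadecimal without a 0x prefix. */
unsigned long parse_feature_mask(const char *text)
{
	assert(text != NULL);
	return strtoul(text, NULL, 16);
}

/* Return 1 if the named feature's bit is set in mask, 0 otherwise. */
int mask_has_feature(unsigned long mask, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(incompat_flags) / sizeof(incompat_flags[0]); i++) {
		if (strcmp(incompat_flags[i].name, name) == 0)
			return (mask & incompat_flags[i].bit) != 0;
	}
	return 0; /* unknown feature name */
}
```

With the "(200)" from the example message, mask_has_feature(parse_feature_mask("200"), "xattr") reports that the xattr bit is the unsupported one.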

Eric


root@LAB-virtm6:~# cat /etc/ocfs2/cluster.conf
node:
        ip_port = 
        ip_address = 192.168.0.11
        number = 1
        name = LAB-virtm5
        cluster = ocfs2new
node:
        ip_port = 
        ip_address = 192.168.0.12
        number = 2
        name = LAB-virtm6
        cluster = ocfs2new
node:
        ip_port = 
        ip_address = 192.168.0.13
        number = 3
        name = LAB-virtm7
        cluster = ocfs2new
cluster:
        node_count = 5
        name = ocfs2new




2017-11-23 14:02 GMT+01:00 BASSAGET Cédric <cedric.bassaget...@gmail.com>:


Hi Eric
on debian 9 (ocfs2 v1.8) :

# o2cluster -o /dev/mapper/data_san_2
default

on debian 8 (ocfs2 v1.6), I don't have the "o2cluster" tool

:(



2017-11-23 13:41 GMT+01:00 Eric Ren <z...@suse.com>:

Hi,

On 11/23/2017 06:23 PM, BASSAGET Cédric wrote:

hello,
I'm trying to set-up an OCFS2 cluster between hosts running
debian8 and debian9

2*debian 8 : ocfs2-tools 1.6.4-3
1*debian 9 : ocfs2-tools 1.8.4-4

I created the FS on debian 8 node :
 mkfs.ocfs2 -L "ocfs2_new" -N 5 /dev/mapper/data_san_2

then mounted it without problem
mount /dev/mapper/data_san_2 /mnt/vol1_iscsi_san2/

I mounted it on second debian 8 host too, without problem.

Trying to mount it on debian9 returns:
mount.ocfs2: Cluster name is invalid while trying to join the
group

I saw in "man mkfs.ocfs2" that debian9 version
has --cluster-stack and --cluster-name options.

Is this option mandatory on ocfs2 1.8? Would that mean that
ocfs2 1.6 and 1.8 are not compatible? Nothing is said about
1.8 on https://oss.oracle.com/projects/ocfs2/


Not sure if they're compatible. So can you try again with
--cluster-stack and --cluster-name?

# o2cluster -o /dev/sda1
pcmk,cluster,none

pcmk is the cluster-stack, cluster is the name.

Usually, these two options are optional; the tools will detect
the right cluster stack automatically.

Eric






Re: [Ocfs2-devel] OCFS2 cluster debian8 / debian9

2017-11-23 Thread Eric Ren


Hi,

On 11/23/2017 06:23 PM, BASSAGET Cédric wrote:

hello,
I'm trying to set-up an OCFS2 cluster between hosts running debian8 
and debian9


2*debian 8 : ocfs2-tools 1.6.4-3
1*debian 9 : ocfs2-tools 1.8.4-4

I created the FS on debian 8 node :
 mkfs.ocfs2 -L "ocfs2_new" -N 5 /dev/mapper/data_san_2

then mounted it without problem
mount /dev/mapper/data_san_2 /mnt/vol1_iscsi_san2/

I mounted it on second debian 8 host too, without problem.

Trying to mount it on debian9 returns:
mount.ocfs2: Cluster name is invalid while trying to join the group

I saw in "man mkfs.ocfs2" that debian9 version has --cluster-stack and 
--cluster-name options.


Is this option mandatory on ocfs2 1.8? Would that mean that ocfs2 1.6
and 1.8 are not compatible? Nothing is said about 1.8 on
https://oss.oracle.com/projects/ocfs2/




Not sure if they're compatible. So can you try again with 
--cluster-stack and --cluster-name?


# o2cluster -o /dev/sda1
pcmk,cluster,none

pcmk is the cluster-stack, cluster is the name.

Usually, these two options are optional; the tools will detect the right
cluster stack automatically.
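
As an illustration of how that o2cluster output can be read programmatically, here is a small sketch that splits the comma-separated line into its fields and compares two of them. The struct and function names are invented for this example, and the meaning of the third field is not covered in this mail, so the sketch just stores it as "flags".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "stack,name,flags" line as printed by o2cluster. */
struct cluster_info {
	char stack[16];
	char name[64];
	char flags[16];
};

/* Returns 0 on success, -1 if the line does not have three fields. */
int parse_o2cluster_line(const char *line, struct cluster_info *ci)
{
	assert(line != NULL && ci != NULL);

	/* Each scanset stops at the next comma; widths guard the buffers. */
	if (sscanf(line, "%15[^,],%63[^,],%15[^,\n]",
		   ci->stack, ci->name, ci->flags) != 3)
		return -1;
	return 0;
}

/* Two descriptions refer to the same cluster if stack and name match. */
int same_cluster(const struct cluster_info *a, const struct cluster_info *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a->stack, b->stack) == 0 &&
	       strcmp(a->name, b->name) == 0;
}
```

For "pcmk,cluster,none" this yields stack "pcmk" and name "cluster". A bare "default", as reported for a volume that carries no cluster information, is rejected by the parser.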


Eric

Re: [Ocfs2-devel] Adding new node to an online OCFS2 cluster

2017-11-09 Thread Eric Ren

Hi,

On 11/09/2017 06:56 PM, BASSAGET Cédric wrote:
> Hello,
> As I did not get help on users mailing list, I allow myself to post my 
> question here. Sorry if it's not the right place, but I can't find any 
> documentation.

Not at all. Actually, the other mailing lists have very low traffic now. I
think it's fine to ask questions on this devel list.

Eric


Re: [Ocfs2-devel] How to share my notes about ocfs2 source code?

2017-11-09 Thread Eric Ren

Hi Larry,

Awesome,  can you share the charts in this thread as attachments?


On 11/08/2017 04:48 PM, Larry Chen wrote:
> Hi everyone,
>
> Recently, I have read a lot of ocfs2 source code, and made several notes
> meanwhile. Since I found that there are not enough docs describing
> how ocfs2 works internally, I would like to share my notes, hoping that
> they could help other beginners. The notes were made with draw.io
> (a good drawing website), and I think it does well at illustrating
> internal data structures and their relationships.
> The examples could be found attached.
> But I have no idea where and how to put them and how to ask everyone 
> to review them.
> Could anyone give me some instructions?

You can just share the plain charts with us, or write blog posts or an article
on lwn.net like [1] :)

[1] https://lwn.net/Articles/402287/

Thanks,
Eric


Re: [Ocfs2-devel] ocfs2-test result on 4.14.0-rc7-1.gdbf3e9b-vanilla kernel

2017-11-06 Thread Eric Ren

Hi,

On 11/07/2017 10:43 AM, Changwei Ge wrote:
> Hi Eric,
>
> On 2017/11/7 10:33, Eric Ren wrote:
>> Hi,
>>
>> The testing result against the recent kernel looks good. The attachments are
>> overall results. If the detailed logs are needed, please let me know.
>>
>> Pattern  Failed  Passed  Skipped Total
>> DiscontigBgMultiNode 0   4   0   4
>> DiscontigBgSingleNode0   5   0   5
>> MultipleNodes0   9   1   10
>> SingleNode   0   18  1   19
> 18 SingleNode cases were failed. Is that normal?

It's 18 passed. Sorry, the format is a little messy; please see the
attachment.

>
>> Notes:
>> - This testing only use blocksize=4096 and clustersize=32768 to reduce the 
>> time;
>> - The 2 skipped cases are on purpose: filecheck and lvb_torture.
>>
> I wonder why case - 'lvb_torture' is skipped on purpose?
We are using fsdlm; unfortunately, this test case uses some user-space
APIs that the fsdlm plugin doesn't implement :)

Thanks,
Eric


[Ocfs2-devel] ocfs2-test result on 4.14.0-rc7-1.gdbf3e9b-vanilla kernel

2017-11-06 Thread Eric Ren


Hi,

The test results against the recent kernel look good. The attachments are the
overall results. If detailed logs are needed, please let me know.

Pattern                Failed  Passed  Skipped  Total
DiscontigBgMultiNode   0       4       0        4
DiscontigBgSingleNode  0       5       0        5
MultipleNodes          0       9       1        10
SingleNode             0       18      1        19

Notes:
- This test run only uses blocksize=4096 and clustersize=32768 to reduce the time;
- The 2 skipped cases are skipped on purpose: filecheck and lvb_torture.

Cheers,
Eric



Attachments:
single_run.log
multiple-run-x86_64-2017-11-06-19-57-44.log

=Discontiguous block group test starts:  Mon Nov  6 21:24:07 CST 2017=
<- Running test with 4096 bs and 32768 cs ->
[*] Inodes Block Group Test: [ PASS ]
[*] Extent Block Group Test: [ PASS ]
[*] Inline File Test: [ PASS ]
[*] Xattr Block Test: [ PASS ]
[*] Refcount Block Test: [ PASS ]
=Discontiguous block group test ends: Tue Nov  7 01:35:29 CST 2017=

=Discontiguous block group test starts:  Tue Nov  7 01:35:34 CST 2017=
<- Running test with 4096 bs and 32768 cs ->
[*] Multi-nodes Inodes Block Group Test: [ PASS ]
[*] Multi-nodes Extents Block Group Test: [ PASS ]
[*] Multi-nodes Xattr Block Group Test: [ PASS ]
[*] Multi-nodes Refcount Block Group Test: [ PASS ]
=Discontiguous block group test ends: Tue Nov  7 02:31:54 CST 2017=


Re: [Ocfs2-devel] [PATCH] ocfs2: mknod: fix recursive locking hung

2017-10-22 Thread Eric Ren

Hi,

On 10/18/2017 12:44 PM, Junxiao Bi wrote:
> On 10/18/2017 12:41 PM, Gang He wrote:
>> Hi Junxiao,
>>
>> The problem looks easy to reproduce?
>> Could you share the trigger script/code for this issue?
> Please run ocfs2-test multiple reflink test.
Hmm, strange, we do run ocfs2-test quite often.

Eric


Re: [Ocfs2-devel] [PATCH] ocfs2: the ip_alloc_sem should be taken in ocfs2_get_block()

2017-10-22 Thread Eric Ren

Hi,

On 10/20/2017 05:03 PM, alex chen wrote:
> The ip_alloc_sem should be taken in ocfs2_get_block() when reading file
> in DIRECT mode to prevent concurrent access to extent tree with
> ocfs2_dio_end_io_write(), which may cause BUGON in
> ocfs2_get_clusters_nocache()->BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))

This may seem an obvious fix, but it would be great if you could
write a more detailed commit log, e.g. paste the crash backtrace
here, so that people can find this fix easily when they hit the same issue.

Thanks,
Eric
>
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
>
> ---
>   fs/ocfs2/aops.c | 21 +++++++++++++++------
>   1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 88a31e9..5cb939f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -134,6 +134,19 @@ static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
>  	return err;
>  }
>
> +static int ocfs2_get_block_lock(struct inode *inode, sector_t iblock,
> +		struct buffer_head *bh_result, int create)
> +{
> +	int ret;
> +	struct ocfs2_inode_info *oi = OCFS2_I(inode);
> +
> +	down_read(&oi->ip_alloc_sem);
> +	ret = ocfs2_get_block(inode, iblock, bh_result, create);
> +	up_read(&oi->ip_alloc_sem);
> +
> +	return ret;
> +}
> +
>  int ocfs2_get_block(struct inode *inode, sector_t iblock,
>  		struct buffer_head *bh_result, int create)
>  {
> @@ -2154,12 +2167,8 @@ static int ocfs2_dio_get_block(struct inode *inode, sector_t iblock,
>  	 * while file size will be changed.
>  	 */
>  	if (pos + total_len <= i_size_read(inode)) {
> -		down_read(&oi->ip_alloc_sem);
>  		/* This is the fast path for re-write. */
> -		ret = ocfs2_get_block(inode, iblock, bh_result, create);
> -
> -		up_read(&oi->ip_alloc_sem);
> -
> +		ret = ocfs2_get_block_lock(inode, iblock, bh_result, create);
>  		if (buffer_mapped(bh_result) &&
>  		    !buffer_new(bh_result) &&
>  		    ret == 0)
> @@ -2424,7 +2433,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  		return 0;
>
>  	if (iov_iter_rw(iter) == READ)
> -		get_block = ocfs2_get_block;
> +		get_block = ocfs2_get_block_lock;
>  	else
>  		get_block = ocfs2_dio_get_block;
>
> -- 1.9.5.msysgit.1
>
>
>



Re: [Ocfs2-devel] ocfs2-test supports Python 3

2017-09-27 Thread Eric Ren

Hi,

> As you know, some Linux distributions (e.g. SUSE Enterprise Linux 15) will
> introduce Python3 as the default.
> Our Python scripts in ocfs2-test still use Python2, so we will have to make
> proper modifications to migrate to Python3 (Larry Chen has worked on this
> investigation).
> But the problem is how to maintain the ocfs2-test code between Python2 and
> Python3, since some existing Linux distributions (including their further SPs)
> use Python2.
> There are two options:
> 1) one code branch, in which the code is compatible with both Python2 and
> Python3, but I feel it is difficult after I talked with Larry Chen.
> 2) two code branches, one branch for Python2, the other branch for
> Python3.
> What are your thoughts? Please let us know, then we can select a way to work
> on this task.
I think we can make a "snapshot" branch of the current master, e.g. named
"snapshot-2017.9", and not change that branch anymore.
Users who want to test ocfs2 on an OS with python2 can use it
without problems, and we should document this in README.txt.

For the python3 and other changes, we can merge them into the master
branch, so that master keeps moving forward...

Eric


Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-22 Thread Eric Ren

Hi Andrew,

On 06/23/2017 05:24 AM, Andrew Morton wrote:
> On Thu, 22 Jun 2017 14:10:38 +0800 Joseph Qi  wrote:
>
>> Looks good.
>> Reviewed-by: Joseph Qi 
> Should this fix be backported into -stable kernels?

No, I think, because the previous patches that this one depends on,

- commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to avoid
  recursive cluster lock")
- commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking inode lock at vfs
  entry points")

are not in -stable either.

I don't know if it's possible to get them all into stable.
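
For context on what the tracking logic from commit 439a36b8ef38 does for callers, here is a simplified user-space model of the had_lock convention, as described in the comment this patch adds. It is only a sketch under stated assumptions: the struct and function names are invented, a task is identified by a plain integer, and contention returns -1 here, whereas the real cluster lock would block.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified lock that remembers which task currently holds it. */
struct tracked_lock {
	int locked;   /* 1 while some task holds the lock */
	int holder;   /* id of the holding task, valid only if locked */
};

/*
 * Returns 1 if `task` already holds the lock (nothing is taken again),
 * 0 if the lock was taken by this call, and -1 if another task holds it.
 */
int lock_tracker(struct tracked_lock *l, int task)
{
	assert(l != NULL);

	if (l->locked)
		return l->holder == task ? 1 : -1;
	l->locked = 1;
	l->holder = task;
	return 0;
}

/* Only the call that really took the lock (had_lock == 0) releases it. */
void unlock_tracker(struct tracked_lock *l, int had_lock)
{
	assert(l != NULL);

	if (!had_lock)
		l->locked = 0;
}
```

A nested caller gets had_lock == 1 and its unlock becomes a no-op, so the lock is dropped only where it was first taken. That is what avoids the recursive locking in the xattr paths.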

Thanks,
Eric

>
>> On 17/6/22 09:47, Eric Ren wrote:
>>> Another deadlock path caused by recursive locking is reported.
>>> This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2:
>>> take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
>>> have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
>>> taking inode lock at vfs entry points"). Yes, we intend to fix this
>>> kind of case in incremental way, because it's hard to find out all
>>> possible paths at once.
>>>
>>> This one can be reproduced like this. On node1, cp a large file from
>>> home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
>>> Both nodes will hang up there. The backtraces:
>>>
>>> On node1:
>>> [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
>>> [] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
>>> [] generic_perform_write+0xa9/0x180
>>> [] __generic_file_write_iter+0x1aa/0x1d0
>>> [] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
>>> [] __vfs_write+0xc3/0x130
>>> [] vfs_write+0xb1/0x1a0
>>> [] SyS_write+0x46/0xa0
>>>
>>> On node2:
>>> [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
>>> [] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
>>> [] ocfs2_set_acl+0x22d/0x260 [ocfs2]
>>> [] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
>>> [] set_posix_acl+0x75/0xb0
>>> [] posix_acl_xattr_set+0x49/0xa0
>>> [] __vfs_setxattr+0x69/0x80
>>> [] __vfs_setxattr_noperm+0x72/0x1a0
>>> [] vfs_setxattr+0xa7/0xb0
>>> [] setxattr+0x12d/0x190
>>> [] path_setxattr+0x9f/0xb0
>>> [] SyS_setxattr+0x14/0x20
>>>
>>> Fixes this one by using ocfs2_inode_{lock|unlock}_tracker, which is
>>> exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking
>>> logic to avoid recursive cluster lock").
>>>
>>> Changes since v1:
>>> - Revised git commit description style in commit log.
>>>
>>> Reported-by: Thomas Voegtle 
>>> Tested-by: Thomas Voegtle 
>>> Signed-off-by: Eric Ren 
>>> ---
>>>   fs/ocfs2/dlmglue.c |  4 ++++
>>>   fs/ocfs2/xattr.c   | 23 +++++++++++++----------
>>>   2 files changed, 17 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 3b7c937..4689940 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
>>> struct ocfs2_lock_res *lockres;
>>>   
>>> lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>> +   /* had_lock means that the currect process already takes the cluster
>>> +* lock previously. If had_lock is 1, we have nothing to do here, and
>>> +* it will get unlocked where we got the lock.
>>> +*/
>>> if (!had_lock) {
>>> ocfs2_remove_holder(lockres, oh);
>>> ocfs2_inode_unlock(inode, ex);
>>> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
>>> index 3c5384d..f70c377 100644
>>> --- a/fs/ocfs2/xattr.c
>>> +++ b/fs/ocfs2/xattr.c
>>> @@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
>>>void *buffer,
>>>size_t buffer_size)
>>>   {
>>> -   int ret;
>>> +   int ret, had_lock;
>>> struct buffer_head *di_bh = NULL;
>>> +   struct ocfs2_lock_holder oh;
>>>   
>>> -   ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> -   if (ret < 0) {
>>> -   mlog_errno(ret);
>>> -   return ret;
>>> +   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
>>> +   if (had_lock < 0) {
>>>

[Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-21 Thread Eric Ren

Another deadlock path caused by recursive locking is reported.
This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2:
take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
taking inode lock at vfs entry points"). Yes, we intend to fix this
kind of case in incremental way, because it's hard to find out all
possible paths at once.

This one can be reproduced like this. On node1, cp a large file from
home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
Both nodes will hang up there. The backtraces:

On node1:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
[] generic_perform_write+0xa9/0x180
[] __generic_file_write_iter+0x1aa/0x1d0
[] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
[] __vfs_write+0xc3/0x130
[] vfs_write+0xb1/0x1a0
[] SyS_write+0x46/0xa0

On node2:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
[] ocfs2_set_acl+0x22d/0x260 [ocfs2]
[] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
[] set_posix_acl+0x75/0xb0
[] posix_acl_xattr_set+0x49/0xa0
[] __vfs_setxattr+0x69/0x80
[] __vfs_setxattr_noperm+0x72/0x1a0
[] vfs_setxattr+0xa7/0xb0
[] setxattr+0x12d/0x190
[] path_setxattr+0x9f/0xb0
[] SyS_setxattr+0x14/0x20

Fixes this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking
logic to avoid recursive cluster lock").

Changes since v1:
- Revised git commit description style in commit log.

Reported-by: Thomas Voegtle 
Tested-by: Thomas Voegtle 
Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c |  4 ++++
 fs/ocfs2/xattr.c   | 23 +++++++++++++----------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 3b7c937..4689940 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
struct ocfs2_lock_res *lockres;
 
lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   /* had_lock means that the currect process already takes the cluster
+* lock previously. If had_lock is 1, we have nothing to do here, and
+* it will get unlocked where we got the lock.
+*/
if (!had_lock) {
ocfs2_remove_holder(lockres, oh);
ocfs2_inode_unlock(inode, ex);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3c5384d..f70c377 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
   void *buffer,
   size_t buffer_size)
 {
-   int ret;
+   int ret, had_lock;
struct buffer_head *di_bh = NULL;
+   struct ocfs2_lock_holder oh;
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   mlog_errno(ret);
-   return ret;
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0) {
+   mlog_errno(had_lock);
+   return had_lock;
}
down_read(&OCFS2_I(inode)->ip_xattr_sem);
ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
 name, buffer, buffer_size);
up_read(&OCFS2_I(inode)->ip_xattr_sem);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
 
brelse(di_bh);
 
@@ -3537,11 +3538,12 @@ int ocfs2_xattr_set(struct inode *inode,
 {
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
-   int ret, credits, ref_meta = 0, ref_credits = 0;
+   int ret, credits, had_lock, ref_meta = 0, ref_credits = 0;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct inode *tl_inode = osb->osb_tl_inode;
struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, NULL, };
struct ocfs2_refcount_tree *ref_tree = NULL;
+   struct ocfs2_lock_holder oh;
 
struct ocfs2_xattr_info xi = {
.xi_name_index = name_index,
@@ -3572,8 +3574,9 @@ int ocfs2_xattr_set(struct inode *inode,
return -ENOMEM;
}
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 1);
-   if (ret < 0) {
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 1, &oh);
+   if (had_lock < 0) {
+   ret = had_lock;
mlog_errno(ret);
goto cleanup_nolock;
}
@@ -3670,7 +3673,7 @@ int ocfs2_xattr_set(struct inode *inode,
if (ret)
mlog_errno(ret);
}
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
 cleanup_nolock:
brels

Re: [Ocfs2-devel] [PATCH] ocfs2: get rid of ocfs2_is_o2cb_active function

2017-06-21 Thread Eric Ren

On 05/22/17 16:17, Gang He wrote:
> This patch is used to get rid of ocfs2_is_o2cb_active() function,
> Why? First, we had the similar functions to identify which cluster
> stack is being used via osb->osb_cluster_stack. Second, the current
> implementation of ocfs2_is_o2cb_active() function is not total safe,
> base on the design of stackglue, we need to get ocfs2_stack_lock lock
> before using ocfs2_stack related data structures, and that
> active_stack pointer can be NULL in case mount failure.
>
> Signed-off-by: Gang He 
Looks good.
Reviewed-by: Eric Ren 

Eric

> ---
>   fs/ocfs2/dlmglue.c   | 2 +-
>   fs/ocfs2/stackglue.c | 6 ------
>   fs/ocfs2/stackglue.h | 3 ---
>   3 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 3b7c937..a54196a 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -3409,7 +3409,7 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
>  	 * we can recover correctly from node failure. Otherwise, we may get
>  	 * invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
>  	 */
> -	if (!ocfs2_is_o2cb_active() &&
> +	if (ocfs2_userspace_stack(osb) &&
>  	    lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
>  		lvb = 1;
>   
> diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
> index 8203590..52c07346b 100644
> --- a/fs/ocfs2/stackglue.c
> +++ b/fs/ocfs2/stackglue.c
> @@ -48,12 +48,6 @@
>*/
>   static struct ocfs2_stack_plugin *active_stack;
>   
> -inline int ocfs2_is_o2cb_active(void)
> -{
> - return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
> -}
> -EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
> -
>   static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
>   {
>   struct ocfs2_stack_plugin *p;
> diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
> index e3036e1..f2dce10 100644
> --- a/fs/ocfs2/stackglue.h
> +++ b/fs/ocfs2/stackglue.h
> @@ -298,9 +298,6 @@ int ocfs2_plock(struct ocfs2_cluster_connection *conn, 
> u64 ino,
>   int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
>   void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
>   
> -/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
> -int ocfs2_is_o2cb_active(void);
> -
>   extern struct kset *ocfs2_kset;
>   
>   #endif  /* STACKGLUE_H */
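
The second point in the quoted description (a global active_stack pointer that can be NULL after a failed mount) can be modeled in userspace. This is only a sketch under stated assumptions: the struct names, the "pcmk" label, and the convention that an empty label means the classic o2cb stack are all hypothetical stand-ins, not the kernel's definitions. It contrasts a check that depends on global state with one that reads per-superblock data only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct stack_plugin { const char *sp_name; };
struct super_model { char cluster_stack[8]; };  /* "" models the classic o2cb stack */

static struct stack_plugin *active_stack;       /* NULL until a mount succeeds */

/* Global-pointer check with an explicit NULL guard. The removed helper
 * dereferenced active_stack unconditionally. */
int is_o2cb_active_checked(void)
{
    if (active_stack == NULL)
        return -1;                              /* unknown: no stack registered */
    return strcmp(active_stack->sp_name, "o2cb") == 0;
}

/* Per-superblock check: needs no global state and no stack lock. */
int uses_userspace_stack(const struct super_model *sb)
{
    return sb->cluster_stack[0] != '\0';
}

void demo_stack_check(void)
{
    struct super_model user_sb = { "pcmk" };    /* hypothetical userspace stack label */
    struct super_model classic_sb = { "" };
    struct stack_plugin o2cb = { "o2cb" };

    assert(is_o2cb_active_checked() == -1);     /* nothing registered yet */
    assert(uses_userspace_stack(&user_sb) == 1);
    assert(uses_userspace_stack(&classic_sb) == 0);

    active_stack = &o2cb;
    assert(is_o2cb_active_checked() == 1);
    active_stack = NULL;                        /* leave the model clean */
}
```

Calling demo_stack_check() from a main() exercises both checks. The per-superblock variant is the shape the patch moves to, since it never touches the global pointer.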




[Ocfs2-devel] [PATCH] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-21 Thread Eric Ren

Another deadlock path caused by recursive locking has been reported.
This kind of issue was introduced by commit 743b5f1434f5 ("ocfs2:
take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
taking inode lock at vfs entry points"). Yes, we intend to fix this
kind of case in an incremental way, because it's hard to find all
possible paths at once.

This one can be reproduced like this. On node1, cp a large file from
the home directory to the ocfs2 mountpoint. Meanwhile, on node2, run
setfacl/getfacl. Both nodes will hang. The backtraces:

On node1:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
[] generic_perform_write+0xa9/0x180
[] __generic_file_write_iter+0x1aa/0x1d0
[] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
[] __vfs_write+0xc3/0x130
[] vfs_write+0xb1/0x1a0
[] SyS_write+0x46/0xa0

On node2:
[] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
[] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
[] ocfs2_set_acl+0x22d/0x260 [ocfs2]
[] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
[] set_posix_acl+0x75/0xb0
[] posix_acl_xattr_set+0x49/0xa0
[] __vfs_setxattr+0x69/0x80
[] __vfs_setxattr_noperm+0x72/0x1a0
[] vfs_setxattr+0xa7/0xb0
[] setxattr+0x12d/0x190
[] path_setxattr+0x9f/0xb0
[] SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which were
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Reported-by: Thomas Voegtle
Tested-by: Thomas Voegtle
Signed-off-by: Eric Ren
---
 fs/ocfs2/dlmglue.c |  4 ++++
 fs/ocfs2/xattr.c   | 23 +++++++++++++----------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 3b7c937..4689940 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
struct ocfs2_lock_res *lockres;
 
lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   /* had_lock means that the current process already took the cluster
+* lock previously. If had_lock is 1, we have nothing to do here, and
+* it will get unlocked where we got the lock.
+*/
if (!had_lock) {
ocfs2_remove_holder(lockres, oh);
ocfs2_inode_unlock(inode, ex);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3c5384d..f70c377 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
   void *buffer,
   size_t buffer_size)
 {
-   int ret;
+   int ret, had_lock;
struct buffer_head *di_bh = NULL;
+   struct ocfs2_lock_holder oh;
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   mlog_errno(ret);
-   return ret;
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0) {
+   mlog_errno(had_lock);
+   return had_lock;
}
down_read(&OCFS2_I(inode)->ip_xattr_sem);
ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
 name, buffer, buffer_size);
up_read(&OCFS2_I(inode)->ip_xattr_sem);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
 
brelse(di_bh);
 
@@ -3537,11 +3538,12 @@ int ocfs2_xattr_set(struct inode *inode,
 {
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
-   int ret, credits, ref_meta = 0, ref_credits = 0;
+   int ret, credits, had_lock, ref_meta = 0, ref_credits = 0;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct inode *tl_inode = osb->osb_tl_inode;
struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, NULL, };
struct ocfs2_refcount_tree *ref_tree = NULL;
+   struct ocfs2_lock_holder oh;
 
struct ocfs2_xattr_info xi = {
.xi_name_index = name_index,
@@ -3572,8 +3574,9 @@ int ocfs2_xattr_set(struct inode *inode,
return -ENOMEM;
}
 
-   ret = ocfs2_inode_lock(inode, &di_bh, 1);
-   if (ret < 0) {
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 1, &oh);
+   if (had_lock < 0) {
+   ret = had_lock;
mlog_errno(ret);
goto cleanup_nolock;
}
@@ -3670,7 +3673,7 @@ int ocfs2_xattr_set(struct inode *inode,
if (ret)
mlog_errno(ret);
}
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
 cleanup_nolock:
brelse(di_bh);
brelse(xbs.xattr_bh);
-- 
2.10.2
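
The had_lock convention used by the patch above can be illustrated with a small userspace model. This is a sketch under assumptions, not the dlmglue implementation: struct lock_model, lock_tracker(), and unlock_tracker() are hypothetical names, task ids stand in for processes, and a counter stands in for the cluster lock. What it shows is the contract the callers rely on: a negative return is an error, 0 means the lock was taken by this call, and 1 means the caller already held it, so the matching unlock must do nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; names and layout are illustrative only. */
#define MAX_HOLDERS 8

struct lock_model {
    int owners[MAX_HOLDERS];   /* task ids currently registered as holders */
    int nr_owners;
    int lock_count;            /* how many times the underlying lock was taken */
};

static int is_locked_by(const struct lock_model *l, int task)
{
    for (int i = 0; i < l->nr_owners; i++)
        if (l->owners[i] == task)
            return 1;
    return 0;
}

/* Returns 1 if the task already held the lock, 0 if it was taken here,
 * and a negative value on error. */
int lock_tracker(struct lock_model *l, int task)
{
    if (is_locked_by(l, task))
        return 1;                       /* nested call: do not lock again */
    if (l->nr_owners == MAX_HOLDERS)
        return -1;
    l->owners[l->nr_owners++] = task;
    l->lock_count++;
    return 0;
}

/* Only the call that actually took the lock (had_lock == 0) releases it. */
void unlock_tracker(struct lock_model *l, int task, int had_lock)
{
    if (had_lock)
        return;
    for (int i = 0; i < l->nr_owners; i++) {
        if (l->owners[i] == task) {
            l->owners[i] = l->owners[--l->nr_owners];
            break;
        }
    }
    l->lock_count--;
}

/* Walks the nested-call scenario: an outer entry point, then an inner helper. */
void demo_nested(void)
{
    struct lock_model l = { {0}, 0, 0 };
    int outer = lock_tracker(&l, 42);   /* e.g. the VFS entry point */
    int inner = lock_tracker(&l, 42);   /* e.g. a nested xattr helper */

    assert(outer == 0 && inner == 1 && l.lock_count == 1);
    unlock_tracker(&l, 42, inner);      /* no-op: the outer caller still holds it */
    assert(l.lock_count == 1);
    unlock_tracker(&l, 42, outer);      /* the real unlock */
    assert(l.lock_count == 0 && l.nr_owners == 0);
}
```

In the model, the inner call never takes the lock a second time, which is the property that removes the self-deadlock window on the setfacl path.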

Re: [Ocfs2-devel] deadlock with setfacl

2017-06-21 Thread Eric Ren


Hi Thomas,

I'm attaching a patch for the issue you reported. I've tested it myself.
Could you please also try it out?

If it's OK, I'll submit a formal patch later.

Thanks,
Eric

On 06/20/2017 04:38 PM, Eric Ren wrote:

Hi!

Thanks for reporting! I will get to this issue quickly.

Eric

Sent from my iPhone


On 20 Jun 2017, at 16:02, Thomas Voegtle  wrote:


Hello,


We see a deadlock with setfacl on 4.4.70 and on 4.12-rc5, too.

node1: copies a big file from /home/user to the ocfs2 mountpoint
node2: runs setfacl on that file in the ocfs2 mountpoint while cp still running
=> both jobs never end.


When we revert
743b5f1434f57a147226c747fe228cadeb7b05ed ocfs2: take inode lock in
ocfs2_iop_set/get_acl()
and the other two follow-up fixes (5ee0fbd50fdf1c132 and b891fa5024a95c77)
we see no deadlock anymore.

commit b891fa5024a95c77 fixed it for getacl (we can confirm this) but not
for setacl, as we encounter?

Reference:
https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html

Thanks,

Thomas



This gets printed in the dmesg on node1:

[  484.345226] INFO: task cp:10633 blocked for more than 120 seconds.
[  484.345230]   Not tainted 4.12.0-rc5 #1
[  484.345230] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.345232] cp  D0 10633   5594 0x
[  484.345235] Call Trace:
[  484.345295]  __schedule+0x2e8/0x5f7
[  484.345298]  schedule+0x35/0x80
[  484.345300]  schedule_timeout+0x1a7/0x230
[  484.345326]  ? check_preempt_curr+0x61/0x90
[  484.345358]  ? ocfs2_control_read+0x60/0x60 [ocfs2_stack_user]
[  484.345360]  wait_for_completion+0x9b/0x100
[  484.345361]  ? try_to_wake_up+0x250/0x250
[  484.345447]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
[  484.345463]  ? radix_tree_tag_set+0x7e/0xf0
[  484.345475]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
[  484.345486]  ? ocfs2_wake_downconvert_thread+0x4d/0x60 [ocfs2]
[  484.345497]  ocfs2_write_begin+0x4a/0x190 [ocfs2]
[  484.345509]  generic_perform_write+0xa7/0x190
[  484.345516]  __generic_file_write_iter+0x191/0x1e0
[  484.345528]  ocfs2_file_write_iter+0x1a5/0x490 [ocfs2]
[  484.345541]  ? ext4_file_read_iter+0xae/0xf0
[  484.345550]  new_sync_write+0xc0/0x100
[  484.345552]  __vfs_write+0x27/0x40
[  484.345553]  vfs_write+0xc4/0x1b0
[  484.34]  SyS_write+0x4a/0xa0
[  484.345561]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  484.345563] RIP: 0033:0x7fb5111a8150
[  484.345564] RSP: 002b:7fff125140d8 EFLAGS: 0246 ORIG_RAX: 
0001
[  484.345566] RAX: ffda RBX: 0001 RCX: 7fb5111a8150
[  484.345567] RDX: 0002 RSI: 7fb511c3f000 RDI: 0004
[  484.345678] RBP: 7fff125141d0 R08:  R09: 7fff12515c82
[  484.345680] R10: 7fff12513e70 R11: 0246 R12: 004030b0
[  484.345681] R13: 7fff12514ca0 R14:  R15: 


This gets printed in the dmesg on node2:

[  484.483726] INFO: task setfacl:10279 blocked for more than 120 seconds.
[  484.483729]   Not tainted 4.12.0-rc5 #1
[  484.483730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.483731] setfacl D0 10279  10278 0x
[  484.483734] Call Trace:
[  484.483793]  __schedule+0x2e8/0x5f7
[  484.483797]  schedule+0x35/0x80
[  484.483799]  schedule_timeout+0x1a7/0x230
[  484.483825]  ? default_wake_function+0xd/0x10
[  484.483832]  ? autoremove_wake_function+0x11/0x40
[  484.483834]  ? __wake_up_common+0x4f/0x80
[  484.483835]  wait_for_completion+0x9b/0x100
[  484.483837]  ? try_to_wake_up+0x250/0x250
[  484.483973]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
[  484.483993]  ? radix_tree_lookup_slot+0x13/0x30
[  484.484005]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
[  484.484018]  ocfs2_xattr_set+0x143/0x740 [ocfs2]
[  484.484035]  ? jbd2_journal_cancel_revoke+0xbf/0xf0
[  484.484049]  ocfs2_set_acl+0x177/0x190 [ocfs2]
[  484.484061]  ? ocfs2_inode_lock_tracker+0xee/0x180 [ocfs2]
[  484.484074]  ocfs2_iop_set_acl+0x60/0xa0 [ocfs2]
[  484.484084]  set_posix_acl+0x84/0xc0
[  484.484090]  posix_acl_xattr_set+0x4c/0xb0
[  484.484099]  __vfs_setxattr+0x71/0x90
[  484.484102]  __vfs_setxattr_noperm+0x70/0x1b0
[  484.484104]  vfs_setxattr+0xae/0xb0
[  484.484106]  setxattr+0x160/0x190
[  484.484112]  ? strncpy_from_user+0x43/0x140
[  484.484118]  ? getname_flags.part.41+0x56/0x1c0
[  484.484121]  ? __mnt_want_write+0x4d/0x60
[  484.484123]  path_setxattr+0x85/0xb0
[  484.484125]  SyS_setxattr+0xf/0x20
[  484.484131]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  484.484133] RIP: 0033:0x7f203f2b23f9
[  484.484134] RSP: 002b:7ffd7d8585d8 EFLAGS: 0246 ORIG_RAX: 
00bc
[  484.484136] RAX: ffda RBX: 01d9d3a0 RCX: 7f203f2b23f9
[  484.484137] RDX: 01d9d5a0 RSI: 7f203f782b5f RDI: 7ffd7d858890
[  484.484137] RBP:  R08:

Re: [Ocfs2-devel] deadlock with setfacl

2017-06-20 Thread Eric Ren

Hi!

Thanks for reporting! I will get to this issue quickly.

Eric

Sent from my iPhone

> On 20 Jun 2017, at 16:02, Thomas Voegtle  wrote:
> 
> 
> Hello,
> 
> 
> We see a deadlock with setfacl on 4.4.70 and on 4.12-rc5, too.
> 
> node1: copies a big file from /home/user to the ocfs2 mountpoint
> node2: runs setfacl on that file in the ocfs2 mountpoint while cp still 
> running
> => both jobs never end.
> 
> 
> When we revert
> 743b5f1434f57a147226c747fe228cadeb7b05ed ocfs2: take inode lock in
> ocfs2_iop_set/get_acl()
> and the other two follow-up fixes (5ee0fbd50fdf1c132 and b891fa5024a95c77)
> we see no deadlock anymore.
> 
> commit b891fa5024a95c77 fixed it for getacl (we can confirm this) but not
> for setacl, as we encounter?
> 
> Reference:
> https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html
> 
> Thanks,
> 
> Thomas
> 
> 
> 
> This gets printed in the dmesg on node1:
> 
> [  484.345226] INFO: task cp:10633 blocked for more than 120 seconds.
> [  484.345230]   Not tainted 4.12.0-rc5 #1
> [  484.345230] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  484.345232] cp  D0 10633   5594 0x
> [  484.345235] Call Trace:
> [  484.345295]  __schedule+0x2e8/0x5f7
> [  484.345298]  schedule+0x35/0x80
> [  484.345300]  schedule_timeout+0x1a7/0x230
> [  484.345326]  ? check_preempt_curr+0x61/0x90
> [  484.345358]  ? ocfs2_control_read+0x60/0x60 [ocfs2_stack_user]
> [  484.345360]  wait_for_completion+0x9b/0x100
> [  484.345361]  ? try_to_wake_up+0x250/0x250
> [  484.345447]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
> [  484.345463]  ? radix_tree_tag_set+0x7e/0xf0
> [  484.345475]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
> [  484.345486]  ? ocfs2_wake_downconvert_thread+0x4d/0x60 [ocfs2]
> [  484.345497]  ocfs2_write_begin+0x4a/0x190 [ocfs2]
> [  484.345509]  generic_perform_write+0xa7/0x190
> [  484.345516]  __generic_file_write_iter+0x191/0x1e0
> [  484.345528]  ocfs2_file_write_iter+0x1a5/0x490 [ocfs2]
> [  484.345541]  ? ext4_file_read_iter+0xae/0xf0
> [  484.345550]  new_sync_write+0xc0/0x100
> [  484.345552]  __vfs_write+0x27/0x40
> [  484.345553]  vfs_write+0xc4/0x1b0
> [  484.34]  SyS_write+0x4a/0xa0
> [  484.345561]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [  484.345563] RIP: 0033:0x7fb5111a8150
> [  484.345564] RSP: 002b:7fff125140d8 EFLAGS: 0246 ORIG_RAX: 
> 0001
> [  484.345566] RAX: ffda RBX: 0001 RCX: 
> 7fb5111a8150
> [  484.345567] RDX: 0002 RSI: 7fb511c3f000 RDI: 
> 0004
> [  484.345678] RBP: 7fff125141d0 R08:  R09: 
> 7fff12515c82
> [  484.345680] R10: 7fff12513e70 R11: 0246 R12: 
> 004030b0
> [  484.345681] R13: 7fff12514ca0 R14:  R15: 
> 
> 
> 
> This gets printed in the dmesg on node2:
> 
> [  484.483726] INFO: task setfacl:10279 blocked for more than 120 seconds.
> [  484.483729]   Not tainted 4.12.0-rc5 #1
> [  484.483730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  484.483731] setfacl D0 10279  10278 0x
> [  484.483734] Call Trace:
> [  484.483793]  __schedule+0x2e8/0x5f7
> [  484.483797]  schedule+0x35/0x80
> [  484.483799]  schedule_timeout+0x1a7/0x230
> [  484.483825]  ? default_wake_function+0xd/0x10
> [  484.483832]  ? autoremove_wake_function+0x11/0x40
> [  484.483834]  ? __wake_up_common+0x4f/0x80
> [  484.483835]  wait_for_completion+0x9b/0x100
> [  484.483837]  ? try_to_wake_up+0x250/0x250
> [  484.483973]  __ocfs2_cluster_lock.isra.42+0x29b/0x740 [ocfs2]
> [  484.483993]  ? radix_tree_lookup_slot+0x13/0x30
> [  484.484005]  ocfs2_inode_lock_full_nested+0x1d2/0x3a0 [ocfs2]
> [  484.484018]  ocfs2_xattr_set+0x143/0x740 [ocfs2]
> [  484.484035]  ? jbd2_journal_cancel_revoke+0xbf/0xf0
> [  484.484049]  ocfs2_set_acl+0x177/0x190 [ocfs2]
> [  484.484061]  ? ocfs2_inode_lock_tracker+0xee/0x180 [ocfs2]
> [  484.484074]  ocfs2_iop_set_acl+0x60/0xa0 [ocfs2]
> [  484.484084]  set_posix_acl+0x84/0xc0
> [  484.484090]  posix_acl_xattr_set+0x4c/0xb0
> [  484.484099]  __vfs_setxattr+0x71/0x90
> [  484.484102]  __vfs_setxattr_noperm+0x70/0x1b0
> [  484.484104]  vfs_setxattr+0xae/0xb0
> [  484.484106]  setxattr+0x160/0x190
> [  484.484112]  ? strncpy_from_user+0x43/0x140
> [  484.484118]  ? getname_flags.part.41+0x56/0x1c0
> [  484.484121]  ? __mnt_want_write+0x4d/0x60
> [  484.484123]  path_setxattr+0x85/0xb0
> [  484.484125]  SyS_setxattr+0xf/0x20
> [  484.484131]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [  484.484133] RIP: 0033:0x7f203f2b23f9
> [  484.484134] RSP: 002b:7ffd7d8585d8 EFLAGS: 0246 ORIG_RAX: 
> 00bc
> [  484.484136] RAX: ffda RBX: 01d9d3a0 RCX: 
> 7f203f2b23f9
> [  484.484137] RDX: 01d9d5a0 RSI: 7f203f782b5f RDI: 
> 7ffd7d858890
> [  484.484137] RBP:  R08:  R09: 
>

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io. review

2017-06-02 Thread Eric Ren

Hi Guanghui,

Please organize your mail more clearly. It looks really messy! Rework your
mail by asking yourself questions like:

- What is the problem you are facing?
It looks like a BUG_ON() is triggered, but which BUG_ON()? Where is the
backtrace? How can this be reproduced?

- What help or answer do you hope for?
You didn't ask any questions below!


On 05/26/2017 11:46 AM, Zhangguanghui wrote:
> This patch replace that function ocfs2_direct_IO_get_blocks with
Which patch? Don't analyze the code before describing the problem.

>
> this function ocfs2_get_blocks  in ocfs2_direct_IO, and remove the  
> ip_alloc_sem.
>
> but i think ip_alloc_sem is still needed because protect  allocation changes 
> is very correct.
"still needed" - so, which commit dropped it?

>
> Now, BUG_ON have been tiggered  in the process of testing direct-io.
>
> Comments and questions are, as always, welcome. Thanks

Comments on what?

>
>
> As wangww631 described
A link to the mail thread would help people follow the discussion and background.

>
> In ocfs2, ip_alloc_sem is used to protect allocation changes on the node.
> In direct IO, we add ip_alloc_sem to protect date consistent between
> direct-io and ocfs2_truncate_file race (buffer io use ip_alloc_sem
> already).  Although inode->i_mutex lock is used to avoid concurrency of
> above situation, i think ip_alloc_sem is still needed because protect
> allocation changes is significant.
>
> Other filesystem like ext4 also uses rw_semaphore to protect data
> consistent between get_block-vs-truncate race by other means, So
> ip_alloc_sem in ocfs2 direct io is needed.
>
>
> Date: Fri, 11 Sep 2015 16:19:18 +0800
> From: Ryan Ding 
> Subject: [Ocfs2-devel] [PATCH 7/8] ocfs2: fix sparse file & data
>  orderingissue in direct io.
Your email subject is almost the same as this patch's, which is confusing...

> To: ocfs2-devel@oss.oracle.com
> Cc: mfas...@suse.de
> Message-ID: <1441959559-29947-8-git-send-email-ryan.d...@oracle.com>
Don't copy and paste the patch content; it only makes your mail so long that
it scares readers away.

Eric

>
> There are mainly 3 issue in the direct io code path after commit 24c40b329e03 
> ("ocfs2: implement ocfs2_direct_IO_write"):
>* Do not support sparse file.
>* Do not support data ordering. eg: when write to a file hole, it will 
> alloc
>  extent first. If system crashed before io finished, data will corrupt.
>* Potential risk when doing aio+dio. The -EIOCBQUEUED return value is 
> likely
>  to be ignored by ocfs2_direct_IO_write().
>
> To resolve above problems, re-design direct io code with following ideas:
>* Use buffer io to fill in holes. And this will make better performance 
> also.
>* Clear unwritten after direct write finished. So we can make sure meta 
> data
>  changes after data write to disk. (Unwritten extent is invisible to user,
>  from user's view, meta data is not changed when allocate an unwritten
>  extent.)
>* Clear ocfs2_direct_IO_write(). Do all ending work in end_io.
>
> This patch has passed 
> fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4
> test cases of ltp.
>
> Signed-off-by: Ryan Ding 
> Reviewed-by: Junxiao Bi 
> cc: Joseph Qi 
> ---
>   fs/ocfs2/aops.c |  851 
> ++-
>   1 files changed, 342 insertions(+), 509 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index b4ec600..4bb9921 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -499,152 +499,6 @@ bail:
>  return status;
>   }
>
> -/*
> - * TODO: Make this into a generic get_blocks function.
> - *
> - * From do_direct_io in direct-io.c:
> - *  "So what we do is to permit the ->get_blocks function to populate
> - *   bh.b_size with the size of IO which is permitted at this offset and
> - *   this i_blkbits."
> - *
> - * This function is called directly from get_more_blocks in direct-io.c.
> - *
> - * called like this: dio->get_blocks(dio->inode, fs_startblk,
> - * fs_count, map_bh, dio->rw == WRITE);
> - */
> -static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock,
> -struct buffer_head *bh_result, int 
> create)
> -{
> -   int ret;
> -   u32 cpos = 0;
> -   int alloc_locked = 0;
> -   u64 p_blkno, inode_blocks, contig_blocks;
> -   unsigned int ext_flags;
> -   unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
> -   unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
> -   unsigned long len = bh_result->b_size;
> -   unsigned int clusters_to_alloc = 0, contig_clusters = 0;
> -
> -   cpos = ocfs2_blocks_to_clusters(inode->i_sb, iblock);
> -
> -   /* This function won't even be called if the request isn't all
> -* nicely aligned and of the right size, so there's no need
> -* for us to check any of that. */
> -
> -   inode_blocks = ocfs2_blocks_for_bytes(inode->i_sb,

Re: [Ocfs2-devel] DLM Questions

2017-05-24 Thread Eric Ren

Hi,
On 05/25/2017 12:26 AM, Jim Wayda (Stellus) wrote:
> I am interested in using a DLM for my user mode application and have been 
> examining the ocfs2 DLM. I have been experimenting with the o2dlm_test code 
> and have noticed an issue with the trylock command. When I start the o2dlm 
> code with no options, it is operating in classic mode and is using /dlm.
>
>
> 1.   On node 1 and node 2  I issue the command "register test".
>
>
> 2.   On node 1 I issue the command "lock ex mylock" and the operation 
> succeeds.
>
>
> 3.   On node 2 I issue the command "trylock ex mylock" and the operation 
> fails as expected.
>
>
> 4.   On node 1, I then unlock by issuing the command "unlock mylock" and 
> the operation succeeds.
>
>
> 5.   On node 2 I issue the command "trylock ex mylock" and the operation 
> fails. This seems to be an error. The operation should succeed.
>
> Note that if I try the same operations shown above with o2dlm_test invoked 
> with the -u option which causes it to use fsdlm, I don't see the error in 
> step 5 above. Everything works as expected.
>
> It is also my understanding from examining the code that when using /dlm 
> (this mode is referred to as "classic" in the source code), blocking ASTs are 
> supported, but when using fsdlm blocking operations are not supported. Is 
> this a correct understanding?
>
> In the tests that I am running I am using the DLM with a two node ocfs2 
> cluster, but in my final design I want to remove the DLM code from the file 
> system and use it in another application. Is this feasible?
If you're aiming to use DLM in a userland application, I recommend:
1) dlm project
https://pagure.io/dlm

2) dlm book
http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf

3) an example you can refer to
clvm:
https://sourceware.org/git/?p=lvm2.git;a=tree;f=daemons/clvmd;h=3daee0b265ef40789ee00264dbc2afde0389f0a7;hb=HEAD

Regards,
Eric
>
> Who is currently maintaining the DLM code?
>
> Thanks,
> -jim
>
>
>

Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a static checker warning

2017-05-23 Thread Eric Ren

On 05/23/2017 01:17 PM, Gang He wrote:
> This patch will fix a static code checker warning, which looks
> like below,
> fs/ocfs2/inode.c:179 ocfs2_iget()
> warn: passing zero to 'ERR_PTR'
>
> this warning was caused by the
> commit d56a8f32e4c6 ("ocfs2: check/fix inode block for online file check").
> after apply this patch, the error return value will not be NULL(zero).
>
> Signed-off-by: Gang He 
Looks good to me.

Reviewed-by: Eric Ren 
> ---
>   fs/ocfs2/inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 382401d..1a1e007 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -136,7 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 
> blkno)
>   struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>int sysfile_type)
>   {
> - int rc = 0;
> + int rc = -ESTALE;
>   struct inode *inode = NULL;
>   struct super_block *sb = osb->sb;
>   struct ocfs2_find_inode_args args;
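
Why the checker complains about passing zero to ERR_PTR can be seen in a userspace re-creation of the error-pointer idiom. This is a simplified sketch: ERR_PTR(), PTR_ERR(), and IS_ERR() below are local re-implementations written for illustration, and lookup() is a hypothetical function, not ocfs2_iget(). It shows that a failure path returning ERR_PTR(rc) while rc is still 0 hands the caller a plain NULL, which IS_ERR() does not treat as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer idiom (simplified). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup that fails without ever assigning rc. */
void *lookup(int initial_rc)
{
    int rc = initial_rc;
    void *obj = NULL;           /* models the object not being found */

    if (obj == NULL)
        return ERR_PTR(rc);     /* with rc == 0 this is just NULL */
    return obj;
}

void demo_err_ptr(void)
{
    /* rc initialized to 0: the caller gets NULL, and IS_ERR() misses it. */
    assert(lookup(0) == NULL);
    assert(!IS_ERR(lookup(0)));

    /* rc initialized to -ESTALE: the failure is visible to IS_ERR(). */
    assert(IS_ERR(lookup(-ESTALE)));
    assert(PTR_ERR(lookup(-ESTALE)) == -ESTALE);
}
```

A caller that only tests IS_ERR() would go on to dereference the NULL in the first case, which is why a non-zero default for rc is the safer choice.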




[Ocfs2-devel] Fwd: OCFS2 test report against kernel 4.11.0-rc8-2.g540c429-vanilla

2017-05-03 Thread Eric Ren


FYI,


The test results are good. Refer to the attached test logs for more
information.


run-dev-test
*BUILD SUCCESS*
Build URL:         http://147.2.207.231:8080/job/zren-testing/job/run-dev-test/81/
Work Space:        http://147.2.207.231:8080/job/zren-testing/job/run-dev-test//ws/81
Build Log:         http://147.2.207.231:8080/job/zren-testing/job/run-dev-test/81//console
Project:           run-dev-test
Date of build:     Wed, 03 May 2017 09:38:08 +0800
Build duration:    17 hr
Build cause:       Started by user eric
Build description:
Built on:          HA-236


 Health Report

Description                                               Score
Test Result: 0 tests failing out of a total of 38 tests.    100
Build stability: No recent builds failed.                   100


 Tests Reports

   Test Trend

[Test result trend chart]

   JUnit Tests

Package                Failed  Passed  Skipped  Total
DiscontigBgMultiNode        0       4        0      4
DiscontigBgSingleNode       0       5        0      5
MultipleNodes               0       9        1     10
SingleNode                  0      18        1     19


 Changes



No Changes




single_run.log
Description: Binary data


multiple-run-x86_64-2017-05-03-12-47-17.log
Description: Binary data

Re: [Ocfs2-devel] [PATCH 02/17] Single Run: kernel building is little broken now

2017-03-13 Thread Eric Ren

Hi Junxiao,

On 03/13/2017 04:12 PM, Junxiao Bi wrote:
> On 12/13/2016 01:29 PM, Eric Ren wrote:
>> Only check kernel source if we specify "buildkernel" test case.
>> The original kernel source web-link cannot be reached,
>> so give a new link instead but the md5sum check is missing
>> now.
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   programs/python_common/single_run-WIP.sh | 56 
>> 
>>   1 file changed, 28 insertions(+), 28 deletions(-)
>>
>> diff --git a/programs/python_common/single_run-WIP.sh 
>> b/programs/python_common/single_run-WIP.sh
>> index fe0056c..61008d8 100755
>> --- a/programs/python_common/single_run-WIP.sh
>> +++ b/programs/python_common/single_run-WIP.sh
>> @@ -20,9 +20,9 @@ WGET=`which wget`
>>   WHOAMI=`which whoami`
>>   SED=`which sed`
>>   
>> -DWNLD_PATH="http://oss.oracle.com/~smushran/ocfs2-test";
>> -KERNEL_TARBALL="linux-kernel.tar.gz"
>> -KERNEL_TARBALL_CHECK="${KERNEL_TARBALL}.md5sum"
>> +DWNLD_PATH="https://cdn.kernel.org/pub/linux/kernel/v3.x/";
>> +KERNEL_TARBALL="linux-3.2.80.tar.xz"
>> +#KERNEL_TARBALL_CHECK="${KERNEL_TARBALL}.md5sum"
> Can we compute the md5sum manually and put it here?

OK.

Thanks for your review.

Thanks,
Eric
>
> Thanks,
> Junxiao.
>
>>   USERID=`${WHOAMI}`
>>   
>>   DEBUGFS_BIN="${SUDO} `which debugfs.ocfs2`"
>> @@ -85,7 +85,7 @@ get_bits()
>>   # get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
>> $KERNEL_TARBALL_CHECK
>>   get_kernel_source()
>>   {
>> -if [ "$#" -lt "4" ]; then
>> +if [ "$#" -lt "3" ]; then
>>  ${ECHO} "Error in get_kernel_source()"
>>  exit 1
>>  fi
>> @@ -93,18 +93,18 @@ get_kernel_source()
>>  logdir=$1
>>  dwnld_path=$2
>>  kernel_tarball=$3
>> -kernel_tarball_check=$4
>> +#kernel_tarball_check=$4
>>   
>>  cd ${logdir}
>>   
>>  outlog=get_kernel_source.log
>>   
>> -${WGET} -o ${outlog} ${dwnld_path}/${kernel_tarball_check}
>> -if [ $? -ne 0 ]; then
>> -${ECHO} "ERROR downloading 
>> ${dwnld_path}/${kernel_tarball_check}"
>> -cd -
>> -exit 1
>> -fi
>> +#   ${WGET} -o ${outlog} ${dwnld_path}/${kernel_tarball_check}
>> +#   if [ $? -ne 0 ]; then
>> +#   ${ECHO} "ERROR downloading 
>> ${dwnld_path}/${kernel_tarball_check}"
>> +#   cd -
>> +#   exit 1
>> +#   fi
>>   
>>  ${WGET} -a ${outlog} ${dwnld_path}/${kernel_tarball}
>>  if [ $? -ne 0 ]; then
>> @@ -113,13 +113,13 @@ get_kernel_source()
>>  exit 1
>>  fi
>>   
>> -${MD5SUM} -c ${kernel_tarball_check} >>${outlog} 2>&1
>> -if [ $? -ne 0 ]; then
>> -${ECHO} "ERROR ${kernel_tarball_check} check failed"
>> -cd -
>> -exit 1
>> -fi
>> -cd -
>> +#   ${MD5SUM} -c ${kernel_tarball_check} >>${outlog} 2>&1
>> +#   if [ $? -ne 0 ]; then
>> +#   ${ECHO} "ERROR ${kernel_tarball_check} check failed"
>> +#   cd -
>> +#   exit 1
>> +#   fi
>> +#   cd -
>>   }
>>   
>>   # do_format() ${BLOCKSIZE} ${CLUSTERSIZE} ${FEATURES} ${DEVICE}
>> @@ -1012,16 +1012,6 @@ LOGFILE=${LOGDIR}/single_run.log
>>   
>>   do_mkdir ${LOGDIR}
>>   
>> -if [ -z ${KERNELSRC} ]; then
>> -get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
>> $KERNEL_TARBALL_CHECK
>> -KERNELSRC=${LOGDIR}/${KERNEL_TARBALL}
>> -fi
>> -
>> -if [ ! -f ${KERNELSRC} ]; then
>> -${ECHO} "No kernel source"
>> -usage
>> -fi
>> -
>>   STARTRUN=$(date +%s)
>>   log_message "*** Start Single Node test ***"
>>   
>> @@ -1058,6 +1048,16 @@ for tc in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; 
>> do
>>  fi
>>   
>>  if [ "$tc"X = "buildkernel"X -o "$tc"X = "all"X ];then
>> +if [ -z ${KERNELSRC} ]; then
>> +get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
>> #$KERNEL_TARBALL_CHECK
>> +KERNELSRC=${LOGDIR}/${KERNEL_TARBALL}
>> +fi
>> +
>> +if [ ! -f ${KERNELSRC} ]; then
>> +${ECHO} "No kernel source"
>> +usage
>> +fi
>> +
>>  run_buildkernel ${LOGDIR} ${DEVICE} ${MOUNTPOINT} ${KERNELSRC}
>>  fi
>>   
>>
>



Re: [Ocfs2-devel] [PATCH v3] ocfs2/dlm: Optimization of code while free dead node locks.

2017-01-18 Thread Eric Ren

Hi,

On 01/17/2017 07:22 PM, Guozhonghua wrote:
> Three loops can be optimized into one and its sub loops, so as small code can 
> do the same work.  ===> (1)
>
>  From 8a1e682503f4e5a5299fe8316cbf559f9b9701f1 Mon Sep 17 00:00:00 2001
> From: Guozhonghua 
> Date: Fri, 13 Jan 2017 11:27:32 +0800
> Subject: [PATCH] Optimization of code while free dead locks, changed for
>   reviews.
>   
>===> (2)
>
> Signed-off-by: Guozhonghua 
The patch looks good to me, except for some formatting issues:
1. The commit message at (1) should be placed at (2);
2. The changelog is still missing.

I think it's not a big deal, though. The fix is quite simple. I hope your
patch is better formatted next time ;-)

Reviewed-by: Eric Ren 

Eric

> ---
>   fs/ocfs2/dlm/dlmrecovery.c |   39 ++++++++++++++-------------------------
>   1 file changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index dd5cb8b..93b71dd 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -2268,6 +2268,8 @@ static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
>   {
>  struct dlm_lock *lock, *next;
>  unsigned int freed = 0;
> +   struct list_head *queue = NULL;
> +   int i;
>
>  /* this node is the lockres master:
>   * 1) remove any stale locks for the dead node
> @@ -2280,31 +2282,18 @@ static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
>   * to force the DLM_UNLOCK_FREE_LOCK action so as to free the locks 
> */
>
>  /* TODO: check pending_asts, pending_basts here */
> -   list_for_each_entry_safe(lock, next, &res->granted, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->converting, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->blocked, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> +   for (i = DLM_GRANTED_LIST; i <= DLM_BLOCKED_LIST; i++) {
> +   queue = dlm_list_idx_to_ptr(res, i);
> +   list_for_each_entry_safe(lock, next, queue, list) {
> +   if (lock->ml.node == dead_node) {
> +   list_del_init(&lock->list);
> +   dlm_lock_put(lock);
> +   /* Can't schedule DLM_UNLOCK_FREE_LOCK
> +* do manually
> +*/
> +   dlm_lock_put(lock);
> +   freed++;
> +   }
>  }
>  }
>
> --
> 1.7.9.5
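
The consolidation in the quoted patch (one outer loop over queue indexes instead of three copies of the same inner loop) can be sketched in userspace. The types and names here are hypothetical: singly linked lists stand in for the kernel's list_head queues, list_idx_to_ptr() only models the role of dlm_list_idx_to_ptr(), and no reference counting is done.

```c
#include <assert.h>
#include <stddef.h>

enum { GRANTED_LIST = 0, CONVERTING_LIST, BLOCKED_LIST };

struct lock_node {
    int node;                   /* id of the cluster node owning this lock */
    struct lock_node *next;
};

struct resource {
    struct lock_node *granted;
    struct lock_node *converting;
    struct lock_node *blocked;
};

/* Maps a queue index to the matching list head. */
struct lock_node **list_idx_to_ptr(struct resource *res, int idx)
{
    switch (idx) {
    case GRANTED_LIST:    return &res->granted;
    case CONVERTING_LIST: return &res->converting;
    case BLOCKED_LIST:    return &res->blocked;
    default:              return NULL;
    }
}

/* Unlinks every lock owned by dead_node from all three queues and
 * returns how many were removed. */
unsigned int free_dead_locks(struct resource *res, int dead_node)
{
    unsigned int freed = 0;

    for (int i = GRANTED_LIST; i <= BLOCKED_LIST; i++) {
        struct lock_node **pp = list_idx_to_ptr(res, i);

        while (*pp != NULL) {
            if ((*pp)->node == dead_node) {
                *pp = (*pp)->next;      /* unlink; the caller owns the storage */
                freed++;
            } else {
                pp = &(*pp)->next;
            }
        }
    }
    return freed;
}

void demo_free_dead_locks(void)
{
    struct lock_node g2 = { 2, NULL }, g1 = { 1, &g2 };
    struct lock_node c1 = { 1, NULL };
    struct lock_node b3 = { 3, NULL };
    struct resource res = { &g1, &c1, &b3 };

    /* Node 1 died: one granted and one converting lock go away. */
    assert(free_dead_locks(&res, 1) == 2);
    assert(res.granted == &g2 && res.converting == NULL && res.blocked == &b3);
}
```

The per-lock work appears once, so a later change to it cannot be applied to one queue and forgotten on the other two.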




[Ocfs2-devel] [PATCH v4 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-17 Thread Eric Ren

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking the cluster inode lock at vfs entry points opens up
a self-deadlock window is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
    may_open
     inode_permission
      ocfs2_permission
       ocfs2_inode_lock() <=== take PR
        generic_permission
         get_acl
          ocfs2_iop_get_acl
           ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== take PR
        ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (introduced in the previous patch)
to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl() and
ocfs2_setattr().
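
The calling convention that the callers in this patch rely on can be modelled
in user space. This is a minimal sketch, not the kernel code: struct lockres,
lock_tracker() and unlock_tracker() are hypothetical stand-ins for
ocfs2_lock_res, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
and the owner is a plain integer rather than a struct pid. It assumes, based
only on how the callers use the return value, that a negative value is an
error, 0 means the lock was newly taken, and a positive value means the caller
already held it.

```c
#include <stddef.h>

#define MAX_HOLDERS 8

/* Hypothetical user-space model of a cluster lock resource. */
struct lockres {
	int holders[MAX_HOLDERS];	/* owner ids currently holding the lock */
	size_t nr_holders;
	int cluster_locks;		/* how many times the real lock was taken */
};

static int is_locked_by(const struct lockres *res, int owner)
{
	for (size_t i = 0; i < res->nr_holders; i++)
		if (res->holders[i] == owner)
			return 1;
	return 0;
}

/* Returns <0 on error, 0 if the lock was newly taken, 1 if already held. */
int lock_tracker(struct lockres *res, int owner)
{
	if (is_locked_by(res, owner))
		return 1;		/* recursive entry: do not lock again */
	if (res->nr_holders == MAX_HOLDERS)
		return -1;
	res->cluster_locks++;		/* stands in for taking the cluster lock */
	res->holders[res->nr_holders++] = owner;
	return 0;
}

/* Only the call that actually took the lock (had_lock == 0) releases it. */
void unlock_tracker(struct lockres *res, int owner, int had_lock)
{
	if (had_lock)
		return;
	for (size_t i = 0; i < res->nr_holders; i++) {
		if (res->holders[i] == owner) {
			/* Swap the last entry into the freed slot. */
			res->holders[i] = res->holders[--res->nr_holders];
			res->cluster_locks--;
			return;
		}
	}
}
```

With this shape, a nested entry point passes its own had_lock value to the
unlock side, so only the outermost caller drops the lock.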

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 
Reviewed-by: Junxiao Bi 
Reviewed-by: Joseph Qi 
---
 fs/ocfs2/acl.c  | 29 +
 fs/ocfs2/file.c | 58 -
 2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
brelse(bh);
return status;
 }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
brelse(di_bh);
return acl;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* lock

[Ocfs2-devel] [PATCH v4 0/2] fix deadlock caused by recursive cluster locking

2017-01-17 Thread Eric Ren

Hi Andrew,

This version of the patch set has been reviewed by Joseph Qi and Junxiao Bi.
I think it is ready to be queued up now.

Thanks to all of you!
Eric

This is a formal patch set v2 to solve the deadlock issue on which I
previously started a RFC (draft patch), and the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor has a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .
 
- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi. 
 
- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi. 
 
- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2: 
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Changes since v3:
- Fix redundant space, spotted by: Joseph Qi.

Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c |  29 +++
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/file.c|  58 ++---
 fs/ocfs2/ocfs2.h   |   1 +
 5 files changed, 179 insertions(+), 32 deletions(-)

-- 
2.10.2



[Ocfs2-devel] [PATCH v4 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-17 Thread Eric Ren

We have to avoid recursive cluster locking, but there is currently no way
to check whether a process has already taken a cluster lock.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
    .permission     = ocfs2_permission,
    .get_acl        = ocfs2_iop_get_acl,
    .set_acl        = ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
    ocfs2_inode_lock() <=== first time
     generic_permission
      get_acl
       ocfs2_iop_get_acl
        ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when the downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
Because there is no chance for the first cluster lock on this node to be
unlocked: we block ourselves in the code path.
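
The sequence above can be replayed with a toy state model. This is only a
sketch for illustration, not kernel code: struct lockres_model and every
function name below are hypothetical, and the model assumes (as the
description above implies) that the downconvert can only finish once all
local holds have been dropped.

```c
#include <stdbool.h>

/* Toy model of the state involved; field names are hypothetical. */
struct lockres_model {
	int pr_holders;	/* local PR holds not yet released */
	bool blocked;	/* stands in for OCFS2_LOCK_BLOCKED */
};

/* A remote EX request arrives: the BAST marks the lockres blocked. */
void remote_ex_request(struct lockres_model *res)
{
	res->blocked = true;
}

/* A new local lock request must wait while the lockres is blocked. */
bool lock_must_wait(const struct lockres_model *res)
{
	return res->blocked;
}

/* The downconvert can only finish once every local hold is dropped. */
bool downconvert_can_finish(const struct lockres_model *res)
{
	return res->pr_holders == 0;
}

/*
 * Replays the call chain: outer lock, remote EX request, inner lock.
 * Returns true if the task ends up waiting on itself.
 */
bool recursive_lock_deadlocks(void)
{
	struct lockres_model res = { 0, false };

	res.pr_holders++;		/* ocfs2_permission() takes PR */
	remote_ex_request(&res);	/* BAST arrives before the second lock */

	/* ocfs2_iop_get_acl() now asks for PR again in the same task. */
	return lock_must_wait(&res) && !downconvert_can_finish(&res);
}
```

The second request waits for the blocked flag to clear, while clearing it
needs the first hold to be released by the very task that is waiting.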

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field, struct ocfs2_lock_res.l_holders, to
keep track of the pids of the processes that have taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH;
it means just getting back the disk inode bh for us if we already hold the
cluster lock;
3. export a helper, ocfs2_is_locked_by_me(), which is used to check if we
already hold the cluster lock in the upper code path.
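
The three steps above boil down to a per-lockres list of holders. Here is a
minimal user-space sketch under the following assumptions: the type and
function names are hypothetical stand-ins for ocfs2_lock_holder,
ocfs2_add_holder(), ocfs2_remove_holder() and ocfs2_is_locked_by_me(); the
owner is a plain integer instead of a struct pid; new holders are pushed at
the head rather than the tail; and the spinlock that protects the list in the
patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per task holding the lock; lives on the caller's stack. */
struct lock_holder {
	struct lock_holder *next;
	long owner;			/* stands in for the owner's struct pid */
};

struct lock_res {
	struct lock_holder *holders;	/* stands in for l_holders */
};

/* Record that 'owner' now holds the lock. */
void add_holder(struct lock_res *res, struct lock_holder *oh, long owner)
{
	oh->owner = owner;
	oh->next = res->holders;
	res->holders = oh;
}

/* Unlink 'oh' from the holder list, if it is on it. */
void remove_holder(struct lock_res *res, struct lock_holder *oh)
{
	struct lock_holder **p;

	for (p = &res->holders; *p; p = &(*p)->next) {
		if (*p == oh) {
			*p = oh->next;
			oh->next = NULL;
			return;
		}
	}
}

/* Walk the list looking for an entry owned by 'owner'. */
bool is_locked_by(const struct lock_res *res, long owner)
{
	const struct lock_holder *oh;

	for (oh = res->holders; oh; oh = oh->next)
		if (oh->owner == owner)
			return true;
	return false;
}
```

Because the holder entry is supplied by the caller, no allocation is needed on
the locking path; the entry only has to outlive the lock/unlock pair.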

The tracking logic should be used by some of the ocfs2 vfs callbacks
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
in the few cases where the tracking logic is used, such as get/set acl.

You may ask: what if we got a PR lock the first time, and want an EX lock
the second time? Fortunately, this case never happens in the real world,
as far as I can see, including permission check and (get|set)_(acl|attr),
and the gfs2 code does the same.
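
The assumption in the paragraph above can be written down as a predicate.
This is a hypothetical sketch, not something this patch adds: the enum and the
function name are made up for illustration. It encodes the idea that a nested
request is covered by the lock already held unless it asks to upgrade from PR
to EX.

```c
#include <stdbool.h>

/* Hypothetical names for the two lock levels discussed above. */
enum lock_level { LEVEL_PR, LEVEL_EX };

/*
 * A nested request is covered by the outer lock when it asks for the
 * same level or a weaker one. Holding PR and asking for EX is the one
 * combination the tracking logic cannot satisfy.
 */
bool nested_request_ok(enum lock_level held, enum lock_level wanted)
{
	return !(held == LEVEL_PR && wanted == LEVEL_EX);
}
```

Both call chains seen so far pass this check: PR then PR on the open path, and
EX then PR or EX on the chmod path.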

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

Changes since v3:
- Fix redundant space, spotted by: Joseph Qi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
Reviewed-by: Junxiao Bi 
Reviewed-by: Joseph Qi 
---
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/ocfs2.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..8dce409 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only used to check for recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_owner_pid = get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pi

Re: [Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-17 Thread Eric Ren

Hi!

On 01/17/2017 04:43 PM, Joseph Qi wrote:
> On 17/1/17 15:55, Eric Ren wrote:
>> Hi!
>>
>> On 01/17/2017 03:39 PM, Joseph Qi wrote:
>>>
>>> On 17/1/17 14:30, Eric Ren wrote:
>>>> We have to avoid recursive cluster locking, but there is currently no way
>>>> to check whether a process has already taken a cluster lock.
>>>>
>>>> Mostly, we can avoid recursive locking by writing code carefully.
>>>> However, we found that it's very hard to handle the routines that
>>>> are invoked directly by vfs code. For instance:
>>>>
>>>> const struct inode_operations ocfs2_file_iops = {
>>>>     .permission     = ocfs2_permission,
>>>>     .get_acl        = ocfs2_iop_get_acl,
>>>>     .set_acl        = ocfs2_iop_set_acl,
>>>> };
>>>>
>>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>>>> do_sys_open
>>>>  may_open
>>>>   inode_permission
>>>>    ocfs2_permission
>>>>     ocfs2_inode_lock() <=== first time
>>>>      generic_permission
>>>>       get_acl
>>>>        ocfs2_iop_get_acl
>>>>         ocfs2_inode_lock() <=== recursive one
>>>>
>>>> A deadlock will occur if a remote EX request comes in between the two
>>>> calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:
>>>>
>>>> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>>> BAST (ocfs2_generic_handle_bast) when the downconvert is started
>>>> on behalf of the remote EX lock request. On the other hand, the recursive
>>>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>>>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>>>> Because there is no chance for the first cluster lock on this node to be
>>>> unlocked: we block ourselves in the code path.
>>>>
>>>> The idea to fix this issue is mostly taken from gfs2 code.
>>>> 1. introduce a new field, struct ocfs2_lock_res.l_holders, to
>>>> keep track of the pids of the processes that have taken the cluster lock
>>>> of this lock resource;
>>>> 2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH;
>>>> it means just getting back the disk inode bh for us if we already hold the
>>>> cluster lock;
>>>> 3. export a helper, ocfs2_is_locked_by_me(), which is used to check if we
>>>> already hold the cluster lock in the upper code path.
>>>>
>>>> The tracking logic should be used by some of the ocfs2 vfs callbacks
>>>> to solve the recursive locking issue caused by the fact that vfs routines
>>>> can call into each other.
>>>>
>>>> The performance penalty of processing the holder list should only be seen
>>>> in the few cases where the tracking logic is used, such as get/set acl.
>>>>
>>>> You may ask: what if we got a PR lock the first time, and want an EX lock
>>>> the second time? Fortunately, this case never happens in the real world,
>>>> as far as I can see, including permission check and (get|set)_(acl|attr),
>>>> and the gfs2 code does the same.
>>>>
>>>> Changes since v1:
>>>> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>>>> process gets the cluster lock - suggested by: Joseph Qi 
>>>> 
>>>> and Junxiao Bi .
>>>>
>>>> - Change "struct ocfs2_holder" to a more meaningful name 
>>>> "ocfs2_lock_holder",
>>>> suggested by: Junxiao Bi.
>>>>
>>>> - Do not inline functions whose bodies are not in scope, changed by:
>>>> Stephen Rothwell .
>>>>
>>>> Changes since v2:
>>>> - Wrap the tracking logic code of recursive locking into functions,
>>>> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
>>>> suggested by: Junxiao Bi.
>>>>
>>>> [s...@canb.auug.org.au remove some inlines]
>>>> Signed-off-by: Eric Ren 
>>>> ---
>>>>   fs/ocfs2/dlmglue.c | 105 +++--
>>>>   fs/ocfs2/dlmglue.h |  18 +
>>>>   fs/ocfs2/ocfs2.h   |   1 +
>>>>   3 files changed, 121 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dl

Re: [Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren

Hi!

On 01/17/2017 03:39 PM, Joseph Qi wrote:
>
> On 17/1/17 14:30, Eric Ren wrote:
>> We have to avoid recursive cluster locking, but there is currently no way
>> to check whether a process has already taken a cluster lock.
>>
>> Mostly, we can avoid recursive locking by writing code carefully.
>> However, we found that it's very hard to handle the routines that
>> are invoked directly by vfs code. For instance:
>>
>> const struct inode_operations ocfs2_file_iops = {
>>     .permission     = ocfs2_permission,
>>     .get_acl        = ocfs2_iop_get_acl,
>>     .set_acl        = ocfs2_iop_set_acl,
>> };
>>
>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>> do_sys_open
>>  may_open
>>   inode_permission
>>    ocfs2_permission
>>     ocfs2_inode_lock() <=== first time
>>      generic_permission
>>       get_acl
>>        ocfs2_iop_get_acl
>>         ocfs2_inode_lock() <=== recursive one
>>
>> A deadlock will occur if a remote EX request comes in between the two
>> calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:
>>
>> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
>> BAST (ocfs2_generic_handle_bast) when the downconvert is started
>> on behalf of the remote EX lock request. On the other hand, the recursive
>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>> Because there is no chance for the first cluster lock on this node to be
>> unlocked: we block ourselves in the code path.
>>
>> The idea to fix this issue is mostly taken from gfs2 code.
>> 1. introduce a new field, struct ocfs2_lock_res.l_holders, to
>> keep track of the pids of the processes that have taken the cluster lock
>> of this lock resource;
>> 2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH;
>> it means just getting back the disk inode bh for us if we already hold the
>> cluster lock;
>> 3. export a helper, ocfs2_is_locked_by_me(), which is used to check if we
>> already hold the cluster lock in the upper code path.
>>
>> The tracking logic should be used by some of the ocfs2 vfs callbacks
>> to solve the recursive locking issue caused by the fact that vfs routines
>> can call into each other.
>>
>> The performance penalty of processing the holder list should only be seen
>> in the few cases where the tracking logic is used, such as get/set acl.
>>
>> You may ask: what if we got a PR lock the first time, and want an EX lock
>> the second time? Fortunately, this case never happens in the real world,
>> as far as I can see, including permission check and (get|set)_(acl|attr),
>> and the gfs2 code does the same.
>>
>> Changes since v1:
>> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>> process gets the cluster lock - suggested by: Joseph Qi 
>> 
>> and Junxiao Bi .
>>
>> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
>> suggested by: Junxiao Bi.
>>
>> - Do not inline functions whose bodies are not in scope, changed by:
>> Stephen Rothwell .
>>
>> Changes since v2:
>> - Wrap the tracking logic code of recursive locking into functions,
>> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
>> suggested by: Junxiao Bi.
>>
>> [s...@canb.auug.org.au remove some inlines]
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/dlmglue.c | 105 +++--
>>   fs/ocfs2/dlmglue.h |  18 +
>>   fs/ocfs2/ocfs2.h   |   1 +
>>   3 files changed, 121 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 77d1632..c75b9e9 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>>   init_waitqueue_head(&res->l_event);
>>   INIT_LIST_HEAD(&res->l_blocked_list);
>>   INIT_LIST_HEAD(&res->l_mask_waiters);
>> +INIT_LIST_HEAD(&res->l_holders);
>>   }
>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>> @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>   res->l_flags = 0UL;
>>   }
>>   +/*
>> + * Keep a list of processes who have interest in a lockres.
>> + * Note: this is now only uesed for check recursive cluster lockin

[Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren

We have to avoid recursive cluster locking, but there is currently no way
to check whether a process has already taken a cluster lock.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
    .permission     = ocfs2_permission,
    .get_acl        = ocfs2_iop_get_acl,
    .set_acl        = ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
    ocfs2_inode_lock() <=== first time
     generic_permission
      get_acl
       ocfs2_iop_get_acl
        ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when the downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
Because there is no chance for the first cluster lock on this node to be
unlocked: we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field, struct ocfs2_lock_res.l_holders, to
keep track of the pids of the processes that have taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH;
it means just getting back the disk inode bh for us if we already hold the
cluster lock;
3. export a helper, ocfs2_is_locked_by_me(), which is used to check if we
already hold the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs callbacks
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
in the few cases where the tracking logic is used, such as get/set acl.

You may ask: what if we got a PR lock the first time, and want an EX lock
the second time? Fortunately, this case never happens in the real world,
as far as I can see, including permission check and (get|set)_(acl|attr),
and the gfs2 code does the same.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/ocfs2.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only used to check for recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock

[Ocfs2-devel] [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Eric Ren

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking the cluster inode lock at vfs entry points opens up
a self-deadlock window is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
    may_open
     inode_permission
      ocfs2_permission
       ocfs2_inode_lock() <=== take PR
        generic_permission
         get_acl
          ocfs2_iop_get_acl
           ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== take PR
        ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (introduced in the previous patch)
to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl() and
ocfs2_setattr().

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c  | 29 +
 fs/ocfs2/file.c | 58 -
 2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, &oh, had_lock);
brelse(bh);
return status;
 }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
brelse(di_bh);
return acl;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, &bh, 1, &oh);
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* locking issue.
+*
+

[Ocfs2-devel] [PATCH v3 0/2] fix deadlock caused by recursive cluster locking

2017-01-16 Thread Eric Ren

This is a formal patch set v2 to solve the deadlock issue on which I
previously started a RFC (draft patch), and the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor has a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .
 
- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi. 
 
- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi. 
 
- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2: 
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
 
Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c |  29 +++
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/file.c|  58 ++---
 fs/ocfs2/ocfs2.h   |   1 +
 5 files changed, 179 insertions(+), 32 deletions(-)

-- 
2.10.2



Re: [Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren

Hi!

On 01/16/2017 02:58 PM, Junxiao Bi wrote:
> On 01/16/2017 02:42 PM, Eric Ren wrote:
>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>> after the patch was merged. The discussion happened here
>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>
>> The reason why taking the cluster inode lock at vfs entry points opens up
>> a self-deadlock window is explained in the previous patch of this
>> series.
>>
>> So far, we have seen two different code paths that have this issue.
>> 1. do_sys_open
>>     may_open
>>      inode_permission
>>       ocfs2_permission
>>        ocfs2_inode_lock() <=== take PR
>>         generic_permission
>>          get_acl
>>           ocfs2_iop_get_acl
>>            ocfs2_inode_lock() <=== take PR
>> 2. fchmod|fchmodat
>>     chmod_common
>>      notify_change
>>       ocfs2_setattr <=== take EX
>>        posix_acl_chmod
>>         get_acl
>>          ocfs2_iop_get_acl <=== take PR
>>         ocfs2_iop_set_acl <=== take EX
>>
>> Fix them by adding the tracking logic (introduced in the previous patch)
>> to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl() and
>> ocfs2_setattr().
>>
>> Changes since v1:
>> 1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>> process gets the cluster lock - suggested by: Joseph Qi
>> and Junxiao Bi.
>>
>> 2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
>> suggested by: Junxiao Bi.
>>
>> 3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
>> catch exceptional cases, suggested by: Junxiao Bi.
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/acl.c  | 39 +
>>   fs/ocfs2/file.c | 76 +
>>   2 files changed, 100 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..3e47262 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>> posix_acl *acl, int type)
>>   {
>>  struct buffer_head *bh = NULL;
>>  int status = 0;
>> -
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_lock_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = ocfs2_is_locked_by_me(lockres);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>  if (status < 0) {
>>  if (status != -ENOENT)
>>  mlog_errno(status);
>>  return status;
>>  }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> +
> The same code pattern shows up here and in *get_acl; can it be abstracted into
> one function?
> The same applies to *setattr and *permission. Sorry for not mentioning that
> in the last review.

Good idea! I will do it in the next version;-)
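
A rough userspace sketch of what such an abstraction could look like is below. It is only an illustration, not the code of any later version: `lock_tracker()`, `unlock_tracker()`, `struct lockres` and the explicit `pid` argument are hypothetical stand-ins, the `cluster_locks` counter replaces the real ocfs2_inode_lock_full()/ocfs2_inode_unlock() calls, and error handling and the buffer head are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct holder {
	struct holder *next;
	int owner_pid;
};

struct lockres {
	struct holder *holders;	/* who currently holds the cluster lock */
	int cluster_locks;	/* counts real lock acquisitions */
};

static int locked_by(const struct lockres *res, int pid)
{
	const struct holder *oh;

	for (oh = res->holders; oh != NULL; oh = oh->next)
		if (oh->owner_pid == pid)
			return 1;
	return 0;
}

/*
 * Returns 1 if the caller already held the lock (nothing new is taken),
 * 0 if the lock was taken here and the caller was recorded as a holder.
 */
static int lock_tracker(struct lockres *res, struct holder *oh, int pid)
{
	if (locked_by(res, pid))
		return 1;
	res->cluster_locks++;	/* stands in for taking the cluster lock */
	oh->owner_pid = pid;
	oh->next = res->holders;
	res->holders = oh;
	return 0;
}

/* Undo lock_tracker(); a no-op when the lock was taken further up. */
static void unlock_tracker(struct lockres *res, struct holder *oh, int had_lock)
{
	struct holder **pp;

	if (had_lock)
		return;
	for (pp = &res->holders; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == oh) {
			*pp = oh->next;
			break;
		}
	}
	assert(res->cluster_locks > 0);
	res->cluster_locks--;	/* stands in for dropping the cluster lock */
}

/* Inner entry point, like get_acl: reports how many locks are held. */
static int inner_get_acl(struct lockres *res, int pid)
{
	struct holder oh;
	int had_lock = lock_tracker(res, &oh, pid);
	int depth = res->cluster_locks;

	unlock_tracker(res, &oh, had_lock);
	return depth;
}

/* Outer entry point, like permission: calls into the inner one. */
static int outer_permission(struct lockres *res, int pid)
{
	struct holder oh;
	int had_lock = lock_tracker(res, &oh, pid);
	int depth = inner_get_acl(res, pid);

	unlock_tracker(res, &oh, had_lock);
	return depth;
}
```

With this shape each entry point needs only a lock/unlock pair, and the nested call sees the lock taken exactly once.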

Thanks,
Eric

>
> Thanks,
> Junxiao.
>>  status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 1);
>> +}
>>  brelse(bh);
>> +
>>  return status;
>>   }
>>   
>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>  struct buffer_head *di_bh = NULL;
>>  struct posix_acl *acl;
>>  int ret;
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_lock_holder oh;
>> +struct ocfs2_lock_res *lockres;
>>   
>>  osb = OCFS2_SB(inode->i_sb);
>>  if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>  return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = ocfs2_is_locked_by_me(lockres);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +ret = ocfs2_inode_lock_full(inode, &d

[Ocfs2-devel] [PATCH v2 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-15 Thread Eric Ren

We are in a situation where we have to avoid recursive cluster locking,
but there is no way to check whether a cluster lock has already been taken
by a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when the downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
Because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.
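
To make the sequence concrete, here is a toy userspace model of the state
described above. It is only a sketch under simplifying assumptions:
`lockres_model` and the `model_*()` functions are made-up names, a single
`blocked` field stands in for OCFS2_LOCK_BLOCKED, and a return value of 0
means "the caller would sleep here". Real dlmglue is far more involved.

```c
#include <assert.h>

/* Toy model of one lock resource on the local node (not kernel code). */
struct lockres_model {
	int holders;	/* local holds on the cluster lock */
	int blocked;	/* models OCFS2_LOCK_BLOCKED */
};

/* Local lock request: 1 if granted, 0 if the caller would block. */
static int model_lock(struct lockres_model *res)
{
	if (res->blocked)
		return 0;
	res->holders++;
	return 1;
}

static void model_unlock(struct lockres_model *res)
{
	assert(res->holders > 0);
	res->holders--;
}

/* A remote node asks for EX: the BAST marks the lockres as blocked. */
static void model_remote_ex_request(struct lockres_model *res)
{
	res->blocked = 1;
}

/* The downconvert can only finish once no local holder remains. */
static int model_try_downconvert(struct lockres_model *res)
{
	if (res->holders > 0)
		return 0;
	res->blocked = 0;
	return 1;
}
```

In this model, taking the lock once, receiving the remote EX request and then
asking for the lock again leaves both the second request and the downconvert
stuck, because the only thing that could release the first hold is the code
path that is waiting for the second one.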

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the pids of the processes that have taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back the disk inode bh for us if we've got the cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs callbacks,
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
in the few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want an EX lock? Fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code does the same.
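
The holder bookkeeping itself can be sketched in plain userspace C as below.
This is an illustration only: the struct and function names are shortened
stand-ins, the caller's identity is passed as an explicit `pid` argument
instead of being read from the current task, and the locking that protects
the list in the kernel is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the holder list (not kernel code). */
struct lock_holder {
	struct lock_holder *next;
	int owner_pid;
};

struct lock_res {
	struct lock_holder *l_holders;	/* list of current holders */
};

/* Record that `pid` now holds the cluster lock on `res`. */
static void add_holder(struct lock_res *res, struct lock_holder *oh, int pid)
{
	assert(res != NULL && oh != NULL);
	oh->owner_pid = pid;
	oh->next = res->l_holders;
	res->l_holders = oh;
}

/* Drop a previously added holder from the list. */
static void remove_holder(struct lock_res *res, struct lock_holder *oh)
{
	struct lock_holder **pp;

	for (pp = &res->l_holders; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == oh) {
			*pp = oh->next;
			oh->next = NULL;
			return;
		}
	}
}

/* Return 1 if `pid` is already recorded as a holder, 0 otherwise. */
static int is_locked_by(const struct lock_res *res, int pid)
{
	const struct lock_holder *oh;

	for (oh = res->l_holders; oh != NULL; oh = oh->next)
		if (oh->owner_pid == pid)
			return 1;
	return 0;
}
```

The list is expected to stay very short, since only the few vfs entry points
that use the tracking logic ever add a holder.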

Changes since v1:
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi
and Junxiao Bi.

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

3. Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 48 +---
 fs/ocfs2/dlmglue.h | 18 ++
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..b045f02 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,46 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid) {
+   spin_unlock(&lockres->l_lock);
+   return 1;
+   }
+   }
+   spin_unlock(&lockres->l_lock);
+

[Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== take PR
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
chmod_common
 notify_change
  ocfs2_setattr <=== take EX
   posix_acl_chmod
get_acl
 ocfs2_iop_get_acl <=== take PR
ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (from the previous patch) to the
functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl() and
ocfs2_setattr().

Changes since v1:
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi
and Junxiao Bi.

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c  | 39 +
 fs/ocfs2/file.c | 76 +
 2 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..3e47262 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
int status = 0;
-
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   int arg_flags = 0, has_locked;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = ocfs2_is_locked_by_me(lockres);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
return status;
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 1);
+   }
brelse(bh);
+
return status;
 }
 
@@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
int ret;
+   int arg_flags = 0, has_locked;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = ocfs2_is_locked_by_me(lockres);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
if (ret < 0) {
if (ret != -ENOENT)
mlog_errno(ret);
return ERR_PTR(ret);
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 0);
+   }
brelse(di_bh);
+
return acl;
 }
 
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..b620c25 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int arg_flags = 0, had_lock;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,13 +1176,41 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   lockres = &OCFS2_

[Ocfs2-devel] [PATCH v2 0/2] fix deadlock caused by recursive cluster locking

2017-01-15 Thread Eric Ren

This is a formal patch set v2 to solve the deadlock issue for which I
previously posted an RFC (draft patch); the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor imposes a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi
and Junxiao Bi.

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

4. Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell.
 
Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c | 39 
 fs/ocfs2/dlmglue.c | 48 +++---
 fs/ocfs2/dlmglue.h | 18 +
 fs/ocfs2/file.c| 76 +++---
 fs/ocfs2/ocfs2.h   |  1 +
 5 files changed, 164 insertions(+), 18 deletions(-)

-- 
2.10.2



Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-15 Thread Eric Ren

Hi Junxiao,
>> OK, good suggestion. Hrm, but in order to align with "ocfs2_inc_holders", I
>> think it's good to keep those function names as it is;-)
> that name is also not very clear. Maybe you can make another patch to
> clear it.

Maybe. Sometimes the completeness of a name has to be traded off against its
length. One basic rule is whether the name may confuse the reader.
In this case, "ocfs2_inc_holders" in dlmglue.c sounds good to me and is not
ambiguous.

I want to go with it. Anyone who doesn't like the name can propose a patch
for it;-)

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>
>> Thanks for your review!
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>
 +struct list_head oh_list;
 +struct pid *oh_owner_pid;
 +};
 +
/* ocfs2_inode_lock_full() 'arg_flags' flags */
/* don't wait on recovery. */
#define OCFS2_META_LOCK_RECOVERY(0x01)
 @@ -77,6 +82,8 @@ struct ocfs2_orphan_scan_lvb {
#define OCFS2_META_LOCK_NOQUEUE(0x02)
/* don't block waiting for the downconvert thread, instead return
 -EAGAIN */
#define OCFS2_LOCK_NONBLOCK(0x04)
 +/* just get back disk inode bh if we've got cluster lock. */
 +#define OCFS2_META_LOCK_GETBH(0x08)
  /* Locking subclasses of inode cluster lock */
enum {
 @@ -170,4 +177,15 @@ void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug
 *dlm_debug);
  /* To set the locking protocol on module initialization */
void ocfs2_set_locking_protocol(void);
 +
 +/*
 + * Keep a list of processes who have interest in a lockres.
 + * Note: this is now only uesed for check recursive cluster lock.
 + */
 +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
 + struct ocfs2_holder *oh);
 +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
 + struct ocfs2_holder *oh);
 +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct
 ocfs2_lock_res *lockres);
 +
#endif/* DLMGLUE_H */
 diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
 index 7e5958b..0c39d71 100644
 --- a/fs/ocfs2/ocfs2.h
 +++ b/fs/ocfs2/ocfs2.h
 @@ -172,6 +172,7 @@ struct ocfs2_lock_res {
  struct list_head l_blocked_list;
struct list_head l_mask_waiters;
 +struct list_head l_holders;
  unsigned long l_flags;
char l_name[OCFS2_LOCK_ID_MAX_LEN];

>



Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren

On 01/16/2017 11:13 AM, Junxiao Bi wrote:
> On 01/16/2017 11:06 AM, Eric Ren wrote:
>> Hi Junxiao,
>>
>> On 01/16/2017 10:46 AM, Junxiao Bi wrote:
>>>>> If had_lock==true, it is a bug? I think we should BUG_ON for it, that
>>>>> can help us catch bug at the first time.
>>>> Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
>>>> who takes the cluster lock.
>>>> It's harder for me to name all the possible paths;-/
>>> The BUG_ON() can help catch the path where ocfs2_setattr is not the
>>> first one.
>> Yes, I understand. But, the problem is that the vfs entries calling
>> order is out of our control.
>> I don't want to place an assertion where I'm not 100% sure it's
>> absolutely right;-)
> If it is not the first one, is it another recursive locking bug? In this
> case, if you don't like BUG_ON(), you can dump the call trace and print
> some warning message.

Yes! I like this idea, will add it in next version, thanks!
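
Something along these lines, sketched here in userspace C. It is only an
illustration of "warn and continue" versus BUG_ON(): the helper name
`warn_if_already_locked()` and the counter are hypothetical, and fprintf()
stands in for what would be an error log plus a stack dump in the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how often an entry point was unexpectedly entered with the lock. */
static int unexpected_recursion_count;

/*
 * Report a recursive lock attempt where none is expected, then carry on.
 * Returns had_lock unchanged so the caller can keep using it.
 */
static int warn_if_already_locked(const char *func, int had_lock)
{
	assert(func != NULL);
	if (had_lock) {
		unexpected_recursion_count++;
		/* Kernel code would log an error and dump the call trace here. */
		fprintf(stderr, "%s: unexpected recursive cluster lock\n", func);
	}
	return had_lock;
}
```

Unlike an assertion, this keeps the node running while still leaving a trace
that points at the unexpected call path.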

Eric

>
> Thanks,
> Junxiao.
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>
>>>>>> +if (had_lock)
>>>>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>> if (status < 0) {
>>>>>> if (status != -ENOENT)
>>>>>> mlog_errno(status);
>>>>>> goto bail_unlock_rw;
>>>>>> }
>>>>>> -inode_locked = 1;
>>>>>> +if (!had_lock) {
>>>>>> +ocfs2_add_holder(lockres, &oh);
>>>>>> +inode_locked = 1;
>>>>>> +}
>>>>>>   if (size_change) {
>>>>>> status = inode_newsize_ok(inode, attr->ia_size);
>>>>>> @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
>>>>>> iattr *attr)
>>>>>> bail_commit:
>>>>>> ocfs2_commit_trans(osb, handle);
>>>>>> bail_unlock:
>>>>>> -if (status) {
>>>>>> +if (status && inode_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> ocfs2_inode_unlock(inode, 1);
>>>>>> inode_locked = 0;
>>>>>> }
>>>>>> @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
>>>>>> struct iattr *attr)
>>>>>> if (status < 0)
>>>>>> mlog_errno(status);
>>>>>> }
>>>>>> -if (inode_locked)
>>>>>> +if (inode_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> ocfs2_inode_unlock(inode, 1);
>>>>>> +}
>>>>>>   brelse(bh);
>>>>>> return status;
>>>>>> @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
>>>>>> int ocfs2_permission(struct inode *inode, int mask)
>>>>>> {
>>>>>> int ret;
>>>>>> +int has_locked;
>>>>>> +struct ocfs2_holder oh;
>>>>>> +struct ocfs2_lock_res *lockres;
>>>>>>   if (mask & MAY_NOT_BLOCK)
>>>>>> return -ECHILD;
>>>>>> -ret = ocfs2_inode_lock(inode, NULL, 0);
>>>>>> -if (ret) {
>>>>>> -if (ret != -ENOENT)
>>>>>> -mlog_errno(ret);
>>>>>> -goto out;
>>>>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>> The same thing as ocfs2_setattr.
>>>> OK. I will think over your suggestions!
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>> Thanks,
>>>>> Junxiao.
>>>>>> +if (!has_locked) {
>>>>>> +ret = ocfs2_inode_lock(inode, NULL, 0);
>>>>>> +if (ret) {
>>>>>> +if (ret != -ENOENT)
>>>>>> +mlog_errno(ret);
>>>>>> +goto out;
>>>>>> +}
>>>>>> +ocfs2_add_holder(lockres, &oh);
>>>>>> }
>>>>>>   ret = generic_permission(inode, mask);
>>>>>> -ocfs2_inode_unlock(inode, 0);
>>>>>> +if (!has_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> +ocfs2_inode_unlock(inode, 0);
>>>>>> +}
>>>>>> out:
>>>>>> return ret;
>>>>>> }
>>>>>>
>



Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren

Hi Junxiao,

On 01/16/2017 10:46 AM, Junxiao Bi wrote:
>>> If had_lock==true, it is a bug? I think we should BUG_ON for it, that
>>> can help us catch bug at the first time.
>> Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
>> who takes the cluster lock.
>> It's harder for me to name all the possible paths;-/
> The BUG_ON() can help catch the path where ocfs2_setattr is not the
> first one.
Yes, I understand. But the problem is that the calling order of the vfs
entry points is out of our control.
I don't want to place an assertion where I'm not 100% sure it's absolutely
right;-)

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>>>
 +if (had_lock)
 +arg_flags = OCFS2_META_LOCK_GETBH;
 +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
goto bail_unlock_rw;
}
 -inode_locked = 1;
 +if (!had_lock) {
 +ocfs2_add_holder(lockres, &oh);
 +inode_locked = 1;
 +}
  if (size_change) {
status = inode_newsize_ok(inode, attr->ia_size);
 @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
 iattr *attr)
bail_commit:
ocfs2_commit_trans(osb, handle);
bail_unlock:
 -if (status) {
 +if (status && inode_locked) {
 +ocfs2_remove_holder(lockres, &oh);
ocfs2_inode_unlock(inode, 1);
inode_locked = 0;
}
 @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
 struct iattr *attr)
if (status < 0)
mlog_errno(status);
}
 -if (inode_locked)
 +if (inode_locked) {
 +ocfs2_remove_holder(lockres, &oh);
ocfs2_inode_unlock(inode, 1);
 +}
  brelse(bh);
return status;
 @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
int ocfs2_permission(struct inode *inode, int mask)
{
int ret;
 +int has_locked;
 +struct ocfs2_holder oh;
 +struct ocfs2_lock_res *lockres;
  if (mask & MAY_NOT_BLOCK)
return -ECHILD;
-ret = ocfs2_inode_lock(inode, NULL, 0);
 -if (ret) {
 -if (ret != -ENOENT)
 -mlog_errno(ret);
 -goto out;
 +lockres = &OCFS2_I(inode)->ip_inode_lockres;
 +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>> The same thing as ocfs2_setattr.
>> OK. I will think over your suggestions!
>>
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
 +if (!has_locked) {
 +ret = ocfs2_inode_lock(inode, NULL, 0);
 +if (ret) {
 +if (ret != -ENOENT)
 +mlog_errno(ret);
 +goto out;
 +}
 +ocfs2_add_holder(lockres, &oh);
}
  ret = generic_permission(inode, mask);
-ocfs2_inode_unlock(inode, 0);
 +if (!has_locked) {
 +ocfs2_remove_holder(lockres, &oh);
 +ocfs2_inode_unlock(inode, 0);
 +}
out:
return ret;
}

>



Re: [Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after flushing journal failure

2017-01-13 Thread Eric Ren

On 01/13/2017 10:52 AM, Changwei Ge wrote:
> Hi Joseph,
>
> Do you think my last version of patch to fix umount hang after journal
> flushing failure is OK?
>
> If so, I 'd like to ask Andrew's help to merge this patch into his test
> tree.
>
>
> Thanks,
>
> Br.
>
> Changwei

The message above should not appear in a formal patch. If you want to say
something to the other developers, put it in a cover letter. See
"git format-patch --cover-letter".

>
>
>
>  From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
> From: Changwei Ge 
> Date: Wed, 11 Jan 2017 09:05:35 +0800
> Subject: [PATCH] fix umount hang after journal flushing failure

A commit message is needed here! It should describe what your problem is, how
to reproduce it, and what your solution is, things like that.

>
> Signed-off-by: Changwei Ge 
> ---
>   fs/ocfs2/journal.c |   18 ++
>   1 file changed, 18 insertions(+)
>
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index a244f14..5f3c862 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
>   "commit_thread: %u transactions pending on "
>   "shutdown\n",
>   atomic_read(&journal->j_num_trans));
> +
> +   if (status < 0) {
> +   mlog(ML_ERROR, "journal is already abort
> and cannot be "
> +"flushed any more. So ignore
> the pending "
> +"transactions to avoid blocking
> ocfs2 unmount.\n");

Can you find any example in the kernel source that prints out a message like that?!

I saw Joseph showed you the right way in a previous email:
"
if (status < 0) {
  mlog(ML_ERROR, "journal is already abort and cannot be "
  "flushed any more. So ignore the pending "
  "transactions to avoid blocking ocfs2 unmount.\n");
"
So, please be careful and learn from the kernel source and from the way other
developers do their patch work. Otherwise, it just wastes others' time on such
basic issues.
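
For reference, here is a small standalone C sketch of the convention. The
compiler joins adjacent string literals into one string, so a long message can
be split across source lines without any line break ending up inside a
literal. The helper names are made up for illustration, and the message text
is reproduced as quoted above.

```c
#include <assert.h>
#include <string.h>

/* One log message written as adjacent literals; the compiler joins them. */
static const char *journal_abort_msg(void)
{
	return "journal is already abort and cannot be "
	       "flushed any more. So ignore the pending "
	       "transactions to avoid blocking ocfs2 unmount.\n";
}

/* A log line should contain exactly one newline, at the very end. */
static int is_single_line(const char *msg)
{
	const char *nl = strchr(msg, '\n');

	return nl != NULL && nl[1] == '\0';
}

/* Small self-check that a unit test could call. */
static void check_journal_abort_msg(void)
{
	assert(is_single_line(journal_abort_msg()));
}
```

A mail client that re-wraps the patch breaks this, because the wrap lands
inside a literal; that is what happened to the hunk quoted above.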

> +   /*
> +* This may a litte hacky, however, no
> chance
> +* for ocfs2/journal to decrease this
> variable
> +* thourgh commit-thread. I have to do so to
> +* avoid umount hang after journal flushing
> +* failure. Since jounral has been
> marked ABORT
> +* within jbd2_journal_flush, commit
> cache will
> +* never do any real work to flush
> journal to
> +* disk.Set it to ZERO so that umount will
> +* continue during shutting down journal
> +*/
> +   atomic_set(&journal->j_num_trans, 0);
It's possible to corrupt data by doing it this way. Why not just crash the kernel
when jbd2 aborts, and let the other node do the journal recovery? That's the
strength of a cluster filesystem.

Anyway, it's good to see you guys making contributions!

Thanks,
Eric


> +   }
>  }
>  }
>
> --
> 1.7.9.5
>

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-12 Thread Eric Ren

Hi!

On 01/13/2017 12:22 PM, Junxiao Bi wrote:
> On 01/05/2017 11:31 PM, Eric Ren wrote:
>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>> after the patch was merged. The discussion happened here
>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>
>> The reason why taking cluster inode lock at vfs entry points opens up
>> a self deadlock window, is explained in the previous patch of this
>> series.
>>
>> So far, we have seen two different code paths that have this issue.
>> 1. do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== take PR
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>>  ocfs2_inode_lock() <=== take PR
>> 2. fchmod|fchmodat
>>  chmod_common
>>   notify_change
>>ocfs2_setattr <=== take EX
>> posix_acl_chmod
>>  get_acl
>>   ocfs2_iop_get_acl <=== take PR
>>  ocfs2_iop_set_acl <=== take EX
>>
>> Fixes them by adding the tracking logic (in the previous patch) for
>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>> ocfs2_setattr().
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/acl.c  | 39 ++-
>>   fs/ocfs2/file.c | 44 ++--
>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..c539890 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>> posix_acl *acl, int type)
>>   {
>>  struct buffer_head *bh = NULL;
>>  int status = 0;
>> -
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>  if (status < 0) {
>>  if (status != -ENOENT)
>>  mlog_errno(status);
>>  return status;
>>  }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> +
>>  status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 1);
>> +}
>>  brelse(bh);
>> +
>>  return status;
>>   }
>>   
>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>  struct buffer_head *di_bh = NULL;
>>  struct posix_acl *acl;
>>  int ret;
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>>   
>>  osb = OCFS2_SB(inode->i_sb);
>>  if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>  return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>  if (ret < 0) {
>>  if (ret != -ENOENT)
>>  mlog_errno(ret);
>>  return ERR_PTR(ret);
>>  }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>>   
>>  acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>   
>> -ocfs2_inode_unlock(inode, 0);
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 0);
>> +}
>>  brelse(di_bh);
>> +
>>  return acl;
>>   }
>>   
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index c488965..62be75d 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-12 Thread Eric Ren

Hi Junxiao!

On 01/13/2017 11:59 AM, Junxiao Bi wrote:
> On 01/05/2017 11:31 PM, Eric Ren wrote:
>> We are in a situation where we have to avoid recursive cluster locking,
>> but there is no way to check whether a cluster lock has already been taken
>> by a process.
>>
>> Mostly, we can avoid recursive locking by writing code carefully.
>> However, we found that it's very hard to handle the routines that
>> are invoked directly by vfs code. For instance:
>>
>> const struct inode_operations ocfs2_file_iops = {
>>  .permission = ocfs2_permission,
>>  .get_acl= ocfs2_iop_get_acl,
>>  .set_acl= ocfs2_iop_set_acl,
>> };
>>
>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>> do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== first time
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>>  ocfs2_inode_lock() <=== recursive one
>>
>> A deadlock will occur if a remote EX request comes in between the two
>> calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:
>>
>> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
>> BAST (ocfs2_generic_handle_bast) when the downconvert is started
>> on behalf of the remote EX lock request. On the other hand, the recursive
>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>> Because there is no chance for the first cluster lock on this node to be
>> unlocked - we block ourselves in the code path.
>>
>> The idea to fix this issue is mostly taken from gfs2 code.
>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>> keep track of the pids of the processes that have taken the cluster lock
>> of this lock resource;
>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>> it means just getting back the disk inode bh for us if we've got the cluster lock.
>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>> have got the cluster lock in the upper code path.
>>
>> The tracking logic should be used by some of the ocfs2 vfs callbacks,
>> to solve the recursive locking issue caused by the fact that vfs routines
>> can call into each other.
>>
>> The performance penalty of processing the holder list should only be seen
>> in the few cases where the tracking logic is used, such as get/set acl.
>>
>> You may ask what if the first time we got a PR lock, and the second time
>> we want an EX lock? Fortunately, this case never happens in the real world,
>> as far as I can see, including permission check, (get|set)_(acl|attr), and
>> the gfs2 code does the same.
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/dlmglue.c | 47 ---
>>   fs/ocfs2/dlmglue.h | 18 ++
>>   fs/ocfs2/ocfs2.h   |  1 +
>>   3 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 83d576f..500bda4 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>>  init_waitqueue_head(&res->l_event);
>>  INIT_LIST_HEAD(&res->l_blocked_list);
>>  INIT_LIST_HEAD(&res->l_mask_waiters);
>> +INIT_LIST_HEAD(&res->l_holders);
>>   }
>>   
>>   void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>  res->l_flags = 0UL;
>>   }
>>   
>> +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>> +   struct ocfs2_holder *oh)
>> +{
>> +INIT_LIST_HEAD(&oh->oh_list);
>> +oh->oh_owner_pid =  get_pid(task_pid(current));
> struct pid(oh->oh_owner_pid) looks complicated here, why not use
> task_struct(current) or pid_t(current->pid) directly? Also i didn't see
> the ref count needs to be considered.

This is borrowed from the gfs2 code, which has been proven in practice. So, I
think it's fine to keep it ;-)

>
>> +
>> +spin_lock(&lockres->l_lock);
>> +list_add_tail(&oh->oh_list, &lockres->l_holders);
>> +spin_unlock(&lockres->l_lock);
>> +}
>> +
>> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
>> +

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-12 Thread Eric Ren

Hi Joseph,

On 01/09/2017 10:13 AM, Eric Ren wrote:
>>>> So you are trying to fix it by making phase3 finish without really doing
>>> Phase3 can go ahead because this node is already under protection of 
>>> cluster lock.
>> You said it was blocked...
> Oh, sorry, I meant phase3 can go ahead if this patch set is applied;-)
>
>> "Another hand, the recursive cluster lock (the second one) will be blocked in
>> __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED."
>>>> __ocfs2_cluster_lock, then Process B can continue either.
>>>> Let us bear in mind that phase1 and phase3 are in the same context and
>>>> executed in order. That's why I think there is no need to check if locked
>>>> by myself in phase1.
> Sorry, I still cannot see it. Without keeping track of the first cluster
> lock, how can we know if we are in a context that is already protected by
> the cluster lock? How can we handle the recursive locking (the second
> cluster lock) if we don't have this information?
>>>> If phase1 finds it is already locked by myself, that means the holder
>>>> is left by last operation without dec holder. That's why I think it is a 
>>>> bug
>>>> instead of a recursive lock case.
> I think I got your point here. Do you mean that we should just add the lock
> holder at the first locking position, without checking before that?
> Unfortunately, it's tricky to know exactly which ocfs2 routine will be the
> first vfs entry point. For example, ocfs2_get_acl() can be both the first
> vfs entry point and the second vfs entry point after ocfs2_permission(),
> right?
>
> It would be a coding bug if the problem you are concerned about happened. I
> think we don't need to worry about this much, because the code logic here is
> quite simple ;-)
Ping...

Did my last email clear up your doubts? If not, I'd really like to understand
your point.

If there are any problems, I will fix them in the next version ;-)

Thanks,
Eric

>
> Thanks for your patience!
> Eric
>
>>> D



Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-08 Thread Eric Ren

Hi Fengguang,

On 01/06/2017 10:52 PM, kbuild test robot wrote:
> Hi Eric,
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.10-rc2 next-20170106]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Eric-Ren/fix-deadlock-caused-by-recursive-cluster-locking/20170106-200837
> config: ia64-allyesconfig (attached as .config)
> compiler: ia64-linux-gcc (GCC) 6.2.0
> reproduce:
>  wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>  chmod +x ~/bin/make.cross
>  # save the attached .config to linux build tree
>  make.cross ARCH=ia64

I failed to reproduce this issue locally by following the above instructions
after rebasing my patch set onto the latest mainline (Linux 4.10-rc3). I only
see this compiler error message:
"
test:/mnt/build/linux # make.cross ARCH=ia64
make CROSS_COMPILE=/opt/gcc-4.9.0-nolibc/ia64-linux/bin/ia64-linux- --jobs=4 ARCH=ia64
...
  CALL    scripts/checksyscalls.sh
:1184:2: warning: #warning syscall perf_event_open not implemented [-Wcpp]
:1238:2: warning: #warning syscall seccomp not implemented [-Wcpp]
:1316:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp]
:1319:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp]
:1322:2: warning: #warning syscall pkey_free not implemented [-Wcpp]
...
  AS  arch/ia64/kernel/gate.o
arch/ia64/kernel/entry.S: Assembler messages:
arch/ia64/kernel/entry.S:622: Error: Operand 2 of `adds' should be a 14-bit integer (-8192-8191)
arch/ia64/kernel/entry.S:729: Error: Operand 2 of `adds' should be a 14-bit integer (-8192-8191)
arch/ia64/kernel/entry.S:860: Error: Operand 2 of `adds' should be a 14-bit integer (-8192-8191)
make[1]: *** [scripts/Makefile.build:393: arch/ia64/kernel/entry.o] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:988: arch/ia64/kernel] Error 2
make: *** Waiting for unfinished jobs
"

The obvious difference I noticed is that my gcc version is a little newer than
kbuild's; I'm not sure if it's related:
"
test:/mnt/build/linux # gcc -v
gcc version 6.2.1 20160830 [gcc-6-branch revision 239856] (SUSE Linux)
"

>
> All errors (new ones prefixed by >>):
>
> In file included from fs/ocfs2/acl.c:31:0:
> fs/ocfs2/acl.c: In function 'ocfs2_iop_set_acl':
>>> fs/ocfs2/dlmglue.h:189:29: error: inlining failed in call to always_inline 
>>> 'ocfs2_is_locked_by_me': function body not available
>  inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res 
> *lockres);

This error is probably because I put "inline" on the declaration in the header
while the function body lives in the source file.
However, no error or warning occurred when I built and tested locally this way:
"
test:/mnt/build/linux/fs/ocfs2 # make -C /lib/modules/4.9.0-2-vanilla/build M=`pwd` modules
"

Anyway, I want to make kbuild happy before resending ;-) Please correct me if
I'm missing something.
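
For reference, here is a minimal userspace sketch of the two usual ways to avoid
this class of error. It is not the actual patch: `struct demo_lockres`,
`demo_is_locked_by()` and `demo_is_unlocked()` are made-up stand-ins for the
ocfs2 types and helpers. My understanding is that some kernel configurations
expand `inline` so that it includes `__attribute__((always_inline))`, which
would explain why only certain configs fail; please treat that as an assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ocfs2_lock_res; 0 means "no holder". */
struct demo_lockres {
	int owner;
};

/*
 * Option 1, "header" part: a plain prototype without "inline".  Other
 * translation units only ever see this declaration.
 */
int demo_is_locked_by(const struct demo_lockres *res, int me);

/* Option 1, "source file" part: an ordinary out-of-line definition. */
int demo_is_locked_by(const struct demo_lockres *res, int me)
{
	return res->owner == me;
}

/*
 * Option 2: keep it inline, but then the body itself must live in the
 * header as "static inline", so that every caller can see it.
 */
static inline int demo_is_unlocked(const struct demo_lockres *res)
{
	return res->owner == 0;
}

/* Exercises both helpers; returns 1 when they behave as described. */
int demo_inline_check(void)
{
	struct demo_lockres res = { .owner = 42 };

	assert(!demo_is_unlocked(&res));
	return demo_is_locked_by(&res, 42) && !demo_is_locked_by(&res, 7);
}
```

Option 1 is probably the smaller change for helpers that take a spinlock and
walk a list, since they gain little from being inlined.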

Thanks!
Eric
>  ^
> fs/ocfs2/acl.c:292:16: note: called from here
>   has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> ^~
> In file included from fs/ocfs2/acl.c:31:0:
>>> fs/ocfs2/dlmglue.h:189:29: error: inlining failed in call to always_inline 
>>> 'ocfs2_is_locked_by_me': function body not available
>  inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res 
> *lockres);
>  ^
> fs/ocfs2/acl.c:292:16: note: called from here
>   has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> ^~
> In file included from fs/ocfs2/acl.c:31:0:
>>> fs/ocfs2/dlmglue.h:185:13: error: inlining failed in call to always_inline 
>>> 'ocfs2_add_holder': function body not available
>  inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>  ^~~~
> fs/ocfs2/acl.c:302:3: note: called from here
>ocfs2_add_holder(lockres, &oh);
>^~
> In file included from fs/ocfs2/acl.c:31:0:
>>> fs/ocfs2/dlmglue.h:187:13: error: inlining failed in call to always_inline 
>>> 'ocfs2_remove_holder': function body not available
>  inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
>  ^~~
> fs/ocfs2/acl.c:307:3: note: called from here
>ocfs2_remove_holder(lockres, &oh);
>^~

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-08 Thread Eric Ren

Hi,

On 01/09/2017 09:13 AM, Joseph Qi wrote:
> ...
>>
>>> The issue case you are trying to fix is:
>>> Process A
>>> take inode lock (phase1)
>>> ...
>>> <<< race window (phase2, Process B)
>>
>> The deadlock only happens if process B is on a remote node and requests an
>> EX lock.
>>
>> Quoting the commit message of patch [1/2]:
>>
>> A deadlock will occur if a remote EX request comes in between two calls to
>> ocfs2_inode_lock().  Briefly, this is how the deadlock is formed:
>>
>> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
>> BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
>> the remote EX lock request.  On the other hand, the recursive cluster lock
>> (the second one) will be blocked in __ocfs2_cluster_lock() because of
>> OCFS2_LOCK_BLOCKED.  But the downconvert never completes. Why?  Because
>> there is no chance for the first cluster lock on this node to be unlocked
>> - we block ourselves in the code path.
>> ---
>>
>>> ...
>>> take inode lock again (phase3)
>>>
>>> Deadlock happens because Process B in phase2 and Process A in phase3
>>> are waiting for each other.
>> It's local lock's (like i_mutex) responsibility to protect critical section 
>> from racing
>> among processes on the same node.
> I know we are talking a cluster lock issue. And the Process B I described is
> downconvert thread.

That's fine!

>>
>>> So you are trying to fix it by making phase3 finish without really doing
>>
>> Phase3 can go ahead because this node is already under protection of cluster 
>> lock.
> You said it was blocked...

Oh, sorry, I meant phase3 can go ahead if this patch set is applied;-)

> "Another hand, the recursive cluster lock (the second one) will be blocked in
> __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED."
>>
>>> __ocfs2_cluster_lock, then Process B can continue either.
>>> Let us bear in mind that phase1 and phase3 are in the same context and
>>> executed in order. That's why I think there is no need to check if locked
>>> by myself in phase1.
Sorry, I still cannot see it. Without keeping track of the first cluster lock,
how can we know if we are in a context that is already protected by the cluster
lock? How can we handle the recursive locking (the second cluster lock) if we
don't have this information?
>>> If phase1 finds it is already locked by myself, that means the holder
>>> is left by last operation without dec holder. That's why I think it is a bug
>>> instead of a recursive lock case.
I think I got your point here. Do you mean that we should just add the lock
holder at the first locking position, without checking before that?
Unfortunately, it's tricky to know exactly which ocfs2 routine will be the
first vfs entry point. For example, ocfs2_get_acl() can be both the first vfs
entry point and the second vfs entry point after ocfs2_permission(), right?

It would be a coding bug if the problem you are concerned about happened. I
think we don't need to worry about this much, because the code logic here is
quite simple ;-)

Thanks for your patience!
Eric

>>
>> Did I answer your question?
>>
>> Thanks!
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Thanks,
>>>>>> Eric
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joseph
>>>>>>>>
>>>>>>>> Thanks for your review;-)
>>>>>>>> Eric
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Joseph
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eric Ren 
>>>>>>>>>> ---
>>>>>>>>>>   fs/ocfs2/acl.c  | 39 ++-
>>>>>>>>>>   fs/ocfs2/file.c | 44 ++--
>>>>>>>>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>>>>>>> index bed1fcb..c539890 100644
>>>>>>>>>

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Eric Ren

Hi!

On 01/06/2017 05:55 PM, Joseph Qi wrote:
> On 17/1/6 17:13, Eric Ren wrote:
>> Hi,
>>
>>>>>>>>
>>>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>>>> ocfs2_setattr().
>>>>>>> As described cases above, shall we just add the tracking logic only for 
>>>>>>> set/get_acl()?
>>>>>>
>>>>>> The idea is to detect recursive locking on the running task stack. Take
>>>>>> case 1) for example, if ocfs2_permission() is not changed:
>>>>>>
>>>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock holder
>>>>>>                           on the tracking list
>>>>> I mean we have no need to check if locked by me, just do inode lock and 
>>>>> add holder.
>>>>> This will make code more clean, IMO.
>>>> Oh, sorry, I get your point this time. I think we need the check when
>>>> more than one process holds the PR lock on the same resource.  If I don't
>>>> understand you correctly, please tell me why you think it's not necessary
>>>> to check before getting the lock?
>>> The code logic can only check if it is locked by myself. In the case
>> Why only...?
>>> described above, ocfs2_permission is the first entry to take inode lock.
>>> And even if check succeeds, it is a bug without unlock, but not the case
>>> of recursive lock.
>>
>> By "check succeeds", you mean it's locked by me, right? If so, this flag
>>   "arg_flags = OCFS2_META_LOCK_GETBH"
>> will be passed down to ocfs2_inode_lock_full(), which gets back the buffer
>> head of the disk inode for us if necessary, but doesn't take the cluster
>> lock again. So, there is no need to unlock in such a case.
> I am trying to state my point more clearly...

Thanks a lot!

> The issue case you are trying to fix is:
> Process A
> take inode lock (phase1)
> ...
> <<< race window (phase2, Process B)

The deadlock only happens if process B is on a remote node and requests an EX
lock.

Quoting the commit message of patch [1/2]:

A deadlock will occur if a remote EX request comes in between two calls to
ocfs2_inode_lock().  Briefly, this is how the deadlock is formed:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of
the remote EX lock request.  On the other hand, the recursive cluster lock
(the second one) will be blocked in __ocfs2_cluster_lock() because of
OCFS2_LOCK_BLOCKED.  But the downconvert never completes. Why?  Because
there is no chance for the first cluster lock on this node to be unlocked
- we block ourselves in the code path.
---
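
To make the sequence above concrete, here is a toy single-threaded model of it.
This is only a sketch under simplifying assumptions: every `toy_*` name is
invented for illustration, the lock level (PR/EX) is ignored, and "would sleep"
is modelled as a function returning false instead of a real wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one lock resource on the local node (names are made up). */
struct toy_lockres {
	int  holders;   /* local cluster-lock holders that have not unlocked */
	bool blocked;   /* models OCFS2_LOCK_BLOCKED */
};

/* Models the BAST for a remote EX request: a downconvert is now pending. */
static void toy_remote_ex_request(struct toy_lockres *res)
{
	res->blocked = true;
}

/* The downconvert can only finish once every local holder has unlocked. */
static bool toy_downconvert_can_complete(const struct toy_lockres *res)
{
	return res->holders == 0;
}

/* Takes the lock if possible; returns false if the caller would sleep. */
static bool toy_cluster_lock(struct toy_lockres *res)
{
	if (res->blocked)
		return false;   /* must wait for the pending downconvert */
	res->holders++;
	return true;
}

/* Returns true if the sequence from the commit message ends in deadlock. */
bool toy_deadlock_demo(void)
{
	struct toy_lockres res = { .holders = 0, .blocked = false };
	bool first, second;

	first = toy_cluster_lock(&res);   /* ocfs2_permission(): succeeds    */
	assert(first);
	toy_remote_ex_request(&res);      /* remote EX arrives in the window */
	second = toy_cluster_lock(&res);  /* ocfs2_iop_get_acl(): would wait */

	/* We wait for the downconvert, which waits for our own unlock. */
	return !second && !toy_downconvert_can_complete(&res);
}
```

In this model `toy_deadlock_demo()` returns true: the second request cannot
proceed, and the downconvert cannot complete while the first holder remains.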

> ...
> take inode lock again (phase3)
>
> Deadlock happens because Process B in phase2 and Process A in phase3
> are waiting for each other.
It's the responsibility of a local lock (like i_mutex) to protect the critical
section from races among processes on the same node.

> So you are trying to fix it by making phase3 finish without really doing

Phase3 can go ahead because this node is already under the protection of the
cluster lock.

> __ocfs2_cluster_lock, then Process B can continue either.
> Let us bear in mind that phase1 and phase3 are in the same context and
> executed in order. That's why I think there is no need to check if locked
> by myself in phase1.
> If phase1 finds it is already locked by myself, that means the holder
> is left by last operation without dec holder. That's why I think it is a bug
> instead of a recursive lock case.

Did I answer your question?

Thanks!
Eric

>
> Thanks,
> Joseph
>>
>> Thanks,
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>>
>>>> Thanks,
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Thanks for your review;-)
>>>>>> Eric
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joseph
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Ren 
>>>>>>>> ---
>>>>>>>>   fs/ocfs2/acl.c  | 39 ++

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Eric Ren

Hi,

>>>>>>
>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>> ocfs2_setattr().
>>>>> As described cases above, shall we just add the tracking logic only for 
>>>>> set/get_acl()?
>>>>
>>>> The idea is to detect recursive locking on the running task stack. Take
>>>> case 1) for example, if ocfs2_permission() is not changed:
>>>>
>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock holder
>>>>                           on the tracking list
>>> I mean we have no need to check if locked by me, just do inode lock and add 
>>> holder.
>>> This will make code more clean, IMO.
>> Oh, sorry, I get your point this time. I think we need the check when more
>> than one process holds the PR lock on the same resource.  If I don't
>> understand you correctly, please tell me why you think it's not necessary
>> to check before getting the lock?
> The code logic can only check if it is locked by myself. In the case
Why only...?
> described above, ocfs2_permission is the first entry to take inode lock.
> And even if check succeeds, it is a bug without unlock, but not the case
> of recursive lock.

By "check succeeds", you mean it's locked by me, right? If so, this flag
   "arg_flags = OCFS2_META_LOCK_GETBH"
will be passed down to ocfs2_inode_lock_full(), which gets back the buffer head
of the disk inode for us if necessary, but doesn't take the cluster lock again.
So, there is no need to unlock in such a case.

Thanks,
Eric

>
> Thanks,
> Joseph
>>
>> Thanks,
>> Eric
>>>
>>> Thanks,
>>> Joseph
>>>>
>>>> Thanks for your review;-)
>>>> Eric
>>>>
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Signed-off-by: Eric Ren 
>>>>>> ---
>>>>>>   fs/ocfs2/acl.c  | 39 ++-
>>>>>>   fs/ocfs2/file.c | 44 ++--
>>>>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>>> index bed1fcb..c539890 100644
>>>>>> --- a/fs/ocfs2/acl.c
>>>>>> +++ b/fs/ocfs2/acl.c
>>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>>>>>> posix_acl 
>>>>>> *acl, int type)
>>>>>>   {
>>>>>>   struct buffer_head *bh = NULL;
>>>>>>   int status = 0;
>>>>>> -
>>>>>> -status = ocfs2_inode_lock(inode, &bh, 1);
>>>>>> +int arg_flags = 0, has_locked;
>>>>>> +struct ocfs2_holder oh;
>>>>>> +struct ocfs2_lock_res *lockres;
>>>>>> +
>>>>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>>> +if (has_locked)
>>>>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>>   if (status < 0) {
>>>>>>   if (status != -ENOENT)
>>>>>>   mlog_errno(status);
>>>>>>   return status;
>>>>>>   }
>>>>>> +if (!has_locked)
>>>>>> +ocfs2_add_holder(lockres, &oh);
>>>>>> +
>>>>>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>>>> -ocfs2_inode_unlock(inode, 1);
>>>>>> +
>>>>>> +if (!has_locked) {
>>>>>> +ocfs2_remove_holder(lockres, &oh);
>>>>>> +ocfs2_inode_unlock(inode, 1);
>>>>>> +}
>>>>>>   brelse(bh);
>>>>>> +
>>>>>>   return status;
>>>>>>   }
>>>>>>   @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>>>>>> *inode, int 
>>>>>>

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Eric Ren

On 01/06/2017 03:14 PM, Joseph Qi wrote:
>
>
> On 17/1/6 14:56, Eric Ren wrote:
>> On 01/06/2017 02:09 PM, Joseph Qi wrote:
>>> Hi Eric,
>>>
>>>
>>> On 17/1/5 23:31, Eric Ren wrote:
>>>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>>>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>>>> after the patch was merged. The discussion happened here
>>>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>>>
>>>> The reason why taking cluster inode lock at vfs entry points opens up
>>>> a self deadlock window, is explained in the previous patch of this
>>>> series.
>>>>
>>>> So far, we have seen two different code paths that have this issue.
>>>> 1. do_sys_open
>>>>   may_open
>>>>inode_permission
>>>> ocfs2_permission
>>>>  ocfs2_inode_lock() <=== take PR
>>>>   generic_permission
>>>>get_acl
>>>> ocfs2_iop_get_acl
>>>>  ocfs2_inode_lock() <=== take PR
>>>> 2. fchmod|fchmodat
>>>>  chmod_common
>>>>   notify_change
>>>>ocfs2_setattr <=== take EX
>>>> posix_acl_chmod
>>>>  get_acl
>>>>   ocfs2_iop_get_acl <=== take PR
>>>>  ocfs2_iop_set_acl <=== take EX
>>>>
>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>> ocfs2_setattr().
>>> As described cases above, shall we just add the tracking logic only for 
>>> set/get_acl()?
>>
>> The idea is to detect recursive locking on the running task stack. Take
>> case 1) for example, if ocfs2_permission() is not changed:
>>
>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock holder
>>                           on the tracking list
> I mean we have no need to check if locked by me, just do inode lock and add 
> holder.
> This will make code more clean, IMO.
Oh, sorry, I get your point this time. I think we need the check when more
than one process holds the PR lock on the same resource.  If I don't
understand you correctly, please tell me why you think it's not necessary
to check before getting the lock?

Thanks,
Eric
>
> Thanks,
> Joseph
>>
>> Thanks for your review;-)
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>>
>>>> Signed-off-by: Eric Ren 
>>>> ---
>>>>   fs/ocfs2/acl.c  | 39 ++-
>>>>   fs/ocfs2/file.c | 44 ++--
>>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>> index bed1fcb..c539890 100644
>>>> --- a/fs/ocfs2/acl.c
>>>> +++ b/fs/ocfs2/acl.c
>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>>>> posix_acl *acl, 
>>>> int type)
>>>>   {
>>>>   struct buffer_head *bh = NULL;
>>>>   int status = 0;
>>>> -
>>>> -status = ocfs2_inode_lock(inode, &bh, 1);
>>>> +int arg_flags = 0, has_locked;
>>>> +struct ocfs2_holder oh;
>>>> +struct ocfs2_lock_res *lockres;
>>>> +
>>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>> +if (has_locked)
>>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>   if (status < 0) {
>>>>   if (status != -ENOENT)
>>>>   mlog_errno(status);
>>>>   return status;
>>>>   }
>>>> +if (!has_locked)
>>>> +ocfs2_add_holder(lockres, &oh);
>>>> +
>>>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>> -ocfs2_inode_unlock(inode, 1);
>>>> +
>>>> +if (!has_locked) {
>>>> +ocfs2_remove_holder(lockres, &oh);
>>>> +ocfs2_inode_unlock(inode, 1);
>>

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-06 Thread Eric Ren

On 01/06/2017 03:24 PM, Joseph Qi wrote:
>
>
> On 17/1/6 15:03, Eric Ren wrote:
>> On 01/06/2017 02:07 PM, Joseph Qi wrote:
>>> Hi Eric,
>>>
>>>
>>> On 17/1/5 23:31, Eric Ren wrote:
>>>> We are in the situation that we have to avoid recursive cluster locking,
>>>> but there is no way to check if a cluster lock has been taken by a
>>>> process already.
>>>>
>>>> Mostly, we can avoid recursive locking by writing code carefully.
>>>> However, we found that it's very hard to handle the routines that
>>>> are invoked directly by vfs code. For instance:
>>>>
>>>> const struct inode_operations ocfs2_file_iops = {
>>>>  .permission = ocfs2_permission,
>>>>  .get_acl= ocfs2_iop_get_acl,
>>>>  .set_acl= ocfs2_iop_set_acl,
>>>> };
>>>>
>>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>>>> do_sys_open
>>>>   may_open
>>>>inode_permission
>>>> ocfs2_permission
>>>>  ocfs2_inode_lock() <=== first time
>>>>   generic_permission
>>>>get_acl
>>>> ocfs2_iop_get_acl
>>>> ocfs2_inode_lock() <=== recursive one
>>>>
>>>> A deadlock will occur if a remote EX request comes in between two
>>>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>>>
>>>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>>>> on behalf of the remote EX lock request. On the other hand, the recursive
>>>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>>>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>>>> Because there is no chance for the first cluster lock on this node to be
>>>> unlocked - we block ourselves in the code path.
>>>>
>>>> The idea to fix this issue is mostly taken from gfs2 code.
>>>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>>>> keep track of the processes' pid  who has taken the cluster lock
>>>> of this lock resource;
>>>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>>>> it means just getting back disk inode bh for us if we've got cluster lock.
>>>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>>>> have got the cluster lock in the upper code path.
>>>>
>>>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>>>> to solve the recursive locking issue caused by the fact that vfs routines
>>>> can call into each other.
>>>>
>>>> The performance penalty of processing the holder list should only be seen
>>>> at a few cases where the tracking logic is used, such as get/set acl.
>>>>
>>>> You may ask: what if we got a PR lock the first time, and want an EX lock
>>>> the second time? Fortunately, this case never happens in the real world,
>>>> as far as I can see, including permission check, (get|set)_(acl|attr), and
>>>> the gfs2 code does the same.
>>>>
>>>> Signed-off-by: Eric Ren 
>>>> ---
>>>>   fs/ocfs2/dlmglue.c | 47 ---
>>>>   fs/ocfs2/dlmglue.h | 18 ++
>>>>   fs/ocfs2/ocfs2.h   |  1 +
>>>>   3 files changed, 63 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>> index 83d576f..500bda4 100644
>>>> --- a/fs/ocfs2/dlmglue.c
>>>> +++ b/fs/ocfs2/dlmglue.c
>>>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res 
>>>> *res)
>>>>   init_waitqueue_head(&res->l_event);
>>>>   INIT_LIST_HEAD(&res->l_blocked_list);
>>>>   INIT_LIST_HEAD(&res->l_mask_waiters);
>>>> +INIT_LIST_HEAD(&res->l_holders);
>>>>   }
>>>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>>>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>>>   res->l_flags = 0UL;
>>>>   }
>>>>   +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>>>> +   struct ocfs2_holder *oh)
>>

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-05 Thread Eric Ren

On 01/06/2017 02:07 PM, Joseph Qi wrote:
> Hi Eric,
>
>
> On 17/1/5 23:31, Eric Ren wrote:
>> We are in the situation that we have to avoid recursive cluster locking,
>> but there is no way to check if a cluster lock has been taken by a
>> process already.
>>
>> Mostly, we can avoid recursive locking by writing code carefully.
>> However, we found that it's very hard to handle the routines that
>> are invoked directly by vfs code. For instance:
>>
>> const struct inode_operations ocfs2_file_iops = {
>>  .permission = ocfs2_permission,
>>  .get_acl= ocfs2_iop_get_acl,
>>  .set_acl= ocfs2_iop_set_acl,
>> };
>>
>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>> do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== first time
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>> ocfs2_inode_lock() <=== recursive one
>>
>> A deadlock will occur if a remote EX request comes in between two
>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>
>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>> on behalf of the remote EX lock request. On the other hand, the recursive
>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>> Because there is no chance for the first cluster lock on this node to be
>> unlocked - we block ourselves in the code path.
>>
>> The idea to fix this issue is mostly taken from gfs2 code.
>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>> keep track of the processes' pid  who has taken the cluster lock
>> of this lock resource;
>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>> it means just getting back disk inode bh for us if we've got cluster lock.
>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>> have got the cluster lock in the upper code path.
>>
>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>> to solve the recursive locking issue caused by the fact that vfs routines
>> can call into each other.
>>
>> The performance penalty of processing the holder list should only be seen
>> at a few cases where the tracking logic is used, such as get/set acl.
>>
>> You may ask: what if we got a PR lock the first time, and want an EX lock
>> the second time? Fortunately, this case never happens in the real world,
>> as far as I can see, including permission check, (get|set)_(acl|attr), and
>> the gfs2 code does the same.
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/dlmglue.c | 47 ---
>>   fs/ocfs2/dlmglue.h | 18 ++
>>   fs/ocfs2/ocfs2.h   |  1 +
>>   3 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 83d576f..500bda4 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>>   init_waitqueue_head(&res->l_event);
>>   INIT_LIST_HEAD(&res->l_blocked_list);
>>   INIT_LIST_HEAD(&res->l_mask_waiters);
>> +INIT_LIST_HEAD(&res->l_holders);
>>   }
>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>   res->l_flags = 0UL;
>>   }
>>   +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>> +   struct ocfs2_holder *oh)
>> +{
>> +INIT_LIST_HEAD(&oh->oh_list);
>> +oh->oh_owner_pid =  get_pid(task_pid(current));
>> +
>> +spin_lock(&lockres->l_lock);
>> +list_add_tail(&oh->oh_list, &lockres->l_holders);
>> +spin_unlock(&lockres->l_lock);
>> +}
>> +
>> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
>> +   struct ocfs2_holder *oh)
>> +{
>> +spin_lock(&lockres->l_lock);
>> +list_del(&oh->oh_list);
>> +spin_unlock(&lockres->l_lock);
>> +
>> +put_pid(oh->oh_owner_pid);
>> +}
>> +
>> +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-05 Thread Eric Ren

On 01/06/2017 02:09 PM, Joseph Qi wrote:
> Hi Eric,
>
>
> On 17/1/5 23:31, Eric Ren wrote:
>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>> after the patch was merged. The discussion happened here
>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>
>> The reason why taking cluster inode lock at vfs entry points opens up
>> a self deadlock window, is explained in the previous patch of this
>> series.
>>
>> So far, we have seen two different code paths that have this issue.
>> 1. do_sys_open
>>   may_open
>>inode_permission
>> ocfs2_permission
>>  ocfs2_inode_lock() <=== take PR
>>   generic_permission
>>get_acl
>> ocfs2_iop_get_acl
>>  ocfs2_inode_lock() <=== take PR
>> 2. fchmod|fchmodat
>>  chmod_common
>>   notify_change
>>ocfs2_setattr <=== take EX
>> posix_acl_chmod
>>  get_acl
>>   ocfs2_iop_get_acl <=== take PR
>>  ocfs2_iop_set_acl <=== take EX
>>
>> Fixes them by adding the tracking logic (in the previous patch) for
>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>> ocfs2_setattr().
> As described cases above, shall we just add the tracking logic only for 
> set/get_acl()?

The idea is to detect recursive locking on the running task's stack. Take case 1)
as an example, and suppose ocfs2_permission() is not changed:

ocfs2_permission() <=== takes PR, but no ocfs2_holder is added
ocfs2_iop_get_acl() <=== still takes PR, because there is no lock holder on
the tracking list
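
A toy model may make this concrete. It is only a sketch under assumptions: struct model, inner_get_acl() and outer_permission() are hypothetical names, and a boolean stands in for an entry on the holder list. It counts how many times one task requests the cluster lock, depending on whether the outer callback registers itself as a holder.

```c
#include <stdbool.h>

/* Hypothetical model of one task entering two nested vfs callbacks. */
struct model {
    bool holder_registered;  /* stands in for an entry on l_holders */
    int lock_requests;       /* number of cluster lock requests made */
};

/* Inner callback, modelled on ocfs2_iop_get_acl() after the patch. */
void inner_get_acl(struct model *m)
{
    bool has_locked = m->holder_registered;

    /* Without a registered holder we request the lock again; in the
     * kernel this second request can block behind a remote EX request. */
    if (!has_locked)
        m->lock_requests++;
}

/* Outer callback, modelled on ocfs2_permission(). */
void outer_permission(struct model *m, bool tracks_holder)
{
    m->lock_requests++;              /* first PR request */
    if (tracks_holder)
        m->holder_registered = true; /* what adding a holder achieves */

    inner_get_acl(m);                /* generic_permission() -> get_acl() */

    m->holder_registered = false;    /* holder removed before unlock */
}
```

With tracks_holder set to false the model records two lock requests, which is the recursive case; with it set to true there is only one.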

Thanks for your review;-)
Eric

>
> Thanks,
> Joseph
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/acl.c  | 39 ++-
>>   fs/ocfs2/file.c | 44 ++--
>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..c539890 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>> posix_acl *acl, 
>> int type)
>>   {
>>   struct buffer_head *bh = NULL;
>>   int status = 0;
>> -
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>   if (status < 0) {
>>   if (status != -ENOENT)
>>   mlog_errno(status);
>>   return status;
>>   }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> +
>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 1);
>> +}
>>   brelse(bh);
>> +
>>   return status;
>>   }
>>   @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>   struct buffer_head *di_bh = NULL;
>>   struct posix_acl *acl;
>>   int ret;
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> osb = OCFS2_SB(inode->i_sb);
>>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>   return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>   if (ret < 0) {
>>   if (ret != -ENOENT)
>>   mlog_errno(ret);
>>   return ERR_PTR(ret);
>>   }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>   -ocfs2_inode_unlock(inode, 0);
>

[Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-05 Thread Eric Ren

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
    may_open
     inode_permission
      ocfs2_permission
       ocfs2_inode_lock() <=== take PR
        generic_permission
         get_acl
          ocfs2_iop_get_acl
           ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== take PR
        ocfs2_iop_set_acl <=== take EX

Fix them by adding the tracking logic (introduced in the previous patch)
to the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl(), and
ocfs2_setattr().

Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c  | 39 ++-
 fs/ocfs2/file.c | 44 ++--
 2 files changed, 68 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..c539890 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
int status = 0;
-
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   int arg_flags = 0, has_locked;
+   struct ocfs2_holder oh;
+   struct ocfs2_lock_res *lockres;
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
return status;
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 1);
+   }
brelse(bh);
+
return status;
 }
 
@@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
int ret;
+   int arg_flags = 0, has_locked;
+   struct ocfs2_holder oh;
+   struct ocfs2_lock_res *lockres;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
+   if (has_locked)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
if (ret < 0) {
if (ret != -ENOENT)
mlog_errno(ret);
return ERR_PTR(ret);
}
+   if (!has_locked)
+   ocfs2_add_holder(lockres, &oh);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   if (!has_locked) {
+   ocfs2_remove_holder(lockres, &oh);
+   ocfs2_inode_unlock(inode, 0);
+   }
brelse(di_bh);
+
return acl;
 }
 
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..62be75d 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int arg_flags = 0, had_lock;
+   struct ocfs2_holder oh;
+   struct ocfs2_lock_res *lockres;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,13 +1176,20 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);
+   if (had_lock)
+   arg_flags = OCFS2_META_LOCK_GETBH;
+   status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
goto bail_unlock_rw;
}
-   inode

[Ocfs2-devel] [PATCH 0/2] fix deadlock caused by recursive cluster locking

2017-01-05 Thread Eric Ren

This is a formal patch set to solve the deadlock issue for which I
previously posted an RFC (draft patch). The discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core nor imposes a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.

Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlocks when taking inode lock at vfs entry points

 fs/ocfs2/acl.c | 39 ++-
 fs/ocfs2/dlmglue.c | 47 ---
 fs/ocfs2/dlmglue.h | 18 ++
 fs/ocfs2/file.c| 44 ++--
 fs/ocfs2/ocfs2.h   |  1 +
 5 files changed, 131 insertions(+), 18 deletions(-)

-- 
2.6.6



[Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-05 Thread Eric Ren

We are in a situation where we have to avoid recursive cluster locking,
but there is no way to check whether a cluster lock has already been taken
by a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
    ocfs2_inode_lock() <=== first time
     generic_permission
      get_acl
       ocfs2_iop_get_acl
        ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
ocfs2_inode_lock() calls. Briefly, the deadlock forms as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST (ocfs2_generic_handle_bast) when the downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
Because there is no chance for the first cluster lock on this node to be
unlocked: we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the pids of the processes that have taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just get back the disk inode bh for us if we already hold the
cluster lock;
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have already got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs callbacks,
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.
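
The bookkeeping described above can be sketched in plain userspace C. This is only an illustration under assumptions: struct holder, struct lock_res, add_holder(), remove_holder() and is_locked_by() are hypothetical names, a plain int task id stands in for the struct pid * kept in oh_owner_pid, and the l_lock spinlock that protects the list in the patch is left out.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ocfs2_holder. */
struct holder {
    struct holder *next;
    int owner;               /* task id; the patch stores a struct pid * */
};

/* Hypothetical stand-in for the l_holders field of a lock resource. */
struct lock_res {
    struct holder *holders;
};

/* Record that `task` now holds the cluster lock on `res`. */
void add_holder(struct lock_res *res, struct holder *h, int task)
{
    h->owner = task;
    h->next = res->holders;
    res->holders = h;
}

/* Drop the record again, just before the lock is released. */
void remove_holder(struct lock_res *res, struct holder *h)
{
    struct holder **pp;

    for (pp = &res->holders; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            h->next = NULL;
            return;
        }
    }
}

/* Return the holder owned by `task`, or NULL if it holds nothing. */
struct holder *is_locked_by(struct lock_res *res, int task)
{
    struct holder *h;

    for (h = res->holders; h != NULL; h = h->next)
        if (h->owner == task)
            return h;
    return NULL;
}
```

A caller would check is_locked_by() first, take the lock and call add_holder() only when the result is NULL, and undo both in reverse order on the way out.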

The performance penalty of processing the holder list should only be seen
in the few cases where the tracking logic is used, such as get/set acl.

You may ask: what if we got a PR lock the first time, and want an EX lock
the second time? Fortunately, as far as I can see, this case never happens
in the real world, including permission check and (get|set)_(acl|attr),
and the gfs2 code does the same.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 47 ---
 fs/ocfs2/dlmglue.h | 18 ++
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 83d576f..500bda4 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_holder *oh)
+{
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res 
*lockres)
+{
+   struct ocfs2_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid)
+   goto out;
+   }
+   oh = NULL;
+out:
+   spin_unlock(&lockres->l_lock);
+   return oh;
+}
+
 static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
 int level)
 {
@@ -2333,8 +2373,9 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
goto getbh;
}
 
-   if (ocfs2_mount_local(osb))
-   goto local;
+   if ((arg_flags & OCFS2_META_LOCK_GETBH) ||
+   ocfs2_mount_local(osb))
+   goto update;
 
if (!(arg_f

Re: [Ocfs2-devel] [PATCH 00/17] ocfs2-test: misc improvements and trivial fixes

2017-01-04 Thread Eric Ren

Hi all,

I will push these patches to the "suse" branch of Mark's GitHub repo, since
they have received no review for more than 2 weeks.
According to Mark's advice, a patch can be merged only once it has a review ;-)

Thanks,
Eric

On 12/13/2016 01:29 PM, Eric Ren wrote:
> - Misc trivial fixes:
>
> [PATCH 01/17] ocfs2 test: correct the check on testcase if supported
> [PATCH 02/17] Single Run: kernel building is little broken now
> [PATCH 03/17] Trivial: better not to depend on where we issue testing
> [PATCH 04/17] Trivial: fix a typo mistake
> [PATCH 05/17] Trivial: fix checking empty return value
> [PATCH 06/17] multi_mmap: make log messages go to right place
> [PATCH 07/17] lvb_torture: failed when pcmk is used as cluster stack
> [PATCH 08/17] multiple node: pass cross_delete the right log file
>
> - This patches add two more parameters: blocksize and clustersize when we
> kick off a testing, which shortens the run time of a testing round.
> It will keep the old behaviors if they are not specified.
>
> [PATCH 09/17] Single run: make blocksize and clustersize as
> [PATCH 10/17] Multiple run: make blocksize and clustersize as
> [PATCH 11/17] discontig bg: make blocksize and clustersize as
>
> - This patch reflects the mkfs.ocfs2 changes that "--cluster-stack" and
> "--cluster-name" were added.
>
> [PATCH 12/17] Add two cluster-aware parameters: cluster stack and cluster name
>
> - More misc trival fixes:
>
> [PATCH 13/17] Save punch_hole details into logfile for debugging
> [PATCH 14/17] Fix openmpi warning by specifying proper slot number
> [PATCH 15/17] Handle the case when a symbolic link device is given
> [PATCH 16/17] inline data: fix build error
> [PATCH 17/17] discontig bg: give single and multiple node test
>
> Comments and questions are, as always, welcome.
>
> Thanks,
> Eric
>



Re: [Ocfs2-devel] ocfs2-test passed on next-20161223

2016-12-29 Thread Eric Ren

Hi Junxiao,

On 12/30/2016 10:44 AM, Junxiao Bi wrote:
> Hi Guys,
>
> I just done ocfs2-test single/multiple/discontig test on linux
> next-20161223, all test passed. Thank you for your effort to make the
> good quality.

Thanks for your effort! BTW, how long does the whole test run usually take
on your side?

Eric

>
> Thanks,
> Junxiao.
>
>



Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-14 Thread Eric Ren

Hi,

On 12/15/2016 09:46 AM, Joseph Qi wrote:
> In you description, this issue can only happen in case of stack user +
>
> fsdlm.
Yes.
>
> So I feel we'd better to make stack user and o2cb behaves the same,
>
> other than treat it as a special case.
Yes, I agree. But actually there is nothing wrong with fsdlm. I think o2cb
does some tricks with the DLM_LKF_VALBLK flag in the special case where the
down conversion is PR->NULL.

I'd like to see this quick, small fix merged now, because this issue is
somewhat urgent for us.
Anyway, we can easily supersede it if someone familiar with o2cb works out
a patch for o2cb in the future.

Does this sound good to you?

Thanks,
Eric
>
>
> Thanks,
>
> Joseph
>
> On 16/12/9 17:30, Eric Ren wrote:
>> The crash happens rather often when we reset some cluster
>> nodes while nodes contend fiercely to do truncate and append.
>>
>> The crash backtrace is below:
>> "
>> [  245.197849] dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 
>> locks on 971 
>> resources
>> [  245.197859] dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 
>> generation 5 done: 4 ms
>> [  245.198379] ocfs2: Begin replay journal (node 318952601, slot 2) on 
>> device (253,18)
>> [  247.272338] ocfs2: End replay journal (node 318952601, slot 2) on device 
>> (253,18)
>> [  247.547084] ocfs2: Beginning quota recovery on device (253,18) for slot 2
>> [  247.683263] ocfs2: Finishing quota recovery on device (253,18) for slot 2
>> [  247.833022] (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug 
>> expression: 
>> le64_to_cpu(fe->i_size) != i_size_read(inode)
>> [  247.833029] (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 
>> 290321, inode 
>> i_size = 732 != di i_size = 937, i_flags = 0x1
>> [  247.833074] [ cut here ]
>> [  247.833077] kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
>> [  247.833079] invalid opcode:  [#1] SMP
>> [  247.833081] Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) 
>> ocfs2_nodemanager 
>> ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_modiscsi_tcp 
>> libiscsi_tcp 
>> libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog 
>> xfs libcrc32c 
>> ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 
>> acpi_cpufreq 
>> button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk 
>> ata_piix   drm_kms_helper ahci syscopyarea libahci sysfillrect 
>> sysimgblt 
>> fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio 
>> ehci_hcd   
>> usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc 
>> scsi_dh_alua 
>> scsi_mod autofs4
>> [  247.833107] Supported: No, Unsupported modules are loaded
>> [  247.833110] CPU: 1 PID: 30154 Comm: truncate Tainted: G   OE   N  
>> 4.4.21-69-default #1
>> [  247.833111] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
>> [  247.833112] task: 88004ff6d240 ti: 880074e68000 task.ti: 
>> 880074e68000
>> [  247.833113] RIP: 0010:[] [] 
>> ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
>> [  247.833151] RSP: 0018:880074e6bd50  EFLAGS: 00010282
>> [  247.833152] RAX: 0074 RBX: 029e RCX: 
>> 
>> [  247.833153] RDX: 0001 RSI: 0246 RDI: 
>> 0246
>> [  247.833154] RBP: 880074e6bda8 R08: 3675dc7a R09: 
>> 82013414
>> [  247.833155] R10: 00034c50 R11:  R12: 
>> 88003aab3448
>> [  247.833156] R13: 02dc R14: 00046e11 R15: 
>> 0020
>> [  247.833157] FS:  7f839f965700() GS:88007fc8() 
>> knlGS:
>> [  247.833158] CS:  0010 DS:  ES:  CR0: 8005003b
>> [  247.833159] CR2: 7f839f97e000 CR3: 36723000 CR4: 
>> 06e0
>> [  247.833164] Stack:
>> [  247.833165]  03a9 0001 880060554000 
>> 88004fcaf000
>> [  247.833167]  88003aa7b090 1000 88003aab3448 
>> 880074e6beb0
>> [  247.833169]  0001 2068 0020 
>> 
>> [  247.833171] Call Trace:
>> [  247.833208]  [] ocfs2_setattr+0x698/0xa90 [ocfs2]
>> [  247.833225]  [] notify_change+0x1ae/0x380
>> [  247.833242]  [] do_truncate+0x5e/0x90
>> [  247.833246]  [] 
>> do_sys_ftru

Re: [Ocfs2-devel] [PATCH 1/7] ocfs2: test reflinking to inline data files

2016-12-13 Thread Eric Ren

Hi!

On 12/14/2016 05:35 AM, Darrick J. Wong wrote:
> On Mon, Dec 12, 2016 at 11:11:36PM -0800, Darrick J. Wong wrote:
>> coreutils 8.26 (latest) doesn't have any OCFS2_IOC_REFLINK support; I
>> suspect a distro patch in the srpm or something
> Confirmed.  SuSE's coreutils package has a patch that tries the ocfs2
> reflink ioctl and bails out of cp without trying the btrfs/vfs clone
> ioctl if the ocfs2 ioctl returns EEXIST.  I suppose that'll have to get
> fixed in their coreutils package, but until then it'll just be broken.
> :/
>
> FWIW I also confirm that none of (upstream, RHEL, OL, Debian, or Ubuntu)
> have this patch, so this shouldn't be a problem on any of them.

I'm going to fix this issue for openSUSE. Thanks for your efforts, guys!

Thanks!
Eric
>
> --D
>
>> --D
>>
>>> Thanks,
>>> Eryu
>



[Ocfs2-devel] [PATCH 16/17] inline data: fix build error

2016-12-12 Thread Eric Ren

Signed-off-by: Eric Ren 
---
 programs/defrag-test/frager.c | 2 +-
 programs/directio_test/directio_test.c| 2 +-
 programs/discontig_bg_test/spawn_inodes.c | 2 +-
 programs/dx_dirs_tests/index_dir.c| 2 +-
 programs/inline-data/inline-data.c| 2 +-
 programs/inline-data/inline-dirs.c| 2 +-
 programs/reflink_tests/reflink_test.c | 2 +-
 programs/xattr_tests/xattr-test.c | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/programs/defrag-test/frager.c b/programs/defrag-test/frager.c
index f510dde..473c31c 100755
--- a/programs/defrag-test/frager.c
+++ b/programs/defrag-test/frager.c
@@ -86,7 +86,7 @@ static int usage(void)
 static void sigchld_handler()
 {
pid_t   pid;
-   union wait status;
+   int status;
 
while (1) {
pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/directio_test/directio_test.c 
b/programs/directio_test/directio_test.c
index 7ec5278..21bc32c 100755
--- a/programs/directio_test/directio_test.c
+++ b/programs/directio_test/directio_test.c
@@ -214,7 +214,7 @@ static int teardown(void)
 static void sigchld_handler()
 {
pid_t pid;
-   union wait status;
+   int status;
 
while (1) {
pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/discontig_bg_test/spawn_inodes.c 
b/programs/discontig_bg_test/spawn_inodes.c
index 6bb7a93..633f0a9 100755
--- a/programs/discontig_bg_test/spawn_inodes.c
+++ b/programs/discontig_bg_test/spawn_inodes.c
@@ -64,7 +64,7 @@ static int usage(void)
 static void sigchld_handler()
 {
pid_t   pid;
-   union wait status;
+   int status;
 
while (1) {
pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/dx_dirs_tests/index_dir.c 
b/programs/dx_dirs_tests/index_dir.c
index 75ea8bd..ffdfa0f 100755
--- a/programs/dx_dirs_tests/index_dir.c
+++ b/programs/dx_dirs_tests/index_dir.c
@@ -926,7 +926,7 @@ void random_test(void)
 static void sigchld_handler()
 {
 pid_t pid;
-union wait status;
+int status;
 
 while (1) {
 pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/inline-data/inline-data.c 
b/programs/inline-data/inline-data.c
index 13124d7..daaee3c 100644
--- a/programs/inline-data/inline-data.c
+++ b/programs/inline-data/inline-data.c
@@ -256,7 +256,7 @@ static int teardown(void)
 static void sigchld_handler()
 {
pid_t pid;
-   union wait status;
+   int status;
 
while (1) {
pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/inline-data/inline-dirs.c 
b/programs/inline-data/inline-dirs.c
index ac7882f..0db24b9 100644
--- a/programs/inline-data/inline-dirs.c
+++ b/programs/inline-data/inline-dirs.c
@@ -357,7 +357,7 @@ static void run_large_dir_tests(void)
 static void sigchld_handler()
 {
pid_t pid;
-   union wait status;
+   int status;
 
while (1) {
pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/reflink_tests/reflink_test.c 
b/programs/reflink_tests/reflink_test.c
index 22386db..2801968 100755
--- a/programs/reflink_tests/reflink_test.c
+++ b/programs/reflink_tests/reflink_test.c
@@ -965,7 +965,7 @@ static int stress_test()
 static void sigchld_handler()
 {
pid_t pid;
-   union wait status;
+   int status;
 
while (1) {
pid = wait3(&status, WNOHANG, NULL);
diff --git a/programs/xattr_tests/xattr-test.c 
b/programs/xattr_tests/xattr-test.c
index d204aba..77be780 100755
--- a/programs/xattr_tests/xattr-test.c
+++ b/programs/xattr_tests/xattr-test.c
@@ -301,7 +301,7 @@ static void judge_sys_return(int ret, const char *sys_func)
 static void sigchld_handler()
 {
pid_t   pid;
-   union wait status;
+   int status;
while (1) {
pid = wait3(&status, WNOHANG, NULL);
if (pid <= 0)
-- 
2.6.6



[Ocfs2-devel] [PATCH 15/17] Handle the case when a symbolic link device is given

2016-12-12 Thread Eric Ren

The shared disk is often given as a symbolic link, such as the
iSCSI disk "/dev/disk/by-path/disk". In that case, resolve it to
the canonical device name.
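
The scripts below do this with `readlink -f`. For the C test programs, the same idea could be sketched with lstat(2) and realpath(3). This is an illustration only: resolve_device() is a hypothetical helper and is not part of ocfs2-test.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: if `path` is a symbolic link, store its canonical
 * target in `out`; otherwise copy `path` unchanged.
 * Returns 0 on success and -1 on error or if `out` is too small.
 */
int resolve_device(const char *path, char *out, size_t outlen)
{
    struct stat st;
    char resolved[PATH_MAX];

    if (lstat(path, &st) != 0)
        return -1;

    if (S_ISLNK(st.st_mode)) {
        /* realpath() follows every link in the chain, like readlink -f */
        if (realpath(path, resolved) == NULL)
            return -1;
        path = resolved;
    }

    if (strlen(path) >= outlen)
        return -1;
    strcpy(out, path);
    return 0;
}
```

As in the shell version, a path that is not a symbolic link is passed through untouched.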

Signed-off-by: Eric Ren 
---
 programs/discontig_bg_test/discontig_runner.sh |  5 +
 programs/python_common/multiple_run.sh | 13 +
 programs/python_common/single_run-WIP.sh   |  6 ++
 3 files changed, 24 insertions(+)

diff --git a/programs/discontig_bg_test/discontig_runner.sh 
b/programs/discontig_bg_test/discontig_runner.sh
index 4c13adb..182ca3a 100755
--- a/programs/discontig_bg_test/discontig_runner.sh
+++ b/programs/discontig_bg_test/discontig_runner.sh
@@ -126,6 +126,11 @@ function f_setup()
if [ -z "${DEVICE}" ];then
f_usage
fi  
+
+   # if a symbollink is given, work out the typical device name, like 
/dev/sda
+   if [ -L ${DEVICE} ];then
+   DEVICE=`readlink -f ${DEVICE}`
+   fi

if [ -z "${MOUNT_POINT}" ];then
f_usage
diff --git a/programs/python_common/multiple_run.sh 
b/programs/python_common/multiple_run.sh
index 74c3531..3e11abd 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -135,6 +135,19 @@ f_setup()
 
f_getoptions $*
 
+   if [ -z ${DEVICE} ] ; then
+   ${ECHO} "ERROR: No device"
+   f_usage
+   elif [ ! -b ${DEVICE} ] ; then
+   ${ECHO} "ERROR: Invalid device ${DEVICE}"
+   exit 1
+   fi
+
+   # if a symbollink is given, work out the typical device name, like 
/dev/sda
+   if [ -L ${DEVICE} ];then
+DEVICE=`readlink -f ${DEVICE}`
+   fi
+
if [ -z "${MOUNT_POINT}" ];then
 f_usage
 fi
diff --git a/programs/python_common/single_run-WIP.sh 
b/programs/python_common/single_run-WIP.sh
index 5c174f0..92d1216 100755
--- a/programs/python_common/single_run-WIP.sh
+++ b/programs/python_common/single_run-WIP.sh
@@ -1095,6 +1095,7 @@ do
?) usage;;
esac
 done
+
 if [ -z ${DEVICE} ] ; then
${ECHO} "ERROR: No device"
usage
@@ -1103,6 +1104,11 @@ elif [ ! -b ${DEVICE} ] ; then
exit 1
 fi
 
+# if a symbollink is given, work out the typical device name, like /dev/sda
+if [ -L ${DEVICE} ];then
+   DEVICE=`readlink -f ${DEVICE}`
+fi
+
 if [ -z ${MOUNTPOINT} ] ; then
${ECHO} "ERROR: No mountpoint"
usage
-- 
2.6.6



[Ocfs2-devel] [PATCH 10/17] Multiple run: make blocksize and clustersize as parameters

2016-12-12 Thread Eric Ren

It takes too long to get the result of a round of testing. This
patch shortens the run time considerably by eliminating the two-level
loops over blocksize and clustersize. If not specified, blocksize now
defaults to 4096 and clustersize to 32768.

Signed-off-by: Eric Ren 
---
 programs/inline-data/multi-inline-run.sh | 24 
 programs/python_common/multiple_run.sh   | 28 +++-
 programs/reflink_tests/multi_reflink_test_run.sh | 28 ++--
 programs/xattr_tests/xattr-multi-run.sh  | 24 
 4 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/programs/inline-data/multi-inline-run.sh 
b/programs/inline-data/multi-inline-run.sh
index 1d51443..0a2ffa5 100755
--- a/programs/inline-data/multi-inline-run.sh
+++ b/programs/inline-data/multi-inline-run.sh
@@ -126,12 +126,14 @@ exit_or_not()
 

 f_usage()
 {
-echo "usage: `basename ${0}` [-r MPI_ranks] <-f MPI_hosts> [-a 
access_method] [-o output] <-d > "
+echo "usage: `basename ${0}` [-r MPI_ranks] <-f MPI_hosts> [-a 
access_method] [-o output] <-d > <-b blocksize> -c  
"
 echo "   -r size of MPI rank"
 echo "   -a access method for process propagation,should be ssh or 
rsh,set ssh as a default method when omited."
 echo "   -f MPI hosts list,separated by comma,e.g -f 
node1.us.oracle.com,node2.us.oracle.com."
 echo "   -o output directory for the logs"
 echo "   -d device name used for ocfs2 volume"
+echo "  -b block size"
+echo "  -c cluster size"
 echo "path of mountpoint where the ocfs2 volume 
will be mounted on."
 exit 1;
 
@@ -144,13 +146,15 @@ f_getoptions()
 exit 1
  fi
 
- while getopts "o:hd:r:a:f:" options; do
+ while getopts "o:hd:r:a:f:b:c:" options; do
 case $options in
a ) MPI_ACCESS_METHOD="$OPTARG";;
r ) MPI_RANKS="$OPTARG";;
f ) MPI_HOSTS="$OPTARG";;
 o ) LOG_OUT_DIR="$OPTARG";;
 d ) OCFS2_DEVICE="$OPTARG";;
+   b ) BLOCKSIZE="$OPTARG";;
+   c ) CLUSTERSIZE="$OPTARG";;
 h ) f_usage
 exit 1;;
 * ) f_usage
@@ -327,9 +331,21 @@ trap ' : ' SIGTERM
 
 f_setup $*
 
-for BLOCKSIZE in 512 1024 4096
+if [ -n "$BLOCKSIZE" ];then
+   bslist="$BLOCKSIZE"
+else
+   bslist="512 1024 4096"
+fi
+
+if [ -n "$CLUSTERSIZE" ];then
+   cslist="$CLUSTERSIZE"
+else
+   cslist="4096 32768 1048576"
+fi
+
+for BLOCKSIZE in $(echo "$bslist")
 do
-   for CLUSTERSIZE in  4096 32768 1048576
+   for CLUSTERSIZE in $(echo "$cslist")
do
echo "++Multiple node inline-data test with \"-b 
${BLOCKSIZE} -C ${CLUSTERSIZE}\"++" |tee -a ${RUN_LOG_FILE}
echo "++Multiple node inline-data test with \"-b 
${BLOCKSIZE} -C ${CLUSTERSIZE}\"++">>${DATA_LOG_FILE}
diff --git a/programs/python_common/multiple_run.sh 
b/programs/python_common/multiple_run.sh
index 9e2237a..b2d5800 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -72,10 +72,12 @@ set -o pipefail
 

 f_usage()
 {
-echo "usage: `basename ${0}` <-k kerneltarball> <-n nodes> [-i nic] \
+echo "usage: `basename ${0}` <-k kerneltarball> [-b blocksize] [-c 
clustersize] <-n nodes> [-i nic] \
 [-a access_method] [-o logdir] <-d device> [-t testcases] "
 echo "   -k kerneltarball should be path of tarball for kernel src."
 echo "   -n nodelist,should be comma separated."
+echo "   -b blocksize."
+echo "   -c clustersize."
 echo "   -o output directory for the logs"
 echo "   -i network interface name to be used for MPI messaging."
 echo "   -a access method for mpi execution,should be ssh or rsh"
@@ -97,13 +99,15 @@ f_getoptions()
exit 1
fi
 
-   while getopts "n:d:i:a:o:k:t:h:" options; do
+   while getopts "n:d:i:a:o:b:c:k:t:h:" options; do
case $options in
n ) NODE_LIST="$OPTARG";;
d ) DEVICE="$OPTARG";;
i ) INTERFACE="$OPTARG";;
a ) ACCESS_METHOD="$OPTARG";;

[Ocfs2-devel] [PATCH 11/17] discontig bg: make blocksize and clustersize as parameters

2016-12-12 Thread Eric Ren

Add "-b blocksize" and "-c clustersize" as optional parameters.
It will keep the original behavior if we don't specify their
values.

Signed-off-by: Eric Ren 
---
 programs/discontig_bg_test/discontig_runner.sh | 51 +-
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/programs/discontig_bg_test/discontig_runner.sh 
b/programs/discontig_bg_test/discontig_runner.sh
index bb6a53e..1d94be3 100755
--- a/programs/discontig_bg_test/discontig_runner.sh
+++ b/programs/discontig_bg_test/discontig_runner.sh
@@ -64,7 +64,8 @@ MPI_BTL_IF_ARG=
 

 function f_usage()
 {
-echo "usage: `basename ${0}` <-d device> [-o logdir] [-m multi_hosts] [-a 
access_method] "
+echo "usage: `basename ${0}` <-d device> [-o logdir] [-m multi_hosts] [-a 
access_method] \
+[-b block_size] [-c cluster_size] "
 exit 1;
 
 }
@@ -76,13 +77,15 @@ function f_getoptions()
exit 1
fi

-   while getopts "hd:o:m:a:" options; do
+   while getopts "hd:o:m:a:b:c:" options; do
case $options in
d ) DEVICE="$OPTARG";;
o ) LOG_DIR="$OPTARG";;
a ) MPI_ACCESS_METHOD="$OPTARG";;
m ) MULTI_TEST=1
MPI_HOSTS="$OPTARG";;
+b ) BLOCKSIZE="$OPTARG";;
+c ) CLUSTERSIZE="$OPTARG";;
h ) f_usage
exit 1;;
* ) f_usage
@@ -209,7 +212,7 @@ function f_inodes_test()
local filename=
 
f_LogMsg ${LOG_FILE} "Activate inode discontig-bg on ${DEVICE}"
-   ${DISCONTIG_ACTIVATE_BIN} -t inode -r 200 -b $BLOCKSIZE -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} -l ${LABELNAME} ${MOUNT_POINT} 
>>${LOG_FILE} 2>&1
+   ${DISCONTIG_ACTIVATE_BIN} -t inode -r 200 -b ${BLOCKSIZE} -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} -l ${LABELNAME} ${MOUNT_POINT} 
>>${LOG_FILE} 2>&1
RET=$?
f_exit_or_not ${RET}
 
@@ -292,7 +295,7 @@ function f_inodes_test()
f_exit_or_not ${RET}
 
f_LogMsg ${LOG_FILE} "[*] Activate inode discontig-bg on ${DEVICE}"
-   ${DISCONTIG_ACTIVATE_BIN} -t inode -r 4096 -b $BLOCKSIZE -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
+   ${DISCONTIG_ACTIVATE_BIN} -t inode -r 4096 -b ${BLOCKSIZE} -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
RET=$?
f_exit_or_not ${RET}
 
@@ -420,7 +423,7 @@ function f_extents_test()
local inc=
 
f_LogMsg ${LOG_FILE} "[*] Activate extent discontig-bg on ${DEVICE}"
-   ${DISCONTIG_ACTIVATE_BIN} -t extent -r 2048 -b $BLOCKSIZE -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
+   ${DISCONTIG_ACTIVATE_BIN} -t extent -r 2048 -b ${BLOCKSIZE} -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
RET=$?
f_exit_or_not ${RET}
 
@@ -552,7 +555,7 @@ function f_extents_test()
 function f_inline_test()
 {
f_LogMsg ${LOG_FILE} "[*] Activate inode discontig-bg on ${DEVICE}"
-   ${DISCONTIG_ACTIVATE_BIN} -t inode -r 1024 -b $BLOCKSIZE -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
+   ${DISCONTIG_ACTIVATE_BIN} -t inode -r 1024 -b ${BLOCKSIZE} -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
RET=$?
f_exit_or_not ${RET}
 
@@ -622,7 +625,7 @@ function f_inline_test()
 function f_xattr_test()
 {
f_LogMsg ${LOG_FILE} "[*] Activate extent discontig-bg on ${DEVICE}"
-   ${DISCONTIG_ACTIVATE_BIN} -t extent -r 10240 -b $BLOCKSIZE -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
+   ${DISCONTIG_ACTIVATE_BIN} -t extent -r 10240 -b ${BLOCKSIZE} -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
RET=$?
f_exit_or_not ${RET}
 
@@ -705,7 +708,7 @@ function f_refcount_test()
local inc=
 
f_LogMsg ${LOG_FILE} "[*] Activate extent discontig-bg on ${DEVICE}"
-   ${DISCONTIG_ACTIVATE_BIN} -t extent -r ${remain_space} -b $BLOCKSIZE -c 
${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
+   ${DISCONTIG_ACTIVATE_BIN} -t extent -r ${remain_space} -b ${BLOCKSIZE} 
-c ${CLUSTERSIZE} -d ${DEVICE} -o ${LOG_DIR} ${MOUNT_POINT} >>${LOG_FILE} 2>&1
RET=$?
f_exit_or_not ${RET}
 
@@ -883,7 +886,7 @@ function f_refcount_test()
 function f_dxdir_test()
 {
f_LogMsg ${LOG_FILE} "[*] Activate inode discont

[Ocfs2-devel] [PATCH 03/17] Trivial: better not to depend on where we issue testing

2016-12-12 Thread Eric Ren

If we run the tests from outside the directory where the executables
are, an error like the one below may occur:
"./config.sh No such file or directory".
So let's depend on the PATH environment variable instead.
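
A small sketch of the idea for sourced helper files: resolve them relative to the script itself rather than the current directory. The function name script_dir is an assumption made for illustration:

```shell
# Hypothetical helper: absolute directory of a script, given its "$0".
script_dir() {
    ( cd "$(dirname "$1")" && pwd )
}

# Inside a test script one would then write, for example:
#   . "$(script_dir "$0")/config.sh"
# which works no matter which directory the script is started from.
echo "directory of /usr/bin/env: $(script_dir /usr/bin/env)"
```

Executables that are merely invoked (as in racer.sh) can rely on PATH, while files that are sourced with "." need an explicit location like this.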

Signed-off-by: Eric Ren 
---
 programs/dirop_fileop_racer/racer.sh   | 48 +++---
 programs/dx_dirs_tests/index_dir_run.sh|  2 +-
 programs/dx_dirs_tests/multi_index_dir_run.sh  |  2 +-
 programs/inline-data/multi-inline-run.sh   |  2 +-
 programs/inline-data/single-inline-run.sh  |  2 +-
 .../inode_alloc_perf_tests/inode_alloc_perf.sh |  2 +-
 .../multi_inode_alloc_perf.sh  |  2 +-
 .../multi_inode_alloc_perf_runner.sh   |  2 +-
 .../iozone/iozone3_263/src/current/Generate_Graphs | 26 ++--
 programs/logwriter/enospc.sh   |  2 +-
 programs/logwriter/rename_write_race.sh|  2 +-
 programs/python_common/o2tf.sh |  2 +-
 programs/python_common/single_run-WIP.sh   |  4 +-
 programs/python_common/single_run.sh   |  2 +-
 .../tunefs-test/remove-slot/corrupt_remove_slot.sh |  2 +-
 programs/xattr_tests/xattr-multi-run.sh|  2 +-
 programs/xattr_tests/xattr-single-run.sh   |  2 +-
 17 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/programs/dirop_fileop_racer/racer.sh 
b/programs/dirop_fileop_racer/racer.sh
index 819efa8..7e83b7a 100755
--- a/programs/dirop_fileop_racer/racer.sh
+++ b/programs/dirop_fileop_racer/racer.sh
@@ -37,37 +37,37 @@ DIR="race"
 #
 
 [ -e $DIR ] || mkdir $DIR
-./file_create.sh $DIR $MAX_FILES &
-./file_create.sh $DIR $MAX_FILES &
-./file_create.sh $DIR $MAX_FILES &
+file_create.sh $DIR $MAX_FILES &
+file_create.sh $DIR $MAX_FILES &
+file_create.sh $DIR $MAX_FILES &
 
-./dir_create.sh $DIR $MAX_FILES &
-./dir_create.sh $DIR $MAX_FILES &
-./dir_create.sh $DIR $MAX_FILES &
+dir_create.sh $DIR $MAX_FILES &
+dir_create.sh $DIR $MAX_FILES &
+dir_create.sh $DIR $MAX_FILES &
 
-./file_rename.sh $DIR $MAX_FILES &
-./file_rename.sh $DIR $MAX_FILES &
-./file_rename.sh $DIR $MAX_FILES &
+file_rename.sh $DIR $MAX_FILES &
+file_rename.sh $DIR $MAX_FILES &
+file_rename.sh $DIR $MAX_FILES &
 
-./file_link.sh $DIR $MAX_FILES &
-./file_link.sh $DIR $MAX_FILES &
-./file_link.sh $DIR $MAX_FILES &
+file_link.sh $DIR $MAX_FILES &
+file_link.sh $DIR $MAX_FILES &
+file_link.sh $DIR $MAX_FILES &
 
-./file_symlink.sh $DIR $MAX_FILES &
-./file_symlink.sh $DIR $MAX_FILES &
-./file_symlink.sh $DIR $MAX_FILES &
+file_symlink.sh $DIR $MAX_FILES &
+file_symlink.sh $DIR $MAX_FILES &
+file_symlink.sh $DIR $MAX_FILES &
 
-./file_concat.sh $DIR $MAX_FILES &
-./file_concat.sh $DIR $MAX_FILES &
-./file_concat.sh $DIR $MAX_FILES &
+file_concat.sh $DIR $MAX_FILES &
+file_concat.sh $DIR $MAX_FILES &
+file_concat.sh $DIR $MAX_FILES &
 
-./file_list.sh $DIR &
-./file_list.sh $DIR &
-./file_list.sh $DIR &
+file_list.sh $DIR &
+file_list.sh $DIR &
+file_list.sh $DIR &
 
-./file_rm.sh $DIR $MAX_FILES &
-./file_rm.sh $DIR $MAX_FILES &
-./file_rm.sh $DIR $MAX_FILES &
+file_rm.sh $DIR $MAX_FILES &
+file_rm.sh $DIR $MAX_FILES &
+file_rm.sh $DIR $MAX_FILES &
 
 echo "CTRL-C to exit"
 trap "
diff --git a/programs/dx_dirs_tests/index_dir_run.sh 
b/programs/dx_dirs_tests/index_dir_run.sh
index 381d144..bbd2fdc 100755
--- a/programs/dx_dirs_tests/index_dir_run.sh
+++ b/programs/dx_dirs_tests/index_dir_run.sh
@@ -43,7 +43,7 @@
 

 # Global Variables
 

-. ./o2tf.sh
+. `dirname ${0}`/o2tf.sh
 
 BLOCKSIZE=
 CLUSTERSIZE=
diff --git a/programs/dx_dirs_tests/multi_index_dir_run.sh 
b/programs/dx_dirs_tests/multi_index_dir_run.sh
index eb72a7d..c83b9f7 100755
--- a/programs/dx_dirs_tests/multi_index_dir_run.sh
+++ b/programs/dx_dirs_tests/multi_index_dir_run.sh
@@ -41,7 +41,7 @@
 

 # Global Variables
 

-. ./o2tf.sh
+. `dirname ${0}`/o2tf.sh
 
 BLOCKSIZE=
 CLUSTERSIZE=
diff --git a/programs/inline-data/multi-inline-run.sh 
b/programs/inline-data/multi-inline-run.sh
index 30e2e6a..1d51443 100755
--- a/programs/inline-data/multi-inline-run.sh
+++ b/programs/inline-data/multi-inline-run.sh
@@ -19,7 +19,7 @@
 PATH=$PATH:/sbin  # Add /sbin to the path for ocfs2 tools
 export PATH=$PATH:.
 
-. ./config.sh
+. `dirname ${0}`/config.sh
 
 #MPIRUN="`which mpirun`"
 
diff --git a/programs/inline-data/single-inline-run.sh 
b/programs/inline-data/single-inline-run.sh
index 938f461..5a176cd 100755
--- a/programs/inlin

[Ocfs2-devel] [PATCH 04/17] Trivial: fix a typo mistake

2016-12-12 Thread Eric Ren

Signed-off-by: Eric Ren 
---
 programs/mkfs-tests/mkfs-test.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/programs/mkfs-tests/mkfs-test.sh b/programs/mkfs-tests/mkfs-test.sh
index 3fc93a4..8fdd02a 100755
--- a/programs/mkfs-tests/mkfs-test.sh
+++ b/programs/mkfs-tests/mkfs-test.sh
@@ -431,7 +431,7 @@ fi;
 echo "Test ${testnum}: -L mylabel" |tee -a ${LOGFILE}
 label="my_label_is_very_very_very_long_to_the_point_of_being_useless"
 echo -n "mkfs . " |tee -a ${LOGFILE}
-${MKFS} -x -F -b 4K -C 4K -N 1 -L ${label} ${device} 262144 >{OUt} 2>&1
+${MKFS} -x -F -b 4K -C 4K -N 1 -L ${label} ${device} 262144 >${OUT} 2>&1
 echo "OK" |tee -a ${LOGFILE}
 echo -n "verify . " |tee -a ${LOGFILE}
 ${DEBUGFS} -R "stats" ${device} >${OUT} 2>&1
-- 
2.6.6


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] [PATCH 12/17] Add two cluster-aware parameters: cluster stack and cluster name

2016-12-12 Thread Eric Ren

With pacemaker as the cluster stack, the single node test always fails
on mkfs in some testcases. On SUSE, we use the pcmk plugin as the default
cluster stack. But in the single node test, some testcases format the
ocfs2 volume as a local filesystem, in which case the o2cb plugin is
used. If the next testcase wants to format the volume with multiple
slots, it will fail because mkfs.ocfs2 cannot switch from o2cb to pcmk
automatically.

This patch should be merged into the suse branch, without affecting
other OS releases that only use o2cb.
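
A rough sketch of one way to pass the two options only when they are set. The helper name mkfs_cluster_args, the example values and the device name are assumptions; the patch itself passes both options unconditionally:

```shell
# Hypothetical helper: emit the cluster options only when both are set,
# so a plain o2cb setup keeps its old mkfs command line.
mkfs_cluster_args() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        echo "--cluster-stack=$1 --cluster-name=$2"
    fi
}

CLUSTER_STACK=pcmk          # example value
CLUSTER_NAME=testcluster    # example value
# Only print the command here; running mkfs would destroy data.
echo mkfs.ocfs2 -b 4096 -C 32768 -N 4 \
    $(mkfs_cluster_args "$CLUSTER_STACK" "$CLUSTER_NAME") /dev/sdX
```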

Signed-off-by: Eric Ren 
---
 programs/backup_super/test_backup_super.sh | 27 ++--
 .../discontig_bg_test/activate_discontig_bg.sh | 12 ++--
 programs/discontig_bg_test/discontig_runner.sh | 36 ++-
 programs/inline-data/multi-inline-run.sh   | 13 ++--
 programs/inline-data/single-inline-run.sh  | 10 ++-
 programs/mkfs-tests/mkfs-test.sh   | 46 -
 programs/python_common/multiple_run.sh | 22 ---
 programs/python_common/o2tf.sh | 16 -
 programs/python_common/single_run-WIP.sh   | 75 ++
 programs/reflink_tests/multi_reflink_test_run.sh   | 12 +++-
 programs/reflink_tests/reflink_test_run.sh |  8 ++-
 programs/tunefs-test/tunefs-test.sh| 11 ++--
 programs/xattr_tests/xattr-multi-run.sh| 15 -
 programs/xattr_tests/xattr-single-run.sh   | 13 ++--
 14 files changed, 215 insertions(+), 101 deletions(-)

diff --git a/programs/backup_super/test_backup_super.sh 
b/programs/backup_super/test_backup_super.sh
index 530109e..05da3e7 100755
--- a/programs/backup_super/test_backup_super.sh
+++ b/programs/backup_super/test_backup_super.sh
@@ -169,7 +169,8 @@ function test_mkfs()
msg1="debugfs should be sucess"
 
blkcount=`expr $vol_byte_size / $blocksize`
-   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
${DEVICE} $blkcount
+   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M \
+ --cluster-stack=${CLUSTER_STACK} --cluster-name=${CLUSTER_NAME} ${DEVICE} 
$blkcount
#first check whether mkfs is success
echo "ls //"|${DEBUGFS_BIN} ${DEVICE}|grep global_bitmap
exit_if_bad $? 0 $msg $LINENO
@@ -186,7 +187,8 @@ function test_mkfs()
 
${DD_BIN} if=/dev/zero of=$DEVICE bs=4096 count=3
clear_backup_blocks
-   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
${DEVICE} $blkcount
+   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M \
+   --cluster-stack=${CLUSTER_STACK} --cluster-name=${CLUSTER_NAME} 
${DEVICE} $blkcount
#first check whether mkfs is success
echo "ls //"|${DEBUGFS_BIN} ${DEVICE}|grep global_bitmap
exit_if_bad $? 0 $msg1 $LINENO
@@ -217,7 +219,8 @@ function test_fsck()
${DD_BIN} if=/dev/zero of=$DEVICE bs=4096 count=3
clear_backup_blocks
 
-   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
${DEVICE} $blkcount
+   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M \
+ --cluster-stack=${CLUSTER_STACK} --cluster-name=${CLUSTER_NAME} ${DEVICE} 
$blkcount
#corrupt the superblock
${DD_BIN} if=/dev/zero of=${DEVICE} bs=$blocksize count=3
${FSCK_BIN} -fy ${DEVICE}   #This should failed.
@@ -247,7 +250,8 @@ function test_tunefs_resize()
clear_backup_blocks
 
#mkfs a volume with no backup superblock
-   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
${DEVICE} $blkcount
+   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M \
+   --cluster-stack=${CLUSTER_STACK} 
--cluster-name=${CLUSTER_NAME} ${DEVICE} $blkcount
 
local bpc=`expr $clustersize / $blocksize`
local blkcount=`expr $blkcount + $bpc`
@@ -283,7 +287,8 @@ function test_tunefs_add_backup()
clear_backup_blocks
 
#mkfs a volume with no backup superblock supported
-   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
--no-backup-super ${DEVICE} $blkcount
+   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
--no-backup-super \
+ --cluster-stack=${CLUSTER_STACK} --cluster-name=${CLUSTER_NAME} ${DEVICE} 
$blkcount
 
#We can't open the volume by backup superblock now
echo "ls //"|${DEBUGFS_BIN} ${DEVICE} -s 1|grep global_bitmap
@@ -327,7 +332,8 @@ function test_tunefs_refresh()
 
local old_vol_name="old_ocfs2"
local new_vol_name="new_ocfs2"
-   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M 
-L $old_vol_name ${DEVICE} $blkcount
+   echo "y" |${MKFS_BIN} -b $blocksize -C $clustersize -N 4  -J size=64M

[Ocfs2-devel] [PATCH 14/17] Fix openmpi warning by specifying proper slot number

2016-12-12 Thread Eric Ren

The warning message below:

"
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  ./xattr-multi-test

  Either request fewer slots for your application, or make more slots
  available for use.
"

is printed when the rank number given to openmpi is larger than the
number of "mkfs" slots.
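
As an illustration of the constraint described above, a sketch that caps the requested rank count at the slot count. The helper clamp_ranks is hypothetical; the patch simply passes ${SLOTS} as the rank number:

```shell
# Hypothetical helper: never ask openmpi for more ranks than the volume
# has slots.
clamp_ranks() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

SLOTS=2    # example: volume formatted with "-N 2"
echo "ranks to request: $(clamp_ranks 4 "$SLOTS")"
```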

Signed-off-by: Eric Ren 
---
 programs/python_common/multiple_run.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/programs/python_common/multiple_run.sh 
b/programs/python_common/multiple_run.sh
index 3e52eff..74c3531 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -368,7 +368,7 @@ run_xattr_test()
local logdir=${LOG_DIR}/multi-xattr-test
 
LogRunMsg "xattr-test"
-   ${BINDIR}/xattr-multi-run.sh -r 4 -f ${NODE_LIST} -a ssh -o ${logdir} \
+   ${BINDIR}/xattr-multi-run.sh -r ${SLOTS} -f ${NODE_LIST} -a ssh -o 
${logdir} \
 -d ${DEVICE} -b ${BLOCKSIZE} -c ${CLUSTERSIZE} -s ${CLUSTER_STACK} -n 
${CLUSTER_NAME} ${MOUNT_POINT} >> ${LOGFILE} 2>&1
LogRC $?
 }
@@ -378,7 +378,7 @@ run_inline_test()
local logdir=${LOG_DIR}/multi-inline-test
 
LogRunMsg "inline-test"
-   ${BINDIR}/multi-inline-run.sh -r 2 -f ${NODE_LIST} -a ssh -o ${logdir} \
+   ${BINDIR}/multi-inline-run.sh -r ${SLOTS} -f ${NODE_LIST} -a ssh -o 
${logdir} \
 -d ${DEVICE} -b ${BLOCKSIZE} -c ${CLUSTERSIZE} -s ${CLUSTER_STACK} -n 
${CLUSTER_NAME} ${MOUNT_POINT} >> ${LOGFILE} 2>&1
LogRC $?
 }
@@ -389,7 +389,7 @@ run_reflink_test()
 
LogRunMsg "reflink-test"
LogMsg "reflink 'data=ordered' mode test"
-   ${BINDIR}/multi_reflink_test_run.sh -r 4 -f ${NODE_LIST} -a ssh -o \
+   ${BINDIR}/multi_reflink_test_run.sh -r ${SLOTS} -f ${NODE_LIST} -a ssh 
-o \
 ${logdir} -d ${DEVICE} -s ${CLUSTER_STACK} -n ${CLUSTER_NAME} ${MOUNT_POINT} 
>> ${LOGFILE} 2>&1 || {
RET=$?
LogRC $RET
-- 
2.6.6



[Ocfs2-devel] [PATCH 08/17] multiple node: pass cross_delete the right log file

2016-12-12 Thread Eric Ren

Pass cross_delete the right log file. However, openmpi should
log into config.LOGFILE, because the remote nodes only have
this common log file.

Signed-off-by: Eric Ren 
---
 programs/cross_delete/cross_delete.py  | 4 ++--
 programs/python_common/multiple_run.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/programs/cross_delete/cross_delete.py 
b/programs/cross_delete/cross_delete.py
index ec3097c..1694d51 100755
--- a/programs/cross_delete/cross_delete.py
+++ b/programs/cross_delete/cross_delete.py
@@ -305,7 +305,7 @@ o2tf.OpenMPIInit(DEBUGON, ','.join(nodelist), logfile, 
'ssh')
cmdline = os.path.join(config.BINDIR, 'crdel_gen_files.py')
ret = o2tf.openmpi_run( DEBUGON, nproc, str('%s -s %s -l %s -t %s' % \
(cmdline, stagedir,
-   options.logfile,
+   config.LOGFILE,
tarfile) ),
','.join(nodelist),
'ssh',
@@ -324,7 +324,7 @@ o2tf.OpenMPIInit(DEBUGON, ','.join(nodelist), logfile, 
'ssh')
else:
cmdline = os.path.join(config.BINDIR, 'crdel_del_files.py')
ret = o2tf.openmpi_run( DEBUGON, nproc, str('%s -s %s -l %s ' % \
-   (cmdline, stagedir, options.logfile) ),
+   (cmdline, stagedir, config.LOGFILE) ),
','.join(nodelist),
'ssh',
options.interface,
diff --git a/programs/python_common/multiple_run.sh 
b/programs/python_common/multiple_run.sh
index 4340c40..9e2237a 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -318,7 +318,7 @@ run_cross_delete_test()
local workplace=${MOUNT_POINT}/cross_delete_test
 
run_common_testcase "cross_delete" "sparse,unwritten,inline-data" \
-"${BINDIR}/cross_delete.py -c 1 -i ${INTERFACE} -d ${workplace} -n 
${NODE_LIST} -t ${KERNELSRC}"
+"${BINDIR}/cross_delete.py -c 1 -l ${logfile} -i ${INTERFACE} -d ${workplace} 
-n ${NODE_LIST} -t ${KERNELSRC}"
 }
 
 run_write_append_truncate_test()
-- 
2.6.6



[Ocfs2-devel] [PATCH 07/17] lvb_torture: failed when pcmk is used as cluster stack

2016-12-12 Thread Eric Ren

This test case fails with the "pcmk" stack. It outputs errors
like:
"rank 1: /dlm/ has no write permission."
"rank 1: o2dlm_initialize failed: -1485330936".

Signed-off-by: Eric Ren 
---
 programs/lvb_torture/lvb_torture.c | 110 -
 programs/python_common/multiple_run.sh |  12 +++-
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/programs/lvb_torture/lvb_torture.c 
b/programs/lvb_torture/lvb_torture.c
index 1459849..93a29ec 100644
--- a/programs/lvb_torture/lvb_torture.c
+++ b/programs/lvb_torture/lvb_torture.c
@@ -180,6 +180,101 @@ static void run_test(struct o2dlm_ctxt *dlm, char *lockid)
}
 }
 
+/*
+ * Copied from run_test(), this is an ugly but straightforward workaround.
+ * "fsdlm" is used when using pcmk as cluster stack, which only supports
+ * 32-bit lvb so far.
+ */
+static void run_test_fsdlm(struct o2dlm_ctxt *dlm, char *lockid)
+{
+   unsigned long long iter = 0;
+   unsigned long long expected, to_write = 0;
+   int ret;
+   unsigned int read, written;
+   errcode_t err;
+   enum o2dlm_lock_level level;
+   __u32 lvb;
+
+   while (iter < max_iter && !caught_sig) {
+   expected = iter;
+
+   if ((iter % num_procs) == rank)
+   level = O2DLM_LEVEL_EXMODE;
+   else
+   level = O2DLM_LEVEL_PRMODE;
+
+   if (level == O2DLM_LEVEL_PRMODE) {
+   ret = MPI_Barrier(MPI_COMM_WORLD);
+   if (ret != MPI_SUCCESS)
+   rprintf(rank, "read MPI_Barrier failed: %d\n", 
ret);
+   err = o2dlm_lock(dlm, lockid, 0, level);
+   if (err)
+   rprintf(rank, "o2dlm_lock failed: %d\n", err);
+
+   expected++;
+   } else {
+   err = o2dlm_lock(dlm, lockid, 0, level);
+   if (err)
+   rprintf(rank, "o2dlm_lock failed: %d\n", err);
+
+   ret = MPI_Barrier(MPI_COMM_WORLD);
+   if (ret != MPI_SUCCESS)
+   rprintf(rank, "read MPI_Barrier failed: %d\n", 
ret);
+   to_write = iter + 1;
+   }
+
+   err = o2dlm_read_lvb(dlm, lockid, (char *)&lvb, sizeof(lvb),
+&read);
+   if (err)
+   rprintf(rank, "o2dlm_read_lvb failed: %d\n", err);
+
+   lvb = be32_to_cpu(lvb);
+
+   if (level == O2DLM_LEVEL_PRMODE)
+   printf("%s: read  iter: %llu, lvb: %llu exp: %llu\n",
+  hostname, (unsigned long long)iter,
+  (unsigned long long)lvb,
+  (unsigned long long)expected);
+   else
+   printf("%s: write iter: %llu, lvb: %llu wri: %llu\n",
+  hostname, (unsigned long long)iter,
+  (unsigned long long)lvb,
+  (unsigned long long)to_write);
+
+   fflush(stdout);
+
+   if (lvb != expected) {
+   printf("Test failed! %s: rank %d, read lvb %llu, 
expected %llu\n",
+  hostname, rank, (unsigned long long) lvb,
+  (unsigned long long) expected);
+   MPI_Abort(MPI_COMM_WORLD, 1);
+   }
+
+   if (level == O2DLM_LEVEL_EXMODE) {
+   lvb = cpu_to_be32(to_write);
+
+   err = o2dlm_write_lvb(dlm, lockid, (char *)&lvb,
+ sizeof(lvb), &written);
+   if (err)
+   rprintf(rank, "o2dlm_write_lvb failed: %d\n", 
err);
+   if (written != sizeof(lvb))
+   rprintf(rank, "o2dlm_write_lvb() wrote %d, we 
asked for %d\n", written, sizeof(lvb));
+   }
+
+   err = o2dlm_unlock(dlm, lockid);
+   if (err)
+   rprintf(rank, "o2dlm_unlock failed: %d\n", err);
+
+   /* This second barrier is not necessary and can be
+* commented out to ramp the test up */
+   ret = MPI_Barrier(MPI_COMM_WORLD);
+   if (ret != MPI_SUCCESS)
+   rprintf(rank, "unlock MPI_Barrier failed: %d\n", ret);
+
+   iter++;
+   }
+}
+
 static void clear_lock(struct o2dlm_ctxt *dlm, char *lockid)
 {
char empty[O2DLM_LOCK_ID_MAX_LEN];
@@ -363,8 +458,7 @@ int main(int argc, char *argv[])
 
 printf("%s: rank: %d, nodes: %d, dlm:

[Ocfs2-devel] [PATCH 17/17] discontig bg: give single and multiple node test different log file name

2016-12-12 Thread Eric Ren

Signed-off-by: Eric Ren 
---
 programs/discontig_bg_test/discontig_runner.sh | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/programs/discontig_bg_test/discontig_runner.sh 
b/programs/discontig_bg_test/discontig_runner.sh
index 182ca3a..f3a69f5 100755
--- a/programs/discontig_bg_test/discontig_runner.sh
+++ b/programs/discontig_bg_test/discontig_runner.sh
@@ -164,8 +164,13 @@ function f_setup()
LOG_DIR=${LOG_DIR:-$DEFAULT_LOG_DIR}
${MKDIR_BIN} -p ${LOG_DIR} || exit 1

-   RUN_LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date +%F-%H-\
-%M-%S`-discontig-bg-run.log"
+   if [ -n "${MULTI_TEST}" ];then
+   RUN_LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date 
+%F-%H-\
+%M-%S`-discontig-bg-multiple-run.log"
+   else
+   RUN_LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date 
+%F-%H-\
+%M-%S`-discontig-bg-single-run.log"
+   fi
LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date +%F-%H-\
 %M-%S`-discontig-bg.log"
 PUNCH_LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date 
+%F-%H-\
-- 
2.6.6



[Ocfs2-devel] [PATCH 09/17] Single run: make blocksize and clustersize as parameters

2016-12-12 Thread Eric Ren

A full round of testing takes too long to produce results. This
shortens it considerably by eliminating the 2-layer loops over blocksize
and clustersize. Now blocksize defaults to 4096 and clustersize
to 32768 if not specified.
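
To make the saving concrete, a small sketch that counts the test rounds for a given pair of size lists. The helper count_rounds is hypothetical; the full lists are the ones used by normal_test() in test_backup_super.sh before this patch:

```shell
# Hypothetical helper: number of mkfs/test rounds for two size lists.
count_rounds() {
    n=0
    for bs in $1; do
        for cs in $2; do
            n=$((n + 1))
        done
    done
    echo "$n"
}

echo "full matrix: $(count_rounds "512 4096" "4096 32768 1048576") rounds"
echo "with -b/-c:  $(count_rounds "4096" "32768") round"
```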

Signed-off-by: Eric Ren 
---
 programs/backup_super/test_backup_super.sh |  29 +++-
 programs/inline-data/single-inline-run.sh  |  24 ++-
 programs/mkfs-tests/mkfs-test.sh   |  27 +++-
 programs/python_common/single_run-WIP.sh   | 247 +
 programs/reflink_tests/reflink_test_run.sh |  26 ++-
 programs/tunefs-test/tunefs-test.sh|  28 +++-
 programs/xattr_tests/xattr-single-run.sh   |  26 ++-
 7 files changed, 308 insertions(+), 99 deletions(-)

diff --git a/programs/backup_super/test_backup_super.sh 
b/programs/backup_super/test_backup_super.sh
index ed7c94b..530109e 100755
--- a/programs/backup_super/test_backup_super.sh
+++ b/programs/backup_super/test_backup_super.sh
@@ -49,6 +49,9 @@ LOGFILE=""
 FIRST_BACKUP_OFF=1073741824#1G
 MAX_NUM=6
 
+blocksize=
+clustersize=
+
 #
 # usageDisplay help information and exit.
 #
@@ -65,11 +68,13 @@ function usage()
  --with-mkfs=PROGRAM  use the PROGRAM as fswreck
  --with-debugfs=PROGRAMuse the PROGRAM as mkfs.ocfs2
  --with-tunefs=PROGRAMuse the PROGRAM as tunefs.ocfs2
+ --block-size=blocksize   block size
+ --cluster-size=clustersize   cluster size
 
Examples:
 
- $script --with-debugfs=../debugfs.ocfs2/debugfs.ocfs2 /dev/sde2
- $script --with-mkfs=/sbin/mkfs.ocfs2 --log-dir=/tmp /dev/sde2
+ $script --with-debugfs=../debugfs.ocfs2/debugfs.ocfs2 
--block-size=4096 --cluster-size=32768 /dev/sde2
+ $script --with-mkfs=/sbin/mkfs.ocfs2 --log-dir=/tmp --block-size=4096 
--cluster-size=32768 /dev/sde2
EOF
 }
 
@@ -376,10 +381,20 @@ function volume_small_test()
 ##
 function normal_test()
 {
-   for blocksize in 512 4096
+   if [ "$blocksize" != "NONE" ];then
+   bslist="$blocksize"
+   else
+   bslist="512 4096"
+   fi
+   if [ "$clustersize" != "NONE" ];then
+   cslist="$clustersize"
+   else
+   cslist="4096 32768 1048576"
+   fi
+   for blocksize in $(echo "$bslist")
do
for clustersize in \
-   4096 32768 1048576
+   $(echo "$cslist")
do
 
vol_byte_size=$FIRST_BACKUP_OFF
@@ -462,6 +477,12 @@ do
"--with-tunefs="*)
TUNEFS_BIN="${1#--with-tunefs=}"
;;
+   "--block-size="*)
+   blocksize="${1#--block-size=}"
+   ;;
+   "--cluster-size="*)
+   clustersize="${1#--cluster-size=}"
+   ;;
*)
DEVICE="$1"
;;
diff --git a/programs/inline-data/single-inline-run.sh 
b/programs/inline-data/single-inline-run.sh
index 5a176cd..89b2f4c 100755
--- a/programs/inline-data/single-inline-run.sh
+++ b/programs/inline-data/single-inline-run.sh
@@ -105,8 +105,10 @@ exit_or_not()
 

 f_usage()
 {
-echo "usage: `basename ${0}` [-o output] <-d > "
+echo "usage: `basename ${0}` [-o output] <-b blocksize> <-c clustersize> 
<-d > "
 echo "   -o output directory for the logs"
+echo "   -b blocksize"
+echo "   -c clustersize"
 echo "   -d device name used for ocfs2 volume"
 echo "path of mountpoint where the ocfs2 volume 
will be mounted on."
 exit 1;
@@ -120,10 +122,12 @@ f_getoptions()
 exit 1
  fi
 
- while getopts "o:hd:" options; do
+ while getopts "o:hd:b:c:" options; do
 case $options in
 o ) LOG_OUT_DIR="$OPTARG";;
 d ) OCFS2_DEVICE="$OPTARG";;
+b ) BLOCKSIZE="$OPTARG";;
+c ) CLUSTERSIZE="$OPTARG";;
 h ) f_usage
 exit 1;;
 * ) f_usage
@@ -132,7 +136,6 @@ f_getoptions()
 done
 shift $(($OPTIND -1))
 MOUNT_POINT=${1}
-
 }
 
 f_setup()
@@ -373,9 +376,20 @@ trap ' : ' SIGTERM
 
 f_setup $*
 
-for BLOCKSIZE in 512 1024 4096
+if [ "$BLOCKSIZE" != "NONE" ];then
+   bslist="$BLOCKSIZE"
+else
+   bslist="512 1024 4096"
+fi
+if [ "$CLUSTERSIZE" != "NONE" ];then
+   cslist="$CLUSTERSIZE"
+el

[Ocfs2-devel] [PATCH 13/17] Save punch_hole details into logfile for debugging convenience

2016-12-12 Thread Eric Ren

Signed-off-by: Eric Ren 
---
 programs/discontig_bg_test/discontig_runner.sh | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/programs/discontig_bg_test/discontig_runner.sh 
b/programs/discontig_bg_test/discontig_runner.sh
index 3be39c8..4c13adb 100755
--- a/programs/discontig_bg_test/discontig_runner.sh
+++ b/programs/discontig_bg_test/discontig_runner.sh
@@ -41,6 +41,7 @@ DEFAULT_LOG_DIR=${O2TDIR}/log
 LOG_DIR=
 RUN_LOG_FILE=
 LOG_FILE=
+PUNCH_LOG_FILE=
 
 BLOCKSIZE=
 CLUSTERSIZE=
@@ -162,6 +163,8 @@ function f_setup()
 %M-%S`-discontig-bg-run.log"
LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date +%F-%H-\
 %M-%S`-discontig-bg.log"
+PUNCH_LOG_FILE="`dirname ${LOG_DIR}`/`basename ${LOG_DIR}`/`date 
+%F-%H-\
+%M-%S`-punch-hole.log"
 
 }
 
@@ -529,7 +532,7 @@ function f_extents_test()
recs_in_blk=$(((${BLOCKSIZE}-64)/16))
while :;do
if [ "$((${RANDOM}%2))" -eq "0" ];then
-   ${PUNCH_HOLE_BIN} -f ${filename} -s ${offset} -l 
$((${CLUSTERSIZE}*${recs_in_blk})) >>/dev/null 2>&1 || {
+   ${PUNCH_HOLE_BIN} -f ${filename} -s ${offset} -l 
$((${CLUSTERSIZE}*${recs_in_blk})) >>${PUNCH_LOG_FILE} 2>&1 || {
f_LogMsg ${LOG_FILE} "Punch hole at 
offset:${offset} failed."
return 1
}
@@ -798,13 +801,13 @@ function f_refcount_test()
fi
while :;do
if [ "$((${RANDOM}%2))" -eq "0" ];then
-   ${PUNCH_HOLE_BIN} -f ${orig_filename} -s ${offset} -l 
${CLUSTERSIZE} >>/dev/null 2>&1 || {
+   ${PUNCH_HOLE_BIN} -f ${orig_filename} -s ${offset} -l 
${CLUSTERSIZE} >>${PUNCH_LOG_FILE} 2>&1 || {
f_LogMsg ${LOG_FILE} "Punch hole at 
offset:${offset} failed on ${orig_filename}."
return 1
}
fi
if [ "$((${RANDOM}%2))" -eq "1" ];then
-   ${PUNCH_HOLE_BIN} -f ${ref_filename} -s ${offset} -l 
${CLUSTERSIZE} >>/dev/null 2>&1 || {
+   ${PUNCH_HOLE_BIN} -f ${ref_filename} -s ${offset} -l 
${CLUSTERSIZE} >>${PUNCH_LOG_FILE} 2>&1 || {
f_LogMsg ${LOG_FILE} "Punch hole at 
offset:${offset} failed on ${ref_filename}."
return 1
}
@@ -822,14 +825,14 @@ function f_refcount_test()
recs_in_blk=$(((${BLOCKSIZE}-64)/16))
while :;do
if [ "$((${RANDOM}%2))" -eq "0" ];then
-   ${PUNCH_HOLE_BIN} -f ${orig_filename} -s ${offset} -l 
$((${CLUSTERSIZE}*${recs_in_blk})) >>/dev/null 2>&1 || {
+   ${PUNCH_HOLE_BIN} -f ${orig_filename} -s ${offset} -l 
$((${CLUSTERSIZE}*${recs_in_blk})) >>${PUNCH_LOG_FILE} 2>&1 || {
f_LogMsg ${LOG_FILE} "Punch hole at 
offset:${offset} failed on ${orig_filename}."
return 1
}
fi
 
if [ "$((${RANDOM}%2))" -eq "1" ];then
-   ${PUNCH_HOLE_BIN} -f ${ref_filename} -s ${offset} -l 
$((${CLUSTERSIZE}*${recs_in_blk})) >>/dev/null 2>&1 || {
+   ${PUNCH_HOLE_BIN} -f ${ref_filename} -s ${offset} -l 
$((${CLUSTERSIZE}*${recs_in_blk})) >>${PUNCH_LOG_FILE} 2>&1 || {
f_LogMsg ${LOG_FILE} "Punch hole at 
offset:${offset} failed on ${ref_filename}."
return 1
}
-- 
2.6.6



[Ocfs2-devel] [PATCH 05/17] Trivial: fix checking empty return value

2016-12-12 Thread Eric Ren

We currently get the error below even if the "reserve space"
testcase succeeds:
"Error in log_end()"
This is because we passed an empty value to log_end.
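
A sketch of why the empty value triggers the error. The log_end below is a simplified stand-in written for this illustration, assuming the real function rejects a call without arguments:

```shell
# Simplified stand-in for log_end(); the real function does more.
log_end() {
    if [ "$#" -lt 1 ]; then
        echo "Error in log_end()"
        return 1
    fi
    if [ "$1" -eq 0 ]; then
        echo "PASS"
    else
        echo "FAIL"
    fi
}

RC=
# An unquoted empty variable expands to no argument at all.
log_end ${RC} || true

true                # stands in for "do_umount ${mountpoint}"
RC=$?
log_end ${RC}
```

Capturing "$?" right after the last command, as the one-line fix does, guarantees that log_end always receives an argument.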

Signed-off-by: Eric Ren 
---
 programs/python_common/single_run-WIP.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/programs/python_common/single_run-WIP.sh 
b/programs/python_common/single_run-WIP.sh
index 99f24cc..d474463 100755
--- a/programs/python_common/single_run-WIP.sh
+++ b/programs/python_common/single_run-WIP.sh
@@ -781,6 +781,7 @@ run_reserve_space()
done
 
do_umount ${mountpoint}
+   RC=$?
 
log_end ${RC}
done
-- 
2.6.6



[Ocfs2-devel] [PATCH 02/17] Single Run: kernel building is little broken now

2016-12-12 Thread Eric Ren

Only check for the kernel source if the "buildkernel" test case is
specified. The original kernel source link can no longer be reached,
so use a new link instead; note that the md5sum check is missing
for now.
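
The condition can be sketched as a small predicate over the comma-separated testcase list. The function name needs_kernel_source and the example list are assumptions made for illustration:

```shell
# Hypothetical helper: succeed only if the comma-separated testcase list
# contains "buildkernel" or "all".
needs_kernel_source() {
    for tc in $(echo "$1" | sed "s:,: :g"); do
        if [ "$tc" = "buildkernel" ] || [ "$tc" = "all" ]; then
            return 0
        fi
    done
    return 1
}

TESTCASES="create_and_open,mmap"    # example list
if needs_kernel_source "$TESTCASES"; then
    echo "kernel tarball required"
else
    echo "kernel tarball not needed"
fi
```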

Signed-off-by: Eric Ren 
---
 programs/python_common/single_run-WIP.sh | 56 
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/programs/python_common/single_run-WIP.sh 
b/programs/python_common/single_run-WIP.sh
index fe0056c..61008d8 100755
--- a/programs/python_common/single_run-WIP.sh
+++ b/programs/python_common/single_run-WIP.sh
@@ -20,9 +20,9 @@ WGET=`which wget`
 WHOAMI=`which whoami`
 SED=`which sed`
 
-DWNLD_PATH="http://oss.oracle.com/~smushran/ocfs2-test";
-KERNEL_TARBALL="linux-kernel.tar.gz"
-KERNEL_TARBALL_CHECK="${KERNEL_TARBALL}.md5sum"
+DWNLD_PATH="https://cdn.kernel.org/pub/linux/kernel/v3.x/";
+KERNEL_TARBALL="linux-3.2.80.tar.xz"
+#KERNEL_TARBALL_CHECK="${KERNEL_TARBALL}.md5sum"
 USERID=`${WHOAMI}`
 
 DEBUGFS_BIN="${SUDO} `which debugfs.ocfs2`"
@@ -85,7 +85,7 @@ get_bits()
 # get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL $KERNEL_TARBALL_CHECK
 get_kernel_source()
 {
-   if [ "$#" -lt "4" ]; then
+   if [ "$#" -lt "3" ]; then
${ECHO} "Error in get_kernel_source()"
exit 1
fi
@@ -93,18 +93,18 @@ get_kernel_source()
logdir=$1
dwnld_path=$2
kernel_tarball=$3
-   kernel_tarball_check=$4
+   #kernel_tarball_check=$4
 
cd ${logdir}
 
outlog=get_kernel_source.log
 
-   ${WGET} -o ${outlog} ${dwnld_path}/${kernel_tarball_check}
-   if [ $? -ne 0 ]; then
-   ${ECHO} "ERROR downloading 
${dwnld_path}/${kernel_tarball_check}"
-   cd -
-   exit 1
-   fi
+#  ${WGET} -o ${outlog} ${dwnld_path}/${kernel_tarball_check}
+#  if [ $? -ne 0 ]; then
+#  ${ECHO} "ERROR downloading 
${dwnld_path}/${kernel_tarball_check}"
+#  cd -
+#  exit 1
+#  fi
 
${WGET} -a ${outlog} ${dwnld_path}/${kernel_tarball}
if [ $? -ne 0 ]; then
@@ -113,13 +113,13 @@ get_kernel_source()
exit 1
fi
 
-   ${MD5SUM} -c ${kernel_tarball_check} >>${outlog} 2>&1
-   if [ $? -ne 0 ]; then
-   ${ECHO} "ERROR ${kernel_tarball_check} check failed"
-   cd -
-   exit 1
-   fi
-   cd -
+#  ${MD5SUM} -c ${kernel_tarball_check} >>${outlog} 2>&1
+#  if [ $? -ne 0 ]; then
+#  ${ECHO} "ERROR ${kernel_tarball_check} check failed"
+#  cd -
+#  exit 1
+#  fi
+#  cd -
 }
 
 # do_format() ${BLOCKSIZE} ${CLUSTERSIZE} ${FEATURES} ${DEVICE}
@@ -1012,16 +1012,6 @@ LOGFILE=${LOGDIR}/single_run.log
 
 do_mkdir ${LOGDIR}
 
-if [ -z ${KERNELSRC} ]; then
-   get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL 
$KERNEL_TARBALL_CHECK
-   KERNELSRC=${LOGDIR}/${KERNEL_TARBALL}
-fi
-
-if [ ! -f ${KERNELSRC} ]; then
-   ${ECHO} "No kernel source"
-   usage
-fi
-
 STARTRUN=$(date +%s)
 log_message "*** Start Single Node test ***"
 
@@ -1058,6 +1048,16 @@ for tc in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; do
fi
 
if [ "$tc"X = "buildkernel"X -o "$tc"X = "all"X ];then
+   if [ -z ${KERNELSRC} ]; then
+   get_kernel_source $LOGDIR $DWNLD_PATH $KERNEL_TARBALL #$KERNEL_TARBALL_CHECK
+   KERNELSRC=${LOGDIR}/${KERNEL_TARBALL}
+   fi
+
+   if [ ! -f ${KERNELSRC} ]; then
+   ${ECHO} "No kernel source"
+   usage
+   fi
+
run_buildkernel ${LOGDIR} ${DEVICE} ${MOUNTPOINT} ${KERNELSRC}
fi
 
-- 
2.6.6
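
The last two hunks above move the kernel-source fetch inside the `buildkernel` branch, so runs that select other testcases no longer trigger a download. A minimal sketch of that decision, assuming `TESTCASES` is a comma-separated list as in the script; `needs_kernel_source` is a hypothetical name used only for illustration and does not exist in ocfs2-test:

```sh
#!/bin/sh
# Hypothetical helper, not part of ocfs2-test: succeed (return 0) only when
# the comma-separated testcase list in $1 selects "buildkernel" or "all",
# the only cases in which the kernel tarball is needed.
needs_kernel_source()
{
	# Same comma-to-space split as the script's main testcase loop.
	for tc in $(echo "$1" | sed "s:,: :g"); do
		case "${tc}" in
			buildkernel|all) return 0 ;;
		esac
	done
	return 1
}
```

A caller could guard the download with `if needs_kernel_source "${TESTCASES}"; then ...; fi`, which has the same effect as nesting the fetch under the `buildkernel` test.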



[Ocfs2-devel] [PATCH 06/17] multi_mmap: make log messages go to right place

2016-12-12 Thread Eric Ren

The "--logfile" option is currently missing, so log
messages go into "o2t.log", which is an apparent
mistake.

Signed-off-by: Eric Ren 
---
 programs/python_common/multiple_run.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/programs/python_common/multiple_run.sh b/programs/python_common/multiple_run.sh
index c4a7da9..2e0ec72 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -339,7 +339,7 @@ run_multi_mmap_test()
local testfile=${workplace}/multi_mmap_test_file
 
run_common_testcase "multi_mmap" "sparse,unwritten,inline-data" \
-"${BINDIR}/run_multi_mmap.py -i 2 -I ${INTERFACE} -n ${NODE_LIST} -c -b 6000 --hole -f ${testfile}"
+"${BINDIR}/run_multi_mmap.py -i 2 -I ${INTERFACE} -l ${logfile} -n ${NODE_LIST} -c -b 6000 --hole -f ${testfile}"
 }
 
 run_create_racer_test()
-- 
2.6.6



[Ocfs2-devel] [PATCH 01/17] ocfs2 test: correct the check on testcase if supported

2016-12-12 Thread Eric Ren

Signed-off-by: Eric Ren 
---
 programs/python_common/multiple_run.sh   | 2 +-
 programs/python_common/single_run-WIP.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/programs/python_common/multiple_run.sh b/programs/python_common/multiple_run.sh
index dd9603f..c4a7da9 100755
--- a/programs/python_common/multiple_run.sh
+++ b/programs/python_common/multiple_run.sh
@@ -201,7 +201,7 @@ f_setup()
fi
 
SUPPORTED_TESTCASES="all xattr inline reflink write_append_truncate multi_mmap create_racer flock_unit cross_delete open_delete lvb_torture"
-   for cas in ${TESTCASES}; do
+   for cas in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; do
echo ${SUPPORTED_TESTCASES} | grep -sqw $cas
if [ $? -ne 0 ]; then
echo "testcase [${cas}] not supported."
diff --git a/programs/python_common/single_run-WIP.sh b/programs/python_common/single_run-WIP.sh
index 5a8fae1..fe0056c 100755
--- a/programs/python_common/single_run-WIP.sh
+++ b/programs/python_common/single_run-WIP.sh
@@ -997,7 +997,7 @@ fi
 SUPPORTED_TESTCASES="all create_and_open directaio fillverifyholes renamewriterace aiostress\
   filesizelimits mmaptruncate buildkernel splice sendfile mmap reserve_space inline xattr\
   reflink mkfs tunefs backup_super filecheck"
-for cas in ${TESTCASES}; do
+for cas in `${ECHO} ${TESTCASES} | ${SED} "s:,: :g"`; do
echo ${SUPPORTED_TESTCASES} | grep -sqw $cas
if [ $? -ne 0 ]; then
echo "testcase [${cas}] not supported."
-- 
2.6.6
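
The check this patch corrects can be sketched in isolation. This is a minimal sketch, assuming the testcase list arrives comma-separated as in the hunks above; the function name `check_testcases` and its argument order are hypothetical and not part of ocfs2-test:

```sh
#!/bin/sh
# Hypothetical helper, not part of ocfs2-test: validate a comma-separated
# testcase list ($2) against a space-separated list of supported names ($1).
check_testcases()
{
	supported=$1
	requested=$2

	# Split on commas first. Without this step a value such as
	# "xattr,inline" is looked up as one word and wrongly rejected.
	for cas in $(echo "${requested}" | sed "s:,: :g"); do
		if ! echo "${supported}" | grep -sqw -- "${cas}"; then
			echo "testcase [${cas}] not supported."
			return 1
		fi
	done
	return 0
}
```

For example, `check_testcases "all xattr inline reflink" "xattr,inline"` succeeds, while a list containing an unknown name prints the "not supported" message and returns non-zero.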



[Ocfs2-devel] [PATCH 00/17] ocfs2-test: misc improvements and trivial fixes

2016-12-12 Thread Eric Ren

- Misc trivial fixes:

[PATCH 01/17] ocfs2 test: correct the check on testcase if supported
[PATCH 02/17] Single Run: kernel building is little broken now
[PATCH 03/17] Trivial: better not to depend on where we issue testing
[PATCH 04/17] Trivial: fix a typo mistake
[PATCH 05/17] Trivial: fix checking empty return value
[PATCH 06/17] multi_mmap: make log messages go to right place
[PATCH 07/17] lvb_torture: failed when pcmk is used as cluster stack
[PATCH 08/17] multiple node: pass cross_delete the right log file

- These patches add two more parameters, blocksize and clustersize, for
kicking off a test run, which shortens the run time of a testing round.
The old behavior is kept if they are not specified.

[PATCH 09/17] Single run: make blocksize and clustersize as
[PATCH 10/17] Multiple run: make blocksize and clustersize as
[PATCH 11/17] discontig bg: make blocksize and clustersize as

- This patch reflects the mkfs.ocfs2 change that added "--cluster-stack" and
"--cluster-name".

[PATCH 12/17] Add two cluster-aware parameters: cluster stack and cluster name

- More misc trivial fixes:

[PATCH 13/17] Save punch_hole details into logfile for debugging
[PATCH 14/17] Fix openmpi warning by specifying proper slot number
[PATCH 15/17] Handle the case when a symbolic link device is given
[PATCH 16/17] inline data: fix build error
[PATCH 17/17] discontig bg: give single and multiple node test

Comments and questions are, as always, welcome.

Thanks,
Eric



Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-11 Thread Eric Ren

84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
>> [  247.834748] RIP  [] ocfs2_truncate_file+0x640/0x6c0
>> [ocfs2]
>> [  247.834774]  RSP 
>> "
>>
>> It's because ocfs2_inode_lock() gets us a stale LVB in which the i_size is not
>> equal to the disk i_size. We mistakenly trust the LVB because the underlying
>> fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for
>> us. But why?
>>
>> The current code tries to downconvert the lock without the DLM_LKF_VALBLK
>> flag, to tell o2cb not to update the RSB's LVB if it's a PR->NULL conversion,
>> even if the lock resource type needs LVB. This is not the right way for
>> fsdlm.
>>
>> The fsdlm plugin behaves differently on DLM_LKF_VALBLK: it depends on
>> DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If DLM_LKF_VALBLK
>> is not set, fsdlm will skip both recovering the RSB's LVB from this lkb and
>> setting DLM_SBF_VALNOTVALID appropriately when a node failure happens.
>>
>> The following diagram briefly illustrates how this crash happens:
>>
>> RSB1 is an inode metadata lock resource with LOCK_TYPE_USES_LVB;
>>
>> The 1st round:
>>
>>              Node1                                    Node2
>> RSB1: PR
>>                                                   RSB1(master): NULL->EX
>> ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
>>   ocfs2_dlm_lock(no DLM_LKF_VALBLK)
>>
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>
>> dlm_lock(no DLM_LKF_VALBLK)
>>   convert_lock(overwrite lkb->lkb_exflags
>>                with no DLM_LKF_VALBLK)
>>
>> RSB1: NULL                                        RSB1: EX
>>                                                   reset Node2
>> dlm_recover_rsbs()
>>   recover_lvb()
>>
>> /* The LVB is not trustable if the node with EX fails and
>>  * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
>>  */
>>
>>  if(!(lkb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance
>>            return;                    * to invalidate the LVB here.
>>                                       */
>>
>> The 2nd round:
>>
>>          Node 1                                Node2
>> RSB1(become master from recovery)
>>
>> ocfs2_setattr()
>>   ocfs2_inode_lock(NULL->EX)
>>     /* dlm_lock() returns the stale LVB without setting DLM_SBF_VALNOTVALID */
>>     ocfs2_meta_lvb_is_trustable() returns 1 /* so we don't refresh the inode from disk */
>>   ocfs2_truncate_file()
>>       mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */
>>
>> The fix is quite straightforward. We keep setting the DLM_LKF_VALBLK flag for
>> dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is used.
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/dlmglue.c   | 10 ++
>>   fs/ocfs2/stackglue.c |  6 ++
>>   fs/ocfs2/stackglue.h |  3 +++
>>   3 files changed, 19 insertions(+)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 83d576f..77d1632 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -3303,6 +3303,16 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
>>  mlog(ML_BASTS, "lockres %s, level %d => %d\n", lockres->l_name,
>>   lockres->l_level, new_level);
>>   
>> +/*
>> + * On DLM_LKF_VALBLK, fsdlm behaves differently with o2cb. It always
>> + * expects DLM_LKF_VALBLK being set if the LKB has LVB, so that
>> + * we can recover correctly from node failure. Otherwise, we may get
>> + * invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
>> + */
>> +if (!ocfs2_is_o2cb_active() &&
>> +lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
>> +lvb = 1;
>> +
>>  if (lvb)
>>  dlm_flags |= DLM_LKF_VALBLK;
>>   
>> diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
>> index 52c07346b..8203590 100644
>> --- a/fs/ocfs2/stackglue.c
>> +++ b/fs/ocfs2/stackglue.c
>> @@ -48,6 +48,12 @@ static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] = "/sbin/ocfs2_hb_ctl";
>>*/
>>   static struct ocfs2_stack_plugin *active_stack;
>>   
>> +inline int ocfs2_is_o2cb_active(void)
>> +{
>> +return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
>> +}
>> +EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
>> +
>>   static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
>>   {
>>  struct ocfs2_stack_plugin *p;
>> diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
>> index f2dce10..e3036e1 100644
>> --- a/fs/ocfs2/stackglue.h
>> +++ b/fs/ocfs2/stackglue.h
>> @@ -298,6 +298,9 @@ void ocfs2_stack_glue_set_max_proto_version(struct ocfs2_protocol_version *max_p
>>   int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
>>   void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
>>   
>> +/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
>> +int ocfs2_is_o2cb_active(void);
>> +
>>   extern struct kset *ocfs2_kset;
>>   
>>   #endif  /* STACKGLUE_H */
>> -- 
>> 2.6.6

Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren

Sorry, this email was not delivered to Mark because a stray character
somehow ended up trailing his email address.

I will resend it later.

Thanks,
Eric

On 12/09/2016 05:24 PM, Eric Ren wrote:
> The crash happens rather often when we reset some cluster
> nodes while nodes contend fiercely to do truncate and append.
>
> The crash backtrace is below:
> "
> [  245.197849] dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 
> locks on 971 resources
> [  245.197859] dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 
> generation 5 done: 4 ms
> [  245.198379] ocfs2: Begin replay journal (node 318952601, slot 2) on device 
> (253,18)
> [  247.272338] ocfs2: End replay journal (node 318952601, slot 2) on device 
> (253,18)
> [  247.547084] ocfs2: Beginning quota recovery on device (253,18) for slot 2
> [  247.683263] ocfs2: Finishing quota recovery on device (253,18) for slot 2
> [  247.833022] (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug 
> expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
> [  247.833029] (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 
> 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
> [  247.833074] [ cut here ]
> [  247.833077] kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
> [  247.833079] invalid opcode:  [#1] SMP
> [  247.833081] Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) 
> ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse 
> sd_modiscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet 
> iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr 
> parport  joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button 
> processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix  
>  drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt 
> fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio 
> ehci_hcd   usbcore serio_raw usb_common sg dm_multipath dm_mod 
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> [  247.833107] Supported: No, Unsupported modules are loaded
> [  247.833110] CPU: 1 PID: 30154 Comm: truncate Tainted: G   OE   N  
> 4.4.21-69-default #1
> [  247.833111] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
> [  247.833112] task: 88004ff6d240 ti: 880074e68000 task.ti: 
> 880074e68000
> [  247.833113] RIP: 0010:[]  [] 
> ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
> [  247.833151] RSP: 0018:880074e6bd50  EFLAGS: 00010282
> [  247.833152] RAX: 0074 RBX: 029e RCX: 
> 
> [  247.833153] RDX: 0001 RSI: 0246 RDI: 
> 0246
> [  247.833154] RBP: 880074e6bda8 R08: 3675dc7a R09: 
> 82013414
> [  247.833155] R10: 00034c50 R11:  R12: 
> 88003aab3448
> [  247.833156] R13: 02dc R14: 00046e11 R15: 
> 0020
> [  247.833157] FS:  7f839f965700() GS:88007fc8() 
> knlGS:
> [  247.833158] CS:  0010 DS:  ES:  CR0: 8005003b
> [  247.833159] CR2: 7f839f97e000 CR3: 36723000 CR4: 
> 06e0
> [  247.833164] Stack:
> [  247.833165]  03a9 0001 880060554000 
> 88004fcaf000
> [  247.833167]  88003aa7b090 1000 88003aab3448 
> 880074e6beb0
> [  247.833169]  0001 2068 0020 
> 
> [  247.833171] Call Trace:
> [  247.833208]  [] ocfs2_setattr+0x698/0xa90 [ocfs2]
> [  247.833225]  [] notify_change+0x1ae/0x380
> [  247.833242]  [] do_truncate+0x5e/0x90
> [  247.833246]  [] do_sys_ftruncate.constprop.11+0x108/0x160
> [  247.833257]  [] entry_SYSCALL_64_fastpath+0x12/0x6d
> [  247.834724] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
> [  247.834725]
> [  247.834726] Leftover inexact backtrace:
>
> [  247.834728] Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 
> 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 
> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
> [  247.834748] RIP  [] ocfs2_truncate_file+0x640/0x6c0 
> [ocfs2]
> [  247.834774]  RSP 
> "
>
> It's because ocfs2_inode_lock() gets us a stale LVB in which the i_size is not
> equal to the disk i_size. We mistakenly trust the LVB because the underlying
> fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for
> us. But why?
>
> The current code tries to downconvert lock without DLM_LKF_VALBLK
> flag to tell o2cb don't update RSB

[Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-09 Thread Eric Ren

                                                  RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
  ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
  convert_lock(overwrite lkb->lkb_exflags
               with no DLM_LKF_VALBLK)

RSB1: NULL                                        RSB1: EX
                                                  reset Node2
dlm_recover_rsbs()
  recover_lvb()

/* The LVB is not trustable if the node with EX fails and
 * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
 */

 if(!(lkb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance
           return;                    * to invalidate the LVB here.
                                      */

The 2nd round:

         Node 1                                Node2
RSB1(become master from recovery)

ocfs2_setattr()
  ocfs2_inode_lock(NULL->EX)
    /* dlm_lock() returns the stale LVB without setting DLM_SBF_VALNOTVALID */
    ocfs2_meta_lvb_is_trustable() returns 1 /* so we don't refresh the inode from disk */
  ocfs2_truncate_file()
      mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */

The fix is quite straightforward. We keep setting the DLM_LKF_VALBLK flag for
dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is used.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c   | 10 ++
 fs/ocfs2/stackglue.c |  6 ++
 fs/ocfs2/stackglue.h |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 83d576f..77d1632 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3303,6 +3303,16 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
mlog(ML_BASTS, "lockres %s, level %d => %d\n", lockres->l_name,
 lockres->l_level, new_level);
 
+   /*
+* On DLM_LKF_VALBLK, fsdlm behaves differently with o2cb. It always
+* expects DLM_LKF_VALBLK being set if the LKB has LVB, so that
+* we can recover correctly from node failure. Otherwise, we may get
+* invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
+*/
+   if (!ocfs2_is_o2cb_active() &&
+   lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
+   lvb = 1;
+
if (lvb)
dlm_flags |= DLM_LKF_VALBLK;
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 52c07346b..8203590 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -48,6 +48,12 @@ static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] = "/sbin/ocfs2_hb_ctl";
  */
 static struct ocfs2_stack_plugin *active_stack;
 
+inline int ocfs2_is_o2cb_active(void)
+{
+   return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
+}
+EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
+
 static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
 {
struct ocfs2_stack_plugin *p;
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index f2dce10..e3036e1 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -298,6 +298,9 @@ void ocfs2_stack_glue_set_max_proto_version(struct ocfs2_protocol_version *max_p
 int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
 void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
 
+/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
+int ocfs2_is_o2cb_active(void);
+
 extern struct kset *ocfs2_kset;
 
 #endif  /* STACKGLUE_H */
-- 
2.6.6



Re: [Ocfs2-devel] [PATCH 0/7] quota: Use s_umount for quota on/off serialization

2016-11-30 Thread Eric Ren

Hello,

On 11/24/2016 04:12 PM, Jan Kara wrote:
> Hello,
>
> this patch set changes quota code to use s_umount semaphore for serialization
> of quota on/off operations among each other and with other quotactl and
> quota writeback operations. So far we have used dedicated dqonoff_mutex but
> that triggered lockdep warnings during fs freezing and also unnecessarily
> serialized some quotactl operations.
>
> Al, any objections to patch 1/7 exporting functionality to get superblock with
> s_umount in exclusive mode? Alternatively I could add a wrapper around
> get_super_thawed() in quota code to drop s_umount & get it in exclusive mode
> and recheck that superblock didn't get unmounted / frozen but what I did here
> looked cleaner to me.
>
> OCFS2 guys, it would be good if you could test ocfs2 quotas with this patch set
> in some multi-node setup (I have tested just with a single node), especially
> whether quota file recovery for other nodes still works as expected. Thanks.

With this patch set, the quota file recovery works well for ocfs2 on multiple nodes.

Tested-by: Eric Ren

Thanks,
Eric
>
> If nobody objects, I'll push these changes through my tree to Linus.
>
>   Honza



Re: [Ocfs2-devel] [PATCH] ocfs2: Optimization of code while free dead locks, changed for reviews.

2016-11-28 Thread Eric Ren

Hi,

I am tired of telling you about the patch format. I won't respond again until
your submission is modeled after a correctly formatted patch.

Eric

On 11/28/2016 05:05 PM, Guozhonghua wrote:
> Changed the free order and code styles with reviews. Based on Linux-4.9-rc6. 
> Thanks.
>
> Signed-off-by: guozhonghua 
>
> diff -uprN ocfs2.orig/dlm/dlmrecovery.c ocfs2/dlm/dlmrecovery.c
> --- ocfs2.orig/dlm/dlmrecovery.c2016-11-28 16:26:45.890934481 +0800
> +++ ocfs2/dlm/dlmrecovery.c 2016-11-28 16:32:04.982940629 +0800
> @@ -2268,6 +2268,9 @@ static void dlm_free_dead_locks(struct d
>   {
>  struct dlm_lock *lock, *next;
>  unsigned int freed = 0;
> +   struct list_head *queue = NULL;
> +   int i;
> +
>
>  /* this node is the lockres master:
>   * 1) remove any stale locks for the dead node
> @@ -2280,33 +2283,19 @@ static void dlm_free_dead_locks(struct d
>   * to force the DLM_UNLOCK_FREE_LOCK action so as to free the locks 
> */
>
>  /* TODO: check pending_asts, pending_basts here */
> -   list_for_each_entry_safe(lock, next, &res->granted, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->converting, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->blocked, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> +   for (i = DLM_GRANTED_LIST; i <= DLM_BLOCKED_LIST; i++) {
> +   queue = dlm_list_idx_to_ptr(res, i);
> +   list_for_each_entry_safe(lock, next, queue, list) {
> +   if (lock->ml.node == dead_node) {
> +   list_del_init(&lock->list);
> +   dlm_lock_put(lock);
> +
> +   /* Can't schedule DLM_UNLOCK_FREE_LOCK
> +* - do manually
> +*/
> +   dlm_lock_put(lock);
> +   freed++;
> +   }
> +   }
> +   }
>
>  if (freed) {
>  mlog(0, "%s:%.*s: freed %u locks for dead node %u, "




Re: [Ocfs2-devel] [PATCH] ocfs2: Optimization of code while free dead locks.

2016-11-27 Thread Eric Ren

Hi,

On 11/26/2016 08:15 PM, Guozhonghua wrote:
> The three loops can be merged into one outer loop with an inner loop, so that
> less code does the same work.
> The patch is based on the linux-4.9-rc6.
>
> Signed-off-by: Guozhonghua 
>
>
> --- ocfs2.orig/dlm/dlmrecovery.c2016-11-26 19:13:04.833023242 +0800
> +++ ocfs2/dlm/dlmrecovery.c 2016-11-26 19:24:03.982552497 +0800
I don't think this patch could be applied cleanly:
--
zhen@desktop:~/linux> git am ~/patches/temp/\[PATCH\]\ ocfs2\:\ Optimization\ of\ code\ while\ free\ dead\ locks..eml
Applying: ocfs2: Optimization of code while free dead locks.
.git/rebase-apply/patch:7: trailing whitespace.
struct list_head *queue = NULL;
.git/rebase-apply/patch:8: trailing whitespace.
int i;
fatal: corrupt patch at line 9
Patch failed at 0001 ocfs2: Optimization of code while free dead locks.
--

Please go through the docs below:
[1] https://kernelnewbies.org/FirstKernelPatch

The file path (ocfs2/dlm/dlmrecovery.c) is unusual. It should look like:

   diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c

> @@ -2268,6 +2268,8 @@ static void dlm_free_dead_locks(struct d
>   {
>  struct dlm_lock *lock, *next;
>  unsigned int freed = 0;
> +   struct list_head *queue = NULL;
> +   int i;
>
>  /* this node is the lockres master:
>   * 1) remove any stale locks for the dead node
> @@ -2280,33 +2282,19 @@ static void dlm_free_dead_locks(struct d
>   * to force the DLM_UNLOCK_FREE_LOCK action so as to free the locks 
> */
>
>  /* TODO: check pending_asts, pending_basts here */
> -   list_for_each_entry_safe(lock, next, &res->granted, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> +   for (i = DLM_BLOCKED_LIST; i >= DLM_GRANTED_LIST; i--) {

Is it right to walk the lists in the reverse of the original order?

Eric
> +   queue = dlm_list_idx_to_ptr(res, i);
> +   list_for_each_entry_safe(lock, next, queue, list) {
> +   if (lock->ml.node == dead_node) {
> +   list_del_init(&lock->list);
> +   dlm_lock_put(lock);
> +
> +   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do 
> manually */
> +   dlm_lock_put(lock);
> +   freed++;
> +   }
>  }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->converting, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> -   list_for_each_entry_safe(lock, next, &res->blocked, list) {
> -   if (lock->ml.node == dead_node) {
> -   list_del_init(&lock->list);
> -   dlm_lock_put(lock);
> -   /* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually 
> */
> -   dlm_lock_put(lock);
> -   freed++;
> -   }
> -   }
> +   }
>
>  if (freed) {
>  mlog(0, "%s:%.*s: freed %u locks for dead node %u, "
>




[Ocfs2-devel] [Bug Report] multiple node reflink: kernel BUG at ../fs/ocfs2/suballoc.c:1989!

2016-11-23 Thread Eric Ren

Hi all,

FYI,

The reflink testcase in multiple-node mode failed with the backtrace below:

---
2016-11-02T16:43:41.862247+08:00 ocfs2cts2 kernel: [25429.622914] [ 
cut here ]
2016-11-02T16:43:41.862273+08:00 ocfs2cts2 kernel: [25429.622979] kernel BUG at 
../fs/ocfs2/suballoc.c:1989!
2016-11-02T16:43:41.862274+08:00 ocfs2cts2 kernel: [25429.623024] invalid 
opcode:  [#1] SMP
2016-11-02T16:43:41.862276+08:00 ocfs2cts2 kernel: [25429.623064] Modules 
linked in: ocfs2_stack_user ocfs2 ocfs2_nodemanager ocfs2_stackglue jbd2 
quota_tree dlm configfs softdog sd_mod iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs ppdev acpi_cpufreq 
pvpanic virtio_net parport_pc joydev serio_raw i2c_piix4 pcspkr parport button 
virtio_balloon processor btrfs ata_generic xor raid6_pq ata_piix ahci libahci 
cirrus virtio_blk drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
ttm uhci_hcd libata ehci_hcd usbcore drm virtio_pci floppy virtio_ring 
usb_common virtio sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
scsi_mod autofs4
2016-11-02T16:43:41.862277+08:00 ocfs2cts2 kernel: [25429.623590] Supported: Yes
2016-11-02T16:43:41.862278+08:00 ocfs2cts2 kernel: [25429.623624] CPU: 0 PID: 
1923 Comm: multi_reflink_t Not tainted 4.4.21-69-default #1
2016-11-02T16:43:41.862279+08:00 ocfs2cts2 kernel: [25429.623684] Hardware 
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
2016-11-02T16:43:41.862280+08:00 ocfs2cts2 kernel: [25429.623744] task: 
880080010480 ti: 8800806cc000 task.ti: 8800806cc000
2016-11-02T16:43:41.862281+08:00 ocfs2cts2 kernel: [25429.623801] RIP: 
0010:[]  [] 
ocfs2_claim_metadata+0x148/0x150 [ocfs2]
2016-11-02T16:43:41.862282+08:00 ocfs2cts2 kernel: [25429.623961] RSP: 
0018:8800806cf838  EFLAGS: 00010297
2016-11-02T16:43:41.862282+08:00 ocfs2cts2 kernel: [25429.624010] RAX: 
0003 RBX: 88008989f4c0 RCX: 8800806cf8b0
2016-11-02T16:43:41.862284+08:00 ocfs2cts2 kernel: [25429.624064] RDX: 
0001 RSI: 88008989f4c0 RDI: 88006e2313b8
2016-11-02T16:43:41.862284+08:00 ocfs2cts2 kernel: [25429.624119] RBP: 
 R08: 8800806cf8aa R09: 8800806cf8ac
2016-11-02T16:43:41.862285+08:00 ocfs2cts2 kernel: [25429.624173] R10: 
0001d5e0 R11: 88011f9912c0 R12: 88008076e000
2016-11-02T16:43:41.862286+08:00 ocfs2cts2 kernel: [25429.624226] R13: 
88006e2313b8 R14: 88003693cae8 R15: 88008989f4c0
2016-11-02T16:43:41.862287+08:00 ocfs2cts2 kernel: [25429.624281] FS:  
7f6a35621740() GS:88013fc0() knlGS:
2016-11-02T16:43:41.862288+08:00 ocfs2cts2 kernel: [25429.624342] CS:  0010 DS: 
 ES:  CR0: 8005003b
2016-11-02T16:43:41.862289+08:00 ocfs2cts2 kernel: [25429.624389] CR2: 
0114e000 CR3: 53597000 CR4: 06f0
2016-11-02T16:43:41.862297+08:00 ocfs2cts2 kernel: [25429.624448] Stack:
2016-11-02T16:43:41.862299+08:00 ocfs2cts2 kernel: [25429.624476]  
810b7a80   
2016-11-02T16:43:41.862300+08:00 ocfs2cts2 kernel: [25429.624534]  
 0001  88008076e000
2016-11-02T16:43:41.862301+08:00 ocfs2cts2 kernel: [25429.624592]  
88006e2313b8 88003693cae8 a051bcb9 8800806cf8b8
2016-11-02T16:43:41.862302+08:00 ocfs2cts2 kernel: [25429.624649] Call Trace:
2016-11-02T16:43:41.862303+08:00 ocfs2cts2 kernel: [25429.624719]  
[] ocfs2_create_new_meta_bhs.isra.49+0x69/0x330 [ocfs2]
2016-11-02T16:43:41.862304+08:00 ocfs2cts2 kernel: [25429.624797]  
[] ocfs2_add_branch+0x1fd/0x830 [ocfs2]
2016-11-02T16:43:41.862306+08:00 ocfs2cts2 kernel: [25429.624878]  
[] ocfs2_grow_tree+0x350/0x710 [ocfs2]
2016-11-02T16:43:41.862307+08:00 ocfs2cts2 kernel: [25429.624943]  
[] ocfs2_split_and_insert+0x2e1/0x450 [ocfs2]
2016-11-02T16:43:41.862308+08:00 ocfs2cts2 kernel: [25429.625012]  
[] ocfs2_split_extent+0x3e4/0x540 [ocfs2]
2016-11-02T16:43:41.862309+08:00 ocfs2cts2 kernel: [25429.625082]  
[] ocfs2_clear_ext_refcount+0x1c9/0x2b0 [ocfs2]
2016-11-02T16:43:41.862310+08:00 ocfs2cts2 kernel: [25429.625155]  
[] ocfs2_make_clusters_writable+0x3b8/0x8d0 [ocfs2]
2016-11-02T16:43:41.862311+08:00 ocfs2cts2 kernel: [25429.625229]  
[] ocfs2_replace_cow+0x87/0x1c0 [ocfs2]
2016-11-02T16:43:41.862312+08:00 ocfs2cts2 kernel: [25429.626825]  
[] ocfs2_refcount_cow+0x3ea/0x4f0 [ocfs2]
2016-11-02T16:43:41.862314+08:00 ocfs2cts2 kernel: [25429.626825]  
[] ocfs2_file_write_iter+0xb8b/0xdf0 [ocfs2]
2016-11-02T16:43:41.862315+08:00 ocfs2cts2 kernel: [25429.626825]  
[] __vfs_write+0xa9/0xf0
2016-11-02T16:43:41.862316+08:00 ocfs2cts2 kernel: [25429.626825]  
[] vfs_write+0x9d/0x190
2016-11-02T16:43:41.862317+08:00 ocfs2cts2 kernel: [25429.626825]  
[] SyS_pwrite64+0x62/0x90
2016-11-02T16:43:41.862318+08:00 ocfs2cts2 kernel: [25429.626825]  
[] entry_SYSCALL_64_fastpath+0x12/0x6d
2016-11-02T16:

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io

2016-11-16 Thread Eric Ren

Hi,

On 11/16/2016 06:45 PM, Dan Carpenter wrote:
> On Wed, Nov 16, 2016 at 10:33:49AM +0800, Eric Ren wrote:
> That silences the warning, of course, but I feel like the code is buggy.
> How do we know that we don't hit that exit path?
Sorry, I missed your point. Do you mean the below?

"1817 goto out_quota; " will free (*wc), but with "ret = 0". Thus, the caller
thinks it's OK to use (*wc), but...

Do I understand you correctly?
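
If that reading is right, the hazard can be sketched in userspace C as below. This is only a model under assumptions: write_begin_model(), get_block_model() and struct write_ctxt are hypothetical stand-ins, not the real ocfs2_write_begin_nolock() or ocfs2_dio_get_block(). It shows a callee that returns 0 without handing back a context, and a caller that initializes the pointer to NULL and checks it before use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ocfs2_write_ctxt. */
struct write_ctxt {
    int phys_cluster;
};

/*
 * Models a "begin write" helper that hands a context back through *wc_out.
 * When page_locked is 0 it behaves like the -EAGAIN branch discussed above:
 * the context is freed, *wc_out is left untouched, and 0 is returned.
 */
static int write_begin_model(int page_locked, struct write_ctxt **wc_out)
{
    struct write_ctxt *wc = malloc(sizeof(*wc));

    if (!wc)
        return -ENOMEM;
    wc->phys_cluster = 42;

    if (!page_locked) {
        free(wc);       /* retry path: no context for the caller */
        return 0;       /* ...but still reported as success */
    }

    *wc_out = wc;
    return 0;
}

/*
 * Caller that initializes the pointer to NULL and checks it after a zero
 * return, so the retry path is detected instead of dereferencing garbage.
 * Returns the cluster number, or -EAGAIN when no context was handed back.
 */
static int get_block_model(int page_locked)
{
    struct write_ctxt *wc = NULL;
    int ret = write_begin_model(page_locked, &wc);
    int cluster;

    if (ret)
        return ret;
    if (!wc)
        return -EAGAIN;

    cluster = wc->phys_cluster;
    free(wc);
    return cluster;
}
```

Initializing to NULL alone only silences the checker; it is the explicit check after the zero return that makes the "success without a context" case safe in this model.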

Eric
>
> fs/ocfs2/aops.c
>1808  /*
>1809   * ocfs2_grab_pages_for_write() returns -EAGAIN if it could 
> not lock
>1810   * the target page. In this case, we exit with no error and 
> no target
>1811   * page. This will trigger the caller, page_mkwrite(), to 
> re-try
>1812   * the operation.
>1813   */
>1814  if (ret == -EAGAIN) {
>1815  BUG_ON(wc->w_target_page);
>1816  ret = 0;
>1817  goto out_quota;
>1818  }
>
> regards,
> dan carpenter
>
>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2: fix sparse file & data ordering issue in direct io

2016-11-15 Thread Eric Ren

Hi Dan,

On 11/15/2016 06:36 PM, Dan Carpenter wrote:
> Ryan's email is dead.  But this is buggy.  Someone please fix it.
>
> regards,
> dan carpenter
>
> On Tue, Nov 15, 2016 at 01:33:30PM +0300, Dan Carpenter wrote:
>> I never got a response on this.  I was looking at it today and it still
>> looks buggy to me.
>>
>> regards,
>> dan carpenter
>>
>> On Wed, Mar 09, 2016 at 01:25:05PM +0300, Dan Carpenter wrote:
>>> Hello Ryan Ding,
>>>
>>> The patch fbe25fb91af5: "ocfs2: fix sparse file & data ordering issue
>>> in direct io" from Feb 25, 2016, leads to the following static
>>> checker warning:
>>>
>>> fs/ocfs2/aops.c:2242 ocfs2_dio_get_block()
>>> error: potentially dereferencing uninitialized 'wc'.
>>>
>>> fs/ocfs2/aops.c
>>>2235
>>>2236  ret = ocfs2_write_begin_nolock(inode->i_mapping, pos, len,
>>>2237 OCFS2_WRITE_DIRECT, NULL,
>>>2238 (void **)&wc, di_bh, NULL);
>>> 
How do you run the static checker? Please teach me ;-)

Regarding this warning, please try changing this line
(https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L2128)
to:

struct ocfs2_write_ctxt *wc = NULL;

It should work, and it shouldn't have any side effects.

Eric
>>>
>>> See commit 5c9e2986 ('ocfs2: Fix ocfs2_page_mkwrite()') for an
>>> explanation why a zero return here does not imply that "wc" has been
>>> initialized.
>>>
>>>2239  if (ret) {
>>>2240  mlog_errno(ret);
>>>2241  goto unlock;
>>>2242  }
>>>2243
>>>2244  desc = &wc->w_desc[0];
>>>2245
>>>2246  p_blkno = ocfs2_clusters_to_blocks(inode->i_sb, 
>>> desc->c_phys);
>>>
>>> regards,
>>> dan carpenter



Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-14 Thread Eric Ren

Hi,
> Thanks for your attention. Actually, I tried different versions of draft 
> patch locally.
> Either of them can satisfy myself so far.

Sorry, I meant "neither of them".

Eric
> Some rules I'd like to follow:
> 1) check and avoid recursive cluster locking, rather than allow it which 
> Junxiao had tried
> before;
> 2) Just keep track of lock resource that meets the following requirements:
>a. normal inodes (non systemfile);
>b. inode metadata lockres (not open, rw lockres);
> why? to avoid more special cluster locking usecases, like journal systemfile, 
> "LOST+FOUND"
> open lockres, that lock/unlock
> operations are performed by different processes, making tracking task more 
> tricky.
> 3) There is another problem if we follow "check + avoid" pattern, which I 
> have mentioned in
> this thread:
> """
> This is wrong. We also depend ocfs2_inode_lock() pass out "bh" for later use.
>
> So, we may need another function something like ocfs2_inode_getbh():
>if (!oh)
>   ocfs2_inode_lock();
>  else
>  ocfs2_inode_getbh();
> """
>
> Hope we can work out a nice solution for this tricky issue ;-)
>
> Eric
>
>



Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-14 Thread Eric Ren

Hi,

On 11/14/2016 01:42 PM, piaojun wrote:
> Hi Eric,
>
>
> OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST 
> (ocfs2_generic_handle_bast) when downconvert is needed
> on behalf of remote lock request.
>
> The recursive cluster lock (the second one) will be blocked in 
> __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED.
> But the downconvert cannot be done, why? because there is no chance for the 
> first cluster lock on this node to be unlocked -
> we blocked ourselves in the code path.
>
> Eric
> You clear my doubt. I will look through your solution.

Thanks for your attention. Actually, I tried different versions of draft patch 
locally. 
Either of them can satisfy myself so far.
Some rules I'd like to follow:
1) Check for and avoid recursive cluster locking, rather than allowing it, which
Junxiao had tried before;
2) Only keep track of lock resources that meet the following requirements:
  a. normal inodes (not system files);
  b. inode metadata lockres (not open or rw lockres);
Why? To avoid the more special cluster locking use cases, like the journal
system file and the "LOST+FOUND" open lockres, where the lock/unlock operations
are performed by different processes, which makes tracking trickier.
3) There is another problem if we follow the "check + avoid" pattern, which I
mentioned earlier in this thread:
"""
This is wrong. We also depend on ocfs2_inode_lock() to pass out "bh" for later use.

So, we may need another function, something like ocfs2_inode_getbh():
  if (!oh)
      ocfs2_inode_lock();
  else
      ocfs2_inode_getbh();
"""

Hope we can work out a nice solution for this tricky issue ;-)

Eric
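
A rough userspace sketch of the "check + avoid" pattern from rule 3, under stated assumptions: every structure and function here is a simplified stand-in (holders are tracked by a plain pid array), and inode_getbh() is the hypothetical ocfs2_inode_getbh() proposed above, not an existing ocfs2 function. It only illustrates that the already-locked branch still has to produce "bh".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOLDERS 8

/* Userspace stand-ins; none of these are the real ocfs2 structures. */
struct buffer_head { int blkno; };

struct lock_res {
    int holder_pids[MAX_HOLDERS];  /* pids that currently hold the lock */
    int nr_holders;
    int lock_calls;                /* how many real cluster locks were taken */
    struct buffer_head bh;         /* inode block protected by the lock */
};

/* Return 1 if pid already holds the lock resource. */
static int is_locked_by_me(const struct lock_res *res, int pid)
{
    for (int i = 0; i < res->nr_holders; i++)
        if (res->holder_pids[i] == pid)
            return 1;
    return 0;
}

/* Take the cluster lock, record the holder, hand back the buffer. */
static int inode_lock(struct lock_res *res, int pid, struct buffer_head **bh)
{
    if (res->nr_holders == MAX_HOLDERS)
        return -1;
    res->holder_pids[res->nr_holders++] = pid;
    res->lock_calls++;
    *bh = &res->bh;
    return 0;
}

/* Drop the holder record for pid (swap-remove from the array). */
static void inode_unlock(struct lock_res *res, int pid)
{
    for (int i = 0; i < res->nr_holders; i++) {
        if (res->holder_pids[i] == pid) {
            res->holder_pids[i] = res->holder_pids[--res->nr_holders];
            return;
        }
    }
}

/* Hypothetical ocfs2_inode_getbh(): fetch the buffer without locking. */
static void inode_getbh(struct lock_res *res, struct buffer_head **bh)
{
    *bh = &res->bh;
}

/* Models ocfs2_iop_get_acl(): lock only if this pid is not a holder yet. */
static int get_acl_model(struct lock_res *res, int pid)
{
    struct buffer_head *bh = NULL;
    int had_lock = is_locked_by_me(res, pid);

    if (!had_lock) {
        if (inode_lock(res, pid, &bh))
            return -1;
    } else {
        inode_getbh(res, &bh);
    }

    assert(bh != NULL);            /* both branches must provide the buffer */
    if (!had_lock)
        inode_unlock(res, pid);
    return bh->blkno;
}

/* Models ocfs2_permission(): takes the lock, then calls into get_acl. */
static int permission_model(struct lock_res *res, int pid)
{
    struct buffer_head *bh = NULL;
    int blkno;

    if (inode_lock(res, pid, &bh))
        return -1;
    blkno = get_acl_model(res, pid);
    inode_unlock(res, pid);
    return blkno;
}
```

In this model a call through permission_model() takes the cluster lock once, while a direct get_acl_model() call from a non-holder takes and releases it on its own.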

>


Re: [Ocfs2-devel] [PATCH 6/6] ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features

2016-11-10 Thread Eric Ren

On 11/11/2016 02:20 PM, Darrick J. Wong wrote:
> On Fri, Nov 11, 2016 at 01:49:48PM +0800, Eric Ren wrote:
>> Hi,
>>
>> A few issues obvious to me:
>>
>> On 11/10/2016 06:51 AM, Darrick J. Wong wrote:
>>> Connect the new VFS clone_range, copy_range, and dedupe_range features
>>> to the existing reflink capability of ocfs2.  Compared to the existing
>>> ocfs2 reflink ioctl We have to do things a little differently to support
>>> the VFS semantics (we can clone subranges of a file but we don't clone
>>> xattrs), but the VFS ioctls are more broadly supported.
>> How can I test the new ocfs2 reflink (with this patch) manually? What
>> commands should I use to do xxx_range things?
> See the 'reflink', 'dedupe', and 'copy_range' commands in xfs_io.
>
> The first two were added in xfsprogs 4.3, and copy_range in 4.7.

OK, thanks. I think you are missing the following two inline comments:

>>> +   spin_lock(&OCFS2_I(dest)->ip_lock);
>>> +   if (newlen > i_size_read(dest)) {
>>> +   i_size_write(dest, newlen);
>>> +   di->i_size = newlen;
>> di->i_size = cpu_to_le64(newlen);
>>
>>> +   }
>>> +   spin_unlock(&OCFS2_I(dest)->ip_lock);
>>> +
>> Add ocfs2_update_inode_fsync_trans() here? Looks this function was
>> introduced by you to improve efficiency.
>> Just want to awake your memory about this, though I don't know about the
>> details why it should be.
>>
>> Eric
Thanks,
Eric
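
To illustrate the cpu_to_le64() remark above: the on-disk inode field is little-endian, so a host-order value must be converted before it is stored. The sketch below is a portable userspace model, not kernel code. le64_model, model_cpu_to_le64(), model_le64_to_cpu() and struct dinode_model are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the kernel's __le64: raw bytes in little-endian order. */
typedef struct { uint8_t b[8]; } le64_model;

/* Store a host-order value as little-endian bytes, whatever the host is. */
static le64_model model_cpu_to_le64(uint64_t v)
{
    le64_model out;

    for (int i = 0; i < 8; i++)
        out.b[i] = (uint8_t)(v >> (8 * i));
    return out;
}

/* Read little-endian bytes back into a host-order value. */
static uint64_t model_le64_to_cpu(le64_model v)
{
    uint64_t out = 0;

    for (int i = 0; i < 8; i++)
        out |= (uint64_t)v.b[i] << (8 * i);
    return out;
}

/* Hypothetical cut-down on-disk inode with only the size field. */
struct dinode_model {
    le64_model i_size;
};

/* Equivalent of "di->i_size = cpu_to_le64(newlen);" in this model. */
static void set_disk_size(struct dinode_model *di, uint64_t newlen)
{
    di->i_size = model_cpu_to_le64(newlen);
}
```

On a little-endian host a plain assignment happens to produce the same bytes, which is why a missing conversion tends to go unnoticed until the code runs on a big-endian machine.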


Re: [Ocfs2-devel] [PATCH 6/6] ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features

2016-11-10 Thread Eric Ren

Hi,

A few issues obvious to me:

On 11/10/2016 06:51 AM, Darrick J. Wong wrote:
> Connect the new VFS clone_range, copy_range, and dedupe_range features
> to the existing reflink capability of ocfs2.  Compared to the existing
> ocfs2 reflink ioctl We have to do things a little differently to support
> the VFS semantics (we can clone subranges of a file but we don't clone
> xattrs), but the VFS ioctls are more broadly supported.

How can I test the new ocfs2 reflink (with this patch) manually? What commands
should I use to do xxx_range things?

>
> Signed-off-by: Darrick J. Wong 
> ---
>   fs/ocfs2/file.c |   62 -
>   fs/ocfs2/file.h |3
>   fs/ocfs2/refcounttree.c |  619 
> +++
>   fs/ocfs2/refcounttree.h |7 +
>   4 files changed, 688 insertions(+), 3 deletions(-)
>
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 000c234..d5a022d 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1667,9 +1667,9 @@ static void ocfs2_calc_trunc_pos(struct inode *inode,
>   *done = ret;
>   }
>   
> -static int ocfs2_remove_inode_range(struct inode *inode,
> - struct buffer_head *di_bh, u64 byte_start,
> - u64 byte_len)
> +int ocfs2_remove_inode_range(struct inode *inode,
> +  struct buffer_head *di_bh, u64 byte_start,
> +  u64 byte_len)
>   {
>   int ret = 0, flags = 0, done = 0, i;
>   u32 trunc_start, trunc_len, trunc_end, trunc_cpos, phys_cpos;
> @@ -2440,6 +2440,56 @@ static loff_t ocfs2_file_llseek(struct file *file, 
> loff_t offset, int whence)
>   return offset;
>   }
>   
> +static ssize_t ocfs2_file_copy_range(struct file *file_in,
> +  loff_t pos_in,
> +  struct file *file_out,
> +  loff_t pos_out,
> +  size_t len,
> +  unsigned int flags)
> +{
> + int error;
> +
> + error = ocfs2_reflink_remap_range(file_in, pos_in, file_out, pos_out,
> +   len, false);
> + if (error)
> + return error;
> + return len;
> +}
> +
> +static int ocfs2_file_clone_range(struct file *file_in,
> +   loff_t pos_in,
> +   struct file *file_out,
> +   loff_t pos_out,
> +   u64 len)
> +{
> + return ocfs2_reflink_remap_range(file_in, pos_in, file_out, pos_out,
> +  len, false);
> +}
> +
> +#define OCFS2_MAX_DEDUPE_LEN (16 * 1024 * 1024)
> +static ssize_t ocfs2_file_dedupe_range(struct file *src_file,
> +u64 loff,
> +u64 len,
> +struct file *dst_file,
> +u64 dst_loff)
> +{
> + int error;
> +
> + /*
> +  * Limit the total length we will dedupe for each operation.
> +  * This is intended to bound the total time spent in this
> +  * ioctl to something sane.
> +  */
> + if (len > OCFS2_MAX_DEDUPE_LEN)
> + len = OCFS2_MAX_DEDUPE_LEN;
> +
> + error = ocfs2_reflink_remap_range(src_file, loff, dst_file, dst_loff,
> +   len, true);
> + if (error)
> + return error;
> + return len;
> +}
> +
>   const struct inode_operations ocfs2_file_iops = {
>   .setattr= ocfs2_setattr,
>   .getattr= ocfs2_getattr,
> @@ -2479,6 +2529,9 @@ const struct file_operations ocfs2_fops = {
>   .splice_read= generic_file_splice_read,
>   .splice_write   = iter_file_splice_write,
>   .fallocate  = ocfs2_fallocate,
> + .copy_file_range = ocfs2_file_copy_range,
> + .clone_file_range = ocfs2_file_clone_range,
> + .dedupe_file_range = ocfs2_file_dedupe_range,
>   };
>   
>   const struct file_operations ocfs2_dops = {
> @@ -2524,6 +2577,9 @@ const struct file_operations ocfs2_fops_no_plocks = {
>   .splice_read= generic_file_splice_read,
>   .splice_write   = iter_file_splice_write,
>   .fallocate  = ocfs2_fallocate,
> + .copy_file_range = ocfs2_file_copy_range,
> + .clone_file_range = ocfs2_file_clone_range,
> + .dedupe_file_range = ocfs2_file_dedupe_range,
>   };
>   
>   const struct file_operations ocfs2_dops_no_plocks = {
> diff --git a/fs/ocfs2/file.h b/fs/ocfs2/file.h
> index e8c62f2..897fd9a 100644
> --- a/fs/ocfs2/file.h
> +++ b/fs/ocfs2/file.h
> @@ -82,4 +82,7 @@ int ocfs2_change_file_space(struct file *file, unsigned int 
> cmd,
>   
>   int ocfs2_check_range_for_refcount(struct inode *inode, loff_t pos,
>  size_t count);
> +int ocfs2_remove_inode_range(struct inode *inode,
> +  struct buffer_head *di_bh,

Re: [Ocfs2-devel] [PATCH 0/6] ocfs2: wire up {clone, copy, dedupe}_range

2016-11-10 Thread Eric Ren

Hi,

On 11/10/2016 06:51 AM, Darrick J. Wong wrote:
> Hi all,
>
> These patches wire up the existing ocfs2 reflinking capabilities to
> the new(ish) VFS {copy,clone,dedupe}_range interface.  The first few
> patches clean up some minor bugs that I found; the last kernel patch
> contains the new code.
>
> A few minor fixes to xfstests are needed to make more of the tests
> run.  I'll tack that patch on the end.

FYI, the reflink testcases from ocfs2-test all passed with your patches, on both
single and multiple nodes. At least, this shows that no obvious regression has
been observed so far ;-)

Eric
>
> --D
>
> [1] https://github.com/djwong/linux/tree/ocfs2-vfs-reflink



Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-10 Thread Eric Ren

Hi,

On 11/10/2016 06:49 PM, piaojun wrote:
> Hi Eric,
>
> On 2016-11-1 9:45, Eric Ren wrote:
>> Hi,
>>
>> On 10/31/2016 06:55 PM, piaojun wrote:
>>> Hi Eric,
>>>
>>> On 2016-10-19 13:19, Eric Ren wrote:
>>>> The deadlock issue happens when running discontiguous block
>>>> group testing on multiple nodes. The easier way to reproduce
>>>> is to do "chmod -R 777 /mnt/ocfs2" things like this on multiple
>>>> nodes at the same time by pssh.
>>>>
>>>> This is indeed another deadlock caused by: commit 743b5f1434f5
>>>> ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason
>>>> had been explained well by Tariq Saeed in this thread:
>>>>
>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html
>>>>
>>>> For this case, the ocfs2_inode_lock() is misused recursively as below:
>>>>
>>>> do_sys_open
>>>>do_filp_open
>>>> path_openat
>>>>  may_open
>>>>   inode_permission
>>>>__inode_permission
>>>> ocfs2_permission  <== ocfs2_inode_lock()
>>>>  generic_permission
>>>>   get_acl
>>>>ocfs2_iop_get_acl  <== ocfs2_inode_lock()
>>>> ocfs2_inode_lock_full_nested <= deadlock if a remote EX 
>>>> request
>>> Do you mean another node wants to get ex of the inode? or another process?
>> Remote EX request means "another node wants to get ex of the inode";-)
>>
>> Eric
> If another node wants to get ex, it will get blocked as this node has
> got pr. Why will the ex request make this node get blocked? Expect your
> detailed description.
Did you look at this link I mentioned above?

The OCFS2_LOCK_BLOCKED flag of this lockres is set in the BAST
(ocfs2_generic_handle_bast) when a downconvert is needed on behalf of a
remote lock request.

The recursive cluster lock (the second one) will be blocked in
__ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED.
But the downconvert cannot be done. Why? Because there is no chance for the
first cluster lock on this node to be unlocked:
we have blocked ourselves in the code path.
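
The sequence above can be replayed in a small userspace model. This is a deliberately simplified sketch: struct lockres_model and its helpers are assumptions made for illustration (a single "blocked" flag and a holder count), not the real ocfs2 lock resource or DLM state machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace model of a lock resource; not the real ocfs2 struct. */
struct lockres_model {
    int  local_holders;   /* cluster locks currently held on this node */
    bool blocked;         /* models OCFS2_LOCK_BLOCKED */
};

/* A remote node asked for an incompatible level: the BAST marks us blocked. */
static void handle_bast(struct lockres_model *res)
{
    res->blocked = true;
}

/* Downconvert can only run once every local holder has dropped the lock. */
static bool can_downconvert(const struct lockres_model *res)
{
    return res->blocked && res->local_holders == 0;
}

/* New local requests must wait while a downconvert is pending. */
static bool try_cluster_lock(struct lockres_model *res)
{
    if (res->blocked)
        return false;          /* the caller would sleep here */
    res->local_holders++;
    return true;
}

/* Release one local holder; the pending downconvert may then complete. */
static void cluster_unlock(struct lockres_model *res)
{
    res->local_holders--;
    if (can_downconvert(res))
        res->blocked = false;  /* downconvert done, remote request served */
}

/*
 * Replays the sequence from the thread: lock, remote EX request, then a
 * second lock from the same call chain. Returns true if the node is stuck:
 * the second lock waits for the downconvert, and the downconvert waits for
 * the first lock, which is only released after the second one returns.
 */
static bool recursive_lock_deadlocks(void)
{
    struct lockres_model res = { 0, false };
    bool second_granted;

    if (!try_cluster_lock(&res))              /* ocfs2_permission() */
        return false;
    handle_bast(&res);                        /* remote EX request arrives */
    second_granted = try_cluster_lock(&res);  /* ocfs2_iop_get_acl() */

    return !second_granted && !can_downconvert(&res);
}
```

Without the nested request, the first holder unlocks, the downconvert completes, and later local requests are granted again.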

Eric
>
> thanks,
> Jun
>>>> comes between two ocfs2_inode_lock()
>>>>
>>>> Fix by checking if the cluster lock has been acquired aready in the 
>>>> call-chain
>>>> path.
>>>>
>>>> Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in 
>>>> ocfs2_iop_set/get_acl()")
>>>> Signed-off-by: Eric Ren 
>>>> ---
>>>>fs/ocfs2/acl.c | 39 +++
>>>>1 file changed, 27 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>> index bed1fcb..7e3544e 100644
>>>> --- a/fs/ocfs2/acl.c
>>>> +++ b/fs/ocfs2/acl.c
>>>> @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
>>>>int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int 
>>>> type)
>>>>{
>>>>struct buffer_head *bh = NULL;
>>>> +struct ocfs2_holder *oh;
>>>> +struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>int status = 0;
>>>>-status = ocfs2_inode_lock(inode, &bh, 1);
>>>> -if (status < 0) {
>>>> -if (status != -ENOENT)
>>>> -mlog_errno(status);
>>>> -return status;
>>>> +oh = ocfs2_is_locked_by_me(lockres);
>>>> +if (!oh) {
>>>> +status = ocfs2_inode_lock(inode, &bh, 1);
>>>> +if (status < 0) {
>>>> +if (status != -ENOENT)
>>>> +mlog_errno(status);
>>>> +return status;
>>>> +}
>>>>}
>>>> +
>>>>status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>> -ocfs2_inode_unlock(inode, 1);
>>>> +
>>>> +if (!oh)
>>>> +ocfs2_inode_unlock(inode, 1);
>>>>brelse(bh);
>>>>return status;
>>>>}
>>>> @@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>>>> *inode, int type)
>>>>struct ocfs2_super *osb;
>>>>struct buffer_head *di_bh = NULL;
>>>>

Re: [Ocfs2-devel] ocfs2: A race about mle is unlinked and freed for the dead node, BUG

2016-11-09 Thread Eric Ren

Hi,

I am not familiar with ocfs2/dlm code, but I am trying to...

On 11/09/2016 06:17 PM, Zhangguanghui wrote:
> Hi All,
>
> when the mle have been used in dlm_get_lock_resouce, other nodes dead at the 
> same time,
> the mle that is block type may be unlinked and freed repeatedly for dead 
> nodes.
> so it is a BUG  about mle->mle_refs.refcount in __dlm_put_mle  in 
> dlm_get_lock_resouce.
May I suggest you give the big picture and background of what is going on before
diving into code details, for someone like me who doesn't know much about the
code? As a stupid reader, what I would like to see here is:

1) What was going on before this trouble?
2) Why did it run into this trouble? What do you expect, and what don't you
expect? A simplified sequence diagram may make it much more descriptive,
because we need to know: does this problem happen on a single node or on
multiple nodes? If multiple, how do the nodes interact with each other?
For example:

commit 86b652b93adb57d8fed8edd532ed2eb8a791950d
Author: piaojun 
Date:   Tue Aug 2 14:02:13 2016 -0700

 ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before 
dlm_deref_lockres_done_handler

 We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
 unexpected that described below.  To solve the bug, we disable the
 BUG_ON and purge lockres in dlm_do_local_recovery_cleanup.

 Node 1   Node 2(master)
 dlm_purge_lockres
  dlm_deref_lockres_handler

  DLM_LOCK_RES_SETREF_INPROG is set
  response DLM_DEREF_RESPONSE_INPROG

 receive DLM_DEREF_RESPONSE_INPROG
 stop puring in dlm_purge_lockres
 and wait for DLM_DEREF_RESPONSE_DONE

  dispatch dlm_deref_lockres_worker
  response DLM_DEREF_RESPONSE_DONE

 receive DLM_DEREF_RESPONSE_DONE and
 prepare to purge lockres

  Node 2 goes down

 find Node2 down and do local
 clean up for Node2:
 dlm_do_local_recovery_cleanup
   -> clear DLM_LOCK_RES_DROPPING_REF

 when purging lockres, BUG_ON happens
 because DLM_LOCK_RES_DROPPING_REF is clear:
 dlm_deref_lockres_done_handler
   ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
---

3) Paste the back trace if it hits a BUG_ON(xxx);
4) Then you can go deeper into the code details if necessary;
5) Explain how you fix this problem, and any side effects you can think of.

OK, back to your description, could you please explain to me:
1) "the mle that is block type" - what is "block type"?
2) "may be" - when does it definitely happen, and when doesn't it?

> Finally, any feedback about this process (positive or negative) would be  
> greatly appreciated.
>
> *** linux-4.1.35/fs/ocfs2/dlm/dlmmaster.c 2016-11-09 17:39:02.230163503 +0800
> --- dlmmaster.c.update 2016-11-09 17:41:39.210166752 +0800
> ***
> *** 3229,3248 
> --- 3229,3261 
> struct dlm_master_list_entry *mle, u8 dead_node)
> {
> int bit;
> + int next_bit = O2NM_MAX_NODES;
> BUG_ON(mle->type != DLM_MLE_BLOCK);
Please use git to make your patch even if it's a draft patch, and add this:
```
[diff "default"]
xfuncname = "^[[:alpha:]$_].*[^:]$"
```
to your ~/.gitconfig to show in which function the changes are made.

Eric
>
> spin_lock(&mle->spinlock);
> bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0);
> + if (bit != O2NM_MAX_NODES)
> + next_bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, bit+1);
> +
> if (bit != dead_node) {
> mlog(0, "mle found, but dead node %u would not have been "
> "master\n", dead_node);
> spin_unlock(&mle->spinlock);
> + } else if (mle->inuse && next_bit != O2NM_MAX_NODES) {
> + /*Ignore it, the mle is used, other nodes dead now.
> + *as it is unlinked and freed for the dead node, it's a BUG*/
> + mlog(ML_ERROR, "the mle is used, but inuse %d, dead node %u, "
> + "master %u\n", mle->inuse, dead_node, mle->master);
> + clear_bit(bit, mle->maybe_map);
> + spin_unlock(&mle->spinlock);
> +
> } else {
> /* Must drop the refcount by one since the assert_master will
> * never arrive. This may result in the mle being unlinked and
> * freed, but there may still be a process waiting in the
> * dlmlock path which is fine. */
> mlog(0, "node %u was expected master\n", dead_node);
> + clear_bit(bit, mle->maybe_map);
> atomic_set(&mle->woken, 1);
> spin_unlock(&mle->spinlock);
> wake_up(&mle->wq);
>
> 
> All the best wishes for you.
> zhangguanghui
>

Re: [Ocfs2-devel] [PATCH 1/6] ocfs2: convert inode refcount test to a helper

2016-11-09 Thread Eric Ren

On 11/10/2016 06:51 AM, Darrick J. Wong wrote:
> Replace the open-coded inode refcount flag test with a helper function
> to reduce the potential for bugs.
Thanks for this series;-) Some comments inline below:
>
> Signed-off-by: Darrick J. Wong 
> ---
>   fs/ocfs2/refcounttree.c |   28 +++-
>   fs/ocfs2/refcounttree.h |2 ++
>   2 files changed, 17 insertions(+), 13 deletions(-)
>
>
> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> index 1923851..59be8f4 100644
> --- a/fs/ocfs2/refcounttree.c
> +++ b/fs/ocfs2/refcounttree.c
> @@ -48,6 +48,12 @@
>   #include 
>   #include 
>   
> +/* Does this inode have the reflink flag set? */
> +bool ocfs2_is_refcount_inode(struct inode *inode)
Should it be an inline function?

After applying this patch, it looks like there are still some places that have
not been replaced with this function:
---
fs/ocfs2 # grep -rn "OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL"
xattr.c:2580:if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL) {
xattr.c:3611:if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL &&
file.c:1722:if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL) {
file.c:2039:!(OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL) ||
refcounttree.c:55:return (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL);

Eric
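
For reference, the inline variant asked about above might look roughly like the following if it lived in a header. This is a userspace sketch with made-up names (struct inode_info_model, is_refcount_inode_model); the flag value is illustrative only and should be checked against fs/ocfs2/ocfs2_fs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value; check fs/ocfs2/ocfs2_fs.h for the real definition. */
#define HAS_REFCOUNT_FL_MODEL 0x0010

/* Hypothetical cut-down of the in-memory inode info. */
struct inode_info_model {
    uint16_t ip_dyn_features;
};

/*
 * A static inline helper in a header: each caller gets the single flag
 * test without needing an out-of-line function or an exported symbol.
 */
static inline bool is_refcount_inode_model(const struct inode_info_model *oi)
{
    return (oi->ip_dyn_features & HAS_REFCOUNT_FL_MODEL) != 0;
}
```

Whether inlining matters in practice for such a small test is a judgment call; the main gain of the helper is that the open-coded checks listed by grep can all go through one name.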

> +{
> + return (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL);
> +}
> +
>   struct ocfs2_cow_context {
>   struct inode *inode;
>   u32 cow_start;
> @@ -410,7 +416,7 @@ static int ocfs2_get_refcount_block(struct inode *inode, 
> u64 *ref_blkno)
>   goto out;
>   }
>   
> - BUG_ON(!(OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL));
> + BUG_ON(!ocfs2_is_refcount_inode(inode));
>   
>   di = (struct ocfs2_dinode *)di_bh->b_data;
>   *ref_blkno = le64_to_cpu(di->i_refcount_loc);
> @@ -570,7 +576,7 @@ static int ocfs2_create_refcount_tree(struct inode *inode,
>   u32 num_got;
>   u64 suballoc_loc, first_blkno;
>   
> - BUG_ON(oi->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL);
> + BUG_ON(ocfs2_is_refcount_inode(inode));
>   
>   trace_ocfs2_create_refcount_tree(
>   (unsigned long long)OCFS2_I(inode)->ip_blkno);
> @@ -708,7 +714,7 @@ static int ocfs2_set_refcount_tree(struct inode *inode,
>   struct ocfs2_refcount_block *rb;
>   struct ocfs2_refcount_tree *ref_tree;
>   
> - BUG_ON(oi->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL);
> + BUG_ON(ocfs2_is_refcount_inode(inode));
>   
>   ret = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
>  &ref_tree, &ref_root_bh);
> @@ -775,7 +781,7 @@ int ocfs2_remove_refcount_tree(struct inode *inode, 
> struct buffer_head *di_bh)
>   u64 blk = 0, bg_blkno = 0, ref_blkno = le64_to_cpu(di->i_refcount_loc);
>   u16 bit = 0;
>   
> - if (!(oi->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL))
> + if (!ocfs2_is_refcount_inode(inode))
>   return 0;
>   
>   BUG_ON(!ref_blkno);
> @@ -2299,11 +2305,10 @@ int ocfs2_decrease_refcount(struct inode *inode,
>   {
>   int ret;
>   u64 ref_blkno;
> - struct ocfs2_inode_info *oi = OCFS2_I(inode);
>   struct buffer_head *ref_root_bh = NULL;
>   struct ocfs2_refcount_tree *tree;
>   
> - BUG_ON(!(oi->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL));
> + BUG_ON(!ocfs2_is_refcount_inode(inode));
>   
>   ret = ocfs2_get_refcount_block(inode, &ref_blkno);
>   if (ret) {
> @@ -2533,7 +2538,6 @@ int ocfs2_prepare_refcount_change_for_del(struct inode 
> *inode,
> int *ref_blocks)
>   {
>   int ret;
> - struct ocfs2_inode_info *oi = OCFS2_I(inode);
>   struct buffer_head *ref_root_bh = NULL;
>   struct ocfs2_refcount_tree *tree;
>   u64 start_cpos = ocfs2_blocks_to_clusters(inode->i_sb, phys_blkno);
> @@ -2544,7 +2548,7 @@ int ocfs2_prepare_refcount_change_for_del(struct inode 
> *inode,
>   goto out;
>   }
>   
> - BUG_ON(!(oi->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL));
> + BUG_ON(!ocfs2_is_refcount_inode(inode));
>   
>   ret = ocfs2_get_refcount_tree(OCFS2_SB(inode->i_sb),
> refcount_loc, &tree);
> @@ -3412,14 +3416,13 @@ static int ocfs2_refcount_cow_hunk(struct inode 
> *inode,
>   {
>   int ret;
>   u32 cow_start = 0, cow_len = 0;
> - struct ocfs2_inode_info *oi = OCFS2_I(inode);
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>   struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
>   struct buffer_head *ref_root_bh = NULL;
>   struct ocfs2_refcount_tree *ref_tree;
>   struct ocfs2_cow_context *context = NULL;
>   
> - BUG_ON(!(oi->ip_dyn_features & OCFS2_HAS_REFCOUNT_FL));
> + BUG_ON(!ocfs2_is_refcount_inode(inode));
>   
>   ret = ocfs2_refcount_cal_cow_clusters(inode, &di->id2.i_list,
>

Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-11-08 Thread Eric Ren


Hi all,

On 10/19/2016 01:19 PM, Eric Ren wrote:

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..7e3544e 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
  {
struct buffer_head *bh = NULL;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int status = 0;
  
-	status = ocfs2_inode_lock(inode, &bh, 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   status = ocfs2_inode_lock(inode, &bh, 1);
+   if (status < 0) {
+   if (status != -ENOENT)
+   mlog_errno(status);
+   return status;
+   }
}

This is wrong. We also depend on ocfs2_inode_lock() to pass out "bh" for later use.

So, we may need another function, something like ocfs2_inode_getbh():
    if (!oh)
        ocfs2_inode_lock();
    else
        ocfs2_inode_getbh();

Eric

+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!oh)
+   ocfs2_inode_unlock(inode, 1);
brelse(bh);
return status;
  }
@@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int ret;
  
  	osb = OCFS2_SB(inode->i_sb);

if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
+
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+   if (ret < 0) {
+   if (ret != -ENOENT)
+   mlog_errno(ret);
+   return ERR_PTR(ret);
+   }
}
  
  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
  
-	ocfs2_inode_unlock(inode, 0);

+   if (!oh)
+   ocfs2_inode_unlock(inode, 0);
brelse(di_bh);
return acl;
  }




Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-11-08 Thread Eric Ren


Hi all,

On 10/19/2016 01:19 PM, Eric Ren wrote:

ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are three solutions I can think of.  The first one is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code. Frown on this.

The second one is, what I am trying now, to keep track of the processes that
lock/unlock a cluster lock, using the following draft patches. But I quickly
found out that a cluster lock taken by process A can be unlocked by
process B. For example, system files like the journal are locked during mount
and unlocked during umount.

We can avoid the problem above by:

1) not keeping track of system file inodes:

   if (!(OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE)) {

   }

2) only keeping track of the inode metadata lockres:

   OCFS2_I(inode)->ip_inode_lockres;

because the inode open lockres can also be acquired and released by different
processes.
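
To make the two rules above concrete, here is a minimal user-space sketch of per-process holder tracking. It is not the draft patch: every name in it (struct holder, struct lockres, find_holder(), add_holder(), remove_holder(), get_acl_sketch()) is hypothetical and only models the idea, with a fixed-size array standing in for whatever list the real code would use.

```c
/* User-space sketch only: all names here are hypothetical, not the ocfs2 API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOLDERS 8

struct holder {
	int  pid;     /* process that took the cluster lock */
	bool in_use;
};

struct lockres {
	bool is_system_file;  /* e.g. journal: locked at mount, unlocked at umount */
	bool is_inode_meta;   /* rule 2: only the inode metadata lockres is tracked */
	struct holder holders[MAX_HOLDERS];
};

/* Rules 1) and 2): skip system files, track only the metadata lockres. */
static bool tracked(const struct lockres *res)
{
	return !res->is_system_file && res->is_inode_meta;
}

/* Returns the holder entry of this pid, or NULL if it holds nothing. */
static struct holder *find_holder(struct lockres *res, int pid)
{
	if (!tracked(res))
		return NULL;
	for (size_t i = 0; i < MAX_HOLDERS; i++)
		if (res->holders[i].in_use && res->holders[i].pid == pid)
			return &res->holders[i];
	return NULL;
}

/* Records pid as a holder; returns false if untracked or the table is full. */
static bool add_holder(struct lockres *res, int pid)
{
	if (!tracked(res))
		return false;
	for (size_t i = 0; i < MAX_HOLDERS; i++) {
		if (!res->holders[i].in_use) {
			res->holders[i].pid = pid;
			res->holders[i].in_use = true;
			return true;
		}
	}
	return false;
}

static void remove_holder(struct lockres *res, int pid)
{
	struct holder *h;

	if (!tracked(res))
		return;
	h = find_holder(res, pid);
	assert(h != NULL);  /* tracked locks are released by the locking process */
	h->in_use = false;
}

/* Nested caller: takes the cluster lock only if this pid does not hold it. */
static void get_acl_sketch(struct lockres *res, int pid, int *lock_calls)
{
	bool had = find_holder(res, pid) != NULL;

	if (!had) {
		(*lock_calls)++;       /* stands in for taking the cluster lock */
		add_holder(res, pid);
	}
	/* ... read the ACL here ... */
	if (!had)
		remove_holder(res, pid);
}
```

With this shape, a caller that already holds the lock reaches get_acl_sketch() without a second lock request, while an unrelated process still locks and unlocks normally.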

Eric


The third one is to revert that problematic commit! It looks like get/set_acl()
are always called by other vfs callbacks like ocfs2_permission(). I think
we can do this if it's true, right? Anyway, I'll try to work out if it's true;-)

I hope for your input to solve this problem;-)

Thanks,
Eric



[Ocfs2-devel] what is g_f_a_w_n() short for? thanks

2016-11-07 Thread Eric Ren


Hello Mark,

There is a comment that confuses me, please correct me:

https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L2274
```
ocfs2_file_write_iter() {
...
 /*
 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
 * function pointer which is called when o_direct io completes so that
 * it can unlock our rw lock.
...
}
```
Should g_f_a_w_n() be g_f_a_w_i() instead? grep only finds
(__)generic_file_write_iter() for that pattern.


Thanks,
Eric


Re: [Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-10-31 Thread Eric Ren

Hi,

On 10/31/2016 06:55 PM, piaojun wrote:
> Hi Eric,
>
> On 2016-10-19 13:19, Eric Ren wrote:
>> The deadlock issue happens when running discontiguous block
>> group testing on multiple nodes. The easier way to reproduce
>> is to do "chmod -R 777 /mnt/ocfs2" things like this on multiple
>> nodes at the same time by pssh.
>>
>> This is indeed another deadlock caused by: commit 743b5f1434f5
>> ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason
>> had been explained well by Tariq Saeed in this thread:
>>
>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html
>>
>> For this case, the ocfs2_inode_lock() is misused recursively as below:
>>
>> do_sys_open
>>   do_filp_open
>>path_openat
>> may_open
>>  inode_permission
>>   __inode_permission
>>ocfs2_permission  <== ocfs2_inode_lock()
>> generic_permission
>>  get_acl
>>   ocfs2_iop_get_acl  <== ocfs2_inode_lock()
>>ocfs2_inode_lock_full_nested <= deadlock if a remote EX 
>> request
> Do you mean another node wants to get ex of the inode? or another process?
Remote EX request means "another node wants to get ex of the inode";-)
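
To spell out why that ordering hangs, here is a small user-space model. It is only a sketch of my understanding, not dlmglue code: struct cluster_lock, lock_request(), remote_ex_request(), unlock() and permission_chain() are all hypothetical names. The assumption it encodes is that once another node asks for EX, new local requests wait for the downconvert, and the downconvert waits until the local holders drop to zero.

```c
/* User-space model only: all names here are hypothetical, not the ocfs2 API. */
#include <assert.h>
#include <stdbool.h>

enum result { GRANTED, WOULD_WAIT };

struct cluster_lock {
	int  holders;    /* local holders of the lock */
	bool blocked;    /* set once another node has asked for EX */
	int  owner_pid;  /* sketch: the one tracked local holder, 0 if none */
};

/* Another node wants EX: we must downconvert once holders reaches 0. */
static void remote_ex_request(struct cluster_lock *l)
{
	l->blocked = true;
}

static enum result lock_request(struct cluster_lock *l, int pid)
{
	if (l->blocked)
		return WOULD_WAIT;  /* new requests wait for the downconvert */
	l->holders++;
	l->owner_pid = pid;
	return GRANTED;
}

static void unlock(struct cluster_lock *l)
{
	assert(l->holders > 0);
	if (--l->holders == 0) {
		l->owner_pid = 0;
		l->blocked = false;  /* the downconvert can finish now */
	}
}

/*
 * Models permission -> get_acl with a remote EX request arriving in between.
 * Returns true if the chain completes, false if the nested request would
 * wait forever (it waits for the downconvert, which waits for our unlock).
 */
static bool permission_chain(struct cluster_lock *l, int pid, bool check_holder)
{
	bool nested_took_lock = false;

	if (lock_request(l, pid) != GRANTED)
		return false;
	remote_ex_request(l);  /* arrives between the two lock calls */

	if (!(check_holder && l->owner_pid == pid)) {
		if (lock_request(l, pid) != GRANTED)
			return false;  /* self-deadlock: outer lock is never dropped */
		nested_took_lock = true;
	}
	/* ... ACL lookup would happen here ... */
	if (nested_took_lock)
		unlock(l);
	unlock(l);
	return true;
}
```

Calling permission_chain() with check_holder set to false leaves the lock held and blocked, which is the hang; with check_holder set to true the nested step skips the second request and the chain finishes.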

Eric
>> comes between two ocfs2_inode_lock()
>>
>> Fix by checking if the cluster lock has already been acquired in the
>> call-chain path.
>>
>> Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in 
>> ocfs2_iop_set/get_acl()")
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/acl.c | 39 +++
>>   1 file changed, 27 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..7e3544e 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
>>   int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>   {
>>  struct buffer_head *bh = NULL;
>> +struct ocfs2_holder *oh;
>> +struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>  int status = 0;
>>   
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> -if (status < 0) {
>> -if (status != -ENOENT)
>> -mlog_errno(status);
>> -return status;
>> +oh = ocfs2_is_locked_by_me(lockres);
>> +if (!oh) {
>> +status = ocfs2_inode_lock(inode, &bh, 1);
>> +if (status < 0) {
>> +if (status != -ENOENT)
>> +mlog_errno(status);
>> +return status;
>> +}
>>  }
>> +
>>  status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!oh)
>> +ocfs2_inode_unlock(inode, 1);
>>  brelse(bh);
>>  return status;
>>   }
>> @@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>  struct ocfs2_super *osb;
>>  struct buffer_head *di_bh = NULL;
>>  struct posix_acl *acl;
>> +struct ocfs2_holder *oh;
>> +struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>  int ret;
>>   
>>  osb = OCFS2_SB(inode->i_sb);
>>  if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>  return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> -if (ret < 0) {
>> -if (ret != -ENOENT)
>> -mlog_errno(ret);
>> -return ERR_PTR(ret);
>> +
>> +oh = ocfs2_is_locked_by_me(lockres);
>> +if (!oh) {
>> +ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +if (ret < 0) {
>> +if (ret != -ENOENT)
>> +mlog_errno(ret);
>> +return ERR_PTR(ret);
>> +}
>>  }
>>   
>>  acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>   
>> -ocfs2_inode_unlock(inode, 0);
>> +if (!oh)
>> +ocfs2_inode_unlock(inode, 0);
>>  brelse(di_bh);
>>  return acl;
>>   }
>>
>



Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-28 Thread Eric Ren


Hi Christoph!

Thanks for your attention.

On 10/28/2016 02:20 PM, Christoph Hellwig wrote:

Hi Eric,

I've added linux-fsdevel to the cc list as this should get a bit
broader attention.

On Wed, Oct 19, 2016 at 01:19:40PM +0800, Eric Ren wrote:

Mostly, we can avoid recursive locking by writing code carefully. However, as
the deadlock issues have shown, it's very hard to handle the routines
that are called directly by vfs. For instance:

 const struct inode_operations ocfs2_file_iops = {
         .permission = ocfs2_permission,
         .get_acl    = ocfs2_iop_get_acl,
         .set_acl    = ocfs2_iop_set_acl,
 };


ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

What do you actually protect in ocfs2_permission?  It's a trivial
wrapper around generic_permission which just looks at the VFS inode.

Yes, it is.

https://github.com/torvalds/linux/blob/master/fs/ocfs2/file.c#L1321
---
ocfs2_permission
  ocfs2_inode_lock()
  generic_permission()
  ocfs2_inode_unlock()


I think the right fix is to remove ocfs2_permission entirely and use
the default VFS implementation.  That both solves your locking problem,
and it will also get you RCU lookup instead of dropping out of
RCU mode all the time.
But, from my understanding, the pair of ocfs2_inode_lock/unlock() is used to
prevent any concurrent changes to the permission of the inode on another
cluster node while we are checking it. It's a common case for cluster
filesystems, such as GFS2:
https://github.com/torvalds/linux/blob/master/fs/gfs2/inode.c#L1777

Thanks for your suggestion again!
Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



