Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-14 Thread Eric Ren
Hello Guys,

This is indeed another deadlock caused by:

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")

The reason was explained well by Tariq Saeed in this thread:

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

For this case, the ocfs2_inode_lock() is misused recursively as below:

do_sys_open
 do_filp_open
  path_openat
   may_open
    inode_permission
     __inode_permission
      ocfs2_permission  <== ocfs2_inode_lock()
       generic_permission
        get_acl
         ocfs2_iop_get_acl  <== ocfs2_inode_lock()
          ocfs2_inode_lock_full_nested  <== deadlock if a remote EX
                                            request comes between two
                                            ocfs2_inode_lock()
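
To make the interleaving concrete, here is a toy user-space model in C. It is only a sketch, not kernel code. It assumes, as I understand the explanation linked above, that once a remote EX request has marked the lock as blocked, new local lock requests wait for the downconvert, while the downconvert waits until every local holder has unlocked. All toy_* names and fields are made up for illustration and are not the dlmglue API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one cluster lock resource as seen from one node.
 * Field and function names are hypothetical, not the dlmglue API. */
struct toy_lockres {
    int  holders;   /* local tasks currently holding the lock */
    bool blocked;   /* a remote node asked for EX; downconvert pending */
};

/* A remote EX request arrives: mark the lock as blocked. */
static void toy_remote_ex_request(struct toy_lockres *res)
{
    res->blocked = true;
}

/* The downconvert can only run once every local holder has unlocked. */
static bool toy_can_downconvert(const struct toy_lockres *res)
{
    return res->holders == 0;
}

/* Returns true if the lock is granted, false if the caller would have
 * to sleep until the pending downconvert finishes. */
static bool toy_inode_lock(struct toy_lockres *res)
{
    if (res->blocked)
        return false;
    res->holders++;
    return true;
}

static void toy_inode_unlock(struct toy_lockres *res)
{
    assert(res->holders > 0);
    res->holders--;
}

/* Replays the call chain: the outer lock stands for ocfs2_permission(),
 * the inner one for ocfs2_iop_get_acl(). Returns true on deadlock. */
static bool toy_recursive_lock_deadlocks(bool remote_ex_in_between)
{
    struct toy_lockres res = { 0, false };
    bool deadlock;

    if (!toy_inode_lock(&res))      /* outer lock: always granted here */
        return false;
    if (remote_ex_in_between)
        toy_remote_ex_request(&res);

    if (toy_inode_lock(&res)) {     /* inner lock granted: no deadlock */
        toy_inode_unlock(&res);
        deadlock = false;
    } else {
        /* The inner lock waits for the downconvert, which waits for
         * the outer holder, which is this same task. */
        deadlock = !toy_can_downconvert(&res);
    }
    toy_inode_unlock(&res);
    return deadlock;
}
```

In this model, toy_recursive_lock_deadlocks(false) reports no deadlock, which would explain why the recursion goes unnoticed on a single node, while toy_recursive_lock_deadlocks(true) reports one.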

Any thoughts on how to deal with this issue are welcome!

Thanks,
Eric

On 10/12/2016 09:23 AM, Eric Ren wrote:
> Hi Junxiao,
>
>> Hi Eric,
>>
>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>> As the subject, the testing hung there on a kernel without your patches:
>>>
>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>> and
>>> "ocfs2: fix posix_acl_create deadlock"
>>>
>>> The stack trace is:
>>> ```
>>> ocfs2cts1:~ # pstree -pl 24133
>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>
>>> ocfs2cts1:~ # pgrep -a chmod
>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>
>>> ocfs2cts1:~ # cat /proc/15232/stack
>>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>> [] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>> [] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>> [] iterate_dir+0x9c/0x110
>>> [] SyS_getdents+0x83/0xf0
>>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>>> [] 0x
>>> ```
>>>
>>> Do you think this issue can be fixed by your patches?
>> Looks not. Those two patches are to fix recursive locking deadlock. But
>> from above call trace, there is no recursive lock.
> Sorry, the call trace on another node was missing.  Here it is:
>
> ocfs2cts2:~ # pstree -lp
> sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>
> ocfs2cts2:~ # cat /proc/4865/stack
> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
> [] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
> [] generic_permission+0x166/0x1c0
> [] ocfs2_permission+0xaa/0xd0 [ocfs2]
> [] __inode_permission+0x56/0xb0
> [] link_path_walk+0x29a/0x560
> [] path_lookupat+0x7f/0x110
> [] filename_lookup+0x9c/0x150
> [] SyS_fchmodat+0x33/0x90
> [] entry_SYSCALL_64_fastpath+0x12/0x6d
> [] 0x
>
> Thanks,
> Eric
>
>
>> Thanks,
>> Junxiao.
>>> I will try your patches later, but I am little worried the possibility
>>> of reproduction may not be 100%.
>>> So ask you to confirm;-)
>>>
>>> Eric
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Junxiao Bi
On 10/12/2016 06:54 PM, Eric Ren wrote:
> Hi,
> 
> On 10/12/2016 05:45 PM, Junxiao Bi wrote:
>> On 10/12/2016 05:34 PM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>>>> On 10/12/2016 10:36 AM, Eric Ren wrote:
>>>>> Hi,
>>>>>
>>>>> When backporting those patches, I find that they are already in our
>>>>> product kernel, maybe via "stable kernel" policy, although our
>>>>> product kernel is 4.4 while the patches were merged into 4.6.
>>>>>
>>>>> Seems it's another deadlock that happens when doing
>>>>> `chmod -R 777 /mnt/ocfs2` among multiple nodes at the same time.
>>>> Yes, but i just finish running ocfs2 full test on linux next-20161006
>>>> and didn't find any issue.
>>> Thanks a lot, really!
>>>
>>> 1. What's the size of your ocfs2 disk? My disk is 200G.
>> 212G
>>
>>> 2. Did you run discontig block group test with multiple nodes? with this
>>> option:
>> Yes, but i don't know what that option is.
>>
>>>  " -m ocfs2cts1,ocfs2cts2"
> 
> ocfs2ctsX is the host name of cluster nodes. Discontig bg testcase will
> run in local mode if without
> this option.
It did; 3 machines were used. At first I thought "ocfs2cts1,ocfs2cts2"
itself was the option.

Thanks,
Junxiao.
> 
> Thanks
> Eric
> 
>>>
>>> 3. Then, I am using fs/dlm. That's a different point.
>> Yes, that deserve a look since your issue is cluster locking hung.
>>
>> Thanks,
>> Junxiao.
>>> Thanks,
>>> Eric
>>>
>>>> Thanks,
>>>> Junxiao.




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Eric Ren
Hi,

On 10/12/2016 05:45 PM, Junxiao Bi wrote:
> On 10/12/2016 05:34 PM, Eric Ren wrote:
>> Hi Junxiao,
>>
>> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>>> On 10/12/2016 10:36 AM, Eric Ren wrote:
>>>> Hi,
>>>>
>>>> When backporting those patches, I find that they are already in our
>>>> product kernel, maybe via "stable kernel" policy, although our
>>>> product kernel is 4.4 while the patches were merged into 4.6.
>>>>
>>>> Seems it's another deadlock that happens when doing
>>>> `chmod -R 777 /mnt/ocfs2` among multiple nodes at the same time.
>>> Yes, but i just finish running ocfs2 full test on linux next-20161006
>>> and didn't find any issue.
>> Thanks a lot, really!
>>
>> 1. What's the size of your ocfs2 disk? My disk is 200G.
> 212G
>
>> 2. Did you run discontig block group test with multiple nodes? with this
>> option:
> Yes, but i don't know what that option is.
>
>>  " -m ocfs2cts1,ocfs2cts2"

ocfs2ctsX are the host names of the cluster nodes. Without this option, the
discontig bg testcase runs in local mode.

Thanks
Eric

>>
>> 3. Then, I am using fs/dlm. That's a different point.
> Yes, that deserve a look since your issue is cluster locking hung.
>
> Thanks,
> Junxiao.
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>



Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Junxiao Bi
On 10/12/2016 05:34 PM, Eric Ren wrote:
> Hi Junxiao,
> 
> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>> On 10/12/2016 10:36 AM, Eric Ren wrote:
>>> Hi,
>>>
>>> When backporting those patches, I find that they are already in our
>>> product kernel, maybe
>>> via "stable kernel" policy, although our product kernel is 4.4 while the
>>> patches were merged
>>> into 4.6.
>>>
>>> Seems it's another deadlock that happens when doing `chmod -R 777
>>> /mnt/ocfs2`
>>> among multiple nodes at the same time.
>> Yes, but i just finish running ocfs2 full test on linux next-20161006
>> and didn't find any issue.
> 
> Thanks a lot, really!
> 
> 1. What's the size of your ocfs2 disk? My disk is 200G.
212G

> 
> 2. Did you run discontig block group test with multiple nodes? with this
> option:
Yes, but I don't know what that option is.

> 
> " -m ocfs2cts1,ocfs2cts2"
> 
> 3. Then, I am using fs/dlm. That's a different point.
Yes, that deserves a look, since your issue is a cluster locking hang.

Thanks,
Junxiao.
> 
> Thanks,
> Eric
> 
>>
>> Thanks,
>> Junxiao.
>>
>>> Thanks,
>>> Eric



Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Eric Ren
Hi Junxiao,

On 10/12/2016 02:47 PM, Junxiao Bi wrote:
> On 10/12/2016 10:36 AM, Eric Ren wrote:
>> Hi,
>>
>> When backporting those patches, I find that they are already in our
>> product kernel, maybe
>> via "stable kernel" policy, although our product kernel is 4.4 while the
>> patches were merged
>> into 4.6.
>>
>> Seems it's another deadlock that happens when doing `chmod -R 777
>> /mnt/ocfs2`
>> among multiple nodes at the same time.
> Yes, but i just finish running ocfs2 full test on linux next-20161006
> and didn't find any issue.

Thanks a lot, really!

1. What's the size of your ocfs2 disk? My disk is 200G.

2. Did you run the discontig block group test with multiple nodes, that is, with this option:

 " -m ocfs2cts1,ocfs2cts2"

3. Also, I am using fs/dlm. That's another difference.

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>> Thanks,
>> Eric



Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-12 Thread Junxiao Bi
On 10/12/2016 10:36 AM, Eric Ren wrote:
> Hi,
> 
> When backporting those patches, I find that they are already in our
> product kernel, maybe
> via "stable kernel" policy, although our product kernel is 4.4 while the
> patches were merged
> into 4.6.
> 
> Seems it's another deadlock that happens when doing `chmod -R 777
> /mnt/ocfs2`
> among multiple nodes at the same time.
Yes, but I just finished running the full ocfs2 test suite on linux
next-20161006 and didn't find any issues.

Thanks,
Junxiao.

> 
> Thanks,
> Eric



Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-11 Thread Eric Ren
Hi,

When backporting those patches, I found that they are already in our product
kernel, maybe via the "stable kernel" policy, although our product kernel is
4.4 while the patches were merged into 4.6.

It seems to be another deadlock, which happens when running
`chmod -R 777 /mnt/ocfs2` on multiple nodes at the same time.

Thanks,
Eric




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-11 Thread Eric Ren
Hi Junxiao,

> Hi Eric,
>
> On 10/11/2016 10:42 AM, Eric Ren wrote:
>> Hi Junxiao,
>>
>> As the subject, the testing hung there on a kernel without your patches:
>>
>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>> and
>> "ocfs2: fix posix_acl_create deadlock"
>>
>> The stack trace is:
>> ```
>> ocfs2cts1:~ # pstree -pl 24133
>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>
>> ocfs2cts1:~ # pgrep -a chmod
>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>
>> ocfs2cts1:~ # cat /proc/15232/stack
>> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>> [] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>> [] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>> [] iterate_dir+0x9c/0x110
>> [] SyS_getdents+0x83/0xf0
>> [] entry_SYSCALL_64_fastpath+0x12/0x6d
>> [] 0x
>> ```
>>
>> Do you think this issue can be fixed by your patches?
> Looks not. Those two patches are to fix recursive locking deadlock. But
> from above call trace, there is no recursive lock.
Sorry, the call trace from the other node was missing. Here it is:

ocfs2cts2:~ # pstree -lp
sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)

ocfs2cts2:~ # cat /proc/4865/stack
[] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
[] generic_permission+0x166/0x1c0
[] ocfs2_permission+0xaa/0xd0 [ocfs2]
[] __inode_permission+0x56/0xb0
[] link_path_walk+0x29a/0x560
[] path_lookupat+0x7f/0x110
[] filename_lookup+0x9c/0x150
[] SyS_fchmodat+0x33/0x90
[] entry_SYSCALL_64_fastpath+0x12/0x6d
[] 0x

Thanks,
Eric


>
> Thanks,
> Junxiao.
>> I will try your patches later, but I am little worried the possibility
>> of reproduction may not be 100%.
>> So ask you to confirm;-)
>>
>> Eric
>



Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Eric Ren
Hi Junxiao,

On 10/11/2016 10:58 AM, Junxiao Bi wrote:
>> Do you think this issue can be fixed by your patches?
> Looks not. Those two patches are to fix recursive locking deadlock. But
> from above call trace, there is no recursive lock.
OK, thanks a lot!

Eric
>
> Thanks,
> Junxiao.
>> I will try your patches later, but I am little worried the possibility
>> of reproduction may not be 100%.
>> So ask you to confirm;-)
>>
>> Eric
>




Re: [Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Junxiao Bi
Hi Eric,

On 10/11/2016 10:42 AM, Eric Ren wrote:
> Hi Junxiao,
> 
> As the subject, the testing hung there on a kernel without your patches:
> 
> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
> and
> "ocfs2: fix posix_acl_create deadlock"
> 
> The stack trace is:
> ```
> ocfs2cts1:~ # pstree -pl 24133
> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
> 
> ocfs2cts1:~ # pgrep -a chmod
> 15232 /bin/chmod -R 777 /mnt/ocfs2
> 
> ocfs2cts1:~ # cat /proc/15232/stack
> [] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
> [] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
> [] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
> [] ocfs2_readdir+0x41/0x1b0 [ocfs2]
> [] iterate_dir+0x9c/0x110
> [] SyS_getdents+0x83/0xf0
> [] entry_SYSCALL_64_fastpath+0x12/0x6d
> [] 0x
> ```
> 
> Do you think this issue can be fixed by your patches?
It doesn't look like it. Those two patches fix a recursive locking deadlock,
but the call trace above shows no recursive lock.

Thanks,
Junxiao.
> 
> I will try your patches later, but I am little worried the possibility
> of reproduction may not be 100%.
> So ask you to confirm;-)
> 
> Eric



[Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing

2016-10-10 Thread Eric Ren

Hi Junxiao,

As the subject says, the test hung on a kernel without your patches:

"ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
and
"ocfs2: fix posix_acl_create deadlock"

The stack trace is:
```
ocfs2cts1:~ # pstree -pl 24133
discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)

ocfs2cts1:~ # pgrep -a chmod
15232 /bin/chmod -R 777 /mnt/ocfs2

ocfs2cts1:~ # cat /proc/15232/stack
[] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
[] ocfs2_readdir+0x41/0x1b0 [ocfs2]
[] iterate_dir+0x9c/0x110
[] SyS_getdents+0x83/0xf0
[] entry_SYSCALL_64_fastpath+0x12/0x6d
[] 0x
```

Do you think this issue can be fixed by your patches?

I will try your patches later, but I am a little worried that the issue may
not reproduce 100% of the time, so I am asking you to confirm ;-)

Eric