On 10/12/2016 06:54 PM, Eric Ren wrote:
> Hi,
>
> On 10/12/2016 05:45 PM, Junxiao Bi wrote:
>> On 10/12/2016 05:34 PM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>> On 10/12/2016 02:47 PM, Junxiao Bi wrote:
>>>> On 10/12/2016 10:36 AM, Eric Ren wrote:
>>>>> Hi,
>>>>>
>>>>> When backporting those patches, I find that they are already in our
>>>>> product kernel, maybe via "stable kernel" policy, although our
>>>>> product kernel is 4.4 while the patches were merged into 4.6.
>>>>>
>>>>> Seems it's another deadlock that happens when doing
>>>>> `chmod -R 777 /mnt/ocfs2` among multiple nodes at the same time.
>>>> Yes, but I just finished running the ocfs2 full test on linux
>>>> next-20161006 and didn't find any issue.
>>> Thanks a lot, really!
>>>
>>> 1. What's the size of your ocfs2 disk? My disk is 200G.
>> 212G
>>
>>> 2. Did you run discontig block group test with multiple nodes? with this
>>> option:
>> Yes, but I don't know what that option is.
>>
>>> " -m ocfs2cts1,ocfs2cts2"
>
> ocfs2ctsX is the host name of cluster nodes. Discontig bg testcase will
> run in local mode without this option.
It did; 3 machines were used. I first thought ocfs2cts1,ocfs2cts2 was the
option.
Thanks,
Junxiao.
>
> Thanks
> Eric
>
>>>
>>> 3. Then, I am using fs/dlm. That's a different point.
>> Yes, that deserves a look since your issue is a cluster locking hang.
>>
>> Thanks,
>> Junxiao.
>>> Thanks,
>>> Eric
>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>>>> Thanks,
>>>>> Eric
>>>>> On 10/12/2016 09:23 AM, Eric Ren wrote:
>>>>>> Hi Junxiao,
>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>>>>>>> Hi Junxiao,
>>>>>>>>
>>>>>>>> As the subject, the testing hung there on a kernel without your
>>>>>>>> patches:
>>>>>>>>
>>>>>>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>>>>>>> and
>>>>>>>> "ocfs2: fix posix_acl_create deadlock"
>>>>>>>>
>>>>>>>> The stack trace is:
>>>>>>>> ```
>>>>>>>> ocfs2cts1:~ # pstree -pl 24133
>>>>>>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>>>>>>
>>>>>>>> ocfs2cts1:~ # pgrep -a chmod
>>>>>>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>>>>>>
>>>>>>>> ocfs2cts1:~ # cat /proc/15232/stack
>>>>>>>> [<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>>>>>> [<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>>>>>> [<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>>>>>>> [<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>>>>>>> [<ffffffff8120d03c>] iterate_dir+0x9c/0x110
>>>>>>>> [<ffffffff8120d453>] SyS_getdents+0x83/0xf0
>>>>>>>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Do you think this issue can be fixed by your patches?
>>>>>>> Looks not. Those two patches are to fix recursive locking deadlock.
>>>>>>> But from the above call trace, there is no recursive lock.
>>>>>> Sorry, the call trace on another node was missing.
>>>>>> Here it is:
>>>>>>
>>>>>> ocfs2cts2:~ # pstree -lp
>>>>>> sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>>>>>>
>>>>>> ocfs2cts2:~ # cat /proc/4865/stack
>>>>>> [<ffffffffa053e7ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>>>> [<ffffffffa053f56d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>>>> [<ffffffffa059c860>] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
>>>>>> [<ffffffff812044e6>] generic_permission+0x166/0x1c0
>>>>>> [<ffffffffa0542aca>] ocfs2_permission+0xaa/0xd0 [ocfs2]
>>>>>> [<ffffffff81204596>] __inode_permission+0x56/0xb0
>>>>>> [<ffffffff812068fa>] link_path_walk+0x29a/0x560
>>>>>> [<ffffffff81206cbf>] path_lookupat+0x7f/0x110
>>>>>> [<ffffffff8120929c>] filename_lookup+0x9c/0x150
>>>>>> [<ffffffff811f96c3>] SyS_fchmodat+0x33/0x90
>>>>>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>
>>>>>> Thanks,
>>>>>> Eric
>>>>>>
>>>>>>> Thanks,
>>>>>>> Junxiao.
>>>>>>>> I will try your patches later, but I am a little worried that
>>>>>>>> the reproduction rate may not be 100%.
>>>>>>>> So I am asking you to confirm ;-)
>>>>>>>>
>>>>>>>> Eric
>>>>>> _______________________________________________
>>>>>> Ocfs2-devel mailing list
>>>>>> Ocfs2-devel@oss.oracle.com
>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel