Re: Linux-next parallel cp workload hang
On Thu, May 19, 2016 at 09:02:31AM +1000, Dave Chinner wrote:
> On Thu, May 19, 2016 at 12:17:26AM +1000, Dave Chinner wrote:
> > Patch below should fix the deadlock.
>
> The test has been running for several hours without failure using
> this patch, so I'd say this fixes the problem...

Yes, the same for me. Thanks.

Xiong

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
Re: Linux-next parallel cp workload hang
On Thu, May 19, 2016 at 12:17:26AM +1000, Dave Chinner wrote:
> Patch below should fix the deadlock.

The test has been running for several hours without failure using
this patch, so I'd say this fixes the problem...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: Linux-next parallel cp workload hang
On Wed, May 18, 2016 at 07:46:17PM +0800, Xiong Zhou wrote:
> On Wed, May 18, 2016 at 07:54:09PM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> > > Hi,
> > >
> > > On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > > > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > > > Hi,
> > > > >
> > > > > Parallel cp workload (xfstests generic/273) hangs like below.
> > > > > It's reproducible with a small chance, less than 1/100 I think.
> > > > >
> > > > > Have hit this in linux-next 20160504 0506 0510 trees, testing on
> > > > > xfs with loop or block device. Ext4 survived several rounds
> > > > > of testing.
> > > > >
> > > > > Linux next 20160510 tree hangs within 500 rounds of testing several
> > > > > times. The same tree with the vfs parallel lookup patchset reverted
> > > > > survived 900 rounds of testing. Reverted commits are attached.

Ok, this is trivial to reproduce. Al - I've hit this 9 times out of ten
running it on a 4p VM with a pair of 4GB ram disks using all the
current upstream default mkfs and mount configurations. On the tenth
attempt I got the tracing to capture what I needed to see - process
7340 was the last xfs_buf_lock_done trace without an unlock trace, and
that process had this trace:

  schedule
  rwsem_down_read_failed
  call_rwsem_down_read_failed
  down_read
  xfs_ilock
  xfs_ilock_data_map_shared
  xfs_dir2_leaf_getdents
  xfs_readdir
  xfs_file_readdir
  iterate_dir
  SyS_getdents
  entry_SYSCALL_64_fastpath

Which means it's holding a buffer lock while trying to get the
ilock(shared). That's never going to end well - I'm now wondering why
lockdep hasn't been all over this lock order inversion.

Essentially, it's a three-process deadlock involving shared/exclusive
barriers and inverted lock orders. It's a pre-existing problem with
buffer mapping lock orders, nothing to do with the VFS parallelisation
code.

process 1               process 2               process 3
---------               ---------               ---------
readdir
iolock(shared)
  get_leaf_dents
    iterate entries
      ilock(shared)
      map, lock and read buffer
      iunlock(shared)
      process entries in buffer
      .....
                                                readdir
                                                iolock(shared)
                                                  get_leaf_dents
                                                    iterate entries
                                                      ilock(shared)
                                                      map, lock buffer
                        finish ->iterate_shared
                        file_accessed()
                          ->update_time
                            start transaction
                            ilock(excl)
                            .....
      finishes processing buffer
      get next buffer
        ilock(shared)

And that's the deadlock.

Now I know what the problem is I can say that process 2 - the
transactional timestamp update - is the reason the readdir operations
are blocking like this. And I know why CXFS never hit this - it
doesn't use the VFS paths, so the VFS calls to update timestamps don't
exist during concurrent readdir operations on the CXFS metadata
server. Hence process 2 doesn't exist and no exclusive barriers are
put in amongst the shared locking.

Patch below should fix the deadlock.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

xfs: concurrent readdir hangs on data buffer locks

From: Dave Chinner

There's a three-process deadlock involving shared/exclusive barriers
and inverted lock orders in the directory readdir implementation. It's
a pre-existing problem with lock ordering, exposed by the VFS
parallelisation code.

process 1               process 2               process 3
---------               ---------               ---------
readdir
iolock(shared)
  get_leaf_dents
    iterate entries
      ilock(shared)
      map, lock and read buffer
      iunlock(shared)
      process entries in buffer
      .....
                                                readdir
                                                iolock(shared)
                                                  get_leaf_dents
                                                    iterate entries
                                                      ilock(shared)
                                                      map, lock buffer
                        finish ->iterate_shared
                        file_accessed()
                          ->update_time
                            start transaction
                            ilock(excl)
                            .....
      finishes processing buffer
      get next buffer
        ilock(shared)

And that's the deadlock.
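[Editor's illustration] The rwsem behaviour that closes the cycle - a new
shared acquisition queueing behind a waiting exclusive one - can be modelled
outside the kernel. The following is a toy Python sketch of that
read held -> write blocked -> read blocked chain; it is an assumption-laden
model for illustration, not the kernel's rwsem or any XFS code:

```python
import threading
import time

class FairRWLock:
    """Reader/writer lock where a queued writer blocks *new* readers,
    mimicking the fairness that makes down_read() queue behind a
    waiting down_write() in the scenario above."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # New readers queue behind any waiting writer: this is the
            # "exclusive barrier amongst the shared locking".
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def read_would_block(self):
        with self._cond:
            return bool(self._writer or self._writers_waiting)

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

ilock = FairRWLock()

# "Process 3": readdir holds ilock(shared) while it waits on a buffer lock.
ilock.acquire_read()

# "Process 2": update_time wants ilock(excl) and queues behind process 3.
writer = threading.Thread(
    target=lambda: (ilock.acquire_write(), ilock.release_write()))
writer.start()
while not ilock.read_would_block():
    time.sleep(0.01)          # wait until the writer is actually queued

# "Process 1": holds the buffer lock process 3 wants; its ilock(shared)
# attempt now queues behind the exclusive waiter - the deadlock cycle.
assert ilock.read_would_block()

ilock.release_read()          # break the cycle so this demo terminates
writer.join()
```

The demo deliberately releases the read lock at the end; in the kernel no
process can release anything, which is why the cycle is a hard hang.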
Re: Linux-next parallel cp workload hang
On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> Hi,
>
> On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > Hi,
> > >
> > > Parallel cp workload (xfstests generic/273) hangs like below.
> > > It's reproducible with a small chance, less than 1/100 I think.
> > >
> > > Have hit this in linux-next 20160504 0506 0510 trees, testing on
> > > xfs with loop or block device. Ext4 survived several rounds
> > > of testing.
> > >
> > > Linux next 20160510 tree hangs within 500 rounds of testing several
> > > times. The same tree with the vfs parallel lookup patchset reverted
> > > survived 900 rounds of testing. Reverted commits are attached.
> >
> > What hardware?
>
> A HP prototype host.

Description? cpus, memory, etc? I want to have some idea of what
hardware I need to reproduce this... xfs_info from the scratch
filesystem would also be handy.

> > Can you reproduce this with CONFIG_XFS_DEBUG=y set? If you can, and
> > it doesn't trigger any warnings or asserts, can you then try to
> > reproduce it while tracing the following events:
> >
> >     xfs_buf_lock
> >     xfs_buf_lock_done
> >     xfs_buf_trylock
> >     xfs_buf_unlock
> >
> > So we might be able to see if there's an unexpected buffer
> > locking/state pattern occurring when the hang occurs?
>
> Yes, I've reproduced this with both CONFIG_XFS_DEBUG=y and the tracers
> on. There is some trace output after the hang for a while.

I'm not actually interested in the trace after the hang - I'm
interested in what happened leading up to the hang. The output you've
given me tells me that the directory block at that offset is locked,
but nothing in the trace tells me what locked it.

Can I suggest using trace-cmd to record the events, then when the test
hangs, kill the check process so that trace-cmd terminates and gathers
the events? Then dump the report to a text file and attach that?

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
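[Editor's illustration] The analysis step used later in the thread - finding
the last xfs_buf_lock_done with no matching xfs_buf_unlock in the dumped
report - can be automated. The sketch below is hypothetical: the trace line
format is a simplified stand-in, not the real trace-cmd report layout, which
carries more fields per event:

```python
import re

# Hypothetical, simplified trace lines; real xfs_buf_* events include
# dev, block number, flags and caller fields in a different layout.
TRACE = """\
cp-7338 [001] 100.000001: xfs_buf_lock_done: bno 0x80
cp-7338 [001] 100.000002: xfs_buf_unlock: bno 0x80
cp-7340 [002] 100.000003: xfs_buf_lock_done: bno 0x88
cp-7341 [003] 100.000004: xfs_buf_lock: bno 0x88
"""

def unbalanced_locks(report):
    """Return (task, bno) pairs that logged xfs_buf_lock_done with no
    later xfs_buf_unlock - buffers still held when tracing stopped."""
    held = {}
    pat = re.compile(r'(\S+)\s+\[\d+\]\s+[\d.]+:\s+(xfs_buf_\w+):\s+bno (\S+)')
    for line in report.splitlines():
        m = pat.match(line)
        if not m:
            continue
        task, event, bno = m.groups()
        if event == 'xfs_buf_lock_done':
            held[bno] = task          # buffer lock acquired
        elif event == 'xfs_buf_unlock':
            held.pop(bno, None)       # lock/unlock pair balances out
    return sorted((task, bno) for bno, task in held.items())

print(unbalanced_locks(TRACE))        # -> [('cp-7340', '0x88')]
```

With the real report text substituted in (and the regex adjusted to the
actual field layout), this is the "who still holds the buffer" question
answered mechanically rather than by eyeballing the dump.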
Re: Linux-next parallel cp workload hang
On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> Hi,
>
> Parallel cp workload (xfstests generic/273) hangs like below.
> It's reproducible with a small chance, less than 1/100 I think.
>
> Have hit this in linux-next 20160504 0506 0510 trees, testing on
> xfs with loop or block device. Ext4 survived several rounds
> of testing.
>
> Linux next 20160510 tree hangs within 500 rounds of testing several
> times. The same tree with the vfs parallel lookup patchset reverted
> survived 900 rounds of testing. Reverted commits are attached.

What hardware?

> Bisecting in this patchset identified this commit:
>
> 3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
> commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
> Author: Al Viro
> Date:   Wed Apr 20 23:42:46 2016 -0400
>
>     simple local filesystems: switch to ->iterate_shared()
>
>     no changes needed (XFS isn't simple, but it has the same parallelism
>     in the interesting parts exercised from CXFS).
>
> With this commit reverted on top of the Linux next 0510 tree, 5000+
> rounds of testing passed.
>
> Although 2000 rounds of testing had been conducted before the good/bad
> verdict, I'm not 100 percent sure about all this, since it's so hard
> to hit, and I am not that lucky..
>
> Bisect log and full blocked-state process dump log are also attached.
>
> Furthermore, this was first hit when testing fs dax on nvdimm;
> however, it's reproducible without the dax mount option, and also
> reproducible on a loop device, it just seems harder to hit.
>
> Thanks,
> Xiong
>
> [0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
> [0.794263]   Not tainted 4.6.0-rc6-next-20160504 #5
> [0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [0.841801] cp  D 880b4e977928  0 49033  49014 0x0080
> [0.868923] 880b4e977928 880ba275d380 880b8d712b80 880b4e978000
> [0.897504] 7fff 0002  880b8d712b80
> [0.925234] 880b4e977940 816cbc25 88035a1dabb0 880b4e9779e8
> [0.953237] Call Trace:
> [0.958314] [] schedule+0x35/0x80
> [0.974854] [] schedule_timeout+0x231/0x2d0
> [0.995728] [] ? down_trylock+0x2d/0x40
> [1.015351] [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
> [1.040182] [] __down_common+0xaa/0x104
> [1.059021] [] ? _xfs_buf_find+0x162/0x340 [xfs]
> [1.081357] [] __down+0x1d/0x1f
> [1.097166] [] down+0x41/0x50
> [1.112869] [] xfs_buf_lock+0x3c/0xf0 [xfs]
> [1.134504] [] _xfs_buf_find+0x162/0x340 [xfs]
> [1.156871] [] xfs_buf_get_map+0x2a/0x270 [xfs]

So what's holding that directory data buffer lock? It should only be
held if there is either IO in progress, or a modification of the
buffer in progress that is blocked somewhere else.

> [1.180010] [] xfs_buf_read_map+0x2d/0x180 [xfs]
> [1.203538] [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
> [1.229194] [] xfs_da_read_buf+0xd1/0x100 [xfs]
> [1.251948] [] xfs_dir3_data_read+0x26/0x60 [xfs]
> [1.275736] [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 [xfs]
> [1.305094] [] ? down_read+0x12/0x30
> [1.323787] [] ? xfs_ilock+0xe4/0x110 [xfs]
> [1.345114] [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
> [1.371818] [] xfs_readdir+0x1a6/0x1c0 [xfs]

So we should be holding the ilock in shared mode here...

> [1.393471] [] xfs_file_readdir+0x2b/0x30 [xfs]
> [1.416874] [] iterate_dir+0x173/0x190
> [1.436709] [] ? do_audit_syscall_entry+0x66/0x70
> [1.460951] [] SyS_getdents+0x98/0x120
> [1.480566] [] ? iterate_dir+0x190/0x190
> [1.500909] [] do_syscall_64+0x62/0x110
> [1.520847] [] entry_SYSCALL64_slow_path+0x25/0x25
> [1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
> [1.568933]   Not tainted 4.6.0-rc6-next-20160504 #5
> [1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [1.618544] cp  D 880b91463b00  0 49040  49016 0x0080
> [1.645502] 880b91463b00 880464d5c140 88029b475700 880b91464000
> [1.674145] 880411c42610 880411c42628  8802c10bc610
> [1.702834] 880b91463b18 816cbc25 88029b475700 880b91463b88
> [1.731501] Call Trace:
> [1.736866] [] schedule+0x35/0x80
> [1.754119] [] rwsem_down_read_failed+0xf2/0x140
> [1.777411] [] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
> [1.805090] [] call_rwsem_down_read_failed+0x18/0x30
> [1.830482] [] down_read+0x20/0x30
> [1.848505] [] xfs_ilock+0xe4/0x110 [xfs]
> [1.869293] [] xfs_ilock_data_map_shared+0x30/0x40

And this is an attempt to lock the inode shared, so if that is failing
while there's another shared holder, that means there's an exclusive
waiter queued up (i.e. read held -> write blocked -> read blocked).

So looking at dump-g273xfs0510:

[ 845.727907] INFO: task cp:40126 blocked for more than 120 seconds.
[ 845.751175]   Not tainted 4.6.0-rc7-next-20160510 #9
[ 845.770011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
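[Editor's illustration] Reading hung-task dumps like the ones above means
mapping each blocked task to the lock function it is stuck in. A small,
hypothetical helper along these lines makes that visible at a glance; the
dump excerpt and the marker names are illustrative, not a parser for the
full dmesg format:

```python
import re

# Dmesg-style excerpt in the shape of the dumps above; addresses elided
# just as they are in the quoted report.
DUMP = """\
INFO: task cp:49033 blocked for more than 120 seconds.
Call Trace:
 [] schedule+0x35/0x80
 [] xfs_buf_lock+0x3c/0xf0 [xfs]
INFO: task cp:49040 blocked for more than 120 seconds.
Call Trace:
 [] schedule+0x35/0x80
 [] rwsem_down_read_failed+0xf2/0x140
 [] xfs_ilock+0xe4/0x110 [xfs]
"""

def blocked_on(dump, markers=('xfs_buf_lock+', 'xfs_ilock+')):
    """Map each hung task to the first marker frame in its stack,
    so 'who waits on which lock' can be read off directly."""
    result = {}
    task = None
    for line in dump.splitlines():
        m = re.search(r'INFO: task (\S+) blocked', line)
        if m:
            task = m.group(1)      # start of a new hung-task report
            continue
        for mark in markers:
            if task and mark in line and task not in result:
                result[task] = mark.rstrip('+')
    return result

print(blocked_on(DUMP))
# -> {'cp:49033': 'xfs_buf_lock', 'cp:49040': 'xfs_ilock'}
```

Run over the full dump, a summary like this shows the buffer-lock waiters
and the ilock waiters side by side, which is exactly the pattern the
analysis above identifies.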
Linux-next parallel cp workload hang
Hi,

Parallel cp workload (xfstests generic/273) hangs like below.
It's reproducible with a small chance, less than 1/100 I think.

Have hit this in linux-next 20160504 0506 0510 trees, testing on
xfs with loop or block device. Ext4 survived several rounds
of testing.

Linux next 20160510 tree hangs within 500 rounds of testing several
times. The same tree with the vfs parallel lookup patchset reverted
survived 900 rounds of testing. Reverted commits are attached.

Bisecting in this patchset identified this commit:

3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
Author: Al Viro
Date:   Wed Apr 20 23:42:46 2016 -0400

    simple local filesystems: switch to ->iterate_shared()

    no changes needed (XFS isn't simple, but it has the same parallelism
    in the interesting parts exercised from CXFS).

With this commit reverted on top of the Linux next 0510 tree, 5000+
rounds of testing passed.

Although 2000 rounds of testing had been conducted before the good/bad
verdict, I'm not 100 percent sure about all this, since it's so hard
to hit, and I am not that lucky..

Bisect log and full blocked-state process dump log are also attached.

Furthermore, this was first hit when testing fs dax on nvdimm;
however, it's reproducible without the dax mount option, and also
reproducible on a loop device, it just seems harder to hit.

Thanks,
Xiong

[0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
[0.794263]   Not tainted 4.6.0-rc6-next-20160504 #5
[0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[0.841801] cp  D 880b4e977928  0 49033  49014 0x0080
[0.868923] 880b4e977928 880ba275d380 880b8d712b80 880b4e978000
[0.897504] 7fff 0002  880b8d712b80
[0.925234] 880b4e977940 816cbc25 88035a1dabb0 880b4e9779e8
[0.953237] Call Trace:
[0.958314] [] schedule+0x35/0x80
[0.974854] [] schedule_timeout+0x231/0x2d0
[0.995728] [] ? down_trylock+0x2d/0x40
[1.015351] [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
[1.040182] [] __down_common+0xaa/0x104
[1.059021] [] ? _xfs_buf_find+0x162/0x340 [xfs]
[1.081357] [] __down+0x1d/0x1f
[1.097166] [] down+0x41/0x50
[1.112869] [] xfs_buf_lock+0x3c/0xf0 [xfs]
[1.134504] [] _xfs_buf_find+0x162/0x340 [xfs]
[1.156871] [] xfs_buf_get_map+0x2a/0x270 [xfs]
[1.180010] [] xfs_buf_read_map+0x2d/0x180 [xfs]
[1.203538] [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
[1.229194] [] xfs_da_read_buf+0xd1/0x100 [xfs]
[1.251948] [] xfs_dir3_data_read+0x26/0x60 [xfs]
[1.275736] [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 [xfs]
[1.305094] [] ? down_read+0x12/0x30
[1.323787] [] ? xfs_ilock+0xe4/0x110 [xfs]
[1.345114] [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
[1.371818] [] xfs_readdir+0x1a6/0x1c0 [xfs]
[1.393471] [] xfs_file_readdir+0x2b/0x30 [xfs]
[1.416874] [] iterate_dir+0x173/0x190
[1.436709] [] ? do_audit_syscall_entry+0x66/0x70
[1.460951] [] SyS_getdents+0x98/0x120
[1.480566] [] ? iterate_dir+0x190/0x190
[1.500909] [] do_syscall_64+0x62/0x110
[1.520847] [] entry_SYSCALL64_slow_path+0x25/0x25
[1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
[1.568933]   Not tainted 4.6.0-rc6-next-20160504 #5
[1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1.618544] cp  D 880b91463b00  0 49040  49016 0x0080
[1.645502] 880b91463b00 880464d5c140 88029b475700 880b91464000
[1.674145] 880411c42610 880411c42628  8802c10bc610
[1.702834] 880b91463b18 816cbc25 88029b475700 880b91463b88
[1.731501] Call Trace:
[1.736866] [] schedule+0x35/0x80
[1.754119] [] rwsem_down_read_failed+0xf2/0x140
[1.777411] [] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[1.805090] [] call_rwsem_down_read_failed+0x18/0x30
[1.830482] [] down_read+0x20/0x30
[1.848505] [] xfs_ilock+0xe4/0x110 [xfs]
[1.869293] [] xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[1.896775] [] xfs_dir_open+0x30/0x60 [xfs]
[1.917882] [] do_dentry_open+0x20f/0x320
[1.938919] [] ? xfs_file_mmap+0x50/0x50 [xfs]
[1.961532] [] vfs_open+0x57/0x60
[1.978945] [] path_openat+0x325/0x14e0
[1.999273] [] ? putname+0x53/0x60
[2.017695] [] do_filp_open+0x91/0x100
[2.036893] [] ? __alloc_fd+0x46/0x180
[2.055479] [] do_sys_open+0x124/0x210
[2.073783] [] ? __audit_syscall_exit+0x1db/0x260
[2.096426] [] SyS_openat+0x14/0x20
[2.113690] [] do_syscall_64+0x62/0x110
[2.132417] [] entry_SYSCALL64_slow_path+0x25/0x25

Attachment: g273-block-dumps.tar.gz (application/gzip)
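[Editor's illustration] The "N rounds of testing" loop described above can be
scripted. This sketch is illustrative only: `./check generic/273` is the
usual xfstests entry point, but the path, environment setup, and timeout are
assumptions to adjust for a local setup:

```python
import subprocess

def run_rounds(rounds, cmd=('./check', 'generic/273'), timeout=600):
    """Run the test repeatedly; with a rare hang, a TimeoutExpired is
    how the failing round shows up. Returns the 1-based round number
    that hung or failed, or None if all rounds survived."""
    for i in range(1, rounds + 1):
        try:
            subprocess.run(cmd, check=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return i          # this round hung
        except subprocess.CalledProcessError:
            return i          # this round failed outright
    return None               # survived all rounds
```

On a real setup one would run this under trace-cmd (as suggested earlier in
the thread) so the events leading up to the hanging round are captured.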