Re: Linux-next parallel cp workload hang

2016-05-19 Thread Xiong Zhou
On Thu, May 19, 2016 at 09:02:31AM +1000, Dave Chinner wrote:
> On Thu, May 19, 2016 at 12:17:26AM +1000, Dave Chinner wrote:
> > Patch below should fix the deadlock.
> 
> The test has been running for several hours without failure using
> this patch, so I'd say this fixes the problem...

Yes, the same for me.

Thanks.

Xiong

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com


Re: Linux-next parallel cp workload hang

2016-05-18 Thread Dave Chinner
On Thu, May 19, 2016 at 12:17:26AM +1000, Dave Chinner wrote:
> Patch below should fix the deadlock.

The test has been running for several hours without failure using
this patch, so I'd say this fixes the problem...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Linux-next parallel cp workload hang

2016-05-18 Thread Dave Chinner
On Wed, May 18, 2016 at 07:46:17PM +0800, Xiong Zhou wrote:
> 
> On Wed, May 18, 2016 at 07:54:09PM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> > > Hi,
> > > 
> > > On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > > > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > > > Hi,
> > > > > 
> > > > > Parallel cp workload (xfstests generic/273) hangs like below.
> > > > > It's reproducible with a small chance, less than 1/100 I think.
> > > > > 
> > > > > Have hit this in linux-next 20160504 0506 0510 trees, testing on
> > > > > xfs with loop or block device. Ext4 survived several rounds
> > > > > of testing.
> > > > > 
> > > > > Linux next 20160510 tree hangs within 500 rounds testing several
> > > > > times. The same tree with vfs parallel lookup patchset reverted
> > > > > survived 900 rounds testing. Reverted commits are attached.

Ok, this is trivial to reproduce. Al - I've hit this 9 times out of
ten running it on a 4p VM with a pair of 4GB ram disks using all
the current upstream default mkfs and mount configurations. On the
tenth attempt I got the tracing to capture what I needed to see -
process 7340 was the last xfs_buf_lock_done trace without an unlock
trace, and that process had this trace:

schedule
rwsem_down_read_failed
call_rwsem_down_read_failed
down_read
xfs_ilock
xfs_ilock_data_map_shared
xfs_dir2_leaf_getdents
xfs_readdir
xfs_file_readdir
iterate_dir
SyS_getdents
entry_SYSCALL_64_fastpath

Which means it's holding a buffer lock while trying to get the
ilock(shared). That's never going to end well - I'm now wondering
why lockdep hasn't been all over this lock order inversion...

Essentially, it's a three-process deadlock involving
shared/exclusive barriers and inverted lock orders. It's a
pre-existing problem with buffer mapping lock orders, nothing to do
with the VFS parallelisation code.

process 1               process 2               process 3
---------               ---------               ---------
readdir
iolock(shared)
  get_leaf_dents
    iterate entries
      ilock(shared)
      map, lock and read buffer
      iunlock(shared)
      process entries in buffer
      .....
                        readdir
                        iolock(shared)
                          get_leaf_dents
                            iterate entries
                              ilock(shared)
                              map, lock buffer
                              <blocked on buffer lock>
                                                finish ->iterate_shared
                                                file_accessed()
                                                  ->update_time
                                                    start transaction
                                                    ilock(excl)
                                                    <blocked on ilock (excl)>
      .....
      finishes processing buffer
      get next buffer
        ilock(shared)
        <blocked on ilock (shared)>

And that's the deadlock.

Now that I know what the problem is, I can say that process 2 - the
transactional timestamp update - is the reason the readdir
operations are blocking like this. And I know why CXFS never hit
this - it doesn't use the VFS paths, so the VFS calls to update
timestamps don't exist during concurrent readdir operations on the
CXFS metadata server. Hence process 2 doesn't exist and no exclusive
barriers are put in amongst the shared locking.
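
One way to break the cycle - sketched below in rough form only, with
made-up helper names rather than the real xfs_dir2 interfaces - is to
make sure the getdents path never waits for the ilock while it still
holds a directory data buffer locked: map and read the next buffer
under the ilock, drop the ilock, and only then walk the entries.

    /*
     * Rough sketch, not the actual patch: take and drop the ilock
     * around the buffer read so the ilock and the data buffer lock
     * are never held (or waited for) at the same time.
     */
    while (!done) {
            lock_mode = xfs_ilock_data_map_shared(dp);
            error = read_next_dir_data_buf(dp, &curoff, &bp);
            xfs_iunlock(dp, lock_mode);
            if (error || !bp)
                    break;

            done = !emit_entries_from_buf(dp, bp, ctx);
            xfs_buf_relse(bp);      /* drop the data buffer lock */
    }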

Patch below should fix the deadlock.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


xfs: concurrent readdir hangs on data buffer locks

From: Dave Chinner 

There's a three-process deadlock involving shared/exclusive barriers
and inverted lock orders in the directory readdir implementation.
It's a pre-existing problem with lock ordering, exposed by the
VFS parallelisation code.

process 1               process 2               process 3
---------               ---------               ---------
readdir
iolock(shared)
  get_leaf_dents
    iterate entries
      ilock(shared)
      map, lock and read buffer
      iunlock(shared)
      process entries in buffer
      .....
                        readdir
                        iolock(shared)
                          get_leaf_dents
                            iterate entries
                              ilock(shared)
                              map, lock buffer
                              <blocked on buffer lock>
                                                finish ->iterate_shared
                                                file_accessed()
                                                  ->update_time
                                                    start transaction
                                                    ilock(excl)
                                                    <blocked on ilock (excl)>
      .....
      finishes processing buffer
      get next buffer
        ilock(shared)
        <blocked on ilock (shared)>

And that's the deadlock.


Re: Linux-next parallel cp workload hang

2016-05-18 Thread Dave Chinner
On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> Hi,
> 
> On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > Hi,
> > > 
> > > Parallel cp workload (xfstests generic/273) hangs like below.
> > > It's reproducible with a small chance, less than 1/100 I think.
> > > 
> > > Have hit this in linux-next 20160504 0506 0510 trees, testing on
> > > xfs with loop or block device. Ext4 survived several rounds
> > > of testing.
> > > 
> > > Linux next 20160510 tree hangs within 500 rounds testing several
> > > times. The same tree with vfs parallel lookup patchset reverted
> > > survived 900 rounds testing. Reverted commits are attached.
> > What hardware?
> 
> A HP prototype host.

description? cpus, memory, etc? I want to have some idea of what
hardware I need to reproduce this...

xfs_info from the scratch filesystem would also be handy.

> > Can you reproduce this with CONFIG_XFS_DEBUG=y set? if you can, and
> > it doesn't trigger any warnings or asserts, can you then try to
> > reproduce it while tracing the following events:
> > 
> > xfs_buf_lock
> > xfs_buf_lock_done
> > xfs_buf_trylock
> > xfs_buf_unlock
> > 
> > So we might be able to see if there's an unexpected buffer
> > locking/state pattern occurring when the hang occurs?
> 
> Yes, I've reproduced this with both CONFIG_XFS_DEBUG=y and the tracers
> on. There is some trace output after the hang for a while.

I'm not actually interested in the trace after the hang - I'm
interested in what happened leading up to the hang. The output
you've given me tells me that the directory block at offset is locked
but nothing in the trace tells me what locked it.

Can I suggest using trace-cmd to record the events? Then, when the
test hangs, kill the check process so that trace-cmd terminates and
gathers the events, dump the report to a text file, and attach
that.
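
Something along these lines should do it (the invocation is only a
sketch - adjust the test and output file names to suit):

    trace-cmd record -e xfs_buf_lock -e xfs_buf_lock_done \
            -e xfs_buf_trylock -e xfs_buf_unlock ./check generic/273
    trace-cmd report > trace-g273.txt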

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Linux-next parallel cp workload hang

2016-05-17 Thread Dave Chinner
On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> Hi,
> 
> Parallel cp workload (xfstests generic/273) hangs like below.
> It's reproducible with a small chance, less than 1/100 I think.
> 
> Have hit this in linux-next 20160504 0506 0510 trees, testing on
> xfs with loop or block device. Ext4 survived several rounds
> of testing.
> 
> Linux next 20160510 tree hangs within 500 rounds testing several
> times. The same tree with vfs parallel lookup patchset reverted
> survived 900 rounds testing. Reverted commits are attached.

What hardware?

> Bisecting in this patchset identified this commit:
> 
> 3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
> commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
> Author: Al Viro 
> Date:   Wed Apr 20 23:42:46 2016 -0400
> 
> simple local filesystems: switch to ->iterate_shared()
> 
> no changes needed (XFS isn't simple, but it has the same parallelism
> in the interesting parts exercised from CXFS).
> 
> With this commit reverted on top of Linux next 0510 tree, 5000+ rounds
> of testing passed.
> 
> Although 2000 rounds testing had been conducted before good/bad
> verdict, i'm not 100 percent sure about all this, since it's
> so hard to hit, and i am not that lucky..
> 
> Bisect log and full blocked state process dump log are also attached.
> 
> Furthermore, this was first hit when testing fs dax on nvdimm,
> however it's reproducible without dax mount option, and also
> reproducible on loop device, just seems harder to hit.
> 
> Thanks,
> Xiong
> 
> [0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
> [0.794263]   Not tainted 4.6.0-rc6-next-20160504 #5
> [0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [0.841801] cp  D 880b4e977928 0 49033  49014
> 0x0080
> [0.868923]  880b4e977928 880ba275d380 880b8d712b80
> 880b4e978000
> [0.897504]  7fff 0002 
> 880b8d712b80
> [0.925234]  880b4e977940 816cbc25 88035a1dabb0
> 880b4e9779e8
> [0.953237] Call Trace:
> [0.958314]  [] schedule+0x35/0x80
> [0.974854]  [] schedule_timeout+0x231/0x2d0
> [0.995728]  [] ? down_trylock+0x2d/0x40
> [1.015351]  [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
> [1.040182]  [] __down_common+0xaa/0x104
> [1.059021]  [] ? _xfs_buf_find+0x162/0x340 [xfs]
> [1.081357]  [] __down+0x1d/0x1f
> [1.097166]  [] down+0x41/0x50
> [1.112869]  [] xfs_buf_lock+0x3c/0xf0 [xfs]
> [1.134504]  [] _xfs_buf_find+0x162/0x340 [xfs]
> [1.156871]  [] xfs_buf_get_map+0x2a/0x270 [xfs]

So what's holding that directory data buffer lock? It should only be
held if there is either IO in progress, or a modification of the
buffer in progress that is blocked somewhere else.

> [1.180010]  [] xfs_buf_read_map+0x2d/0x180 [xfs]
> [1.203538]  [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
> [1.229194]  [] xfs_da_read_buf+0xd1/0x100 [xfs]
> [1.251948]  [] xfs_dir3_data_read+0x26/0x60 [xfs]
> [1.275736]  [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 
> [xfs]
> [1.305094]  [] ? down_read+0x12/0x30
> [1.323787]  [] ? xfs_ilock+0xe4/0x110 [xfs]
> [1.345114]  [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
> [1.371818]  [] xfs_readdir+0x1a6/0x1c0 [xfs]

So we should be holding the ilock in shared mode here...

> [1.393471]  [] xfs_file_readdir+0x2b/0x30 [xfs]
> [1.416874]  [] iterate_dir+0x173/0x190
> [1.436709]  [] ? do_audit_syscall_entry+0x66/0x70
> [1.460951]  [] SyS_getdents+0x98/0x120
> [1.480566]  [] ? iterate_dir+0x190/0x190
> [1.500909]  [] do_syscall_64+0x62/0x110
> [1.520847]  [] entry_SYSCALL64_slow_path+0x25/0x25
> [1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
> [1.568933]   Not tainted 4.6.0-rc6-next-20160504 #5
> [1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [1.618544] cp  D 880b91463b00 0 49040  49016
> 0x0080
> [1.645502]  880b91463b00 880464d5c140 88029b475700
> 880b91464000
> [1.674145]  880411c42610  880411c42628
> 8802c10bc610
> [1.702834]  880b91463b18 816cbc25 88029b475700
> 880b91463b88
> [1.731501] Call Trace:
> [1.736866]  [] schedule+0x35/0x80
> [1.754119]  [] rwsem_down_read_failed+0xf2/0x140
> [1.777411]  [] ? xfs_ilock_data_map_shared+0x30/0x40
> [xfs]
> [1.805090]  [] call_rwsem_down_read_failed+0x18/0x30
> [1.830482]  [] down_read+0x20/0x30
> [1.848505]  [] xfs_ilock+0xe4/0x110 [xfs]
> [1.869293]  [] xfs_ilock_data_map_shared+0x30/0x40

And this is an attempt to lock the inode shared, so if that is
failing while there's another shared holder, that means there's an
exclusive waiter queued up (i.e. read held -> write blocked -> read
blocked).
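
(Purely as an illustration of that queueing behaviour, here is a small
standalone userspace sketch of the read held -> write blocked -> read
blocked pattern, using a writer-preferring pthread rwlock to stand in
for the rwsem; illustrative only, not kernel code.)

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_rwlock_t ilock;

    static void *writer(void *arg)          /* stands in for ilock(excl) */
    {
            sleep(1);
            printf("writer: queuing for the lock exclusive\n");
            pthread_rwlock_wrlock(&ilock);  /* waits behind reader 1 */
            pthread_rwlock_unlock(&ilock);
            return NULL;
    }

    static void *reader2(void *arg)         /* the second ilock(shared) */
    {
            sleep(2);
            printf("reader2: asking for the lock shared\n");
            pthread_rwlock_rdlock(&ilock);  /* blocks behind queued writer */
            printf("reader2: finally got it\n");
            pthread_rwlock_unlock(&ilock);
            return NULL;
    }

    int main(void)
    {
            pthread_rwlockattr_t attr;
            pthread_t t1, t2;

            pthread_rwlockattr_init(&attr);
            /* writer-preferring: a queued writer holds up later readers */
            pthread_rwlockattr_setkind_np(&attr,
                            PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
            pthread_rwlock_init(&ilock, &attr);

            pthread_rwlock_rdlock(&ilock);  /* reader 1 holds it shared */
            pthread_create(&t1, NULL, writer, NULL);
            pthread_create(&t2, NULL, reader2, NULL);

            sleep(3);
            printf("reader1: still shared; reader2 stuck behind writer\n");
            pthread_rwlock_unlock(&ilock);  /* writer, then reader2, proceed */
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            return 0;
    }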


So looking at dump-g273xfs0510:

[  845.727907] INFO: task cp:40126 blocked for more than 120 seconds.
[  845.751175]   Not tainted 4.6.0-rc7-next-20160510 #9
[  845.770011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 

Linux-next parallel cp workload hang

2016-05-17 Thread Xiong Zhou
Hi,

Parallel cp workload (xfstests generic/273) hangs like below.
It's reproducible with a small chance, less than 1/100 I think.

Have hit this in linux-next 20160504 0506 0510 trees, testing on
xfs with loop or block device. Ext4 survived several rounds
of testing.

Linux next 20160510 tree hangs within 500 rounds testing several
times. The same tree with vfs parallel lookup patchset reverted
survived 900 rounds testing. Reverted commits are attached.

Bisecting in this patchset identified this commit:

3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
Author: Al Viro 
Date:   Wed Apr 20 23:42:46 2016 -0400

simple local filesystems: switch to ->iterate_shared()

no changes needed (XFS isn't simple, but it has the same parallelism
in the interesting parts exercised from CXFS).

With this commit reverted on top of Linux next 0510 tree, 5000+ rounds
of testing passed.

Although 2000 rounds testing had been conducted before the good/bad
verdict, I'm not 100 percent sure about all this, since it's
so hard to hit, and I am not that lucky.

Bisect log and full blocked state process dump log are also attached.

Furthermore, this was first hit when testing fs dax on nvdimm;
however, it's reproducible without the dax mount option, and also
reproducible on a loop device - it just seems harder to hit.

Thanks,
Xiong

[0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
[0.794263]   Not tainted 4.6.0-rc6-next-20160504 #5
[0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[0.841801] cp  D 880b4e977928 0 49033  49014
0x0080
[0.868923]  880b4e977928 880ba275d380 880b8d712b80
880b4e978000
[0.897504]  7fff 0002 
880b8d712b80
[0.925234]  880b4e977940 816cbc25 88035a1dabb0
880b4e9779e8
[0.953237] Call Trace:
[0.958314]  [] schedule+0x35/0x80
[0.974854]  [] schedule_timeout+0x231/0x2d0
[0.995728]  [] ? down_trylock+0x2d/0x40
[1.015351]  [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
[1.040182]  [] __down_common+0xaa/0x104
[1.059021]  [] ? _xfs_buf_find+0x162/0x340 [xfs]
[1.081357]  [] __down+0x1d/0x1f
[1.097166]  [] down+0x41/0x50
[1.112869]  [] xfs_buf_lock+0x3c/0xf0 [xfs]
[1.134504]  [] _xfs_buf_find+0x162/0x340 [xfs]
[1.156871]  [] xfs_buf_get_map+0x2a/0x270 [xfs]
[1.180010]  [] xfs_buf_read_map+0x2d/0x180 [xfs]
[1.203538]  [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
[1.229194]  [] xfs_da_read_buf+0xd1/0x100 [xfs]
[1.251948]  [] xfs_dir3_data_read+0x26/0x60 [xfs]
[1.275736]  []
xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 [xfs]
[1.305094]  [] ? down_read+0x12/0x30
[1.323787]  [] ? xfs_ilock+0xe4/0x110 [xfs]
[1.345114]  [] xfs_dir2_leaf_getdents+0x13b/0x3d0
[xfs]
[1.371818]  [] xfs_readdir+0x1a6/0x1c0 [xfs]
[1.393471]  [] xfs_file_readdir+0x2b/0x30 [xfs]
[1.416874]  [] iterate_dir+0x173/0x190
[1.436709]  [] ? do_audit_syscall_entry+0x66/0x70
[1.460951]  [] SyS_getdents+0x98/0x120
[1.480566]  [] ? iterate_dir+0x190/0x190
[1.500909]  [] do_syscall_64+0x62/0x110
[1.520847]  [] entry_SYSCALL64_slow_path+0x25/0x25
[1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
[1.568933]   Not tainted 4.6.0-rc6-next-20160504 #5
[1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[1.618544] cp  D 880b91463b00 0 49040  49016
0x0080
[1.645502]  880b91463b00 880464d5c140 88029b475700
880b91464000
[1.674145]  880411c42610  880411c42628
8802c10bc610
[1.702834]  880b91463b18 816cbc25 88029b475700
880b91463b88
[1.731501] Call Trace:
[1.736866]  [] schedule+0x35/0x80
[1.754119]  [] rwsem_down_read_failed+0xf2/0x140
[1.777411]  [] ? xfs_ilock_data_map_shared+0x30/0x40
[xfs]
[1.805090]  [] call_rwsem_down_read_failed+0x18/0x30
[1.830482]  [] down_read+0x20/0x30
[1.848505]  [] xfs_ilock+0xe4/0x110 [xfs]
[1.869293]  [] xfs_ilock_data_map_shared+0x30/0x40
[xfs]
[1.896775]  [] xfs_dir_open+0x30/0x60 [xfs]
[1.917882]  [] do_dentry_open+0x20f/0x320
[1.938919]  [] ? xfs_file_mmap+0x50/0x50 [xfs]
[1.961532]  [] vfs_open+0x57/0x60
[1.978945]  [] path_openat+0x325/0x14e0
[1.999273]  [] ? putname+0x53/0x60
[2.017695]  [] do_filp_open+0x91/0x100
[2.036893]  [] ? __alloc_fd+0x46/0x180
[2.055479]  [] do_sys_open+0x124/0x210
[2.073783]  [] ? __audit_syscall_exit+0x1db/0x260
[2.096426]  [] SyS_openat+0x14/0x20
[2.113690]  [] do_syscall_64+0x62/0x110
[2.132417]  [] entry_SYSCALL64_slow_path+0x25/0x25



g273-block-dumps.tar.gz
Description: application/gzip

