Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-09 Thread Steven Rostedt
On Thu, Dec 01, 2016 at 10:32:09AM -0500, Dave Jones wrote:
> 
> (function-graph screws up the RIP for some reason, 'return_to_handler'
>  should actually be btrfs_destroy_inode)

That's because function_graph hijacks the return address and replaces it with
return_to_handler. The back trace has code to show what that handler is
supposed to be. But I should see why the WARNING shows it instead, and fix
that too.
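
To make the mechanism above concrete, here is a toy model in Python, not the kernel's implementation. It only illustrates the idea: the real return address is saved on a side stack, the frame is overwritten with return_to_handler, and a backtrace has to consult the side stack to print the real caller. The ShadowStack class, the dict-based frames and the name some_caller are all made up for illustration.

```python
RETURN_TO_HANDLER = "return_to_handler"


class ShadowStack:
    """Toy side stack holding the return addresses that were replaced."""

    def __init__(self):
        self._saved = []

    def hijack(self, frame):
        # On function entry: remember the real return address, then
        # overwrite it in the frame with the trampoline's name.
        self._saved.append(frame["ret"])
        frame["ret"] = RETURN_TO_HANDLER

    def resolve(self, frames):
        """Return the real return addresses for a backtrace.

        `frames` is ordered innermost first, so the first hijacked frame
        corresponds to the most recently saved address.
        """
        depth = len(self._saved)
        resolved = []
        for frame in frames:
            if frame["ret"] == RETURN_TO_HANDLER and depth > 0:
                depth -= 1
                resolved.append(self._saved[depth])
            else:
                resolved.append(frame["ret"])
        return resolved


shadow = ShadowStack()
outer = {"ret": "some_caller"}          # hypothetical outer caller
inner = {"ret": "btrfs_destroy_inode"}  # the address the WARNING should show
shadow.hijack(outer)
shadow.hijack(inner)

# A naive dump sees only the trampoline in both frames.
print([f["ret"] for f in (inner, outer)])
# A dump that consults the side stack recovers the real addresses.
print(shadow.resolve([inner, outer]))  # ['btrfs_destroy_inode', 'some_caller']
```

In this model, the WARNING output described above behaves like the first print, while the back trace behaves like the second.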

Thanks,

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-07 Thread Dave Jones
On Sat, Dec 03, 2016 at 11:48:33AM -0500, Dave Jones wrote:

 > The interesting process here seems to be kworker/u8:17, and the trace
 > captures some of what that was doing before that bad page was hit.

I'm travelling next week, so I'm trying to braindump the stuff I've
found so far and summarise so I can pick it back up later if no-one else
figures it out first.

I've hit the bad page map spew with enough regularity that I've now got a
handful of good traces.

http://codemonkey.org.uk/junk/btrfs/bad-page-state1.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state2.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state3.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state4.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state5.txt
http://codemonkey.org.uk/junk/btrfs/bad-page-state6.txt

It smells to me like a race between truncate and the writeback
workqueue. The variety of traces here seems to show both sides
of the race: sometimes it's a kworker, sometimes a trinity child process.

bad-page-state3.txt onwards have some bonus trace_printk's from
btrfs_setsize as I was curious what sizes we were passing down to
truncate. The only patterns I see are going from very large to very
small sizes. Perhaps that causes truncate to generate so much
writeback that it makes the race apparent ?



Other stuff I keep hitting:

Start transaction spew:
http://codemonkey.org.uk/junk/btrfs/start_transaction.txt
That's the WARN_ON(h->use_count > 2);
I hit this with enough regularity that I had to comment it out.
It's not clear to me whether this is related at all.

Lockdep spew:
http://codemonkey.org.uk/junk/btrfs/register_lock_class1.txt
http://codemonkey.org.uk/junk/btrfs/register_lock_class2.txt
This stuff has been around for a while (4.6ish iirc)

Sometimes the fs got into a screwed up state that needed btrfscking.
http://codemonkey.org.uk/junk/btrfs/replay-log-fail.txt

Dave


Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-03 Thread Dave Jones
On Thu, Dec 01, 2016 at 10:32:09AM -0500, Dave Jones wrote:
 > http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents.txt
 > 
 > Also same bug, different run, but a different traceview 
 > http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents-function-graph.txt
 > 
 > (function-graph screws up the RIP for some reason, 'return_to_handler'
 >  should actually be btrfs_destroy_inode)

Chris pointed me at a pending patch that took care of this warning.

 > Anyways, I've got some code that works pretty well for dumping the
 > ftrace buffer now when things go awry.  I just need to run it enough
 > times that I hit that bad page state instead of this, or a lockdep bug first.

That allowed me to run long enough to get this trace:
http://codemonkey.org.uk/junk/bad-page-state.txt

Does this shed any light ?
The interesting process here seems to be kworker/u8:17, and the trace
captures some of what that was doing before that bad page was hit.

I've got another run going now, so I'll compare that trace when it
happens to see if it matches my current theory that it's something to
do with that btrfs_scrubparity_helper. I've seen that show up in stack
traces a few times while chasing this.

Dave



Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-01 Thread Dave Jones
On Wed, Nov 23, 2016 at 02:58:45PM -0500, Dave Jones wrote:
 > On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
 > 
 >  > [  317.689216] BUG: Bad page state in process kworker/u8:8  pfn:4d8fd4
 >  > trace from just before this happened. Does this shed any light ?
 >  > 
 >  > https://codemonkey.org.uk/junk/trace.txt
 > 
 > crap, I just noticed the timestamps in the trace come from quite a bit
 > later.  I'll tweak the code to do the taint checking/ftrace stop after
 > every syscall, that should narrow the window some more.
 > 
 > Getting closer..

Ok, this is getting more like it.
http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents.txt

Same bug from a different run, with a different trace view:
http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents-function-graph.txt

(function-graph screws up the RIP for some reason, 'return_to_handler'
 should actually be btrfs_destroy_inode)


Anyways, I've got some code that works pretty well for dumping the
ftrace buffer now when things go awry.  I just need to run it enough
times that I hit that bad page state instead of this, or a lockdep bug first.
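
For readers who want to try something similar, below is a minimal sketch of the "check taint and stop ftrace after every step" idea. It is not the code referred to above. It assumes the taint mask is readable from /proc/sys/kernel/tainted and that tracing can be stopped by writing 0 to tracing_on under /sys/kernel/debug/tracing (the tracefs mount point may differ on your system). The function names and the `steps` callables are hypothetical.

```python
from pathlib import Path

# Assumed default locations; adjust for your system.
TAINTED = Path("/proc/sys/kernel/tainted")
TRACING_ON = Path("/sys/kernel/debug/tracing/tracing_on")


def read_taint(path=TAINTED):
    """Return the kernel taint mask as an integer."""
    return int(Path(path).read_text().strip())


def stop_tracing_if_newly_tainted(baseline, tainted_path=TAINTED,
                                  tracing_on_path=TRACING_ON):
    """Stop the tracer if any taint bit is set that was not in `baseline`."""
    current = read_taint(tainted_path)
    if current & ~baseline:
        # Freeze the ring buffer so the events leading up to the
        # warning are not overwritten by later activity.
        Path(tracing_on_path).write_text("0")
        return True
    return False


def run_until_tainted(steps, tainted_path=TAINTED,
                      tracing_on_path=TRACING_ON):
    """Run each callable in `steps`, checking the taint mask after each one.

    Returns the index of the step after which new taint appeared,
    or None if every step completed without new taint.
    """
    baseline = read_taint(tainted_path)
    for index, step in enumerate(steps):
        step()
        if stop_tracing_if_newly_tainted(baseline, tainted_path,
                                         tracing_on_path):
            return index
    return None
```

Checking after every step rather than at the end of a run keeps the window between the event and the frozen buffer small, which is the point of the tweak described earlier in the thread. Once tracing is stopped, the buffer can be saved by copying the `trace` file from the same tracing directory.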

Dave
