Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-24 Thread Dave Chinner
On Thu, Jul 11, 2013 at 10:39:30PM -0400, Dave Jones wrote:
> Just saw this during boot after an unclean shutdown. It hung afterwards.
> 
> [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
> file: fs/xfs/xfs_dir2_sf.c, line: 358
> [   97.164646] [ cut here ]

I just reproduced this by chance. I know exactly what I was doing
when it went bang, so I might be able to reduce it from scattergun
blast testing to be able to reproduce it on demand.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-24 Thread Dave Chinner
On Thu, Jul 11, 2013 at 10:39:30PM -0400, Dave Jones wrote:
 Just saw this during boot after an unclean shutdown. It hung afterwards.
 
 [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
 file: fs/xfs/xfs_dir2_sf.c, line: 358
 [   97.164646] [ cut here ]

I just reproduced this by chance. I know exactly what I was doing
when it went bang, so I might be able to reduce it from scattergun
blast testing to be able to reproduce it on demand.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Chinner
On Mon, Jul 15, 2013 at 09:42:44PM -0400, Dave Jones wrote:
> On Tue, Jul 16, 2013 at 10:27:03AM +1000, Dave Chinner wrote:
>  > On Mon, Jul 15, 2013 at 03:58:00PM -0400, Dave Jones wrote:
>  > > On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
>  > >  
>  > >  > > [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == 
> ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358
>  > >  > 
>  > >  > 
>  > >  > So, it's not clear what has caused this yet. Is it reproducable?
>  > > 
>  > > Just hit it again on the same machine.
>  > > 
>  > >  > would be good to get a trace of lookup vs addname events from XFS,
>  > >  > too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
>  > >  > correct lookups were done prior to the failing addname operation...
>  > >  
>  > > <- xfs tracing noob. Got a recipe I can cut-n-paste ?
>  > > Or a patch if that's necessary..
>  > 
>  > Just use your usual method of pulling tracepoints out of the kernel,
>  > be it with perf or (my preferred method) trace-cmd:
>  > 
>  > # trace-cmd record -e xfs_dir\* -e xfs_da\* 
>  > 
>  > # trace-cmd report > trace.report.out
>  > 
>  > The trace.report.out file is what I'd like to see. The report file
>  > could be large, so compressing it might be an idea...
> 
> Hmm, will that only log traces from the command that I run ?

No, it will capture kernel wide. The command being run just gives it
a defined end-point to the capture. i.e. I often just use "sleep 30"
as the command.

$ sudo ~/trace-cmd/trace-cmd record -e xfs\* sleep 30 &
[1] 4180
$ /sys/kernel/debug/tracing/events/xfs*/filter
/sys/kernel/debug/tracing/events/*/xfs*/filter

$ sudo mount /dev/vda /mnt/test
$ sync
$ sudo umount /mnt/test
$ fg
sudo ~/trace-cmd/trace-cmd record -e xfs\* sleep 30
trace-cmd: Interrupted system call
  recorder error in splice input
offset=3eb000
Kernel buffer statistics:
  Note: "entries" are the entries left in the kernel ring buffer and are not
recorded in the trace data. They should all be zero.

CPU: 0
entries: 55
overrun: 0
commit overrun: 0
bytes: 3420
oldest event ts: 48589.221789
now ts: 48607.261966
dropped events: 0
read events: 352

$ sudo ~/trace-cmd/trace-cmd report > ~/t.t
$ head -10 ~/t.t
version = 6
cpus=1
   mount-4185  [000] 48582.777997: xfs_buf_init: dev 253:0 bno 
0x nblks 0x1 hold 1 pincount 0 lock 0 flags  caller 
xfs_buf_get_uncached
   mount-4185  [000] 48582.778001: xfs_buf_get_uncached: dev 253:0 bno 
0x nblks 0x1 hold 1 pincount 0 lock 0 flags PAGES caller 
xfs_buf_read_uncached
   mount-4185  [000] 48582.778002: xfs_buf_iorequest:dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfsbdstrat
   mount-4185  [000] 48582.778002: xfs_buf_hold: dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   mount-4185  [000] 48582.778017: xfs_buf_rele: dev 253:0 bno 
0x0 nblks 0x1 hold 2 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   mount-4185  [000] 48582.778017: xfs_buf_iowait:   dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller 
xfs_buf_read_uncached
  -0 [000] 48582.778692: xfs_buf_ioerror:  dev 253:0 bno 
0x0 len 0x200 hold 1 pincount 0 lock 0 error 0 flags READ|PAGES caller 
xfs_buf_bio_end_io
  -0 [000] 48582.778693: xfs_buf_iodone:   dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_ioend
$

> I might need to do some kind of system-wide tracing here,
> because every time this has happened so far, it's happened in
> a different process.  (rpcbind the first [wtf?], and chrony the 2nd)
> Can trace-cmd do that ?

Yup. trace-cmd will trace global events unless you give it a
specific process filter to limit what PIDs it gathers events from.
Basically, trace-cmd is a much nicer front end to perf...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Jones
On Tue, Jul 16, 2013 at 10:27:03AM +1000, Dave Chinner wrote:
 > On Mon, Jul 15, 2013 at 03:58:00PM -0400, Dave Jones wrote:
 > > On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
 > >  
 > >  > > [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == 
 > > ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358
 > >  > 
 > >  > 
 > >  > So, it's not clear what has caused this yet. Is it reproducable?
 > > 
 > > Just hit it again on the same machine.
 > > 
 > >  > would be good to get a trace of lookup vs addname events from XFS,
 > >  > too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
 > >  > correct lookups were done prior to the failing addname operation...
 > >  
 > > <- xfs tracing noob. Got a recipe I can cut-n-paste ?
 > > Or a patch if that's necessary..
 > 
 > Just use your usual method of pulling tracepoints out of the kernel,
 > be it with perf or (my preferred method) trace-cmd:
 > 
 > # trace-cmd record -e xfs_dir\* -e xfs_da\* 
 > 
 > # trace-cmd report > trace.report.out
 > 
 > The trace.report.out file is what I'd like to see. The report file
 > could be large, so compressing it might be an idea...

Hmm, will that only log traces from the command that I run ?

I might need to do some kind of system-wide tracing here,
because every time this has happened so far, it's happened in
a different process.  (rpcbind the first [wtf?], and chrony the 2nd)
Can trace-cmd do that ?

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Chinner
On Mon, Jul 15, 2013 at 03:58:00PM -0400, Dave Jones wrote:
> On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
>  
>  > > [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == 
> ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358
>  > 
>  > 
>  > So, it's not clear what has caused this yet. Is it reproducable?
> 
> Just hit it again on the same machine.
> 
>  > would be good to get a trace of lookup vs addname events from XFS,
>  > too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
>  > correct lookups were done prior to the failing addname operation...
>  
> <- xfs tracing noob. Got a recipe I can cut-n-paste ?
> Or a patch if that's necessary..

Just use your usual method of pulling tracepoints out of the kernel,
be it with perf or (my preferred method) trace-cmd:

# trace-cmd record -e xfs_dir\* -e xfs_da\* 

# trace-cmd report > trace.report.out

The trace.report.out file is what I'd like to see. The report file
could be large, so compressing it might be an idea...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Jones
On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
 
 > > [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
 > > file: fs/xfs/xfs_dir2_sf.c, line: 358
 > 
 > 
 > So, it's not clear what has caused this yet. Is it reproducable?

Just hit it again on the same machine.

 > would be good to get a trace of lookup vs addname events from XFS,
 > too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
 > correct lookups were done prior to the failing addname operation...
 
<- xfs tracing noob. Got a recipe I can cut-n-paste ?
Or a patch if that's necessary..

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Jones
On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
 
   [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
   file: fs/xfs/xfs_dir2_sf.c, line: 358
  
  
  So, it's not clear what has caused this yet. Is it reproducable?

Just hit it again on the same machine.

  would be good to get a trace of lookup vs addname events from XFS,
  too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
  correct lookups were done prior to the failing addname operation...
 
- xfs tracing noob. Got a recipe I can cut-n-paste ?
Or a patch if that's necessary..

Dave

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Chinner
On Mon, Jul 15, 2013 at 03:58:00PM -0400, Dave Jones wrote:
 On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
  
[   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == 
 ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358
   
   
   So, it's not clear what has caused this yet. Is it reproducable?
 
 Just hit it again on the same machine.
 
   would be good to get a trace of lookup vs addname events from XFS,
   too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
   correct lookups were done prior to the failing addname operation...
  
 - xfs tracing noob. Got a recipe I can cut-n-paste ?
 Or a patch if that's necessary..

Just use your usual method of pulling tracepoints out of the kernel,
be it with perf or (my preferred method) trace-cmd:

# trace-cmd record -e xfs_dir\* -e xfs_da\* command that dies

# trace-cmd report  trace.report.out

The trace.report.out file is what I'd like to see. The report file
could be large, so compressing it might be an idea...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Jones
On Tue, Jul 16, 2013 at 10:27:03AM +1000, Dave Chinner wrote:
  On Mon, Jul 15, 2013 at 03:58:00PM -0400, Dave Jones wrote:
   On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:

  [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == 
   ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358
 
 
 So, it's not clear what has caused this yet. Is it reproducable?
   
   Just hit it again on the same machine.
   
 would be good to get a trace of lookup vs addname events from XFS,
 too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
 correct lookups were done prior to the failing addname operation...

   - xfs tracing noob. Got a recipe I can cut-n-paste ?
   Or a patch if that's necessary..
  
  Just use your usual method of pulling tracepoints out of the kernel,
  be it with perf or (my preferred method) trace-cmd:
  
  # trace-cmd record -e xfs_dir\* -e xfs_da\* command that dies
  
  # trace-cmd report  trace.report.out
  
  The trace.report.out file is what I'd like to see. The report file
  could be large, so compressing it might be an idea...

Hmm, will that only log traces from the command that I run ?

I might need to do some kind of system-wide tracing here,
because every time this has happened so far, it's happened in
a different process.  (rpcbind the first [wtf?], and chrony the 2nd)
Can trace-cmd do that ?

Dave

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-15 Thread Dave Chinner
On Mon, Jul 15, 2013 at 09:42:44PM -0400, Dave Jones wrote:
 On Tue, Jul 16, 2013 at 10:27:03AM +1000, Dave Chinner wrote:
   On Mon, Jul 15, 2013 at 03:58:00PM -0400, Dave Jones wrote:
On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
 
   [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == 
 ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358
  
  
  So, it's not clear what has caused this yet. Is it reproducable?

Just hit it again on the same machine.

  would be good to get a trace of lookup vs addname events from XFS,
  too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
  correct lookups were done prior to the failing addname operation...
 
- xfs tracing noob. Got a recipe I can cut-n-paste ?
Or a patch if that's necessary..
   
   Just use your usual method of pulling tracepoints out of the kernel,
   be it with perf or (my preferred method) trace-cmd:
   
   # trace-cmd record -e xfs_dir\* -e xfs_da\* command that dies
   
   # trace-cmd report  trace.report.out
   
   The trace.report.out file is what I'd like to see. The report file
   could be large, so compressing it might be an idea...
 
 Hmm, will that only log traces from the command that I run ?

No, it will capture kernel wide. The command being run just gives it
a defined end-point to the capture. i.e. I often just use sleep 30
as the command.

$ sudo ~/trace-cmd/trace-cmd record -e xfs\* sleep 30 
[1] 4180
$ /sys/kernel/debug/tracing/events/xfs*/filter
/sys/kernel/debug/tracing/events/*/xfs*/filter

$ sudo mount /dev/vda /mnt/test
$ sync
$ sudo umount /mnt/test
$ fg
sudo ~/trace-cmd/trace-cmd record -e xfs\* sleep 30
trace-cmd: Interrupted system call
  recorder error in splice input
offset=3eb000
Kernel buffer statistics:
  Note: entries are the entries left in the kernel ring buffer and are not
recorded in the trace data. They should all be zero.

CPU: 0
entries: 55
overrun: 0
commit overrun: 0
bytes: 3420
oldest event ts: 48589.221789
now ts: 48607.261966
dropped events: 0
read events: 352

$ sudo ~/trace-cmd/trace-cmd report  ~/t.t
$ head -10 ~/t.t
version = 6
cpus=1
   mount-4185  [000] 48582.777997: xfs_buf_init: dev 253:0 bno 
0x nblks 0x1 hold 1 pincount 0 lock 0 flags  caller 
xfs_buf_get_uncached
   mount-4185  [000] 48582.778001: xfs_buf_get_uncached: dev 253:0 bno 
0x nblks 0x1 hold 1 pincount 0 lock 0 flags PAGES caller 
xfs_buf_read_uncached
   mount-4185  [000] 48582.778002: xfs_buf_iorequest:dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfsbdstrat
   mount-4185  [000] 48582.778002: xfs_buf_hold: dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   mount-4185  [000] 48582.778017: xfs_buf_rele: dev 253:0 bno 
0x0 nblks 0x1 hold 2 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   mount-4185  [000] 48582.778017: xfs_buf_iowait:   dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller 
xfs_buf_read_uncached
  idle-0 [000] 48582.778692: xfs_buf_ioerror:  dev 253:0 bno 
0x0 len 0x200 hold 1 pincount 0 lock 0 error 0 flags READ|PAGES caller 
xfs_buf_bio_end_io
  idle-0 [000] 48582.778693: xfs_buf_iodone:   dev 253:0 bno 
0x0 nblks 0x1 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_ioend
$

 I might need to do some kind of system-wide tracing here,
 because every time this has happened so far, it's happened in
 a different process.  (rpcbind the first [wtf?], and chrony the 2nd)
 Can trace-cmd do that ?

Yup. trace-cmd will trace global events unless you give it a
specific process filter to limit what PIDs it gathers events from.
Basically, trace-cmd is a much nicer front end to perf...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-13 Thread Dave Jones
On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
 > On Thu, Jul 11, 2013 at 10:39:30PM -0400, Dave Jones wrote:
 > > Just saw this during boot after an unclean shutdown. It hung afterwards.
 > > 
 > > [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
 > > file: fs/xfs/xfs_dir2_sf.c, line: 358
 > 
 > > [   97.173730]  [] xfs_dir2_sf_addname+0x43/0x760 [xfs]
 > > [   97.173743]  [] xfs_dir_createname+0x15c/0x1b0 [xfs]
 > > [   97.173754]  [] xfs_create+0x4cc/0x710 [xfs]
 > > [   97.173764]  [] xfs_vn_mknod+0x9a/0x1c0 [xfs]
 > > [   97.173773]  [] xfs_vn_create+0x13/0x20 [xfs]
 > > [   97.173776]  [] vfs_create+0x9d/0x100
 > > [   97.173778]  [] do_last+0x925/0xe00
 > > [   97.173780]  [] path_openat+0xbe/0x6f0
 > > [   97.173783]  [] ? local_clock+0x3f/0x50
 > > [   97.173785]  [] ? __alloc_fd+0xaf/0x200
 > > [   97.173787]  [] do_filp_open+0x3a/0x90
 > > [   97.173789]  [] ? __alloc_fd+0xaf/0x200
 > > [   97.173790]  [] do_sys_open+0x10b/0x200
 > > [   97.173792]  [] ? syscall_trace_enter+0x18/0x290
 > > [   97.173794]  [] SyS_open+0x1e/0x20
 > > 
 > > This trace repeated a few times, then the same assertion was triggered 
 > > from sys_renameat.
 > 
 > That's rather curious. What this means is that there is either an
 > EIO or EEXIST error being returned from xfs_dir2_sf_lookup() when a
 > we're about to add the new entry. There are two things here - EIO
 > can only be returned if a shutdown has occurred - are there any
 > signs of a shutdown in the logs?

not that I recall (machine is off, and remote right now).
1st time it was during boot (above), and 2nd time I hit it
the machine kept fuzzing for a while afterwards.

I'll see if I can reproduce it again on monday, and if so I'll
grab an xfsdump for you to stare at.

Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-13 Thread Dave Jones
On Sat, Jul 13, 2013 at 12:00:30PM +1000, Dave Chinner wrote:
  On Thu, Jul 11, 2013 at 10:39:30PM -0400, Dave Jones wrote:
   Just saw this during boot after an unclean shutdown. It hung afterwards.
   
   [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
   file: fs/xfs/xfs_dir2_sf.c, line: 358
  
   [   97.173730]  [a0076953] xfs_dir2_sf_addname+0x43/0x760 [xfs]
   [   97.173743]  [a0067cfc] xfs_dir_createname+0x15c/0x1b0 [xfs]
   [   97.173754]  [a002f2dc] xfs_create+0x4cc/0x710 [xfs]
   [   97.173764]  [a00278ca] xfs_vn_mknod+0x9a/0x1c0 [xfs]
   [   97.173773]  [a0027a03] xfs_vn_create+0x13/0x20 [xfs]
   [   97.173776]  [811d100d] vfs_create+0x9d/0x100
   [   97.173778]  [811d1995] do_last+0x925/0xe00
   [   97.173780]  [811d1f2e] path_openat+0xbe/0x6f0
   [   97.173783]  [8109e33f] ? local_clock+0x3f/0x50
   [   97.173785]  [811e1b5f] ? __alloc_fd+0xaf/0x200
   [   97.173787]  [811d2c3a] do_filp_open+0x3a/0x90
   [   97.173789]  [811e1b5f] ? __alloc_fd+0xaf/0x200
   [   97.173790]  [811c0ddb] do_sys_open+0x10b/0x200
   [   97.173792]  [81010578] ? syscall_trace_enter+0x18/0x290
   [   97.173794]  [811c0eee] SyS_open+0x1e/0x20
   
   This trace repeated a few times, then the same assertion was triggered 
   from sys_renameat.
  
  That's rather curious. What this means is that there is either an
  EIO or EEXIST error being returned from xfs_dir2_sf_lookup() when a
  we're about to add the new entry. There are two things here - EIO
  can only be returned if a shutdown has occurred - are there any
  signs of a shutdown in the logs?

not that I recall (machine is off, and remote right now).
1st time it was during boot (above), and 2nd time I hit it
the machine kept fuzzing for a while afterwards.

I'll see if I can reproduce it again on monday, and if so I'll
grab an xfsdump for you to stare at.

Dave
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-12 Thread Dave Chinner
On Thu, Jul 11, 2013 at 10:39:30PM -0400, Dave Jones wrote:
> Just saw this during boot after an unclean shutdown. It hung afterwards.
> 
> [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
> file: fs/xfs/xfs_dir2_sf.c, line: 358

> [   97.173730]  [] xfs_dir2_sf_addname+0x43/0x760 [xfs]
> [   97.173743]  [] xfs_dir_createname+0x15c/0x1b0 [xfs]
> [   97.173754]  [] xfs_create+0x4cc/0x710 [xfs]
> [   97.173764]  [] xfs_vn_mknod+0x9a/0x1c0 [xfs]
> [   97.173773]  [] xfs_vn_create+0x13/0x20 [xfs]
> [   97.173776]  [] vfs_create+0x9d/0x100
> [   97.173778]  [] do_last+0x925/0xe00
> [   97.173780]  [] path_openat+0xbe/0x6f0
> [   97.173783]  [] ? local_clock+0x3f/0x50
> [   97.173785]  [] ? __alloc_fd+0xaf/0x200
> [   97.173787]  [] do_filp_open+0x3a/0x90
> [   97.173789]  [] ? __alloc_fd+0xaf/0x200
> [   97.173790]  [] do_sys_open+0x10b/0x200
> [   97.173792]  [] ? syscall_trace_enter+0x18/0x290
> [   97.173794]  [] SyS_open+0x1e/0x20
> 
> This trace repeated a few times, then the same assertion was triggered from 
> sys_renameat.

That's rather curious. What this means is that there is either an
EIO or EEXIST error being returned from xfs_dir2_sf_lookup() when a
we're about to add the new entry. There are two things here - EIO
can only be returned if a shutdown has occurred - are there any
signs of a shutdown in the logs? If there is a shutdown in progress,
then this is just unlucky to shutdown with an inode in an
inconsistent state in memory that triggers this validity check
failure.

And EEXIST means that the initial lookup of the name during the open
failed to find the entry we are now trying to create. i.e. the
initial path walk failed to do the correct lookup on the directory,
and so never got down to xfs_dir2_sf_lookup() to find the directory
entry (perhaps a problem with a cached negative dentry?). Hence it
was decided during the open(O_CREATE) call that the directory entry
needed to be created, we get down to XFS to create it, and then get
EEXIST because the name already exists...

So, it's not clear what has caused this yet. Is it reproducable? If
would be good to get a trace of lookup vs addname events from XFS,
too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
correct lookups were done prior to the failing addname operation...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-12 Thread Dave Chinner
On Thu, Jul 11, 2013 at 10:39:30PM -0400, Dave Jones wrote:
 Just saw this during boot after an unclean shutdown. It hung afterwards.
 
 [   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, 
 file: fs/xfs/xfs_dir2_sf.c, line: 358

 [   97.173730]  [a0076953] xfs_dir2_sf_addname+0x43/0x760 [xfs]
 [   97.173743]  [a0067cfc] xfs_dir_createname+0x15c/0x1b0 [xfs]
 [   97.173754]  [a002f2dc] xfs_create+0x4cc/0x710 [xfs]
 [   97.173764]  [a00278ca] xfs_vn_mknod+0x9a/0x1c0 [xfs]
 [   97.173773]  [a0027a03] xfs_vn_create+0x13/0x20 [xfs]
 [   97.173776]  [811d100d] vfs_create+0x9d/0x100
 [   97.173778]  [811d1995] do_last+0x925/0xe00
 [   97.173780]  [811d1f2e] path_openat+0xbe/0x6f0
 [   97.173783]  [8109e33f] ? local_clock+0x3f/0x50
 [   97.173785]  [811e1b5f] ? __alloc_fd+0xaf/0x200
 [   97.173787]  [811d2c3a] do_filp_open+0x3a/0x90
 [   97.173789]  [811e1b5f] ? __alloc_fd+0xaf/0x200
 [   97.173790]  [811c0ddb] do_sys_open+0x10b/0x200
 [   97.173792]  [81010578] ? syscall_trace_enter+0x18/0x290
 [   97.173794]  [811c0eee] SyS_open+0x1e/0x20
 
 This trace repeated a few times, then the same assertion was triggered from 
 sys_renameat.

That's rather curious. What this means is that there is either an
EIO or EEXIST error being returned from xfs_dir2_sf_lookup() when a
we're about to add the new entry. There are two things here - EIO
can only be returned if a shutdown has occurred - are there any
signs of a shutdown in the logs? If there is a shutdown in progress,
then this is just unlucky to shutdown with an inode in an
inconsistent state in memory that triggers this validity check
failure.

And EEXIST means that the initial lookup of the name during the open
failed to find the entry we are now trying to create. i.e. the
initial path walk failed to do the correct lookup on the directory,
and so never got down to xfs_dir2_sf_lookup() to find the directory
entry (perhaps a problem with a cached negative dentry?). Hence it
was decided during the open(O_CREATE) call that the directory entry
needed to be created, we get down to XFS to create it, and then get
EEXIST because the name already exists...

So, it's not clear what has caused this yet. Is it reproducable? If
would be good to get a trace of lookup vs addname events from XFS,
too (i.e. all the xfs_dir* and xfs_da* events) so we can see if the
correct lookups were done prior to the failing addname operation...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-11 Thread Dave Jones
Just saw this during boot after an unclean shutdown. It hung afterwards.

[   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: 
fs/xfs/xfs_dir2_sf.c, line: 358
[   97.164646] [ cut here ]
[   97.165312] kernel BUG at fs/xfs/xfs_message.c:108!
[   97.165986] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   97.166947] Modules linked in: snd_hda_codec_realtek pcspkr usb_debug 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp 
snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
[   97.170782] CPU: 3 PID: 341 Comm: rpcbind Not tainted 3.10.0+ #49
[   97.173684] task: 880235d14a40 ti: 880235dae000 task.ti: 
880235dae000
[   97.173701] RIP: 0010:[]  [] 
assfail+0x22/0x30 [xfs]
[   97.173702] RSP: 0018:880235dafa98  EFLAGS: 00010296
[   97.173703] RAX: 0060 RBX: 880235dafb20 RCX: 0006
[   97.173703] RDX: 31e0 RSI: 880235d152a0 RDI: 880235d14a40
[   97.173704] RBP: 880235dafa98 R08: 0002 R09: 
[   97.173704] R10: 0001 R11: 0001 R12: 880235e54000
[   97.173705] R13: 00035e92 R14: 880235dafc88 R15: 880235dafc28
[   97.173706] FS:  7fa6aac79840() GS:880244e0() 
knlGS:
[   97.173706] CS:  0010 DS:  ES:  CR0: 80050033
[   97.173707] CR2: 7fffb1905f18 CR3: 000235ce5000 CR4: 001407e0
[   97.173707] DR0:  DR1:  DR2: 
[   97.173708] DR3:  DR6: fffe0ff0 DR7: 0400
[   97.173708] Stack:
[   97.173712]  880235dafb00 a0076953 e8c066a0 
880240e04ca8
[   97.173713]  0250 0020 88023d05af40 
880235dafb20
[   97.173715]  880235a14a40 880235e54000 00035e92 
880235dafc88
[   97.173715] Call Trace:
[   97.173730]  [] xfs_dir2_sf_addname+0x43/0x760 [xfs]
[   97.173743]  [] xfs_dir_createname+0x15c/0x1b0 [xfs]
[   97.173754]  [] xfs_create+0x4cc/0x710 [xfs]
[   97.173764]  [] xfs_vn_mknod+0x9a/0x1c0 [xfs]
[   97.173773]  [] xfs_vn_create+0x13/0x20 [xfs]
[   97.173776]  [] vfs_create+0x9d/0x100
[   97.173778]  [] do_last+0x925/0xe00
[   97.173780]  [] path_openat+0xbe/0x6f0
[   97.173783]  [] ? local_clock+0x3f/0x50
[   97.173785]  [] ? __alloc_fd+0xaf/0x200
[   97.173787]  [] do_filp_open+0x3a/0x90
[   97.173789]  [] ? __alloc_fd+0xaf/0x200
[   97.173790]  [] do_sys_open+0x10b/0x200
[   97.173792]  [] ? syscall_trace_enter+0x18/0x290
[   97.173794]  [] SyS_open+0x1e/0x20
[   97.173796]  [] tracesys+0xdd/0xe2
[   97.173813] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 
89 d0 48 89 e5 48 89 fa 48 c7 c6 a8 98 0b a0 31 ff 31 c0 e8 ce fb ff ff <0f> 0b 
66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 

This trace repeated a few times, then the same assertion was triggered from 
sys_renameat.

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: fs/xfs/xfs_dir2_sf.c, line: 358

2013-07-11 Thread Dave Jones
Just saw this during boot after an unclean shutdown. It hung afterwards.

[   97.162665] XFS: Assertion failed: xfs_dir2_sf_lookup(args) == ENOENT, file: 
fs/xfs/xfs_dir2_sf.c, line: 358
[   97.164646] [ cut here ]
[   97.165312] kernel BUG at fs/xfs/xfs_message.c:108!
[   97.165986] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   97.166947] Modules linked in: snd_hda_codec_realtek pcspkr usb_debug 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e ptp 
snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
[   97.170782] CPU: 3 PID: 341 Comm: rpcbind Not tainted 3.10.0+ #49
[   97.173684] task: 880235d14a40 ti: 880235dae000 task.ti: 
880235dae000
[   97.173701] RIP: 0010:[a002a362]  [a002a362] 
assfail+0x22/0x30 [xfs]
[   97.173702] RSP: 0018:880235dafa98  EFLAGS: 00010296
[   97.173703] RAX: 0060 RBX: 880235dafb20 RCX: 0006
[   97.173703] RDX: 31e0 RSI: 880235d152a0 RDI: 880235d14a40
[   97.173704] RBP: 880235dafa98 R08: 0002 R09: 
[   97.173704] R10: 0001 R11: 0001 R12: 880235e54000
[   97.173705] R13: 00035e92 R14: 880235dafc88 R15: 880235dafc28
[   97.173706] FS:  7fa6aac79840() GS:880244e0() 
knlGS:
[   97.173706] CS:  0010 DS:  ES:  CR0: 80050033
[   97.173707] CR2: 7fffb1905f18 CR3: 000235ce5000 CR4: 001407e0
[   97.173707] DR0:  DR1:  DR2: 
[   97.173708] DR3:  DR6: fffe0ff0 DR7: 0400
[   97.173708] Stack:
[   97.173712]  880235dafb00 a0076953 e8c066a0 
880240e04ca8
[   97.173713]  0250 0020 88023d05af40 
880235dafb20
[   97.173715]  880235a14a40 880235e54000 00035e92 
880235dafc88
[   97.173715] Call Trace:
[   97.173730]  [a0076953] xfs_dir2_sf_addname+0x43/0x760 [xfs]
[   97.173743]  [a0067cfc] xfs_dir_createname+0x15c/0x1b0 [xfs]
[   97.173754]  [a002f2dc] xfs_create+0x4cc/0x710 [xfs]
[   97.173764]  [a00278ca] xfs_vn_mknod+0x9a/0x1c0 [xfs]
[   97.173773]  [a0027a03] xfs_vn_create+0x13/0x20 [xfs]
[   97.173776]  [811d100d] vfs_create+0x9d/0x100
[   97.173778]  [811d1995] do_last+0x925/0xe00
[   97.173780]  [811d1f2e] path_openat+0xbe/0x6f0
[   97.173783]  [8109e33f] ? local_clock+0x3f/0x50
[   97.173785]  [811e1b5f] ? __alloc_fd+0xaf/0x200
[   97.173787]  [811d2c3a] do_filp_open+0x3a/0x90
[   97.173789]  [811e1b5f] ? __alloc_fd+0xaf/0x200
[   97.173790]  [811c0ddb] do_sys_open+0x10b/0x200
[   97.173792]  [81010578] ? syscall_trace_enter+0x18/0x290
[   97.173794]  [811c0eee] SyS_open+0x1e/0x20
[   97.173796]  [81717c14] tracesys+0xdd/0xe2
[   97.173813] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 
89 d0 48 89 e5 48 89 fa 48 c7 c6 a8 98 0b a0 31 ff 31 c0 e8 ce fb ff ff 0f 0b 
66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 

This trace repeated a few times, then the same assertion was triggered from 
sys_renameat.

Dave

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/