Re: [PATCH 3.16 51/63] xfs: catch inode allocation state mismatch corruption

2018-09-22 Thread Ben Hutchings
On Sat, 2018-09-22 at 15:25 +1000, Dave Chinner wrote:
> On Sat, Sep 22, 2018 at 01:15:42AM +0100, Ben Hutchings wrote:
> > 3.16.58-rc1 review patch.  If anyone has any objections, please let
> > me know.
> > 
> > --
> > 
> > From: Dave Chinner 
> > 
> > commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream.
> > 
> > We recently came across a V4 filesystem causing memory corruption
> > due to a newly allocated inode being setup twice and being added to
> > the superblock inode list twice. From code inspection, the only way
> > this could happen is if a newly allocated inode was not marked as
> > free on disk (i.e. di_mode wasn't zero).
> 
> 
> > Signed-Off-By: Dave Chinner 
> > Reviewed-by: Carlos Maiolino 
> > Tested-by: Carlos Maiolino 
> > Reviewed-by: Darrick J. Wong 
> > Signed-off-by: Darrick J. Wong 
> > [bwh: Backported to 3.16:
> >  - Look up mode in XFS inode, not VFS inode
> >  - Use positive error codes, and EIO instead of EFSCORRUPTED]
> 
> Why EIO?

I believe EIO was the usual error code used for filesystem errors
before EFSCORRUPTED was added.  But now I see xfs had its own private
definition of EFSCORRUPTED.  I'll change this back.

Ben.

-- 
Ben Hutchings
Any sufficiently advanced bug is indistinguishable from a feature.
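
For readers following the EIO vs EFSCORRUPTED exchange above, here is a
minimal userspace sketch of how the two codes differ from the caller's side.
It assumes a Linux libc that provides EUCLEAN, and it assumes the private XFS
definition Ben refers to aliases EFSCORRUPTED to EUCLEAN. The helper
fs_error_text() is a hypothetical name for illustration, not kernel code. It
accepts either sign because the backport note mentions positive error codes,
while newer kernels pass negative ones.

```c
#include <errno.h>
#include <string.h>

/* Assumption: mirrors the private XFS alias discussed in the thread. */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN
#endif

/*
 * Hypothetical helper: return the libc message for an error code,
 * accepting either the positive or the negative convention.
 */
static const char *fs_error_text(int err)
{
	return strerror(err < 0 ? -err : err);
}
```

On glibc-based systems, fs_error_text(EFSCORRUPTED) yields the "Structure
needs cleaning" text seen in the mkdir output of the original patch mail, and
fs_error_text(EIO) yields "Input/output error". Returning EIO for the
corruption case would therefore hide the distinction from userspace.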






Re: [PATCH 3.16 51/63] xfs: catch inode allocation state mismatch corruption

2018-09-21 Thread Dave Chinner
On Sat, Sep 22, 2018 at 01:15:42AM +0100, Ben Hutchings wrote:
> 3.16.58-rc1 review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Dave Chinner 
> 
> commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream.
> 
> We recently came across a V4 filesystem causing memory corruption
> due to a newly allocated inode being setup twice and being added to
> the superblock inode list twice. From code inspection, the only way
> this could happen is if a newly allocated inode was not marked as
> free on disk (i.e. di_mode wasn't zero).

> Signed-Off-By: Dave Chinner 
> Reviewed-by: Carlos Maiolino 
> Tested-by: Carlos Maiolino 
> Reviewed-by: Darrick J. Wong 
> Signed-off-by: Darrick J. Wong 
> [bwh: Backported to 3.16:
>  - Look up mode in XFS inode, not VFS inode
>  - Use positive error codes, and EIO instead of EFSCORRUPTED]

Why EIO?

Cheers,

Dave.
-- 
Dave Chinner
dchin...@redhat.com


[PATCH 3.16 51/63] xfs: catch inode allocation state mismatch corruption

2018-09-21 Thread Ben Hutchings
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Dave Chinner 

commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream.

We recently came across a V4 filesystem causing memory corruption
due to a newly allocated inode being setup twice and being added to
the superblock inode list twice. From code inspection, the only way
this could happen is if a newly allocated inode was not marked as
free on disk (i.e. di_mode wasn't zero).

Running the metadump on an upstream debug kernel fails during inode
allocation like so:

XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
 [ cut here ]
kernel BUG at fs/xfs/xfs_message.c:114!
invalid opcode:  [#1] PREEMPT SMP
CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:assfail+0x28/0x30
RSP: 0018:c9000236fc80 EFLAGS: 00010202
RAX: ffea RBX: 4000 RCX: 
RDX: ffc0 RSI: 000a RDI: 8227211b
RBP: c9000236fce8 R08:  R09: 
R10: 0bec R11: f000 R12: c9000236fd30
R13: 8805c76bab80 R14: 8805c77ac800 R15: 88083fb12e10
FS:  7fac8cbff040() GS:88083fd0() knlGS:0000
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fffa6783ff8 CR3: 0005c6e2b003 CR4: 000606e0
Call Trace:
 xfs_ialloc+0x383/0x570
 xfs_dir_ialloc+0x6a/0x2a0
 xfs_create+0x412/0x670
 xfs_generic_create+0x1f7/0x2c0
 ? capable_wrt_inode_uidgid+0x3f/0x50
 vfs_mkdir+0xfb/0x1b0
 SyS_mkdir+0xcf/0xf0
 do_syscall_64+0x73/0x1a0
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Extracting the inode number we crashed on from an event trace and
looking at it with xfs_db:

xfs_db> inode 184452204
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
.

Confirms that it is not a free inode on disk. xfs_repair
also trips over this inode:

.
zero length extent (off = 0, fsbno = 0) in ino 184452204
correcting nextents for inode 184452204
bad attribute fork in inode 184452204, would clear attr fork
bad nblocks 1 for inode 184452204, would reset to 0
bad anextents 1 for inode 184452204, would reset to 0
imap claims in-use inode 184452204 is free, would correct imap
would have cleared inode 184452204
.
disconnected inode 184452204, would move to lost+found

And so we have a situation where the directory structure and the
inobt think the inode is free, but the inode on disk thinks it is
still in use. Where this corruption came from is not possible to
diagnose, but we can detect it and prevent the kernel from oopsing
on lookup. The reproducer now results in:

$ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error


And this corruption shutdown:

[   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
[   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
[   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
[   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   54.852859] Call Trace:
[   54.853531]  dump_stack+0x85/0xc5
[   54.854385]  xfs_trans_cancel+0x197/0x1c0
[   54.855421]  xfs_create+0x425/0x670
[   54.856314]  xfs_generic_create+0x1f7/0x2c0
[   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
[   54.858586]  vfs_mkdir+0xfb/0x1b0
[   54.859458]  SyS_mkdir+0xcf/0xf0
[   54.860254]  do_syscall_64+0x73/0x1a0
[   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[   54.862492] RIP: 0033:0x7fb73bddf547
[   54.863358] RSP: 002b:7ffdaa553338 EFLAGS: 0246 ORIG_RAX: 0053
[   54.865133] RAX: ffda RBX: 7ffdaa55449a RCX: 7fb73bddf547
[   54.866766] RDX: 0001 RSI: 01ff RDI: 7ffdaa55449a
[   54.868432] RBP: 7ffdaa55449a R08: 01ff R09: 5623a8670dd0
[   54.870110] R10: 7fb73be72d5b R11: 0246 R12: 01ff
[   54.871752] R13: 7ffdaa5534b0 R14:  R15: 7ffdaa553500
[   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c.  Return address = 814cd050
[   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
[   54.884597]
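
The check the commit message describes (an inode handed out by the allocator
must really be free on disk, i.e. have a zero mode) can be modelled in
userspace roughly as follows. This is a sketch under stated assumptions, not
the actual patch: struct model_inode and model_check_inode_state() are
hypothetical names, the nblocks test is modelled on the assertion shown in the
oops above, and the -ENOENT branch for looking up a free inode is an assumed
counterpart to the allocation-side check. EFSCORRUPTED is assumed to alias
EUCLEAN.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: same alias as the private XFS definition. */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN
#endif

/* Hypothetical, heavily simplified stand-in for the on-disk inode core. */
struct model_inode {
	uint64_t ino;      /* inode number */
	uint16_t mode;     /* zero means the inode is free on disk */
	uint64_t nblocks;  /* blocks owned by the inode */
};

/*
 * Return 0 if the inode's on-disk state matches what the caller expects,
 * or a negative errno if it does not.
 *
 * allocating != 0: the allocator claims this inode is free.
 * allocating == 0: the caller is looking up an existing inode.
 */
static int model_check_inode_state(const struct model_inode *ip, int allocating)
{
	if (allocating) {
		if (ip->mode != 0) {
			fprintf(stderr,
				"Corruption detected! Free inode 0x%llx not marked free on disk\n",
				(unsigned long long)ip->ino);
			return -EFSCORRUPTED;
		}
		if (ip->nblocks != 0) {
			/* Assumed second check, mirroring the failed assertion. */
			fprintf(stderr, "Free inode 0x%llx still owns blocks\n",
				(unsigned long long)ip->ino);
			return -EFSCORRUPTED;
		}
		return 0;
	}

	/* Assumed lookup-side rule: a free inode does not exist. */
	if (ip->mode == 0)
		return -ENOENT;
	return 0;
}
```

Fed the inode from the xfs_db dump (inode 184452204, mode 0100644) with
allocating set, the model returns -EFSCORRUPTED instead of letting the caller
set the inode up a second time, which is the behaviour the corruption
shutdown log above reflects.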