Public bug reported:
Hi,
We have a server (a swift storage node) where a single XFS filesystem
repeatedly becomes unusable (the 5 others are fine).
You'll find kern.log messages at the end of this message.
We can reproduce the problem easily : after an error occurs, we just
need to umount/mount the filesystem. XFS will replay the log
successfully and mount the filesystem. Then the swift processes will
notice that the filesystem is back and will start sync'ing some data to
it (not much, because the filesystem is full - swift will remove/add
data to it) and then in ~1 or 2 minutes, the error pasted below will
fire, and then the filesystem becomes unusable :
$ ls /srv/node/sdg
ls: cannot access /srv/node/sdg: Input/output error
It is still mounted though.
A few observations :
- it's always an rsync process that will trigger the error (probably expected,
since most (if not all) write operations on the disk are done by rsync)
- it's always on an mkdir()
The "rsync" process (PID 15036) that crashed in the trace below was
running under strace -e mkdir, you will find the output of that command
below.
I tried reproducing the problem by repeating the "mkdir" after an
umount/mount, but I'm unable to do so.
I believe the fact that the filesystem is nearly full is what causes
this bug, which we're not seeing on other disks on this machine. We have
the same problem on another disk on another machine (running
4.4.0-42-generic - even though, as I reconnect to it to grab the kernel
version, it appears that the filesystem has managed to stay up for ~20
min now).
The question is : what information can we provide to help tracking this
bug down ? I tried investigating with blktrace and fatrace, but that
didn't really help as I'm still unable to reproduce the problem on my
own.
Thanks !
== kernel trace ==
Jun 30 10:35:05 cawesh kernel: [29510.321474] XFS (sdg): Internal error
xfs_trans_cancel at line 990 of file
/build/linux-lts-xenial-ep3zLI/linux-lts-xenial-4.4.0/fs/xfs/xfs_trans.c.
Caller xfs_create+0x4c6/0x6c0 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.345888] CPU: 3 PID: 15036 Comm: rsync
Tainted: G I 4.4.0-83-generic #106~14.04.1-Ubuntu
Jun 30 10:35:05 cawesh kernel: [29510.345891] Hardware name: HP ProLiant DL380
G6, BIOS P62 08/16/2015
Jun 30 10:35:05 cawesh kernel: [29510.345892] 0000000000000000
ffff8802627a3d30 ffffffff813ddefc ffff88058d91b790
Jun 30 10:35:05 cawesh kernel: [29510.345894] 0000000000000001
ffff8802627a3d48 ffffffffc03b095b ffffffffc03c0216
Jun 30 10:35:05 cawesh kernel: [29510.345896] ffff8802627a3d70
ffffffffc03cb166 ffff8805b432aec0 ffff88010444b800
Jun 30 10:35:05 cawesh kernel: [29510.345898] Call Trace:
Jun 30 10:35:05 cawesh kernel: [29510.345903] [<ffffffff813ddefc>]
dump_stack+0x63/0x87
Jun 30 10:35:05 cawesh kernel: [29510.345938] [<ffffffffc03b095b>]
xfs_error_report+0x3b/0x40 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.345955] [<ffffffffc03c0216>] ?
xfs_create+0x4c6/0x6c0 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.345974] [<ffffffffc03cb166>]
xfs_trans_cancel+0xb6/0xe0 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.345991] [<ffffffffc03c0216>]
xfs_create+0x4c6/0x6c0 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.346009] [<ffffffffc03bc993>]
xfs_generic_create+0xc3/0x290 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.346027] [<ffffffffc03bcb94>]
xfs_vn_mknod+0x14/0x20 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.346044] [<ffffffffc03bcbb6>]
xfs_vn_mkdir+0x16/0x20 [xfs]
Jun 30 10:35:05 cawesh kernel: [29510.346047] [<ffffffff8120d3e9>]
vfs_mkdir+0x109/0x1b0
Jun 30 10:35:05 cawesh kernel: [29510.346049] [<ffffffff812119c2>]
SyS_mkdir+0xb2/0xe0
Jun 30 10:35:05 cawesh kernel: [29510.346051] [<ffffffff8180c9ae>]
tracesys_phase2+0x88/0x8d
Jun 30 10:35:05 cawesh kernel: [29510.346055] XFS (sdg):
xfs_do_force_shutdown(0x8) called from line 991 of file
/build/linux-lts-xenial-ep3zLI/linux-lts-xenial-4.4.0/fs/xfs/xfs_trans.c.
Return address = 0xffffffffc03cb17f
Jun 30 10:35:05 cawesh kernel: [29510.346385] XFS (sdg): Corruption of
in-memory data detected. Shutting down filesystem
Jun 30 10:35:05 cawesh kernel: [29510.346671] XFS (sdg): xfs_imap_to_bp:
xfs_trans_read_buf() returned error -5.
Jun 30 10:35:06 cawesh kernel: [29510.356952] XFS (sdg): Please umount the
filesystem and rectify the problem(s)
Jun 30 10:35:06 cawesh kernel: [29510.362245] XFS (sdg): xfs_imap_to_bp:
xfs_trans_read_buf() returned error -5.
Jun 30 10:35:06 cawesh kernel: [29510.389098] XFS (sdg): xfs_imap_to_bp:
xfs_trans_read_buf() returned error -5.
Jun 30 10:35:32 cawesh kernel: [29536.602241] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:36:02 cawesh kernel: [29566.680878] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:36:32 cawesh kernel: [29596.759510] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:37:02 cawesh kernel: [29626.838049] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:37:32 cawesh kernel: [29656.920612] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:38:02 cawesh kernel: [29686.995254] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:38:32 cawesh kernel: [29717.073860] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:39:02 cawesh kernel: [29747.152459] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:39:32 cawesh kernel: [29777.231115] XFS (sdg): xfs_log_force: error
-5 returned.
Jun 30 10:39:39 cawesh kernel: [29784.127051] XFS (sdg): xfs_log_force: error
-5 returned.
== strace -e mkdir of the rsync process above ==
15036 mkdir("/sdg/objects/207716", 0777 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("15f", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("201", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("254", 0700) = 0
15036 mkdir("282", 0700) = 0
15036 mkdir("365", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("47d", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("4a2", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("4b5", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("4e7", 0700) = 0
15036 mkdir("5cf", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("5ed", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("620", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = 0
15036 mkdir("641", 0700 <unfinished ...>
15036 <... mkdir resumed> ) = -1 ENOSPC (No space left on device)
15036 mkdir("663", 0700) = -1 EIO (Input/output error)
15036 mkdir("695", 0700) = -1 EIO (Input/output error)
15036 mkdir("6aa", 0700) = -1 EIO (Input/output error)
15036 mkdir("74f", 0700) = -1 EIO (Input/output error)
15036 mkdir("7fa", 0700) = -1 EIO (Input/output error)
15036 mkdir("81c", 0700) = -1 EIO (Input/output error)
15036 mkdir("849", 0700) = -1 EIO (Input/output error)
15036 mkdir("859", 0700) = -1 EIO (Input/output error)
15036 mkdir("9d3", 0700) = -1 EIO (Input/output error)
15036 mkdir("a47", 0700) = -1 EIO (Input/output error)
15036 mkdir("a82", 0700) = -1 EIO (Input/output error)
15036 mkdir("a9d", 0700) = -1 EIO (Input/output error)
15036 mkdir("c78", 0700) = -1 EIO (Input/output error)
15036 mkdir("ce7", 0700) = -1 EIO (Input/output error)
15036 mkdir("d03", 0700) = -1 EIO (Input/output error)
15036 mkdir("d3f", 0700) = -1 EIO (Input/output error)
15036 mkdir("e3f", 0700) = -1 EIO (Input/output error)
15036 mkdir("ee9", 0700) = -1 EIO (Input/output error)
15036 mkdir("f19", 0700) = -1 EIO (Input/output error)
15036 mkdir("f9a", 0700) = -1 EIO (Input/output error)
15036 mkdir("15f", 040755) = -1 EEXIST (File exists)
15036 mkdir("15f/cad923122b5b596c7bddab4d3076415f", 0700) = -1 EIO
(Input/output error)
15036 mkdir("15f/cad923122b5b596c7bddab4d3076415f", 040755) = -1 EIO
(Input/output error)
15036 mkdir("201", 040755) = -1 EEXIST (File exists)
15036 mkdir("201/cad929a69d50826c89cd552a1727e201", 0700) = -1 EIO
(Input/output error)
15036 mkdir("201/cad929a69d50826c89cd552a1727e201", 040755) = -1 EIO
(Input/output error)
15036 mkdir("254", 040755) = -1 EEXIST (File exists)
15036 mkdir("254/cad905e97142eec0cca1ebb35156e254", 0700) = -1 EIO
(Input/output error)
15036 mkdir("254/cad905e97142eec0cca1ebb35156e254", 040755) = -1 EIO
(Input/output error)
15036 mkdir("282", 040755) = -1 EEXIST (File exists)
15036 mkdir("282/cad92f4db09debbb03982be4a8d91282", 0700) = -1 EIO
(Input/output error)
15036 mkdir("282/cad92f4db09debbb03982be4a8d91282", 040755) = -1 EIO
(Input/output error)
15036 mkdir("365", 040755) = -1 EEXIST (File exists)
15036 mkdir("365/cad913ba60589d9abbc0c89610076365", 0700) = -1 EIO
(Input/output error)
15036 mkdir("365/cad913ba60589d9abbc0c89610076365", 040755) = -1 EIO
(Input/output error)
15036 mkdir("47d", 040755) = -1 EEXIST (File exists)
15036 mkdir("47d/cad918b63ace0228f67afafd7975147d", 0700) = -1 EIO
(Input/output error)
15036 mkdir("47d/cad918b63ace0228f67afafd7975147d", 040755) = -1 EIO
(Input/output error)
15036 mkdir("4a2", 040755) = -1 EEXIST (File exists)
15036 mkdir("4a2/cad904dcd84ef22d04d8a28d48edb4a2", 0700) = -1 EIO
(Input/output error)
15036 mkdir("4a2/cad904dcd84ef22d04d8a28d48edb4a2", 040755) = -1 EIO
(Input/output error)
15036 mkdir("4b5", 040755) = -1 EEXIST (File exists)
15036 mkdir("4b5/cad906e15ef0b16971cc5d2a852a74b5", 0700) = -1 EIO
(Input/output error)
15036 mkdir("4b5/cad906e15ef0b16971cc5d2a852a74b5", 040755) = -1 EIO
(Input/output error)
15036 mkdir("4e7", 040755) = -1 EEXIST (File exists)
15036 mkdir("4e7/cad90235a9717d7dc2c71643961694e7", 0700) = -1 EIO
(Input/output error)
15036 mkdir("4e7/cad90235a9717d7dc2c71643961694e7", 040755) = -1 EIO
(Input/output error)
15036 mkdir("5cf", 040755) = -1 EEXIST (File exists)
15036 mkdir("5cf/cad92538caf35dc41b708c7d90c8e5cf", 0700) = -1 EIO
(Input/output error)
15036 mkdir("5cf/cad92538caf35dc41b708c7d90c8e5cf", 040755) = -1 EIO
(Input/output error)
15036 mkdir("5ed", 040755) = -1 EEXIST (File exists)
15036 mkdir("5ed/cad9147196d3644876dba2fbdf2205ed", 0700) = -1 EIO
(Input/output error)
15036 mkdir("5ed/cad9147196d3644876dba2fbdf2205ed", 040755) = -1 EIO
(Input/output error)
15036 mkdir("620", 040755) = -1 EEXIST (File exists)
15036 mkdir("620/cad932e8be8f62fd6eafa510b9be6620", 0700) = -1 EIO
(Input/output error)
15036 mkdir("620/cad932e8be8f62fd6eafa510b9be6620", 040755) = -1 EIO
(Input/output error)
15036 mkdir("641", 040755) = -1 EIO (Input/output error)
15036 mkdir("663", 040755) = -1 EIO (Input/output error)
15036 mkdir("695", 040755) = -1 EIO (Input/output error)
15036 mkdir("6aa", 040755) = -1 EIO (Input/output error)
15036 mkdir("74f", 040755) = -1 EIO (Input/output error)
15036 mkdir("7fa", 040755) = -1 EIO (Input/output error)
15036 mkdir("81c", 040755) = -1 EIO (Input/output error)
15036 mkdir("849", 040755) = -1 EIO (Input/output error)
15036 mkdir("859", 040755) = -1 EIO (Input/output error)
15036 mkdir("9d3", 040755) = -1 EIO (Input/output error)
15036 mkdir("a47", 040755) = -1 EIO (Input/output error)
15036 mkdir("a82", 040755) = -1 EIO (Input/output error)
15036 mkdir("a9d", 040755) = -1 EIO (Input/output error)
15036 mkdir("c78", 040755) = -1 EIO (Input/output error)
15036 mkdir("ce7", 040755) = -1 EIO (Input/output error)
15036 mkdir("d03", 040755) = -1 EIO (Input/output error)
15036 mkdir("d3f", 040755) = -1 EIO (Input/output error)
15036 mkdir("e3f", 040755) = -1 EIO (Input/output error)
15036 mkdir("ee9", 040755) = -1 EIO (Input/output error)
15036 mkdir("f19", 040755) = -1 EIO (Input/output error)
15036 mkdir("f9a", 040755) = -1 EIO (Input/output error)
15046 --- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=15036, si_uid=106}
---
15036 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15046,
si_status=0, si_utime=0, si_stime=0} ---
15036 +++ exited with 0 +++
3244 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15036,
si_status=0, si_utime=0, si_stime=2} ---
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-4.4.0-83-generic 4.4.0-83.106~14.04.1
ProcVersionSignature: Ubuntu 4.4.0-83.106~14.04.1-generic 4.4.70
Uname: Linux 4.4.0-83-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.24
Architecture: amd64
Date: Fri Jun 30 11:35:33 2017
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-lts-xenial
UpgradeStatus: No upgrade log present (probably fresh install)
** Affects: linux-lts-xenial (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug third-party-packages trusty uec-images
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1701542
Title:
xfs_trans_cancel on mkdir()
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1701542/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs