Some updates which might explain the failure I've seen.

During December (about a month after my post) our server shut itself
down twice at night. Furthermore, nfs-kernel-server disappeared every
day. While we were analyzing this, the server finally shut down again one
night in January and didn't start up anymore.

Supermicro engineers concluded that the motherboard or the power
distribution unit (it has two redundant power units) was defective. After
replacing the case with PDU (Supermicro SC216) and the motherboard
(Supermicro X7DBU), the problems seemed to be solved, until the
nfs-kernel-server processes started to disappear again on a daily basis
without leaving a clue.

After moving the NFS server services to another host (running Intrepid
server amd64) and mounting the problematic LVM2 logical volume through
AoE on the new host, we saw a range of other NFS- and XFS-related
problems (see below).

After stopping nfsd and unmounting the filesystem, I was instructed to
run `xfs_repair -L /dev/etherd/e5.1` to clear the log (this showed
numerous errors). I did so and remounted the filesystem (without
errors). However, `xfs_check /dev/etherd/e5.1` still found countless
problems.

My guess is that the filesystem got corrupted during the hardware
failures. I will try to make a fresh backup of the data on the
filesystem and then run `xfs_repair` on it; if that fails I will recreate
the filesystem and restore the data onto it to see if this finally fixes
things. I'll post updates in this thread.
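
For reference, a rough sketch of the plan above as a script. The device
name is the one from this report. The mount point `/data` (taken from the
export shown in the log below), the backup target `/mnt/backup`, and the use
of rsync for the copy are assumptions for illustration only. With
`DRY_RUN=1` (the default here) it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DEV=${DEV:-/dev/etherd/e5.1}     # AoE device from this report
MNT=${MNT:-/data}                # assumed mount point
BACKUP=${BACKUP:-/mnt/backup}    # hypothetical backup target
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover() {
    # 1. Fresh backup while the filesystem still mounts (read-only).
    run mount -o ro "$DEV" "$MNT" &&
    run rsync -aH "$MNT/" "$BACKUP/" &&
    run umount "$MNT" || return 1

    # 2. Try a repair on the unmounted device.
    if run xfs_repair "$DEV"; then
        run mount "$DEV" "$MNT"
        return
    fi

    # 3. Repair failed: recreate the filesystem and restore the data.
    run mkfs.xfs -f "$DEV" &&
    run mount "$DEV" "$MNT" &&
    run rsync -aH "$BACKUP/" "$MNT/"
}

recover
```

Step 3 destroys whatever is on the device, so it should only run once the
backup has been verified. Setting `DRY_RUN=0` makes the script execute the
commands instead of printing them.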

BTW:
 * In a thread on the XFS mailing list
   (http://oss.sgi.com/archives/xfs/2009-02/msg00058.html) a similar problem
   with Intrepid (2.6.27-9-server) is discussed.
 * nfs-kernel-server bug #181996 seemed related before I discovered the
   filesystem corruption.

----

Syslog on the new nfs-kernel-server host:

Feb  6 17:53:49 srv-twin3-2 kernel: [2689800.846117] XFS mounting filesystem 
etherd/e5.1
Feb  6 17:53:49 srv-twin3-2 kernel: [2689800.927304] Ending clean XFS mount for 
filesystem: etherd/e5.1
Feb  6 17:55:29 srv-twin3-2 kernel: [2689901.054444] aoe: 001517766839 e5.1 
v4010 has 2147483648 sectors
Feb  6 17:56:13 srv-twin3-2 kernel: [2689944.367616] XFS mounting filesystem 
etherd/e5.1
Feb  6 17:56:13 srv-twin3-2 kernel: [2689944.508963] Starting XFS recovery on 
filesystem: etherd/e5.1 (logdev: internal)
Feb  6 17:56:13 srv-twin3-2 kernel: [2689944.847588] XFS resetting qflags for 
filesystem etherd/e5.1
Feb  6 17:56:20 srv-twin3-2 kernel: [2689951.167148] Ending XFS recovery on 
filesystem: etherd/e5.1 (logdev: internal)
Feb  6 17:56:54 srv-twin3-2 kernel: [2689985.890915] Installing knfsd 
(copyright (C) 1996 [email protected]).
Feb  6 17:56:54 srv-twin3-2 kernel: [2689985.936999] NFSD: Using 
/var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Feb  6 17:56:54 srv-twin3-2 kernel: [2689985.937015] NFSD: starting 90-second 
grace period
Feb  6 18:00:01 srv-twin3-2 /USR/SBIN/CRON[25657]: (root) CMD ([ -x 
/usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Feb  6 18:01:46 srv-twin3-2 mountd[25626]: Caught signal 15, un-registering and 
exiting.
Feb  6 18:01:46 srv-twin3-2 kernel: [2690277.358093] nfsd: last server has 
exited, flushing export cache
Feb  6 18:01:49 srv-twin3-2 kernel: [2690280.391320] NFSD: Using 
/var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Feb  6 18:01:49 srv-twin3-2 kernel: [2690280.391336] NFSD: starting 90-second 
grace period
Feb  6 18:03:47 srv-twin3-2 mountd[25808]: authenticated mount request from 
172.17.x.x:994 for /data/home (/data)
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133084] XFS internal error 
XFS_WANT_CORRUPTED_GOTO at line 1590 of file 
/build/buildd/linux-2.6.27/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa020ee92
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133149] Pid: 25783, comm: nfsd Not 
tainted 2.6.27-9-generic #1
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133151] 
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133152] Call Trace:
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133189]  [<ffffffffa0236b33>] 
xfs_error_report+0x43/0x50 [xfs]
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133210]  [<ffffffffa020ee92>] ? 
xfs_free_extent+0xb2/0xe0 [xfs]
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133225]  [<ffffffffa020d3f3>] 
xfs_free_ag_extent+0x5f3/0x6f0 [xfs]
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133240]  [<ffffffffa020ee92>] 
xfs_free_extent+0xb2/0xe0 [xfs]
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133256]  [<ffffffffa021c706>] 
xfs_bmap_finish+0x156/0x1a0 [xfs]
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133274]  [<ffffffffa0241b76>] 
xfs_itruncate_finish+0x146/0x330 [xfs]
Feb  6 18:03:53 srv-twin3-2 kernel: [2690405.133291]  [<ffffffffa025e166>] 
xfs_inactive+0x386/0x4b0 [xfs]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133308]  [<ffffffffa026b18f>] 
xfs_fs_clear_inode+0xcf/0x120 [xfs]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133316]  [<ffffffff80301c6f>] 
clear_inode+0x8f/0x110
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133320]  [<ffffffff8030250f>] 
generic_delete_inode+0x16f/0x180
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133323]  [<ffffffff80302545>] 
generic_drop_inode+0x25/0x30
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133327]  [<ffffffff803011f2>] 
iput+0x62/0x70
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133329]  [<ffffffff802fdf10>] 
dentry_iput+0x90/0xe0
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133333]  [<ffffffff802ffcf0>] 
d_delete+0xf0/0x100
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133336]  [<ffffffff802f3dd4>] 
vfs_unlink+0x124/0x130
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133349]  [<ffffffffa05e9b74>] 
nfsd_unlink+0x214/0x2d0 [nfsd]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133361]  [<ffffffffa05f112c>] 
nfsd3_proc_remove+0x7c/0xe0 [nfsd]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133371]  [<ffffffffa05e2293>] 
nfsd_dispatch+0xc3/0x270 [nfsd]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133392]  [<ffffffffa04b0f2c>] 
svc_process+0x47c/0x780 [sunrpc]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133403]  [<ffffffffa05e2ae5>] 
nfsd+0x1c5/0x2e0 [nfsd]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133412]  [<ffffffffa05e2920>] ? 
nfsd+0x0/0x2e0 [nfsd]
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133417]  [<ffffffff80266c1e>] 
kthread+0x4e/0x90
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133420]  [<ffffffff80213c99>] 
child_rip+0xa/0x11
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133424]  [<ffffffff80266bd0>] ? 
kthread+0x0/0x90
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133426]  [<ffffffff80213c8f>] ? 
child_rip+0x0/0x11
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133428] 
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.133434] 
xfs_force_shutdown(etherd/e5.1,0x8) called from line 4269 of file 
/build/buildd/linux-2.6.27/fs/xfs/xfs_bmap.c.  Return address = 
0xffffffffa021c744
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.139834] Filesystem "etherd/e5.1": 
Corruption of in-memory data detected.  Shutting down filesystem: etherd/e5.1
Feb  6 18:03:54 srv-twin3-2 kernel: [2690405.139887] Please umount the 
filesystem, and rectify the problem(s)
Feb  6 18:03:55 srv-twin3-2 kernel: [2690406.236012] Filesystem "etherd/e5.1": 
xfs_log_force: error 5 returned.
Feb  6 18:03:55 srv-twin3-2 kernel: [2690406.380379] nfsd: non-standard errno: 5
[repeated 40 times]
Feb  6 18:03:55 srv-twin3-2 kernel: [2690406.380379] nfsd: non-standard errno: 5
Feb  6 18:04:08 srv-twin3-2 kernel: [2690419.300014] Filesystem "etherd/e5.1": 
xfs_log_force: error 5 returned.
Feb  6 18:04:08 srv-twin3-2 kernel: [2690419.868102] nfsd: non-standard errno: 5
Feb  6 18:04:09 srv-twin3-2 mountd[25808]: authenticated mount request from 
172.17.1.8:772 for /data/servers/moodle (/data)
Feb  6 18:04:09 srv-twin3-2 mountd[25808]: can't stat exported dir 
/data/servers/moodle: Input/output error
Feb  6 18:04:09 srv-twin3-2 kernel: [2690420.868012] nfsd: non-standard errno: 5
[repeated ~80 times]
Feb  6 18:04:43 srv-twin3-2 kernel: [2690454.868032] nfsd: non-standard errno: 5
Feb  6 18:04:44 srv-twin3-2 kernel: [2690455.300015] Filesystem "etherd/e5.1": 
xfs_log_force: error 5 returned.
Feb  6 18:04:44 srv-twin3-2 kernel: [2690455.870547] nfsd: non-standard errno: 5
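
To spot this pattern without reading the whole log, a small filter along
these lines may help. It is only a sketch: the function name
`summarize_xfs` and the default path `/var/log/syslog` are assumptions, and
the patterns are simply the message strings from the log above. It expects
the unwrapped log file, not the line-wrapped copy pasted into this mail.

```shell
#!/bin/sh
# Summarize XFS forced shutdowns and the resulting I/O errors in a syslog file.
summarize_xfs() {
    log=${1:-/var/log/syslog}    # assumed default location

    # Timestamp (first three syslog fields) of each forced shutdown.
    awk '/xfs_force_shutdown\(/ { print "shutdown:", $1, $2, $3 }' "$log"

    # Number of errno 5 (EIO) complaints from nfsd and from the XFS log code.
    printf 'nfsd errno 5: %s\n' \
        "$(grep -c 'nfsd: non-standard errno: 5' "$log")"
    printf 'xfs_log_force errors: %s\n' \
        "$(grep -c 'xfs_log_force: error 5' "$log")"
}
```

Usage would be `summarize_xfs /var/log/syslog`. A shutdown line followed by
a large errno 5 count matches what happened here: once XFS shuts the
filesystem down, every nfsd request on it fails with an I/O error.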

-- 
XFS internal error xfs_trans_cancel at line 1164 of file 
/build/buildd/linux-2.6.27/fs/xfs/xfs_trans.c
https://bugs.launchpad.net/bugs/294259
