Some updates that might explain the failures I've seen. During December (about a month after my post) our server shut itself down twice at night. Furthermore, the nfs-kernel-server processes disappeared every day. While we were analyzing this, the server shut down again one night in January and would not start up anymore.
Supermicro engineers concluded that the motherboard or the power distribution unit (the chassis has two redundant power units) was defective. After replacing the case with PDU (Supermicro SC216) and the motherboard (Supermicro X7DBU), the problems seemed to be solved, until the nfs-kernel-server processes started to disappear again on a daily basis without leaving a clue.

After moving the NFS server services to another host (running Intrepid server amd64) and mounting the problematic lvm2 logical volume through AoE on the new host, we saw a host of other NFS- and XFS-related problems (see below). After stopping nfsd and unmounting the filesystem, I was instructed to run `xfs_repair -L /dev/etherd/e5.1` to clear the log. I did so (it reported numerous errors) and then remounted the filesystem without errors. However, `xfs_check /dev/etherd/e5.1` still found countless problems.

My guess is that the filesystem got corrupted during the hardware failures. I will try to make a fresh backup of the data on the filesystem and then run `xfs_repair` on it; if that fails, I will recreate the filesystem and restore the data onto it to see whether this finally fixes things. I'll post updates in this thread.

BTW:
* in a thread on the xfs mailing list (http://oss.sgi.com/archives/xfs/2009-02/msg00058.html) a similar problem with Intrepid (2.6.27-9-server) is discussed.
* nfs-kernel-server bug #181996 seemed related before I discovered the filesystem corruption

----
syslog on new nfs-kernel host:

Feb 6 17:53:49 srv-twin3-2 kernel: [2689800.846117] XFS mounting filesystem etherd/e5.1
Feb 6 17:53:49 srv-twin3-2 kernel: [2689800.927304] Ending clean XFS mount for filesystem: etherd/e5.1
Feb 6 17:55:29 srv-twin3-2 kernel: [2689901.054444] aoe: 001517766839 e5.1 v4010 has 2147483648 sectors
Feb 6 17:56:13 srv-twin3-2 kernel: [2689944.367616] XFS mounting filesystem etherd/e5.1
Feb 6 17:56:13 srv-twin3-2 kernel: [2689944.508963] Starting XFS recovery on filesystem: etherd/e5.1 (logdev: internal)
Feb 6 17:56:13 srv-twin3-2 kernel: [2689944.847588] XFS resetting qflags for filesystem etherd/e5.1
Feb 6 17:56:20 srv-twin3-2 kernel: [2689951.167148] Ending XFS recovery on filesystem: etherd/e5.1 (logdev: internal)
Feb 6 17:56:54 srv-twin3-2 kernel: [2689985.890915] Installing knfsd (copyright (C) 1996 [email protected]).
Feb 6 17:56:54 srv-twin3-2 kernel: [2689985.936999] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Feb 6 17:56:54 srv-twin3-2 kernel: [2689985.937015] NFSD: starting 90-second grace period
Feb 6 18:00:01 srv-twin3-2 /USR/SBIN/CRON[25657]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Feb 6 18:01:46 srv-twin3-2 mountd[25626]: Caught signal 15, un-registering and exiting.
Feb 6 18:01:46 srv-twin3-2 kernel: [2690277.358093] nfsd: last server has exited, flushing export cache
Feb 6 18:01:49 srv-twin3-2 kernel: [2690280.391320] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Feb 6 18:01:49 srv-twin3-2 kernel: [2690280.391336] NFSD: starting 90-second grace period
Feb 6 18:03:47 srv-twin3-2 mountd[25808]: authenticated mount request from 172.17.x.x:994 for /data/home (/data)
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133084] XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1590 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_alloc.c. Caller 0xffffffffa020ee92
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133149] Pid: 25783, comm: nfsd Not tainted 2.6.27-9-generic #1
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133151]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133152] Call Trace:
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133189] [<ffffffffa0236b33>] xfs_error_report+0x43/0x50 [xfs]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133210] [<ffffffffa020ee92>] ? xfs_free_extent+0xb2/0xe0 [xfs]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133225] [<ffffffffa020d3f3>] xfs_free_ag_extent+0x5f3/0x6f0 [xfs]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133240] [<ffffffffa020ee92>] xfs_free_extent+0xb2/0xe0 [xfs]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133256] [<ffffffffa021c706>] xfs_bmap_finish+0x156/0x1a0 [xfs]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133274] [<ffffffffa0241b76>] xfs_itruncate_finish+0x146/0x330 [xfs]
Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133291] [<ffffffffa025e166>] xfs_inactive+0x386/0x4b0 [xfs]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133308] [<ffffffffa026b18f>] xfs_fs_clear_inode+0xcf/0x120 [xfs]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133316] [<ffffffff80301c6f>] clear_inode+0x8f/0x110
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133320] [<ffffffff8030250f>] generic_delete_inode+0x16f/0x180
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133323] [<ffffffff80302545>] generic_drop_inode+0x25/0x30
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133327] [<ffffffff803011f2>] iput+0x62/0x70
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133329] [<ffffffff802fdf10>] dentry_iput+0x90/0xe0
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133333] [<ffffffff802ffcf0>] d_delete+0xf0/0x100
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133336] [<ffffffff802f3dd4>] vfs_unlink+0x124/0x130
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133349] [<ffffffffa05e9b74>] nfsd_unlink+0x214/0x2d0 [nfsd]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133361] [<ffffffffa05f112c>] nfsd3_proc_remove+0x7c/0xe0 [nfsd]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133371] [<ffffffffa05e2293>] nfsd_dispatch+0xc3/0x270 [nfsd]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133392] [<ffffffffa04b0f2c>] svc_process+0x47c/0x780 [sunrpc]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133403] [<ffffffffa05e2ae5>] nfsd+0x1c5/0x2e0 [nfsd]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133412] [<ffffffffa05e2920>] ? nfsd+0x0/0x2e0 [nfsd]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133417] [<ffffffff80266c1e>] kthread+0x4e/0x90
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133420] [<ffffffff80213c99>] child_rip+0xa/0x11
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133424] [<ffffffff80266bd0>] ? kthread+0x0/0x90
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133426] [<ffffffff80213c8f>] ? child_rip+0x0/0x11
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133428]
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133434] xfs_force_shutdown(etherd/e5.1,0x8) called from line 4269 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa021c744
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.139834] Filesystem "etherd/e5.1": Corruption of in-memory data detected. Shutting down filesystem: etherd/e5.1
Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.139887] Please umount the filesystem, and rectify the problem(s)
Feb 6 18:03:55 srv-twin3-2 kernel: [2690406.236012] Filesystem "etherd/e5.1": xfs_log_force: error 5 returned.
Feb 6 18:03:55 srv-twin3-2 kernel: [2690406.380379] nfsd: non-standard errno: 5
[repeated 40 times]
Feb 6 18:03:55 srv-twin3-2 kernel: [2690406.380379] nfsd: non-standard errno: 5
Feb 6 18:04:08 srv-twin3-2 kernel: [2690419.300014] Filesystem "etherd/e5.1": xfs_log_force: error 5 returned.
Feb 6 18:04:08 srv-twin3-2 kernel: [2690419.868102] nfsd: non-standard errno: 5
Feb 6 18:04:09 srv-twin3-2 mountd[25808]: authenticated mount request from 172.17.1.8:772 for /data/servers/moodle (/data)
Feb 6 18:04:09 srv-twin3-2 mountd[25808]: can't stat exported dir /data/servers/moodle: Input/output error
Feb 6 18:04:09 srv-twin3-2 kernel: [2690420.868012] nfsd: non-standard errno: 5
[repeated many ~80 times]
Feb 6 18:04:43 srv-twin3-2 kernel: [2690454.868032] nfsd: non-standard errno: 5
Feb 6 18:04:44 srv-twin3-2 kernel: [2690455.300015] Filesystem "etherd/e5.1": xfs_log_force: error 5 returned.
Feb 6 18:04:44 srv-twin3-2 kernel: [2690455.870547] nfsd: non-standard errno: 5

--
XFS internal error xfs_trans_cancel at line 1164 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_trans.c
https://bugs.launchpad.net/bugs/294259
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
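For anyone wanting to check their own syslog for the same pattern as in the log above (an XFS internal error, then a forced shutdown, then a stream of errno 5 from nfsd), here is a rough Python sketch. The function name `summarize` and the category labels are my own invention for illustration, not part of any tool; the matched substrings are taken from the log above, and the sample lines are a small excerpt of it.

```python
import re
from collections import Counter

# Kernel timestamp in brackets, e.g. "[2690405.133434]".
KERNEL_TS = re.compile(r"\[(\d+\.\d+)\]")


def summarize(lines):
    """Count notable XFS/NFS events in an iterable of syslog lines.

    Returns (counts, first_shutdown), where first_shutdown is the kernel
    timestamp of the first xfs_force_shutdown line, or None if absent.
    The category names are hypothetical labels chosen for this sketch.
    """
    counts = Counter()
    first_shutdown = None
    for line in lines:
        if "xfs_force_shutdown" in line:
            counts["xfs_force_shutdown"] += 1
            match = KERNEL_TS.search(line)
            if match and first_shutdown is None:
                first_shutdown = float(match.group(1))
        elif "XFS internal error" in line:
            counts["xfs_internal_error"] += 1
        elif "xfs_log_force: error 5" in line:
            counts["xfs_log_force_eio"] += 1
        elif "nfsd: non-standard errno: 5" in line:
            counts["nfsd_errno_5"] += 1
    return counts, first_shutdown


# A few lines copied from the log above, shortened to the relevant entries.
sample = [
    "Feb 6 18:03:53 srv-twin3-2 kernel: [2690405.133084] XFS internal error "
    "XFS_WANT_CORRUPTED_GOTO at line 1590 of file "
    "/build/buildd/linux-2.6.27/fs/xfs/xfs_alloc.c. Caller 0xffffffffa020ee92",
    "Feb 6 18:03:54 srv-twin3-2 kernel: [2690405.133434] "
    "xfs_force_shutdown(etherd/e5.1,0x8) called from line 4269 of file "
    "/build/buildd/linux-2.6.27/fs/xfs/xfs_bmap.c. "
    "Return address = 0xffffffffa021c744",
    'Feb 6 18:03:55 srv-twin3-2 kernel: [2690406.236012] Filesystem '
    '"etherd/e5.1": xfs_log_force: error 5 returned.',
    "Feb 6 18:03:55 srv-twin3-2 kernel: [2690406.380379] "
    "nfsd: non-standard errno: 5",
    "Feb 6 18:04:08 srv-twin3-2 kernel: [2690419.868102] "
    "nfsd: non-standard errno: 5",
]

counts, first_shutdown = summarize(sample)
print(dict(counts), first_shutdown)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `summarize(open("/var/log/syslog"))`; the path is an assumption and depends on the syslog configuration. Errno 5 is EIO, which matches the "Input/output error" mountd reports once the filesystem has been shut down.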
