On Thu, 7 May 2009, Ted Creedon wrote:
Typical for 3 servers all of which hung after "ls /afs"
I niced affsd to get enough keyboard control to see that klogd and syslog-ng
are using all the cycles filling /var/log/messages with rx carps.
I think that a massive error caused by mis-keying should not hang windows
and linux clients...
Must be a different problem then, Mark reported no suspicious log entries
whatsoever.
On Thu, 7 May 2009, Simon Wilkinson wrote:
On 7 May 2009, at 21:51, Mark Henry wrote:
I ran the 'echo t' command recommended below once afs hung again. It
definitely put some output in dmesg. The only D states that were listed
were bash sessions (I think). It looks like they were sessions that I
opened after the user told me that afs was hung again. I tried to attach
the dmesg output and cmdebug output but the email was rejected because the
log files were way too big. Any ideas of what to try next? Or is there
anything in particular that I should look at in the cmdebug or dmesg
output. Thanks,
Put the files somewhere we can fetch them - either on the web, or in AFS.
Your cmdebug output really shouldn't be that long, though (you want to run
'cmdebug' _not_ 'cmdebug -long)
My CC came through fine. The dmesg output looks mangled, however, it
starts in the middle of a trace. I hope nothing vital is missing, but I
failed to find afsd, which was mildly disappointing.
I took the liberty to paste the interesting parts to
http://pastebin.com/m53578cd5. Notice the bottom, which was the original
bottom as well. Mark, you've been asked to look at dmesg before this, so I
suppose this didn't happen before you tried this call-tracing?
Besides, it would be interesting if an upgrade to 1.4.10 makes the problem
go away. Can you try that?
Finally, it looks like ls and sshd got locked up trying to determine your
client machine's home cell. I can see that happening (only) if no cell has
been set at that point. The output of fs wscell would be interesting in
this situation, but I'm not sure wether that would lock up as well (and if
it is at all helpful).
Cheers
- Felix
sshd S 000061bfbfdb6613 0 8543 1 (NOTLB)
ffff8103489399d8 0000000000000086 0000000000000000 ffffffff8830848d
0000000000000000 ffff810348939988 ffffffff80617800 ffffffff80617800
ffffffff8061d210 ffffffff80617800 ffffffff80617800 ffffffff883082f3
Call Trace:
[<ffffffff8830848d>] :libafs:afs_IsPrimaryCell+0x2c/0x36
[<ffffffff883082f3>] :libafs:afs_GetCellStale+0x42/0x4b
[<ffffffff8834d0ae>] :libafs:afs_osi_SleepSig+0xc5/0x161
[<ffffffff8022b037>] default_wake_function+0x0/0xe
[<ffffffff883278d3>] :libafs:EvalMountPoint+0x2fa/0x3d1
[<ffffffff8834d1b3>] :libafs:afs_osi_Sleep+0x69/0xba
[<ffffffff88313672>] :libafs:Afs_Lock_Obtain+0xba/0x1a2
[<ffffffff88328bac>] :libafs:afs_lookup+0xe67/0x115f
[<ffffffff8834b18e>] :libafs:afs_pag_match+0x0/0x17
[<ffffffff8835026a>] :libafs:afs_linux_dentry_revalidate+0x178/0x434
[<ffffffff8029b8be>] igrab+0x25/0x34
[<ffffffff8831c62f>] :libafs:afs_FindVCache+0x222/0x4f5
[<ffffffff883210b4>] :libafs:afs_AccessOK+0x4a/0x14c
[<ffffffff803f9f8a>] mutex_lock+0xd/0x1e
[<ffffffff8029a68b>] iput+0x42/0x7b
[<ffffffff8831c995>] :libafs:afs_PutVCache+0x93/0x10d
[<ffffffff88325b0a>] :libafs:afs_PutFakeStat+0x3d/0x42
[<ffffffff88321538>] :libafs:afs_access+0x382/0x3be
[<ffffffff80298c2c>] __d_lookup+0xbc/0x10e
[<ffffffff8028fa05>] do_lookup+0x157/0x1ae
[<ffffffff8029179a>] __link_path_walk+0x36c/0xd8b
[<ffffffff80292211>] link_path_walk+0x58/0xe0
[<ffffffff8025db61>] audit_syscall_entry+0x138/0x17a
[<ffffffff80292589>] do_path_lookup+0x1ab/0x227
[<ffffffff80291089>] getname+0x14c/0x1af
[<ffffffff80292dfb>] __user_walk_fd+0x37/0x4c
[<ffffffff80287cee>] sys_faccessat+0x9c/0x148
[<ffffffff80209dac>] tracesys+0xdc/0xe1
ls S 000061c6b4bd29d2 0 8960 1 (NOTLB)
ffff8103f4e23968 0000000000000082 0000000000000000 ffffffff8830848d
0000000000000000 ffff8103f4e23918 ffffffff80617800 ffffffff80617800
ffffffff8061d210 ffffffff80617800 ffffffff80617800 ffffffff883082f3
Call Trace:
[<ffffffff8830848d>] :libafs:afs_IsPrimaryCell+0x2c/0x36
[<ffffffff883082f3>] :libafs:afs_GetCellStale+0x42/0x4b
[<ffffffff8834d0ae>] :libafs:afs_osi_SleepSig+0xc5/0x161
[<ffffffff8022b037>] default_wake_function+0x0/0xe
[<ffffffff883278d3>] :libafs:EvalMountPoint+0x2fa/0x3d1
[<ffffffff8834d1b3>] :libafs:afs_osi_Sleep+0x69/0xba
[<ffffffff88313672>] :libafs:Afs_Lock_Obtain+0xba/0x1a2
[<ffffffff88328bac>] :libafs:afs_lookup+0xe67/0x115f
[<ffffffff883210b4>] :libafs:afs_AccessOK+0x4a/0x14c
[<ffffffff8835026a>] :libafs:afs_linux_dentry_revalidate+0x178/0x434
[<ffffffff8029b8be>] igrab+0x25/0x34
[<ffffffff8831c62f>] :libafs:afs_FindVCache+0x222/0x4f5
[<ffffffff883210b4>] :libafs:afs_AccessOK+0x4a/0x14c
[<ffffffff803f9f8a>] mutex_lock+0xd/0x1e
[<ffffffff8029a68b>] iput+0x42/0x7b
[<ffffffff8831c995>] :libafs:afs_PutVCache+0x93/0x10d
[<ffffffff88325b0a>] :libafs:afs_PutFakeStat+0x3d/0x42
[<ffffffff88321538>] :libafs:afs_access+0x382/0x3be
[<ffffffff80298c2c>] __d_lookup+0xbc/0x10e
[<ffffffff8028fa05>] do_lookup+0x157/0x1ae
[<ffffffff80291d07>] __link_path_walk+0x8d9/0xd8b
[<ffffffff883080dc>] :libafs:afs_TraverseCells+0x4a/0x9b
[<ffffffff80292211>] link_path_walk+0x58/0xe0
[<ffffffff80235ae1>] current_fs_time+0x3b/0x40
[<ffffffff802a44a0>] __mark_inode_dirty+0xdf/0x17c
[<ffffffff80292589>] do_path_lookup+0x1ab/0x227
[<ffffffff80291089>] getname+0x14c/0x1af
[<ffffffff80292dfb>] __user_walk_fd+0x37/0x4c
[<ffffffff8028baee>] vfs_lstat_fd+0x18/0x47
[<ffffffff802a44a0>] __mark_inode_dirty+0xdf/0x17c
[<ffffffff8028bce0>] sys_newlstat+0x19/0x31
[<ffffffff80209d41>] tracesys+0x71/0xe1
[<ffffffff80209dac>] tracesys+0xdc/0xe1
bash D 000061d03f4a273d 0 9098 9096 (NOTLB)
ffff81035ddbfe78 0000000000000082 0000000000000000 ffffffff8832198d
ffff810341013c9c ffff81035ddbfe28 ffffffff80617800 ffffffff80617800
ffffffff8061d210 ffffffff80617800 ffffffff80617800 ffff8102e3f31780
Call Trace:
[<ffffffff8832198d>] :libafs:afs_getattr+0x11f/0x228
[<ffffffff802946bc>] filldir64+0x0/0xb8
[<ffffffff803fa0f8>] __mutex_lock_slowpath+0x61/0x9e
[<ffffffff803f9f97>] mutex_lock+0x1a/0x1e
[<ffffffff802946bc>] filldir64+0x0/0xb8
[<ffffffff80294887>] vfs_readdir+0x5c/0xa9
[<ffffffff80294949>] sys_getdents64+0x75/0xba
[<ffffffff80209d41>] tracesys+0x71/0xe1
[<ffffffff80209dac>] tracesys+0xdc/0xe1
bash D 000061dbc1e3ff6d 0 9921 9919 (NOTLB)
ffff810326999e78 0000000000000086 0000000000000000 ffffffff8832198d
ffff810341013ca0 ffff810326999e28 ffffffff80617800 ffffffff80617800
ffffffff8061d210 ffffffff80617800 ffffffff80617800 ffff810362b93600
Call Trace:
[<ffffffff8832198d>] :libafs:afs_getattr+0x11f/0x228
[<ffffffff802946bc>] filldir64+0x0/0xb8
[<ffffffff803fa0f8>] __mutex_lock_slowpath+0x61/0x9e
[<ffffffff803f9f97>] mutex_lock+0x1a/0x1e
[<ffffffff802946bc>] filldir64+0x0/0xb8
[<ffffffff80294887>] vfs_readdir+0x5c/0xa9
[<ffffffff80294949>] sys_getdents64+0x75/0xba
[<ffffffff80209d41>] tracesys+0x71/0xe1
[<ffffffff80209dac>] tracesys+0xdc/0xe1
bash R running task 0 10333 10331 (NOTLB)
openafs: Can't open inode 26407
------------[ cut here ]------------
kernel BUG at
/compile/openafs-1.4.9/src/libafs/MODLOAD-2.6.22.5-31-default-MP/osi_file.c:83!
invalid opcode: 0000 [4] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 4
Modules linked in: aeidd(P) amk tun iptable_filter ip_tables x_tables ext2 nfs
lockd nfs_acl sunrpc ipv6 libafs(P) microcode firmware_class usbhid tp_dd
af_packet apparmor loop raid0 dm_mod parport_pc rtc_cmos parport rtc_core
i2c_i801 sr_mod rtc_lib i2c_core serio_raw cdrom bnx2 tg3 shpchp pci_hotplug
container sg button joydev hid ff_memless uhci_hcd ehci_hcd usbcore sd_mod
ata_piix libata edd ext3 mbcache jbd fan aacraid scsi_mod piix ide_core thermal
processor
Pid: 11697, comm: bash Tainted: P N 2.6.22.5-31-default #1
RIP: 0010:[<ffffffff8834bbfd>] [<ffffffff8834bbfd>]
:libafs:osi_UFSOpen+0x155/0x1f2
RSP: 0018:ffff81038b0bbc78 EFLAGS: 00010292
RAX: 0000000000000023 RBX: ffff8103af4c2118 RCX: 0000000000000001
RDX: ffffffff804bdfe8 RSI: 0000000000000092 RDI: ffffffff804bdfe0
RBP: ffff8103417ed000 R08: 0000000000000001 R09: ffff81000106cb20
R10: 0000000000000046 R11: ffff81042ee721c0 R12: ffff81042a4ea400
R13: 0000000000006727 R14: 0000000000006727 R15: 0000000000000170
FS: 00002b3a72ff1b00(0000) GS:ffff81042edf9240(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b810f705062 CR3: 00000002cb81a000 CR4: 00000000000006e0
Process bash (pid: 11697, threadinfo ffff81038b0ba000, task ffff8102cba0b100)
Stack: ffff810387cf59d8 ffff810387cf5780 ffffc200078a2d10 0000000000000000
0000000000000000 ffffffff8830f97f 0000000000000296 ffff81042b16d480
0000000000000001 ffffffffffffffff 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8830f97f>] :libafs:afs_GetDCache+0x14c2/0x2adb
[<ffffffff883210b4>] :libafs:afs_AccessOK+0x4a/0x14c
[<ffffffff88330e9d>] :libafs:afs_GetVolume+0x20/0x118
[<ffffffff88321751>] :libafs:afs_CopyOutAttrs+0xc5/0x1e2
[<ffffffff8831683f>] :libafs:PagInCred+0x30/0xa9
[<ffffffff8834e9bd>] :libafs:afs_linux_readdir+0x126/0x61e
[<ffffffff802946bc>] filldir64+0x0/0xb8
[<ffffffff8028bc6c>] cp_new_stat+0xe5/0xfd
[<ffffffff802946bc>] filldir64+0x0/0xb8
[<ffffffff802948a2>] vfs_readdir+0x77/0xa9
[<ffffffff80294949>] sys_getdents64+0x75/0xba
[<ffffffff80209d41>] tracesys+0x71/0xe1
[<ffffffff80209dac>] tracesys+0xdc/0xe1
Code: 0f 0b eb fe 48 89 45 08 48 8b 40 18 48 c7 c7 20 b4 37 88 48
RIP [<ffffffff8834bbfd>] :libafs:osi_UFSOpen+0x155/0x1f2
RSP <ffff81038b0bbc78>