On 20.04.2012 03:55, Ken Elkabany wrote:
We have 2 OpenAFS servers running 1.4.14. We have many clients that we just switched over to 1.6.1pre1. Starting earlier today, we started getting NULL pointer dereferences, which have been completely hosing the clients. The client machines hang on any call that deals with AFS, whether it's "ls /", "ls /afs", "klist", etc.

A "vos changeaddr" was done earlier today, whereby a large collection (4000) of volumes were mistakenly assigned to another server. These were corrected with "vos syncvldb" followed by "vos syncserv". I mention it here, as it's the only thing we've done to the AFS cluster today.

Here's what we found in the syslog:

Apr 20 01:30:43 SERVER kernel: [12861236.027818] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Apr 20 01:30:43 SERVER kernel: [12861236.027836] IP: [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.027868] PGD 0
Apr 20 01:30:43 SERVER kernel: [12861236.027874] Oops: 0000 [#1] SMP
Apr 20 01:30:43 SERVER kernel: [12861236.027882] CPU 6
Apr 20 01:30:43 SERVER kernel: [12861236.027885] Modules linked in: openafs(P) isofs acpiphp
Apr 20 01:30:43 SERVER kernel: [12861236.027897]
Apr 20 01:30:43 SERVER kernel: [12861236.027902] Pid: 1568, comm: apache2 Tainted: P O 3.2.0-23-virtual #36-Ubuntu
Apr 20 01:30:43 SERVER kernel: [12861236.027912] RIP: e030:[<ffffffffa0048087>] [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.027936] RSP: e02b:ffff88017f417808 EFLAGS: 00010282
Apr 20 01:30:43 SERVER kernel: [12861236.027942] RAX: ffffc9000188dbe0 RBX: 0000000000000000 RCX: 000000000000581b
Apr 20 01:30:43 SERVER kernel: [12861236.027950] RDX: ffff8801b112a000 RSI: 0000000000000001 RDI: ffff88017f761680
Apr 20 01:30:43 SERVER kernel: [12861236.027957] RBP: ffff88017f417858 R08: 0000000000000000 R09: 0000000000000000
Apr 20 01:30:43 SERVER kernel: [12861236.027964] R10: 0000000000000002 R11: 0000000000000000 R12: ffff880184756f48
Apr 20 01:30:43 SERVER kernel: [12861236.027971] R13: ffff88017f417a20 R14: 0000000000000004 R15: ffff88017f4178f0
Apr 20 01:30:43 SERVER kernel: [12861236.027983] FS: 00007f1f6ae2f700(0000) GS:ffff8801bff73000(0000) knlGS:0000000000000000
Apr 20 01:30:43 SERVER kernel: [12861236.027991] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 20 01:30:43 SERVER kernel: [12861236.027998] CR2: 0000000000000028 CR3: 0000000181465000 CR4: 0000000000002660
Apr 20 01:30:43 SERVER kernel: [12861236.028006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 20 01:30:43 SERVER kernel: [12861236.028013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 20 01:30:43 SERVER kernel: [12861236.028021] Process apache2 (pid: 1568, threadinfo ffff88017f416000, task ffff88017f41adc0)
Apr 20 01:30:43 SERVER kernel: [12861236.028028] Stack:
Apr 20 01:30:43 SERVER kernel: [12861236.028032] 000000004e2a6741 0000000000000000 0000000000000000 000000004f90bc43
Apr 20 01:30:43 SERVER kernel: [12861236.028046] 000000000001584a ffff880184756cc0 ffff88017f41adc0 ffff88017f417a20
Apr 20 01:30:43 SERVER kernel: [12861236.028059] ffff880184756f48 ffff880184756cc0 ffff88017f417928 ffffffffa0068658
Apr 20 01:30:43 SERVER kernel: [12861236.028072] Call Trace:
Apr 20 01:30:43 SERVER kernel: [12861236.028092] [<ffffffffa0068658>] afs_FetchStatus+0x58/0x450 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028113] [<ffffffffa004672b>] ? afs_GetCellStale+0x3b/0x60 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028134] [<ffffffffa0046a25>] ? afs_IsPrimaryCell+0x25/0x40 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028157] [<ffffffffa0082b80>] ? afs_GetVolume+0x40/0x1d0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028179] [<ffffffffa006ae8d>] afs_GetVCache+0x26d/0x5d0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028200] [<ffffffffa006b343>] afs_VerifyVCache2+0x153/0x200 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028222] [<ffffffffa006ccec>] afs_getattr+0x29c/0x350 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028242] [<ffffffffa009340f>] afs_linux_dentry_revalidate+0x39f/0x470 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028265] [<ffffffffa006bf43>] ? afs_AccessOK+0x113/0x1e0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028279] [<ffffffff816552de>] ? _raw_spin_lock+0xe/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028290] [<ffffffff811818eb>] do_lookup+0x18b/0x310
Apr 20 01:30:43 SERVER kernel: [12861236.028298] [<ffffffff8129885c>] ? security_inode_permission+0x1c/0x30
Apr 20 01:30:43 SERVER kernel: [12861236.028306] [<ffffffff81182268>] link_path_walk+0x138/0x870
Apr 20 01:30:43 SERVER kernel: [12861236.028313] [<ffffffff811834ad>] ? path_init+0x2ed/0x3c0
Apr 20 01:30:43 SERVER kernel: [12861236.028319] [<ffffffff811835d8>] path_lookupat+0x58/0x750
Apr 20 01:30:43 SERVER kernel: [12861236.028339] [<ffffffffa006cb3c>] ? afs_getattr+0xec/0x350 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028348] [<ffffffff810067be>] ? xen_pmd_val+0xe/0x10
Apr 20 01:30:43 SERVER kernel: [12861236.028355] [<ffffffff81183d01>] do_path_lookup+0x31/0xc0
Apr 20 01:30:43 SERVER kernel: [12861236.028362] [<ffffffff81184809>] user_path_at_empty+0x59/0xa0
Apr 20 01:30:43 SERVER kernel: [12861236.028369] [<ffffffff8100aa32>] ? check_events+0x12/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028377] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
Apr 20 01:30:43 SERVER kernel: [12861236.028384] [<ffffffff81184861>] user_path_at+0x11/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028391] [<ffffffff8117995a>] vfs_fstatat+0x3a/0x70
Apr 20 01:30:43 SERVER kernel: [12861236.028398] [<ffffffff8100aa1f>] ? xen_restore_fl_direct_reloc+0x4/0x4
Apr 20 01:30:43 SERVER kernel: [12861236.028405] [<ffffffff8100465d>] ? xen_clts+0x8d/0x190
Apr 20 01:30:43 SERVER kernel: [12861236.028412] [<ffffffff811799ae>] vfs_lstat+0x1e/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028418] [<ffffffff81179b4a>] sys_newlstat+0x1a/0x40
Apr 20 01:30:43 SERVER kernel: [12861236.028427] [<ffffffff810146e1>] ? math_state_restore+0x51/0x80
Apr 20 01:30:43 SERVER kernel: [12861236.028435] [<ffffffff816562fe>] ? do_device_not_available+0xe/0x10
Apr 20 01:30:43 SERVER kernel: [12861236.028445] [<ffffffff8165f8cb>] ? device_not_available+0x1b/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028452] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b
Apr 20 01:30:43 SERVER kernel: [12861236.028458] Code: 89 ef 48 89 45 c8 e8 39 c4 01 00 48 8b 45 c8 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 85 ff 0f 84 95 fe ff ff 48 8b 5f 58 <f6> 43 28 20 0f 85 87 fe ff ff 41 80 7d 12 00 7e 29 41 80 7d 13
Apr 20 01:30:43 SERVER kernel: [12861236.028543] RIP [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028563] RSP <ffff88017f417808>
Apr 20 01:30:43 SERVER kernel: [12861236.028568] CR2: 0000000000000028

Not sure if it helps in your situation, but 1.6.1 is out. Try using this.

T/Christof
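
For anyone comparing several of these traces across clients, here is a rough sketch (not an OpenAFS tool, just an illustration) of pulling the key fields out of an oops like the one quoted above: the fault address, the faulting function, and the reliable call-trace frames. The function name summarize_oops and the sample subset of lines are my own choices for the example; frames marked with "?" are skipped because the kernel flags them as unreliable. In the quoted trace the fault address is 0x28 and RBX is 0, which is at least consistent with a field being read at offset 0x28 through a NULL pointer inside afs_Conn.

```python
import re

# A few lines copied verbatim from the oops quoted above (subset only).
OOPS = """\
Apr 20 01:30:43 SERVER kernel: [12861236.027818] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Apr 20 01:30:43 SERVER kernel: [12861236.027836] IP: [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028092] [<ffffffffa0068658>] afs_FetchStatus+0x58/0x450 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028157] [<ffffffffa0082b80>] ? afs_GetVolume+0x40/0x1d0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028179] [<ffffffffa006ae8d>] afs_GetVCache+0x26d/0x5d0 [openafs]
"""

FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
# Matches "[<addr>] [? ]symbol+0xoff/0xsize"
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\] (\? )?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def summarize_oops(text):
    """Return the fault address, faulting (symbol, offset), and reliable frames."""
    summary = {"fault_addr": None, "ip": None, "trace": []}
    for line in text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            summary["fault_addr"] = int(fault.group(1), 16)
            continue
        if "RIP" in line:
            # RIP lines repeat the faulting frame; ignore them here.
            continue
        frame = FRAME_RE.search(line)
        if not frame:
            continue
        symbol, offset = frame.group(3), int(frame.group(4), 16)
        if " IP: " in line:
            summary["ip"] = (symbol, offset)
        elif frame.group(2) is None:
            # Keep only frames the kernel did not mark with "?".
            summary["trace"].append(symbol)
    return summary


if __name__ == "__main__":
    result = summarize_oops(OOPS)
    print(hex(result["fault_addr"]), result["ip"], result["trace"])
```

Feeding it the full syslog excerpt instead of the subset should work the same way, since lines that match neither pattern are simply ignored.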
-- 
The future is all around us, waiting in moments of transition to be born in moments of revelation. No one knows the shape of that future or where it will take us. We know only that it is always born in pain.
  -- G'Quan

Let's update the servers!
-----------------------------------------------------------------
Christof Hanke
e-mail [email protected]
RZG (Rechenzentrum Garching)
phone +49-89-3299-1041
Computing Center of the Max-Planck-Gesellschaft (MPG) and the Institut für Plasmaphysik (IPP)
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
