I recently built two RHEL 6.3 x64 systems with 1.6.1-1 (compiled from the src.rpm) and they consistently have issues when one of our DB servers (running 1.2.11) is brought down for a cold backup of the AFS databases. Our older clients (1.4.14.1-1 and below) do not have this issue.

We have three DB servers (afs09, afs10, afs11) with afs09 as the master. Every Sunday at 4:05 AM a script runs to stop the AFS DB processes on afs11, tar up the DB files, and then start the processes again. When this happens our new 1.6.1 clients hang and begin spewing a large number of these errors:

Jul 29 04:00:27 <kern.warning> ewi-afs-prod0 kernel: afs: Waiting for busy volume 1937412136 () in cell pitt.edu

Sometimes it is able to determine the volume name, sometimes not. When this happens I cannot access anything in our AFS cell from the failing client, even after a reboot. The DB server is down for only a minute, yet the issues continue after it is back up.

So, a few questions:

1. Has anyone seen this behavior before, where one DB server becomes inaccessible while the other DB servers remain available?

2. Is there anything I can do to troubleshoot the issue and help determine what is causing it?

3. If a client is talking to a particular DB server and that server stops responding, will the client silently move on to a different DB server, or is it sticky to the same server and keep retrying it?

I would hope the last of those is not the case. It should work like a DNS resolver: try each DB server in sequence and return an error only once all of them have failed.
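To make the expected behavior concrete, here is a minimal sketch of that resolver-style failover, assuming each DB server answers on the vlserver port (7003/udp). This is purely illustrative: a real AFS client speaks the Rx protocol, not a bare UDP probe, and the hostnames and helper name here are hypothetical.

```python
import socket

def first_responding_server(servers, port=7003, timeout=1.0):
    """Try each DB server in turn; return the first that answers a UDP probe.

    Mimics DNS-resolver-style failover: after a timeout, move on to the
    next server rather than sticking to a dead one. Returns None only
    once every server in the list has failed.
    """
    for host in servers:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto(b"\x00", (host, port))   # illustrative probe, not real Rx
            s.recvfrom(64)                    # any reply counts as "alive"
            return host
        except (socket.timeout, OSError):
            continue                          # dead or silent server: try the next
        finally:
            s.close()
    return None                               # all servers failed
```

Under this model, a one-minute outage on afs11 would only ever cost a client one timeout before it moved on to afs09 or afs10, rather than hanging indefinitely.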

Here is an example of the hung-task traces we see on an affected client:
Jul 29 04:22:31 <kern.err> ewi-afs-prod0 kernel: INFO: task httpd:1542 blocked for more than 120 seconds.
Jul 29 04:22:31 <kern.err> ewi-afs-prod0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 29 04:22:31 <kern.info> ewi-afs-prod0 kernel: httpd D 0000000000000000 0 1542 1535 0x00000080
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: ffff88013b8f3ba8 0000000000000082 ffff88013b8f3c38 ffff880139bc1000
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: ffff88013b8f3b68 ffffffffa02db742 ffff880137a9eae0 ffff880137a9eae0
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: ffff880137a9f098 ffff88013b8f3fd8 000000000000fb88 ffff880137a9f098
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: Call Trace:
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffffa02db742>] ? afs_FindVCache+0xe2/0x5b0 [openafs]
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff814fee9e>] __mutex_lock_slowpath+0x13e/0x180
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffffa02e1a81>] ? afs_access+0x181/0x730 [openafs]
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff814fed3b>] mutex_lock+0x2b/0x50
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118957b>] do_lookup+0x11b/0x230
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118999d>] __link_path_walk+0x20d/0x1030
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8105b483>] ? perf_event_task_sched_out+0x33/0x80
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118aa4a>] path_walk+0x6a/0xe0
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118ac1b>] do_path_lookup+0x5b/0xa0
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118b887>] user_path_at+0x57/0xa0
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff81148641>] ? unlink_anon_vmas+0x71/0xd0
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff811804bc>] vfs_fstatat+0x3c/0x80
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118062b>] vfs_stat+0x1b/0x20
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff81180654>] sys_newstat+0x24/0x50
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff810d69f2>] ? audit_syscall_entry+0x272/0x2a0
Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

--
Jeff White - GNU+Linux Systems Engineer
University of Pittsburgh - CSSD

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info