Resolved: the actual issue was the weekly fileserver restart; the DB servers were not relevant.

On Mon, Jul 30, 2012 at 1:46 PM, Jeff White <[email protected]> wrote:
> I recently built two RHEL 6.3 x64 systems with 1.6.1-1 (compiled from the
> src.rpm) and they consistently have issues when one of our DB servers
> (running 1.2.11) is brought down for a cold backup of the AFS databases.
> Our older clients (1.4.14.1-1 and below) do not have this issue.
>
> We have three DB servers (afs09, afs10, afs11) with afs09 as the master.
> Sunday at 4:05 AM a script runs to stop the AFS DB processes on afs11, tar
> the DB files, then start the processes again. When this happens our new
> 1.6.1 clients hang and begin spewing a large number of these errors:
>
> Jul 29 04:00:27 <kern.warning> ewi-afs-prod0 kernel: afs: Waiting for busy volume 1937412136 () in cell pitt.edu
>
> Sometimes it is able to determine the volume name, sometimes not. When this
> happens I cannot access anything in our AFS cell on the failing client, even
> after a reboot. The one DB server is down for only a minute, yet the issues
> continue after the DB server is back up.
>
> So, a few questions:
>
> Has anyone seen this behavior before when one DB server becomes inaccessible
> but other DB servers are available?
> Is there anything I can do to troubleshoot the issue to help determine what
> is causing it?
> If a client is talking to a particular DB server and the remote system stops
> responding, will the client silently move on to trying a different DB server,
> or is it sticky to the same server and does it keep trying to talk to it?
>
> I would hope that the last part of that is not true. It should work like
> DNS by trying every DB server in sequence and only returning an error once
> all servers have failed.
>
> Jul 29 04:22:31 <kern.err> ewi-afs-prod0 kernel: INFO: task httpd:1542 blocked for more than 120 seconds.
> Jul 29 04:22:31 <kern.err> ewi-afs-prod0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jul 29 04:22:31 <kern.info> ewi-afs-prod0 kernel: httpd D 0000000000000000 0 1542 1535 0x00000080
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: ffff88013b8f3ba8 0000000000000082 ffff88013b8f3c38 ffff880139bc1000
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: ffff88013b8f3b68 ffffffffa02db742 ffff880137a9eae0 ffff880137a9eae0
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: ffff880137a9f098 ffff88013b8f3fd8 000000000000fb88 ffff880137a9f098
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: Call Trace:
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffffa02db742>] ? afs_FindVCache+0xe2/0x5b0 [openafs]
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff814fee9e>] __mutex_lock_slowpath+0x13e/0x180
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffffa02e1a81>] ? afs_access+0x181/0x730 [openafs]
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff814fed3b>] mutex_lock+0x2b/0x50
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118957b>] do_lookup+0x11b/0x230
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118999d>] __link_path_walk+0x20d/0x1030
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8105b483>] ? perf_event_task_sched_out+0x33/0x80
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118aa4a>] path_walk+0x6a/0xe0
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118ac1b>] do_path_lookup+0x5b/0xa0
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118b887>] user_path_at+0x57/0xa0
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff81148641>] ? unlink_anon_vmas+0x71/0xd0
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff811804bc>] vfs_fstatat+0x3c/0x80
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8118062b>] vfs_stat+0x1b/0x20
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff81180654>] sys_newstat+0x24/0x50
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff810d69f2>] ? audit_syscall_entry+0x272/0x2a0
> Jul 29 04:22:31 <kern.warning> ewi-afs-prod0 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>
> --
> Jeff White - GNU+Linux Systems Engineer
> University of Pittsburgh - CSSD
>
> _______________________________________________
> OpenAFS-info mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-info

--
Derrick
