[OpenAFS] httpd blocked

Jonathan Nilsson Mon, 20 Jun 2011 16:20:08 -0700

hello,

this past weekend our webserver, which serves pages from AFS, crashed and I found several messages like the following in /var/log/messages:

Jun 18 13:19:51 web1 kernel: INFO: task httpd:26383 blocked for more than 120 seconds.
Jun 18 13:19:51 web1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 18 13:19:51 web1 kernel: httpd D 0001B845 2032 26383 32143 26384 26382 (NOTLB)
Jun 18 13:19:51 web1 kernel: c7449e48 00000082 1e778e40 0001b845 00000046 00000002 f887e080 00000007
Jun 18 13:19:51 web1 kernel: dff56000 1e7e1fb4 0001b845 00069174 00000000 dff5610c c3012900 f3e77740
Jun 18 13:19:51 web1 kernel: f24491e0 00000000 00000000 ea22cb80 00000000 00000040 00000000 ea22cb80

Jun 18 13:19:51 web1 kernel: Call Trace:
Jun 18 13:19:51 web1 kernel:  [<f964f78d>] afs_access+0x320/0x337 [openafs]
Jun 18 13:19:51 web1 kernel:  [<c061d975>] __mutex_lock_slowpath+0x4d/0x7c
Jun 18 13:19:51 web1 kernel:  [<c061d9b3>] .text.lock.mutex+0xf/0x14
Jun 18 13:19:51 web1 kernel:  [<c048219b>] do_lookup+0x7a/0x174
Jun 18 13:19:51 web1 kernel:  [<c0483fc8>] __link_path_walk+0x87a/0xd4b
Jun 18 13:19:51 web1 kernel:  [<c04844d1>] link_path_walk+0x38/0x95
Jun 18 13:20:24 web1 kernel:  [<c0484892>] do_path_lookup+0x219/0x27f
Jun 18 13:20:24 web1 kernel:  [<c0484fec>] __user_walk_fd+0x29/0x3a
Jun 18 13:20:24 web1 kernel:  [<c0474e92>] sys_faccessat+0x93/0x126
Jun 18 13:20:24 web1 kernel:  [<c044bf62>] audit_syscall_entry+0x15a/0x18c
Jun 18 13:20:24 web1 kernel:  [<c0474f34>] sys_access+0xf/0x13
Jun 18 13:20:24 web1 kernel:  [<c0404f17>] syscall_call+0x7/0xb

this system is CentOS 5.5 (so it is quite out of date with several packages) 32-bit with OpenAFS 1.4.14. other AFS clients did not have any problems that we are aware of, but this web server is under the heaviest load.

i suspect that the system kept spawning httpd processes as old ones got blocked, and eventually it ran out of memory and became unresponsive. after a reboot it works fine. so the question is, what caused the afs cache manager to respond so slowly?
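
one way to check the pile-up theory against the log would be to count how many distinct httpd PIDs the kernel reported as blocked in each minute. this is only a minimal sketch, assuming the syslog line format of the excerpt above; the names HUNG_TASK and count_blocked are made up for illustration and are not part of any OpenAFS or kernel tool:

```python
import re
from collections import defaultdict

# Matches hung-task reports in the syslog format of the excerpt above, e.g.
# "Jun 18 13:19:51 web1 kernel: INFO: task httpd:26383 blocked for more than 120 seconds."
# The timestamp is captured only down to the minute.
HUNG_TASK = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<name>[^:]+):(?P<pid>\d+) blocked for more than \d+\s*seconds"
)


def count_blocked(lines, name="httpd"):
    """Return {minute: number of distinct PIDs of `name` reported as blocked}."""
    pids_by_minute = defaultdict(set)
    for line in lines:
        match = HUNG_TASK.match(line)
        if match and match.group("name") == name:
            # A set per minute, so a PID reported twice is counted once.
            pids_by_minute[match.group("minute")].add(int(match.group("pid")))
    return {minute: len(pids) for minute, pids in pids_by_minute.items()}
```

passing an open file works, for example count_blocked(open("/var/log/messages")). a count that climbs from minute to minute would fit the theory that new httpd processes kept being spawned while the old ones stayed blocked; it would not say whether the client or the server was at fault.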

can anyone confirm if they have seen kernel messages like this? how can i confirm if the problem is with the client or the server? i see no error messages in BosLog, FileLog, or VolserLog on our servers...


i may need to adjust the afsd or fileserver/volserver arguments.
the client's /etc/sysconfig/openafs

AFSD_ARGS="-dynroot -fakestat-all -daemons 6 -volumes 500 -chunksize 20 -blocks 5242880"


our servers' BosConfig lines for fileserver and volserver
parm /usr/afs/bin/fileserver -L
parm /usr/afs/bin/volserver -p 128

i saw Russ Allbery's recent message on another thread that he uses these parameters on the fileserver, so i can try that:


/usr/lib/openafs/fileserver -L -l 1000 -s 1000 -vc 1000 -cb 200000 \
    -rxpck 800 -udpsize 1048576 -busyat 200 -vattachpar 4

thanks,

--Jonathan




--
[email protected]
Computing Services
School of Social Sciences
SSPA 4110 | 949.824.1536
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
