On Wed, 12 Mar 2014 10:20:56 -0500 Eric Chris Garrison <[email protected]> wrote:
> Additional additional: If I didn't mention it before, this is all
> going over samba-on-OpenAFS. Yes, I know, users should be using the
> OpenAFS client rather than going through samba on a gateway. We have
> found it extremely difficult to get users to adopt this method,
> however, and have to try to make this work.

I don't think you need to keep saying this :) While that setup is maybe
not ideal, you shouldn't be able to lock up the client like that. The
samba daemon(s) are accessing files over /afs like anything else.

> 3 - I had enabled a 2GB cache bypass, and it seemed to have no effect
> whatsoever.

"cache bypass" doesn't do anything for writes, only for read operations.
That probably wasn't clear, but I didn't know before if this was just
something stuffing data into afs or reading/writing stuff, or what.

> cmbdebug said this:
>
> [root@rgwb1 ~]# cmdebug localhost
> Lock afs_discon_lock status: (none_waiting, 21876 read_locks(pid:29278))

To be clear, this just ran and then exited on its own, right? You didn't
ctrl-C it or anything.

> [root@rgwb1 ~]# !ps
> ps -ef | grep 29278
> root 29278 4477 0 09:27 ? 00:00:00 smbd
> root 30101 29337 0 09:37 pts/3 00:00:00 grep 29278
>
> When I ran "top" I saw that the afs_cachetrim process was #1, but
> presumably wedged.
>
> I goosed /proc/sysrq-trigger and as promised, it dumped a lot of call
> trace info to the syslog. I'm looking through it, but am not sure what to
> look for. Nothing stands out, anyway.

You're looking for the stack trace for the afs_cachetrim process. Look
in syslog for "afs_cachetrim", or its pid. Under that should be a trace
of functions that indicates where we are in the code at that time. I
would extract that, and the entry for a hanging process. So, maybe
29278, or if anything hangs when touching anything in /afs, you could
get the entry for that. Or if you want to try to find "everything", just
look for anything containing the string "afs".

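
As a rough illustration of pulling those entries out of a large syslog,
here is a minimal Python sketch. It is not part of OpenAFS or any
existing tool; the function names are hypothetical, and it assumes each
task in the dump starts with a header line of the form
"kernel: <name> <state> ... <pid> <ppid> <flags>", which may differ on
your kernel. Adjust the pattern to match what your log actually
contains.

```python
import re

# Assumed header shape for one task in the dump, as logged via syslog:
#   "<timestamp> <host> kernel: <comm> <state> ... <pid> <ppid> <flags>"
# This is an assumption about the log layout, not a guaranteed format.
HEADER = re.compile(
    r"kernel:\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+(?P<rest>.+)$"
)


def parse_header(line):
    """Return (comm, pid) if the line looks like a task header, else None."""
    m = HEADER.search(line)
    if not m:
        return None
    fields = m.group("rest").split()
    # Expect the line to end with "<pid> <ppid> <flags>", flags in hex.
    if len(fields) < 3 or not fields[-1].startswith("0x"):
        return None
    if not (fields[-3].isdigit() and fields[-2].isdigit()):
        return None
    return m.group("comm"), int(fields[-3])


def extract_traces(lines, name=None, pid=None):
    """Collect the entry (header plus the lines that follow it) for every
    task whose command name contains `name` or whose pid equals `pid`.

    An entry runs until the next task header, so the last matching entry
    may also pick up unrelated log lines that follow the dump.
    """
    entries = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        header = parse_header(line)
        if header is not None:
            comm, task_pid = header
            wanted = (name is not None and name in comm) or (
                pid is not None and task_pid == pid
            )
            current = [] if wanted else None
            if wanted:
                entries.append(current)
        if current is not None:
            current.append(line)
    return entries
```

To use it, open the syslog file (often /var/log/messages on RHEL, but
check your syslog configuration) and call something like
extract_traces(log_file, name="afs_cachetrim", pid=29278), then print
each returned entry. Passing name="afs" instead approximates the "find
everything" approach, though only for tasks whose name contains "afs";
it will not catch other tasks that merely have afs functions in their
stack.
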
If you don't want to leave the system hanging while you examine it, but
you do want to capture information to examine later, you can generate a
core dump. If your system is set up to capture a core on crash (I'm not
sure if this is the default... look at the RHEL documentation, it should
be something mentioning kdump or kexec), you can crash the system and
you'll get a vmcore afterwards. To do this, send a 'c' to
/proc/sysrq-trigger. That will of course crash the system and cause it
to reboot, so don't do that if that's not what you want to happen.

-- 
Andrew Deason
[email protected]
