You don't need to recompile to use 'crash'; you can use the debuginfo kernel, which is a kernel built with debugging symbols. On RHEL it is available in the -debuginfo channel, which you might not be subscribed to.
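
Roughly, here is a small sketch for checking whether the debug kernel is already on the box. It assumes the usual RHEL layout, where the kernel-debuginfo package puts the unstripped image at /usr/lib/debug/lib/modules/<release>/vmlinux; the helper name debug_vmlinux_path is just something I made up for illustration, so adjust the path if your system differs.

```shell
#!/bin/sh
# Sketch: find the debug vmlinux that 'crash' needs for the running kernel.
# Assumption: kernel-debuginfo installs it under
# /usr/lib/debug/lib/modules/<release>/vmlinux (the usual RHEL layout).

debug_vmlinux_path() {
    # $1 = kernel release string, e.g. the output of `uname -r`
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

release=$(uname -r)
vmlinux=$(debug_vmlinux_path "$release")

if [ -r "$vmlinux" ]; then
    # Pointing crash at this file instead of /boot/vmlinuz-* gives it symbols.
    echo "debug kernel found; run: crash $vmlinux"
else
    echo "no debug kernel at $vmlinux"
    echo "install the kernel-debuginfo package matching $release first"
fi
```

The debuginfo package has to match the running kernel release exactly, otherwise crash will refuse to use it against the live system.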

On Wed, Mar 12, 2014 at 11:20 AM, Eric Chris Garrison <[email protected]> wrote:

> A few things.
>
> 1 - The user claims they were merely storing the enormous .pst file, not
> accessing them from Outlook.
>
> 2 - The user claimed that any large file bigger than about 4GB would cause
> the lockup. We haven't been able to replicate it, but he crammed a few
> 10GB files through this morning and locked up one of our gateways as a
> demonstration. He has not made my day any brighter.
>
> Additional info: WE were unable to reproduce this, but he mentioned that
> the test was conducted by copying from one AFS directory to another.
>
> Additional additional: If I didn't mention it before, this is all going
> over samba-on-OpenAFS. Yes, I know, users should be using the OpenAFS
> client rather than going through samba on a gateway. We have found it
> extremely difficult to get users to adopt this method, however, and have
> to try to make this work.
>
> 3 - I had enabled a 2GB cache bypass, and it seemed to have no effect
> whatsoever.
>
> 4 - I gathered what data I could. Looks like I can't use "crash" without a
> kernel recompile:
>
> This GDB was configured as "x86_64-unknown-linux-gnu"...(no debugging
> symbols found)...
>
> crash: /boot/vmlinuz-2.6.18-194.26.1.el5: no debugging data available
>
>
> cmdebug said this:
>
> [root@rgwb1 ~]# cmdebug localhost
> Lock afs_discon_lock status: (none_waiting, 21876 read_locks(pid:29278))
>
> [root@rgwb1 ~]# !ps
> ps -ef | grep 29278
> root 29278 4477 0 09:27 ? 00:00:00 smbd
> root 30101 29337 0 09:37 pts/3 00:00:00 grep 29278
>
> When I ran "top" I saw that the afs_cachetrim process was #1, but
> presumably wedged.
>
>
> I goosed /proc/sysrq-trigger and as promised, it dumped a lot of call
> trace info to the syslog. I'm looking through it, but am not sure what to
> look for. Nothing stands out, anyway.
>
> Chris
>
> On 3/7/14 3:51 PM, "Andrew Deason" <[email protected]> wrote:
>
> >Message: 4
> >To: [email protected]
> >From: Andrew Deason <[email protected]>
> >Date: Fri, 7 Mar 2014 15:51:23 -0600
> >Organization: Sine Nomine Associates
> >Subject: [OpenAFS] Re: OpenAFS client cache overrun?
> >
> >On Fri, 07 Mar 2014 13:51:06 -0500
> >Eric Chris Garrison <[email protected]> wrote:
> >
> >>I'll have to look for that message from Andrew to gather data if the
> >>problem crops up again.
> >
> >It's this message:
> >
> ><http://thread.gmane.org/gmane.comp.file-systems.openafs.general/34517/focus=34532>
> >
> >The easiest / most basic information to get is just the stack trace from
> >the daemon that is supposed to be trimming the cache back when it gets
> >full. That message contains the commands where you can get that
> >information via the 'crash' tool.
> >
> >Or, another way to get that information is by running this:
> >
> ># echo t > /proc/sysrq-trigger
> >
> >That will generate a ton of information to the kernel log, which you'd
> >need to sift through or give to someone else. But it's at least a lot
> >easier to set up and run.
> >
> >>Thanks also for the mention of AFS cache bypass, I think that may be a
> >>BIG help with this problem.
> >
> >'Cache bypass' I don't believe is considered the most stable of
> >features. It could indeed maybe help here, but I'd be looking out for
> >kernel panics.
> >
> >--
> >Andrew Deason
> >[email protected]
> >
>
>
> _______________________________________________
> OpenAFS-info mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

--
Jonathan Billings <[email protected]>
College of Engineering - CAEN - Unix and Linux Support
