Re: [OpenAFS] OpenAFS client cache overrun?

Jonathan Billings Wed, 12 Mar 2014 09:10:24 -0700

You don't need to recompile to use 'crash'; you can use the debuginfo
kernel instead. It is available in the -debuginfo channel for RHEL (you
might not be subscribed to it) and provides a kernel with debugging symbols.
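
A rough sketch of that route, assuming the usual RHEL kernel-debuginfo
layout (an unstripped vmlinux under /usr/lib/debug/lib/modules/<release>/).
The helper name debug_vmlinux is made up for illustration, and the path has
not been checked against this particular 2.6.18-194.26.1.el5 host:

```shell
#!/bin/sh
# Sketch only: the path assumes the usual RHEL kernel-debuginfo layout.

# Print where kernel-debuginfo is expected to install the unstripped kernel
# for a given release (defaults to the running kernel).
# debug_vmlinux is a hypothetical helper name, not part of any package.
debug_vmlinux() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "${1:-$(uname -r)}"
}

# Typical use, as root, once the -debuginfo channel is subscribed:
#   debuginfo-install kernel      # or: yum install kernel-debuginfo
#   crash "$(debug_vmlinux)"      # no dump file given, so crash inspects the live system
# Then, at the crash prompt:
#   ps | grep afs_cachetrim       # find the cache-trimming daemon's PID
#   bt <pid>                      # print the stack trace for that PID
```

The function only builds the path; the install and crash steps are left as
comments because they need root and the debuginfo channel.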



On Wed, Mar 12, 2014 at 11:20 AM, Eric Chris Garrison <[email protected]> wrote:

> A few things.
>
> 1 - The user claims they were merely storing the enormous .pst file, not
> accessing it from Outlook.
>
> 2 - The user claimed that any large file bigger than about 4GB would cause
> the lockup. We haven't been able to replicate it, but he crammed a few
> 10GB files through this morning and locked up one of our gateways as a
> demonstration. He has not made my day any brighter.
>
> Additional info: WE were unable to reproduce this, but he mentioned that
> the test was conducted by copying from one AFS directory to another.
>
> Additional additional: If I didn't mention it before, this is all going
> over samba-on-OpenAFS. Yes, I know, users should be using the OpenAFS
> client rather than going through samba on a gateway. We have found it
> extremely difficult to get users to adopt this method, however, and have
> to try to make this work.
>
> 3 - I had enabled a 2GB cache bypass, and it seemed to have no effect
> whatsoever.
>
> 4 - I gathered what data I could. Looks like I can't use "crash" without a
> kernel recompile:
>
> This GDB was configured as "x86_64-unknown-linux-gnu"...(no debugging
> symbols found)...
>
> crash: /boot/vmlinuz-2.6.18-194.26.1.el5: no debugging data available
>
>
> cmdebug said this:
>
> [root@rgwb1 ~]# cmdebug localhost
> Lock afs_discon_lock status: (none_waiting, 21876 read_locks(pid:29278))
>
> [root@rgwb1 ~]# !ps
> ps -ef | grep 29278
> root     29278  4477  0 09:27 ?        00:00:00 smbd
> root     30101 29337  0 09:37 pts/3    00:00:00 grep 29278
>
> When I ran "top" I saw that the afs_cachetrim process was #1, but
> presumably wedged.
>
>
> I goosed /proc/sysrq-trigger and as promised, it dumped a lot of call
> trace info to the syslog. I'm looking through it, but am not sure what to
> look for. Nothing stands out, anyway.
>
> Chris
>
> On 3/7/14 3:51 PM, "Andrew Deason" <[email protected]> wrote:
> >Message: 4
> >To: [email protected]
> >From: Andrew Deason <[email protected]>
> >Date: Fri, 7 Mar 2014 15:51:23 -0600
> >Organization: Sine Nomine Associates
> >Subject: [OpenAFS] Re: OpenAFS client cache overrun?
> >
> >On Fri, 07 Mar 2014 13:51:06 -0500
> >Eric Chris Garrison <[email protected]> wrote:
> >
> >>I'll have to look for that message from Andrew to gather data if the
> >>problem crops up again.
> >
> >It's this message:
> >
> ><http://thread.gmane.org/gmane.comp.file-systems.openafs.general/34517/focus=34532>
> >
> >The easiest / most basic information to get is just the stack trace from
> >the daemon that is supposed to be trimming the cache back when it gets
> >full. That message contains the commands where you can get that
> >information via the 'crash' tool.
> >
> >Or, another way to get that information is by running this:
> >
> ># echo t > /proc/sysrq-trigger
> >
> >That will generate a ton of information to the kernel log, which you'd
> >need to sift through or give to someone else. But it's at least a lot
> >easier to set up and run.
> >
> >>Thanks also for the mention of AFS cache bypass, I think that may be a
> >>BIG help with this problem.
> >
> >'Cache bypass' I don't believe is considered the most stable of
> >features. It could indeed maybe help here, but I'd be looking out for
> >kernel panics.
> >
> >--
> >Andrew Deason
> >[email protected]
> >
>
>
> _______________________________________________
> OpenAFS-info mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-info
>



-- 
Jonathan Billings <[email protected]>
College of Engineering - CAEN - Unix and Linux Support
