Denice, I have spoken with a couple of the guys on the xfs mailing list. The quick fix would seem to be recompiling the kernel to support a 16K kernel stack.
I've spent a few hours researching and have been unable to locate anything relevant to the 2.6.32 kernel. It's not easy finding anything about a patch, or about recompiling the kernel to support this feature, let alone anything specific to 2.6.32.

Any suggestions?

Thank you

---------- Forwarded message ----------
From: Dave Chinner <[email protected]>
Date: Mon, Dec 12, 2011 at 5:47 PM
Subject: Re: XFS causing stack overflow
To: "Ryan C. England" <[email protected]>
Cc: Andi Kleen <[email protected]>, Christoph Hellwig <[email protected]>, [email protected], [email protected]

On Mon, Dec 12, 2011 at 08:43:57AM -0500, Ryan C. England wrote:
> On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner <[email protected]> wrote:
> > On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
> > > BTW I suppose it wouldn't be all that hard to add more stacks and
> > > switch to them too, similar to what the 32bit do_IRQ does.
> > > Perhaps XFS could just allocate its own stack per thread
> > > (or maybe only if it detects some specific configuration that
> > > is known to need much stack)
> >
> > That's possible, but rather complex, I think.
> >
> > > It would need to be per thread if you could sleep inside them.
> >
> > Yes, we'd need to sleep, do IO, possibly operate within a
> > transaction context, etc, and a workqueue handles all these cases
> > without having to do anything special. Splitting the stack at a
> > logical point is probably better, such as this patch:
> >
> > http://oss.sgi.com/archives/xfs/2011-07/msg00443.html
>
> Is it possible to apply this patch to my current installation? We use this
> box in production and the reboots that we're experiencing are an
> inconvenience.

Not easily. The problem with a backport is that the workqueue infrastructure changed around 2.6.36, allowing workqueues to act like an (almost) infinite pool of worker threads, and so by using a workqueue we can have effectively unlimited numbers of concurrent allocations in progress at once. The workqueue implementation in 2.6.32 only allows a single work instance per workqueue thread, and so even with per-CPU worker threads, would only allow one allocation at a time per CPU. This adds additional serialisation within a filesystem and between filesystems, and potentially adds new deadlock conditions as well. So it's not exactly obvious whether it can be backported in a sane manner or not.

> Is there a walkthrough on how to apply this patch? If not, could you
> provide the steps necessary to apply it successfully? I would greatly
> appreciate it.

It would probably need redesigning and re-implementing from scratch because of the above reasons. It'd then need a lot of testing and review.

As a workaround, you might be better off doing what Andi first suggested - recompiling your kernel to use 16k stacks.

Cheers,

Dave.
--
Dave Chinner
[email protected]

--
Ryan C. England
Corvid Technologies <http://www.corvidtec.com/>
office: 704-799-6944 x158
cell: 980-521-2297
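
For reference, the serialisation Dave describes (one work item at a time per worker thread on 2.6.32, versus an almost unlimited worker pool after the 2.6.36 workqueue changes) can be illustrated with a rough userspace analogy. This is only a sketch in Python using a thread pool; it is not kernel code and not the XFS patch. The function name `peak_concurrency`, its parameters, and the sleep standing in for a blocking allocation are all made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(max_workers, n_items, hold_seconds=0.05):
    """Run n_items dummy 'allocations' on a pool of max_workers threads
    and return the highest number that were in flight at the same time.

    Hypothetical helper: the thread pool is only an analogy for a
    workqueue, and the sleep stands in for an allocation that blocks.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def allocation(_item):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(hold_seconds)  # pretend to wait on I/O
        with lock:
            in_flight -= 1

    # Leaving the "with" block waits for every submitted item to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(allocation, range(n_items)))
    return peak


if __name__ == "__main__":
    # One worker: items run strictly one after another (2.6.32-like case).
    print("single worker peak:", peak_concurrency(1, 8))
    # Many workers: items can overlap (closer to the post-2.6.36 behaviour).
    print("pooled workers peak:", peak_concurrency(8, 8))
```

With a single worker the peak is always 1, so every queued allocation waits behind the one in progress. With a larger pool the items overlap, which is the property the patch relies on and the reason a straight backport to 2.6.32 is not obviously sane.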
