Fwd: XFS stack overflow

Ryan C. England Thu, 15 Dec 2011 08:32:16 -0800

Denice,

I have spoken with a couple of the guys on the xfs mailing list.  The quick
fix would seem to be recompiling the kernel to support a 16K kernel stack.

I've spent a few hours researching and have been unable to locate anything
specific to the 2.6.32 kernel.  It's hard to find anything about a patch, or
about recompiling the kernel to support a 16K stack, let alone anything that
applies to 2.6.32.  Any suggestions?

Thank you

---------- Forwarded message ----------
From: Dave Chinner <[email protected]>
Date: Mon, Dec 12, 2011 at 5:47 PM
Subject: Re: XFS causing stack overflow
To: "Ryan C. England" <[email protected]>
Cc: Andi Kleen <[email protected]>, Christoph Hellwig <[email protected]>,
[email protected], [email protected]

On Mon, Dec 12, 2011 at 08:43:57AM -0500, Ryan C. England wrote:
> On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner <[email protected]> wrote:
> > On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
> > > BTW I suppose it wouldn't be all that hard to add more stacks and
> > > switch to them too, similar to what the 32bit do_IRQ does.
> > > Perhaps XFS could just allocate its own stack per thread
> > > (or maybe only if it detects some specific configuration that
> > > is known to need much stack)
> >
> > That's possible, but rather complex, I think.
> > > It would need to be per thread if you could sleep inside them.
> >
> > Yes, we'd need to sleep, do IO, possibly operate within a
> > transaction context, etc, and a workqueue handles all these cases
> > without having to do anything special. Splitting the stack at a
> > logical point is probably better, such as this patch:
> >
> > http://oss.sgi.com/archives/xfs/2011-07/msg00443.html
>
> Is it possible to apply this patch to my current installation?  We use this
> box in production and the reboots that we're experiencing are an
> inconvenience.

Not easily. The problem with a backport is that the workqueue
infrastructure changed around 2.6.36, allowing workqueues to act
like an (almost) infinite pool of worker threads and so by using a
workqueue we can have effectively unlimited numbers of concurrent
allocations in progress at once.

The workqueue implementation in 2.6.32 only allows a single work
instance per workqueue thread, and so even with per-CPU worker
threads, would only allow one allocation at a time per CPU. This
adds additional serialisation within a filesystem and between
filesystems, and potentially adds new deadlock conditions as well.

So it's not exactly obvious whether it can be backported in a sane
manner or not.
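
The difference between the two workqueue models described above can be
put as a toy calculation. This is a userspace sketch only, not kernel
code: the function names are hypothetical, and it assumes queued work is
spread evenly across CPUs and that the newer worker pool is effectively
unbounded, as the text says.

```c
/* Toy model of how many allocation work items can run at once.
 * Hypothetical helper names; this is not a kernel API. */

/* 2.6.32-style: one worker thread per CPU, one work item per thread,
 * so at most one allocation in flight per CPU. */
static unsigned int inflight_percpu_threads(unsigned int queued,
                                            unsigned int ncpus)
{
    return queued < ncpus ? queued : ncpus;
}

/* Post-2.6.36-style: the workqueue behaves like an (almost) unlimited
 * pool of workers, so every queued item can be in progress. */
static unsigned int inflight_worker_pool(unsigned int queued)
{
    return queued;
}
```

With 100 queued allocations on an 8-CPU box, the first model gives 8 in
flight and the second gives 100, which is the extra serialisation a
backport would introduce.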

> Is there a walkthrough on how to apply this patch?  If not, could you
> provide the steps necessary to apply it successfully?  I would greatly
> appreciate it.

It would probably need redesigning and re-implementing from scratch
because of the above reasons. It'd then need a lot of testing and
review. As a workaround, you might be better off doing what Andi
first suggested - recompiling your kernel to use 16k stacks.
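
For reference, a rough sketch of what the 16k-stack change amounts to.
It assumes an x86_64 build with 4 KiB pages, where the kernel stack size
is believed to be derived from THREAD_ORDER in
arch/x86/include/asm/page_64_types.h as PAGE_SIZE << THREAD_ORDER. The
stock value of 1 and the replacement value of 2 are assumptions to check
against your own source tree, and the macro and function names below are
illustrative, not the kernel's.

```c
/* Userspace arithmetic only; mirrors THREAD_SIZE = PAGE_SIZE << THREAD_ORDER.
 * All values are assumptions for x86_64 with 4 KiB pages. */
#define PAGE_SIZE_BYTES    4096UL  /* assumed page size */
#define STOCK_THREAD_ORDER 1       /* assumed stock setting: 2 pages */
#define LARGE_THREAD_ORDER 2       /* proposed setting: 4 pages */

/* Kernel stack size in bytes for a given thread order. */
static unsigned long thread_size(unsigned int thread_order)
{
    return PAGE_SIZE_BYTES << thread_order;
}
```

Under these assumptions thread_size(STOCK_THREAD_ORDER) is 8192 bytes
and thread_size(LARGE_THREAD_ORDER) is 16384 bytes, so the workaround is
a one-line change to the order followed by a full kernel and module
rebuild.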

Cheers,

Dave.
--
Dave Chinner
[email protected]

-- 
Ryan C. England
Corvid Technologies <http://www.corvidtec.com/>
office: 704-799-6944 x158
cell:    980-521-2297
