Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2012-04-02 Thread Ian Jackson
Ben Hutchings writes (Re: Lockups under heavy disk IO; md (RAID) resync/check implicated): There is a change in Linux 3.3, also intended to go into Linux 3.2.14, which looks like a fix for bug #584881. Thanks. I haven't experienced the bug in production in squeeze. I will try to set up a test

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2012-04-02 Thread Tim Small
FWIW, I was only able to reproduce the problem which I was seeing on lenny+openvz (running the same workload on lenny+chroot, or squeeze+openvz didn't trigger it). The fix you attached does sound like a plausible fix for the issue I was seeing (having spent a day or two peering at the code and

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2012-04-01 Thread Ben Hutchings
There is a change in Linux 3.3, also intended to go into Linux 3.2.14, which looks like a fix for bug #584881. I'm attaching a backported version of this bug fix for Debian 6.0 'squeeze', which you may wish to test. You can build a kernel package with this patch by following the instructions at

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2011-01-17 Thread Mark
Related: https://bugzilla.kernel.org/show_bug.cgi?id=12905

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-07-23 Thread Ian Jackson
I wrote: ... I'm going to compile the kernel again without that particular warning (and with kgdb support) and see if I can dig out anything interesting. My attempt to compile the kernel with kgdb support failed. Something in the Debian kernel packaging thingy complained like this: ABI has

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-07-23 Thread Ben Hutchings
On Fri, 2010-07-23 at 23:58 +0100, Ian Jackson wrote: I wrote: ... I'm going to compile the kernel again without that particular warning (and with kgdb support) and see if I can dig out anything interesting. My attempt to compile the kernel with kgdb support failed. Something in the

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-07-22 Thread Ben Hutchings
On Tue, 2010-07-20 at 22:51 +0100, Ian Jackson wrote: Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): Please try 2.6.34 from experimental. I've now replicated the problem on my coffee table with a temporary (intermittent) loan

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-25 Thread Ian Jackson
Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): I/O barriers are block I/O operations (not specific to md) that inhibit reordering of read and write operations. They certainly should not be blocking operations. Also, device-mapper did

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-25 Thread Ben Hutchings
On Fri, 2010-06-25 at 11:50 +0100, Ian Jackson wrote: Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): I/O barriers are block I/O operations (not specific to md) that inhibit reordering of read and write operations. They certainly

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-25 Thread Ian Jackson
Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): On Fri, 2010-06-25 at 11:50 +0100, Ian Jackson wrote: No, I think there are two meanings of the word barrier. AFAICT md has its own thing which it confusingly calls a barrier; it can

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-24 Thread Ian Jackson
Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): Even if you can't get a process dump, you can get some useful information with: Right, thanks. 'd' - show locks held 'l' - show backtrace for active CPUs 'w' - show uninterruptible tasks
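The three letters quoted above look like magic SysRq commands, which can also be issued without a keyboard by writing the letter to /proc/sysrq-trigger as root. The following is a minimal sketch under that assumption; the helper names (trigger_sysrq, dump_all) are hypothetical and do not come from the thread. The kernel writes the resulting dumps to its log (dmesg or the serial console), and 'd' only produces useful output on kernels built with lock debugging.

```python
from pathlib import Path

# SysRq command letters quoted in the thread, with the meanings given there.
SYSRQ_KEYS = {
    "d": "show locks held",
    "l": "show backtrace for active CPUs",
    "w": "show uninterruptible tasks",
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write one SysRq letter to the trigger file and return its description.

    On a real system this needs root; the path is a parameter so the
    helper can be exercised against an ordinary file.
    """
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unexpected SysRq key: {key!r}")
    Path(trigger_path).write_text(key)
    return SYSRQ_KEYS[key]


def dump_all(trigger_path="/proc/sysrq-trigger"):
    """Issue every command in SYSRQ_KEYS, in the order listed above."""
    return [trigger_sysrq(key, trigger_path) for key in SYSRQ_KEYS]
```

Calling dump_all() as root on the hung machine, then capturing the serial console output, would collect all three reports in one go.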

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-24 Thread Ben Hutchings
On Thu, 2010-06-24 at 11:17 +0100, Ian Jackson wrote: Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): [...] Searching the web suggests that symptoms very similar to mine are not uncommon, including instances without soft lockup messages

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-23 Thread Ben Hutchings
On Mon, 2010-06-21 at 11:11 +0100, Ian Jackson wrote: Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): We really need to see the kernel messages reporting soft-lockup. There aren't any. Or, if there are, it isn't printing them

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-21 Thread Ian Jackson
Ben Hutchings writes (Re: Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated): We really need to see the kernel messages reporting soft-lockup. There aren't any. Or, if there are, it isn't printing them to the serial console. Perhaps it is trying to send them only

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-20 Thread Ben Hutchings
On Mon, 2010-06-07 at 10:53 +0100, Ian Jackson wrote: Package: linux-image-2.6.26-2-686-bigmem Version: 2.6.26-21lenny4 I keep getting soft lockups; the symptoms appear to be that user processes become deadlocked when they try to access the disk, but the kernel doesn't notice that anything

Bug#584881: Lockups under heavy disk IO; md (RAID) resync/check implicated

2010-06-07 Thread Ian Jackson
Package: linux-image-2.6.26-2-686-bigmem Version: 2.6.26-21lenny4 I keep getting soft lockups; the symptoms appear to be that user processes become deadlocked when they try to access the disk, but the kernel doesn't notice that anything is wrong. It appears that the lockups happen when: (a) my