I had been triggering a similar bug across a number of m1.large
instances in different availability zones running 2.6.35-22-virtual.
Mostly, the systems were running a number of rake tasks that were
migrating data from one remote database to another, and then either (1)
logging data locally or (2) mv'ing small files to new locations on the
root ext4 filesystem.  In either case, I/O to the device would
eventually block.  Thinking it might be ext4-specific, I tried XFS and
ext3, and both eventually ran into the same problem.

I wasn't able to reproduce this on demand, but it happened consistently
enough.  I found I was able to reproduce it more easily using DRBD.
When I created a simple DRBD cluster and started the initial
synchronization between nodes, the secondary node (sync target) would
eventually stall while its backing device deadlocked.

After upgrading to 2.6.35-28-virtual across all instances, I found the
issue was gone.  I can only assume it was resolved by the upstream fixes
Stefan mentioned above:

  * xen: handle events as edge-triggered
  * xen: use percpu interrupts for IPIs and VIRQs

...which were applied to the 2.6.35-23 kernel.
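
For anyone checking a number of instances, here is a rough sketch (not
something I ran at the time) that flags a kernel release older than
2.6.35-23.  It assumes the release string looks like
"2.6.35-22-virtual" and that -23 really is the first build carrying the
fixes, which is only my assumption from the changelog; the function
names are made up for illustration.

```python
import platform
import re

# Assumption: 2.6.35-23 is the first maverick kernel with the Xen fixes.
FIXED = (2, 6, 35, 23)


def parse_release(release):
    """Turn '2.6.35-22-virtual' into (2, 6, 35, 22); None if unparseable."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    return tuple(int(g) for g in m.groups()) if m else None


def predates_fix(release):
    """True if the release parses and is older than the assumed fixed build."""
    parsed = parse_release(release)
    return parsed is not None and parsed < FIXED


if __name__ == "__main__":
    running = platform.release()
    if parse_release(running) is None:
        print("%s: could not parse kernel release" % running)
    elif predates_fix(running):
        print("%s: older than 2.6.35-23, possibly affected" % running)
    else:
        print("%s: 2.6.35-23 or later" % running)
```

A release older than -23 only means the fixes are probably missing, not
that the instance will necessarily deadlock.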

Can anyone else confirm that upgrading from 2.6.35-22 to later maverick
kernels resolves their issues?

Attached are some traces from 3 instances, as well as a URL to a thread
on the AWS support forum describing similar behavior.

https://forums.aws.amazon.com/thread.jspa?messageID=224301


** Attachment added: "drbd blocked on secondary node"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/666211/+attachment/2027203/+files/drbd.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/666211

Title:
  maverick on ec2 64bit ext4 deadlock
