I am also having this problem running the 64-bit EBS-backed Alestic EC2
image ami-548c783d on c1.xlarge.

Here is how I am setting up my machine:

1.) Boot a fresh instance.
2.) Install the Sun/Oracle Java 6 JDK.
3.) Download the Heritrix web crawler from
http://builds.archive.org:8080/maven2/org/archive/heritrix/heritrix/3.1.1-SNAPSHOT/
 - Heritrix is a Java program that runs in userland, crawls web sites, and
writes the results to disk. It is installed into /mnt/heritrix because it
writes a very large Berkeley DB database whose size exceeds the EBS root
device, which is only 15 GB.
4.) Mount a 100 GB EBS volume formatted as ext3 to hold the Heritrix crawl
results.
5.) Start Heritrix with a 5 GB max heap (-Xmx5g) and start a crawl job covering
a couple hundred thousand web sites.

Invariably, within a couple of hours the machine "hangs" and any attempt
at I/O blocks indefinitely. The following warnings/errors appear in
/var/log/syslog:

Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290051] INFO: task 
kjournald:614 blocked for more than 120 seconds.
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290068] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290076] kjournald     D 
ffff880003c9f980     0   614      2 0x00000000
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290082]  ffff8801b63d9c30 
0000000000000246 0000000000000000 0000000000015980
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290088]  ffff8801b63d9fd8 
0000000000015980 ffff8801b63d9fd8 ffff8801b7dfadc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290092]  0000000000015980 
0000000000015980 ffff8801b63d9fd8 0000000000015980
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290097] Call Trace:
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290127]  
[<ffffffff8117d510>] ? sync_buffer+0x0/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290133]  
[<ffffffff815a20f3>] io_schedule+0x73/0xc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290136]  
[<ffffffff8117d555>] sync_buffer+0x45/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290139]  
[<ffffffff815a276f>] __wait_on_bit+0x5f/0x90
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290142]  
[<ffffffff8117c281>] ? submit_bh+0x111/0x140
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290145]  
[<ffffffff8117d510>] ? sync_buffer+0x0/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290148]  
[<ffffffff815a2818>] out_of_line_wait_on_bit+0x78/0x90
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290153]  
[<ffffffff8107f0c0>] ? wake_bit_function+0x0/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290156]  
[<ffffffff8117d506>] __wait_on_buffer+0x26/0x30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290161]  
[<ffffffff812205b1>] journal_commit_transaction+0x2f1/0xe30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290166]  
[<ffffffff81006afd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290169]  
[<ffffffff81006adf>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290173]  
[<ffffffff815a404e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290178]  
[<ffffffff81070933>] ? try_to_del_timer_sync+0x83/0xe0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290181]  
[<ffffffff8122456d>] kjournald+0xed/0x250
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290185]  
[<ffffffff8107f080>] ? autoremove_wake_function+0x0/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290188]  
[<ffffffff81224480>] ? kjournald+0x0/0x250
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290191]  
[<ffffffff8107eb26>] kthread+0x96/0xa0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290195]  
[<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290199]  
[<ffffffff815a45dd>] ? retint_restore_args+0x5/0x6
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290202]  
[<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290217] INFO: task 
flush-202:16:19096 blocked for more than 120 seconds.
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290225] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290232] flush-202:16  D 
ffff880003bcd980     0 19096      2 0x00000000
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290236]  ffff8800ffe39850 
0000000000000246 ffff880000000000 0000000000015980
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290240]  ffff8800ffe39fd8 
0000000000015980 ffff8800ffe39fd8 ffff8801b631adc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290244]  0000000000015980 
0000000000015980 ffff8800ffe39fd8 0000000000015980
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290248] Call Trace:
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290251]  
[<ffffffff8117d510>] ? sync_buffer+0x0/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290255]  
[<ffffffff815a20f3>] io_schedule+0x73/0xc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290258]  
[<ffffffff8117d555>] sync_buffer+0x45/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290261]  
[<ffffffff815a261a>] __wait_on_bit_lock+0x5a/0xc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290264]  
[<ffffffff8117d510>] ? sync_buffer+0x0/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290267]  
[<ffffffff8117dab0>] ? end_buffer_async_write+0x0/0x190
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290270]  
[<ffffffff815a26f8>] out_of_line_wait_on_bit_lock+0x78/0x90
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290273]  
[<ffffffff8107f0c0>] ? wake_bit_function+0x0/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290276]  
[<ffffffff8117d6d6>] __lock_buffer+0x36/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290279]  
[<ffffffff8117e3e3>] __block_write_full_page+0x373/0x3b0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290283]  
[<ffffffff81100b94>] ? end_page_writeback+0x44/0x60
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290286]  
[<ffffffff8117dab0>] ? end_buffer_async_write+0x0/0x190
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290289]  
[<ffffffff8117ed50>] block_write_full_page_endio+0xe0/0x120
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290294]  
[<ffffffff811c5db0>] ? buffer_unmapped+0x0/0x20
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290297]  
[<ffffffff8117eda5>] block_write_full_page+0x15/0x20
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290300]  
[<ffffffff811c692d>] ext3_ordered_writepage+0x1ed/0x230
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290304]  
[<ffffffff81109417>] __writepage+0x17/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290307]  
[<ffffffff8110a537>] write_cache_pages+0x1c7/0x3d0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290310]  
[<ffffffff81109400>] ? __writepage+0x0/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290314]  
[<ffffffff8110a764>] generic_writepages+0x24/0x30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290317]  
[<ffffffff8110a7a5>] do_writepages+0x35/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290321]  
[<ffffffff81175356>] writeback_single_inode+0xe6/0x3f0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290324]  
[<ffffffff81175ab5>] writeback_sb_inodes+0x195/0x280
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290328]  
[<ffffffff811762d0>] writeback_inodes_wb+0xa0/0x1b0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290331]  
[<ffffffff8117662b>] wb_writeback+0x24b/0x2b0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290334]  
[<ffffffff8117680c>] wb_do_writeback+0x17c/0x190
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290338]  
[<ffffffff8106ffd0>] ? process_timeout+0x0/0x10
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290341]  
[<ffffffff81176873>] bdi_writeback_task+0x53/0x160
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290344]  
[<ffffffff8107ef47>] ? bit_waitqueue+0x17/0xd0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290348]  
[<ffffffff81119cb6>] bdi_start_fn+0x86/0x100
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290351]  
[<ffffffff81119c30>] ? bdi_start_fn+0x0/0x100
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290354]  
[<ffffffff8107eb26>] kthread+0x96/0xa0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290357]  
[<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290360]  
[<ffffffff815a45dd>] ? retint_restore_args+0x5/0x6
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290363]  
[<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10

Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290369] INFO: task 
java:19279 blocked for more than 120 seconds.
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290380] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290387] java          D 
ffff880003c27980     0 19279      1 0x00000000
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290391]  ffff8801b545b928 
0000000000000286 0000000000000000 0000000000015980
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290396]  ffff8801b545bfd8 
0000000000015980 ffff8801b545bfd8 ffff8801b7efdb80
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290400]  0000000000015980 
0000000000015980 ffff8801b545bfd8 0000000000015980
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290403] Call Trace:
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290407]  
[<ffffffff8117d510>] ? sync_buffer+0x0/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290410]  
[<ffffffff815a20f3>] io_schedule+0x73/0xc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290413]  
[<ffffffff8117d555>] sync_buffer+0x45/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290416]  
[<ffffffff815a261a>] __wait_on_bit_lock+0x5a/0xc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290419]  
[<ffffffff8117d510>] ? sync_buffer+0x0/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290422]  
[<ffffffff815a26f8>] out_of_line_wait_on_bit_lock+0x78/0x90
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290425]  
[<ffffffff8107f0c0>] ? wake_bit_function+0x0/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290428]  
[<ffffffff8117d6d6>] __lock_buffer+0x36/0x40
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290431]  
[<ffffffff8117e55e>] sync_dirty_buffer+0xbe/0xe0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290434]  
[<ffffffff8121f414>] journal_dirty_data+0x1d4/0x270
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290438]  
[<ffffffff811c6cf0>] ext3_journal_dirty_data+0x20/0x50
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290441]  
[<ffffffff811c6d45>] journal_dirty_data_fn+0x25/0x30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290444]  
[<ffffffff811c5d37>] walk_page_buffers+0x87/0xc0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290447]  
[<ffffffff811c6d20>] ? journal_dirty_data_fn+0x0/0x30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290452]  
[<ffffffff811ca1d4>] ext3_ordered_write_end+0x84/0x170
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290457]  
[<ffffffff810ffb36>] ? iov_iter_copy_from_user_atomic+0x96/0x170
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290460]  
[<ffffffff810ffec2>] generic_perform_write+0x122/0x1d0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290464]  
[<ffffffff810fffd4>] generic_file_buffered_write+0x64/0xa0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290467]  
[<ffffffff811028e0>] __generic_file_aio_write+0x240/0x470
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290476]  
[<ffffffff810838df>] ? hrtimer_try_to_cancel+0x3f/0xd0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290479]  
[<ffffffff81083992>] ? hrtimer_cancel+0x22/0x30
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290482]  
[<ffffffff81102b75>] generic_file_aio_write+0x65/0xd0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290486]  
[<ffffffff81152bea>] do_sync_write+0xda/0x120
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290489]  
[<ffffffff810043c6>] ? xen_mc_flush+0x96/0x1c0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290494]  
[<ffffffff81036e88>] ? pvclock_clocksource_read+0x58/0xd0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290498]  
[<ffffffff8128f208>] ? apparmor_file_permission+0x18/0x20
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290503]  
[<ffffffff8125e7a6>] ? security_file_permission+0x16/0x20
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290506]  
[<ffffffff81152ec8>] vfs_write+0xb8/0x1a0
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290509]  
[<ffffffff81153711>] sys_write+0x51/0x80
Nov 15 15:27:03 domU-12-31-39-0A-B6-71 kernel: [39840.290512]  
[<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
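
In case it helps with triage, here is a minimal sketch for pulling the hung-task warnings out of a syslog excerpt like the one above. The function name `find_hung_tasks` is hypothetical, and the pattern is only assumed to match the "blocked for more than N seconds" wording seen in this log. Whitespace is collapsed first so that mail-wrapped lines still match.

```python
import re

# Pattern for the kernel hung-task warning as it appears in the log above.
# The task name may itself contain colons (e.g. "flush-202:16"), so the
# greedy \S+ leaves only the final ":<digits>" as the PID.
HUNG_TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) tuples, in log order."""
    # Collapse all runs of whitespace (including newlines) into single
    # spaces, which undoes hard wraps introduced by mail clients.
    flattened = " ".join(log_text.split())
    return [
        (name, int(pid), int(seconds))
        for name, pid, seconds in HUNG_TASK_RE.findall(flattened)
    ]
```

Feeding it the contents of /var/log/syslog should give one tuple per warning; for the excerpt above that would be the kjournald, flush-202:16 and java tasks.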

Running ps aux shows a [flush-202:16] process stuck in state "D"
(uninterruptible sleep).
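
The same check can be scripted. This is a rough sketch, assuming a Linux /proc filesystem with the usual /proc/<pid>/stat layout (pid, comm in parentheses, then the state letter). The helper names `parse_stat` and `uninterruptible_tasks` are made up for illustration. It looks only at per-process entries, so a blocked thread inside the Java process would need /proc/<pid>/task/ as well.

```python
import os


def parse_stat(stat_text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry was unreadable mid-scan
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `uninterruptible_tasks()` periodically during a crawl would show when the flush thread first enters state D, which might help narrow down the timing of the hang.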

Note that there is absolutely NOTHING ELSE running on this machine
besides the "vanilla" processes built into this image and this one Java
process, and this setup reliably reproduces the hang.

The Heritrix web crawler is a very I/O-heavy, multi-threaded application.
These are all the kinds of I/O it does:

* Network I/O - DNS lookups, HTTP fetches
* Disk I/O - Writing logs, writing data to the Berkeley DB disk store (on the
ephemeral drive, not on the EBS root device), writing crawl data to the EBS
drive

Please let me know if I can get you more information to help reproduce
or solve this bug.

-- 
maverick on ec2 64bit ext4 deadlock
https://bugs.launchpad.net/bugs/666211
