On 14/10/14 06:38, Anton Ivanov wrote:
> How does the stall manifest itself?
>
> Do you have the journal thread (and sometimes a couple of other threads)
> sitting in D state?

Sorry, should not be asking questions at 6 am before the 3rd double 
espresso.

I think it is the same bug I am chasing: a stall in ubd. You hit it on 
swap, while I hit it in normal operation on a swapless system, and I see a 
stall in the journal instead of a backing dev stall.
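
For anyone who wants to check for the D-state symptom I asked about above, 
here is a rough sketch that lists threads in uninterruptible sleep. It 
assumes procfs is mounted at /proc inside the UML guest and that the guest 
is still responsive enough to run Python; the helper names are mine and are 
not part of any patchset.

```python
import os


def status_field(status_text, key):
    """Return the value of 'key:' from /proc/<pid>/status text, or None."""
    prefix = key + ":"
    for line in status_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def d_state_tasks(proc_root="/proc"):
    """Return (pid, tid, name) for each thread in uninterruptible sleep."""
    found = []
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return found  # proc_root missing or unreadable
    for pid in pids:
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = [t for t in os.listdir(task_dir) if t.isdigit()]
        except OSError:
            continue  # process went away while scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "status")) as f:
                    text = f.read()
            except OSError:
                continue  # thread went away while scanning
            # The State line looks like "D (disk sleep)" for D-state tasks.
            state = status_field(text, "State") or ""
            if state.startswith("D"):
                found.append((int(pid), int(tid), status_field(text, "Name")))
    return found


if __name__ == "__main__":
    for pid, tid, name in d_state_tasks():
        print(pid, tid, name)
```

If the journal thread shows up in that list and stays there across several 
runs, that would match what I see here.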

If you apply the ubd patches from my patchsets, you can trigger this one 
with ease. In theory, all they do is make UBD faster, so they should not 
by themselves introduce new races. They may, however, make the older ones 
more pronounced.

My working hypothesis is a race somewhere in the vm subsystem. I have 
been unable to nail it down, though.

A.

>
> A.
>
> On 13/10/14 22:48, Thomas Meyer wrote:
>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618)
>>     at mm/page-writeback.c:1587
>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>,
>>     vma=<optimized out>, address=<optimized out>,
>>     page_table=<optimized out>, pmd=<optimized out>, orig_pte=...,
>>     ptl=<optimized out>) at mm/memory.c:2178
>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>,
>>     pmd=<optimized out>, pte=<optimized out>, address=<optimized out>,
>>     vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>,
>>     vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88,
>>     address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>> #5  0x0000000060028cec in handle_page_fault (address=1462695424,
>>     ip=<optimized out>, is_write=<optimized out>, is_user=0,
>>     code_out=<optimized out>) at arch/um/kernel/trap.c:75
>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391,
>>     is_user=<optimized out>, regs=0x624f5728)
>>     at arch/um/kernel/trap.c:222
>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>,
>>     unused_si=<optimized out>, regs=<optimized out>)
>>     at arch/um/kernel/trap.c:191
>> #8  0x0000000060039c0f in userspace (regs=0x624f5728)
>>     at arch/um/os-Linux/skas/process.c:429
>> #9  0x0000000060026a8c in fork_handler ()
>>     at arch/um/kernel/process.c:149
>> #10 0x0000000000000000 in ?? ()
>>
>> backing_dev_info:
>> p *mapping->backing_dev_info
>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890},
>>   ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0,
>>   congested_data = 0x0, name = 0x604fb827 "block",
>>   bdi_stat = {{count = 4}, {count = 0}, {count = 318691},
>>     {count = 314567}},
>>   bw_time_stamp = 4339445229, dirtied_stamp = 318686,
>>   written_stamp = 314564, write_bandwidth = 166,
>>   avg_write_bandwidth = 164, dirty_ratelimit = 1,
>>   balanced_dirty_ratelimit = 1,
>>   completions = {events = {count = 3}, period = 4481,
>>     lock = {raw_lock = {<No data fields>}}},
>>   dirty_exceeded = 0, min_ratio = 0, max_ratio = 100,
>>   max_prop_frac = 1024,
>>   wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229,
>>     dwork = {work = {data = {counter = 65},
>>         entry = {next = 0x80a42350, prev = 0x80a42350},
>>         func = 0x600f4b25 <bdi_writeback_workfn>},
>>       timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>,
>>           prev = 0x803db650},
>>         expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>,
>>         function = 0x60051dbb <delayed_work_timer_fn>,
>>         data = 2158240584, slack = -1},
>>       wq = 0x808d9c00, cpu = 1},
>>     b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8},
>>     b_io = {next = 0x80a423c0, prev = 0x80a423c0},
>>     b_more_io = {next = 0x80a423d0, prev = 0x80a423d0},
>>     list_lock = {{rlock = {raw_lock = {<No data fields>}}}}},
>>   wb_lock = {{rlock = {raw_lock = {<No data fields>}}}},
>>   work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00,
>>   laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0,
>>     base = 0x6067ee80 <boot_tvec_bases>,
>>     function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008,
>>     slack = -1},
>>   debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>
>> When I set the cap_dirty from the backing-dev (capabilities = 5), the 
>> system comes back to normal.
>>
>> any ideas what's going on here?
>>
>> with kind regards
>> thomas
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> User-mode-linux-devel mailing list
>> User-mode-linux-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>


