Re: Temporary lockup on loopback block device

2007-11-15 Thread Mikulas Patocka
> > On 2.6.23 it could happen even without loopback
> 
> Let's focus on this point, because we already know how the lockup
> happens _with_ loopback and any other kind of bdi stacking.
> 
> Can you describe the setup?  Or better still, can you reproduce it and
> post the sysrq-t output?

Hi

The trace is below; it is perfectly reproducible. It is a 128MB machine, 
Pentium II at 300MHz, host filesystem ext2, loop filesystems ext2 and spadfs 
(both of them locked up). But the problem is really gone in 2.6.24, so I 
think there is no more need to investigate it.

Mikulas

Nov 10 19:34:45 gerlinda kernel: SysRq : HELP : loglevel0-8 reBoot tErm 
Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync 
showTasks Unmount shoW-blocked-tasks
Nov 10 19:34:53 gerlinda kernel: SysRq : Show Blocked State
Nov 10 19:34:53 gerlinda kernel:   taskPC stack   pid 
father
Nov 10 19:34:54 gerlinda kernel: ddD 0286 0  4603   
2985
Nov 10 19:34:55 gerlinda kernel:c580bcdc 0086 c0308c20 
0286 0286 c580bcec 002a4e87 
Nov 10 19:34:55 gerlinda kernel:c580bd10 c0284bba c580bd1c 
 c03775e0 c03775e0 002a4e87 c011d050
Nov 10 19:34:55 gerlinda kernel:c117c030 c03771a0 0064 
c02f8eb4 c0283efe c580bd44 c0145ebc 
Nov 10 19:34:55 gerlinda kernel: Call Trace:
Nov 10 19:34:55 gerlinda kernel:  [<c0284bba>] schedule_timeout+0x4a/0xc0
Nov 10 19:34:55 gerlinda kernel:  [<c011d050>] process_timeout+0x0/0x10
Nov 10 19:34:55 gerlinda kernel:  [<c0283efe>] io_schedule_timeout+0xe/0x20
Nov 10 19:34:55 gerlinda kernel:  [<c0145ebc>] congestion_wait+0x6c/0x90
Nov 10 19:34:55 gerlinda kernel:  [<c01274e0>] autoremove_wake_function+0x0/0x50
Nov 10 19:34:55 gerlinda kernel:  [<c014135f>] balance_dirty_pages_ratelimited_nr+0x11f/0x1e0
Nov 10 19:34:55 gerlinda kernel:  [<c013cb98>] generic_file_buffered_write+0x2f8/0x6f0
Nov 10 19:34:55 gerlinda kernel:  [<c01198b7>] irq_exit+0x47/0x70
Nov 10 19:34:55 gerlinda kernel:  [<c01049e7>] do_IRQ+0x47/0x80
Nov 10 19:34:55 gerlinda kernel:  [<c0102cbf>] common_interrupt+0x23/0x28
Nov 10 19:34:55 gerlinda kernel:  [<c013d1e3>] __generic_file_aio_write_nolock+0x253/0x540
Nov 10 19:34:55 gerlinda kernel:  [<c012a87b>] hrtimer_run_queues+0x6b/0x290
Nov 10 19:34:55 gerlinda kernel:  [<c013d526>] generic_file_aio_write+0x56/0xd0
Nov 10 19:34:55 gerlinda kernel:  [<c012ed9f>] tick_handle_periodic+0xf/0x70
Nov 10 19:34:55 gerlinda kernel:  [<c015a1d6>] do_sync_write+0xc6/0x110
Nov 10 19:34:55 gerlinda kernel:  [<c01274e0>] autoremove_wake_function+0x0/0x50
Nov 10 19:34:55 gerlinda kernel:  [<c01c604f>] clear_user+0x2f/0x50
Nov 10 19:34:55 gerlinda kernel:  [<c012>] ptrace_notify+0x30/0x90
Nov 10 19:34:55 gerlinda kernel:  [<c015aa56>] vfs_write+0xa6/0x140
Nov 10 19:34:55 gerlinda kernel:  [<c8926310>] SPADFS_FILE_WRITE+0x0/0x10 [spadfs]
Nov 10 19:34:55 gerlinda kernel:  [<c015b031>] sys_write+0x41/0x70
Nov 10 19:34:55 gerlinda kernel:  [<c0102b16>] syscall_call+0x7/0xb
Nov 10 19:34:55 gerlinda kernel:  ===


> Thanks,
> Miklos
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Temporary lockup on loopback block device

2007-11-12 Thread Miklos Szeredi
> On 2.6.23 it could happen even without loopback

Let's focus on this point, because we already know how the lockup
happens _with_ loopback and any other kind of bdi stacking.

Can you describe the setup?  Or better still, can you reproduce it and
post the sysrq-t output?

Thanks,
Miklos


Re: Temporary lockup on loopback block device

2007-11-11 Thread Mikulas Patocka
> > Why are there over-limit dirty pages that no one is writing?
> 
> Please do a sysrq-t, and cat /proc/vmstat during the hang.  Those
> will show us what exactly is happening.

I did, and I posted the relevant information from my findings --- it was 
looping in balance_dirty_pages.

> I've seen this type of hang many times, and I agree with Peter, that
> it's probably about loopback, and is fixed in 2.6.24-rc.

On 2.6.23 it could happen even without loopback --- loopback just made it 
happen very often. 2.6.24 seems ok.

Mikulas

> Thanks,
> Miklos
> 
> 


Re: Temporary lockup on loopback block device

2007-11-10 Thread Miklos Szeredi
> > > Arguably we just have the wrong backing-device here, and what we should do
> > > is to propagate the real backing device's pointer through up into the
> > > filesystem.  There's machinery for this which things like DM stacks use.
> > > 
> > > I wonder if the post-2.6.23 changes happened to make this problem go away.
> > 
> > The per BDI dirty stuff in 24 should make this work, I just checked and
> > loopback thingies seem to have their own BDI, so all should be well.
> 
> This is not only about loopback (I think the lockup can happen even 
> without loopback) --- the main problem is:
> 
> Why are there over-limit dirty pages that no one is writing?

Please do a sysrq-t, and cat /proc/vmstat during the hang.  Those
will show us what exactly is happening.

I've seen this type of hang many times, and I agree with Peter, that
it's probably about loopback, and is fixed in 2.6.24-rc.

Thanks,
Miklos




Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
> > > Arguably we just have the wrong backing-device here, and what we 
> > > should do is to propagate the real backing device's pointer through 
> > > up into the filesystem.  There's machinery for this which things 
> > > like DM stacks use.

Just thinking about the new implementation --- you shouldn't really 
propagate the physical block device's backing_device into the loopback 
device.

If you leave it as is (each loop device has its own backing store), you 
can nicely avoid the long-standing loopback deadlock coming from the fact 
that flushing one page on a loopback device can generate several more dirty 
pages on the filesystem.

If you let the loopback device and the physical device share the same 
backing store, writeback can go wild, creating more and more dirty pages 
until memory is exhausted. If you let them have different backing stores, 
that can't happen --- the loopback flush will just wait until the pages on 
the filesystem are written.

Mikulas

> So I compiled it and I don't see any more lock-ups. The writeback loop 
> doesn't depend on any global page count, so the above scenario can't 
> happen here. Good.
> 
> Mikulas
> 


Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
On Sun, 11 Nov 2007, Mikulas Patocka wrote:

> On Sat, 10 Nov 2007, Andrew Morton wrote:
> 
> > On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <[EMAIL 
> > PROTECTED]> wrote:
> > 
> > > Hi
> > > 
> > > I am experiencing a transient lockup in 'D' state with loopback device. 
> > > It 
> > > happens when process writes to a filesystem in loopback with command like
> > > dd if=/dev/zero of=/s/fill bs=4k 
> > > 
> > > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
> > > congestion_wait called from balance_dirty_pages.
> > > 
> > > After about 30 seconds, the lockup is gone and dd resumes, but it locks 
> > > up 
> > > soon again.
> > > 
> > > I added a printk to the balance_dirty_pages
> > > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, 
> > > pages_written %d, write_chunk %d\n", nr_reclaimable, 
> > > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, 
> > > write_chunk);
> > > 
> > > and it shows this during the lockup:
> > > 
> > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > > pages_written 1021, write_chunk 1522
> > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > > pages_written 1021, write_chunk 1522
> > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > > pages_written 1021, write_chunk 1522
> > > 
> > > What apparently happens:
> > > 
> > > writeback_inodes syncs inodes only on the given wbc->bdi, however 
> > > balance_dirty_pages checks against global counts of dirty pages. So if 
> > > there's nothing to sync on a given device, but there are other dirty 
> > > pages 
> > > so that the counts are over the limit, it will loop without doing any 
> > > work.
> > > 
> > > To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
> > > something writes to the backing device, it flushes the dirty pages 
> > > generated by the loopback and the lockup is gone. If you add printk, 
> > > don't 
> > > forget to stop klogd, otherwise logging would end the lockup.
> > 
> > erk.
> > 
> > > The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
> > > devices are flushed ... but the code probably needs some redesign (i.e. 
> > > either account per-device and flush per-device, or account-global and 
> > > flush-global).
> > > 
> > > Mikulas
> > > 
> > > 
> > > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> > > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 
> > > 18:43:44.0 +0200
> > > +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
> > > @@ -214,7 +214,6 @@
> > > 
> > >   for (;;) {
> > >   struct writeback_control wbc = {
> > > - .bdi= bdi,
> > >   .sync_mode  = WB_SYNC_NONE,
> > >   .older_than_this = NULL,
> > >   .nr_to_write= write_chunk,
> > 
> > Arguably we just have the wrong backing-device here, and what we should do
> > is to propagate the real backing device's pointer through up into the
> > filesystem.  There's machinery for this which things like DM stacks use.
> 
> If you change loopback backing-device, you just turn this nicely 
> reproducible example into a subtle race condition that can happen whenever 
> you use loopback or not. Think, what happens when different process 
> dirties memory:
> 
> You have process "A" that dirtied a lot of pages on device "1" but has not 
> started writing them.
> You have process "B" that is trying to write to device "2", sees dirty 
> page count over limit, but can't do anything about it, because it is only 
> allowed to flush pages on device "2". --- so it endlessly loops.
> 
> If you want to use the current flushing semantics, you just have to audit 
> the whole kernel to make sure that if some process sees over-limit dirty 
> page count, there is another process that is flushing the pages. Currently 
> it is not true, the "dd" process sees over-limit count, but there is 
> no-one writing.
> 
> > I wonder if the post-2.6.23 changes happened to make this problem go away.
> 
> I will try 2.6.24-rc2, but I don't think the root cause of this went away. 
> Maybe you just reduced probability.
> 
> Mikulas

So I compiled it and I don't see any more lock-ups. The writeback loop 
doesn't depend on any global page count, so the above scenario can't 
happen here. Good.

Mikulas


Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
> > Arguably we just have the wrong backing-device here, and what we should do
> > is to propagate the real backing device's pointer through up into the
> > filesystem.  There's machinery for this which things like DM stacks use.
> > 
> > I wonder if the post-2.6.23 changes happened to make this problem go away.
> 
> The per BDI dirty stuff in 24 should make this work, I just checked and
> loopback thingies seem to have their own BDI, so all should be well.

This is not only about loopback (I think the lockup can happen even 
without loopback) --- the main problem is:

Why are there over-limit dirty pages that no one is writing?

Mikulas


Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka


On Sat, 10 Nov 2007, Andrew Morton wrote:

> On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <[EMAIL PROTECTED]> 
> wrote:
> 
> > Hi
> > 
> > I am experiencing a transient lockup in 'D' state with loopback device. It 
> > happens when process writes to a filesystem in loopback with command like
> > dd if=/dev/zero of=/s/fill bs=4k 
> > 
> > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
> > congestion_wait called from balance_dirty_pages.
> > 
> > After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
> > soon again.
> > 
> > I added a printk to the balance_dirty_pages
> > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, 
> > pages_written %d, write_chunk %d\n", nr_reclaimable, 
> > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, 
> > write_chunk);
> > 
> > and it shows this during the lockup:
> > 
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > pages_written 1021, write_chunk 1522
> > 
> > What apparently happens:
> > 
> > writeback_inodes syncs inodes only on the given wbc->bdi, however 
> > balance_dirty_pages checks against global counts of dirty pages. So if 
> > there's nothing to sync on a given device, but there are other dirty pages 
> > so that the counts are over the limit, it will loop without doing any 
> > work.
> > 
> > To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
> > something writes to the backing device, it flushes the dirty pages 
> > generated by the loopback and the lockup is gone. If you add printk, don't 
> > forget to stop klogd, otherwise logging would end the lockup.
> 
> erk.
> 
> > The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
> > devices are flushed ... but the code probably needs some redesign (i.e. 
> > either account per-device and flush per-device, or account-global and 
> > flush-global).
> > 
> > Mikulas
> > 
> > 
> > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 
> > 18:43:44.0 +0200
> > +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
> > @@ -214,7 +214,6 @@
> > 
> > for (;;) {
> > struct writeback_control wbc = {
> > -   .bdi= bdi,
> > .sync_mode  = WB_SYNC_NONE,
> > .older_than_this = NULL,
> > .nr_to_write= write_chunk,
> 
> Arguably we just have the wrong backing-device here, and what we should do
> is to propagate the real backing device's pointer through up into the
> filesystem.  There's machinery for this which things like DM stacks use.

If you change the loopback backing device, you just turn this nicely 
reproducible example into a subtle race condition that can happen whether 
you use loopback or not. Think about what happens when a different process 
dirties memory:

You have process "A" that has dirtied a lot of pages on device "1" but has 
not started writing them.
You have process "B" that is trying to write to device "2"; it sees the 
dirty page count over the limit but can't do anything about it, because it 
is only allowed to flush pages on device "2" --- so it loops endlessly.

If you want to keep the current flushing semantics, you have to audit the 
whole kernel to make sure that whenever some process sees an over-limit 
dirty page count, there is another process flushing the pages. Currently 
that is not true: the "dd" process sees the over-limit count, but no one 
is writing.

> I wonder if the post-2.6.23 changes happened to make this problem go away.

I will try 2.6.24-rc2, but I don't think the root cause of this went away. 
Maybe you just reduced the probability.

Mikulas


Re: Temporary lockup on loopback block device

2007-11-10 Thread Peter Zijlstra

On Sat, 2007-11-10 at 14:54 -0800, Andrew Morton wrote:
> On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <[EMAIL PROTECTED]> 
> wrote:
> 
> > Hi
> > 
> > I am experiencing a transient lockup in 'D' state with loopback device. It 
> > happens when process writes to a filesystem in loopback with command like
> > dd if=/dev/zero of=/s/fill bs=4k 
> > 
> > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
> > congestion_wait called from balance_dirty_pages.
> > 
> > After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
> > soon again.
> > 
> > I added a printk to the balance_dirty_pages
> > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, 
> > pages_written %d, write_chunk %d\n", nr_reclaimable, 
> > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, 
> > write_chunk);
> > 
> > and it shows this during the lockup:
> > 
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > pages_written 1021, write_chunk 1522
> > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> > pages_written 1021, write_chunk 1522
> > 
> > What apparently happens:
> > 
> > writeback_inodes syncs inodes only on the given wbc->bdi, however 
> > balance_dirty_pages checks against global counts of dirty pages. So if 
> > there's nothing to sync on a given device, but there are other dirty pages 
> > so that the counts are over the limit, it will loop without doing any 
> > work.
> > 
> > To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
> > something writes to the backing device, it flushes the dirty pages 
> > generated by the loopback and the lockup is gone. If you add printk, don't 
> > forget to stop klogd, otherwise logging would end the lockup.
> 
> erk.

known issue.

> > The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
> > devices are flushed ... but the code probably needs some redesign (i.e. 
> > either account per-device and flush per-device, or account-global and 
> > flush-global).

.24 will have the per-device solution.

> > 
> > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 
> > 18:43:44.0 +0200
> > +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
> > @@ -214,7 +214,6 @@
> > 
> > for (;;) {
> > struct writeback_control wbc = {
> > -   .bdi= bdi,
> > .sync_mode  = WB_SYNC_NONE,
> > .older_than_this = NULL,
> > .nr_to_write= write_chunk,
> 
> Arguably we just have the wrong backing-device here, and what we should do
> is to propagate the real backing device's pointer through up into the
> filesystem.  There's machinery for this which things like DM stacks use.
> 
> I wonder if the post-2.6.23 changes happened to make this problem go away.

The per BDI dirty stuff in 24 should make this work, I just checked and
loopback thingies seem to have their own BDI, so all should be well.




Re: Temporary lockup on loopback block device

2007-11-10 Thread Andrew Morton
On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka <[EMAIL PROTECTED]> 
wrote:

> Hi
> 
> I am experiencing a transient lockup in 'D' state with loopback device. It 
> happens when process writes to a filesystem in loopback with command like
> dd if=/dev/zero of=/s/fill bs=4k 
> 
> CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
> congestion_wait called from balance_dirty_pages.
> 
> After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
> soon again.
> 
> I added a printk to the balance_dirty_pages
> printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, 
> pages_written %d, write_chunk %d\n", nr_reclaimable, 
> global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, 
> write_chunk);
> 
> and it shows this during the lockup:
> 
> wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> pages_written 1021, write_chunk 1522
> wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> pages_written 1021, write_chunk 1522
> wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
> pages_written 1021, write_chunk 1522
> 
> What apparently happens:
> 
> writeback_inodes syncs inodes only on the given wbc->bdi, however 
> balance_dirty_pages checks against global counts of dirty pages. So if 
> there's nothing to sync on a given device, but there are other dirty pages 
> so that the counts are over the limit, it will loop without doing any 
> work.
> 
> To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
> something writes to the backing device, it flushes the dirty pages 
> generated by the loopback and the lockup is gone. If you add printk, don't 
> forget to stop klogd, otherwise logging would end the lockup.

erk.

> The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
> devices are flushed ... but the code probably needs some redesign (i.e. 
> either account per-device and flush per-device, or account-global and 
> flush-global).
> 
> Mikulas
> 
> 
> diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
> --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.0 
> +0200
> +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
> @@ -214,7 +214,6 @@
> 
>   for (;;) {
>   struct writeback_control wbc = {
> - .bdi= bdi,
>   .sync_mode  = WB_SYNC_NONE,
>   .older_than_this = NULL,
>   .nr_to_write= write_chunk,

Arguably we just have the wrong backing-device here, and what we should do
is to propagate the real backing device's pointer through up into the
filesystem.  There's machinery for this which things like DM stacks use.

I wonder if the post-2.6.23 changes happened to make this problem go away.



Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
Hi

I am experiencing a transient lockup in 'D' state with a loopback device. 
It happens when a process writes to a filesystem on loopback with a 
command like
dd if=/dev/zero of=/s/fill bs=4k 

CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
congestion_wait called from balance_dirty_pages.

After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
soon again.

I added a printk to balance_dirty_pages:
printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, 
pages_written %d, write_chunk %d\n", nr_reclaimable, 
global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, 
write_chunk);

and it shows this during the lockup:

wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
pages_written 1021, write_chunk 1522
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
pages_written 1021, write_chunk 1522
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
pages_written 1021, write_chunk 1522

What apparently happens:

writeback_inodes syncs inodes only on the given wbc->bdi, however 
balance_dirty_pages checks against global counts of dirty pages. So if 
there's nothing to sync on a given device, but there are other dirty pages 
so that the counts are over the limit, it will loop without doing any 
work.

To reproduce it, you need a totally idle machine (no GUI, etc.) --- if 
something writes to the backing device, it flushes the dirty pages 
generated by the loopback and the lockup is gone. If you add the printk, 
don't forget to stop klogd, otherwise the logging itself would end the 
lockup.

The hotfix (which I verified to work) is to not set wbc->bdi, so that all 
devices are flushed ... but the code probably needs some redesign (i.e. 
either account per-device and flush per-device, or account globally and 
flush globally).

Mikulas


diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
--- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.0 
+0200
+++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
@@ -214,7 +214,6 @@

for (;;) {
struct writeback_control wbc = {
-   .bdi= bdi,
.sync_mode  = WB_SYNC_NONE,
.older_than_this = NULL,
.nr_to_write= write_chunk,



Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
Hi

I am experiencing a transient lockup in 'D' state with loopback device. It 
happens when process writes to a filesystem in loopback with command like
dd if=/dev/zero of=/s/fill bs=4k 

CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
congestion_wait called from balance_dirty_pages.

After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
soon again.

I added a printk to the balance_dirty_pages
printk(wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, 
pages_written %d, write_chunk %d\n, nr_reclaimable, 
global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, 
write_chunk);

and it shows this during the lockup:

wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
pages_written 1021, write_chunk 1522
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
pages_written 1021, write_chunk 1522
wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
pages_written 1021, write_chunk 1522

What apparently happens:

writeback_inodes syncs inodes only on the given wbc-bdi, however 
balance_dirty_pages checks against global counts of dirty pages. So if 
there's nothing to sync on a given device, but there are other dirty pages 
so that the counts are over the limit, it will loop without doing any 
work.

To reproduce it, you need a totally idle machine (no GUI, etc.) -- if 
something writes to the backing device, it flushes the dirty pages 
generated by the loopback and the lockup ends. If you add the printk, don't 
forget to stop klogd, otherwise the logging itself would end the lockup.

The hotfix (which I verified to work) is to not set wbc->bdi, so that all 
devices are flushed ... but the code probably needs some redesign (i.e. 
either account per-device and flush per-device, or account globally and 
flush globally).

Mikulas


diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
--- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.0 +0200
+++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
@@ -214,7 +214,6 @@

for (;;) {
struct writeback_control wbc = {
-   .bdi= bdi,
.sync_mode  = WB_SYNC_NONE,
.older_than_this = NULL,
.nr_to_write= write_chunk,



Re: Temporary lockup on loopback block device

2007-11-10 Thread Andrew Morton
On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka [EMAIL PROTECTED] 
wrote:

 Hi
 
 I am experiencing a transient lockup in 'D' state with loopback device. It 
 happens when process writes to a filesystem in loopback with command like
 dd if=/dev/zero of=/s/fill bs=4k 
 
 CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
 congestion_wait called from balance_dirty_pages.
 
 After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
 soon again.
 
 I added a printk to balance_dirty_pages:
 printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, "
 "pages_written %d, write_chunk %d\n", nr_reclaimable,
 global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
 write_chunk);
 
 and it shows this during the lockup:
 
 wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
 pages_written 1021, write_chunk 1522
 wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
 pages_written 1021, write_chunk 1522
 wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
 pages_written 1021, write_chunk 1522
 
 What apparently happens:
 
 writeback_inodes syncs inodes only on the given wbc->bdi, however 
 balance_dirty_pages checks against global counts of dirty pages. So if 
 there's nothing to sync on a given device, but there are other dirty pages 
 so that the counts are over the limit, it will loop without doing any 
 work.
 
 To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
 something writes to the backing device, it flushes the dirty pages 
 generated by the loopback and the lockup is gone. If you add printk, don't 
 forget to stop klogd, otherwise logging would end the lockup.

erk.

 The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
 devices are flushed ... but the code probably needs some redesign (i.e. 
 either account per-device and flush per-device, or account-global and 
 flush-global).
 
 Mikulas
 
 
 diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
 --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.0 
 +0200
 +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
 @@ -214,7 +214,6 @@
 
   for (;;) {
   struct writeback_control wbc = {
 - .bdi= bdi,
   .sync_mode  = WB_SYNC_NONE,
   .older_than_this = NULL,
   .nr_to_write= write_chunk,

Arguably we just have the wrong backing-device here, and what we should do
is to propagate the real backing device's pointer up into the
filesystem.  There's machinery for this which things like DM stacks use.

I wonder if the post-2.6.23 changes happened to make this problem go away.



Re: Temporary lockup on loopback block device

2007-11-10 Thread Peter Zijlstra

On Sat, 2007-11-10 at 14:54 -0800, Andrew Morton wrote:
 On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka [EMAIL PROTECTED] 
 wrote:
 
  Hi
  
  I am experiencing a transient lockup in 'D' state with loopback device. It 
  happens when process writes to a filesystem in loopback with command like
  dd if=/dev/zero of=/s/fill bs=4k 
  
  CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
  congestion_wait called from balance_dirty_pages.
  
  After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
  soon again.
  
  I added a printk to balance_dirty_pages:
  printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, "
  "pages_written %d, write_chunk %d\n", nr_reclaimable,
  global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
  write_chunk);
  
  and it shows this during the lockup:
  
  wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
  pages_written 1021, write_chunk 1522
  wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
  pages_written 1021, write_chunk 1522
  wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
  pages_written 1021, write_chunk 1522
  
  What apparently happens:
  
  writeback_inodes syncs inodes only on the given wbc->bdi, however 
  balance_dirty_pages checks against global counts of dirty pages. So if 
  there's nothing to sync on a given device, but there are other dirty pages 
  so that the counts are over the limit, it will loop without doing any 
  work.
  
  To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
  something writes to the backing device, it flushes the dirty pages 
  generated by the loopback and the lockup is gone. If you add printk, don't 
  forget to stop klogd, otherwise logging would end the lockup.
 
 erk.

known issue.

  The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
  devices are flushed ... but the code probably needs some redesign (i.e. 
  either account per-device and flush per-device, or account-global and 
  flush-global).

.24 will have the per-device solution.

  
  diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
  --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 
  18:43:44.0 +0200
  +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
  @@ -214,7 +214,6 @@
  
  for (;;) {
  struct writeback_control wbc = {
  -   .bdi= bdi,
  .sync_mode  = WB_SYNC_NONE,
  .older_than_this = NULL,
  .nr_to_write= write_chunk,
 
 Arguably we just have the wrong backing-device here, and what we should do
 is to propagate the real backing device's pointer through up into the
 filesystem.  There's machinery for this which things like DM stacks use.
 
 I wonder if the post-2.6.23 changes happened to make this problem go away.

The per-BDI dirty stuff in .24 should make this work; I just checked, and
loopback devices seem to have their own BDI, so all should be well.




Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
  Arguably we just have the wrong backing-device here, and what we should do
  is to propagate the real backing device's pointer through up into the
  filesystem.  There's machinery for this which things like DM stacks use.
  
  I wonder if the post-2.6.23 changes happened to make this problem go away.
 
 The per BDI dirty stuff in 24 should make this work, I just checked and
 loopback thingies seem to have their own BDI, so all should be well.

This is not only about loopback (I think the lockup can happen even 
without loopback) --- the main problem is:

Why are there over-limit dirty pages that no one is writing?

Mikulas


Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka


On Sat, 10 Nov 2007, Andrew Morton wrote:

 On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka [EMAIL PROTECTED] 
 wrote:
 
  Hi
  
  I am experiencing a transient lockup in 'D' state with loopback device. It 
  happens when process writes to a filesystem in loopback with command like
  dd if=/dev/zero of=/s/fill bs=4k 
  
  CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
  congestion_wait called from balance_dirty_pages.
  
  After about 30 seconds, the lockup is gone and dd resumes, but it locks up 
  soon again.
  
   I added a printk to balance_dirty_pages:
   printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, "
   "pages_written %d, write_chunk %d\n", nr_reclaimable,
   global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
   write_chunk);
  
  and it shows this during the lockup:
  
  wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
  pages_written 1021, write_chunk 1522
  wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
  pages_written 1021, write_chunk 1522
  wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
  pages_written 1021, write_chunk 1522
  
  What apparently happens:
  
  writeback_inodes syncs inodes only on the given wbc->bdi, however 
  balance_dirty_pages checks against global counts of dirty pages. So if 
  there's nothing to sync on a given device, but there are other dirty pages 
  so that the counts are over the limit, it will loop without doing any 
  work.
  
  To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
  something writes to the backing device, it flushes the dirty pages 
  generated by the loopback and the lockup is gone. If you add printk, don't 
  forget to stop klogd, otherwise logging would end the lockup.
 
 erk.
 
  The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
  devices are flushed ... but the code probably needs some redesign (i.e. 
  either account per-device and flush per-device, or account-global and 
  flush-global).
  
  Mikulas
  
  
  diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
  --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 
  18:43:44.0 +0200
  +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
  @@ -214,7 +214,6 @@
  
  for (;;) {
  struct writeback_control wbc = {
  -   .bdi= bdi,
  .sync_mode  = WB_SYNC_NONE,
  .older_than_this = NULL,
  .nr_to_write= write_chunk,
 
 Arguably we just have the wrong backing-device here, and what we should do
 is to propagate the real backing device's pointer through up into the
 filesystem.  There's machinery for this which things like DM stacks use.

If you change the loopback backing-device, you just turn this nicely 
reproducible example into a subtle race condition that can happen whether 
you use loopback or not. Think about what happens when different processes 
dirty memory:

You have process A that dirtied a lot of pages on device 1 but has not 
started writing them.
You have process B that is trying to write to device 2 and sees the dirty 
page count over the limit, but can't do anything about it, because it is 
only allowed to flush pages on device 2 --- so it loops endlessly.

If you want to keep the current flushing semantics, you have to audit 
the whole kernel to make sure that whenever some process sees an over-limit 
dirty page count, there is another process flushing the pages. Currently 
that is not true: the dd process sees the over-limit count, but no one 
is writing.

 I wonder if the post-2.6.23 changes happened to make this problem go away.

I will try 2.6.24-rc2, but I don't think the root cause of this went away. 
Maybe you just reduced the probability.

Mikulas


Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
On Sun, 11 Nov 2007, Mikulas Patocka wrote:

 On Sat, 10 Nov 2007, Andrew Morton wrote:
 
  On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka [EMAIL 
  PROTECTED] wrote:
  
   Hi
   
   I am experiencing a transient lockup in 'D' state with loopback device. 
   It 
   happens when process writes to a filesystem in loopback with command like
   dd if=/dev/zero of=/s/fill bs=4k 
   
   CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in 
   congestion_wait called from balance_dirty_pages.
   
   After about 30 seconds, the lockup is gone and dd resumes, but it locks 
   up 
   soon again.
   
    I added a printk to balance_dirty_pages:
    printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, "
    "pages_written %d, write_chunk %d\n", nr_reclaimable,
    global_page_state(NR_WRITEBACK), dirty_thresh, pages_written,
    write_chunk);
   
   and it shows this during the lockup:
   
   wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
   pages_written 1021, write_chunk 1522
   wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
   pages_written 1021, write_chunk 1522
   wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, 
   pages_written 1021, write_chunk 1522
   
   What apparently happens:
   
   writeback_inodes syncs inodes only on the given wbc->bdi, however 
   balance_dirty_pages checks against global counts of dirty pages. So if 
   there's nothing to sync on a given device, but there are other dirty 
   pages 
   so that the counts are over the limit, it will loop without doing any 
   work.
   
   To reproduce it, you need totally idle machine (no GUI, etc.) -- if 
   something writes to the backing device, it flushes the dirty pages 
   generated by the loopback and the lockup is gone. If you add printk, 
   don't 
   forget to stop klogd, otherwise logging would end the lockup.
  
  erk.
  
   The hotfix (that I verified to work) is to not set wbc->bdi, so that all 
   devices are flushed ... but the code probably needs some redesign (i.e. 
   either account per-device and flush per-device, or account-global and 
   flush-global).
   
   Mikulas
   
   
   diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c
   --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 
   18:43:44.0 +0200
   +++ mm/page-writeback.c 2007-11-10 20:32:43.0 +0100
   @@ -214,7 +214,6 @@
   
 for (;;) {
 struct writeback_control wbc = {
   - .bdi= bdi,
 .sync_mode  = WB_SYNC_NONE,
 .older_than_this = NULL,
 .nr_to_write= write_chunk,
  
  Arguably we just have the wrong backing-device here, and what we should do
  is to propagate the real backing device's pointer through up into the
  filesystem.  There's machinery for this which things like DM stacks use.
 
 If you change loopback backing-device, you just turn this nicely 
 reproducible example into a subtle race condition that can happen whenever 
 you use loopback or not. Think, what happens when different process 
 dirties memory:
 
 You have process A that dirtied a lot of pages on device 1 but has not 
 started writing them.
 You have process B that is trying to write to device 2, sees dirty 
 page count over limit, but can't do anything about it, because it is only 
 allowed to flush pages on device 2. --- so it endlessly loops.
 
 If you want to use the current flushing semantics, you just have to audit 
 the whole kernel to make sure that if some process sees over-limit dirty 
 page count, there is another process that is flushing the pages. Currently 
 it is not true, the dd process sees over-limit count, but there is 
 no-one writing.
 
  I wonder if the post-2.6.23 changes happened to make this problem go away.
 
 I will try 2.6.24-rc2, but I don't think the root cause of this went away. 
 Maybe you just reduced probability.
 
 Mikulas

So I compiled it, and I don't see any more lockups. The writeback loop 
no longer depends on any global page count, so the above scenario can't 
happen here. Good.

Mikulas


Re: Temporary lockup on loopback block device

2007-11-10 Thread Mikulas Patocka
   Arguably we just have the wrong backing-device here, and what we 
   should do is to propagate the real backing device's pointer through 
   up into the filesystem.  There's machinery for this which things 
   like DM stacks use.

Just thinking about the new implementation --- you shouldn't really 
propagate the physical block device's backing_device into the loopback device.

If you leave it as is (each loop device has its own backing store), you 
can nicely avoid the long-standing loopback deadlock that comes from the fact 
that flushing one page on a loopback device can generate several more dirty 
pages on the underlying filesystem.

If you let the loopback device and the physical device share the same backing 
store, then it can go wild creating more and more dirty pages up to 
memory exhaustion. If you give them different backing stores, that can't 
happen --- loopback flushing will just wait until the pages on the 
filesystem are written.

Mikulas

 So I compiled it and I don't see any more lock-ups. The writeback loop 
 doesn't depend on any global page count, so the above scenario can't 
 happen here. Good.
 
 Mikulas
 


Re: Temporary lockup on loopback block device

2007-11-10 Thread Miklos Szeredi
   Arguably we just have the wrong backing-device here, and what we should do
   is to propagate the real backing device's pointer through up into the
   filesystem.  There's machinery for this which things like DM stacks use.
   
   I wonder if the post-2.6.23 changes happened to make this problem go away.
  
  The per BDI dirty stuff in 24 should make this work, I just checked and
  loopback thingies seem to have their own BDI, so all should be well.
 
 This is not only about loopback (I think the lockup can happen even 
 without loopback) --- the main problem is:
 
 Why are there over-limit dirty pages that no one is writing?

Please do a sysrq-t, and cat /proc/vmstat during the hang.  Those
will show us exactly what is happening.

I've seen this type of hang many times, and I agree with Peter that
it's probably about loopback and is fixed in 2.6.24-rc.

Thanks,
Miklos


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/