Re: 100% iowait on one of cpus in current -git

2007-11-01 Thread Torsten Kaiser
On 11/1/07, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On 11/1/07, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > Thank you. Maybe we can start by the applied debug patch :-)
>
> Will applied it and try to recreate this.

Patch applied, used emerge to install a 2.6.24-rc1 kernel.

I had no complete stalls, but three times during the move from tmpfs
to the main xfs the emerge got noticeable slower. There still was
writeout happening, but as emerge prints out every file it has written
during the pause not one file was processed.

vmstat 10:
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  1  0 3146424332 61476800   134  1849  438 2515  3  4 91  2
 0  0  0 3146644332 61478400 2  1628  507  646  0  2 85 13
 0  0  0 3146868332 61486800 5  2359  527 1076  0  3 97  0
 1  0  0 3144372332 6161480096  2829  607 2666  2  5 92  0
-> normal writeout
 0  0  0 3140560332 61814400   152  2764  633 3308  3  6 91  0
 0  0  0 3137332332 61990800   114  1801  588 2858  3  4 93  0
 0  0  0 3136912332 6201360020   827  393 1605  1  2 98  0
-> first stall
 0  0  0 3137088332 62013600 0   557  339 1437  0  1 99  0
 0  0  0 3137160332 62013600 0   642  310 1400  0  1 99  0
 0  0  0 3136588332 62017200 6  2972  527 1195  0  3 80 16
 0  0  0 3136276332 6203480010  2668  558 1195  0  3 96  0
 0  0  0 3135228332 62042400 8  2712  522 1311  0  4 96  0
 0  0  0 3131740332 6215240075  2935  559 2457  2  5 93  0
 0  0  0 3128348332 6229720085  1470  490 2607  3  4 93  0
 0  0  0 3129292332 62297200 0   527  353 1398  0  1 99  0
-> second longer stall
 0  0  0 3128520332 62302800 6   488  249 1390  0  1 99  0
 0  0  0 3128236332 62302800 0   482  222 1222  0  1 99  0
 0  0  0 3128408332 62302800 0   585  269 1301  0  0 99  0
 0  0  0 3128532332 62302800 0   610  262 1278  0  0 99  0
 0  0  0 3128568332 62302800 0   636  345 1639  0  1 99  0
 0  0  0 3129032332 62304000 1   664  337 1466  0  1 99  0
 0  0  0 3129484332 62304000 0   658  300 1508  0  0 100  0
 0  0  0 3129576332 62304000 0   562  271 1454  0  1 99  0
 0  0  0 3129736332 62304000 0   627  278 1406  0  1 99  0
 0  0  0 3129368332 62304000 0   507  274 1301  0  1 99  0
 0  0  0 3129004332 62304000 0   444  211 1213  0  0 99  0
 0  1  0 3127260332 62304000 0  1036  305 1242  0  1 95  4
 0  0  0 3126280332 62312800 7  4241  555 1575  1  5 84 10
 0  0  0 3124948332 62323200 6  4194  529 1505  1  4 95  0
 0  0  0 3125228332 6241680058  1966  586 1964  2  4 94  0
-> emerge resumed to normal speed, without any intervention from my side
 0  0  0 3120932332 62590400   112  1546  546 2565  3  4 93  0
 0  0  0 3118012332 62756800   128  1542  612 2705  3  4 93  0


>From syslog:
first stall:
[  575.05] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 47259
global 610 0 0 wc __ tw 1023 sk 0
[  586.35] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 50465
global 6117 0 0 wc _M tw 967 sk 0
[  586.36] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 50408
global 6117 0 0 wc __ tw 1022 sk 0
[  599.90] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 53523
global 11141 0 0 wc __ tw 1009 sk 0
[  635.78] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 59397
global 12757 124 0 wc __ tw 0 sk 0
[  638.47] mm/page-writeback.c 418 balance_dirty_pages:
emerge(6113) 1536 global 11405 51 0 wc __ tw 0 sk 0
[  638.82] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 58373
global 11276 48 0 wc __ tw -1 sk 0
[  641.26] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 57348
global 10565 100 0 wc __ tw 0 sk 0
[  643.98] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 56324
global 9788 103 0 wc __ tw -1 sk 0
[  646.12] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 55299
global 8912 6 0 wc __ tw 0 sk 0

second stall:
[  664.04] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 48117
global 2864 81 0 wc _M tw -13 sk 0
[  664.40] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 47080
global 1995 137 0 wc _M tw 176 sk 0
[  664.51] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 46232
global 1929 267 0 wc __ tw 880 sk 0
cron[6927]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
[  809.56] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 49422
global 19166 217 0 wc _M tw 380 sk 0
[  811.72] mm/page-writeback.c 655 

Re: 100% iowait on one of cpus in current -git

2007-11-01 Thread Torsten Kaiser
On 11/1/07, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 31, 2007 at 04:22:10PM +0100, Torsten Kaiser wrote:
> > Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
> > Each time I noticed this I was using emerge (package util from the
> > gentoo distribution) to install/upgrade a package. The last step,
> > where this hang occurred, is moving the prepared files from a tmpfs
> > partion to the main xfs filesystem.
> > The hangs where not fatal, after a few second everything resumed
> > normal, so I was not able to capture a good image of what was
> > happening.
>
> Thank you for the detailed report.
>
> How severe was the hangs? Only writeouts stalled, all apps stalled, or
> cannot type and run new commands?

Only writeout stalled. The emerge that was moving the files hung, but
everything else worked normaly.
I was able to run new commands, like coping the /proc/meminfo.

[snip]
> > After this SysRq+W writeback resumed again. Possible that writing
> > above into the syslog triggered that.
>
> Maybe. Are the log files on another disk/partition?

No, everything was going to /

What might be interesting is, that doing cat /proc/meminfo
>~/stall/meminfo did not resume the writeback. So there might some
threshold that only was broken with the additional write from
syslog-ng. Or syslog-ng does some flushing, I dont now. (I'm using the
syslog-ng package from gentoo:
http://www.balabit.com/products/syslog_ng/ , version 2.0.5)

> > The source tmpfs is mounted with any special parameters, but the
> > target xfs filesystem resides on a dm-crypt device that is on top a 3
> > disk RAID5 md.
> > During the hang all CPUs where idle.
>
> No iowaits? ;-)

No, I have a KSysGuard in my taskbar that showed no activity at all.

OK, the subject does not match for my case, but there was also a tmpfs
involved. And I found no thread with stalls on xfs. :-)

> > The system is x86_64 with CONFIG_NO_HZ=y, but was still receiving ~330
> > interrupts per second because of the bttv driver. (But I was not using
> > that device at this time.)
> >
> > I'm willing to test patches or more provide more information, but lack
> > a good testcase to trigger this on demand.
>
> Thank you. Maybe we can start by the applied debug patch :-)

Will applied it and try to recreate this.

Thanks for looking into it.

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-11-01 Thread Fengguang Wu
On Wed, Oct 31, 2007 at 04:22:10PM +0100, Torsten Kaiser wrote:
> On 10/22/07, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote:
> > > Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?
> >
> > Bingo! It can be reproduced in -mm on reiserfs:
> >
> > # mkfs.reiserfs /dev/sdb1
> > # mount /dev/sdb1 /test
> > # cp bin /test
> > 
> > # dmesg
> > [...]
> > [  418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1)
> > [  418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1)
> > [  418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1)
> 
> Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
> Each time I noticed this I was using emerge (package util from the
> gentoo distribution) to install/upgrade a package. The last step,
> where this hang occurred, is moving the prepared files from a tmpfs
> partion to the main xfs filesystem.
> The hangs where not fatal, after a few second everything resumed
> normal, so I was not able to capture a good image of what was
> happening.

Thank you for the detailed report.

How severe was the hangs? Only writeouts stalled, all apps stalled, or
cannot type and run new commands?

> Today it happend again, but a little more obvious. During the moving
> process the writeout stalled completly for several minutes until I hit
> SysRq+W.
> 
> /proc/meminfo:
> MemTotal:  4061808 kB
> MemFree:881332 kB
> Buffers: 0 kB
> Cached:2566628 kB
> SwapCached: 64 kB
> Active: 926612 kB
> Inactive:  1959136 kB
> SwapTotal: 9775416 kB
> SwapFree:  9775296 kB
> Dirty:   44948 kB
> Writeback:   0 kB
> AnonPages:  319068 kB
> Mapped:  52844 kB
> Slab:   235572 kB
> SReclaimable:   164408 kB
> SUnreclaim:  71164 kB
> PageTables:   9576 kB
> NFS_Unstable:0 kB
> Bounce:  0 kB
> CommitLimit:  11806320 kB
> Committed_AS:   544520 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 35004 kB
> VmallocChunk: 34359702447 kB
> HugePages_Total: 0
> HugePages_Free:  0
> HugePages_Rsvd:  0
> HugePages_Surp:  0
> Hugepagesize: 2048 kB
> 
> The 'Dirty' count did not decrease during this time and 'Writeback' stayed at 
> 0.
> I also have /proc/pagetypeinfo ,but I see nothing interessing in
> there. (But will send it, if needed)
> 
> The output from SysRq+W:
> SysRq : Show Blocked State
>   taskPC stack   pid father
> pdflush   D 81001fcc2a88 0   285  2
>  810005d55580 0046 0800 0071
>  0400 8022d61c 80817b00 80817b00
>  80813f40 80817b00 810100893b18 
> Call Trace:
>  [] task_rq_lock+0x4c/0x90
>  [] __wake_up_common+0x5a/0x90
>  [] __down+0xa7/0x11e
>  [] default_wake_function+0x0/0x10
>  [] __down_failed+0x35/0x3a
>  [] xfs_buf_lock+0x3e/0x40
>  [] _xfs_buf_find+0x13e/0x240
>  [] xfs_buf_get_flags+0x6f/0x190
>  [] xfs_buf_read_flags+0x12/0xa0
>  [] xfs_trans_read_buf+0x64/0x340
>  [] xfs_itobp+0x81/0x1e0
>  [] xfs_buf_rele+0x2e/0xd0
>  [] xfs_iflush+0xfe/0x520
>  [] __down_read_trylock+0x42/0x60
>  [] xfs_inode_item_push+0x12/0x20
>  [] xfs_trans_push_ail+0x267/0x2b0
>  [] xlog_ticket_get+0xfb/0x140
>  [] xfs_log_reserve+0xee/0x120
>  [] xfs_trans_reserve+0xa8/0x210
>  [] xfs_iomap_write_allocate+0xfa/0x410
>  [] __split_bio+0x38a/0x3c0
>  [] xfs_start_page_writeback+0x27/0x60
>  [] xfs_iomap+0x26c/0x310
>  [] xfs_map_blocks+0x38/0x90
>  [] xfs_page_state_convert+0x2b8/0x630
>  [] xfs_vm_writepage+0x6f/0x120
>  [] __writepage+0xa/0x30
>  [] write_cache_pages+0x23e/0x330
>  [] __writepage+0x0/0x30
>  [] xfs_iflush+0x3b7/0x520
>  [] _xfs_buf_ioapply+0x222/0x320
>  [] __up_read+0x21/0xb0
>  [] xfs_iunlock+0x5c/0xc0
>  [] do_writepages+0x20/0x40
>  [] __writeback_single_inode+0xb0/0x380
>  [] dm_table_any_congested+0x2e/0x80
>  [] generic_sync_sb_inodes+0x20d/0x330
>  [] writeback_inodes+0xa2/0xe0
>  [] wb_kupdate+0xa6/0x120
>  [] pdflush+0x0/0x1e0
>  [] pdflush+0x110/0x1e0
>  [] wb_kupdate+0x0/0x120
>  [] kthread+0x4b/0x80
>  [] child_rip+0xa/0x12
>  [] kthread+0x0/0x80
>  [] child_rip+0x0/0x12
> 
> emergeD 81001fcc2a88 0  3221   8163
>  81008c0679f8 0086 81008c067988 8024a719
>  8100060fb008 8022c8ea 80817b00 80817b00
>  80813f40 80817b00 810100893b18 
> Call Trace:
>  [] autoremove_wake_function+0x9/0x30
>  [] __wake_up_common+0x5a/0x90
>  [] __wake_up_common+0x5a/0x90
>  [] __down+0xa7/0x11e
>  [] default_wake_function+0x0/0x10
>  [] __down_failed+0x35/0x3a
>  [] xfs_buf_lock+0x3e/0x40
>  [] _xfs_buf_find+0x13e/0x240
>  [] xfs_buf_get_flags+0x6f/0x190
>  [] xfs_buf_read_flags+0x12/0xa0
>  [] xfs_trans_read_buf+0x64/0x340
>  [] xfs_itobp+0x81/0x1e0
>  [] xfs_buf_rele+0x2e/0xd0
>  [] xfs_iflush+0xfe/0x520

Re: 100% iowait on one of cpus in current -git

2007-11-01 Thread Fengguang Wu
On Wed, Oct 31, 2007 at 04:22:10PM +0100, Torsten Kaiser wrote:
 On 10/22/07, Fengguang Wu [EMAIL PROTECTED] wrote:
  On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote:
   Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?
 
  Bingo! It can be reproduced in -mm on reiserfs:
 
  # mkfs.reiserfs /dev/sdb1
  # mount /dev/sdb1 /test
  # cp bin /test
  wait for a while
  # dmesg
  [...]
  [  418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1)
  [  418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1)
  [  418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1)
 
 Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
 Each time I noticed this I was using emerge (package util from the
 gentoo distribution) to install/upgrade a package. The last step,
 where this hang occurred, is moving the prepared files from a tmpfs
 partion to the main xfs filesystem.
 The hangs where not fatal, after a few second everything resumed
 normal, so I was not able to capture a good image of what was
 happening.

Thank you for the detailed report.

How severe was the hangs? Only writeouts stalled, all apps stalled, or
cannot type and run new commands?

 Today it happend again, but a little more obvious. During the moving
 process the writeout stalled completly for several minutes until I hit
 SysRq+W.
 
 /proc/meminfo:
 MemTotal:  4061808 kB
 MemFree:881332 kB
 Buffers: 0 kB
 Cached:2566628 kB
 SwapCached: 64 kB
 Active: 926612 kB
 Inactive:  1959136 kB
 SwapTotal: 9775416 kB
 SwapFree:  9775296 kB
 Dirty:   44948 kB
 Writeback:   0 kB
 AnonPages:  319068 kB
 Mapped:  52844 kB
 Slab:   235572 kB
 SReclaimable:   164408 kB
 SUnreclaim:  71164 kB
 PageTables:   9576 kB
 NFS_Unstable:0 kB
 Bounce:  0 kB
 CommitLimit:  11806320 kB
 Committed_AS:   544520 kB
 VmallocTotal: 34359738367 kB
 VmallocUsed: 35004 kB
 VmallocChunk: 34359702447 kB
 HugePages_Total: 0
 HugePages_Free:  0
 HugePages_Rsvd:  0
 HugePages_Surp:  0
 Hugepagesize: 2048 kB
 
 The 'Dirty' count did not decrease during this time and 'Writeback' stayed at 
 0.
 I also have /proc/pagetypeinfo ,but I see nothing interessing in
 there. (But will send it, if needed)
 
 The output from SysRq+W:
 SysRq : Show Blocked State
   taskPC stack   pid father
 pdflush   D 81001fcc2a88 0   285  2
  810005d55580 0046 0800 0071
  0400 8022d61c 80817b00 80817b00
  80813f40 80817b00 810100893b18 
 Call Trace:
  [8022d61c] task_rq_lock+0x4c/0x90
  [8022c8ea] __wake_up_common+0x5a/0x90
  [805b14c7] __down+0xa7/0x11e
  [8022da70] default_wake_function+0x0/0x10
  [805b1145] __down_failed+0x35/0x3a
  [803750be] xfs_buf_lock+0x3e/0x40
  [803771fe] _xfs_buf_find+0x13e/0x240
  [8037736f] xfs_buf_get_flags+0x6f/0x190
  [803774a2] xfs_buf_read_flags+0x12/0xa0
  [80368614] xfs_trans_read_buf+0x64/0x340
  [80352151] xfs_itobp+0x81/0x1e0
  [803759de] xfs_buf_rele+0x2e/0xd0
  [80354afe] xfs_iflush+0xfe/0x520
  [803ae3b2] __down_read_trylock+0x42/0x60
  [80355a72] xfs_inode_item_push+0x12/0x20
  [80368037] xfs_trans_push_ail+0x267/0x2b0
  [8035a33b] xlog_ticket_get+0xfb/0x140
  [8035c5ae] xfs_log_reserve+0xee/0x120
  [803669e8] xfs_trans_reserve+0xa8/0x210
  [8035703a] xfs_iomap_write_allocate+0xfa/0x410
  [804ce67a] __split_bio+0x38a/0x3c0
  [80373657] xfs_start_page_writeback+0x27/0x60
  [8035660c] xfs_iomap+0x26c/0x310
  [803735d8] xfs_map_blocks+0x38/0x90
  [80374a88] xfs_page_state_convert+0x2b8/0x630
  [80374f5f] xfs_vm_writepage+0x6f/0x120
  [8026acda] __writepage+0xa/0x30
  [8026b2ce] write_cache_pages+0x23e/0x330
  [8026acd0] __writepage+0x0/0x30
  [80354db7] xfs_iflush+0x3b7/0x520
  [80375782] _xfs_buf_ioapply+0x222/0x320
  [803ae451] __up_read+0x21/0xb0
  [8034f22c] xfs_iunlock+0x5c/0xc0
  [8026b410] do_writepages+0x20/0x40
  [802b36a0] __writeback_single_inode+0xb0/0x380
  [804d052e] dm_table_any_congested+0x2e/0x80
  [802b3d8d] generic_sync_sb_inodes+0x20d/0x330
  [802b4322] writeback_inodes+0xa2/0xe0
  [8026bde6] wb_kupdate+0xa6/0x120
  [8026c2a0] pdflush+0x0/0x1e0
  [8026c3b0] pdflush+0x110/0x1e0
  [8026bd40] wb_kupdate+0x0/0x120
  [8024a32b] kthread+0x4b/0x80
  [8020c9d8] child_rip+0xa/0x12
  [8024a2e0] kthread+0x0/0x80
  [8020c9ce] child_rip+0x0/0x12
 
 emergeD 81001fcc2a88 0  3221   8163
  81008c0679f8 0086 81008c067988 

Re: 100% iowait on one of cpus in current -git

2007-11-01 Thread Torsten Kaiser
On 11/1/07, Fengguang Wu [EMAIL PROTECTED] wrote:
 On Wed, Oct 31, 2007 at 04:22:10PM +0100, Torsten Kaiser wrote:
  Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
  Each time I noticed this I was using emerge (package util from the
  gentoo distribution) to install/upgrade a package. The last step,
  where this hang occurred, is moving the prepared files from a tmpfs
  partion to the main xfs filesystem.
  The hangs where not fatal, after a few second everything resumed
  normal, so I was not able to capture a good image of what was
  happening.

 Thank you for the detailed report.

 How severe was the hangs? Only writeouts stalled, all apps stalled, or
 cannot type and run new commands?

Only writeout stalled. The emerge that was moving the files hung, but
everything else worked normaly.
I was able to run new commands, like coping the /proc/meminfo.

[snip]
  After this SysRq+W writeback resumed again. Possible that writing
  above into the syslog triggered that.

 Maybe. Are the log files on another disk/partition?

No, everything was going to /

What might be interesting is, that doing cat /proc/meminfo
~/stall/meminfo did not resume the writeback. So there might some
threshold that only was broken with the additional write from
syslog-ng. Or syslog-ng does some flushing, I dont now. (I'm using the
syslog-ng package from gentoo:
http://www.balabit.com/products/syslog_ng/ , version 2.0.5)

  The source tmpfs is mounted with any special parameters, but the
  target xfs filesystem resides on a dm-crypt device that is on top a 3
  disk RAID5 md.
  During the hang all CPUs where idle.

 No iowaits? ;-)

No, I have a KSysGuard in my taskbar that showed no activity at all.

OK, the subject does not match for my case, but there was also a tmpfs
involved. And I found no thread with stalls on xfs. :-)

  The system is x86_64 with CONFIG_NO_HZ=y, but was still receiving ~330
  interrupts per second because of the bttv driver. (But I was not using
  that device at this time.)
 
  I'm willing to test patches or more provide more information, but lack
  a good testcase to trigger this on demand.

 Thank you. Maybe we can start by the applied debug patch :-)

Will applied it and try to recreate this.

Thanks for looking into it.

Torsten
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-11-01 Thread Torsten Kaiser
On 11/1/07, Torsten Kaiser [EMAIL PROTECTED] wrote:
 On 11/1/07, Fengguang Wu [EMAIL PROTECTED] wrote:
  Thank you. Maybe we can start by the applied debug patch :-)

 Will applied it and try to recreate this.

Patch applied, used emerge to install a 2.6.24-rc1 kernel.

I had no complete stalls, but three times during the move from tmpfs
to the main xfs the emerge got noticeable slower. There still was
writeout happening, but as emerge prints out every file it has written
during the pause not one file was processed.

vmstat 10:
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  1  0 3146424332 61476800   134  1849  438 2515  3  4 91  2
 0  0  0 3146644332 61478400 2  1628  507  646  0  2 85 13
 0  0  0 3146868332 61486800 5  2359  527 1076  0  3 97  0
 1  0  0 3144372332 6161480096  2829  607 2666  2  5 92  0
- normal writeout
 0  0  0 3140560332 61814400   152  2764  633 3308  3  6 91  0
 0  0  0 3137332332 61990800   114  1801  588 2858  3  4 93  0
 0  0  0 3136912332 6201360020   827  393 1605  1  2 98  0
- first stall
 0  0  0 3137088332 62013600 0   557  339 1437  0  1 99  0
 0  0  0 3137160332 62013600 0   642  310 1400  0  1 99  0
 0  0  0 3136588332 62017200 6  2972  527 1195  0  3 80 16
 0  0  0 3136276332 6203480010  2668  558 1195  0  3 96  0
 0  0  0 3135228332 62042400 8  2712  522 1311  0  4 96  0
 0  0  0 3131740332 6215240075  2935  559 2457  2  5 93  0
 0  0  0 3128348332 6229720085  1470  490 2607  3  4 93  0
 0  0  0 3129292332 62297200 0   527  353 1398  0  1 99  0
- second longer stall
 0  0  0 3128520332 62302800 6   488  249 1390  0  1 99  0
 0  0  0 3128236332 62302800 0   482  222 1222  0  1 99  0
 0  0  0 3128408332 62302800 0   585  269 1301  0  0 99  0
 0  0  0 3128532332 62302800 0   610  262 1278  0  0 99  0
 0  0  0 3128568332 62302800 0   636  345 1639  0  1 99  0
 0  0  0 3129032332 62304000 1   664  337 1466  0  1 99  0
 0  0  0 3129484332 62304000 0   658  300 1508  0  0 100  0
 0  0  0 3129576332 62304000 0   562  271 1454  0  1 99  0
 0  0  0 3129736332 62304000 0   627  278 1406  0  1 99  0
 0  0  0 3129368332 62304000 0   507  274 1301  0  1 99  0
 0  0  0 3129004332 62304000 0   444  211 1213  0  0 99  0
 0  1  0 3127260332 62304000 0  1036  305 1242  0  1 95  4
 0  0  0 3126280332 62312800 7  4241  555 1575  1  5 84 10
 0  0  0 3124948332 62323200 6  4194  529 1505  1  4 95  0
 0  0  0 3125228332 6241680058  1966  586 1964  2  4 94  0
- emerge resumed to normal speed, without any intervention from my side
 0  0  0 3120932332 62590400   112  1546  546 2565  3  4 93  0
 0  0  0 3118012332 62756800   128  1542  612 2705  3  4 93  0


From syslog:
first stall:
[  575.05] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 47259
global 610 0 0 wc __ tw 1023 sk 0
[  586.35] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 50465
global 6117 0 0 wc _M tw 967 sk 0
[  586.36] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 50408
global 6117 0 0 wc __ tw 1022 sk 0
[  599.90] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 53523
global 11141 0 0 wc __ tw 1009 sk 0
[  635.78] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 59397
global 12757 124 0 wc __ tw 0 sk 0
[  638.47] mm/page-writeback.c 418 balance_dirty_pages:
emerge(6113) 1536 global 11405 51 0 wc __ tw 0 sk 0
[  638.82] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 58373
global 11276 48 0 wc __ tw -1 sk 0
[  641.26] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 57348
global 10565 100 0 wc __ tw 0 sk 0
[  643.98] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 56324
global 9788 103 0 wc __ tw -1 sk 0
[  646.12] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 55299
global 8912 6 0 wc __ tw 0 sk 0

second stall:
[  664.04] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 48117
global 2864 81 0 wc _M tw -13 sk 0
[  664.40] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 47080
global 1995 137 0 wc _M tw 176 sk 0
[  664.51] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 46232
global 1929 267 0 wc __ tw 880 sk 0
cron[6927]: (root) CMD (test -x /usr/sbin/run-crons  /usr/sbin/run-crons )
[  809.56] mm/page-writeback.c 655 wb_kupdate: pdflush(285) 49422
global 19166 217 0 wc _M tw 380 sk 0
[  811.72] mm/page-writeback.c 655 wb_kupdate: 

Re: 100% iowait on one of cpus in current -git

2007-10-31 Thread Torsten Kaiser
On 10/22/07, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote:
> > Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?
>
> Bingo! It can be reproduced in -mm on reiserfs:
>
> # mkfs.reiserfs /dev/sdb1
> # mount /dev/sdb1 /test
> # cp bin /test
> 
> # dmesg
> [...]
> [  418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1)
> [  418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1)
> [  418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1)

Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
Each time I noticed this I was using emerge (package util from the
gentoo distribution) to install/upgrade a package. The last step,
where this hang occurred, is moving the prepared files from a tmpfs
partion to the main xfs filesystem.
The hangs where not fatal, after a few second everything resumed
normal, so I was not able to capture a good image of what was
happening.

Today it happend again, but a little more obvious. During the moving
process the writeout stalled completly for several minutes until I hit
SysRq+W.

/proc/meminfo:
MemTotal:  4061808 kB
MemFree:881332 kB
Buffers: 0 kB
Cached:2566628 kB
SwapCached: 64 kB
Active: 926612 kB
Inactive:  1959136 kB
SwapTotal: 9775416 kB
SwapFree:  9775296 kB
Dirty:   44948 kB
Writeback:   0 kB
AnonPages:  319068 kB
Mapped:  52844 kB
Slab:   235572 kB
SReclaimable:   164408 kB
SUnreclaim:  71164 kB
PageTables:   9576 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:  11806320 kB
Committed_AS:   544520 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 35004 kB
VmallocChunk: 34359702447 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
HugePages_Surp:  0
Hugepagesize: 2048 kB

The 'Dirty' count did not decrease during this time and 'Writeback' stayed at 0.
I also have /proc/pagetypeinfo ,but I see nothing interessing in
there. (But will send it, if needed)

The output from SysRq+W:
SysRq : Show Blocked State
  taskPC stack   pid father
pdflush   D 81001fcc2a88 0   285  2
 810005d55580 0046 0800 0071
 0400 8022d61c 80817b00 80817b00
 80813f40 80817b00 810100893b18 
Call Trace:
 [] task_rq_lock+0x4c/0x90
 [] __wake_up_common+0x5a/0x90
 [] __down+0xa7/0x11e
 [] default_wake_function+0x0/0x10
 [] __down_failed+0x35/0x3a
 [] xfs_buf_lock+0x3e/0x40
 [] _xfs_buf_find+0x13e/0x240
 [] xfs_buf_get_flags+0x6f/0x190
 [] xfs_buf_read_flags+0x12/0xa0
 [] xfs_trans_read_buf+0x64/0x340
 [] xfs_itobp+0x81/0x1e0
 [] xfs_buf_rele+0x2e/0xd0
 [] xfs_iflush+0xfe/0x520
 [] __down_read_trylock+0x42/0x60
 [] xfs_inode_item_push+0x12/0x20
 [] xfs_trans_push_ail+0x267/0x2b0
 [] xlog_ticket_get+0xfb/0x140
 [] xfs_log_reserve+0xee/0x120
 [] xfs_trans_reserve+0xa8/0x210
 [] xfs_iomap_write_allocate+0xfa/0x410
 [] __split_bio+0x38a/0x3c0
 [] xfs_start_page_writeback+0x27/0x60
 [] xfs_iomap+0x26c/0x310
 [] xfs_map_blocks+0x38/0x90
 [] xfs_page_state_convert+0x2b8/0x630
 [] xfs_vm_writepage+0x6f/0x120
 [] __writepage+0xa/0x30
 [] write_cache_pages+0x23e/0x330
 [] __writepage+0x0/0x30
 [] xfs_iflush+0x3b7/0x520
 [] _xfs_buf_ioapply+0x222/0x320
 [] __up_read+0x21/0xb0
 [] xfs_iunlock+0x5c/0xc0
 [] do_writepages+0x20/0x40
 [] __writeback_single_inode+0xb0/0x380
 [] dm_table_any_congested+0x2e/0x80
 [] generic_sync_sb_inodes+0x20d/0x330
 [] writeback_inodes+0xa2/0xe0
 [] wb_kupdate+0xa6/0x120
 [] pdflush+0x0/0x1e0
 [] pdflush+0x110/0x1e0
 [] wb_kupdate+0x0/0x120
 [] kthread+0x4b/0x80
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x80
 [] child_rip+0x0/0x12

emergeD 81001fcc2a88 0  3221   8163
 81008c0679f8 0086 81008c067988 8024a719
 8100060fb008 8022c8ea 80817b00 80817b00
 80813f40 80817b00 810100893b18 
Call Trace:
 [] autoremove_wake_function+0x9/0x30
 [] __wake_up_common+0x5a/0x90
 [] __wake_up_common+0x5a/0x90
 [] __down+0xa7/0x11e
 [] default_wake_function+0x0/0x10
 [] __down_failed+0x35/0x3a
 [] xfs_buf_lock+0x3e/0x40
 [] _xfs_buf_find+0x13e/0x240
 [] xfs_buf_get_flags+0x6f/0x190
 [] xfs_buf_read_flags+0x12/0xa0
 [] xfs_trans_read_buf+0x64/0x340
 [] xfs_itobp+0x81/0x1e0
 [] xfs_buf_rele+0x2e/0xd0
 [] xfs_iflush+0xfe/0x520
 [] __down_read_trylock+0x42/0x60
 [] xfs_inode_item_push+0x12/0x20
 [] xfs_trans_push_ail+0x267/0x2b0
 [] xfs_log_reserve+0x72/0x120
 [] xfs_trans_reserve+0xa8/0x210
 [] kmem_zone_zalloc+0x32/0x50
 [] xfs_itruncate_finish+0xfb/0x310
 [] xfs_free_eofblocks+0x23b/0x280
 [] xfs_release+0x153/0x200
 [] xfs_file_release+0x10/0x20
 [] __fput+0xb1/0x220
 [] filp_close+0x54/0x90
 [] sys_close+0x9f/0x100
 [] system_call+0x7e/0x83


After this SysRq+W writeback resumed again. Possible 

Re: 100% iowait on one of cpus in current -git

2007-10-31 Thread Torsten Kaiser
On 10/22/07, Fengguang Wu [EMAIL PROTECTED] wrote:
 On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote:
  Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?

 Bingo! It can be reproduced in -mm on reiserfs:

 # mkfs.reiserfs /dev/sdb1
 # mount /dev/sdb1 /test
 # cp bin /test
 wait for a while
 # dmesg
 [...]
 [  418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1)
 [  418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1)
 [  418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1)

Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts.
Each time I noticed this I was using emerge (package util from the
gentoo distribution) to install/upgrade a package. The last step,
where this hang occurred, is moving the prepared files from a tmpfs
partion to the main xfs filesystem.
The hangs where not fatal, after a few second everything resumed
normal, so I was not able to capture a good image of what was
happening.

Today it happend again, but a little more obvious. During the moving
process the writeout stalled completly for several minutes until I hit
SysRq+W.

/proc/meminfo:
MemTotal:  4061808 kB
MemFree:881332 kB
Buffers: 0 kB
Cached:2566628 kB
SwapCached: 64 kB
Active: 926612 kB
Inactive:  1959136 kB
SwapTotal: 9775416 kB
SwapFree:  9775296 kB
Dirty:   44948 kB
Writeback:   0 kB
AnonPages:  319068 kB
Mapped:  52844 kB
Slab:   235572 kB
SReclaimable:   164408 kB
SUnreclaim:  71164 kB
PageTables:   9576 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:  11806320 kB
Committed_AS:   544520 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 35004 kB
VmallocChunk: 34359702447 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
HugePages_Surp:  0
Hugepagesize: 2048 kB

The 'Dirty' count did not decrease during this time and 'Writeback' stayed at 0.
I also have /proc/pagetypeinfo ,but I see nothing interessing in
there. (But will send it, if needed)

The output from SysRq+W:
SysRq : Show Blocked State
  taskPC stack   pid father
pdflush   D 81001fcc2a88 0   285  2
 810005d55580 0046 0800 0071
 0400 8022d61c 80817b00 80817b00
 80813f40 80817b00 810100893b18 
Call Trace:
 [8022d61c] task_rq_lock+0x4c/0x90
 [8022c8ea] __wake_up_common+0x5a/0x90
 [805b14c7] __down+0xa7/0x11e
 [8022da70] default_wake_function+0x0/0x10
 [805b1145] __down_failed+0x35/0x3a
 [803750be] xfs_buf_lock+0x3e/0x40
 [803771fe] _xfs_buf_find+0x13e/0x240
 [8037736f] xfs_buf_get_flags+0x6f/0x190
 [803774a2] xfs_buf_read_flags+0x12/0xa0
 [80368614] xfs_trans_read_buf+0x64/0x340
 [80352151] xfs_itobp+0x81/0x1e0
 [803759de] xfs_buf_rele+0x2e/0xd0
 [80354afe] xfs_iflush+0xfe/0x520
 [803ae3b2] __down_read_trylock+0x42/0x60
 [80355a72] xfs_inode_item_push+0x12/0x20
 [80368037] xfs_trans_push_ail+0x267/0x2b0
 [8035a33b] xlog_ticket_get+0xfb/0x140
 [8035c5ae] xfs_log_reserve+0xee/0x120
 [803669e8] xfs_trans_reserve+0xa8/0x210
 [8035703a] xfs_iomap_write_allocate+0xfa/0x410
 [804ce67a] __split_bio+0x38a/0x3c0
 [80373657] xfs_start_page_writeback+0x27/0x60
 [8035660c] xfs_iomap+0x26c/0x310
 [803735d8] xfs_map_blocks+0x38/0x90
 [80374a88] xfs_page_state_convert+0x2b8/0x630
 [80374f5f] xfs_vm_writepage+0x6f/0x120
 [8026acda] __writepage+0xa/0x30
 [8026b2ce] write_cache_pages+0x23e/0x330
 [8026acd0] __writepage+0x0/0x30
 [80354db7] xfs_iflush+0x3b7/0x520
 [80375782] _xfs_buf_ioapply+0x222/0x320
 [803ae451] __up_read+0x21/0xb0
 [8034f22c] xfs_iunlock+0x5c/0xc0
 [8026b410] do_writepages+0x20/0x40
 [802b36a0] __writeback_single_inode+0xb0/0x380
 [804d052e] dm_table_any_congested+0x2e/0x80
 [802b3d8d] generic_sync_sb_inodes+0x20d/0x330
 [802b4322] writeback_inodes+0xa2/0xe0
 [8026bde6] wb_kupdate+0xa6/0x120
 [8026c2a0] pdflush+0x0/0x1e0
 [8026c3b0] pdflush+0x110/0x1e0
 [8026bd40] wb_kupdate+0x0/0x120
 [8024a32b] kthread+0x4b/0x80
 [8020c9d8] child_rip+0xa/0x12
 [8024a2e0] kthread+0x0/0x80
 [8020c9ce] child_rip+0x0/0x12

emergeD 81001fcc2a88 0  3221   8163
 81008c0679f8 0086 81008c067988 8024a719
 8100060fb008 8022c8ea 80817b00 80817b00
 80813f40 80817b00 810100893b18 
Call Trace:
 [8024a719] autoremove_wake_function+0x9/0x30
 [8022c8ea] __wake_up_common+0x5a/0x90
 [8022c8ea] __wake_up_common+0x5a/0x90
 [805b14c7] 

Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote:
> Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?

Bingo! It can be reproduced in -mm on reiserfs:

# mkfs.reiserfs /dev/sdb1
# mount /dev/sdb1 /test
# cp bin /test

# dmesg
[...]
[  418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1)
[  418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1)
[  418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1)
[  418.346131] requeue_io 308: inode 9 size 180 at 08:11(sdb1)
[  418.346136] requeue_io 308: inode 10 size 1488 at 08:11(sdb1)
[  418.346142] requeue_io 308: inode 12 size 1358 at 08:11(sdb1)
[  418.346148] requeue_io 308: inode 13 size 482 at 08:11(sdb1)
[  418.346153] requeue_io 308: inode 14 size 171 at 08:11(sdb1)
[  418.346159] requeue_io 308: inode 15 size 93 at 08:11(sdb1)
[  418.346164] requeue_io 308: inode 16 size 81 at 08:11(sdb1)
[  418.346170] requeue_io 308: inode 17 size 212 at 08:11(sdb1)
[  418.346176] requeue_io 308: inode 18 size 431 at 08:11(sdb1)
[  418.346181] requeue_io 308: inode 19 size 231 at 08:11(sdb1)
[  418.346187] requeue_io 308: inode 20 size 1756 at 08:11(sdb1)
[  418.346193] requeue_io 308: inode 21 size 1229 at 08:11(sdb1)
[  418.346198] requeue_io 308: inode 22 size 157 at 08:11(sdb1)
[  418.346204] requeue_io 308: inode 23 size 3430 at 08:11(sdb1)
[  418.346210] requeue_io 308: inode 24 size 200 at 08:11(sdb1)
[  418.346215] requeue_io 308: inode 25 size 202 at 08:11(sdb1)
[  418.346221] requeue_io 308: inode 26 size 386 at 08:11(sdb1)
[  418.346226] requeue_io 308: inode 27 size 264 at 08:11(sdb1)
[  418.346232] requeue_io 308: inode 28 size 268 at 08:11(sdb1)
[  418.346238] requeue_io 308: inode 29 size 1228 at 08:11(sdb1)
[  418.346243] requeue_io 308: inode 30 size 404 at 08:11(sdb1)
[  418.346249] requeue_io 308: inode 31 size 2452 at 08:11(sdb1)
[  418.346255] requeue_io 308: inode 32 size 1236 at 08:11(sdb1)
[  418.346260] requeue_io 308: inode 33 size 655 at 08:11(sdb1)
[  418.346266] requeue_io 308: inode 35 size 330 at 08:11(sdb1)
[  418.346272] requeue_io 308: inode 36 size 248 at 08:11(sdb1)
[  418.346277] requeue_io 308: inode 37 size 683 at 08:11(sdb1)
[  418.346283] requeue_io 308: inode 38 size 1451 at 08:11(sdb1)
[  418.346288] requeue_io 308: inode 39 size 894 at 08:11(sdb1)
[  418.346294] requeue_io 308: inode 40 size 879 at 08:11(sdb1)
[  418.346300] requeue_io 308: inode 42 size 797 at 08:11(sdb1)
[  418.346305] requeue_io 308: inode 43 size 1314 at 08:11(sdb1)
[  418.346311] requeue_io 308: inode 44 size 1463 at 08:11(sdb1)
[  418.346317] requeue_io 308: inode 45 size 3032 at 08:11(sdb1)
[  418.346322] requeue_io 308: inode 46 size 325 at 08:11(sdb1)
[  418.346328] requeue_io 308: inode 47 size 583 at 08:11(sdb1)
[  418.346334] requeue_io 308: inode 48 size 1660 at 08:11(sdb1)
[  418.346339] requeue_io 308: inode 49 size 3159 at 08:11(sdb1)
[  418.346345] requeue_io 308: inode 50 size 510 at 08:11(sdb1)
[  418.346350] requeue_io 308: inode 51 size 100 at 08:11(sdb1)
[  418.346356] requeue_io 308: inode 52 size 143 at 08:11(sdb1)
[  418.346370] requeue_io 308: inode 53 size 954 at 08:11(sdb1)
[  418.346373] requeue_io 308: inode 54 size 322 at 08:11(sdb1)
[  418.346376] requeue_io 308: inode 55 size 970 at 08:11(sdb1)
[  418.346379] requeue_io 308: inode 57 size 483 at 08:11(sdb1)
[  418.346382] requeue_io 308: inode 58 size 1125 at 08:11(sdb1)
[  418.346385] requeue_io 308: inode 59 size 2196 at 08:11(sdb1)
[  418.346388] requeue_io 308: inode 60 size 104 at 08:11(sdb1)
[  418.346391] requeue_io 308: inode 61 size 488 at 08:11(sdb1)
[  418.346394] requeue_io 308: inode 62 size 116 at 08:11(sdb1)
[  418.346397] requeue_io 308: inode 63 size 907 at 08:11(sdb1)
[  418.346400] requeue_io 308: inode 64 size 1076 at 08:11(sdb1)
[  418.346403] requeue_io 308: inode 65 size 460 at 08:11(sdb1)
[  418.346406] requeue_io 308: inode 66 size 1092 at 08:11(sdb1)
[  418.346409] requeue_io 308: inode 67 size 424 at 08:11(sdb1)
[  418.346412] requeue_io 308: inode 68 size 696 at 08:11(sdb1)
[  418.346415] requeue_io 308: inode 70 size 137 at 08:11(sdb1)
[  418.346418] requeue_io 308: inode 71 size 201 at 08:11(sdb1)
[  418.346421] requeue_io 308: inode 72 size 150 at 08:11(sdb1)
[  418.346424] requeue_io 308: inode 73 size 188 at 08:11(sdb1)
[  418.346427] requeue_io 308: inode 75 size 1208 at 08:11(sdb1)
[  418.346431] requeue_io 308: inode 76 size 493 at 08:11(sdb1)
[  418.346434] requeue_io 308: inode 77 size 484 at 08:11(sdb1)
[  418.346437] requeue_io 308: inode 78 size 356 at 08:11(sdb1)
[  418.346440] requeue_io 308: inode 79 size 895 at 08:11(sdb1)
[  418.346443] requeue_io 308: inode 80 size 847 at 08:11(sdb1)
[  418.346446] requeue_io 308: inode 81 size 3281 at 08:11(sdb1)
[  418.346449] requeue_io 308: inode 82 size 3329 at 08:11(sdb1)
[  418.346452] requeue_io 308: inode 83 size 115 at 08:11(sdb1)
[  418.346455] requeue_io 308: inode 84 size 644 at 08:11(sdb1)
[  418.346458] requeue_io 308: inode 85 

Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 15:10:45 Fengguang Wu wrote:
> On Mon, Oct 22, 2007 at 03:05:35PM +0200, Maxim Levitsky wrote:
> > Hi,
> > Thank you very much too, for helping me.
> 
> You are welcome :-)
> 
> > Applied.
> > Had to kill klogd, since kernel generates tons of redirtied inode messages.
> > Size of the kern.log is 863 KB, thus I I don't think it is polite to 
> > attachit.
> > Don't know whenever it is ok to put it on pastebin too.
> > 
> > Anyway, it shows lots of redirtied inode... messages,
> > and while most of them are "at 08:02(sda2)" , my reiserfs root partition, 
> > some are
> > 
> > "Oct 22 14:50:27 MAIN kernel: [   73.643794] redirtied inode 2582 size 0 at 
> > 00:0f(tmpfs), line 300."
> > 
> > " line 300" is shown always
> > 
> > (I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup)
> 
> Thank you for the testing out.
> 
> Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?
No, I don't think so, like I said it sometimes shows the same message on tmpfs
> 
> Thank you,
> Fengguang
> 
> 
Best Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 03:05:35PM +0200, Maxim Levitsky wrote:
> Hi,
> Thank you very much too, for helping me.

You are welcome :-)

> Applied.
> Had to kill klogd, since kernel generates tons of redirtied inode messages.
> Size of the kern.log is 863 KB, thus I I don't think it is polite to attachit.
> Don't know whenever it is ok to put it on pastebin too.
> 
> Anyway, it shows lots of redirtied inode... messages,
> and while most of them are "at 08:02(sda2)" , my reiserfs root partition, 
> some are
> 
> "Oct 22 14:50:27 MAIN kernel: [   73.643794] redirtied inode 2582 size 0 at 
> 00:0f(tmpfs), line 300."
> 
> " line 300" is shown always
> 
> (I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup)

Thank you for the testing out.

Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?

Thank you,
Fengguang

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 14:37:07 Fengguang Wu wrote:
> On Mon, Oct 22, 2007 at 02:21:21PM +0200, Maxim Levitsky wrote:
> > I Bisected this bug to exactly this commit:
> > 
> > 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
> > writeback: introduce writeback_control.more_io to indicate more io
> > 
> > Reverting it and compiling latest git shows no more signs of that bug.
> 
> Thank you very much.
> 
> I guess your system has some difficulty in writing back some inodes.
> (i.e. a bug disclosed by this patch, the 100% iowait only makes it
> more obvious)
> 
> I cannot reproduce it with your .config, so would you recompile and
> run the kernel with the above commit _and_ the below debugging patch?
> 
> Thank you,
> Fengguang
> ---
> 
>  fs/fs-writeback.c |   15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> --- linux-2.6.23-git17.orig/fs/fs-writeback.c
> +++ linux-2.6.23-git17/fs/fs-writeback.c
> @@ -164,12 +164,25 @@ static void redirty_tail(struct inode *i
>   list_move(>i_list, >s_dirty);
>  }
>  
> +#define requeue_io(inode)\
> + do {\
> + __requeue_io(inode, __LINE__);  \
> + } while (0)
> +
>  /*
>   * requeue inode for re-scanning after sb->s_io list is exhausted.
>   */
> -static void requeue_io(struct inode *inode)
> +static void __requeue_io(struct inode *inode, int line)
>  {
>   list_move(>i_list, >i_sb->s_more_io);
> +
> + printk(KERN_DEBUG "redirtied inode %lu size %llu at %02x:%02x(%s), line 
> %d.\n",
> + inode->i_ino,
> + i_size_read(inode),
> + MAJOR(inode->i_sb->s_dev),
> + MINOR(inode->i_sb->s_dev),
> + inode->i_sb->s_id,
> + line);
>  }
>  
>  static void inode_sync_complete(struct inode *inode)
> 
> 

Hi,
Thank you very much too, for helping me.


Applied.
Had to kill klogd, since kernel generates tons of redirtied inode messages.
Size of the kern.log is 863 KB, thus I I don't think it is polite to attachit.
Don't know whenever it is ok to put it on pastebin too.

Anyway, it shows lots of redirtied inode... messages,
and while most of them are "at 08:02(sda2)" , my reiserfs root partition, some 
are

"Oct 22 14:50:27 MAIN kernel: [   73.643794] redirtied inode 2582 size 0 at 
00:0f(tmpfs), line 300."

" line 300" is shown always

(I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup)


Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 02:21:21PM +0200, Maxim Levitsky wrote:
> I Bisected this bug to exactly this commit:
> 
> 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
> writeback: introduce writeback_control.more_io to indicate more io
> 
> Reverting it and compiling latest git shows no more signs of that bug.

Thank you very much.

I guess your system has some difficulty in writing back some inodes.
(i.e. a bug disclosed by this patch, the 100% iowait only makes it
more obvious)

I cannot reproduce it with your .config, so would you recompile and
run the kernel with the above commit _and_ the below debugging patch?

Thank you,
Fengguang
---

 fs/fs-writeback.c |   15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- linux-2.6.23-git17.orig/fs/fs-writeback.c
+++ linux-2.6.23-git17/fs/fs-writeback.c
@@ -164,12 +164,25 @@ static void redirty_tail(struct inode *i
list_move(>i_list, >s_dirty);
 }
 
+#define requeue_io(inode)  \
+   do {\
+   __requeue_io(inode, __LINE__);  \
+   } while (0)
+
 /*
  * requeue inode for re-scanning after sb->s_io list is exhausted.
  */
-static void requeue_io(struct inode *inode)
+static void __requeue_io(struct inode *inode, int line)
 {
list_move(>i_list, >i_sb->s_more_io);
+
+   printk(KERN_DEBUG "redirtied inode %lu size %llu at %02x:%02x(%s), line 
%d.\n",
+   inode->i_ino,
+   i_size_read(inode),
+   MAJOR(inode->i_sb->s_dev),
+   MINOR(inode->i_sb->s_dev),
+   inode->i_sb->s_id,
+   line);
 }
 
 static void inode_sync_complete(struct inode *inode)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 13:19:08 Fengguang Wu wrote:
> On Mon, Oct 22, 2007 at 12:58:11PM +0200, Maxim Levitsky wrote:
> > On Monday 22 October 2007 12:55:25 Fengguang Wu wrote:
> > > On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
> > > > On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
> > > > > > [  673.365631] pdflush   D c21bdecc 0   221  2
> > > > > > [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
> > > > > >  c21b3000 0002
> > > > > > [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
> > > > > > c21bdec0 c03ff849 c21bdef0
> > > > > > [  673.365650]00052974  00ff   
> > > > > >  c21bdef0 000529dc
> > > > > > [  673.365657] Call Trace:
> > > > > > [  673.365659]  [] schedule_timeout+0x48/0xc0
> > > > > > [  673.365663]  [] io_schedule_timeout+0x5e/0xb0
> > > > > > [  673.365667]  [] congestion_wait+0x71/0x90
> > > > > > [  673.365671]  [] wb_kupdate+0x9e/0xf0
> > > > > > [  673.365675]  [] pdflush+0x102/0x1d0
> > > > > > [  673.365679]  [] kthread+0x42/0x70
> > > > > > [  673.365683]  [] kernel_thread_helper+0x7/0x18
> > > > > > 
> > > > > 
> > > > > That looks more like the inode writeback patches from Wu than the per
> > > > > bdi dirty stuff. The later typically hangs in balance_dirty_pages().
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > Yes, you are right,
> > > > 
> > > > both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
> > > > 3e26c149c358529b1605f8959341d34bc4b880a3 work fine
> > > > But I didn't pay attention that those are before 
> > > > f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
> > > > So, back to the drawing board :-)
> > > > 
> > > > Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
> > > > writeback patches.
> > > 
> > > Thank you. I'll try if I can reproduce it locally...
> > > 
> > > Fengguang
> > > 
> > > 
> > 
> > Bingo,
> > 
> > Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug.
> > 
> > I will now bisect to find exact patch that caused this bug,
> 
> This one is most relevant:
> 
> 1f7decf6d9f06dac008b8d66935c0c3b18e564f9
> writeback: introduce writeback_control.more_io to indicate more io
Exactly.
> 
> Still compiling the kernel...
> 
> Thank you,
> Fengguang
> 
> 
Hi,


I Bisected this bug to exactly this commit:

2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
writeback: introduce writeback_control.more_io to indicate more io

Reverting it and compiling latest git shows no more signs of that bug.
Thanks,
Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 12:58:11PM +0200, Maxim Levitsky wrote:
> On Monday 22 October 2007 12:55:25 Fengguang Wu wrote:
> > On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
> > > On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
> > > > > [  673.365631] pdflush   D c21bdecc 0   221  2
> > > > > [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
> > > > >  c21b3000 0002
> > > > > [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
> > > > > c21bdec0 c03ff849 c21bdef0
> > > > > [  673.365650]00052974  00ff   
> > > > >  c21bdef0 000529dc
> > > > > [  673.365657] Call Trace:
> > > > > [  673.365659]  [] schedule_timeout+0x48/0xc0
> > > > > [  673.365663]  [] io_schedule_timeout+0x5e/0xb0
> > > > > [  673.365667]  [] congestion_wait+0x71/0x90
> > > > > [  673.365671]  [] wb_kupdate+0x9e/0xf0
> > > > > [  673.365675]  [] pdflush+0x102/0x1d0
> > > > > [  673.365679]  [] kthread+0x42/0x70
> > > > > [  673.365683]  [] kernel_thread_helper+0x7/0x18
> > > > > 
> > > > 
> > > > That looks more like the inode writeback patches from Wu than the per
> > > > bdi dirty stuff. The later typically hangs in balance_dirty_pages().
> > > > 
> > > > 
> > > > 
> > > 
> > > Yes, you are right,
> > > 
> > > both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
> > > 3e26c149c358529b1605f8959341d34bc4b880a3 work fine
> > > But I didn't pay attention that those are before 
> > > f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
> > > So, back to the drawing board :-)
> > > 
> > > Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
> > > writeback patches.
> > 
> > Thank you. I'll try if I can reproduce it locally...
> > 
> > Fengguang
> > 
> > 
> 
> Bingo,
> 
> Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug.
> 
> I will now bisect to find exact patch that caused this bug,

This one is most relevant:

1f7decf6d9f06dac008b8d66935c0c3b18e564f9
writeback: introduce writeback_control.more_io to indicate more io

Still compiling the kernel...

Thank you,
Fengguang

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 12:55:25 Fengguang Wu wrote:
> On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
> > On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
> > > > [  673.365631] pdflush   D c21bdecc 0   221  2
> > > > [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
> > > >  c21b3000 0002
> > > > [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
> > > > c21bdec0 c03ff849 c21bdef0
> > > > [  673.365650]00052974  00ff   
> > > >  c21bdef0 000529dc
> > > > [  673.365657] Call Trace:
> > > > [  673.365659]  [] schedule_timeout+0x48/0xc0
> > > > [  673.365663]  [] io_schedule_timeout+0x5e/0xb0
> > > > [  673.365667]  [] congestion_wait+0x71/0x90
> > > > [  673.365671]  [] wb_kupdate+0x9e/0xf0
> > > > [  673.365675]  [] pdflush+0x102/0x1d0
> > > > [  673.365679]  [] kthread+0x42/0x70
> > > > [  673.365683]  [] kernel_thread_helper+0x7/0x18
> > > > 
> > > 
> > > That looks more like the inode writeback patches from Wu than the per
> > > bdi dirty stuff. The later typically hangs in balance_dirty_pages().
> > > 
> > > 
> > > 
> > 
> > Yes, you are right,
> > 
> > both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
> > 3e26c149c358529b1605f8959341d34bc4b880a3 work fine
> > But I didn't pay attention that those are before 
> > f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
> > So, back to the drawing board :-)
> > 
> > Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
> > writeback patches.
> 
> Thank you. I'll try if I can reproduce it locally...
> 
> Fengguang
> 
> 

Bingo,

Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug.

I will now bisect to find exact patch that caused this bug,
Thanks,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
> On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
> > > [  673.365631] pdflush   D c21bdecc 0   221  2
> > > [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
> > >  c21b3000 0002
> > > [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
> > > c21bdec0 c03ff849 c21bdef0
> > > [  673.365650]00052974  00ff   
> > >  c21bdef0 000529dc
> > > [  673.365657] Call Trace:
> > > [  673.365659]  [] schedule_timeout+0x48/0xc0
> > > [  673.365663]  [] io_schedule_timeout+0x5e/0xb0
> > > [  673.365667]  [] congestion_wait+0x71/0x90
> > > [  673.365671]  [] wb_kupdate+0x9e/0xf0
> > > [  673.365675]  [] pdflush+0x102/0x1d0
> > > [  673.365679]  [] kthread+0x42/0x70
> > > [  673.365683]  [] kernel_thread_helper+0x7/0x18
> > > 
> > 
> > That looks more like the inode writeback patches from Wu than the per
> > bdi dirty stuff. The later typically hangs in balance_dirty_pages().
> > 
> > 
> > 
> 
> Yes, you are right,
> 
> both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
> 3e26c149c358529b1605f8959341d34bc4b880a3 work fine
> But I didn't pay attention that those are before 
> f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
> So, back to the drawing board :-)
> 
> Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
> writeback patches.

Thank you. I'll try if I can reproduce it locally...

Fengguang

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
> On Mon, 2007-10-22 at 11:59 +0200, Maxim Levitsky wrote:
> > On Monday 22 October 2007 11:41:57 Peter Zijlstra wrote:
> > > On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
> > > > Hi,
> > > > 
> > > > I found a bug in current -git:
> > > > 
> > > > On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
> > > > Otherwise the system works OK, no disk activity and/or slowdown.
> > > > Suspecting that this is a swap-related problem I tried to turn swap of, 
> > > > but it doesn't affect anything.
> > > > It is probably some accounting bug.
> > > > 
> > > > If I start with init=/bin/bash, then this disappears.
> > > > I tried then to start usual /etc/init.d scripts then, and first one to 
> > > > show this bug was gpm.
> > > > but then I rebooted the system to X without gpm, and I still see 100% 
> > > > iowait.
> > > > 
> > > > No additional messages in dmesg.
> > > 
> > > does sysrq-t show any D state tasks?
> > > 
> > > 
> > This one:
> > Probably per-block device dirty writeback?
> > I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b
> > Thanks for the pointer.
> > 
> > 
> > 
> > [  673.365631] pdflush   D c21bdecc 0   221  2
> > [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4  
> > c21b3000 0002
> > [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 c21bdec0 
> > c03ff849 c21bdef0
> > [  673.365650]00052974  00ff    
> > c21bdef0 000529dc
> > [  673.365657] Call Trace:
> > [  673.365659]  [] schedule_timeout+0x48/0xc0
> > [  673.365663]  [] io_schedule_timeout+0x5e/0xb0
> > [  673.365667]  [] congestion_wait+0x71/0x90
> > [  673.365671]  [] wb_kupdate+0x9e/0xf0
> > [  673.365675]  [] pdflush+0x102/0x1d0
> > [  673.365679]  [] kthread+0x42/0x70
> > [  673.365683]  [] kernel_thread_helper+0x7/0x18
> > 
> 
> That looks more like the inode writeback patches from Wu than the per
> bdi dirty stuff. The later typically hangs in balance_dirty_pages().
> 
> 
> 

Yes, you are right,

both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
3e26c149c358529b1605f8959341d34bc4b880a3 work fine
But I didn't pay attention that those are before 
f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
So, back to the drawing board :-)

Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
writeback patches.
Thanks,
Best regards,
Maxim Levitsky

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Peter Zijlstra
On Mon, 2007-10-22 at 11:59 +0200, Maxim Levitsky wrote:
> On Monday 22 October 2007 11:41:57 Peter Zijlstra wrote:
> > On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
> > > Hi,
> > > 
> > > I found a bug in current -git:
> > > 
> > > On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
> > > Otherwise the system works OK, no disk activity and/or slowdown.
> > > Suspecting that this is a swap-related problem I tried to turn swap of, 
> > > but it doesn't affect anything.
> > > It is probably some accounting bug.
> > > 
> > > If I start with init=/bin/bash, then this disappears.
> > > I tried then to start usual /etc/init.d scripts then, and first one to 
> > > show this bug was gpm.
> > > but then I rebooted the system to X without gpm, and I still see 100% 
> > > iowait.
> > > 
> > > No additional messages in dmesg.
> > 
> > does sysrq-t show any D state tasks?
> > 
> > 
> This one:
> Probably per-block device dirty writeback?
> I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b
> Thanks for the pointer.
> 
> 
> 
> [  673.365631] pdflush   D c21bdecc 0   221  2
> [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4  
> c21b3000 0002
> [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 c21bdec0 
> c03ff849 c21bdef0
> [  673.365650]00052974  00ff    
> c21bdef0 000529dc
> [  673.365657] Call Trace:
> [  673.365659]  [] schedule_timeout+0x48/0xc0
> [  673.365663]  [] io_schedule_timeout+0x5e/0xb0
> [  673.365667]  [] congestion_wait+0x71/0x90
> [  673.365671]  [] wb_kupdate+0x9e/0xf0
> [  673.365675]  [] pdflush+0x102/0x1d0
> [  673.365679]  [] kthread+0x42/0x70
> [  673.365683]  [] kernel_thread_helper+0x7/0x18
> 

That looks more like the inode writeback patches from Wu than the per
bdi dirty stuff. The later typically hangs in balance_dirty_pages().


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 11:41:57 Peter Zijlstra wrote:
> On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
> > Hi,
> > 
> > I found a bug in current -git:
> > 
> > On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
> > Otherwise the system works OK, no disk activity and/or slowdown.
> > Suspecting that this is a swap-related problem I tried to turn swap of, but 
> > it doesn't affect anything.
> > It is probably some accounting bug.
> > 
> > If I start with init=/bin/bash, then this disappears.
> > I tried then to start usual /etc/init.d scripts then, and first one to show 
> > this bug was gpm.
> > but then I rebooted the system to X without gpm, and I still see 100% 
> > iowait.
> > 
> > No additional messages in dmesg.
> 
> does sysrq-t show any D state tasks?
> 
> 
This one:
Probably per-block device dirty writeback?
I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b
Thanks for the pointer.



[  673.365631] pdflush   D c21bdecc 0   221  2
[  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4  
c21b3000 0002
[  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 c21bdec0 
c03ff849 c21bdef0
[  673.365650]00052974  00ff    
c21bdef0 000529dc
[  673.365657] Call Trace:
[  673.365659]  [] schedule_timeout+0x48/0xc0
[  673.365663]  [] io_schedule_timeout+0x5e/0xb0
[  673.365667]  [] congestion_wait+0x71/0x90
[  673.365671]  [] wb_kupdate+0x9e/0xf0
[  673.365675]  [] pdflush+0x102/0x1d0
[  673.365679]  [] kthread+0x42/0x70
[  673.365683]  [] kernel_thread_helper+0x7/0x18


Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 11:11:52 Paolo Ornati wrote:
> On Mon, 22 Oct 2007 08:22:52 +0200
> Maxim Levitsky <[EMAIL PROTECTED]> wrote:
> 
> > I tried to bisect this, but eventually I run into other bugs that cause 
> > system to oops early.
> 
> You can pick a different revision to test with:
>   git-reset --hard  "SHA1"
> 
> Choose one with "git-bisect visualize".
> 

Well, I know that, and I did try this a lot.

The problem is that between good and bad revisions there are few bugs that 
cause the system to oops early,
thus I can't tell whenever the 100% iowait bug is present or not.

Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Peter Zijlstra
On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
> Hi,
> 
> I found a bug in current -git:
> 
> On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
> Otherwise the system works OK, no disk activity and/or slowdown.
> Suspecting that this is a swap-related problem I tried to turn swap of, but 
> it doesn't affect anything.
> It is probably some accounting bug.
> 
> If I start with init=/bin/bash, then this disappears.
> I tried then to start usual /etc/init.d scripts then, and first one to show 
> this bug was gpm.
> but then I rebooted the system to X without gpm, and I still see 100% iowait.
> 
> No additional messages in dmesg.

does sysrq-t show any D state tasks?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Paolo Ornati
On Mon, 22 Oct 2007 08:22:52 +0200
Maxim Levitsky <[EMAIL PROTECTED]> wrote:

> I tried to bisect this, but eventually I run into other bugs that cause 
> system to oops early.

You can pick a different revision to test with:
git-reset --hard  "SHA1"

Choose one with "git-bisect visualize".

-- 
Paolo Ornati
Linux 2.6.23-ge8b8c977 on x86_64
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Paolo Ornati
On Mon, 22 Oct 2007 08:22:52 +0200
Maxim Levitsky [EMAIL PROTECTED] wrote:

 I tried to bisect this, but eventually I run into other bugs that cause 
 system to oops early.

You can pick a different revision to test with:
git-reset --hard  SHA1

Choose one with git-bisect visualize.

-- 
Paolo Ornati
Linux 2.6.23-ge8b8c977 on x86_64
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Peter Zijlstra
On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
 Hi,
 
 I found a bug in current -git:
 
 On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
 Otherwise the system works OK, no disk activity and/or slowdown.
 Suspecting that this is a swap-related problem I tried to turn swap of, but 
 it doesn't affect anything.
 It is probably some accounting bug.
 
 If I start with init=/bin/bash, then this disappears.
 I tried then to start usual /etc/init.d scripts then, and first one to show 
 this bug was gpm.
 but then I rebooted the system to X without gpm, and I still see 100% iowait.
 
 No additional messages in dmesg.

does sysrq-t show any D state tasks?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 11:11:52 Paolo Ornati wrote:
 On Mon, 22 Oct 2007 08:22:52 +0200
 Maxim Levitsky [EMAIL PROTECTED] wrote:
 
  I tried to bisect this, but eventually I run into other bugs that cause 
  system to oops early.
 
 You can pick a different revision to test with:
   git-reset --hard  SHA1
 
 Choose one with git-bisect visualize.
 

Well, I know that, and I did try this a lot.

The problem is that between good and bad revisions there are few bugs that 
cause the system to oops early,
thus I can't tell whenever the 100% iowait bug is present or not.

Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 11:41:57 Peter Zijlstra wrote:
 On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
  Hi,
  
  I found a bug in current -git:
  
  On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
  Otherwise the system works OK, no disk activity and/or slowdown.
  Suspecting that this is a swap-related problem I tried to turn swap of, but 
  it doesn't affect anything.
  It is probably some accounting bug.
  
  If I start with init=/bin/bash, then this disappears.
  I tried then to start usual /etc/init.d scripts then, and first one to show 
  this bug was gpm.
  but then I rebooted the system to X without gpm, and I still see 100% 
  iowait.
  
  No additional messages in dmesg.
 
 does sysrq-t show any D state tasks?
 
 
This one:
Probably per-block device dirty writeback?
I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b
Thanks for the pointer.



[  673.365631] pdflush   D c21bdecc 0   221  2
[  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4  
c21b3000 0002
[  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 c21bdec0 
c03ff849 c21bdef0
[  673.365650]00052974  00ff    
c21bdef0 000529dc
[  673.365657] Call Trace:
[  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
[  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
[  673.365667]  [c0170d11] congestion_wait+0x71/0x90
[  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
[  673.365675]  [c016beb2] pdflush+0x102/0x1d0
[  673.365679]  [c013fa82] kthread+0x42/0x70
[  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18


Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Peter Zijlstra
On Mon, 2007-10-22 at 11:59 +0200, Maxim Levitsky wrote:
 On Monday 22 October 2007 11:41:57 Peter Zijlstra wrote:
  On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
   Hi,
   
   I found a bug in current -git:
   
   On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
   Otherwise the system works OK, no disk activity and/or slowdown.
   Suspecting that this is a swap-related problem I tried to turn swap of, 
   but it doesn't affect anything.
   It is probably some accounting bug.
   
   If I start with init=/bin/bash, then this disappears.
   I tried then to start usual /etc/init.d scripts then, and first one to 
   show this bug was gpm.
   but then I rebooted the system to X without gpm, and I still see 100% 
   iowait.
   
   No additional messages in dmesg.
  
  does sysrq-t show any D state tasks?
  
  
 This one:
 Probably per-block device dirty writeback?
 I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b
 Thanks for the pointer.
 
 
 
 [  673.365631] pdflush   D c21bdecc 0   221  2
 [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4  
 c21b3000 0002
 [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 c21bdec0 
 c03ff849 c21bdef0
 [  673.365650]00052974  00ff    
 c21bdef0 000529dc
 [  673.365657] Call Trace:
 [  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
 [  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
 [  673.365667]  [c0170d11] congestion_wait+0x71/0x90
 [  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
 [  673.365675]  [c016beb2] pdflush+0x102/0x1d0
 [  673.365679]  [c013fa82] kthread+0x42/0x70
 [  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18
 

That looks more like the inode writeback patches from Wu than the per
bdi dirty stuff. The later typically hangs in balance_dirty_pages().


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
 On Mon, 2007-10-22 at 11:59 +0200, Maxim Levitsky wrote:
  On Monday 22 October 2007 11:41:57 Peter Zijlstra wrote:
   On Mon, 2007-10-22 at 08:22 +0200, Maxim Levitsky wrote:
Hi,

I found a bug in current -git:

On my system on of cpus stays 100% in iowait mode (I have core 2 duo)
Otherwise the system works OK, no disk activity and/or slowdown.
Suspecting that this is a swap-related problem I tried to turn swap of, 
but it doesn't affect anything.
It is probably some accounting bug.

If I start with init=/bin/bash, then this disappears.
I tried then to start usual /etc/init.d scripts then, and first one to 
show this bug was gpm.
but then I rebooted the system to X without gpm, and I still see 100% 
iowait.

No additional messages in dmesg.
   
   does sysrq-t show any D state tasks?
   
   
  This one:
  Probably per-block device dirty writeback?
  I am compiling now revision 1f7d6668c29b1dfa307a44844f9bb38356fc989b
  Thanks for the pointer.
  
  
  
  [  673.365631] pdflush   D c21bdecc 0   221  2
  [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4  
  c21b3000 0002
  [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 c21bdec0 
  c03ff849 c21bdef0
  [  673.365650]00052974  00ff    
  c21bdef0 000529dc
  [  673.365657] Call Trace:
  [  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
  [  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
  [  673.365667]  [c0170d11] congestion_wait+0x71/0x90
  [  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
  [  673.365675]  [c016beb2] pdflush+0x102/0x1d0
  [  673.365679]  [c013fa82] kthread+0x42/0x70
  [  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18
  
 
 That looks more like the inode writeback patches from Wu than the per
 bdi dirty stuff. The later typically hangs in balance_dirty_pages().
 
 
 

Yes, you are right,

both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
3e26c149c358529b1605f8959341d34bc4b880a3 work fine
But I didn't pay attention that those are before 
f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
So, back to the drawing board :-)

Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
writeback patches.
Thanks,
Best regards,
Maxim Levitsky

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
 On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
   [  673.365631] pdflush   D c21bdecc 0   221  2
   [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
    c21b3000 0002
   [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
   c21bdec0 c03ff849 c21bdef0
   [  673.365650]00052974  00ff   
    c21bdef0 000529dc
   [  673.365657] Call Trace:
   [  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
   [  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
   [  673.365667]  [c0170d11] congestion_wait+0x71/0x90
   [  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
   [  673.365675]  [c016beb2] pdflush+0x102/0x1d0
   [  673.365679]  [c013fa82] kthread+0x42/0x70
   [  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18
   
  
  That looks more like the inode writeback patches from Wu than the per
  bdi dirty stuff. The later typically hangs in balance_dirty_pages().
  
  
  
 
 Yes, you are right,
 
 both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
 3e26c149c358529b1605f8959341d34bc4b880a3 work fine
 But I didn't pay attention that those are before 
 f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
 So, back to the drawing board :-)
 
 Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
 writeback patches.

Thank you. I'll try if I can reproduce it locally...

Fengguang

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 12:55:25 Fengguang Wu wrote:
 On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
  On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
[  673.365631] pdflush   D c21bdecc 0   221  2
[  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
 c21b3000 0002
[  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
c21bdec0 c03ff849 c21bdef0
[  673.365650]00052974  00ff   
 c21bdef0 000529dc
[  673.365657] Call Trace:
[  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
[  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
[  673.365667]  [c0170d11] congestion_wait+0x71/0x90
[  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
[  673.365675]  [c016beb2] pdflush+0x102/0x1d0
[  673.365679]  [c013fa82] kthread+0x42/0x70
[  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18

   
   That looks more like the inode writeback patches from Wu than the per
   bdi dirty stuff. The later typically hangs in balance_dirty_pages().
   
   
   
  
  Yes, you are right,
  
  both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
  3e26c149c358529b1605f8959341d34bc4b880a3 work fine
  But I didn't pay attention that those are before 
  f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
  So, back to the drawing board :-)
  
  Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
  writeback patches.
 
 Thank you. I'll try if I can reproduce it locally...
 
 Fengguang
 
 

Bingo,

Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug.

I will now bisect to find exact patch that caused this bug,
Thanks,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 12:58:11PM +0200, Maxim Levitsky wrote:
 On Monday 22 October 2007 12:55:25 Fengguang Wu wrote:
  On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
   On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
 [  673.365631] pdflush   D c21bdecc 0   221  2
 [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
  c21b3000 0002
 [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
 c21bdec0 c03ff849 c21bdef0
 [  673.365650]00052974  00ff   
  c21bdef0 000529dc
 [  673.365657] Call Trace:
 [  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
 [  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
 [  673.365667]  [c0170d11] congestion_wait+0x71/0x90
 [  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
 [  673.365675]  [c016beb2] pdflush+0x102/0x1d0
 [  673.365679]  [c013fa82] kthread+0x42/0x70
 [  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18
 

That looks more like the inode writeback patches from Wu than the per
bdi dirty stuff. The later typically hangs in balance_dirty_pages().



   
   Yes, you are right,
   
   both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
   3e26c149c358529b1605f8959341d34bc4b880a3 work fine
   But I didn't pay attention that those are before 
   f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
   So, back to the drawing board :-)
   
   Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
   writeback patches.
  
  Thank you. I'll try if I can reproduce it locally...
  
  Fengguang
  
  
 
 Bingo,
 
 Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug.
 
 I will now bisect to find exact patch that caused this bug,

This one is most relevant:

1f7decf6d9f06dac008b8d66935c0c3b18e564f9
writeback: introduce writeback_control.more_io to indicate more io

Still compiling the kernel...

Thank you,
Fengguang

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 13:19:08 Fengguang Wu wrote:
 On Mon, Oct 22, 2007 at 12:58:11PM +0200, Maxim Levitsky wrote:
  On Monday 22 October 2007 12:55:25 Fengguang Wu wrote:
   On Mon, Oct 22, 2007 at 12:40:24PM +0200, Maxim Levitsky wrote:
On Monday 22 October 2007 12:22:10 Peter Zijlstra wrote:
  [  673.365631] pdflush   D c21bdecc 0   221  2
  [  673.365635]c21bdee0 0046 0002 c21bdecc c21bdec4 
   c21b3000 0002
  [  673.365643]c0134892 c21b3164 c1e00200 0001 c7109280 
  c21bdec0 c03ff849 c21bdef0
  [  673.365650]00052974  00ff   
   c21bdef0 000529dc
  [  673.365657] Call Trace:
  [  673.365659]  [c03fd728] schedule_timeout+0x48/0xc0
  [  673.365663]  [c03fd50e] io_schedule_timeout+0x5e/0xb0
  [  673.365667]  [c0170d11] congestion_wait+0x71/0x90
  [  673.365671]  [c016b92e] wb_kupdate+0x9e/0xf0
  [  673.365675]  [c016beb2] pdflush+0x102/0x1d0
  [  673.365679]  [c013fa82] kthread+0x42/0x70
  [  673.365683]  [c01050df] kernel_thread_helper+0x7/0x18
  
 
 That looks more like the inode writeback patches from Wu than the per
 bdi dirty stuff. The later typically hangs in balance_dirty_pages().
 
 
 

Yes, you are right,

both revisions 1f7d6668c29b1dfa307a44844f9bb38356fc989b and 
3e26c149c358529b1605f8959341d34bc4b880a3 work fine
But I didn't pay attention that those are before 
f4a1c2bce002f683801bcdbbc9fd89804614fb6b.
So, back to the drawing board :-)

Will test revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, just after 
writeback patches.
   
   Thank you. I'll try if I can reproduce it locally...
   
   Fengguang
   
   
  
  Bingo,
  
  Revision 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b shows this bug.
  
  I will now bisect to find exact patch that caused this bug,
 
 This one is most relevant:
 
 1f7decf6d9f06dac008b8d66935c0c3b18e564f9
 writeback: introduce writeback_control.more_io to indicate more io
Exactly.
 
 Still compiling the kernel...
 
 Thank you,
 Fengguang
 
 
Hi,


I Bisected this bug to exactly this commit:

2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
writeback: introduce writeback_control.more_io to indicate more io

Reverting it and compiling latest git shows no more signs of that bug.
Thanks,
Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 02:21:21PM +0200, Maxim Levitsky wrote:
 I Bisected this bug to exactly this commit:
 
 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
 writeback: introduce writeback_control.more_io to indicate more io
 
 Reverting it and compiling latest git shows no more signs of that bug.

Thank you very much.

I guess your system has some difficulty in writing back some inodes.
(i.e. a bug disclosed by this patch, the 100% iowait only makes it
more obvious)

I cannot reproduce it with your .config, so would you recompile and
run the kernel with the above commit _and_ the below debugging patch?

Thank you,
Fengguang
---

 fs/fs-writeback.c |   15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- linux-2.6.23-git17.orig/fs/fs-writeback.c
+++ linux-2.6.23-git17/fs/fs-writeback.c
@@ -164,12 +164,25 @@ static void redirty_tail(struct inode *i
list_move(inode-i_list, sb-s_dirty);
 }
 
+#define requeue_io(inode)  \
+   do {\
+   __requeue_io(inode, __LINE__);  \
+   } while (0)
+
 /*
  * requeue inode for re-scanning after sb-s_io list is exhausted.
  */
-static void requeue_io(struct inode *inode)
+static void __requeue_io(struct inode *inode, int line)
 {
list_move(inode-i_list, inode-i_sb-s_more_io);
+
+   printk(KERN_DEBUG redirtied inode %lu size %llu at %02x:%02x(%s), line 
%d.\n,
+   inode-i_ino,
+   i_size_read(inode),
+   MAJOR(inode-i_sb-s_dev),
+   MINOR(inode-i_sb-s_dev),
+   inode-i_sb-s_id,
+   line);
 }
 
 static void inode_sync_complete(struct inode *inode)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 14:37:07 Fengguang Wu wrote:
 On Mon, Oct 22, 2007 at 02:21:21PM +0200, Maxim Levitsky wrote:
  I Bisected this bug to exactly this commit:
  
  2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
  writeback: introduce writeback_control.more_io to indicate more io
  
  Reverting it and compiling latest git shows no more signs of that bug.
 
 Thank you very much.
 
 I guess your system has some difficulty in writing back some inodes.
 (i.e. a bug disclosed by this patch, the 100% iowait only makes it
 more obvious)
 
 I cannot reproduce it with your .config, so would you recompile and
 run the kernel with the above commit _and_ the below debugging patch?
 
 Thank you,
 Fengguang
 ---
 
  fs/fs-writeback.c |   15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)
 
 --- linux-2.6.23-git17.orig/fs/fs-writeback.c
 +++ linux-2.6.23-git17/fs/fs-writeback.c
 @@ -164,12 +164,25 @@ static void redirty_tail(struct inode *i
   list_move(inode-i_list, sb-s_dirty);
  }
  
 +#define requeue_io(inode)\
 + do {\
 + __requeue_io(inode, __LINE__);  \
 + } while (0)
 +
  /*
   * requeue inode for re-scanning after sb-s_io list is exhausted.
   */
 -static void requeue_io(struct inode *inode)
 +static void __requeue_io(struct inode *inode, int line)
  {
   list_move(inode-i_list, inode-i_sb-s_more_io);
 +
 + printk(KERN_DEBUG redirtied inode %lu size %llu at %02x:%02x(%s), line 
 %d.\n,
 + inode-i_ino,
 + i_size_read(inode),
 + MAJOR(inode-i_sb-s_dev),
 + MINOR(inode-i_sb-s_dev),
 + inode-i_sb-s_id,
 + line);
  }
  
  static void inode_sync_complete(struct inode *inode)
 
 

Hi,
Thank you very much too, for helping me.


Applied.
Had to kill klogd, since kernel generates tons of redirtied inode messages.
Size of the kern.log is 863 KB, thus I I don't think it is polite to attachit.
Don't know whenever it is ok to put it on pastebin too.

Anyway, it shows lots of redirtied inode... messages,
and while most of them are at 08:02(sda2) , my reiserfs root partition, some 
are

Oct 22 14:50:27 MAIN kernel: [   73.643794] redirtied inode 2582 size 0 at 
00:0f(tmpfs), line 300.

 line 300 is shown always

(I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup)


Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 03:05:35PM +0200, Maxim Levitsky wrote:
 Hi,
 Thank you very much too, for helping me.

You are welcome :-)

 Applied.
 Had to kill klogd, since kernel generates tons of redirtied inode messages.
 Size of the kern.log is 863 KB, thus I I don't think it is polite to attachit.
 Don't know whenever it is ok to put it on pastebin too.
 
 Anyway, it shows lots of redirtied inode... messages,
 and while most of them are at 08:02(sda2) , my reiserfs root partition, 
 some are
 
 Oct 22 14:50:27 MAIN kernel: [   73.643794] redirtied inode 2582 size 0 at 
 00:0f(tmpfs), line 300.
 
  line 300 is shown always
 
 (I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup)

Thank you for the testing out.

Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?

Thank you,
Fengguang

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Maxim Levitsky
On Monday 22 October 2007 15:10:45 Fengguang Wu wrote:
 On Mon, Oct 22, 2007 at 03:05:35PM +0200, Maxim Levitsky wrote:
  Hi,
  Thank you very much too, for helping me.
 
 You are welcome :-)
 
  Applied.
  Had to kill klogd, since kernel generates tons of redirtied inode messages.
  Size of the kern.log is 863 KB, thus I I don't think it is polite to 
  attachit.
  Don't know whenever it is ok to put it on pastebin too.
  
  Anyway, it shows lots of redirtied inode... messages,
  and while most of them are at 08:02(sda2) , my reiserfs root partition, 
  some are
  
  Oct 22 14:50:27 MAIN kernel: [   73.643794] redirtied inode 2582 size 0 at 
  00:0f(tmpfs), line 300.
  
   line 300 is shown always
  
  (I have /var/run, /var/lock, /dev mounted as tmpfs, default kubuntu setup)
 
 Thank you for the testing out.
 
 Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?
No, I don't think so, like I said it sometimes shows the same message on tmpfs
 
 Thank you,
 Fengguang
 
 
Best Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 100% iowait on one of cpus in current -git

2007-10-22 Thread Fengguang Wu
On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote:
 Hmm, Maybe it's an reiserfs related issue.  Do you have the full log file?

Bingo! It can be reproduced in -mm on reiserfs:

# mkfs.reiserfs /dev/sdb1
# mount /dev/sdb1 /test
# cp bin /test
wait for a while
# dmesg
[...]
[  418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1)
[  418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1)
[  418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1)
[  418.346131] requeue_io 308: inode 9 size 180 at 08:11(sdb1)
[  418.346136] requeue_io 308: inode 10 size 1488 at 08:11(sdb1)
[  418.346142] requeue_io 308: inode 12 size 1358 at 08:11(sdb1)
[  418.346148] requeue_io 308: inode 13 size 482 at 08:11(sdb1)
[  418.346153] requeue_io 308: inode 14 size 171 at 08:11(sdb1)
[  418.346159] requeue_io 308: inode 15 size 93 at 08:11(sdb1)
[  418.346164] requeue_io 308: inode 16 size 81 at 08:11(sdb1)
[  418.346170] requeue_io 308: inode 17 size 212 at 08:11(sdb1)
[  418.346176] requeue_io 308: inode 18 size 431 at 08:11(sdb1)
[  418.346181] requeue_io 308: inode 19 size 231 at 08:11(sdb1)
[  418.346187] requeue_io 308: inode 20 size 1756 at 08:11(sdb1)
[  418.346193] requeue_io 308: inode 21 size 1229 at 08:11(sdb1)
[  418.346198] requeue_io 308: inode 22 size 157 at 08:11(sdb1)
[  418.346204] requeue_io 308: inode 23 size 3430 at 08:11(sdb1)
[  418.346210] requeue_io 308: inode 24 size 200 at 08:11(sdb1)
[  418.346215] requeue_io 308: inode 25 size 202 at 08:11(sdb1)
[  418.346221] requeue_io 308: inode 26 size 386 at 08:11(sdb1)
[  418.346226] requeue_io 308: inode 27 size 264 at 08:11(sdb1)
[  418.346232] requeue_io 308: inode 28 size 268 at 08:11(sdb1)
[  418.346238] requeue_io 308: inode 29 size 1228 at 08:11(sdb1)
[  418.346243] requeue_io 308: inode 30 size 404 at 08:11(sdb1)
[  418.346249] requeue_io 308: inode 31 size 2452 at 08:11(sdb1)
[  418.346255] requeue_io 308: inode 32 size 1236 at 08:11(sdb1)
[  418.346260] requeue_io 308: inode 33 size 655 at 08:11(sdb1)
[  418.346266] requeue_io 308: inode 35 size 330 at 08:11(sdb1)
[  418.346272] requeue_io 308: inode 36 size 248 at 08:11(sdb1)
[  418.346277] requeue_io 308: inode 37 size 683 at 08:11(sdb1)
[  418.346283] requeue_io 308: inode 38 size 1451 at 08:11(sdb1)
[  418.346288] requeue_io 308: inode 39 size 894 at 08:11(sdb1)
[  418.346294] requeue_io 308: inode 40 size 879 at 08:11(sdb1)
[  418.346300] requeue_io 308: inode 42 size 797 at 08:11(sdb1)
[  418.346305] requeue_io 308: inode 43 size 1314 at 08:11(sdb1)
[  418.346311] requeue_io 308: inode 44 size 1463 at 08:11(sdb1)
[  418.346317] requeue_io 308: inode 45 size 3032 at 08:11(sdb1)
[  418.346322] requeue_io 308: inode 46 size 325 at 08:11(sdb1)
[  418.346328] requeue_io 308: inode 47 size 583 at 08:11(sdb1)
[  418.346334] requeue_io 308: inode 48 size 1660 at 08:11(sdb1)
[  418.346339] requeue_io 308: inode 49 size 3159 at 08:11(sdb1)
[  418.346345] requeue_io 308: inode 50 size 510 at 08:11(sdb1)
[  418.346350] requeue_io 308: inode 51 size 100 at 08:11(sdb1)
[  418.346356] requeue_io 308: inode 52 size 143 at 08:11(sdb1)
[  418.346370] requeue_io 308: inode 53 size 954 at 08:11(sdb1)
[  418.346373] requeue_io 308: inode 54 size 322 at 08:11(sdb1)
[  418.346376] requeue_io 308: inode 55 size 970 at 08:11(sdb1)
[  418.346379] requeue_io 308: inode 57 size 483 at 08:11(sdb1)
[  418.346382] requeue_io 308: inode 58 size 1125 at 08:11(sdb1)
[  418.346385] requeue_io 308: inode 59 size 2196 at 08:11(sdb1)
[  418.346388] requeue_io 308: inode 60 size 104 at 08:11(sdb1)
[  418.346391] requeue_io 308: inode 61 size 488 at 08:11(sdb1)
[  418.346394] requeue_io 308: inode 62 size 116 at 08:11(sdb1)
[  418.346397] requeue_io 308: inode 63 size 907 at 08:11(sdb1)
[  418.346400] requeue_io 308: inode 64 size 1076 at 08:11(sdb1)
[  418.346403] requeue_io 308: inode 65 size 460 at 08:11(sdb1)
[  418.346406] requeue_io 308: inode 66 size 1092 at 08:11(sdb1)
[  418.346409] requeue_io 308: inode 67 size 424 at 08:11(sdb1)
[  418.346412] requeue_io 308: inode 68 size 696 at 08:11(sdb1)
[  418.346415] requeue_io 308: inode 70 size 137 at 08:11(sdb1)
[  418.346418] requeue_io 308: inode 71 size 201 at 08:11(sdb1)
[  418.346421] requeue_io 308: inode 72 size 150 at 08:11(sdb1)
[  418.346424] requeue_io 308: inode 73 size 188 at 08:11(sdb1)
[  418.346427] requeue_io 308: inode 75 size 1208 at 08:11(sdb1)
[  418.346431] requeue_io 308: inode 76 size 493 at 08:11(sdb1)
[  418.346434] requeue_io 308: inode 77 size 484 at 08:11(sdb1)
[  418.346437] requeue_io 308: inode 78 size 356 at 08:11(sdb1)
[  418.346440] requeue_io 308: inode 79 size 895 at 08:11(sdb1)
[  418.346443] requeue_io 308: inode 80 size 847 at 08:11(sdb1)
[  418.346446] requeue_io 308: inode 81 size 3281 at 08:11(sdb1)
[  418.346449] requeue_io 308: inode 82 size 3329 at 08:11(sdb1)
[  418.346452] requeue_io 308: inode 83 size 115 at 08:11(sdb1)
[  418.346455] requeue_io 308: inode 84 size 644 at 08:11(sdb1)
[  418.346458] requeue_io