Re: kernel 2.6.23: what IS the VM doing?

2007-11-02 Thread Sami Farin
On Fri, Sep 14, 2007 at 20:17:46 +0300, Sami Farin wrote:
> On Wed, Sep 05, 2007 at 18:48:51 -0400, Rik van Riel wrote:
> > Sami Farin wrote:
> >> On Wed, Sep 05, 2007 at 12:24:26 -0400, Rik van Riel wrote:
> >> ...
>  *shrug*
> >>> The attached patch should make sure kswapd does not free an
> >>> excessive number of pages in zone_normal just because the
> >>> pages in zone_highmem are difficult to free.
> >>>
> >>> It does give kswapd a large margin to continue putting equal
> >>> pressure on all zones in normal situations.
> >>>
> >>> Sami, could you try out this patch to see if it helps your
> >>> situation?
> >>
> >> Thanks, Rik.  bzImage is ready, I probably reboot inside one
> >> month for a reason or other 8-)
> >
> > The more I look at the bug, the more I see that it is probably
> > not very easy to reproduce on demand.  I have, however, a full
> 
> Well, I now booted to x86_64 kernel.
> 
> I can still reproduce this.
> When I unload ipset modules, kernel resumes "normal" operation, i.e.,
> not swapping like mad.

I now have 2GB of extra RAM, so 3GB in total, on x86_64 system.
If ipset tries to allocate 512 KB or more, kernel
goes into swapping frenzy, of which system does not recover
in over 30 minutes unless I press sysrq+k.

some kernel settings...:
vm.dirty_ratio = 4
vm.dirty_background_ratio = 2
vm.swappiness = 10
vm.vfs_cache_pressure = 10
vm.overcommit_memory = 2

SysRq : Show Blocked State
  taskPC stack   pid father
kswapd0   D  0   258  2
 8100be333ca0 0046  0286
 8100be333c50  8100be560080 807af3c0
 0064 0001285af6e6 00ff 802463f8
Call Trace:
 [] __mod_timer+0xb8/0xd0
 [] schedule_timeout+0x63/0xd0
 [] process_timeout+0x0/0x10
 [] io_schedule_timeout+0x28/0x40
 [] congestion_wait+0x8c/0xb0
 [] autoremove_wake_function+0x0/0x40
 [] throttle_vm_writeout+0x54/0xc0
 [] shrink_zone+0xe3/0x140
 [] kswapd+0x510/0x5b0
 [] autoremove_wake_function+0x0/0x40
 [] kswapd+0x0/0x5b0
 [] kthread+0x4d/0x80
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x80
 [] child_rip+0x0/0x12

irqbalanceD 00aa7f00 0  2110  1
 8100b8c4fcd8 0082  
 8100a5e57f00 00080076 8100be5f91c0 810060182140
 8100b8c4fc88 0282 8100b8c4fcb8 8040e1f6
Call Trace:
 [] __up_read+0x46/0xb0
 [] io_schedule+0x28/0x40
 [] sync_page+0x37/0x50
 [] __wait_on_bit_lock+0x4e/0x80
 [] sync_page+0x0/0x50
 [] __lock_page+0x65/0x70
 [] wake_bit_function+0x0/0x30
 [] handle_mm_fault+0x269/0x870
 [] do_page_fault+0x1a4/0x900
 [] free_pages_and_swap_cache+0x9e/0xd0
 [] unmap_region+0x136/0x150
 [] remove_vma+0x5e/0x70
 [] __up_write+0xd0/0x130
 [] error_exit+0x0/0x84

svscanD 00df7900 0  2438  1
 8100b6607cd8 0082  
 81005f81fd80 00080076 8100bb516180 810060182140
 8100b6607c88 0282 8100b6607cb8 8040e1f6
Call Trace:
 [] __up_read+0x46/0xb0
 [] io_schedule+0x28/0x40
 [] sync_page+0x37/0x50
 [] __wait_on_bit_lock+0x4e/0x80
 [] sync_page+0x0/0x50
 [] __lock_page+0x65/0x70
 [] wake_bit_function+0x0/0x30
 [] handle_mm_fault+0x269/0x870
 [] xfs_vn_getattr+0x4c/0x140
 [] do_page_fault+0x1a4/0x900
 [] vfs_getattr+0x60/0x90
 [] vfs_fstat+0x45/0x60
 [] sys32_fstat64+0x2e/0x40
 [] error_exit+0x0/0x84
...
ipset D  0  3713   3574
 8100566237a8 0086 810056623748 802335e2
 810056623748 0010 8100b085e0c0 810060182140
 810056623758 0282 810056623788 8040e1f6
Call Trace:
 [] enqueue_entity+0x42/0x60
 [] __up_read+0x46/0xb0
 [] io_schedule+0x28/0x40
 [] sync_page+0x37/0x50
 [] __wait_on_bit+0x55/0x80
 [] sync_page+0x0/0x50
 [] wait_on_page_bit+0x6f/0x80
 [] wake_bit_function+0x0/0x30
 [] shrink_page_list+0x164/0x680
 [] del_timer_sync+0x1a/0x30
 [] schedule_timeout+0x6b/0xd0
 [] process_timeout+0x0/0x10
 [] congestion_wait+0x9a/0xb0
 [] shrink_inactive_list+0x40a/0x420
 [] shrink_zone+0xcf/0x140
 [] try_to_free_pages+0x174/0x270
 [] __alloc_pages+0x160/0x350
 [] printk+0x67/0x70
 [] cache_alloc_refill+0x2e3/0x570
 [] __kmalloc+0x113/0x120
 [] :ip_set_nethash:retry+0x183/0x500
 [] :ip_set:__ip_set_addip+0x6f/0x90
 [] :ip_set:ip_set_sockfn_get+0x93d/0xd50
 [] nf_sockopt+0x142/0x150
 [] nf_getsockopt+0xf/0x20
 [] ip_getsockopt+0x98/0xc0
 [] raw_getsockopt+0x11/0x30
 [] sock_common_getsockopt+0xf/0x20
 [] sys_getsockopt+0x9c/0xd0
 [] tracesys+0xdc/0xe1


SysRq : Show Memory
Mem-info:
DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
DMA32 per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd:  22   Cold: hi:   62, btch:  15 usd:  55
CPU1: Hot: hi:  186, btch:  

Re: kernel 2.6.23: what IS the VM doing?

2007-11-02 Thread Sami Farin
On Fri, Sep 14, 2007 at 20:17:46 +0300, Sami Farin wrote:
 On Wed, Sep 05, 2007 at 18:48:51 -0400, Rik van Riel wrote:
  Sami Farin wrote:
  On Wed, Sep 05, 2007 at 12:24:26 -0400, Rik van Riel wrote:
  ...
  *shrug*
  The attached patch should make sure kswapd does not free an
  excessive number of pages in zone_normal just because the
  pages in zone_highmem are difficult to free.
 
  It does give kswapd a large margin to continue putting equal
  pressure on all zones in normal situations.
 
  Sami, could you try out this patch to see if it helps your
  situation?
 
  Thanks, Rik.  bzImage is ready, I probably reboot inside one
  month for a reason or other 8-)
 
  The more I look at the bug, the more I see that it is probably
  not very easy to reproduce on demand.  I have, however, a full
 
 Well, I now booted to x86_64 kernel.
 
 I can still reproduce this.
 When I unload ipset modules, kernel resumes normal operation, i.e.,
 not swapping like mad.

I now have 2GB of extra RAM, so 3GB in total, on x86_64 system.
If ipset tries to allocate 512 KB or more, kernel
goes into swapping frenzy, of which system does not recover
in over 30 minutes unless I press sysrq+k.

some kernel settings...:
vm.dirty_ratio = 4
vm.dirty_background_ratio = 2
vm.swappiness = 10
vm.vfs_cache_pressure = 10
vm.overcommit_memory = 2

SysRq : Show Blocked State
  taskPC stack   pid father
kswapd0   D  0   258  2
 8100be333ca0 0046  0286
 8100be333c50  8100be560080 807af3c0
 0064 0001285af6e6 00ff 802463f8
Call Trace:
 [802463f8] __mod_timer+0xb8/0xd0
 [80657da3] schedule_timeout+0x63/0xd0
 [80246120] process_timeout+0x0/0x10
 [806576f8] io_schedule_timeout+0x28/0x40
 [8027f5fc] congestion_wait+0x8c/0xb0
 [802524e0] autoremove_wake_function+0x0/0x40
 [80279ad4] throttle_vm_writeout+0x54/0xc0
 [8027d6f3] shrink_zone+0xe3/0x140
 [8027dea0] kswapd+0x510/0x5b0
 [802524e0] autoremove_wake_function+0x0/0x40
 [8027d990] kswapd+0x0/0x5b0
 [802520dd] kthread+0x4d/0x80
 [8020cae8] child_rip+0xa/0x12
 [80252090] kthread+0x0/0x80
 [8020cade] child_rip+0x0/0x12

irqbalanceD 00aa7f00 0  2110  1
 8100b8c4fcd8 0082  
 8100a5e57f00 00080076 8100be5f91c0 810060182140
 8100b8c4fc88 0282 8100b8c4fcb8 8040e1f6
Call Trace:
 [8040e1f6] __up_read+0x46/0xb0
 [806579d8] io_schedule+0x28/0x40
 [802736e7] sync_page+0x37/0x50
 [80657e9e] __wait_on_bit_lock+0x4e/0x80
 [802736b0] sync_page+0x0/0x50
 [80273695] __lock_page+0x65/0x70
 [80252520] wake_bit_function+0x0/0x30
 [80282289] handle_mm_fault+0x269/0x870
 [802261b4] do_page_fault+0x1a4/0x900
 [8028cafe] free_pages_and_swap_cache+0x9e/0xd0
 [80285b86] unmap_region+0x136/0x150
 [80285bfe] remove_vma+0x5e/0x70
 [8040e330] __up_write+0xd0/0x130
 [80659d9d] error_exit+0x0/0x84

svscanD 00df7900 0  2438  1
 8100b6607cd8 0082  
 81005f81fd80 00080076 8100bb516180 810060182140
 8100b6607c88 0282 8100b6607cb8 8040e1f6
Call Trace:
 [8040e1f6] __up_read+0x46/0xb0
 [806579d8] io_schedule+0x28/0x40
 [802736e7] sync_page+0x37/0x50
 [80657e9e] __wait_on_bit_lock+0x4e/0x80
 [802736b0] sync_page+0x0/0x50
 [80273695] __lock_page+0x65/0x70
 [80252520] wake_bit_function+0x0/0x30
 [80282289] handle_mm_fault+0x269/0x870
 [803acc2c] xfs_vn_getattr+0x4c/0x140
 [802261b4] do_page_fault+0x1a4/0x900
 [8029cf80] vfs_getattr+0x60/0x90
 [8029d515] vfs_fstat+0x45/0x60
 [8022b35e] sys32_fstat64+0x2e/0x40
 [80659d9d] error_exit+0x0/0x84
...
ipset D  0  3713   3574
 8100566237a8 0086 810056623748 802335e2
 810056623748 0010 8100b085e0c0 810060182140
 810056623758 0282 810056623788 8040e1f6
Call Trace:
 [802335e2] enqueue_entity+0x42/0x60
 [8040e1f6] __up_read+0x46/0xb0
 [806579d8] io_schedule+0x28/0x40
 [802736e7] sync_page+0x37/0x50
 [80657fb5] __wait_on_bit+0x55/0x80
 [802736b0] sync_page+0x0/0x50
 [802738bf] wait_on_page_bit+0x6f/0x80
 [80252520] wake_bit_function+0x0/0x30
 [8027ccd4] shrink_page_list+0x164/0x680
 [8024632a] del_timer_sync+0x1a/0x30
 [80657dab] schedule_timeout+0x6b/0xd0
 [80246120] process_timeout+0x0/0x10
 [8027f60a] congestion_wait+0x9a/0xb0
 [8027d5fa]