Problems with 2.6.11-rc4, Opteron server and MPTBase : Round 2

2005-02-22 Thread Weathers, Norman R.
OK, some more information concerning the previous problems with
2.6.11-rc4.

2.6.11-rc3 does exactly the same thing as 2.6.11-rc4: it crashes
whenever we try to boot our Opteron based server, which has an LSI MPT
Fusion based SCSI card as its primary card.  Now comes the weird
part...  It only crashes if mptbase and mptscsih are built as modules.
If the drivers are built into the kernel, the 2.6.11-rc3 kernel boots
just fine.  I am going to see if the 2.6.11-rc4 kernel boots as well
when the drivers are built in.
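
One quick way to confirm which of the two cases a given build falls into
is to look at the Fusion MPT symbol in the kernel .config.  A minimal
sketch in Python follows.  It assumes the 2.6.11 symbol is CONFIG_FUSION
(a tristate covering mptbase and mptscsih); that name should be checked
against drivers/message/fusion/Kconfig in the tree being built.  The
function name and the sample config text are made up for illustration.

```python
import re

def config_state(config_text, symbol):
    """Return 'y' (built in), 'm' (module) or 'n' (not set) for a symbol
    in the text of a kernel .config file."""
    match = re.search(rf"^{re.escape(symbol)}=(\S+)$", config_text, re.MULTILINE)
    # A disabled symbol appears as "# CONFIG_X is not set", so no match means 'n'.
    return match.group(1) if match else "n"

# Hypothetical .config excerpt for the crashing (modular) build.
sample = "CONFIG_SCSI=y\nCONFIG_FUSION=m\n"

state = config_state(sample, "CONFIG_FUSION")  # CONFIG_FUSION is an assumed name
print({"y": "built in", "m": "module", "n": "not set"}[state])
```

With the real file, the text would come from open(".config").read() in
the build directory.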

Thanks again for any help anyone can give.

Norman Weathers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Problems with 2.6.11-rc4, Opteron server and MPTBase

2005-02-22 Thread Weathers, Norman R.


---Original Post---
Weathers, Norman R. wrote:

>To all whom it may concern:
>
>
>I am having trouble with several of the 2.6 kernels.  The latest one is
>perhaps the most annoying.
>
>I have a dual Opteron based NFS server that keeps crashing when I try to
>boot up with 2.6.11-rc4.
>
>The node is trying to boot from an mptbase device, and it is also
>loading modules for a QLogic Fibre Channel card (qla2300, qla2xxx, and
>scsi_transport_fc).  Now, as it is scanning the drives, it does a
>perfect impersonation of a dying duck and crashes.
>
>Here is the output from the crash:
>
>Fusion MPT base driver 3.01.18
>Loading scsi_modCopyright (c) 1999-2004 LSI Logic Corporation
>.ko module
>Loadmptbase: Initiating ioc0 bringup
>ing sd_mod.ko module
>Loading mptbase.ko module
>ioc0: 53C1030: Capabilities={Initiator}
>Unable to handle kernel paging request at 25b0 RIP: 8012064d{vmalloc_fault+557}
>PGD 821ad067 PUD 2c50067 PMD 0
>Oops:  [1] SMP
>CPU 0
>Modules linked in: mptbase sd_mod scsi_mod
>Pid: 0, comm: swapper Not tainted 2.6.11-rc4
>RIP: 0010:[8012064d] 8012064d{vmalloc_fault+557}
>RSP: :80455230  EFLAGS: 00010212
>RAX: 000fe050 RBX: 0001 RCX: 0018
>RDX:  RSI: 03fff000 RDI: 3fff
>RBP:  R08: 8100fba3c000 R09: fba3c000
>R10: 0008 R11: 810081b44760 R12: 80455338
>R13: 0003 R14: c244 R15:
>FS:  () GS:804c1800() knlGS:
>CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
>CR2: 25b0 CR3: 02c58000 CR4: 06e0
>Process swapper (pid: 0, threadinfo 804c8000, task 80358380)
>Stack: 801207ce 0001 0001 80455278
>   80358380 80455338 80317933
>   000b000e 0082
>Call Trace:<IRQ> 801207ce{do_page_fault+238} 8014c179{autoremove_wake_function+9}
>   80131d83{__wake_up_common+67} 8010eddd{error_exit+0}
>   8802c02d{:mptbase:mpt_interrupt+45} 8013fbd9{update_wall_time+9}
>   8015777c{handle_IRQ_event+44} 8015788e{__do_IRQ+222}
>   80111392{do_IRQ+66} 8010e981{ret_from_intr+0}
>   <EOI> 802f7c4a{thread_return+42} 8010c420{default_idle+0}
>   8010c444{default_idle+36} 8010c58a{cpu_idle+58}
>   804ca910{start_kernel+416} 804ca294{x86_64_start_kernel+404}
>
>Code: 48 2b 82 b0 25 00 00 48 8d 34 c5 00 00 00 00 48 29 c6 48 8b
>RIP 8012064d{vmalloc_fault+557} RSP 80455230
>CR2: 25b0
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
>
>Has anyone seen this in this kernel?  2.6.7 - 2.6.10 have not had a
>problem booting, but there have been other problems that are forcing us
>to move up to a newer kernel: 2.6.7 has stability issues, 2.6.9 had some
>interesting issues with our IBM servers and USB keyboards (complete
>lockups), and I had problems with kswapd on 2.6.7 - 2.6.10.
>
>Thanks for any light you may be able to shed on this problem.  Please CC
>me.  I was on the kernel list, but I think my company has blocked that
>email due to the volume of the traffic.
>
>Norman Weathers

---End Original Post--

---Response from Original Post---
Hi!
Did you change any configuration options, or add or remove hardware?

Matthias-Christian Ott

---End Response

>>>>>> My Response <<<<<<<<<

No, nothing has changed on the box outside of trying to get the OS up
and running stably.

I forgot to mention last time that the OS is Fedora Core 2, and the
kernel was compiled on that box using the GCC on that box.  I can get
the config and anything else that anyone may need to help solve this
problem.  In the meantime, I am trying to download 2.6.11-rc3 to see if
it will boot correctly on this box.  If it does, then there is some
change between rc3 and rc4 that may have caused the problem.

Norman Weathers



Problems with 2.6.11-rc4, Opteron server and MPTBase

2005-02-22 Thread Weathers, Norman R.

To all whom it may concern:


I am having trouble with several of the 2.6 kernels.  The latest one is
perhaps the most annoying.

I have a dual Opteron based NFS server that keeps crashing when I try to
boot up with 2.6.11-rc4.

The node is trying to boot from an mptbase device, and it is also
loading modules for a QLogic Fibre Channel card (qla2300, qla2xxx, and
scsi_transport_fc).  Now, as it is scanning the drives, it does a
perfect impersonation of a dying duck and crashes.

Here is the output from the crash:

Fusion MPT base driver 3.01.18
Loading scsi_modCopyright (c) 1999-2004 LSI Logic Corporation
.ko module
Loadmptbase: Initiating ioc0 bringup
ing sd_mod.ko module
Loading mptbase.ko module
ioc0: 53C1030: Capabilities={Initiator}
Unable to handle kernel paging request at 25b0 RIP: 8012064d{vmalloc_fault+557}
PGD 821ad067 PUD 2c50067 PMD 0
Oops:  [1] SMP
CPU 0
Modules linked in: mptbase sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.11-rc4
RIP: 0010:[8012064d] 8012064d{vmalloc_fault+557}
RSP: :80455230  EFLAGS: 00010212
RAX: 000fe050 RBX: 0001 RCX: 0018
RDX:  RSI: 03fff000 RDI: 3fff
RBP:  R08: 8100fba3c000 R09: fba3c000
R10: 0008 R11: 810081b44760 R12: 80455338
R13: 0003 R14: c244 R15:
FS:  () GS:804c1800() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 25b0 CR3: 02c58000 CR4: 06e0
Process swapper (pid: 0, threadinfo 804c8000, task 80358380)
Stack: 801207ce 0001 0001 80455278
   80358380 80455338 80317933
   000b000e 0082
Call Trace:<IRQ> 801207ce{do_page_fault+238} 8014c179{autoremove_wake_function+9}
   80131d83{__wake_up_common+67} 8010eddd{error_exit+0}
   8802c02d{:mptbase:mpt_interrupt+45} 8013fbd9{update_wall_time+9}
   8015777c{handle_IRQ_event+44} 8015788e{__do_IRQ+222}
   80111392{do_IRQ+66} 8010e981{ret_from_intr+0}
   <EOI> 802f7c4a{thread_return+42} 8010c420{default_idle+0}
   8010c444{default_idle+36} 8010c58a{cpu_idle+58}
   804ca910{start_kernel+416} 804ca294{x86_64_start_kernel+404}

Code: 48 2b 82 b0 25 00 00 48 8d 34 c5 00 00 00 00 48 29 c6 48 8b
RIP 8012064d{vmalloc_fault+557} RSP 80455230
CR2: 25b0
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
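
For anyone reading the dump, the first bytes of the Code: line can be
tied back to the CR2 value by hand.  The sketch below is a rough Python
illustration, not a disassembler: it handles only the one encoding seen
here (REX.W prefix 48, opcode 2b, ModRM with a 32-bit displacement), and
it assumes the Code: bytes begin at the faulting RIP, which I believe is
how x86_64 kernels of this era printed them.  The helper name is made up.

```python
# Bytes copied from the "Code:" line of the oops.
code = bytes.fromhex("48 2b 82 b0 25 00 00 48 8d 34 c5 00 00 00 00 48 29 c6 48 8b")

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_sub_disp32(insn):
    """Decode 'REX.W 2B /r' (SUB r64, r/m64) with mod=10 (disp32) and no SIB byte."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x2B:
        raise ValueError("not the REX.W SUB r64, r/m64 form")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 2 or rm == 4:
        raise ValueError("only the plain base+disp32 form is handled")
    # The displacement is the next four bytes, little-endian and signed.
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp

dest, base, disp = decode_sub_disp32(code)
print(f"sub {dest}, [{base} + {disp:#x}]")
```

The decoded operand is [rdx + 0x25b0].  That matches the CR2 value of
25b0 only if RDX was zero, which would fit the blank RDX field in the
register dump above, so the fault looks like a read through a NULL base
pointer inside vmalloc_fault.  That reading is an inference from a
partially stripped dump, not something the log states outright.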

Has anyone seen this in this kernel?  2.6.7 - 2.6.10 have not had a
problem booting, but there have been other problems that are forcing us
to move up to a newer kernel: 2.6.7 has stability issues, 2.6.9 had some
interesting issues with our IBM servers and USB keyboards (complete
lockups), and I had problems with kswapd on 2.6.7 - 2.6.10.

Thanks for any light you may be able to shed on this problem.  Please CC
me.  I was on the kernel list, but I think my company has blocked that
email due to the volume of the traffic.

Norman Weathers



RE: 2.6.10: kswapd spins like crazy

2005-02-04 Thread Weathers, Norman R.


We have had a similar problem with all kernels since 2.6.8.1.  It has
gotten so bad that we had to drop back to 2.6.7 with some extra patches
to get our systems working.  Our situation is a little bit different.

We are using SMP Opteron boxes as NFS servers.  Under almost any load at
all, kswapd goes nuts, taking up 99% of the CPU cycles for long periods
of time.  With 2.6.7, this has not been as bad: just periods of about
3 - 5 seconds at 10 - 35% utilization, then off for a few seconds, then
back again.  Sometimes kswapd lingers longer as the most aggressive app
in top, but with 2.6.7 the nfsd's are the most prevalent.

Also, we have noticed something else.  Our servers have dual Broadcom
gigabit NICs (Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
(rev 03)).  We have bonded both NICs back to our core switch, both
running at gigabit speed.  Under different loads, we start to get call
traces in dmesg and the syslog.  An excerpt follows:


Call Trace: {__alloc_pages+816}
{del_timer+115}
   {__get_free_pages+16}
{kmem_getpages+38}
   {cache_grow+190}
{cache_alloc_refill+422}
   {kmem_cache_alloc+54}
{dst_alloc+47}
   {ip_route_input_slow+1639}
{udp_rcv+267}
   {ip_rcv+526}
{netif_receive_skb+477}

{:bcm5700:MM_IndicateRxPackets+920}
   {:bcm5700:bcm5700_poll+158}
{net_rx_action+132}
   {__do_softirq+113}
{do_softirq+53}
   {do_IRQ+335}
{ret_from_intr+0}
 {thread_return+41}
{default_idle+0}
   {default_idle+36}
{cpu_idle+44}
   {start_kernel+453}
swapper: page allocation failure. order:0, mode:0x20

Call Trace: {__alloc_pages+816}
{end_8259A_irq+100}
   {__get_free_pages+16}
{kmem_getpages+38}
   {cache_grow+190}
{cache_alloc_refill+422}
   {kmem_cache_alloc+54}
{dst_alloc+47}
   {ip_route_input_slow+1639}
{ip_rcv+526}
   {try_to_wake_up+523}
{netif_receive_skb+477}

{:bcm5700:MM_IndicateRxPackets+920}
   {:bcm5700:bcm5700_poll+158}
{net_rx_action+132}
   {__do_softirq+113}
{do_softirq+53}
   {do_IRQ+335}
{ret_from_intr+0}
 {thread_return+41}
{default_idle+0}
   {default_idle+36}
{cpu_idle+44}
   {start_kernel+453}
swapper: page allocation failure. order:0, mode:0x20

Call Trace: {__alloc_pages+816}
{end_8259A_irq+100}
   {__get_free_pages+16}
{kmem_getpages+38}
   {cache_grow+190}
{cache_alloc_refill+422}
   {kmem_cache_alloc+54}
{dst_alloc+47}
   {ip_route_input_slow+1639}
{udp_rcv+267}
   {ip_rcv+526}
{netif_receive_skb+477}

{:bcm5700:MM_IndicateRxPackets+920}
   {:bcm5700:bcm5700_poll+158}
{net_rx_action+132}
   {__do_softirq+113}
{do_softirq+53}
   {do_IRQ+335}
{ret_from_intr+0}
 {thread_return+41}
{default_idle+0}
   {default_idle+36}
{cpu_idle+44}
   {start_kernel+453}
swapper: page allocation failure. order:0, mode:0x20

Call Trace: {__alloc_pages+816}
{end_8259A_irq+100}
   {__get_free_pages+16}
{kmem_getpages+38}
   {cache_grow+190}
{cache_alloc_refill+422}
   {kmem_cache_alloc+54}
{dst_alloc+47}
   {ip_route_input_slow+1639}
{ip_rcv+526}
   {netif_receive_skb+477}
{:bcm5700:MM_IndicateRxPackets+920}
   {:bcm5700:bcm5700_poll+158}
{net_rx_action+132}
   {__do_softirq+113}
{do_softirq+53}
   {do_IRQ+335}
{ret_from_intr+0}
 {thread_return+41}
{default_idle+0}
   {default_idle+36}
{cpu_idle+44}
   {start_kernel+453}
swapper: page allocation failure. order:0, mode:0x20

Call Trace: {__alloc_pages+816}
{__get_free_pages+16}
   {kmem_getpages+38}
{cache_grow+190}
   {cache_alloc_refill+422}
{kmem_cache_alloc+54}
   {dst_alloc+47}
{ip_route_input_slow+1639}
   {ip_rcv+526}
{netif_receive_skb+477}

This was just a partial listing from one of our servers.  I had read on
several lists that this is not considered fatal.  The problem is that
with our setup it has turned fatal, to the point of locking us out of
the system remotely; only a reset at the machine itself worked (it
didn't even honor the sysrq-b combo at the console).
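
As a reading aid for the "page allocation failure. order:0, mode:0x20"
lines in the excerpt, here is a small Python sketch that decodes the two
fields.  The flag table is an assumption based on 2.6.10-era
include/linux/gfp.h and lists only the low bits; it should be verified
against the headers of the kernel actually running.  The function name
is made up for illustration.

```python
import re

# Assumed flag bits (2.6.10-era include/linux/gfp.h); verify before relying on them.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_alloc_failure(line):
    """Return (number of contiguous pages requested, list of flag names)."""
    match = re.search(r"order:(\d+), mode:(0x[0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("not a page allocation failure line")
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    pages = 1 << order  # an order-N request asks for 2**N contiguous pages
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    return pages, flags

pages, flags = decode_alloc_failure(
    "swapper: page allocation failure. order:0, mode:0x20")
print(pages, flags)
```

Under that table, mode 0x20 is __GFP_HIGH alone, which I understand to
be what GFP_ATOMIC expanded to at the time: a single-page allocation
that is not allowed to sleep or reclaim, which would fit the
softirq/net_rx_action path in the traces.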

Has anyone else run into this?  I can get this kind of error using about
20 clients (100 MB connected) hitting one server (dual gigabit bonded).
With 2.6.8.1 and newer, the errors are reproducible, but I can't tell
exactly when they happen (write or read).  I think I have seen them
happen during both writes and reads, and the kswapd problems also
happened during both.

I can also get kswapd going crazy with a local set of disk I/O
tests.

If any information is needed, please ask.  Any help would be appreciated.

Thanks,
Norman Weathers




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Nick Piggin
Sent: Thursday, February 03, 2005 7:20 PM
To: Andrew Morton
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED]
Subject: Re: 2.6.10: kswapd spins like crazy



Andrew Morton wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
>>Oh, attached should be a minimal fix if you would 

RE: 2.6.10: kswapd spins like crazy

2005-02-04 Thread Weathers, Norman R.


We have had a similar problem with all kernels since 2.6.8.1.  It has
gotten so bad that we had to drop back to 2.6.7 with some extra patches
to get our systems working.  Our situation is a little bit different.

We are using SMP Opteron boxes as NFS servers.  Under almost any load at
all, kswapd goes nuts, taking up 99% of the CPU cycles for long periods
of time.  With 2.6.7, this has not been as bad: just periods of about
3 - 5 seconds at 10 - 35% utilization, then off for a few seconds, then
back again.  Sometimes kswapd lingers longer as the most aggressive app
in top, but with 2.6.7 the nfsd's are the most prevalent.

Also, we have noticed something else.  Our servers have dual Broadcom
gigabit NICs (Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
(rev 03)).  We have bonded both NICs back to our core switch, both
running at gigabit speed.  Under different loads, we start to get call
traces in dmesg and the syslog.  An excerpt follows:


Jan/06 03:50 pmCall Trace:IRQ 80158fa0{__alloc_pages+816}
8013ffd3{del_timer+115}
Jan/06 03:50 pm   80158fe0{__get_free_pages+16}
8015c886{kmem_getpages+38}
Jan/06 03:50 pm   8015d8be{cache_grow+190}
8015db16{cache_alloc_refill+422}
Jan/06 03:50 pm   8015de06{kmem_cache_alloc+54}
802d5eaf{dst_alloc+47}
Jan/06 03:50 pm   802e3d17{ip_route_input_slow+1639}
803085bb{udp_rcv+267}
Jan/06 03:50 pm   802e612e{ip_rcv+526}
802d297d{netif_receive_skb+477}
Jan/06 03:50 pm
a0120fe8{:bcm5700:MM_IndicateRxPackets+920}
Jan/06 03:50 pm   a011c9fe{:bcm5700:bcm5700_poll+158}
802d2b94{net_rx_action+132}
Jan/06 03:50 pm   8013c4b1{__do_softirq+113}
8013c565{do_softirq+53}
Jan/06 03:50 pm   80113baf{do_IRQ+335}
80111001{ret_from_intr+0}
Jan/06 03:50 pmEOI 8031e419{thread_return+41}
8010eb20{default_idle+0}
Jan/06 03:50 pm   8010eb44{default_idle+36}
8010ebdc{cpu_idle+44}
Jan/06 03:50 pm   80517885{start_kernel+453}
Jan/06 03:50 pmswapper: page allocation failure. order:0, mode:0x20

Jan/06 03:50 pmCall Trace:IRQ 80158fa0{__alloc_pages+816}
801158b4{end_8259A_irq+100}
Jan/06 03:50 pm   80158fe0{__get_free_pages+16}
8015c886{kmem_getpages+38}
Jan/06 03:50 pm   8015d8be{cache_grow+190}
8015db16{cache_alloc_refill+422}
Jan/06 03:50 pm   8015de06{kmem_cache_alloc+54}
802d5eaf{dst_alloc+47}
Jan/06 03:50 pm   802e3d17{ip_route_input_slow+1639}
802e612e{ip_rcv+526}
Jan/06 03:50 pm   80131b2b{try_to_wake_up+523}
802d297d{netif_receive_skb+477}
Jan/06 03:50 pm
a0120fe8{:bcm5700:MM_IndicateRxPackets+920}
Jan/06 03:50 pm   a011c9fe{:bcm5700:bcm5700_poll+158}
802d2b94{net_rx_action+132}
Jan/06 03:50 pm   8013c4b1{__do_softirq+113}
8013c565{do_softirq+53}
Jan/06 03:50 pm   80113baf{do_IRQ+335}
80111001{ret_from_intr+0}
Jan/06 03:50 pmEOI 8031e419{thread_return+41}
8010eb20{default_idle+0}
Jan/06 03:50 pm   8010eb44{default_idle+36}
8010ebdc{cpu_idle+44}
Jan/06 03:50 pm   80517885{start_kernel+453}
Jan/06 03:50 pmswapper: page allocation failure. order:0, mode:0x20

Jan/06 03:50 pmCall Trace:IRQ 80158fa0{__alloc_pages+816}
801158b4{end_8259A_irq+100}
Jan/06 03:50 pm   80158fe0{__get_free_pages+16}
8015c886{kmem_getpages+38}
Jan/06 03:50 pm   8015d8be{cache_grow+190}
8015db16{cache_alloc_refill+422}
Jan/06 03:50 pm   8015de06{kmem_cache_alloc+54}
802d5eaf{dst_alloc+47}
Jan/06 03:50 pm   802e3d17{ip_route_input_slow+1639}
803085bb{udp_rcv+267}
Jan/06 03:50 pm   802e612e{ip_rcv+526}
802d297d{netif_receive_skb+477}
Jan/06 03:50 pm
a0120fe8{:bcm5700:MM_IndicateRxPackets+920}
Jan/06 03:50 pm   a011c9fe{:bcm5700:bcm5700_poll+158}
802d2b94{net_rx_action+132}
Jan/06 03:50 pm   8013c4b1{__do_softirq+113}
8013c565{do_softirq+53}
Jan/06 03:50 pm   80113baf{do_IRQ+335}
80111001{ret_from_intr+0}
Jan/06 03:50 pmEOI 8031e419{thread_return+41}
8010eb20{default_idle+0}
Jan/06 03:50 pm   8010eb44{default_idle+36}
8010ebdc{cpu_idle+44}
Jan/06 03:50 pm   80517885{start_kernel+453}
Jan/06 03:50 pmswapper: page allocation failure. order:0, mode:0x20

Jan/06 03:50 pmCall Trace:IRQ 80158fa0{__alloc_pages+816}
801158b4{end_8259A_irq+100}
Jan/06 03:50 pm   80158fe0{__get_free_pages+16}
8015c886{kmem_getpages+38}
Jan/06 03:50 pm   8015d8be{cache_grow+190}
8015db16{cache_alloc_refill+422}
Jan/06 03:50 pm   8015de06{kmem_cache_alloc+54}