Re: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-28 Thread David Miller
From: Chris Friesen [EMAIL PROTECTED]
Date: Mon, 27 Oct 2008 18:13:54 -0600

 [PATCH] fix amd8111e rx return code
 
 The amd8111e rx poll routine currently mishandles the case when we process
 exactly the number of packets specified in the budget.
 
 This patch is basically as suggested by David Miller.
 
 Signed-off-by: Chris Friesen [EMAIL PROTECTED]

I had to apply this by hand because your email client heavily
corrupted the patch.

Thanks.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-28 Thread Chris Friesen

David Miller wrote:

From: Chris Friesen [EMAIL PROTECTED]
Date: Mon, 27 Oct 2008 18:13:54 -0600



[PATCH] fix amd8111e rx return code

The amd8111e rx poll routine currently mishandles the case when we process
exactly the number of packets specified in the budget.

This patch is basically as suggested by David Miller.

Signed-off-by: Chris Friesen [EMAIL PROTECTED]



I had to apply this by hand because your email client heavily
corrupted the patch.


Crap.  Sorry about that, it looks like I forgot to set my line-wrap to 0 
before creating the email.  @#$#% Thunderbird.


Chris


Re: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-27 Thread Chris Friesen

David Miller wrote:


 Probably the simplest fix is to get rid of the rx_not_empty label and
 protect the entire:

        /* Receive descriptor is empty now */
        spin_lock_irqsave(&lp->lock, flags);
        __netif_rx_complete(dev, napi);
        writel(VAL0|RINTEN0, mmio + INTEN0);
        writel(VAL2 | RDMD0, mmio + CMD0);
        spin_unlock_irqrestore(&lp->lock, flags);

 code block with a test such as:

        if (rx_pkt_limit > 0)

 (yes, greater than zero, not >= 0)

 then replace the rx_not_empty goto with a simple break.


Are you sure about that?  Doing that, if we hit --rx_pkt_limit < 0 we'll only
break out of the inner while loop.  We then check the interrupt status
register and potentially loop through the do/while loop again (maybe
decrementing rx_pkt_limit again) even though we've used up our budget.


If I leave the label and jump and just add the rx_pkt_limit > 0 test, it
seems to work.


Chris


From: Chris Friesen [EMAIL PROTECTED]
Subject: [PATCH] fix amd8111e rx return code

The amd8111e rx poll routine currently mishandles the case when we process
exactly the number of packets specified in the budget.

This patch is basically as suggested by David Miller.

Signed-off-by: Chris Friesen [EMAIL PROTECTED]

diff --git a/drivers/net/amd8111e.c b/drivers/net/amd8111e.c
index c54967f..ba1be0b 100644
--- a/drivers/net/amd8111e.c
+++ b/drivers/net/amd8111e.c
@@ -833,12 +833,14 @@ static int amd8111e_rx_poll(struct napi_struct *napi, int budget)
 
 	} while(intr0 & RINT0);
 
-	/* Receive descriptor is empty now */
-	spin_lock_irqsave(&lp->lock, flags);
-	__netif_rx_complete(dev, napi);
-	writel(VAL0|RINTEN0, mmio + INTEN0);
-	writel(VAL2 | RDMD0, mmio + CMD0);
-	spin_unlock_irqrestore(&lp->lock, flags);
+	if (rx_pkt_limit > 0) {
+		/* Receive descriptor is empty now */
+		spin_lock_irqsave(&lp->lock, flags);
+		__netif_rx_complete(dev, napi);
+		writel(VAL0|RINTEN0, mmio + INTEN0);
+		writel(VAL2 | RDMD0, mmio + CMD0);
+		spin_unlock_irqrestore(&lp->lock, flags);
+	}
 
 rx_not_empty:
 	return num_rx_pkt;


Re: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-24 Thread Chris Friesen

David Miller wrote:

From: Brandeburg, Jesse [EMAIL PROTECTED]
Date: Thu, 23 Oct 2008 14:50:06 -0700


Chris Friesen wrote:

I tried booting a post 2.6.27 -git on a Motorola ATCA6101 (very similar
to a Maple board).  The first time I booted I got the first log below
via the serial console.  I rebooted and got as far as a login prompt.
I was able to log in via the serial console, but then got an almost
identical oops again, as shown in the second log below.

I configed out the gigE drivers for the backplane so the only remaining
network link was the e100 link used for booting, but the problem
remained.

Anyone have any idea what might be causing this?

Thanks,

Chris


Starting xinetd: [  OK  ]
Starting cron: [  OK  ]
Unable to handle kernel paging request for data at address 0x00100108

that 00100108 pattern looks familiar, I'm not much help here, but I think
that had something to do with the list management of the poll_list in a
netdev struct.

so now you just have to figure out why someone's netdev struct is
becoming NULL. :-)


Usually this is an indication of returning the wrong value from the
driver's ->poll() routine.


Looks like I was wrong before...the remaining ethernet link is an AMD-8111, 
not an e100.  Sorry about that.


I backed out 6ba33ac ("amd8111e: delete non NAPI code from the driver").  With
NAPI disabled, the blade appears stable.  With NAPI enabled, the original
problem recurred.


So...it would appear that the NAPI code is somehow buggy, and 6ba33ac should 
probably be reverted until the problem is found and fixed.


Chris


Re: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-24 Thread David Miller
From: Chris Friesen [EMAIL PROTECTED]
Date: Fri, 24 Oct 2008 17:39:00 -0600

 So...it would appear that the NAPI code is somehow buggy, and
 6ba33ac should probably be reverted until the problem is found and
 fixed.

No, I think the problem is simple enough that someone should study the
->poll() routine quickly and audit its return values.


RE: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-23 Thread Brandeburg, Jesse
Chris Friesen wrote:
 I tried booting a post 2.6.27 -git on a Motorola ATCA6101 (very
 similar to a Maple board).  The first time I booted I got the first
 log below via the serial console.  I rebooted and got as far as a
 login prompt.  I was able to log in via the serial console, but then
 got an almost identical oops again, as shown in the second log below.
 
 I configed out the gigE drivers for the backplane so the only
 remaining network link was the e100 link used for booting, but the
 problem remained. 
 
 Anyone have any idea what might be causing this?
 
 Thanks,
 
 Chris
 
 
 Starting xinetd: [  OK  ]
 Starting cron: [  OK  ]
 Unable to handle kernel paging request for data at address 0x00100108

that 00100108 pattern looks familiar, I'm not much help here, but I think that 
had something to do with the list management of the poll_list in a netdev 
struct.

so now you just have to figure out why someone's netdev struct is becoming 
NULL. :-)

 Faulting instruction address: 0xc028c1cc
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=2 Maple
 Modules linked in:
 NIP: c028c1cc LR: c028c13c CTR: 
 REGS: cfff7b90 TRAP: 0300   Not tainted 
 (2.6.27-05329-g39076ba) 
 MSR: 90009032 EE,ME,IR,DR  CR: 2224  XER: 2000
 DAR: 00100108, DSISR: 0a00

 TASK = c0017a061080[0] 'swapper' THREAD: c0017a078000 CPU: 1
 GPR00:  cfff7e10 c059bfe0
 0020 GPR04: 0001 c00178179800
 c027fda8  GPR08: 
 00200200 0001 00100100 GPR12:
 2222 c05bc500  
 GPR16:   
  GPR20:  000a
 0001 0001 GPR24: c05a2280
 c05f5134 fffd9bbe 00ec GPR28:
 c6e30c28 0020 c0543440 c0017a279b40
 NIP [c028c1cc] .net_rx_action+0x1e4/0x26c  
 LR [c028c13c] .net_rx_action+0x154/0x26c
 Call Trace:
 [cfff7e10] [c028c13c] .net_rx_action+0x154/0x26c (unreliable)
 [cfff7ec0] [c0056938] .__do_softirq+0xf8/0x1f4
 [cfff7f90] [c0024334] .call_do_softirq+0x14/0x24
 [c0017a07b970] [c000bcdc] .do_softirq+0xf0/0x104
 [c0017a07ba10] [c0056ae8] .irq_exit+0x70/0x88
 [c0017a07ba90] [c000ba18] .do_IRQ+0x14c/0x244
 [c0017a07bb30] [c0004710] hardware_interrupt_entry+0x18/0x1c
 --- Exception: 501 at .raw_local_irq_restore+0x38/0x44
     LR = .cpu_idle+0xd8/0x154
 [c0017a07be20] [c0012068] .cpu_idle+0x118/0x154 (unreliable)
 [c0017a07bec0] [c03d4304] .start_secondary+0x310/0x3e8
 [c0017a07bf90] [c00072b4] .start_secondary_prolog+0x10/0x14
 Instruction dump:
 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020
 e81f0010 7809ffe3 40820038 e93f0008 e97f f92b0008 f969
 e95c0008 fb9f 
 
 
 
 
 [EMAIL PROTECTED]:/root uname -a
 Linux 10.41.18.77 2.6.27-05329-g39076ba #1 SMP Tue Oct 21 16:46:06
 CST 2008 ppc64 GNU/Linux
 [EMAIL PROTECTED]:/root Unable to handle kernel paging request for data at
 address 0x00100108
 Faulting instruction address: 0xc028c1cc
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=2 Maple
 Modules linked in:
 NIP: c028c1cc LR: c028c13c CTR: 
 REGS: cfff7b90 TRAP: 0300   Not tainted 
 (2.6.27-05329-g39076ba) 
 MSR: 90009032 EE,ME,IR,DR  CR: 2224  XER: 2000
 DAR: 00100108, DSISR: 0a00
 TASK = c0017a061080[0] 'swapper' THREAD: c0017a078000 CPU: 1
 GPR00:  cfff7e10 c059bfe0
 0020 GPR04: 0001 0001
 c027fda8  GPR08: 
 00200200 0001 00100100 GPR12:
 2222 c05bc500  
 GPR16:   
  GPR20:  000a
 0001 0001 GPR24: c05a2280
 c05f5134 0001000387ff 010c GPR28:
 c6e30c28 0020 c0543440 c0017a2b0b40
 NIP [c028c1cc] .net_rx_action+0x1e4/0x26c  
 LR [c028c13c] .net_rx_action+0x154/0x26c
 Call Trace:
 [cfff7e10] [c028c13c] .net_rx_action+0x154/0x26c (unreliable)
 [cfff7ec0] [c0056938] .__do_softirq+0xf8/0x1f4
 [cfff7f90] [c0024334] .call_do_softirq+0x14/0x24
 [c0017a07b970] [c000bcdc] .do_softirq+0xf0/0x104
 [c0017a07ba10] [c0056ae8] .irq_exit+0x70/0x88
 [c0017a07ba90] [c000ba18] .do_IRQ+0x14c/0x244
 [c0017a07bb30] [c0004710] hardware_interrupt_entry+0x18/0x1c
 --- Exception: 501 at
   

Re: [BUG] oops in net_rx_action on 64-bit powerpc

2008-10-23 Thread David Miller
From: Brandeburg, Jesse [EMAIL PROTECTED]
Date: Thu, 23 Oct 2008 14:50:06 -0700

 Chris Friesen wrote:
  I tried booting a post 2.6.27 -git on a Motorola ATCA6101 (very
  similar to a Maple board).  The first time I booted I got the first
  log below via the serial console.  I rebooted and got as far as a
  login prompt.  I was able to log in via the serial console, but then
  got an almost identical oops again, as shown in the second log below.
  
  I configed out the gigE drivers for the backplane so the only
  remaining network link was the e100 link used for booting, but the
  problem remained. 
  
  Anyone have any idea what might be causing this?
  
  Thanks,
  
  Chris
  
  
  Starting xinetd: [  OK  ]
  Starting cron: [  OK  ]
  Unable to handle kernel paging request for data at address 0x00100108
 
 that 00100108 pattern looks familiar, I'm not much help here, but I think 
 that had something to do with the list management of the poll_list in a 
 netdev struct.
 
 so now you just have to figure out why someone's netdev struct is becoming 
 NULL. :-)

Usually this is an indication of returning the wrong value from
the driver's ->poll() routine.
