Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

2008-08-15 Thread Wolfram Sang
Hello Wolfgang (and all),

On Wed, Aug 13, 2008 at 04:12:17PM +0200, Wolfgang Grandegger wrote:

 ...but I prepared a patch to do the reset in the process context. Would be
 nice if you could give the patch below a try.
Will do later. Thanks!

Still, I think it might be useful to discuss if a complete reset
is not overkill anyhow. The documentation says that only the FIFO
and the Bestcom needs to be reset. Or, if we take the big hammer
solution, it would be good to audit if this won't cause any
side-effects (do all related states get updated...).

Remember that there lately have been patches removing some improper
usage of netif_* calls; furthermore, I also found some questionable
areas in this code (mails will be sent later). So, this driver
needs some careful attention IMHO.

All the best,

   Wolfram

-- 
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry


signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

2008-08-13 Thread Juergen Beisert
On Donnerstag, 10. Juli 2008, Grant Likely wrote:
 On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
  Hello,
 
  today, I was debugging a kernel crash on a board with a MPC5200B using
  2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

 snip

  I assume the proper thing to do is to set a flag in the ISR and handle
  the soft reset later in some other context. Having never dealt with the
  network core and its drivers so far, I am not sure which place would be
  the right one to perform the soft reset. To not make things worse, I
  hope people with more insight to network stuff can deliver a suitable
  solution to this problem.

 Thanks for the bug report.  I'll take a look.

Some update:

Enabling XLB pipelining let occure this error less often. Kernel disables this 
feature by default yet.
The comment talks about cfr errate 292. that is valid for MPC5200A, but 
_it_seems_ no longer for MPC5200B. Has anybody experience if we can enabling 
pipelining on MPC5200B CPUs without triggering this bug?

We currently are playing with this setting:

Index: arch/powerpc/platforms/52xx/mpc52xx_common.c
===
--- arch/powerpc/platforms/52xx/mpc52xx_common.c.orig
+++ arch/powerpc/platforms/52xx/mpc52xx_common.c
@@ -99,11 +99,11 @@
out_be32(xlb-master_pri_enable, 0xff);
out_be32(xlb-master_priority, 0x);
 
-   /* Disable XLB pipelining
-* (cfr errate 292. We could do this only just before ATA PIO
-*  transaction and re-enable it afterwards ...)
+   /*
+* Enable pipelining, fixes FEC problems. The previous workaround seems
+* not needed, as we have an MPC5200B (not A).
 */
-   out_be32(xlb-config, in_be32(xlb-config) | MPC52xx_XLB_CFG_PLDIS);
+   out_be32(xlb-config, in_be32(xlb-config)  ~MPC52xx_XLB_CFG_PLDIS);
 
iounmap(xlb);
 }

jbe

-- 
Dipl.-Ing. Juergen Beisert | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry
    Handelsregister: Amtsgericht Hildesheim, HRA 2686
     Vertretung Sued/Muenchen, Germany
   Phone: +49-8766-939 228 |  Fax: +49-5121-206917-9
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

2008-08-13 Thread Wolfgang Grandegger

Hi Jürgen,

Juergen Beisert wrote:

On Donnerstag, 10. Juli 2008, Grant Likely wrote:

On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:

Hello,

today, I was debugging a kernel crash on a board with a MPC5200B using
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

snip


I assume the proper thing to do is to set a flag in the ISR and handle
the soft reset later in some other context. Having never dealt with the
network core and its drivers so far, I am not sure which place would be
the right one to perform the soft reset. To not make things worse, I
hope people with more insight to network stuff can deliver a suitable
solution to this problem.

Thanks for the bug report.  I'll take a look.


Some update:

Enabling XLB pipelining let occure this error less often. Kernel disables this 
feature by default yet.
The comment talks about cfr errate 292. that is valid for MPC5200A, but 
_it_seems_ no longer for MPC5200B. Has anybody experience if we can enabling 
pipelining on MPC5200B CPUs without triggering this bug?


No, I haven't...


We currently are playing with this setting:

Index: arch/powerpc/platforms/52xx/mpc52xx_common.c
===
--- arch/powerpc/platforms/52xx/mpc52xx_common.c.orig
+++ arch/powerpc/platforms/52xx/mpc52xx_common.c
@@ -99,11 +99,11 @@
out_be32(xlb-master_pri_enable, 0xff);
out_be32(xlb-master_priority, 0x);
 
-	/* Disable XLB pipelining

-* (cfr errate 292. We could do this only just before ATA PIO
-*  transaction and re-enable it afterwards ...)
+   /*
+* Enable pipelining, fixes FEC problems. The previous workaround seems
+* not needed, as we have an MPC5200B (not A).
 */
-   out_be32(xlb-config, in_be32(xlb-config) | MPC52xx_XLB_CFG_PLDIS);
+   out_be32(xlb-config, in_be32(xlb-config)  ~MPC52xx_XLB_CFG_PLDIS);
 
 	iounmap(xlb);

 }


...but I prepared a patch to do the reset in the process context. Would be
nice if you could give the patch below a try.

Wolfgang.

===
From: Wolfgang Grandegger [EMAIL PROTECTED]
Subject: [PATCH] powerpc: fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

Calling mpc52xx_fec_reset() in the ISR is a bug. This patch puts a task
for resetting the FEC into the kernel-global workqueue to be processed
lateron savely in process context.

Signed-off-by: Wolfgang Grandegger [EMAIL PROTECTED]
---
drivers/net/fec_mpc52xx.c |   31 +++
1 file changed, 27 insertions(+), 4 deletions(-)

Index: linux-2.6.26/drivers/net/fec_mpc52xx.c
===
--- linux-2.6.26.orig/drivers/net/fec_mpc52xx.c
+++ linux-2.6.26/drivers/net/fec_mpc52xx.c
@@ -24,6 +24,7 @@
#include linux/crc32.h
#include linux/hardirq.h
#include linux/delay.h
+#include linux/workqueue.h
#include linux/of_device.h
#include linux/of_platform.h

@@ -57,6 +58,8 @@ struct mpc52xx_fec_priv {
struct bcom_task *tx_dmatsk;
spinlock_t lock;
int msg_enable;
+   struct work_struct reset_task;
+   struct net_device *ndev;

/* MDIO link details */
int phy_addr;
@@ -83,6 +86,19 @@ static int debug = -1;   /* the above defa
module_param(debug, int, 0);
MODULE_PARM_DESC(debug, debugging messages level);

+static void mpc52xx_fec_reset_task(struct work_struct *work)
+{
+   struct mpc52xx_fec_priv *priv =
+   container_of(work, struct mpc52xx_fec_priv, reset_task);
+   struct net_device *dev = priv-ndev;
+
+   dev_warn(dev-dev, deferred FEC reset due to errors\n);
+
+   mpc52xx_fec_reset(dev);
+
+   netif_wake_queue(dev);
+}
+
static void mpc52xx_fec_tx_timeout(struct net_device *dev)
{
dev_warn(dev-dev, transmit timed out\n);
@@ -355,6 +371,8 @@ static int mpc52xx_fec_close(struct net_
{
struct mpc52xx_fec_priv *priv = netdev_priv(dev);

+   flush_scheduled_work();
+
netif_stop_queue(dev);

mpc52xx_fec_stop(dev);
@@ -522,9 +540,9 @@ static irqreturn_t mpc52xx_fec_interrupt
if (net_ratelimit()  (ievent  FEC_IEVENT_XFIFO_ERROR))
dev_warn(dev-dev, FEC_IEVENT_XFIFO_ERROR\n);

-   mpc52xx_fec_reset(dev);
-
-   netif_wake_queue(dev);
+   netif_stop_queue(dev);
+   netif_carrier_off(dev);
+   schedule_work(priv-reset_task);
return IRQ_HANDLED;
}

@@ -934,7 +952,9 @@ mpc52xx_fec_probe(struct of_device *op, 


priv-t_irq = priv-r_irq = ndev-irq = NO_IRQ; /* IRQ are free for now 
*/

-   spin_lock_init(priv-lock);
+   priv-ndev = ndev;
+   spin_lock_init(priv-lock);
+   INIT_WORK(priv-reset_task, mpc52xx_fec_reset_task);

/* ioremap the zones */
priv-fec = ioremap(mem.start, sizeof(struct mpc52xx_fec));
@@ -1068,6 +1088,9 @@ 

Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

2008-07-26 Thread Grant Likely
On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
 Hello,
 
 today, I was debugging a kernel crash on a board with a MPC5200B using
 2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:
 
 static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
 {
 [...]
   /* on fifo error, soft-reset fec */
   if (ievent  (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {
 
   if (net_ratelimit()  (ievent  FEC_IEVENT_RFIFO_ERROR))
   dev_warn(dev-dev, FEC_IEVENT_RFIFO_ERROR\n);
   if (net_ratelimit()  (ievent  FEC_IEVENT_XFIFO_ERROR))
   dev_warn(dev-dev, FEC_IEVENT_XFIFO_ERROR\n);
 
   mpc52xx_fec_reset(dev);
 
   netif_wake_queue(dev);
   return IRQ_HANDLED;
   }
 [...]
 }
 
 Calling mpc52xx_fec_reset() from interrupt context is bad, at least
 because
 
 a) it calls phy_write, which contains BUG_ON(in_interrupt())
 b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check
if the reset was successful (1..50 us)
 
 I assume the proper thing to do is to set a flag in the ISR and handle
 the soft reset later in some other context. Having never dealt with the
 network core and its drivers so far, I am not sure which place would be
 the right one to perform the soft reset. To not make things worse, I
 hope people with more insight to network stuff can deliver a suitable
 solution to this problem.
 
 All the best,
 
Wolfram

Hi Wolfram,

I'm not an expert on the FEC driver, but I'll take a look when I get back
home to Calgary.  There has also been other churn in this area and I
need to get myself up to speed.

g.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

2008-07-10 Thread Wolfram Sang
Hello,

today, I was debugging a kernel crash on a board with a MPC5200B using
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
{
[...]
/* on fifo error, soft-reset fec */
if (ievent  (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {

if (net_ratelimit()  (ievent  FEC_IEVENT_RFIFO_ERROR))
dev_warn(dev-dev, FEC_IEVENT_RFIFO_ERROR\n);
if (net_ratelimit()  (ievent  FEC_IEVENT_XFIFO_ERROR))
dev_warn(dev-dev, FEC_IEVENT_XFIFO_ERROR\n);

mpc52xx_fec_reset(dev);

netif_wake_queue(dev);
return IRQ_HANDLED;
}
[...]
}

Calling mpc52xx_fec_reset() from interrupt context is bad, at least
because

a) it calls phy_write, which contains BUG_ON(in_interrupt())
b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check
   if the reset was successful (1..50 us)

I assume the proper thing to do is to set a flag in the ISR and handle
the soft reset later in some other context. Having never dealt with the
network core and its drivers so far, I am not sure which place would be
the right one to perform the soft reset. To not make things worse, I
hope people with more insight to network stuff can deliver a suitable
solution to this problem.

All the best,

   Wolfram

-- 
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry


signature.asc
Description: Digital signature
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

2008-07-10 Thread Grant Likely
On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
 Hello,
 
 today, I was debugging a kernel crash on a board with a MPC5200B using
 2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:
snip
 I assume the proper thing to do is to set a flag in the ISR and handle
 the soft reset later in some other context. Having never dealt with the
 network core and its drivers so far, I am not sure which place would be
 the right one to perform the soft reset. To not make things worse, I
 hope people with more insight to network stuff can deliver a suitable
 solution to this problem.

Thanks for the bug report.  I'll take a look.

g.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev