Re: [uClinux-dev] Re: MCF5282 network stall problem

Greg Ungerer Tue, 26 Feb 2008 21:45:34 -0800

Hi Juan,

Juan Alberto Muñoz Susin wrote:

I have similar problems with uClinux-dist-20070130 with patch

20070130-200701001.

I think the origin of the problem is not a FEC RX ring overflow due to
packets that FEC insert in the ring. I see that a receive FEC interrupt is
beggining its execution before the execution of the previous one is
finished. It makes that the driver pointer to the receiver ring takes an
incorrect value and lost synchronization of FEC and driver. Sequence
example:


        Interrupt N                             Interrupt N+1
        ===========                             =============
        bdp = fep->cur_rx
                                                bdp = fep->cur_rx

                                                bdp += 1
                                                bdp += 1
                                                save bdp (initial bdp+2)
        bdp += 1
        save bdp (initial bdp+1)

The MCF5282 hardware interrupt controller avoids these problems setting
automaticaly the interrupt level mask of the microprocessor status register
to the level of the interrupt that has been served. But something in the
interrupt scheduller of uClinux 2.6.x set the interrupt level mask to 0 when
the FEC interrupt service routine begins its execution.

The problem can be avoided adding the flag IRQF_DISABLED to request_irq:

        if (request_irq(b+idp->irq, fec_enet_interrupt, IRQF_DISABLED,
idp->name, dev) != 0)

So, the interrupt scheduller of uClinux 2.6.x set the interrupt level mask
of the microprocessor status register to 7 when the FEC interrupt service
routine begins its execution. It avoids that a FEC RX interrupt can begin to
execute before the finish of the execution of the previous one.


I think this is in fact the real culprit of all the recently reported
problems with the FEC driver.

The problem is a little larger than just the re-entrance of RX
interrupts. If you look at the interrupt handling for the current
FEC driver it vectors all interrupt types through the common
fec_enet_interrupt() code. And it handles all types, so a new
TX interrupt could interrupt an old RX interrupt that was actually
in the middle of processing TX interrupts!

On the other hand, setting the IRQF_DISABLED flag avoids the execution of
all maskable system interrupts during the FEC interrupt routines. Due to
this fact, I think that it isn't the best way to solve the problem, but it´s
a way to verify the origin of the problem. I think the fix proposed by
Claude, acknowledging the RX event after processing the ring, is better than

setting IRQF_DISABLED flag.


Removing the RX buffer interrupt is probably ok. But the bigger
problem is the common interrupt handler, and that needs the interrupts
to be IRQF_DISABLED to fix.

Another solution may be to setup the IRQ's to only handle their
type (so RX interrupt types call direct to fec_enet_rx() and TX
interrupts call direct to fec_enet_tx()). This would then not
need IRQF_DISABLED either.

In my opinion, the best way to solve this problem and to avoid the
posibility of similar problems in other drivers, will be keep the interrupt
level mask of the microprocessor status register to the real level of
interrupt that has been proccessing. But I don´t know if it is possible.


That is really what IRQF_DISABLED is meant to do. So it is the right
way to flag that this interrupt must be run with interrupts still
disabled.

As a side note, I suspect this may have come to light after the
changes I made to the interrupt sub-system a couple of revisions
ago. It made the ColdFire interrupt handling consistent with other
Linux architectures - and it fixed the problem that interrupts
where always disabled for the entire duration of the handler.

So for now I plan on applying the IRQF_DISABLED flag to the
request_irq() calls as the primary fix for this problem.
Can all involved please test and report goodness or otherwise?

Claude: if you want to look at cleaning up the use of FEC_ENET_TXB/RXB,
        usage I would apply that as a separate patch as well.

Regards
Greg

-----Mensaje original-----
De: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
En nombre de Phil Wilshire
Enviado el: domingo, 17 de febrero de 2008 20:20
Para: uClinux development list
Asunto: Re: [uClinux-dev] Re: MCF5282 network stall problem

Hi Claude

Thank you very much for the details.
They are much appreciated.

I'll be taking a look at the 5272 code I have and see if we cant improve
that too.

Oh that everyone would explain their patches so well.

Best Regards
   Phil WIlshire

Claude wrote:
Hi again,

Marco Cavallini wrote:
 >Hi Claude,
>I had similar problems on a MCF5282, which seems being solved with
modifications to cache setting code, I'd like to try your patch as
soon  >as I have some spare time (I hope this week).
>Feel free to contact me privately to discuss your patch and otherthings >I've done to work this board properly.
I had already disabled the CACHE completely but FEC problem persisted.
In order to let people having older driver versions I'd like to make asummary of the changes here:
First, since FEC driver only processes buffers holding entire packets,it is not necessary to enable interrupts for RXB and TXB events (rxbuffer and tx buffer), but only for RXF and TXF events (rx and txframes). This may produce unnecessary interrupts to CPU. Interruptsare enabled infec_enet_init() and fec_restart() functions. If you take a look to thepatch, only necessary int's are enabled:
  fecp->fec_imask = (FEC_ENET_TXF | /*FEC_ENET_TXB |*/
            FEC_ENET_RXF /*| FEC_ENET_RXB */ | FEC_ENET_MII);
Commented lines have been left for convenience. They should beremoved, however, to clean up code.
On the other hand, other changes affected the interrupt service routine.In my opinion, it's fine that, in order to process all packets waitingin the queue, the ring should be entirely processed when an interruptis raised when receiving a packet (see the loop in fec_enet_rx(),which is called from the ISR). But I don't agree with making a loop inthe ISR to handle all different events (tx, rx and mii events) thatare waiting, since they should be acknowledge in a different manner.
The old FEC driver does the following, in the ISR:

while ("pending interrupts"){
    acknowledge all interrupts
    if RX event, then process RX event
    if TX event, then process TX event
    if MII event, then process MII event }

The fixed code (the patch's one) does this, instead:

if ("pending interrupts") {
if RX event, then process RX event and then acknowledge RX event(clearing the corresponding flag)if TX event, then process TX event and then acknowledge TX event(clearing the corresponding flag)if MII event, then process MII event and then acknowledge MIIevent (clearing the corresponding flag) }
I saw that the first solution produced, after some time, a lost ofsynchronization when processing the ring. It seems that acknowledgingthe RX event before processing the ring makes the FEC to allowentering more packets till produce a ring overflow.
Finally, the "pending interrupts" condition above, was implemented asfollows (in the old FEC driver):
    ((int_events = fecp->ievent) != 0)

The fixed code does:
    ((int_events = (fecp->ievent & (FEC_ENET_RXF | FEC_ENET_TXF |
FEC_ENET_MII))) != 0)
This avoids the ISR (after receiving a RX,TX or MII event) to keeprunning if other flag in the fecp->ievent register is raised for anyreason (error flag, for example). However, the replacement of "while"for "if" implicitly avoids it.
Well, I hope this solution could help someone. At least it has helped me!!

Claude










_______________________________________________
uClinux-dev mailing list
[email protected]
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by [email protected] To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev
_______________________________________________
uClinux-dev mailing list
[email protected]
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by [email protected] To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev

_______________________________________________
uClinux-dev mailing list
[email protected]
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by [email protected]
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev


--
------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     [EMAIL PROTECTED]
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com
_______________________________________________
uClinux-dev mailing list
[email protected]
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by [email protected]
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev

Re: [uClinux-dev] Re: MCF5282 network stall problem

Reply via email to