Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-09-20 Thread Jan-Bernd Themann
Hi Milton,

sorry for the delayed answer, was on vacation.

linux-kernel-ow...@vger.kernel.org wrote on 21.05.2010 11:18:01:

 On Thu, 20 May 2010 at 10:21:36 +0200 (CEST) Thomas Gleixner  wrote:
  On Thu, 20 May 2010, Michael Ellerman wrote:
   On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
On Wed, 19 May 2010, Darren Hart wrote:
   
 On 05/18/2010 06:25 PM, Michael Ellerman wrote:
  On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
  
  The result of the discussion about two years ago on this was that we
  needed a custom flow handler for XICS on RT.

 I'm still not clear on why the ultimate solution wasn't to have XICS report
 edge triggered as edge triggered. Probably some complexity of the entire
 power stack that I am ignorant of.

  Apart from the issue of losing interrupts there is also the fact that
  masking on the XICS requires an RTAS call which takes a global lock.
   
Right, I'd love to avoid that but with real level interrupts we'd run
into an interrupt storm. Though another solution would be to issue the
EOI after the threaded handler finished, that'd work as well, but
needs testing.
  
   Yeah I think that was the idea for the custom flow handler. We'd reset
   the processor priority so we can take other interrupts (which the EOI
   usually does for you), then do the actual EOI after the handler
   finished.
 
  That only works when the card does not issue new interrupts until the
  EOI happens. If the EOI is only relevant for the interrupt controller,
  then you are going to lose any edge which comes in before the EOI as
  well.

 Well, the real MSIs have an extra bit to allow the eoi to dally behind
 the mmio on another path and that should cover this race when the irq
 is left enabled.

 Jan-Bernd: HEA has that change, right?

I don't know. We never hit problems, so we did not look very deeply into
this area. We probably have to talk to the HEA HW developers to be sure.

Regards,
Jan-Bernd



Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-21 Thread Milton Miller
On Thu, 20 May 2010 at 10:21:36 +0200 (CEST) Thomas Gleixner  wrote:
 On Thu, 20 May 2010, Michael Ellerman wrote:
  On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
   On Wed, 19 May 2010, Darren Hart wrote:
  
On 05/18/2010 06:25 PM, Michael Ellerman wrote:
 On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
 
 The result of the discussion about two years ago on this was that we
 needed a custom flow handler for XICS on RT.
   
 I'm still not clear on why the ultimate solution wasn't to have XICS report
 edge triggered as edge triggered. Probably some complexity of the entire
 power stack that I am ignorant of.
   
 Apart from the issue of losing interrupts there is also the fact that
 masking on the XICS requires an RTAS call which takes a global lock.
  
   Right, I'd love to avoid that but with real level interrupts we'd run
   into an interrupt storm. Though another solution would be to issue the
   EOI after the threaded handler finished, that'd work as well, but
   needs testing.
 
  Yeah I think that was the idea for the custom flow handler. We'd reset
  the processor priority so we can take other interrupts (which the EOI
  usually does for you), then do the actual EOI after the handler
  finished.
 
 That only works when the card does not issue new interrupts until the
 EOI happens. If the EOI is only relevant for the interrupt controller,
 then you are going to lose any edge which comes in before the EOI as
 well.

Well, the real MSIs have an extra bit to allow the eoi to dally behind
the mmio on another path and that should cover this race when the irq
is left enabled.

Jan-Bernd: HEA has that change, right?

milton


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-21 Thread Milton Miller
On Thu May 20 at 11:28:36 EST in 2010, Michael Ellerman wrote:
 On Wed, 2010-05-19 at 07:16 -0700, Darren Hart wrote:
  On 05/18/2010 06:25 PM, Michael Ellerman wrote:
   On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
On 05/18/2010 02:52 PM, Brian King wrote:
 Is IRQF_NODELAY something specific to the RT kernel?
 I don't see it in mainline...
Yes, it basically says don't make this handler threaded.
  
   That is a good fix for EHEA, but the threaded handling is still broken
   for anything else that is edge triggered isn't it?
  
  No, I don't believe so. Edge triggered interrupts that are reported as 
  edge triggered interrupts will use the edge handler (which was the 
  approach Sebastien took to make this work back in 2008). Since XICS 
  presents all interrupts as Level Triggered, they use the fasteoi path.
 
 But that's the point, no interrupts on XICS are reported as edge, even
 if they are actually edge somewhere deep in the hardware. I don't think
 we have any reliable way to determine what is what.
 

The platform doesn't tell us this information.  The driver might know
but we don't need this information.

   The result of the discussion about two years ago on this was that we
   needed a custom flow handler for XICS on RT.
  
  I'm still not clear on why the ultimate solution wasn't to have XICS 
  report edge triggered as edge triggered. Probably some complexity of the 
  entire power stack that I am ignorant of.
 
 I'm not really sure either, but I think it's a case of a leaky
 abstraction on the part of the hypervisor. Edge interrupts behave as
 level as long as you handle the irq before EOI, but if you mask they
 don't. But Milton's the expert on that.
 

More like the hardware actually converts them.  They are all handled
with the same presentation.

The XICS interrupt system is highly scalable and distributed in
implementation, with multiple priority delivery and unlimited nesting.

First, a few features and description:

The hardware has two bits of storage for every LSI interrupt source in the
system to say that interrupt is idle, pending, or was rejected and will
be retried later.  The hardware also stores a destination and delivery
priority, settable by software.  The destination can be a specific cpu
thread, or a global distribution queue of all (online) threads (in the
partition).  While the hardware used to have 256 priority levels available
(255 usable, one for cpu not interrupted), some bits have been stolen
and today we only guarantee 16 levels are available to the OS (15 for
delivery and one for source disabled / cpu not processing any interrupt).
[The current linux kernel delivers all device interrupts at one level
but IPIs at a higher level.  To avoid overflowing the irq stack we don't
allow device interrupts while processing any external interrupt.]

The interrupt presentation layer likewise scales, with a separate instance
for each cpu thread in the system.  A single IPI source per thread is
part of this instance; when a cpu wants to interrupt another it writes
the priority of the IPI to that cpu's presentation logic.

When an interrupt is signaled, the hardware checks the state of that
interrupt, and if previously idle it sends an interrupt request with its
source number and priority towards the programmed destination, either a
specific cpu thread or the global queue of all processors in the system.
If that cpu is already handling an interrupt of the same or higher (lower
valued) priority either the incoming interrupt will be passed to the next
cpu (if the destination was global) or it will be rejected and the isu will
update its state and try again later.  If the cpu had a prior interrupt
pending at a lower priority then the old interrupt will be rejected back
to its isu instead.

The normal behavior is that a load from a presentation logic register causes
the interrupt source number and previous priority of the cpu to be
delivered to the cpu and the cpu priority to be raised to that of the
incoming interrupt.  The external interrupt indication to the cpu is
removed.  At this point the presentation hardware forgets all history
of this interrupt.  A store to the same register resets the priority of
the cpu (which would naturally be the level before it was interrupted
if it stores the value loaded) and sends an EOI (end of interrupt) to
the interrupt source specified in the write.  This resets the two bits
from pending to idle.
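
In code, the accept / EOI pair described here boils down to something
like the following (an illustrative sketch only, not actual kernel
source; "icp" stands for a hypothetical mapping of the per-thread
presentation registers, with the cpu priority in the top byte of the
XIRR and the source number in the low 24 bits):

	u32 xirr = in_be32(&icp->xirr);		/* load: accept the interrupt;
						   raises the cpu priority to
						   the incoming one's */
	unsigned int source = xirr & 0x00ffffff; /* low 24 bits: source */

	/* ... handle the interrupt ... */

	out_be32(&icp->xirr, xirr);		/* store: restore the previous
						   priority and EOI the source */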

The software is allowed to reset the cpu priority to allow other
interrupts of equal (or even lower) priority to be presented independently
of creating the EOI for this source.  However, until software creates
an EOI for a specific source it will not be presented again until the
machine is reset.  The only rule is you can't raise your priority (which
might have to reject a pending interrupt) when you create (write) the EOI.
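
This rule is what the custom flow handler discussed elsewhere in this
thread would exploit. A minimal sketch, assuming hypothetical helpers
xics_lower_cpu_priority(), run_and_wait_irq_handlers() and
xics_source_eoi() (the RT plumbing for actually waiting on a threaded
handler is omitted):

	static void handle_xics_delayed_eoi(unsigned int irq, struct irq_desc *desc)
	{
		kstat_incr_irqs_this_cpu(irq, desc);

		/* Reset the cpu priority so further interrupts can be
		   presented - the part of the EOI the hardware normally
		   does for you ... */
		xics_lower_cpu_priority();

		/* ... run (or wake and wait for) the handler ... */
		run_and_wait_irq_handlers(irq, desc);

		/* ... and only then EOI the source.  Until this store the
		   source stays pending instead of being re-presented, so
		   there is no interrupt storm and no RTAS masking with its
		   global lock. */
		xics_source_eoi(irq);
	}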

A cpu can also change its priority to tell the hardware to reject
this interrupt (possibly re-presenting to 

Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Jan-Bernd Themann
Hi,


Michael Ellerman mich...@ellerman.id.au wrote on 20.05.2010 03:34:08:


 Subject:

 Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

 On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
  On Wed, 19 May 2010, Thomas Gleixner wrote:
I'm still not clear on why the ultimate solution wasn't to have XICS
report edge triggered as edge triggered. Probably some complexity of
the entire power stack that I am ignorant of.
   
 Apart from the issue of losing interrupts there is also the fact that
 masking on the XICS requires an RTAS call which takes a global lock.
  
   Right, I'd love to avoid that but with real level interrupts we'd run
   into an interrupt storm. Though another solution would be to issue the
   EOI after the threaded handler finished, that'd work as well, but
   needs testing.
 
  Thought more about that. The case at hand (ehea) is nasty:
 
  The driver does _NOT_ disable the rx interrupt in the card in the rx
  interrupt handler - for whatever reason.

 Yeah I saw that, but I don't know why it's written that way. Perhaps
 Jan-Bernd or Doug will chime in and enlighten us? :)

From our perspective there is no need to disable interrupts for the RX side
as the chip does not fire further interrupts until we tell the chip to do
so for a particular queue. We have multiple receive queues, each with its
own interrupt, so that the interrupts can arrive on multiple CPUs in
parallel. Interrupts are enabled again when we leave the NAPI poll function
for the corresponding receive queue.

Michael, does this answer your question?

Regards,
Jan-Bernd



Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Thomas Gleixner
On Thu, 20 May 2010, Jan-Bernd Themann wrote:
   Thought more about that. The case at hand (ehea) is nasty:
  
   The driver does _NOT_ disable the rx interrupt in the card in the rx
   interrupt handler - for whatever reason.
 
  Yeah I saw that, but I don't know why it's written that way. Perhaps
  Jan-Bernd or Doug will chime in and enlighten us? :)
 
 From our perspective there is no need to disable interrupts for the
 RX side as the chip does not fire further interrupts until we tell
 the chip to do so for a particular queue. We have multiple receive

The traces tell a different story though:

ehea_recv_irq_handler()
  napi_reschedule()
eoi()
ehea_poll()
  ...
  ehea_recv_irq_handler() ???
napi_reschedule()
  ...
  napi_complete()

Can't tell whether you can see the same behaviour in mainline, but I
don't see a reason why not.

 queues, each with its own interrupt, so that the interrupts can arrive
 on multiple CPUs in parallel.  Interrupts are enabled again when we
 leave the NAPI poll function for the corresponding receive queue.

I can't see a piece of code which does that, but that's probably just
lack of detailed hardware knowledge on my side.

Thanks,

tglx


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Thomas Gleixner
On Thu, 20 May 2010, Michael Ellerman wrote:
 On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
  On Wed, 19 May 2010, Darren Hart wrote:
  
   On 05/18/2010 06:25 PM, Michael Ellerman wrote:
On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
 
The result of the discussion about two years ago on this was that we
needed a custom flow handler for XICS on RT.
   
   I'm still not clear on why the ultimate solution wasn't to have XICS report
   edge triggered as edge triggered. Probably some complexity of the entire
   power stack that I am ignorant of.
   
Apart from the issue of losing interrupts there is also the fact that
masking on the XICS requires an RTAS call which takes a global lock.
  
  Right, I'd love to avoid that but with real level interrupts we'd run
  into an interrupt storm. Though another solution would be to issue the
  EOI after the threaded handler finished, that'd work as well, but
  needs testing.
 
 Yeah I think that was the idea for the custom flow handler. We'd reset
 the processor priority so we can take other interrupts (which the EOI
 usually does for you), then do the actual EOI after the handler
 finished.

That only works when the card does not issue new interrupts until the
EOI happens. If the EOI is only relevant for the interrupt controller,
then you are going to lose any edge which comes in before the EOI as
well.

Thanks,

tglx



Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Jan-Bernd Themann

Hi Thomas

 Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

 On Thu, 20 May 2010, Jan-Bernd Themann wrote:
Thought more about that. The case at hand (ehea) is nasty:
   
 The driver does _NOT_ disable the rx interrupt in the card in the rx
 interrupt handler - for whatever reason.
  
   Yeah I saw that, but I don't know why it's written that way. Perhaps
   Jan-Bernd or Doug will chime in and enlighten us? :)
 
  From our perspective there is no need to disable interrupts for the
  RX side as the chip does not fire further interrupts until we tell
  the chip to do so for a particular queue. We have multiple receive

 The traces tell a different story though:

 ehea_recv_irq_handler()
   napi_reschedule()
 eoi()
 ehea_poll()
   ...
   ehea_recv_irq_handler() ???
 napi_reschedule()
   ...
   napi_complete()

 Can't tell whether you can see the same behaviour in mainline, but I
 don't see a reason why not.

Is this the same interrupt we are seeing here, or do we see a second,
different interrupt popping up on the same CPU? As I said, with multiple
receive queues (if enabled) you can have multiple interrupts in parallel.

Please check if multiple queues are enabled. The following module parameter
is used for that:

MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 0");

you should also see the number of used HEA interrupts in /proc/interrupts



  queues with an own interrupt each so that the interrupts can arrive
  on multiple CPUs in parallel.  Interrupts are enabled again when we
  leave the NAPI Poll function for the corresponding receive queue.

 I can't see a piece of code which does that, but that's probably just
 lack of detailed hardware knowledge on my side.

If you mean the re-enable piece of code, it is not very obvious, you are
right. Interrupts are only generated if a particular register for our
completion queues is written. We do this in the following lines:

	ehea_reset_cq_ep(pr->recv_cq);
	ehea_reset_cq_ep(pr->send_cq);
	ehea_reset_cq_n1(pr->recv_cq);
	ehea_reset_cq_n1(pr->send_cq);

So this is in a way an indirect way to ask for interrupts when new
completions have been written to memory. We don't really disable/enable
interrupts on the HEA chip itself.
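
For reference, the tail of ehea_poll() in the driver of that era looks
roughly like this (a condensed sketch, not the verbatim source; the
send-queue processing and error handling are left out):

	static int ehea_poll(struct napi_struct *napi, int budget)
	{
		struct ehea_port_res *pr;
		int wqe_index, rx;

		pr = container_of(napi, struct ehea_port_res, napi);
		rx = ehea_proc_rwqes(napi->dev, pr, budget);

		while (rx != budget) {
			napi_complete(napi);

			/* re-arm: ask for an interrupt on the next completion */
			ehea_reset_cq_ep(pr->recv_cq);
			ehea_reset_cq_ep(pr->send_cq);
			ehea_reset_cq_n1(pr->recv_cq);
			ehea_reset_cq_n1(pr->send_cq);

			/* did a completion slip in before the re-arm? */
			if (!ehea_poll_rq1(pr->qp, &wqe_index))
				return rx;
			if (!napi_reschedule(napi))
				return rx;
			rx += ehea_proc_rwqes(napi->dev, pr, budget - rx);
		}
		return rx;
	}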

I think there are some mechanisms built into the HEA chip that should
ensure that interrupts don't get lost. But that is something that is / was
completely hidden from us, so my skill is very limited there.

If more details are needed here we should involve the PHYP guys + eHEA HW
guys if not already done. Did anyone already talk to them?

Regards,
Jan-Bernd



Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Thomas Gleixner
Jan-Bernd,

On Thu, 20 May 2010, Jan-Bernd Themann wrote:

 
 Hi Thomas
 
  Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
 
  On Thu, 20 May 2010, Jan-Bernd Themann wrote:
 Thought more about that. The case at hand (ehea) is nasty:

  The driver does _NOT_ disable the rx interrupt in the card in the rx
  interrupt handler - for whatever reason.
   
Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)
  
   From our perspective there is no need to disable interrupts for the
   RX side as the chip does not fire further interrupts until we tell
   the chip to do so for a particular queue. We have multiple receive
 
  The traces tell a different story though:
 
  ehea_recv_irq_handler()
napi_reschedule()
  eoi()
  ehea_poll()
...
ehea_recv_irq_handler() ???
  napi_reschedule()
...
napi_complete()
 
  Can't tell whether you can see the same behaviour in mainline, but I
  don't see a reason why not.
 
 Is this the same interrupt we are seeing here, or do we see a second,
 different interrupt popping up on the same CPU? As I said, with multiple
 receive queues (if enabled) you can have multiple interrupts in parallel.

According to the traces it's the very same interrupt number.

 Please check if multiple queues are enabled. The following module parameter
 is used for that:
 
 MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 0");
 
 you should also see the number of used HEA interrupts in /proc/interrupts

I leave that for Will and Darren, they have the hardware :)

 
   queues, each with its own interrupt, so that the interrupts can arrive
   on multiple CPUs in parallel.  Interrupts are enabled again when we
   leave the NAPI poll function for the corresponding receive queue.
 
  I can't see a piece of code which does that, but that's probably just
  lack of detailed hardware knowledge on my side.
 
 If you mean the re-enable piece of code, it is not very obvious,
 you are right.  Interrupts are only generated if a particular
 register for our completion queues is written. We do this in the
 following lines:
 
	ehea_reset_cq_ep(pr->recv_cq);
	ehea_reset_cq_ep(pr->send_cq);
	ehea_reset_cq_n1(pr->recv_cq);
	ehea_reset_cq_n1(pr->send_cq);
 
 So this is in a way an indirect way to ask for interrupts when new
 completions have been written to memory. We don't really disable/enable
 interrupts on the HEA chip itself.

Ah, ok. That's after the napi_complete which looks correct.
 
 I think there are some mechanisms built into the HEA chip that should
 ensure that interrupts don't get lost. But that is something that
 is / was completely hidden from us, so my skill is very limited
 there.

 If more details are needed here we should involve the PHYP guys +
 eHEA HW guys if not already done. Did anyone already talk to them?

Will or Darren might have, but let's gather more information first
before we rack their nerves :)

Thanks,

tglx


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Nivedita Singhvi

Thomas Gleixner wrote:


Please check if multiple queues are enabled. The following module parameter
is used for that:

MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 0");

you should also see the number of used HEA interrupts in /proc/interrupts


I leave that for Will and Darren, they have the hardware :)


 16: 477477 ... XICS  Level IPI
 17:129 ... XICS  Level hvc_console
 18:  0 ... XICS  Level RAS_EPOW
 33: 139232 ... XICS  Level mlx4_core
256:  3 ... XICS  Level ehea_neq
259:  0 ... XICS  Level eth0-aff
260:2082153 ... XICS  Level eth0-queue0
289: 119166 ... XICS  Level ipr
305:  0 ... XICS  Level ohci_hcd:usb2
306:  0 ... XICS  Level ohci_hcd:usb3
307:2389839 ... XICS  Level ehci_hcd:usb1


Nope, multiple rx queues not enabled.

thanks,
Nivedita


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Darren Hart

On 05/20/2010 01:14 AM, Thomas Gleixner wrote:

On Thu, 20 May 2010, Jan-Bernd Themann wrote:

Thought more about that. The case at hand (ehea) is nasty:

The driver does _NOT_ disable the rx interrupt in the card in the rx
interrupt handler - for whatever reason.


Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)


 From our perspective there is no need to disable interrupts for the
RX side as the chip does not fire further interrupts until we tell
the chip to do so for a particular queue. We have multiple receive


The traces tell a different story though:

 ehea_recv_irq_handler()
   napi_reschedule()
 eoi()
 ehea_poll()
   ...
   ehea_recv_irq_handler() ???
 napi_reschedule()
   ...
   napi_complete()

Can't tell whether you can see the same behaviour in mainline, but I
don't see a reason why not.


I was going to suggest that because these are threaded handlers, perhaps 
they are rescheduled on a different CPU and then receive the interrupt 
for the other CPU/queue that Jan was mentioning.


But, the handlers are affined if I remember correctly, and we aren't 
running with multiple receive queues. So, we're back to the same 
question, why are we seeing another irq. It comes in before 
napi_complete() and therefore before the ehea_reset*() block of calls 
which do the equivalent of re-enabling interrupts.


--
Darren




queues, each with its own interrupt, so that the interrupts can arrive
on multiple CPUs in parallel.  Interrupts are enabled again when we
leave the NAPI poll function for the corresponding receive queue.


I can't see a piece of code which does that, but that's probably just
lack of detailed hardware knowledge on my side.

Thanks,

tglx



--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Thomas Gleixner
On Thu, 20 May 2010, Darren Hart wrote:

 On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
  On Thu, 20 May 2010, Jan-Bernd Themann wrote:
 Thought more about that. The case at hand (ehea) is nasty:
 
 The driver does _NOT_ disable the rx interrupt in the card in the rx
 interrupt handler - for whatever reason.

Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)
   
From our perspective there is no need to disable interrupts for the
   RX side as the chip does not fire further interrupts until we tell
   the chip to do so for a particular queue. We have multiple receive
  
  The traces tell a different story though:
  
   ehea_recv_irq_handler()
 napi_reschedule()
   eoi()
   ehea_poll()
 ...
 ehea_recv_irq_handler() ???
   napi_reschedule()
 ...
 napi_complete()
  
  Can't tell whether you can see the same behaviour in mainline, but I
  don't see a reason why not.
 
 I was going to suggest that because these are threaded handlers, perhaps they
 are rescheduled on a different CPU and then receive the interrupt for the
 other CPU/queue that Jan was mentioning.
 
 But, the handlers are affined if I remember correctly, and we aren't running
 with multiple receive queues. So, we're back to the same question, why are we
  seeing another irq. It comes in before napi_complete() and therefore before 
 ehea_reset*() block of calls which do the equivalent of re-enabling
 interrupts.

Can you slap a few trace points into that driver with a stock mainline
kernel and verify that?
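
Something along these lines, for example - a hypothetical trace_printk()
instrumentation that mirrors the trace points visible in Will's captures
in this thread, not the actual patch that was used:

	static irqreturn_t ehea_recv_irq_handler(int irq, void *param)
	{
		struct ehea_port_res *pr = param;

		trace_printk("ENTER %p\n", pr);
		napi_schedule(&pr->napi);
		trace_printk("EXIT %p\n", pr);
		return IRQ_HANDLED;
	}

The output then shows up in /sys/kernel/debug/tracing/trace.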

Thanks,

tglx


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Will Schmidt
On Thu, 2010-05-20 at 11:05 +0200, Jan-Bernd Themann wrote:
 Hi Thomas
 
  Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
 
  On Thu, 20 May 2010, Jan-Bernd Themann wrote:
 Thought more about that. The case at hand (ehea) is nasty:

 The driver does _NOT_ disable the rx interrupt in the card in the
 rx
 interrupt handler - for whatever reason.
   
Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)
  
   From our perspective there is no need to disable interrupts for the
   RX side as the chip does not fire further interrupts until we tell
   the chip to do so for a particular queue. We have multiple receive
 
  The traces tell a different story though:
 
  ehea_recv_irq_handler()
napi_reschedule()
  eoi()
  ehea_poll()
...
ehea_recv_irq_handler() ???
  napi_reschedule()
...
napi_complete()
 
  Can't tell whether you can see the same behaviour in mainline, but I
  don't see a reason why not.
 
 Is this the same interrupt we are seeing here, or do we see a second,
 different interrupt popping up on the same CPU? As I said, with multiple
 receive queues (if enabled) you can have multiple interrupts in parallel.

Same interrupt number (260).  Per the trace data, the first
ehea_recv_irq_handler (at 117.904525) was on cpu 0, the second (at
117.904689) was on cpu 1.



...-2180  [000]  117.904525: .ehea_recv_irq_handler: ENTER 0 c000e8bd08b0
...-2180  [000]  117.904527: .ehea_recv_irq_handler: napi_reschedule COMpleted  c000e8bd08b0
...-2180  [000]  117.904528: .ehea_recv_irq_handler: EXIT reschedule(1) 1 c000e8bd08b0
...-2180  [000]  117.904529: .xics_unmask_irq: xics: unmask virq 260 772
...-2180  [000]  117.904547: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
...-2180  [000]  117.904586: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:11416 status:0 5
...-2180  [000]  117.904602: .handle_fasteoi_irq: 260 8004000
...-2180  [000]  117.904603: .xics_mask_irq: xics: mask virq 260 772
...-2180  [000]  117.904634: .xics_mask_real_irq: xics: before: mask_real 772 status:0 5
...-2180  [000]  117.904668: .xics_mask_real_irq: xics: after: mask_real 772 status:0 ff
...-2180  [000]  117.904669: .handle_fasteoi_irq: pre-action:  260 8004100
...-2180  [000]  117.904671: .handle_fasteoi_irq: post-action: 260 8004100
...-2180  [000]  117.904672: .handle_fasteoi_irq: exit.  260 8004000
...-7 [000]  117.904681: .ehea_poll:  ENTER  1  c000e8bd08b0 poll_counter:0 force:0
...-7 [000]  117.904683: .ehea_proc_rwqes: ehea_check_cqe 0
...-2180  [001]  117.904689: .ehea_recv_irq_handler: ENTER 1 c000e8bd08b0
...-7 [000]  117.904690: .ehea_proc_rwqes: ehea_check_cqe 0
...-2180  [001]  117.904691: .ehea_recv_irq_handler: napi_reschedule inCOMplete  c000e8bd08b0
...-2180  [001]  117.904692: .ehea_recv_irq_handler: EXIT reschedule(0) 1 c000e8bd08b0
...-2180  [001]  117.904694: .xics_unmask_irq: xics: unmask virq 260 772
...-7 [000]  117.904702: .ehea_refill_rq2: ehea_refill_rq2
...-7 [000]  117.904703: .ehea_refill_rq_def: ehea_refill_rq_def
...-7 [000]  117.904704: .ehea_refill_rq3: ehea_refill_rq3
...-7 [000]  117.904705: .ehea_refill_rq_def: ehea_refill_rq_def
...-7 [000]  117.904706: .napi_complete: napi_complete: ENTER  state: 1  c000e8bd08b0
...-7 [000]  117.904707: .napi_complete: napi_complete: EXIT  state: 0  c000e8bd08b0
...-7 [000]  117.904710: .ehea_poll: EXIT  !cqe rx(2).   0  c000e8bd08b0
...-2180  [001]  117.904719: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
...-2180  [001]  117.904761: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:12705 status:0 5




 Please check if multiple queues are enabled. The following module parameter
 is used for that:
 
 MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 0");

No module parameters were used, should be plain old defaults.  

 
 you should also see the number of used HEA interrupts in /proc/interrupts
 

256:  100000   00   XICS  Level ehea_neq
259:  000000   00   XICS  Level eth0-aff
260: 36196500000   00   XICS  Level eth0-queue0




 
 
   queues, each with its own interrupt, so that the interrupts can arrive
   on multiple CPUs in parallel.  Interrupts are enabled again when we
   leave the NAPI poll function for the corresponding receive queue.
 
  I can't see a piece of code which does that, but that's probably just
  lack of detailed hardware knowledge on my side.
 
 If you mean the re-enable piece of code, it is not very obvious, you are
 right. Interrupts are only generated if a particular register for our
 completion queues is written. We do this in the following line

Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Will Schmidt
On Thu, 2010-05-20 at 16:45 +0200, Thomas Gleixner wrote:
 On Thu, 20 May 2010, Darren Hart wrote:
 
  On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
   On Thu, 20 May 2010, Jan-Bernd Themann wrote:
  Thought more about that. The case at hand (ehea) is nasty:
  
  The driver does _NOT_ disable the rx interrupt in the card in the rx
  interrupt handler - for whatever reason.
 
 Yeah I saw that, but I don't know why it's written that way. Perhaps
 Jan-Bernd or Doug will chime in and enlighten us? :)

 From our perspective there is no need to disable interrupts for the
RX side as the chip does not fire further interrupts until we tell
the chip to do so for a particular queue. We have multiple receive
   
   The traces tell a different story though:
   
ehea_recv_irq_handler()
  napi_reschedule()
eoi()
ehea_poll()
  ...
  ehea_recv_irq_handler() ???
napi_reschedule()
  ...
  napi_complete()
   
   Can't tell whether you can see the same behaviour in mainline, but I
   don't see a reason why not.
  
  I was going to suggest that because these are threaded handlers, perhaps
  they are rescheduled on a different CPU and then receive the interrupt
  for the other CPU/queue that Jan was mentioning.
  
  But, the handlers are affined if I remember correctly, and we aren't
  running with multiple receive queues. So, we're back to the same question,
  why are we seeing another irq. It comes in before napi_complete() and
  therefore before the ehea_reset*() block of calls which do the equivalent
  of re-enabling interrupts.
 
 Can you slap a few trace points into that driver with a stock mainline
 kernel and verify that ?

2.6.33.4 (non-rt kernel) with similar trace_printk hooks in place...
Most data lumps look like so:

  idle-0 [000]  1097.685337: .handle_fasteoi_irq: ENTER 260 4000
  idle-0 [000]  1097.685339: .handle_fasteoi_irq: pre-action 260 4100
  idle-0 [000]  1097.685339: .ehea_recv_irq_handler:  ENTER  c000e8980700
  idle-0 [000]  1097.685340: .ehea_recv_irq_handler: napi_schedule ... c000e8980700
  idle-0 [000]  1097.685341: .ehea_recv_irq_handler: napi_schedule Calling __napi_schedule ... c000e8980700
  idle-0 [000]  1097.685342: .ehea_recv_irq_handler:  EXIT c000e8980700
  idle-0 [000]  1097.685343: .handle_fasteoi_irq: post-action 260 4100
  idle-0 [000]  1097.685344: .handle_fasteoi_irq: EXIT. 260 4000
  idle-0 [000]  1097.685346: .ehea_poll:  ENTER c000e8980700
  idle-0 [000]  1097.685352: .napi_complete: napi_complete: ENTER c000e8980700
  idle-0 [000]  1097.685352: .napi_complete: napi_complete: EXIT c000e8980700
  idle-0 [000]  1097.685355: .ehea_poll:  EXIT !cqe rx(1) c000e8980700

But I did see one like this, which shows an ehea_recv_irq_handler ENTER
within an ehea_poll ENTER.  (which I think is what you were expecting,
or wanted to verify.)


  idle-0 [000]  1097.616261: .handle_fasteoi_irq: ENTER 260 4000
  idle-0 [000]  1097.616262: .handle_fasteoi_irq: pre-action 260 4100
*  idle-0 [000]  1097.616263: .ehea_recv_irq_handler:  ENTER  c000e8980700
  idle-0 [000]  1097.616264: .ehea_recv_irq_handler: napi_schedule ... c000e8980700
  idle-0 [000]  1097.616265: .ehea_recv_irq_handler: napi_schedule Calling __napi_schedule ... c000e8980700
  idle-0 [000]  1097.616265: .ehea_recv_irq_handler:  EXIT c000e8980700
  idle-0 [000]  1097.616266: .handle_fasteoi_irq: post-action 260 4100
  idle-0 [000]  1097.616268: .handle_fasteoi_irq: EXIT. 260 4000
*  idle-0 [000]  1097.616270: .ehea_poll:  ENTER c000e8980700
  idle-0 [000]  1097.616282: .handle_fasteoi_irq: ENTER 260 4000
  idle-0 [000]  1097.616283: .handle_fasteoi_irq: pre-action 260 4100
*  idle-0 [000]  1097.616284: .ehea_recv_irq_handler:  ENTER  c000e8980700
  idle-0 [000]  1097.616285: .ehea_recv_irq_handler: napi_schedule ... c000e8980700
  idle-0 [000]  1097.616286: .ehea_recv_irq_handler: napi_schedule NOT Calling __napi_schedule... c000e8980700
  idle-0 [000]  1097.616286: .ehea_recv_irq_handler:  EXIT c000e8980700
  idle-0 [000]  1097.616287: .handle_fasteoi_irq: post-action 260 4100
  idle-0 [000]  1097.616289: .handle_fasteoi_irq: EXIT. 260 4000
  idle-0 [000]  1097.616299: .napi_complete: napi_complete: ENTER c000e8980700
  idle-0 [000]  1097.616300: .napi_complete: napi_complete: EXIT c000e8980700
  idle-0 [000]  1097.616302: .ehea_poll: napi_reschedule COMpleted c000e8980700
  idle-0 [000]  

Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-19 Thread Darren Hart

On 05/18/2010 06:25 PM, Michael Ellerman wrote:

On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:

On 05/18/2010 02:52 PM, Brian King wrote:

Is IRQF_NODELAY something specific to the RT kernel? I don't see it in 
mainline...


Yes, it basically says don't make this handler threaded.


That is a good fix for EHEA, but the threaded handling is still broken
for anything else that is edge triggered isn't it?


No, I don't believe so. Edge triggered interrupts that are reported as 
edge triggered interrupts will use the edge handler (which was the 
approach Sebastien took to make this work back in 2008). Since XICS 
presents all interrupts as Level Triggered, they use the fasteoi path.




The result of the discussion about two years ago on this was that we
needed a custom flow handler for XICS on RT.


I'm still not clear on why the ultimate solution wasn't to have XICS 
report edge triggered as edge triggered. Probably some complexity of the 
entire power stack that I am ignorant of.



Apart from the issue of losing interrupts there is also the fact that
masking on the XICS requires an RTAS call which takes a global lock.


Right, one of many reasons why we felt this was the right fix. The other 
is that there is no real additional overhead in running this as 
non-threaded since the receive handler is so short (just napi_schedule()).


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-19 Thread Thomas Gleixner
On Wed, 19 May 2010, Darren Hart wrote:

 On 05/18/2010 06:25 PM, Michael Ellerman wrote:
  On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
   On 05/18/2010 02:52 PM, Brian King wrote:
Is IRQF_NODELAY something specific to the RT kernel? I don't see it in
mainline...
   
   Yes, it basically says don't make this handler threaded.
  
  That is a good fix for EHEA, but the threaded handling is still broken
  for anything else that is edge triggered isn't it?
 
 No, I don't believe so. Edge triggered interrupts that are reported as edge
 triggered interrupts will use the edge handler (which was the approach
 Sebastien took to make this work back in 2008). Since XICS presents all
 interrupts as Level Triggered, they use the fasteoi path.

I wonder whether the XICS interrupts which are edge type can be
identified from the irq_desc->flags. Then we could avoid the masking
for those in the fasteoi_handler in general.

  
  The result of the discussion about two years ago on this was that we
  needed a custom flow handler for XICS on RT.
 
 I'm still not clear on why the ultimate solution wasn't to have XICS report
 edge triggered as edge triggered. Probably some complexity of the entire power
 stack that I am ignorant of.
 
   Apart from the issue of losing interrupts there is also the fact that
  masking on the XICS requires an RTAS call which takes a global lock.

Right, I'd love to avoid that but with real level interrupts we'd run
into an interrupt storm. Though another solution would be to issue the
EOI after the threaded handler finished, that'd work as well, but
needs testing.

 Right, one of many reasons why we felt this was the right fix. The other is
 that there is no real additional overhead in running this as non-threaded
 since the receive handler is so short (just napi_schedule()).

Yes, in the case at hand it's the right thing to do, as we avoid
another wakeup/context switch.

Thanks,

tglx


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-19 Thread Thomas Gleixner
On Wed, 19 May 2010, Thomas Gleixner wrote:
  I'm still not clear on why the ultimate solution wasn't to have XICS report
  edge triggered as edge triggered. Probably some complexity of the entire
  power stack that I am ignorant of.
  
   Apart from the issue of losing interrupts there is also the fact that
   masking on the XICS requires an RTAS call which takes a global lock.
 
 Right, I'd love to avoid that but with real level interrupts we'd run
 into an interrupt storm. Though another solution would be to issue the
 EOI after the threaded handler finished, that'd work as well, but
 needs testing.

Thought more about that. The case at hand (ehea) is nasty:

The driver does _NOT_ disable the rx interrupt in the card in the rx
interrupt handler - for whatever reason.

 So even in mainline you get repeated rx interrupts when packets
 arrive while napi is processing the poll, which is suboptimal at
 least. In fact it is counterproductive as the whole purpose of NAPI
 is to _NOT_ get interrupts for consecutive incoming packets while the
 poll is active.

Most of the other network drivers do:

  rx_irq()
disable rx interrupts on card
napi_schedule()

Now when the napi poll is done (no more packets available) then the
driver reenables the rx interrupt on the card.
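
In C, that conventional pattern looks roughly like this (a generic
sketch with hypothetical mydev_* helpers, not any specific driver):

	static irqreturn_t mydev_rx_irq(int irq, void *data)
	{
		struct mydev *dev = data;

		/* no further rx interrupts while the poll runs */
		mydev_disable_rx_irq(dev);
		napi_schedule(&dev->napi);
		return IRQ_HANDLED;
	}

	static int mydev_poll(struct napi_struct *napi, int budget)
	{
		struct mydev *dev = container_of(napi, struct mydev, napi);
		int done = mydev_process_rx(dev, budget);

		if (done < budget) {
			napi_complete(napi);
			/* re-enable rx interrupts only when fully drained */
			mydev_enable_rx_irq(dev);
		}
		return done;
	}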

Thanks,

tglx


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-19 Thread Michael Ellerman
On Wed, 2010-05-19 at 07:16 -0700, Darren Hart wrote:
 On 05/18/2010 06:25 PM, Michael Ellerman wrote:
  On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
  On 05/18/2010 02:52 PM, Brian King wrote:
  Is IRQF_NODELAY something specific to the RT kernel? I don't see it in 
  mainline...
 
  Yes, it basically says don't make this handler threaded.
 
  That is a good fix for EHEA, but the threaded handling is still broken
  for anything else that is edge triggered isn't it?
 
 No, I don't believe so. Edge triggered interrupts that are reported as 
 edge triggered interrupts will use the edge handler (which was the 
 approach Sebastien took to make this work back in 2008). Since XICS 
 presents all interrupts as Level Triggered, they use the fasteoi path.

But that's the point, no interrupts on XICS are reported as edge, even
if they are actually edge somewhere deep in the hardware. I don't think
we have any reliable way to determine what is what.

  The result of the discussion about two years ago on this was that we
  needed a custom flow handler for XICS on RT.
 
 I'm still not clear on why the ultimate solution wasn't to have XICS 
 report edge triggered as edge triggered. Probably some complexity of the 
 entire power stack that I am ignorant of.

I'm not really sure either, but I think it's a case of a leaky
abstraction on the part of the hypervisor. Edge interrupts behave as
level as long as you handle the irq before EOI, but if you mask they
don't. But Milton's the expert on that.

  Apart from the issue of losing interrupts there is also the fact that
  masking on the XICS requires an RTAS call which takes a global lock.
 
 Right, one of many reasons why we felt this was the right fix. The other 
 is that there is no real additional overhead in running this as 
 non-threaded since the receive handler is so short (just napi_schedule()).

True. It's not a fix in general though. I'm worried that we're going to
see the exact same bug for MSI(-X) interrupts.

cheers





Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-19 Thread Michael Ellerman
On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
 On Wed, 19 May 2010, Darren Hart wrote:
 
  On 05/18/2010 06:25 PM, Michael Ellerman wrote:
   On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
On 05/18/2010 02:52 PM, Brian King wrote:
 Is IRQF_NODELAY something specific to the RT kernel? I don't see it in
 mainline...

Yes, it basically says don't make this handler threaded.
   
   That is a good fix for EHEA, but the threaded handling is still broken
   for anything else that is edge triggered isn't it?
  
  No, I don't believe so. Edge triggered interrupts that are reported as edge
  triggered interrupts will use the edge handler (which was the approach
  Sebastien took to make this work back in 2008). Since XICS presents all
  interrupts as Level Triggered, they use the fasteoi path.
 
 I wonder whether the XICS interrupts which are edge type can be
 identified from the irq_desc->flags. Then we could avoid the masking
 for those in the fasteoi_handler in general.

I'm not sure they can be. I know on other similar HW we can detect LSI
vs MSI, but that's without the HV in the equation.

   The result of the discussion about two years ago on this was that we
   needed a custom flow handler for XICS on RT.
  
  I'm still not clear on why the ultimate solution wasn't to have XICS report
  edge triggered as edge triggered. Probably some complexity of the entire
  power stack that I am ignorant of.
  
   Apart from the issue of losing interrupts there is also the fact that
   masking on the XICS requires an RTAS call which takes a global lock.
 
 Right, I'd love to avoid that but with real level interrupts we'd run
 into an interrupt storm. Though another solution would be to issue the
 EOI after the threaded handler finished, that'd work as well, but
 needs testing.

Yeah I think that was the idea for the custom flow handler. We'd reset
the processor priority so we can take other interrupts (which the EOI
usually does for you), then do the actual EOI after the handler
finished.

cheers



Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-19 Thread Michael Ellerman
On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
 On Wed, 19 May 2010, Thomas Gleixner wrote:
   I'm still not clear on why the ultimate solution wasn't to have XICS report
   edge triggered as edge triggered. Probably some complexity of the entire
   power stack that I am ignorant of.
   
Apart from the issue of losing interrupts there is also the fact that
masking on the XICS requires an RTAS call which takes a global lock.
  
  Right, I'd love to avoid that but with real level interrupts we'd run
  into an interrupt storm. Though another solution would be to issue the
  EOI after the threaded handler finished, that'd work as well, but
  needs testing.
 
 Thought more about that. The case at hand (ehea) is nasty:
 
 The driver does _NOT_ disable the rx interrupt in the card in the rx
 interrupt handler - for whatever reason.

Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)

cheers





Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-18 Thread Brian King
Is IRQF_NODELAY something specific to the RT kernel? I don't see it in 
mainline...

-Brian


On 05/18/2010 04:33 PM, dvh...@linux.vnet.ibm.com wrote:
From ad81664794e33d785f533c5edee37aaba20dd92d Mon Sep 17 00:00:00 2001
 From: Darren Hart dvh...@us.ibm.com
 Date: Tue, 18 May 2010 11:07:13 -0700
 Subject: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
 
 The underlying hardware is edge triggered but presented by XICS as level
 triggered. The edge triggered interrupts are not reissued after masking. This
 is not a problem in mainline which does not mask the interrupt (relying on the
 EOI mechanism instead). The threaded interrupts in PREEMPT_RT do mask the
 interrupt, and can lose interrupts that occurred while masked, resulting in a
 hung ethernet interface.
 
 The receive handler simply calls napi_schedule(), as such, there is no
 significant additional overhead in making this non-threaded, since we either
 wakeup the threaded irq handler to call napi_schedule(), or just call
 napi_schedule() directly to wakeup the softirqs.  As the receive handler is
 lockless, there is no need to convert any of the ehea spinlock_t's to
 atomic_spinlock_t's.
 
 Without this patch, a simple scp file copy loop would fail quickly (usually
 seconds). We have over two hours of sustained scp activity with the patch
 applied.
 
 Credit goes to Will Schmidt for lots of instrumentation and tracing which
 clarified the scenario and to Thomas Gleixner for the incredibly simple
 solution.
 
 Signed-off-by: Darren Hart dvh...@us.ibm.com
 Acked-by: Will Schmidt will_schm...@vnet.ibm.com
 Cc: Thomas Gleixner t...@linuxtronix.de
 Cc: Jan-Bernd Themann them...@de.ibm.com
 Cc: Nivedita Singhvi n...@us.ibm.com
 Cc: Brian King bjki...@us.ibm.com
 Cc: Michael Ellerman eller...@au1.ibm.com
 Cc: Doug Maxey doug.ma...@us.ibm.com
 ---
 drivers/net/ehea/ehea_main.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
 index 977c3d3..2c53df2 100644
 --- a/drivers/net/ehea/ehea_main.c
 +++ b/drivers/net/ehea/ehea_main.c
 @@ -1263,7 +1263,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
 			 "%s-queue%d", dev->name, i);
 		ret = ibmebus_request_irq(pr->eq->attr.ist1,
 					  ehea_recv_irq_handler,
-					  IRQF_DISABLED, pr->int_send_name,
+					  IRQF_DISABLED | IRQF_NODELAY, pr->int_send_name,
 					  pr);
 		if (ret) {
 			ehea_error("failed registering irq for ehea_queue "


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-18 Thread Nivedita Singhvi

Brian King wrote:

Is IRQF_NODELAY something specific to the RT kernel? I don't see it in 
mainline...


Yep, this is RT only.

thanks,
Nivedita


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-18 Thread Darren Hart

On 05/18/2010 02:52 PM, Brian King wrote:

Is IRQF_NODELAY something specific to the RT kernel? I don't see it in 
mainline...


Yes, it basically says don't make this handler threaded.

--
Darren



-Brian


On 05/18/2010 04:33 PM, dvh...@linux.vnet.ibm.com wrote:

 From ad81664794e33d785f533c5edee37aaba20dd92d Mon Sep 17 00:00:00 2001
From: Darren Hart dvh...@us.ibm.com
Date: Tue, 18 May 2010 11:07:13 -0700
Subject: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

The underlying hardware is edge triggered but presented by XICS as level
triggered. The edge triggered interrupts are not reissued after masking. This
is not a problem in mainline which does not mask the interrupt (relying on the
EOI mechanism instead). The threaded interrupts in PREEMPT_RT do mask the
interrupt, and can lose interrupts that occurred while masked, resulting in a
hung ethernet interface.

The receive handler simply calls napi_schedule(), as such, there is no
significant additional overhead in making this non-threaded, since we either
wakeup the threaded irq handler to call napi_schedule(), or just call
napi_schedule() directly to wakeup the softirqs.  As the receive handler is
lockless, there is no need to convert any of the ehea spinlock_t's to
atomic_spinlock_t's.

Without this patch, a simple scp file copy loop would fail quickly (usually
seconds). We have over two hours of sustained scp activity with the patch
applied.

Credit goes to Will Schmidt for lots of instrumentation and tracing which
clarified the scenario and to Thomas Gleixner for the incredibly simple
solution.

Signed-off-by: Darren Hart dvh...@us.ibm.com
Acked-by: Will Schmidt will_schm...@vnet.ibm.com
Cc: Thomas Gleixner t...@linuxtronix.de
Cc: Jan-Bernd Themann them...@de.ibm.com
Cc: Nivedita Singhvi n...@us.ibm.com
Cc: Brian King bjki...@us.ibm.com
Cc: Michael Ellerman eller...@au1.ibm.com
Cc: Doug Maxey doug.ma...@us.ibm.com
---
  drivers/net/ehea/ehea_main.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 977c3d3..2c53df2 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -1263,7 +1263,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
 			 "%s-queue%d", dev->name, i);
 		ret = ibmebus_request_irq(pr->eq->attr.ist1,
 					  ehea_recv_irq_handler,
-					  IRQF_DISABLED, pr->int_send_name,
+					  IRQF_DISABLED | IRQF_NODELAY, pr->int_send_name,
 					  pr);
 		if (ret) {
 			ehea_error("failed registering irq for ehea_queue "






--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-18 Thread Michael Ellerman
On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
 On 05/18/2010 02:52 PM, Brian King wrote:
  Is IRQF_NODELAY something specific to the RT kernel? I don't see it in 
  mainline...
 
 Yes, it basically says don't make this handler threaded.

That is a good fix for EHEA, but the threaded handling is still broken
for anything else that is edge triggered isn't it?

The result of the discussion about two years ago on this was that we
needed a custom flow handler for XICS on RT.

Apart from the issue of losing interrupts there is also the fact that
masking on the XICS requires an RTAS call which takes a global lock.

cheers



