Re: question about softirqs
On Wed, 13 May 2009, Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> > Andi Kleen wrote:
> > > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> > >> Andi Kleen wrote:
> > >>
> > >>> network packets are normally processed by the network packet interrupt's
> > >>> softirq or alternatively in the NAPI poll loop.
> > >> If we have a high priority task, ksoftirqd may not get a chance to run.
> > >
> > > In this case the next interrupt will also process them. It will just
> > > go more slowly because interrupts limit the work compared to ksoftirqd.
> >
> > I realize that they will eventually get processed. My point is that the
> > documentation (in-kernel, online, and in various books) says that
> > softirqs will be processed _on the return from a syscall_.
>
> They are. The documentation is correct.

No, the documentation is wrong for the case where the task which raised
the softirq (and therefore woke up ksoftirqd) has a higher priority than
ksoftirqd. In that case the kernel does _NOT_ schedule ksoftirqd in the
return-from-syscall path. And that's all Chris is pointing out.

Thanks,

	tglx

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: question about softirqs
On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> >> Andi Kleen wrote:
> >>
> >>> network packets are normally processed by the network packet interrupt's
> >>> softirq or alternatively in the NAPI poll loop.
> >> If we have a high priority task, ksoftirqd may not get a chance to run.
> >
> > In this case the next interrupt will also process them. It will just
> > go more slowly because interrupts limit the work compared to ksoftirqd.
>
> I realize that they will eventually get processed. My point is that the
> documentation (in-kernel, online, and in various books) says that
> softirqs will be processed _on the return from a syscall_.

They are. The documentation is correct.

What might not all be processed are the packets that are in the per-CPU
backlog queue when the network softirq runs (for non-NAPI; for NAPI
that's obsolete anyway). That's because there are limits. And when new
work comes in in parallel it doesn't process it all. But that's always
the case -- no queue is infinite, so there are always situations where
items can be dropped or delayed.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
>> Andi Kleen wrote:
>>
>>> network packets are normally processed by the network packet interrupt's
>>> softirq or alternatively in the NAPI poll loop.
>> If we have a high priority task, ksoftirqd may not get a chance to run.
>
> In this case the next interrupt will also process them. It will just
> go more slowly because interrupts limit the work compared to ksoftirqd.

I realize that they will eventually get processed. My point is that the
documentation (in-kernel, online, and in various books) says that
softirqs will be processed _on the return from a syscall_. As we all
agree, this is not necessarily the case.

Chris
Re: question about softirqs
On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> > network packets are normally processed by the network packet interrupt's
> > softirq or alternatively in the NAPI poll loop.
>
> If we have a high priority task, ksoftirqd may not get a chance to run.

In this case the next interrupt will also process them. It will just
go more slowly because interrupts limit the work compared to ksoftirqd.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Andi Kleen wrote:
> network packets are normally processed by the network packet interrupt's
> softirq or alternatively in the NAPI poll loop.

If we have a high priority task, ksoftirqd may not get a chance to run.
My point is simply that the documentation says that softirqs are
processed on return from a syscall, and this is not necessarily the
case.

Chris
Re: question about softirqs
On Wed, May 13, 2009 at 09:05:01AM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner writes:
>
> >> Err, no. Chris is completely correct:
> >>
> >>	if (!in_interrupt())
> >>		wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets,

network packets are normally processed by the network packet interrupt's
softirq or alternatively in the NAPI poll loop.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Thomas Gleixner wrote:
> On Wed, 13 May 2009, Chris Friesen wrote:
>> As far as I can tell, in this scenario softirqs may not get processed on
>> return from a syscall (contradicting the documentation). In the worst
>> case, they may not get processed until the next timer tick.
>
> Right, because your high-priority task prevents ksoftirqd from running:
> it cannot preempt the high-priority task.

Exactly. I'm suggesting that this point (the idea that softirqs may or
may not get processed on return from syscall depending on relative task
priority) should probably be documented somewhere, because the current
documentation (in the kernel and on the web) doesn't mention it at all.
Maybe I should just submit a patch to
Documentation/DocBook/kernel-hacking.tmpl.

Chris
Re: question about softirqs
On Wed, 13 May 2009, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner writes:
>
> >> Err, no. Chris is completely correct:
> >>
> >>	if (!in_interrupt())
> >>		wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets, no hardware interrupts coming in other than the timer tick, and
> a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
> set?
>
> As far as I can tell, in this scenario softirqs may not get processed on
> return from a syscall (contradicting the documentation). In the worst
> case, they may not get processed until the next timer tick.

Right, because your high-priority task prevents ksoftirqd from running:
it cannot preempt the high-priority task.

Thanks,

	tglx
Re: question about softirqs
Andi Kleen wrote:
> Thomas Gleixner writes:
>> Err, no. Chris is completely correct:
>>
>>	if (!in_interrupt())
>>		wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.

What about the scenario I raised earlier, where we have incoming network
packets, no hardware interrupts coming in other than the timer tick, and
a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
set?

As far as I can tell, in this scenario softirqs may not get processed on
return from a syscall (contradicting the documentation). In the worst
case, they may not get processed until the next timer tick.

Chris
Re: question about softirqs
Andi Kleen wrote:
> Thomas Gleixner writes:
>
>> Err, no. Chris is completely correct:
>>
>>	if (!in_interrupt())
>>		wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.
>
> You can check this by checking the accumulated CPU time on your
> ksoftirqds. Mine are all 0 even on long running systems.

Then it's a bug, Andi. It's quite easy to trigger ksoftirqd with a
gigabit ethernet link.

Commit f5f293a4e3d0a0c52cec31de6762c95050156516 corrected something
(making mpstat and top correctly display softirq time in the CPU stats),
but apparently we still have a problem reporting correct time on
processes, particularly on ksoftirqd/x.

I have one SMP machine flooded by network frames, with CPU0 handling all
the work inside ksoftirqd/0 (NAPI processing: almost no more hard
interrupts delivered). Still, top and ps report no more than 30% of CPU
time used by ksoftirqd, while this CPU only runs ksoftirqd/0 (100% in
sirq) and has no idle time.
$ ps -fp 4 ; mpstat -P 0 1 10 ; ps -fp 4
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:46 [ksoftirqd/0]
Linux 2.6.30-rc5-tip-01595-g6f75dad-dirty (svivoipvnx001)  05/13/2009  _i686_

04:45:01 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
04:45:02 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:03 PM    0   0.00   0.00   0.00    0.00   0.00  99.01   0.00   0.00   0.99
04:45:04 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:05 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:06 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:07 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:08 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:09 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:10 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:11 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
Average:       0   0.00   0.00   0.00    0.00   0.00  99.90   0.00   0.00   0.10
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:49 [ksoftirqd/0]

You can see here that the time consumed by ksoftirqd/0 during this
10-second time frame is *only* 3 seconds. Therefore, we cannot trust ps,
not with the current kernel.

# cat /proc/4/stat ; sleep 10 ; cat /proc/4/stat
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15347 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15670 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0

> The reason Andrea originally added the softirqds was just that
> if you have very softirq intensive workloads they would tie
> up too much CPU time or not make enough progress with the default
> "don't loop too often" heuristics.
>
>> We can not rely on irqs coming in when the softirq is raised from
>
> You can't rely on it, but it happens in nearly all cases.
>
> -Andi
Re: question about softirqs
> I have one machine SMP flooded by network frames, CPU0 handling all

Yes, that's the case softirqd is supposed to handle. When you spend a
significant part of your CPU time in softirq context it kicks in to
provide somewhat fair additional CPU time. But most systems (like mine)
don't do that.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
On Wed, 13 May 2009, Andi Kleen wrote:
> > "If a soft irq is raised in process context, raise_softirq() in
> > kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
>
> softirqd is only used when the softirq runs for too long or when
> there are no suitable irq exits for a long time.
>
> In normal situations (not excessive time in softirq) they don't
> do anything.

Err, no. Chris is completely correct:

	if (!in_interrupt())
		wakeup_softirqd();

We can not rely on irqs coming in when the softirq is raised from thread
context. An irq_exit might be faster to process it than the scheduler
can schedule ksoftirqd in, but ksoftirqd is woken and runs nevertheless.
If it finds a softirq pending then it processes it in its own context,
and the irq_exit calls into softirq return right away.

Thanks,

	tglx
Re: question about softirqs
Thomas Gleixner writes:
> Err, no. Chris is completely correct:
>
>	if (!in_interrupt())
>		wakeup_softirqd();

Yes you have to wake it up just in case, but it doesn't normally
process the data because a normal softirq comes in faster. It's
just a safety policy.

You can check this by checking the accumulated CPU time on your
ksoftirqds. Mine are all 0 even on long running systems.

The reason Andrea originally added the softirqds was just that
if you have very softirq intensive workloads they would tie
up too much CPU time or not make enough progress with the default
"don't loop too often" heuristics.

> We can not rely on irqs coming in when the softirq is raised from

You can't rely on it, but it happens in nearly all cases.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
> "If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd

softirqd is only used when the softirq runs for too long or when
there are no suitable irq exits for a long time.

In normal situations (not excessive time in softirq) they don't
do anything.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Andi Kleen wrote:
> "Chris Friesen" writes:
>
>> One of the reasons I brought up this issue is that there is a lot of
>> documentation out there that says "softirqs will be processed on return
>> from a syscall". The fact that it actually depends on the scheduler
>> parameters of the task issuing the syscall isn't ever mentioned.
>
> It's not mentioned because it is not currently.

Paul Mackerras explained the current behaviour earlier in the thread
(when it was still on the ppc list). His explanation agrees with my
exploration of the code.

"If a soft irq is raised in process context, raise_softirq() in
kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
runs soon to process the soft irq. So what would happen is that we
would see the TIF_RESCHED_PENDING flag on the current task in the
syscall exit path and call schedule() which would switch to ksoftirqd
to process the soft irq (if it hasn't already been processed by that
stage)."

If the current task is of higher priority, ksoftirqd doesn't get a
chance to run and we don't process softirqs on return from a syscall.

Chris
Re: question about softirqs
"Chris Friesen" writes:
>
> One of the reasons I brought up this issue is that there is a lot of
> documentation out there that says "softirqs will be processed on return
> from a syscall". The fact that it actually depends on the scheduler
> parameters of the task issuing the syscall isn't ever mentioned.

It's not mentioned because that is not currently the case. However some
network TCP RX processing can happen in process context, which gives
you most of the benefit anyway.

> In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
> source still has the following:
>
>	Whenever a system call is about to return to userspace, or a
>	hardware interrupt handler exits, any 'software interrupts'
>	which are marked pending (usually by hardware interrupts) are
>	run (kernel/softirq.c).
>
> If anyone is looking at changing this code, it might be good to ensure
> that at least the kernel docs are updated.

So far the code is not changed in mainline. There have been some
proposals only.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Hi.

On Tue, May 12, 2009 at 11:12:58AM +0200, Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
> Wouldn't the even better solution be to get rid of softirqs
> all-together? And move tasklets into some thread context?

Only if we are ready to fix the 7x rescheduling regression compared to
kernel threads (work queues, actually). At least that's how tasklets
behaved compared to work queues 1.5 years ago, in a simple and quite
naive test where a tasklet/work item rescheduled itself a number of
times: http://marc.info/?l=linux-crypto-vger&m=119462472517405&w=2

--
	Evgeniy Polyakov
Re: question about softirqs
From: Paul Mackerras
Date: Wed, 13 May 2009 15:15:34 +1000

> David Miller writes:
>
>> I fully expected us to be, at this point, talking about putting the
>> pending softirq check back into the trap return path :-/
>
> Would that actually do any good, in the case where the system has
> decided that ksoftirqd is handling soft irqs at the moment?

Even if ksoftirqd is running, we check and run pending softirqs from
trap return. Sure, I imagine we could re-enter this "ksoftirqd blocked
by high-prio thread" situation if we get flooded every single time over
and over again.
Re: question about softirqs
David Miller writes:
> I fully expected us to be, at this point, talking about putting the
> pending softirq check back into the trap return path :-/

Would that actually do any good, in the case where the system has
decided that ksoftirqd is handling soft irqs at the moment?

Paul.
Re: question about softirqs
From: Steven Rostedt
Date: Tue, 12 May 2009 08:20:51 -0400 (EDT)

> I'm going to be playing around with bypassing the net-rx/tx with my
> network drivers. I'm going to add threaded irqs for my network cards and
> have the driver threads do the work to get through the tcp/ip stack.
>
> I'll still keep the softirqs for other cards, but I want to see how much
> it speeds things up if I have the driver thread do it.

I think your latency is going to be dreadful.
Re: question about softirqs
From: Ingo Molnar
Date: Tue, 12 May 2009 11:23:48 +0200

>> Wouldn't the even better solution be to get rid of softirqs
>> all-together?
>>
>> I see the recent work by Thomas to get threaded interrupts
>> upstream as a good first step towards that goal; once the RX
>> processing is moved to a thread (or multiple threads) one can
>> prioritize them in the regular sys_sched_setscheduler() way, and it's
>> obvious that a FIFO task above the priority of the network tasks
>> will have network starvation issues.
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.

Nice for -rt goals, but not for latency. So we're going to regress in
this area again? I can't see how that's so desirable, to be honest with
you.

The fact that this discussion started about a correctly coded task with
a certain priority not being able to make forward progress, just
because softirqs are being processed in a thread context, should be a
big red flag that this is a buggered-up design.

I fully expected us to be, at this point, talking about putting the
pending softirq check back into the trap return path :-/
Re: question about softirqs
Ingo Molnar wrote:
> * Chris Friesen wrote:
>> I think I see a possible problem with this. Suppose I have a
>> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
>> the scenario above, schedule() would re-run the spinning task
>> rather than ksoftirqd, thus preventing any incoming packets from
>> being sent up the stack until we get a real hardware
>> interrupt--which could be a whole jiffy if interrupt mitigation is
>> enabled in the net device.
>>
>> DaveM pointed out that if we're doing transmits we're likely to
>> hit local_bh_enable(), which would process the softirq work.
>> However, I think we may still have a problem in the above rx-only
>> scenario--or is it too contrived to matter?
>
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:

One of the reasons I brought up this issue is that there is a lot of
documentation out there that says "softirqs will be processed on return
from a syscall". The fact that it actually depends on the scheduler
parameters of the task issuing the syscall isn't ever mentioned.

In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
source still has the following:

	Whenever a system call is about to return to userspace, or a
	hardware interrupt handler exits, any 'software interrupts'
	which are marked pending (usually by hardware interrupts) are
	run (kernel/softirq.c).

If anyone is looking at changing this code, it might be good to ensure
that at least the kernel docs are updated.

Chris
Re: question about softirqs
On Tue, 12 May 2009, Peter Zijlstra wrote:
> On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
> >
> > Yeah, that would be "nice". A single IRQ thread plus the process
> > context(s) doing networking might perform well.
> >
> > Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> > sure about - it's extra context-switching cost.
>
> Sure, that was implied by the getting rid of softirqs ;-), on -rt we
> currently suffer this hardirq/softirq thread ping-pong, it sucks.

I'm going to be playing around with bypassing the net-rx/tx with my
network drivers. I'm going to add threaded irqs for my network cards
and have the driver threads do the work to get through the tcp/ip
stack.

I'll still keep the softirqs for other cards, but I want to see how
much it speeds things up if I have the driver thread do it.

-- Steve
Re: question about softirqs
On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.
>
> Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> sure about - it's extra context-switching cost.

Sure, that was implied by the getting rid of softirqs ;-), on -rt we
currently suffer this hardirq/softirq thread ping-pong, it sucks.
Re: question about softirqs
On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> * Chris Friesen wrote:
>
> > This started out as a thread on the ppc list, but on the
> > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > list a bit.
> >
> > Currently, if a softirq is raised in process context the
> > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > run the scheduler, expecting it to switch to ksoftirqd to handle
> > the softirq processing.
> >
> > I think I see a possible problem with this. Suppose I have a
> > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > the scenario above, schedule() would re-run the spinning task
> > rather than ksoftirqd, thus preventing any incoming packets from
> > being sent up the stack until we get a real hardware
> > interrupt--which could be a whole jiffy if interrupt mitigation is
> > enabled in the net device.
>
> TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> will occur.
>
> > DaveM pointed out that if we're doing transmits we're likely to
> > hit local_bh_enable(), which would process the softirq work.
> > However, I think we may still have a problem in the above rx-only
> > scenario--or is it too contrived to matter?
>
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:
>
>	if (!in_interrupt())
>		wakeup_softirqd();
>
> that's not really clean. We should look into eliminating
> process-context use of raise_softirq_irqsoff(). Such a code sequence:
>
>	local_irq_save(flags);
>	...
>	raise_softirq_irqsoff(nr);
>	...
>	local_irq_restore(flags);
>
> should be converted to something like:
>
>	local_irq_save(flags);
>	...
>	raise_softirq_irqsoff(nr);
>	...
>	local_irq_restore(flags);
>	recheck_softirqs();
>
> If someone does not do proper local_bh_disable()/enable() sequences
> for micro-optimization reasons, then push the check to after the
> critical section - and don't cause extra reschedules by waking up
> ksoftirqd. raise_softirq_irqsoff() will also be faster.

Wouldn't the even better solution be to get rid of softirqs
all-together?

I see the recent work by Thomas to get threaded interrupts upstream as
a good first step towards that goal; once the RX processing is moved to
a thread (or multiple threads) one can prioritize them in the regular
sys_sched_setscheduler() way, and it's obvious that a FIFO task above
the priority of the network tasks will have network starvation issues.
Re: question about softirqs
* Peter Zijlstra wrote:

> On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> > * Chris Friesen wrote:
> >
> > > This started out as a thread on the ppc list, but on the
> > > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > > list a bit.
> > >
> > > Currently, if a softirq is raised in process context the
> > > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > > run the scheduler, expecting it to switch to ksoftirqd to handle
> > > the softirq processing.
> > >
> > > I think I see a possible problem with this. Suppose I have a
> > > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > > the scenario above, schedule() would re-run the spinning task
> > > rather than ksoftirqd, thus preventing any incoming packets from
> > > being sent up the stack until we get a real hardware
> > > interrupt--which could be a whole jiffy if interrupt mitigation is
> > > enabled in the net device.
> >
> > TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> > SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> > will occur.
> >
> > > DaveM pointed out that if we're doing transmits we're likely to
> > > hit local_bh_enable(), which would process the softirq work.
> > > However, I think we may still have a problem in the above rx-only
> > > scenario--or is it too contrived to matter?
> >
> > This could occur, and the problem is really that task priorities do
> > not extend across softirq work processing.
> >
> > This could occur in ordinary SCHED_OTHER tasks as well, if the
> > softirq is bounced to ksoftirqd - which it only should be if there's
> > serious softirq overload - or, as you describe it above, if the
> > softirq is raised in process context:
> >
> >	if (!in_interrupt())
> >		wakeup_softirqd();
> >
> > that's not really clean. We should look into eliminating
> > process-context use of raise_softirq_irqsoff(). Such a code sequence:
> >
> >	local_irq_save(flags);
> >	...
> >	raise_softirq_irqsoff(nr);
> >	...
> >	local_irq_restore(flags);
> >
> > should be converted to something like:
> >
> >	local_irq_save(flags);
> >	...
> >	raise_softirq_irqsoff(nr);
> >	...
> >	local_irq_restore(flags);
> >	recheck_softirqs();
> >
> > If someone does not do proper local_bh_disable()/enable() sequences
> > for micro-optimization reasons, then push the check to after the
> > critical section - and don't cause extra reschedules by waking up
> > ksoftirqd. raise_softirq_irqsoff() will also be faster.
>
> Wouldn't the even better solution be to get rid of softirqs
> all-together?
>
> I see the recent work by Thomas to get threaded interrupts
> upstream as a good first step towards that goal; once the RX
> processing is moved to a thread (or multiple threads) one can
> prioritize them in the regular sys_sched_setscheduler() way, and it's
> obvious that a FIFO task above the priority of the network tasks
> will have network starvation issues.

Yeah, that would be "nice". A single IRQ thread plus the process
context(s) doing networking might perform well.

Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
sure about - it's extra context-switching cost.

Btw., i noticed that using scheduling for work (packet, etc.) flow
distribution standardizes and evens out the behavior of workloads.
Softirq scheduling is really quite random currently. We have a random
processing loop-limit in the core code and various batching and
work-limit controls at individual usage sites. We sometimes piggyback
on ksoftirqd. It's far easier to keep performance in check when things
are more predictable.

But this is not an easy endeavour, and performance regressions have to
be expected and addressed if they occur. There can be random packet
queuing details in networking drivers that just happen to work fine
now, and might work worse with a kernel thread in place.

So there has to be broad buy-in for the concept, and a concerted effort
to eliminate softirq processing and most of hardirq processing by
pushing those two elements into a single hardirq thread (and the rest
into process context). Not for the faint-hearted. Nor is it recommended
to be done without a good layer of asbestos.

	Ingo
Re: question about softirqs
* Chris Friesen wrote:

> This started out as a thread on the ppc list, but on the
> suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> list a bit.
>
> Currently, if a softirq is raised in process context the
> TIF_RESCHED_PENDING flag gets set and on return to userspace we
> run the scheduler, expecting it to switch to ksoftirqd to handle
> the softirq processing.
>
> I think I see a possible problem with this. Suppose I have a
> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> the scenario above, schedule() would re-run the spinning task
> rather than ksoftirqd, thus preventing any incoming packets from
> being sent up the stack until we get a real hardware
> interrupt--which could be a whole jiffy if interrupt mitigation is
> enabled in the net device.

TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
will occur.

> DaveM pointed out that if we're doing transmits we're likely to
> hit local_bh_enable(), which would process the softirq work.
> However, I think we may still have a problem in the above rx-only
> scenario--or is it too contrived to matter?

This could occur, and the problem is really that task priorities do
not extend across softirq work processing.

This could occur in ordinary SCHED_OTHER tasks as well, if the
softirq is bounced to ksoftirqd - which it only should be if there's
serious softirq overload - or, as you describe it above, if the
softirq is raised in process context:

	if (!in_interrupt())
		wakeup_softirqd();

that's not really clean. We should look into eliminating
process-context use of raise_softirq_irqsoff(). Such a code sequence:

	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);

should be converted to something like:

	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);
	recheck_softirqs();

If someone does not do proper local_bh_disable()/enable() sequences
for micro-optimization reasons, then push the check to after the
critical section - and don't cause extra reschedules by waking up
ksoftirqd. raise_softirq_irqsoff() will also be faster.

	Ingo
Re: question about softirqs
This started out as a thread on the ppc list, but on the suggestion of DaveM and Paul Mackerras I'm expanding the receiver list a bit.

Currently, if a softirq is raised in process context the TIF_RESCHED_PENDING flag gets set and on return to userspace we run the scheduler, expecting it to switch to ksoftirqd to handle the softirq processing.

I think I see a possible problem with this. Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under the scenario above, schedule() would re-run the spinning task rather than ksoftirqd, thus preventing any incoming packets from being sent up the stack until we get a real hardware interrupt--which could be a whole jiffy if interrupt mitigation is enabled in the net device.

DaveM pointed out that if we're doing transmits we're likely to hit local_bh_enable(), which would process the softirq work. However, I think we may still have a problem in the above rx-only scenario--or is it too contrived to matter?

Thanks, Chris
Re: question about softirqs
Chris Friesen writes:

> Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT
> set (and maybe doing other stuff if there are no messages). In this
> case, schedule() would re-run the spinning task rather than running
> ksoftirqd. This could prevent any incoming packets from actually being
> sent up the stack until we get a real hardware interrupt--which could be
> a whole jiffy if interrupt mitigation is enabled in the net device.

I suggest you ask Ingo Molnar about that.

> (And maybe longer if NOHZ is enabled.)

We still have a timer interrupt every jiffy when stuff is running; we only turn off the timer interrupts when idle.

Paul.
Re: question about softirqs
From: "Chris Friesen"
Date: Mon, 11 May 2009 12:25:54 -0600

> David Miller wrote:
>
>> You know, for networking over loopback (one of the only real cases
>> that even matters, if we get a hard interrupt then the return from
>> that would process any softints), we probably make out just fine
>> anyways. As long as we hit a local_bh_enable() (and in the return
>> path from device transmit that's exceedingly likely as all of the
>> networking locking is BH safe) we'll run the softints from that and
>> thus long before we get to syscall return.
>
> What about the issue I raised earlier? (I don't think you were copied
> at that point.)

I'm sure all of the networking experts on linuxppc-dev will have an answer. And yes that was sarcasm :-)

You need to ask this on netdev or a similar list.
Re: question about softirqs
David Miller wrote:

> You know, for networking over loopback (one of the only real cases
> that even matters, if we get a hard interrupt then the return from
> that would process any softints), we probably make out just fine
> anyways. As long as we hit a local_bh_enable() (and in the return
> path from device transmit that's exceedingly likely as all of the
> networking locking is BH safe) we'll run the softints from that and
> thus long before we get to syscall return.

What about the issue I raised earlier? (I don't think you were copied at that point.)

Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set (and maybe doing other stuff if there are no messages). In this case, schedule() would re-run the spinning task rather than running ksoftirqd. This could prevent any incoming packets from actually being sent up the stack until we get a real hardware interrupt--which could be a whole jiffy if interrupt mitigation is enabled in the net device. (And maybe longer if NOHZ is enabled.)

Chris
Re: question about softirqs
From: Paul Mackerras
Date: Sat, 9 May 2009 13:31:23 +1000

> David Miller writes:
>
>> Grumble, when did that happen :-(
>
> Ages ago (i.e. before the switch to git :). Talk to Ingo, it's his
> doing IIRC.

I'll first do some data mining before coming to any (further) conclusions :-)

>> That's horrible for latency compared to handling it directly
>> in the trap return path.
>
> Actually, I don't know why we ever let there be softirqs pending when
> we're in process context. I would think that we should just call
> do_softirq immediately if we raise a softirq when !in_interrupt().
> But I might be missing some subtlety.

I bet it was a non-starter before IRQ stacks. It does seem like a good idea to me.

You know, for networking over loopback (one of the only real cases that even matters, if we get a hard interrupt then the return from that would process any softints), we probably make out just fine anyways. As long as we hit a local_bh_enable() (and in the return path from device transmit that's exceedingly likely as all of the networking locking is BH safe) we'll run the softints from that and thus long before we get to syscall return.
Re: question about softirqs
David Miller writes:

> Grumble, when did that happen :-(

Ages ago (i.e. before the switch to git :). Talk to Ingo, it's his doing IIRC.

> That's horrible for latency compared to handling it directly
> in the trap return path.

Actually, I don't know why we ever let there be softirqs pending when we're in process context. I would think that we should just call do_softirq immediately if we raise a softirq when !in_interrupt(). But I might be missing some subtlety.

Paul.
Re: question about softirqs
> > The soft irq stuff is pretty much all generic code these days, except
> > for the code to switch to the softirq stack.
>
> Grumble, when did that happen :-(
>
> That's horrible for latency compared to handling it directly
> in the trap return path.

If it is indeed such a problem, it would be reasonably easy to handle it in the return-to-userspace path around the same place where we test for pending signals (isn't that what we used to do anyway?)

Cheers,
Ben.
Re: question about softirqs
Paul Mackerras wrote:

> If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> runs soon to process the soft irq. So what would happen is that we
> would see the TIF_RESCHED_PENDING flag on the current task in the
> syscall exit path and call schedule() which would switch to ksoftirqd
> to process the soft irq (if it hasn't already been processed by that
> stage).

I think I see a problem with this. Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set (and maybe doing other stuff if there are no messages). Under the scenario you described, schedule() would re-run the spinning task, no? This could prevent any incoming packets from actually being sent up the stack until we get a real hardware interrupt--which could be a whole jiffy if interrupt mitigation is enabled in the net device.

Chris
Re: question about softirqs
From: Paul Mackerras
Date: Sat, 9 May 2009 09:34:29 +1000

> If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> runs soon to process the soft irq. So what would happen is that we
> would see the TIF_RESCHED_PENDING flag on the current task in the
> syscall exit path and call schedule() which would switch to ksoftirqd
> to process the soft irq (if it hasn't already been processed by that
> stage).
>
> If the soft irq is raised in interrupt context, then the soft irq gets
> run via the do_softirq() call in irq_exit(), as you saw.
>
> The soft irq stuff is pretty much all generic code these days, except
> for the code to switch to the softirq stack.

Grumble, when did that happen :-(

That's horrible for latency compared to handling it directly in the trap return path.
Re: question about softirqs
Chris Friesen writes:

> I'm trying to figure out where exactly softirqs are called on return
> from a syscall in 64-bit powerpc. I can see where they get called for a
> normal interrupt via the irq_exit() path, but not for syscalls.

If a soft irq is raised in process context, raise_softirq() in kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd runs soon to process the soft irq. So what would happen is that we would see the TIF_RESCHED_PENDING flag on the current task in the syscall exit path and call schedule() which would switch to ksoftirqd to process the soft irq (if it hasn't already been processed by that stage).

If the soft irq is raised in interrupt context, then the soft irq gets run via the do_softirq() call in irq_exit(), as you saw.

The soft irq stuff is pretty much all generic code these days, except for the code to switch to the softirq stack.

Paul.
Re: question about softirqs
From: "Chris Friesen"
Date: Fri, 08 May 2009 16:51:25 -0600

> I'm trying to figure out where exactly softirqs are called on return
> from a syscall in 64-bit powerpc. I can see where they get called for
> a normal interrupt via the irq_exit() path, but not for syscalls.
>
> I'm sure I'm missing something obvious...can anyone help?

I can't see where it does this either, strange. That would be a very terrible bug if it's not invoking pending softirqs before return from system calls. Although, it might be happening via some clever side effect of how the software managed hardware interrupt stuff works on powerpc.
question about softirqs
Hi all, I'm trying to figure out where exactly softirqs are called on return from a syscall in 64-bit powerpc. I can see where they get called for a normal interrupt via the irq_exit() path, but not for syscalls. I'm sure I'm missing something obvious...can anyone help? Thanks, Chris