Re: question about softirqs
On Wed, 13 May 2009, Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> > Andi Kleen wrote:
> > > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> > >> Andi Kleen wrote:
> > >>
> > >>> network packets are normally processed by the network packet interrupt's
> > >>> softirq or alternatively in the NAPI poll loop.
> > >> If we have a high priority task, ksoftirqd may not get a chance to run.
> > >
> > > In this case the next interrupt will also process them. It will just
> > > go more slowly because interrupts limit the work compared to ksoftirqd.
> >
> > I realize that they will eventually get processed. My point is that the
> > documentation (in-kernel, online, and in various books) says that
> > softirqs will be processed _on the return from a syscall_.
>
> They are. The documentation is correct.

No, the documentation is wrong for the case where the task which raised
the softirq (and therefore woke up ksoftirqd) has a higher priority than
ksoftirqd. In that case the kernel does _NOT_ schedule ksoftirqd in the
return-from-syscall path. And that's all Chris is pointing out.

Thanks,

	tglx

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: question about softirqs
On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> >> Andi Kleen wrote:
> >>
> >>> network packets are normally processed by the network packet interrupt's
> >>> softirq or alternatively in the NAPI poll loop.
> >> If we have a high priority task, ksoftirqd may not get a chance to run.
> >
> > In this case the next interrupt will also process them. It will just
> > go more slowly because interrupts limit the work compared to ksoftirqd.
>
> I realize that they will eventually get processed. My point is that the
> documentation (in-kernel, online, and in various books) says that
> softirqs will be processed _on the return from a syscall_.

They are. The documentation is correct.

What might not all be processed are the packets that are in the per-CPU
backlog queue when the network softirq runs (for non-NAPI; for NAPI
that's obsolete anyway). That's because there are limits. And when new
work comes in in parallel it doesn't process it all. But that's always
the case -- no queue is infinite, so there are always situations where
items can be dropped or delayed.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Andi Kleen wrote:
> On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
>> Andi Kleen wrote:
>>
>>> network packets are normally processed by the network packet interrupt's
>>> softirq or alternatively in the NAPI poll loop.
>> If we have a high priority task, ksoftirqd may not get a chance to run.
>
> In this case the next interrupt will also process them. It will just
> go more slowly because interrupts limit the work compared to ksoftirqd.

I realize that they will eventually get processed. My point is that the
documentation (in-kernel, online, and in various books) says that
softirqs will be processed _on the return from a syscall_. As we all
agree, this is not necessarily the case.

Chris
Re: question about softirqs
On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> > network packets are normally processed by the network packet interrupt's
> > softirq or alternatively in the NAPI poll loop.
>
> If we have a high priority task, ksoftirqd may not get a chance to run.

In this case the next interrupt will also process them. It will just
go more slowly because interrupts limit the work compared to ksoftirqd.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Andi Kleen wrote:
> network packets are normally processed by the network packet interrupt's
> softirq or alternatively in the NAPI poll loop.

If we have a high priority task, ksoftirqd may not get a chance to run.
My point is simply that the documentation says that softirqs are
processed on return from a syscall, and this is not necessarily the
case.

Chris
Re: question about softirqs
On Wed, May 13, 2009 at 09:05:01AM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner writes:
>
> >> Err, no. Chris is completely correct:
> >>
> >>	if (!in_interrupt())
> >>		wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets,

network packets are normally processed by the network packet interrupt's
softirq or alternatively in the NAPI poll loop.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Thomas Gleixner wrote:
> On Wed, 13 May 2009, Chris Friesen wrote:
>> As far as I can tell, in this scenario softirqs may not get processed on
>> return from a syscall (contradicting the documentation). In the worst
>> case, they may not get processed until the next timer tick.
>
> Right, because your high-priority task prevents ksoftirqd from running:
> it cannot preempt the high-priority task.

Exactly. I'm suggesting that this point (the idea that softirqs may or
may not get processed on return from syscall depending on relative task
priority) should probably be documented somewhere, because the current
documentation (in the kernel and on the web) doesn't mention it at all.
Maybe I should just submit a patch to
Documentation/DocBook/kernel-hacking.tmpl.

Chris
Re: question about softirqs
On Wed, 13 May 2009, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner writes:
>
> >> Err, no. Chris is completely correct:
> >>
> >>	if (!in_interrupt())
> >>		wakeup_softirqd();
> >
> > Yes you have to wake it up just in case, but it doesn't normally
> > process the data because a normal softirq comes in faster. It's
> > just a safety policy.
>
> What about the scenario I raised earlier, where we have incoming network
> packets, no hardware interrupts coming in other than the timer tick, and
> a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
> set?
>
> As far as I can tell, in this scenario softirqs may not get processed on
> return from a syscall (contradicting the documentation). In the worst
> case, they may not get processed until the next timer tick.

Right, because your high-priority task prevents ksoftirqd from running:
it cannot preempt the high-priority task.

Thanks,

	tglx
Re: question about softirqs
Andi Kleen wrote:
> Thomas Gleixner writes:
>> Err, no. Chris is completely correct:
>>
>>	if (!in_interrupt())
>>		wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.

What about the scenario I raised earlier, where we have incoming network
packets, no hardware interrupts coming in other than the timer tick, and
a high-priority userspace app is spinning on recvmsg() with MSG_DONTWAIT
set?

As far as I can tell, in this scenario softirqs may not get processed on
return from a syscall (contradicting the documentation). In the worst
case, they may not get processed until the next timer tick.

Chris
Re: question about softirqs
Andi Kleen wrote:
> Thomas Gleixner writes:
>
>> Err, no. Chris is completely correct:
>>
>>	if (!in_interrupt())
>>		wakeup_softirqd();
>
> Yes you have to wake it up just in case, but it doesn't normally
> process the data because a normal softirq comes in faster. It's
> just a safety policy.
>
> You can check this by checking the accumulated CPU time on your
> ksoftirqds. Mine are all 0 even on long running systems.

Then it's a bug, Andi. It's quite easy to trigger ksoftirqd with a
gigabit ethernet link.

Commit f5f293a4e3d0a0c52cec31de6762c95050156516 corrected something
(making mpstat and top correctly display softirq time in the CPU stats),
but apparently we still have a problem reporting correct time on
processes, particularly on ksoftirqd/x.

I have one SMP machine flooded by network frames, with CPU0 handling all
the work inside ksoftirqd/0 (NAPI processing: almost no more hard
interrupts delivered). Still, top and ps report no more than 30% of CPU
time used by ksoftirqd, while this CPU only runs ksoftirqd/0 (100% in
sirq) and has no idle time.
$ ps -fp 4 ; mpstat -P 0 1 10 ; ps -fp 4
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:46 [ksoftirqd/0]
Linux 2.6.30-rc5-tip-01595-g6f75dad-dirty (svivoipvnx001)  05/13/2009  _i686_

04:45:01 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
04:45:02 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:03 PM    0   0.00   0.00   0.00    0.00   0.00  99.01   0.00   0.00   0.99
04:45:04 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:05 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:06 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:07 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:08 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:09 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:10 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
04:45:11 PM    0   0.00   0.00   0.00    0.00   0.00 100.00   0.00   0.00   0.00
Average:       0   0.00   0.00   0.00    0.00   0.00  99.90   0.00   0.00   0.10
UID        PID  PPID  C STIME TTY          TIME CMD
root         4     2  1 15:35 ?        00:00:49 [ksoftirqd/0]

You can see here that the time consumed by ksoftirqd/0 during this
10-second time frame is *only* 3 seconds. Therefore, we cannot trust ps,
not with the current kernel.

# cat /proc/4/stat ; sleep 10 ; cat /proc/4/stat
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15347 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0
4 (ksoftirqd/0) R 2 0 0 0 -1 2216730688 0 0 0 0 0 15670 0 0 15 -5 1 0 6 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0

> The reason Andrea originally added the softirqds was just that
> if you have very softirq intensive workloads they would tie
> up too much CPU time or not make enough progress with the default
> "don't loop too often" heuristics.
>
>> We can not rely on irqs coming in when the softirq is raised from
>
> You can't rely on it, but it happens in nearly all cases.
>
> -Andi
Re: question about softirqs
> I have one machine SMP flooded by network frames, CPU0 handling all

Yes, that's the case softirqd is supposed to handle. When you spend a
significant part of your CPU time in softirq context it kicks in to
provide somewhat fair additional CPU time. But most systems (like mine)
don't do that.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
On Wed, 13 May 2009, Andi Kleen wrote:
> > "If a soft irq is raised in process context, raise_softirq() in
> > kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
>
> softirqd is only used when the softirq runs for too long or when
> there are no suitable irq exits for a long time.
>
> In normal situations (not excessive time in softirq) they don't
> do anything.

Err, no. Chris is completely correct:

	if (!in_interrupt())
		wakeup_softirqd();

We can not rely on irqs coming in when the softirq is raised from thread
context. An irq_exit might be faster to process it than the scheduler
can schedule ksoftirqd in, but ksoftirqd is woken and runs nevertheless.
If it finds a softirq pending then it processes it in its own context,
and the irq_exit calls into softirq return right away.

Thanks,

	tglx
Re: question about softirqs
Thomas Gleixner writes:
> Err, no. Chris is completely correct:
>
>	if (!in_interrupt())
>		wakeup_softirqd();

Yes you have to wake it up just in case, but it doesn't normally
process the data because a normal softirq comes in faster. It's
just a safety policy.

You can check this by checking the accumulated CPU time on your
ksoftirqds. Mine are all 0 even on long running systems.

The reason Andrea originally added the softirqds was just that
if you have very softirq intensive workloads they would tie
up too much CPU time or not make enough progress with the default
"don't loop too often" heuristics.

> We can not rely on irqs coming in when the softirq is raised from

You can't rely on it, but it happens in nearly all cases.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
> "If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd

softirqd is only used when the softirq runs for too long or when
there are no suitable irq exits for a long time.

In normal situations (not excessive time in softirq) they don't
do anything.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Andi Kleen wrote:
> "Chris Friesen" writes:
>
>> One of the reasons I brought up this issue is that there is a lot of
>> documentation out there that says "softirqs will be processed on return
>> from a syscall". The fact that it actually depends on the scheduler
>> parameters of the task issuing the syscall isn't ever mentioned.
>
> It's not mentioned because it is not currently.

Paul Mackerras explained the current behaviour earlier in the thread
(when it was still on the ppc list). His explanation agrees with my
exploration of the code.

"If a soft irq is raised in process context, raise_softirq() in
kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
runs soon to process the soft irq. So what would happen is that we
would see the TIF_RESCHED_PENDING flag on the current task in the
syscall exit path and call schedule() which would switch to ksoftirqd
to process the soft irq (if it hasn't already been processed by that
stage)."

If the current task is of higher priority, ksoftirqd doesn't get a
chance to run and we don't process softirqs on return from a syscall.

Chris
Re: question about softirqs
"Chris Friesen" writes:
>
> One of the reasons I brought up this issue is that there is a lot of
> documentation out there that says "softirqs will be processed on return
> from a syscall". The fact that it actually depends on the scheduler
> parameters of the task issuing the syscall isn't ever mentioned.

It's not mentioned because that is not currently the case. However some
network TCP RX processing can happen in process context, which gives
you most of the benefit anyway.

> In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
> source still has the following:
>
>	Whenever a system call is about to return to userspace, or a
>	hardware interrupt handler exits, any 'software interrupts'
>	which are marked pending (usually by hardware interrupts) are
>	run (kernel/softirq.c).
>
> If anyone is looking at changing this code, it might be good to ensure
> that at least the kernel docs are updated.

So far the code is not changed in mainline. There have been some
proposals only.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: question about softirqs
Hi.

On Tue, May 12, 2009 at 11:12:58AM +0200, Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
> Wouldn't the even better solution be to get rid of softirqs
> all-together? And move tasklets into some thread context?

Only if we are ready to fix the 7x rescheduling regression compared to
kernel threads (work queues, actually). At least that's how tasklets
behaved compared to work queues 1.5 years ago, in a simple and quite
naive test where a tasklet/work item rescheduled itself a number of
times: http://marc.info/?l=linux-crypto-vger&m=119462472517405&w=2

--
	Evgeniy Polyakov
Re: question about softirqs
From: Paul Mackerras
Date: Wed, 13 May 2009 15:15:34 +1000

> David Miller writes:
>
>> I fully expected us to be, at this point, talking about putting the
>> pending softirq check back into the trap return path :-/
>
> Would that actually do any good, in the case where the system has
> decided that ksoftirqd is handling soft irqs at the moment?

Even if ksoftirqd is running, we check and run pending softirqs from
trap return. Sure, I imagine we could re-enter this "ksoftirqd blocked
by high-prio thread" situation if we get flooded every single time over
and over again.
Re: question about softirqs
David Miller writes:
> I fully expected us to be, at this point, talking about putting the
> pending softirq check back into the trap return path :-/

Would that actually do any good, in the case where the system has
decided that ksoftirqd is handling soft irqs at the moment?

Paul.
Re: question about softirqs
From: Steven Rostedt
Date: Tue, 12 May 2009 08:20:51 -0400 (EDT)

> I'm going to be playing around with bypassing the net-rx/tx with my
> network drivers. I'm going to add threaded irqs for my network cards and
> have the driver threads do the work to get through the tcp/ip stack.
>
> I'll still keep the softirqs for other cards, but I want to see how much
> it speeds things up if I have the driver thread do it.

I think your latency is going to be dreadful.
Re: question about softirqs
From: Ingo Molnar
Date: Tue, 12 May 2009 11:23:48 +0200

>> Wouldn't the even better solution be to get rid of softirqs
>> all-together?
>>
>> I see the recent work by Thomas to get threaded interrupts
>> upstream as a good first step towards that goal; once the RX
>> processing is moved to a thread (or multiple threads) one can
>> prioritize them in the regular sys_sched_setscheduler() way, and it's
>> obvious that a FIFO task above the priority of the network tasks
>> will have network starvation issues.
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.

Nice for -rt goals, but not for latency. So we're going to regress in
this area again? I can't see how that's so desirable, to be honest with
you.

The fact that this discussion started about a correctly coded task with
a certain priority not being able to make forward progress, just
because softirqs are being processed in a thread context, should be a
big red flag that this is a buggered-up design.

I fully expected us to be, at this point, talking about putting the
pending softirq check back into the trap return path :-/
Re: question about softirqs
Ingo Molnar wrote:
> * Chris Friesen wrote:
>> I think I see a possible problem with this. Suppose I have a
>> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
>> the scenario above, schedule() would re-run the spinning task
>> rather than ksoftirqd, thus preventing any incoming packets from
>> being sent up the stack until we get a real hardware
>> interrupt--which could be a whole jiffy if interrupt mitigation is
>> enabled in the net device.
>>
>> DaveM pointed out that if we're doing transmits we're likely to
>> hit local_bh_enable(), which would process the softirq work.
>> However, I think we may still have a problem in the above rx-only
>> scenario--or is it too contrived to matter?
>
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:

One of the reasons I brought up this issue is that there is a lot of
documentation out there that says "softirqs will be processed on return
from a syscall". The fact that it actually depends on the scheduler
parameters of the task issuing the syscall isn't ever mentioned.

In fact, "Documentation/DocBook/kernel-hacking.tmpl" in the kernel
source still has the following:

	Whenever a system call is about to return to userspace, or a
	hardware interrupt handler exits, any 'software interrupts'
	which are marked pending (usually by hardware interrupts) are
	run (kernel/softirq.c).

If anyone is looking at changing this code, it might be good to ensure
that at least the kernel docs are updated.

Chris
Re: question about softirqs
On Tue, 12 May 2009, Peter Zijlstra wrote:
> On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
> >
> > Yeah, that would be "nice". A single IRQ thread plus the process
> > context(s) doing networking might perform well.
> >
> > Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> > sure about - it's extra context-switching cost.
>
> Sure, that was implied by the getting rid of softirqs ;-), on -rt we
> currently suffer this hardirq/softirq thread ping-pong, it sucks.

I'm going to be playing around with bypassing the net-rx/tx with my
network drivers. I'm going to add threaded irqs for my network cards
and have the driver threads do the work to get through the tcp/ip
stack.

I'll still keep the softirqs for other cards, but I want to see how
much it speeds things up if I have the driver thread do it.

-- Steve
Re: question about softirqs
On Tue, 2009-05-12 at 11:23 +0200, Ingo Molnar wrote:
>
> Yeah, that would be "nice". A single IRQ thread plus the process
> context(s) doing networking might perform well.
>
> Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
> sure about - it's extra context-switching cost.

Sure, that was implied by the getting rid of softirqs ;-), on -rt we
currently suffer this hardirq/softirq thread ping-pong, it sucks.
Re: question about softirqs
On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> * Chris Friesen wrote:
>
> > This started out as a thread on the ppc list, but on the
> > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > list a bit.
> >
> > Currently, if a softirq is raised in process context the
> > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > run the scheduler, expecting it to switch to ksoftirqd to handle
> > the softirq processing.
> >
> > I think I see a possible problem with this. Suppose I have a
> > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > the scenario above, schedule() would re-run the spinning task
> > rather than ksoftirqd, thus preventing any incoming packets from
> > being sent up the stack until we get a real hardware
> > interrupt--which could be a whole jiffy if interrupt mitigation is
> > enabled in the net device.
>
> TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> will occur.
>
> > DaveM pointed out that if we're doing transmits we're likely to
> > hit local_bh_enable(), which would process the softirq work.
> > However, I think we may still have a problem in the above rx-only
> > scenario--or is it too contrived to matter?
>
> This could occur, and the problem is really that task priorities do
> not extend across softirq work processing.
>
> This could occur in ordinary SCHED_OTHER tasks as well, if the
> softirq is bounced to ksoftirqd - which it only should be if there's
> serious softirq overload - or, as you describe it above, if the
> softirq is raised in process context:
>
>	if (!in_interrupt())
>		wakeup_softirqd();
>
> that's not really clean. We should look into eliminating
> process-context use of raise_softirq_irqsoff(). Such a code sequence:
>
>	local_irq_save(flags);
>	...
>	raise_softirq_irqsoff(nr);
>	...
>	local_irq_restore(flags);
>
> should be converted to something like:
>
>	local_irq_save(flags);
>	...
>	raise_softirq_irqsoff(nr);
>	...
>	local_irq_restore(flags);
>	recheck_softirqs();
>
> If someone does not do proper local_bh_disable()/enable() sequences
> for micro-optimization reasons, then push the check to after the
> critical section - and don't cause extra reschedules by waking up
> ksoftirqd. raise_softirq_irqsoff() will also be faster.

Wouldn't the even better solution be to get rid of softirqs
all-together?

I see the recent work by Thomas to get threaded interrupts upstream as
a good first step towards that goal; once the RX processing is moved to
a thread (or multiple threads) one can prioritize them in the regular
sys_sched_setscheduler() way, and it's obvious that a FIFO task above
the priority of the network tasks will have network starvation issues.
Re: question about softirqs
* Peter Zijlstra wrote:

> On Tue, 2009-05-12 at 10:12 +0200, Ingo Molnar wrote:
> > * Chris Friesen wrote:
> >
> > > This started out as a thread on the ppc list, but on the
> > > suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> > > list a bit.
> > >
> > > Currently, if a softirq is raised in process context the
> > > TIF_RESCHED_PENDING flag gets set and on return to userspace we
> > > run the scheduler, expecting it to switch to ksoftirqd to handle
> > > the softirq processing.
> > >
> > > I think I see a possible problem with this. Suppose I have a
> > > SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> > > the scenario above, schedule() would re-run the spinning task
> > > rather than ksoftirqd, thus preventing any incoming packets from
> > > being sent up the stack until we get a real hardware
> > > interrupt--which could be a whole jiffy if interrupt mitigation is
> > > enabled in the net device.
> >
> > TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
> > SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
> > will occur.
> >
> > > DaveM pointed out that if we're doing transmits we're likely to
> > > hit local_bh_enable(), which would process the softirq work.
> > > However, I think we may still have a problem in the above rx-only
> > > scenario--or is it too contrived to matter?
> >
> > This could occur, and the problem is really that task priorities do
> > not extend across softirq work processing.
> >
> > This could occur in ordinary SCHED_OTHER tasks as well, if the
> > softirq is bounced to ksoftirqd - which it only should be if there's
> > serious softirq overload - or, as you describe it above, if the
> > softirq is raised in process context:
> >
> >	if (!in_interrupt())
> >		wakeup_softirqd();
> >
> > that's not really clean. We should look into eliminating
> > process-context use of raise_softirq_irqsoff(). Such a code sequence:
> >
> >	local_irq_save(flags);
> >	...
> >	raise_softirq_irqsoff(nr);
> >	...
> >	local_irq_restore(flags);
> >
> > should be converted to something like:
> >
> >	local_irq_save(flags);
> >	...
> >	raise_softirq_irqsoff(nr);
> >	...
> >	local_irq_restore(flags);
> >	recheck_softirqs();
> >
> > If someone does not do proper local_bh_disable()/enable() sequences
> > for micro-optimization reasons, then push the check to after the
> > critical section - and don't cause extra reschedules by waking up
> > ksoftirqd. raise_softirq_irqsoff() will also be faster.
>
> Wouldn't the even better solution be to get rid of softirqs
> all-together?
>
> I see the recent work by Thomas to get threaded interrupts
> upstream as a good first step towards that goal; once the RX
> processing is moved to a thread (or multiple threads) one can
> prioritize them in the regular sys_sched_setscheduler() way, and it's
> obvious that a FIFO task above the priority of the network tasks
> will have network starvation issues.

Yeah, that would be "nice". A single IRQ thread plus the process
context(s) doing networking might perform well.

Multiple IRQ threads (softirq and hardirq threads mixed) i'm not so
sure about - it's extra context-switching cost.

Btw., i noticed that using scheduling for work (packet, etc.) flow
distribution standardizes and evens out the behavior of workloads.
Softirq scheduling is really quite random currently. We have a random
processing loop-limit in the core code and various batching and
work-limit controls at individual usage sites. We sometimes piggyback
on ksoftirqd. It's far easier to keep performance in check when things
are more predictable.

But this is not an easy endeavour, and performance regressions have to
be expected and addressed if they occur. There can be random packet
queuing details in networking drivers that just happen to work fine
now, and might work worse with a kernel thread in place.

So there has to be broad buy-in for the concept, and a concerted effort
to eliminate softirq processing and most of hardirq processing by
pushing those two elements into a single hardirq thread (and the rest
into process context). Not for the faint-hearted. Nor is it recommended
to be done without a good layer of asbestos.

	Ingo
Re: question about softirqs
* Chris Friesen wrote:

> This started out as a thread on the ppc list, but on the
> suggestion of DaveM and Paul Mackerras I'm expanding the receiver
> list a bit.
>
> Currently, if a softirq is raised in process context the
> TIF_RESCHED_PENDING flag gets set and on return to userspace we
> run the scheduler, expecting it to switch to ksoftirqd to handle
> the softirq processing.
>
> I think I see a possible problem with this. Suppose I have a
> SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under
> the scenario above, schedule() would re-run the spinning task
> rather than ksoftirqd, thus preventing any incoming packets from
> being sent up the stack until we get a real hardware
> interrupt--which could be a whole jiffy if interrupt mitigation is
> enabled in the net device.

TIF_RESCHED_PENDING will not be set if a SCHED_FIFO task wakes up a
SCHED_OTHER ksoftirqd task. But starvation of ksoftirqd processing
will occur.

> DaveM pointed out that if we're doing transmits we're likely to
> hit local_bh_enable(), which would process the softirq work.
> However, I think we may still have a problem in the above rx-only
> scenario--or is it too contrived to matter?

This could occur, and the problem is really that task priorities do
not extend across softirq work processing.

This could occur in ordinary SCHED_OTHER tasks as well, if the
softirq is bounced to ksoftirqd - which it only should be if there's
serious softirq overload - or, as you describe it above, if the
softirq is raised in process context:

	if (!in_interrupt())
		wakeup_softirqd();

that's not really clean. We should look into eliminating
process-context use of raise_softirq_irqsoff(). Such a code sequence:

	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);

should be converted to something like:

	local_irq_save(flags);
	...
	raise_softirq_irqsoff(nr);
	...
	local_irq_restore(flags);
	recheck_softirqs();

If someone does not do proper local_bh_disable()/enable() sequences
for micro-optimization reasons, then push the check to after the
critical section - and don't cause extra reschedules by waking up
ksoftirqd. raise_softirq_irqsoff() will also be faster.

	Ingo
Re: question about softirqs
This started out as a thread on the ppc list, but on the suggestion of DaveM and Paul Mackerras I'm expanding the receiver list a bit.

Currently, if a softirq is raised in process context the TIF_RESCHED_PENDING flag gets set and on return to userspace we run the scheduler, expecting it to switch to ksoftirqd to handle the softirq processing.

I think I see a possible problem with this. Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set. Under the scenario above, schedule() would re-run the spinning task rather than ksoftirqd, thus preventing any incoming packets from being sent up the stack until we get a real hardware interrupt--which could be a whole jiffy if interrupt mitigation is enabled in the net device.

DaveM pointed out that if we're doing transmits we're likely to hit local_bh_enable(), which would process the softirq work. However, I think we may still have a problem in the above rx-only scenario--or is it too contrived to matter?

Thanks, Chris
Re: question about softirqs
Chris Friesen writes:

> Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT
> set (and maybe doing other stuff if there are no messages). In this
> case, schedule() would re-run the spinning task rather than running
> ksoftirqd. This could prevent any incoming packets from actually being
> sent up the stack until we get a real hardware interrupt--which could be
> a whole jiffy if interrupt mitigation is enabled in the net device.

I suggest you ask Ingo Molnar about that.

> (And maybe longer if NOHZ is enabled.)

We still have a timer interrupt every jiffy when stuff is running; we only turn off the timer interrupts when idle.

Paul.
Re: question about softirqs
From: "Chris Friesen"
Date: Mon, 11 May 2009 12:25:54 -0600

> David Miller wrote:
>
>> You know, for networking over loopback (one of the only real cases
>> that even matters, if we get a hard interrupt then the return from
>> that would process any softints), we probably make out just fine
>> anyways. As long as we hit a local_bh_enable() (and in the return
>> path from device transmit that's exceedingly likely as all of the
>> networking locking is BH safe) we'll run the softints from that and
>> thus long before we get to syscall return.
>
> What about the issue I raised earlier? (I don't think you were copied
> at that point.)

I'm sure all of the networking experts on linuxppc-dev will have an answer. And yes that was sarcasm :-)

You need to ask this on netdev or a similar list.
Re: question about softirqs
David Miller wrote:

> You know, for networking over loopback (one of the only real cases
> that even matters, if we get a hard interrupt then the return from
> that would process any softints), we probably make out just fine
> anyways. As long as we hit a local_bh_enable() (and in the return
> path from device transmit that's exceedingly likely as all of the
> networking locking is BH safe) we'll run the softints from that and
> thus long before we get to syscall return.

What about the issue I raised earlier? (I don't think you were copied at that point.)

Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set (and maybe doing other stuff if there are no messages). In this case, schedule() would re-run the spinning task rather than running ksoftirqd. This could prevent any incoming packets from actually being sent up the stack until we get a real hardware interrupt--which could be a whole jiffy if interrupt mitigation is enabled in the net device. (And maybe longer if NOHZ is enabled.)

Chris
Re: question about softirqs
From: Paul Mackerras
Date: Sat, 9 May 2009 13:31:23 +1000

> David Miller writes:
>
>> Grumble, when did that happen :-(
>
> Ages ago (i.e. before the switch to git :). Talk to Ingo, it's his
> doing IIRC.

I'll first do some data mining before coming to any (further) conclusions :-)

>> That's horrible for latency compared to handling it directly
>> in the trap return path.
>
> Actually, I don't know why we ever let there be softirqs pending when
> we're in process context. I would think that we should just call
> do_softirq immediately if we raise a softirq when !in_interrupt().
> But I might be missing some subtlety.

I bet it was a non-starter before IRQ stacks. It does seem like a good idea to me.

You know, for networking over loopback (one of the only real cases that even matters, if we get a hard interrupt then the return from that would process any softints), we probably make out just fine anyways. As long as we hit a local_bh_enable() (and in the return path from device transmit that's exceedingly likely as all of the networking locking is BH safe) we'll run the softints from that and thus long before we get to syscall return.
Re: question about softirqs
David Miller writes:

> Grumble, when did that happen :-(

Ages ago (i.e. before the switch to git :). Talk to Ingo, it's his doing IIRC.

> That's horrible for latency compared to handling it directly
> in the trap return path.

Actually, I don't know why we ever let there be softirqs pending when we're in process context. I would think that we should just call do_softirq immediately if we raise a softirq when !in_interrupt(). But I might be missing some subtlety.

Paul.
Re: question about softirqs
> > The soft irq stuff is pretty much all generic code these days, except
> > for the code to switch to the softirq stack.
>
> Grumble, when did that happen :-(
>
> That's horrible for latency compared to handling it directly
> in the trap return path.

If it is indeed such a problem, it would be reasonably easy to handle it in the return-to-userspace path around the same place where we test for pending signals (isn't that what we used to do anyway?)

Cheers,
Ben.
Re: question about softirqs
Paul Mackerras wrote:

> If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> runs soon to process the soft irq. So what would happen is that we
> would see the TIF_RESCHED_PENDING flag on the current task in the
> syscall exit path and call schedule() which would switch to ksoftirqd
> to process the soft irq (if it hasn't already been processed by that
> stage).

I think I see a problem with this. Suppose I have a SCHED_FIFO task spinning on recvmsg() with MSG_DONTWAIT set (and maybe doing other stuff if there are no messages). Under the scenario you described, schedule() would re-run the spinning task, no? This could prevent any incoming packets from actually being sent up the stack until we get a real hardware interrupt--which could be a whole jiffy if interrupt mitigation is enabled in the net device.

Chris
Re: question about softirqs
From: Paul Mackerras
Date: Sat, 9 May 2009 09:34:29 +1000

> If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
> runs soon to process the soft irq. So what would happen is that we
> would see the TIF_RESCHED_PENDING flag on the current task in the
> syscall exit path and call schedule() which would switch to ksoftirqd
> to process the soft irq (if it hasn't already been processed by that
> stage).
>
> If the soft irq is raised in interrupt context, then the soft irq gets
> run via the do_softirq() call in irq_exit(), as you saw.
>
> The soft irq stuff is pretty much all generic code these days, except
> for the code to switch to the softirq stack.

Grumble, when did that happen :-(

That's horrible for latency compared to handling it directly in the trap return path.
Re: question about softirqs
Chris Friesen writes:

> I'm trying to figure out where exactly softirqs are called on return
> from a syscall in 64-bit powerpc. I can see where they get called for a
> normal interrupt via the irq_exit() path, but not for syscalls.

If a soft irq is raised in process context, raise_softirq() in kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd runs soon to process the soft irq. So what would happen is that we would see the TIF_RESCHED_PENDING flag on the current task in the syscall exit path and call schedule() which would switch to ksoftirqd to process the soft irq (if it hasn't already been processed by that stage).

If the soft irq is raised in interrupt context, then the soft irq gets run via the do_softirq() call in irq_exit(), as you saw.

The soft irq stuff is pretty much all generic code these days, except for the code to switch to the softirq stack.

Paul.
Re: question about softirqs
From: "Chris Friesen"
Date: Fri, 08 May 2009 16:51:25 -0600

> I'm trying to figure out where exactly softirqs are called on return
> from a syscall in 64-bit powerpc. I can see where they get called for
> a normal interrupt via the irq_exit() path, but not for syscalls.
>
> I'm sure I'm missing something obvious...can anyone help?

I can't see where it does this either, strange. That would be a very terrible bug if it's not invoking pending softirqs before return from system calls. Although, it might be happening via some clever side effect of how the software managed hardware interrupt stuff works on powerpc.
question about softirqs
Hi all, I'm trying to figure out where exactly softirqs are called on return from a syscall in 64-bit powerpc. I can see where they get called for a normal interrupt via the irq_exit() path, but not for syscalls. I'm sure I'm missing something obvious...can anyone help? Thanks, Chris