Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
On 2016-06-07 17:19:43 [-0700], I asked:
>> cpsw_rx_poll() is called even when there is essentially no network
>> traffic, so I'm not sure how to tell if NAPI is working as intended.

On Thu, Jun 9, 2016 at 5:37 AM, Sebastian Andrzej Siewior
<bige...@linutronix.de> wrote:
> You should see an invocation of __raise_softirq_irqoff_ksoft() and then
> cpsw's poll function should run in "ksoftirqd/" context instead in the
> context of the task it runs now.

The attached patch uses a kprobe to detect when Ethernet switches to
NAPI on a Freescale i.MX6 board. Thanks to Sebastian for suggesting the
method. As expected, there are no messages when I ping-flood the board
from another host. However, if I also spawn multiple scp transfers of
large files at the same time, the messages appear. I tested with
4.4.4-rt11, but the virtually identical patch below is against
4.1.18-rt17. I'm posting it here in case it's useful to someone else.

It seems to me that if the various IRQs that can invoke net_rx_action()
are pinned to different cores, then using smp_processor_id() to identify
the device that raised the IRQ is robust.

The RT scheduling problem we had (namely, the system falls over under
ping-flood) was solved by my colleague Brian Silverman, who pinned our
userspace application that runs the critical event loop and adjusted its
priority. Doing so prevented a ping-flood from causing the event loop to
miss cycles.

Thanks again to everyone for your advice, and I hope to meet some of you
in Berlin next month.

-- Alison

From 1c83b4ee5d572bc1ede630fc72d01228ff2338e2 Mon Sep 17 00:00:00 2001
From: Alison Chaiken <ali...@peloton-tech.com>
Date: Sat, 3 Sep 2016 15:51:41 -0700
Subject: [PATCH] kprobes: detect ethernet and CAN NAPI

Inserting this module will induce the core network driver to print a
message whenever handling of CAN or Ethernet packets is performed via
NAPI in ksoftirqd rather than in the context of the invoking hard IRQ
thread.
Tested on Boundary Devices Nitrogen board with i.MX6Q SOC with
4.4.4-rt11 kernel.

Signed-off-by: Alison Chaiken <ali...@she-devel.com>
---
 samples/kprobes/kprobe_example.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 366db1a..ef3b5ee 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -16,7 +16,8 @@
 /* For each probe you need to allocate a kprobe structure */
 static struct kprobe kp = {
-	.symbol_name	= "do_fork",
+/*	.symbol_name	= "do_fork", */
+	.symbol_name	= "__raise_softirq_irqoff_ksoft",
 };
 
 /* kprobe pre_handler: called just before the probed instruction is executed */
@@ -51,6 +52,8 @@ static int handler_pre(struct kprobe *p, struct pt_regs *regs)
 static void handler_post(struct kprobe *p, struct pt_regs *regs,
 				unsigned long flags)
 {
+	unsigned id = smp_processor_id();
+
 #ifdef CONFIG_X86
 	printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
 		p->addr, regs->flags);
@@ -67,6 +70,25 @@ static void handler_post(struct kprobe *p, struct pt_regs *regs,
 	printk(KERN_INFO "post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
 		p->addr, regs->ex1);
 #endif
+
+	/* change id to that where the eth IRQ is pinned */
+	if (id == 0) {
+		pr_info("Switched to ethernet NAPI.\n");
+#ifdef CONFIG_ARM
+		pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx,"
+			" lr = 0x%lx, cpsr = 0x%lx\n",
+			p->addr, regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr);
+#endif
+	}
+	/* change id to that where the CAN IRQ is pinned */
+	if (id == 1) {
+		pr_info("Switched to CAN NAPI.\n");
+#ifdef CONFIG_ARM
+		pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx,"
+			" lr = 0x%lx, cpsr = 0x%lx\n",
+			p->addr, regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr);
+#endif
+	}
 }
 
 /*
-- 
2.1.4
Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
I wrote:
>> We've applied Sebastian's commit "softirq: split timer softirqs out of
>> ksoftirqd," which improved event loop stability substantially when we

Sebastian Andrzej Siewior replied:
> Why did you apply that one? You have 4.1.18-ti-rt so I don't know how
> that works but v4.1.15-rt18 had this patch included. Also "net: provide
> a way to delegate processing a softirq to ksoftirqd" should be applied
> (which is also part of v4.1.15-rt18).

Sorry to be obscure; I had applied that patch to v4.1.6-rt5.

> What I remember from testing the two patches on am335x was that before a
> ping flood on gbit froze the serial console but with them it the ping
> flood was not noticed.

I compiled a kernel from upstream d060a36 "Merge branch 'ti-linux-4.1.y'
of git.ti.com:ti-linux-kernel/ti-linux-kernel into ti-rt-linux-4.1.y",
which is unpatched except for using a board-appropriate device tree. The
serial console is responsive with all our RT userspace applications
running alongside a rapid external ping. However, our main event loop
misses deadlines frequently as soon as anything faster than
'ping -i 0.0002' is run.

mpstat shows that the sum of the hard IRQ rates in a second is precisely
equal to the NET_RX rate, which is ~3400/s. Does the fact that 3400 <
(1/0.0002) already mean that some packets are dropped? ftrace shows that
cpsw_rx_poll() is called even when there is essentially no network
traffic, so I'm not sure how to tell if NAPI is working as intended.

I tried running the wakeup_rt tracer, but it loads the system too much.
With ftrace capturing IRQ, scheduler, and net events, we're writing
markers into the trace buffer when the event loop makes its deadline and
when it misses, so that we can compare the normal and long-latency
intervals, but there doesn't appear to be a smoking gun in the
difference between the two.

Thanks for all your help,
Alison
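As a quick sanity check on the rates above, the arithmetic can be sketched as follows (a back-of-the-envelope calculation only; the 3400/s figure is the mpstat observation quoted in the mail):

```python
# Offered vs. handled packet rates under 'ping -i 0.0002' from another host.
ping_interval_s = 0.0002
offered_rate = 1.0 / ping_interval_s      # ~5000 packets/s offered
observed_net_rx_rate = 3400               # NET_RX softirqs/s reported by mpstat

shortfall = offered_rate - observed_net_rx_rate
print(f"offered ~{offered_rate:.0f}/s, NET_RX {observed_net_rx_rate}/s, "
      f"gap ~{shortfall:.0f}/s")
```

Note that the ~1600/s gap does not by itself prove drops: once NAPI has switched to polling, a single NET_RX softirq can service several packets, so softirq rate and packet rate need not match one-to-one.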
Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
Steven Rostedt suggests in reference to "[PATCH][RT] netpoll: Always
take poll_lock when doing polling":
>> [ Alison, can you try this patch ]

Sebastian follows up:
> Alison, did you try it?

I wrote:
>> I did try that patch, but it hasn't made much difference. Let me
>> back up and restate the problem I'm trying to solve, which is that a
>> DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
>> main event loop in user space that misses latency deadlines under the
>> test condition where I ping-flood it from another box. While in
>> production, the system would not be expected to support high rates of
>> network traffic, but the instability with the ping-flood makes me
>> wonder if there aren't underlying configuration problems.

Clark asked:
> What sort of tunings have you applied, regarding thread and interrupt
> affinity?
> Also, what scheduler policy/priority are you using for the user-space
> application?

We have the most critical hard IRQs (CAN, UART) pinned to one core,
scheduled with FIFO, and running at the highest RT priority. The less
critical IRQs (ethernet, MMC, DMA) are pinned to the other core and run
at lower FIFO priority. Next in FIFO priority we have the ktimersoftd
threads. Then we have our critical userspace application running under
RR at slightly lower priority, with no pinning.

When there is not much network traffic, the userspace event loop makes
its deadlines, but when there is a lot of network traffic, the two
network hard IRQ threads shoot to the top of the process table, with one
of them using about 80% of one core. This behavior persists whether the
kernel includes "net: provide a way to delegate processing a softirq to
ksoftirqd", "softirq: Perform softirqs in local_bh_enable() for a
limited amount of time", or reverts c10d73671 "softirq: reduce
latencies". It's hard to see how a *hard* IRQ could take so much
processor time.
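For reference, the per-core pinning described above is done by writing a hex CPU bitmask to /proc/irq/<N>/smp_affinity (and setting the IRQ thread's policy/priority with chrt). A minimal sketch of the mask arithmetic; the helper name is mine, for illustration only:

```python
def smp_affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<N>/smp_affinity: bit i set => CPU i allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# Critical IRQs (CAN, UART) pinned to core 0; eth, MMC, DMA to core 1:
print(smp_affinity_mask([0]))   # "1"
print(smp_affinity_mask([1]))   # "2"
```

On the dual-core OMAP5, "1" pins an IRQ to cpu0 and "2" to cpu1; "3" would allow both.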
I guess this gets back to
http://article.gmane.org/gmane.linux.kernel/2219110:

From: Rik van Riel <>
Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop

  I need to get back to fixing irq & softirq time accounting, which
  does not currently work correctly in all time keeping modes...

So most likely the softirq budget is getting charged to the hard IRQ
that raises it.

> If you have not, you might try isolating one of your cores and just
> run the user-space application on that core, with interrupt threads
> running on the other core. You could use the 'tuna' application like
> this:
> $ sudo tuna --cpus=1 --isolate
> This will move all the threads that *can* be moved off of cpu1
> (probably to cpu0 since I believe the OMAP5 is a dual-core processor?).

Thanks, I installed tuna and gave that a try, but it actually makes
things worse. I also tried lowering the priority of the ethernet hard
IRQ below that of the most critical userspace application, to no avail.

Perhaps expecting an RT system to survive a ping-flood is just
unreasonable? It would be nice to deliver a system that I didn't know
how to bring down. At least in our real use case, the critical system
will be NAT'ed, and packets will not be forwarded to it.

Thanks,
Alison
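One crude way to watch where CPU time is being charged (subject to the accounting caveat Rik mentions) is the irq and softirq jiffy counters in /proc/stat. A sketch that parses those fields from a sample line; the sample numbers are invented for illustration:

```python
def cpu_times(proc_stat_line):
    """Parse a 'cpuN ...' line from /proc/stat into named jiffy counters."""
    fields = proc_stat_line.split()
    names = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
    return dict(zip(names, (int(v) for v in fields[1:8])))

# Invented sample: a core spending heavily in hard-IRQ context.
sample = "cpu1 1200 0 3400 56000 120 80000 900 0 0 0"
t = cpu_times(sample)
total = sum(t.values())
print(f"irq: {100 * t['irq'] / total:.1f}%, "
      f"softirq: {100 * t['softirq'] / total:.1f}%")
```

If softirq work is being mis-charged to hard IRQ context, the irq column would look inflated relative to softirq, consistent with the 80% figure seen in the process table.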
Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
Steven Rostedt suggests in reference to "[PATCH][RT] netpoll: Always
take poll_lock when doing polling":
>> [ Alison, can you try this patch ]

Sebastian follows up:
> Alison, did you try it?

Sorry for not responding sooner. I was hoping to come to a complete
understanding of the system before replying . . .

I did try that patch, but it hasn't made much difference. Let me back up
and restate the problem I'm trying to solve: a DRA7X OMAP5 SOC system
running a patched 4.1.18-ti-rt kernel has a main event loop in user
space that misses latency deadlines under the test condition where I
ping-flood it from another box. While in production the system would not
be expected to support high rates of network traffic, the instability
under ping-flood makes me wonder if there aren't underlying
configuration problems.

We've applied Sebastian's commit "softirq: split timer softirqs out of
ksoftirqd," which improved event loop stability substantially when we
left ksoftirqd running at the userspace default but elevated
ktimersoftd. That made me think that focusing on the softirqs was
pertinent. Subsequently, I've tried "[PATCH][RT] netpoll: Always take
poll_lock when doing polling" (which seems like a good idea in any
event). After reading the "net: threadable napi poll loop" discussion
(https://lkml.org/lkml/2016/5/10/472) and
https://lkml.org/lkml/2016/2/27/152, I tried reverting

commit c10d73671ad30f54692f7f69f0e09e75d3a8926a
Author: Eric Dumazet <eduma...@google.com>
Date:   Thu Jan 10 15:26:34 2013 -0800

    softirq: reduce latencies

but that didn't help.

When the userspace application (running at -3 priority) starts having
problems, I see that the hard IRQ associated with the ethernet device
uses about 35% of one core, which seems awfully high if NAPI has
triggered a switch to polling. I vaguely recall David Miller saying in
the "threadable napi poll loop" discussion that accounting is broken for
net IRQs, so perhaps that number is misleading.
mpstat shows that the NET_RX softirqs run on the same core where we've
pinned the ethernet IRQ, so you might hope that userspace would be able
to run happily on the other one. What I see in ftrace while watching
scheduler and IRQ events is that the userspace application is yielding
to ethernet or CAN IRQs, which also raise NET_RX. In the following,
ping-flood is running, and irq/343 is the ethernet one:

userspace_application-4767 [000] dn.h1.. 4196.422318: irq_handler_entry: irq=347 name=can1
userspace_application-4767 [000] dn.h1.. 4196.422319: irq_handler_exit: irq=347 ret=handled
userspace_application-4767 [000] dn.h2.. 4196.422321: sched_waking: comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
  irq/343-4848400-874 [001] 112 4196.422323: softirq_entry: vec=3 [action=NET_RX]
userspace_application-4767 [000] dn.h3.. 4196.422325: sched_wakeup: comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
  irq/343-4848400-874 [001] 112 4196.422328: napi_poll: napi poll on napi struct edd5f560 for device eth0
  irq/343-4848400-874 [001] 112 4196.422329: softirq_exit: vec=3 [action=NET_RX]
userspace_application-4767 [000] dn..3.. 4196.422332: sched_stat_runtime: comm=userspace_application pid=4767 runtime=22448 [ns] vruntime=338486919642 [ns]
userspace_application-4767 [000] d...3.. 4196.422336: sched_switch: prev_comm=userspace_application prev_pid=4767 prev_prio=120 prev_state=R ==> next_comm=irq/347-can1 next_pid=2053 next_prio=28
  irq/343-4848400-874 [001] d...3.. 4196.422339: sched_switch: prev_comm=irq/343-4848400 prev_pid=874 prev_prio=47 prev_state=S ==> next_comm=irq/344-4848400 next_pid=875 next_prio=47

You can see why the application is having problems: it is constantly
interrupted by eth and CAN IRQs. Given that CAN traffic is critical for
our application, perhaps we will simply have to reduce the eth hard IRQ
priority in order to make the system more robust?
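To quantify how often the event loop is preempted, trace excerpts like the one above can be post-processed offline. A rough sketch that pulls the timestamp, CPU, and event name out of such lines (the regex is tuned only to this excerpt's format and may need adjusting for other tracers or kernels):

```python
import re

# Matches lines like:
#   userspace_application-4767 [000] dn.h1.. 4196.422318: irq_handler_entry: irq=347 name=can1
TRACE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d{3})\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):"
)

def parse(lines):
    """Return (timestamp, cpu, event) tuples for lines the regex recognizes."""
    out = []
    for line in lines:
        m = TRACE_RE.match(line)
        if m:
            out.append((float(m.group("ts")), m.group("cpu"), m.group("event")))
    return out

lines = [
    "userspace_application-4767 [000] dn.h1.. 4196.422318: irq_handler_entry: irq=347 name=can1",
    "  irq/343-4848400-874 [001] 112 4196.422323: softirq_entry: vec=3 [action=NET_RX]",
]
for ts, cpu, ev in parse(lines):
    print(f"{ts:.6f} cpu{cpu} {ev}")
```

Counting irq_handler_entry events that land on the userspace application's CPU per unit time would give a direct measure of how badly the event loop is being interrupted.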
It would be great to offload network traffic handling to the Cortex-M
processor on the DRA7, but I fear the development schedule will not
allow for that option. I am still not sure how to tell whether the NAPI
switch from interrupt-driven handling to polling is properly taking
place. Any suggestion on how best to monitor that behavior without
overly loading the system would be appreciated.

Thanks again for the patches,
Alison Chaiken
Peloton Technology