On Mon, Jan 25, 2016 at 3:53 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> On Mon, Jan 25, 2016 at 1:06 PM, Taylor R Campbell
> <campbell+netbsd-tech-k...@mumble.net> wrote:
>> Date: Mon, 25 Jan 2016 11:25:16 +0900
>> From: Ryota Ozaki <ozak...@netbsd.org>
>>
>> On Tue, Jan 19, 2016 at 2:22 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>> (snip)
>> >> (a) a per-CPU pktq that never distributes packets to another CPU, or
>> >> (b) a single-CPU pktq, to be used only from the CPU to which the
>> >> device's (queue's) interrupt handler is bound.
>> >>
>> > I'll rewrite the patch as your suggestion (I prefer (a) for now).
>>
>> Through rewriting it, I feel that it seems to be a lesser version of
>> pktqueue. So I think it may be better changing pktqueue to have a flag
>> to not distribute packets between CPUs than implementing another one
>> duplicating pktqueue. Here is a patch with the approach:
>> http://www.netbsd.org/~ozaki-r/pktq-without-ipi.diff
>>
>> If we call pktq_create with PKTQ_F_NO_DISTRIBUTION, pktqueue doesn't
>> setup IPI for softint and never call softint_schedule_cpu (i.e.,
>> never distribute packets).
>>
>> How about the approach?
>>
>> Some disjointed thoughts:
>>
>> 1. I don't think you actually need to change pktq(9). It looks like
>> if you pass in cpu_index(curcpu()) for the hash, it will consistently
>> use the current CPU, for which softint_schedule_cpu has a special case
>> that avoids ipi. So I don't expect it's substantially different from
>> <https://www.netbsd.org/~ozaki-r/softint-if_input.diff> -- though
>> maybe measurements will show my analysis is wrong!
>
> My intention is to prevent ipi_register in pktq_create and
> so we don't need ipi_sysinit movement...
>
>>
>> 2. Even though you avoid ipi(9), you're still using pcq(9), which
>> requires interprocessor synchronization -- but that is an unnecessary
>> cost because you're simply passing packets from hardintr to softintr
>> context on a single CPU. So that's why I specifically suggested ifq,
>> not pcq or pktqueue.
>
> ...though, right. membars in pcq(9) are just overhead.
>
> Okay, I'll implement softint + percpu irqs.

Here it is:
  http://www.netbsd.org/~ozaki-r/softint-if_input-ifqueue.diff

Results of performance measurements for it have also been added to
  https://gist.github.com/ozaki-r/975b06216a54a084debc

The results are good, but they bother me: the patch achieves better
performance than vanilla (and the 1st implementation) under high load
(IP forwarding). For fast forward, it also beats the 1st one.

I thought that holding splnet during ifp->if_input might affect the
results (splnet is needed for ifqueue operations, so the patch keeps
holding it across the call). So I tried releasing it during
ifp->if_input, but the results didn't change much (the result of IP
forwarding is still better than vanilla).

Does anyone have any ideas?

  ozaki-r
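
For readers following the thread, the variant described above (a plain
queue filled from the hard interrupt handler and drained by a softint on
the same CPU, with the splnet protection dropped around ifp->if_input)
can be modelled roughly as below. This is a userland sketch only, not
code from the patch: every name in it (sketch_ifq, sketch_enqueue,
sketch_softint, SKETCH_MAXLEN, and so on) is made up for illustration,
and the sketch_spl flag merely stands in for splnet()/splx().

```c
#include <assert.h>
#include <stddef.h>

/* Userland model only. All names are hypothetical, not NetBSD APIs. */

#define SKETCH_MAXLEN 4			/* stands in for a queue length limit */

struct sketch_pkt {
	int id;
	struct sketch_pkt *next;
};

struct sketch_ifq {
	struct sketch_pkt *head, *tail;
	int len;			/* packets currently queued */
	int drops;			/* packets rejected because the queue was full */
};

static int sketch_spl;			/* 1 while "splnet" would be held */
static int sketch_delivered;		/* packets handed to the input routine */

static void
sketch_ifq_init(struct sketch_ifq *q)
{
	q->head = q->tail = NULL;
	q->len = q->drops = 0;
}

/* Hard-interrupt side: append the packet, or count a drop if full. */
static int
sketch_enqueue(struct sketch_ifq *q, struct sketch_pkt *p)
{
	int ok = 0;

	sketch_spl = 1;			/* models s = splnet() */
	if (q->len >= SKETCH_MAXLEN) {
		q->drops++;
	} else {
		p->next = NULL;
		if (q->tail == NULL)
			q->head = p;
		else
			q->tail->next = p;
		q->tail = p;
		q->len++;
		ok = 1;
	}
	sketch_spl = 0;			/* models splx(s) */
	return ok;
}

/* Remove and return the oldest packet, or NULL if the queue is empty. */
static struct sketch_pkt *
sketch_dequeue(struct sketch_ifq *q)
{
	struct sketch_pkt *p = q->head;

	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
		q->len--;
	}
	return p;
}

/* Stand-in for ifp->if_input; must run with the protection released. */
static void
sketch_input(struct sketch_pkt *p)
{
	assert(sketch_spl == 0);
	(void)p;
	sketch_delivered++;
}

/* Softint side: protect only the dequeue, not the input call. */
static int
sketch_softint(struct sketch_ifq *q)
{
	int n = 0;

	for (;;) {
		struct sketch_pkt *p;

		sketch_spl = 1;		/* models s = splnet() */
		p = sketch_dequeue(q);
		sketch_spl = 0;		/* models splx(s) */
		if (p == NULL)
			break;
		sketch_input(p);
		n++;
	}
	return n;
}
```

Because producer and consumer are on the same CPU in this model, the
queue needs no atomics or memory barriers, which is the point made in
the quoted discussion about pcq(9). Holding the protection across
sketch_input instead would only mean moving the "release" line to after
the call.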