Re: netisr observations
[snip] So, hm, the thing that comes to mind is the flowid. What are the various flowids for the flows? Are they all mapping to CPU 3 somehow? -a ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
tunnels, mtu and payload length
Hi. Can someone explain to me where the 4 missing bytes go when capturing traffic on a gif interface with tcpdump? I expect the length of the first fragment (offset = 0) to be equal to the mtu (1280 bytes), but clearly it's 1276 bytes. The same thing happens on a gre tunnel.

# ifconfig gif0
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
	tunnel inet 192.168.3.24 --> 192.168.3.17
	inet 172.16.5.40 --> 172.16.5.41 netmask 0x
	inet6 fe80::21a:64ff:fe21:8e80%gif0 prefixlen 64 scopeid 0x1c
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

# ping -s 4096 172.16.5.41
PING 172.16.5.41 (172.16.5.41): 4096 data bytes
4104 bytes from 172.16.5.41: icmp_seq=0 ttl=64 time=0.837 ms
4104 bytes from 172.16.5.41: icmp_seq=1 ttl=64 time=0.870 ms
4104 bytes from 172.16.5.41: icmp_seq=2 ttl=64 time=0.779 ms
4104 bytes from 172.16.5.41: icmp_seq=3 ttl=64 time=0.823 ms
4104 bytes from 172.16.5.41: icmp_seq=4 ttl=64 time=0.794 ms

tcpdump:
12:58:33.430450 IP (tos 0x0, ttl 64, id 40760, offset 0, flags [+], proto ICMP (1), length 1276) 172.16.5.40 > 172.16.5.41: ICMP echo request, id 62980, seq 17, length 1256
12:58:33.430467 IP (tos 0x0, ttl 64, id 40760, offset 1256, flags [+], proto ICMP (1), length 1276) 172.16.5.40 > 172.16.5.41: ip-proto-1
12:58:33.430481 IP (tos 0x0, ttl 64, id 40760, offset 2512, flags [+], proto ICMP (1), length 1276) 172.16.5.40 > 172.16.5.41: ip-proto-1
12:58:33.430494 IP (tos 0x0, ttl 64, id 40760, offset 3768, flags [none], proto ICMP (1), length 356) 172.16.5.40 > 172.16.5.41: ip-proto-1

Thanks. Eugene.
dummynet/ipfw high load?
Good day, gurus! We have servers running FreeBSD. They do NAT, shaping and traffic accounting for our (mainly home) customers. NAT is done with pf nat, shaping with ipfw dummynet, and traffic accounting with ng_netflow via ipfw ng_tee. The problem is performance under (relatively) high traffic. On a Xeon E3-1270, using an Intel 10Gbit/sec 82599-based NIC (ix) or Intel I350 (82579) in lagg, transit traffic of 800 Mbit/sec and 100 kpps [to customers] causes CPU load of almost 100% from NIC interrupts, or likewise in the case of net.isr.dispatch=deferred and net.inet.ip.fastforwarding=0.

Deleting the ipfw pipe decreases load by ~30% per cpu. Deleting ipfw ng_tee (to ng_netflow) decreases load by 15% per cpu. Turning off ipfw (sysctl net.inet.ip.fw.enable=0) decreases load even more, so that the server can pass (nat'ed!) traffic of 1600 Mbit/sec and 200 kpps with only ~40% load per cpu.

So my questions are:
1. Is there any way to decrease the system load caused by dummynet/ipfw?
2. Why do dummynet/ipfw increase *interrupt* load, rather than kernel time or something like that?
3. Is there any way to profile that kind of load? The existing DTrace and pmcstat examples are almost useless, or I just don't know how to use them properly.

Because of the huge size of the debugging info (including dtrace and pmcstat samples), sysctl settings and so on, I opened a topic on a Russian network operators' forum: http://forum.nag.ru/forum/index.php?showtopic=93674 In English it's available via Google Translate: http://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http%3A%2F%2Fforum.nag.ru%2Fforum%2Findex.php%3Fshowtopic%3D93674

Feel free to ask me any questions and to run things on the server! I would VERY much appreciate any help and can take any measurements and do any debugging on one of the servers. Moreover, I'm ready to give root access to any appropriate person (as I already did for Gleb Smirnoff when we were investigating a pf state problem).
-- Best regards, Dennis Yusupoff, network engineer of Smart-Telecom ISP Russia, Saint-Petersburg
Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00
Hi, since folks are playing with Midori's DCTCP patch, I wanted to make sure that you were also aware of the patches that Aris did for PRR and NewCWV... Lars

On 2014-2-4, at 10:38, Eggert, Lars l...@netapp.com wrote: Hi, below are two patches that implement RFC6937 (Proportional Rate Reduction for TCP) and draft-ietf-tcpm-newcwv-00 (Updating TCP to support Rate-Limited Traffic). They were done by Aris Angelogiannopoulos for his MS thesis, which is at https://eggert.org/students/angelogiannopoulos-thesis.pdf. The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for the delay in sending them; we'd been trying to get some feedback from committers first, without luck.)

Please note that newcwv is still a work in progress in the IETF, and the patch has some limitations with regard to the pipeACK Sampling Period mentioned in the Internet-Draft. Aris says this in his thesis about what exactly he implemented: "The second implementation choice is in regard to the measurement of pipeACK. This variable is the most important one introduced by the method and is used to compute the phase that the sender currently lies in. In order to compute pipeACK, the approach suggested by the Internet Draft (ID) is followed [ncwv]. During initialization, pipeACK is set to the maximum possible value. A helper variable prevHighACK is introduced that is initialized to the initial sequence number (iss). prevHighACK holds the value of the highest acknowledged byte so far. pipeACK is measured once per RTT, meaning that when an ACK covering prevHighACK is received, pipeACK becomes the difference between the current ACK and prevHighACK. This is called a pipeACK sample. A newer version of the draft suggests that multiple pipeACK samples can be used during the pipeACK sampling period."

Lars

prr.patch newcwv.patch signature.asc Description: Message signed with OpenPGP using GPGMail
Re: dummynet/ipfw high load?
Hi, I had a similar problem in the past, and it turned out to be the amount of rules in ipfw. Using a reduced subset with tables actually reduced the load. Sami

On Friday, April 11, 2014, Dennis Yusupoff d...@smartspb.net wrote: [snip]

-- Sami Halabi Information Systems Engineer NMS Projects Expert FreeBSD SysAdmin Expert
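A hypothetical sketch of what Sami describes (made-up prefixes and pipe numbers): instead of one ipfw rule per customer prefix, load the prefixes into a table and match them with a single rule, so each packet does one radix-tree lookup instead of walking a long linear rule list.

```shell
# Before: one rule per customer prefix, all traversed per packet
#   ipfw add 1000 pipe 1 ip from any to 10.0.0.0/24
#   ipfw add 1001 pipe 1 ip from any to 10.0.1.0/24
#   ... hundreds more ...

# After: one table holding all prefixes, matched by a single rule
ipfw table 1 add 10.0.0.0/24
ipfw table 1 add 10.0.1.0/24
ipfw add 1000 pipe 1 ip from any to 'table(1)'
```

This keeps the shaping behavior the same while making rule evaluation cost roughly constant in the number of customers.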
Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00
On Fri, Apr 11, 2014 at 4:15 AM, Eggert, Lars l...@netapp.com wrote: Hi, since folks are playing with Midori's DCTCP patch, I wanted to make sure that you were also aware of the patches that Aris did for PRR and NewCWV... prr.patch newcwv.patch

Lars, there are no actual patches attached here. (Or the mailing list dropped them.) cheers, Hiren
Re: netisr observations
On Fri, Apr 11, 2014 at 2:48 AM, Adrian Chadd adr...@freebsd.org wrote: [snip] So, hm, the thing that comes to mind is the flowid. What's the various flowid's for flows? Are they all mapping to CPU 3 somehow?

The output of netstat -Q shows IP dispatch is set to default, which is direct (NETISR_DISPATCH_DIRECT). That means each IP packet will be processed on the same CPU that the Ethernet processing for that packet was performed on, so CPU selection for IP packets will not be based on flowid. The output of netstat -Q shows Ethernet dispatch is set to direct (NETISR_DISPATCH_DIRECT if you wind up reading the code), so the Ethernet processing for each packet will take place on the same CPU that the driver receives that packet on.

For the igb driver with queues autoconfigured and msix enabled, as the sysctl output shows you have, the driver will create a number of queues subject to device limitations, msix message limitations, and the number of CPUs in the system, establish a separate interrupt handler for each one, and bind each of those interrupt handlers to a separate CPU. It also creates a separate single-threaded taskqueue for each queue. Each queue interrupt handler sends work to its associated taskqueue when the interrupt fires. Those taskqueues are where the Ethernet packets are received and processed by the driver. The question is where those taskqueue threads will be run. I don't see anything in the driver that makes an attempt to bind those taskqueue threads to specific CPUs, so really the location of all of the packet processing is up to the scheduler (i.e., arbitrary).

The summary is:

1. the hardware schedules each received packet to one of its queues and raises the interrupt for that queue
2. that queue interrupt is serviced on the same CPU all the time, which is different from the CPUs for all other queues on that interface
3. the interrupt handler notifies the corresponding taskqueue, which runs its task in a thread on whatever CPU the scheduler chooses
4. that task dispatches the packet for Ethernet processing via netisr, which processes it on whatever the current CPU is
5. Ethernet processing dispatches that packet for IP processing via netisr, which processes it on whatever the current CPU is

You might want to try changing the default netisr dispatch policy to 'deferred' (sysctl net.isr.dispatch=deferred). If you do that, the Ethernet processing will still happen on an arbitrary CPU chosen by the scheduler, but the IP processing should then get mapped to a CPU based on the flowid assigned by the driver. Since igb assigns flowids based on received queue number, all IP (and above) processing for that packet should then be performed on the same CPU the queue interrupt was bound to. Unfortunately, I don't have a system with igb interfaces to try that on. -Patrick
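For reference, Patrick's suggestion boils down to the following commands (run as root; a sketch assuming a FreeBSD 9/10-era system where net.isr.dispatch is a runtime sysctl):

```shell
# Switch protocol-layer processing from direct dispatch to the
# per-CPU netisr threads, so IP work is steered by flowid:
sysctl net.isr.dispatch=deferred

# Then watch where the work lands:
netstat -Q    # per-protocol, per-CPU dispatched/queued/handled counters
vmstat -i     # per-queue interrupt rates

# To make the change persistent, add
#   net.isr.dispatch=deferred
# to /etc/sysctl.conf.
```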
Re: Preventing ng_callout() timeouts to trigger packet queuing
disclaimer: I'm not looking at the code now.. I want to go to bed: :-) When I wrote that code, the idea was that even a direct node execution should become a queuing operation if there was already something else on the queue, so in that model packets were not supposed to get re-ordered. Does that not still work? Either that, or you need to explain the problem to me a bit better..

On 4/10/14, 5:25 AM, Karim Fodil-Lemelin wrote: Hi, Below is a revised patch for this issue. It accounts for nodes or hooks that explicitly need to be queuing:

@@ -3632,7 +3632,12 @@ ng_callout(struct callout *c, node_p node, hook_p hook, int ticks,
 	if ((item = ng_alloc_item(NGQF_FN, NG_NOFLAGS)) == NULL)
 		return (ENOMEM);
-	item->el_flags |= NGQF_WRITER;
+	if ((node->nd_flags & NGF_FORCE_WRITER) ||
+	    (hook && (hook->hk_flags & HK_FORCE_WRITER)))
+		item->el_flags |= NGQF_WRITER;
+	else
+		item->el_flags |= NGQF_READER;
+
 	NG_NODE_REF(node); /* and one for the item */
 	NGI_SET_NODE(item, node);
 	if (hook) {

Regards, Karim.

On 09/04/2014 3:16 PM, Karim Fodil-Lemelin wrote: Hi List, I'm calling out to the general wisdom ... I have seen an issue in netgraph where, if called, a callout routine registered by ng_callout() will trigger packet queuing inside the worklist of netgraph, since ng_callout() makes my node suddenly a WRITER node (therefore non-reentrant) for the duration of the call. So as soon as the callout function returns, all following packets will get directly passed to the node again, and only when the ngintr thread gets executed will I get the queued packets. This introduces out-of-order packets in the flow.

I am using the current patch below to solve the issue and I am wondering if there is anything wrong with it (and maybe contribute it back :):

@@ -3632,7 +3632,7 @@ ng_callout(struct callout *c, node_p node, hook_p hook, int ticks,
 	if ((item = ng_alloc_item(NGQF_FN, NG_NOFLAGS)) == NULL)
 		return (ENOMEM);
-	item->el_flags |= NGQF_WRITER;
+	item->el_flags = NGQF_READER;
 	NG_NODE_REF(node); /* and one for the item */
 	NGI_SET_NODE(item, node);
 	if (hook) {

Best regards, Karim.
Re: Preventing ng_callout() timeouts to trigger packet queuing
Well, ethernet drivers nowadays seem to be doing: * always queue * then pop the head item off the queue and transmit that. -a

On 11 April 2014 11:59, Julian Elischer jul...@freebsd.org wrote: [snip]
Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00
On Fri, Apr 11, 2014 at 4:16 PM, hiren panchasara hiren.panchas...@gmail.com wrote: [snip] Lars, There are no actual patches attached here. (Or the mailing list dropped them.)

Ah, my bad. I think you are referring to the patches in the original email. I can see them. cheers, Hiren
Re: netisr observations
On Fri, Apr 11, 2014 at 11:30 AM, Patrick Kelsey kel...@ieee.org wrote: [snip]

I really appreciate you taking the time to explain this. Thank you.

I am especially confused by the ip Queued column from netstat -Q showing 203888563 only for cpu3. Does this mean that cpu3 queues everything and then distributes it among the other cpus? Where, out of the 5 stages you mentioned above, does this queuing on cpu3 happen? This value gets populated in the snwp->snw_queued field for each cpu inside sysctl_netisr_work().

You might want to try changing the default netisr dispatch policy to 'deferred' (sysctl net.isr.dispatch=deferred). If you do that, the Ethernet processing will still happen on an arbitrary CPU chosen by the scheduler, but the IP processing should then get mapped to a CPU based on the flowid assigned by the driver. Since igb assigns flowids based on received queue number, all IP (and above) processing for that packet should then be performed on the same CPU the queue interrupt was bound to.

I will give this a try and see how things behave. I was also thinking about net.isr.bindthreads. netisr_start_swi() does intr_event_bind() if we have bindthreads set to 1. What would that gain me, if anything? Would it stop moving intr{swi1: netisr 3} onto different cpus (as I am seeing in 'top' output) and bind it to a single cpu? I've come across a thread discussing some side-effects of this, though: http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037597.html Thanks a ton, again. cheers, Hiren
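One practical note on trying the bindthreads idea (my understanding, worth double-checking on your system): net.isr.bindthreads is a boot-time loader tunable rather than a runtime sysctl, so experimenting with it means setting it in /boot/loader.conf and rebooting. The values below are illustrative only:

```shell
# Bind each netisr software-interrupt thread to its CPU at boot
# (loader tunables; require a reboot to take effect):
cat >> /boot/loader.conf <<'EOF'
net.isr.bindthreads=1   # pin each netisr thread to one CPU
net.isr.maxthreads=4    # hypothetical: one thread per core, adjust to your box
EOF

# After reboot, confirm the settings took effect:
sysctl net.isr.bindthreads net.isr.maxthreads
```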
Re: netisr observations
On Fri, Apr 11, 2014 at 8:23 PM, hiren panchasara hiren.panchas...@gmail.com wrote: On Fri, Apr 11, 2014 at 11:30 AM, Patrick Kelsey kel...@ieee.org wrote: [snip] I really appreciate you taking the time to explain this. Thank you.

Sure thing. I've had my head in the netisr code frequently lately, and it's nice to be able to share :)

I am especially confused by the ip Queued column from netstat -Q showing 203888563 only for cpu3. Does this mean that cpu3 queues everything and then distributes it among the other cpus? Where, out of the 5 stages you mentioned above, does this queuing on cpu3 happen? This value gets populated in the snwp->snw_queued field for each cpu inside sysctl_netisr_work().

The way your system is configured, all inbound packets are being direct-dispatched. Those packets will bump the dispatched and handled counters, but not the queued counter. The queued counter only gets bumped when something is queued to a netisr thread. You can figure out where that is happening, despite everything apparently being configured for direct dispatch, by looking at where netisr_queue() and netisr_queue_src() are being called from. netisr_queue() is called during ipv6 forwarding and output, ipv4 output when the destination is a local address, gre processing, route socket processing, if_simloop() (which is called to loop back multicast packets, for example)... netisr_queue_src() is called during ipsec and divert processing.

One thing to consider also when thinking about what the netisr per-cpu counters represent is that netisr really maintains per-cpu workstream context, not per-netisr-thread context. Direct-dispatched packets contribute to the statistics of the workstream context of whichever CPU they are being direct-dispatched on. Packets handled by a netisr thread contribute to the statistics of the workstream context of the CPU it was created for, whether or not it was bound to, or is currently running on, that CPU. So when you look at the statistics in netstat -Q output for CPU 3, dispatched is the number of packets direct-dispatched on CPU 3, queued is the number of packets queued to the netisr thread associated with CPU 3 (but that thread may be running all over the place if net.isr.bindthreads is 0), and handled is the number of packets processed directly on CPU 3 or in the netisr thread associated with CPU 3.

You might want to try changing the default netisr dispatch policy to 'deferred' (sysctl net.isr.dispatch=deferred). If you do that, the Ethernet processing will still happen on an arbitrary CPU chosen by the scheduler, but the IP processing should then get mapped to a CPU based on the flowid assigned by the driver. Since igb assigns flowids based on received queue number, all IP (and above) processing for that packet should then be performed on the same CPU the queue interrupt was bound to. I