Re: Spreading NIC interrupts across multiple CPUs
On 3/26/2014 10:30 PM, Stan Hoeppner wrote:
> This is an 8 core machine with HT enabled, 16 logical CPUs, so right off the bat it is dramatically different than the Compaq machine below as far as the kernel is concerned and how scheduling is performed. The current mask may or may not be correct for this configuration. I never use HT and I can't find any docs about HT and /proc/irq/xx/smp_affinity.

Agreed on finding the docs; it was nigh impossible. I found a way to offload the traffic for that server, made a few changes to the BIOS (C-states, HT, etc.), and booted it back up. It didn't seem to change much on the spreading, but that's fine.

> And to this point, it's not usually a good idea to spread interrupts round robin from any device evenly across all cores in a system. This is inefficient as each core must load the ISR for every interrupt. This decreases the effectiveness of L1/L2 caches on all cores, causing additional cache misses for other processes executing on those cores. This is precisely why irqbalance was created.

A couple of things on this. I did see what you're talking about WRT spreading the interrupts across the processors. However, I did notice one thing: irqbalance is set to specifically exempt ethernet/network interfaces from its balancing. I'm not sure if that's to make sure what I was seeing with the HP system doesn't inadvertently happen, or to make sure the queues all stay on the same processor. This leads me to my next question: in the case of a NIC with multiple queues, should all queues for a given interface be on a single CPU (an actual CPU, not HT)? (Answered in the next paragraph.)

>> However, the Dell is using CPU0 exclusively for the ethernet device interrupts, while the HP spreads them pretty evenly.
> This could be as simple as HT being enabled on the Dell. If not, the contents of your /proc/interrupts files should help me narrow this down for you.

Unfortunately it didn't change anything on the Dell; no idea why.
It could be as simple as the driver differences between the 5708 and 5709. Looking at https://we.riseup.net/riseup+tech/balancing-hardware-interrupts and more specifically http://www.alexonlinux.com/msi-x-the-right-way-to-spread-interrupt-load, it looks like the queues enabled on the 5709 (which is in the Dell) would let me manually balance the queues across multiple cores without problems. I'd been under the impression that MSI-X was to blame for the HP spreading things about, but I see that's not the case. So far, after less than a day that included a typical peak load, this looks rather successful: I hit normal traffic patterns without dropping any outbound packets.

> For future reference, kernel scheduler problems such as this should be posted on LKML, not a distro list, no matter which distro you use. There are very few people on debian-user or any of the distro general help lists with significant knowledge of the kernel, let alone the scheduler. You typically get help with this kind of thing much faster, and with more thorough knowledge transfer, on LKML.

Will do. I'm sorry, but I thought this would have been a pretty standard question for anyone operating in a production environment where 100k pps is typical (at least, that's what set it off for me). Either way, I've definitely learned a lot more about this sort of thing and have a solution that seems to be working well without any real hocus pocus going on. Thank you for steering me in the right direction.

-Aaron

-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/5335b242.9060...@eltopia.com
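Aaron's approach was to balance the 5709's queue interrupts across cores by hand. The sketch below is one way to script that, under stated assumptions: (hypothetically) the driver names each MSI-X vector `<iface>-<n>` in /proc/interrupts, and one vector per CPU, assigned round robin, is the desired layout. The helper names are made up for illustration. By default it only returns the equivalent `echo MASK > /proc/irq/N/smp_affinity` commands. Actually writing the files requires root.

```python
"""Sketch: pin each queue vector of a multiqueue NIC to its own CPU, round robin.

Assumption (hypothetical): queue vectors appear in /proc/interrupts with device
names of the form "<iface>-<n>", e.g. "eth0-0"; check your driver's naming first.
"""
from __future__ import annotations


def find_queue_irqs(interrupts_text: str, iface: str) -> list[int]:
    """Return the IRQ numbers whose device name starts with '<iface>-'."""
    irqs = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header row and named rows such as NMI or LOC
        fields = rest.split()
        if fields and fields[-1].startswith(iface + "-"):
            irqs.append(int(head))
    return irqs


def assign_round_robin(irqs: list[int], cpus: list[int]) -> dict[int, str]:
    """Give each IRQ one CPU in turn; values are single-CPU hex masks."""
    return {irq: format(1 << cpus[i % len(cpus)], "x") for i, irq in enumerate(irqs)}


def apply_masks(assignments: dict[int, str], dry_run: bool = True) -> list[str]:
    """Return the equivalent shell commands; write the masks only if dry_run is False."""
    commands = []
    for irq, mask in sorted(assignments.items()):
        path = f"/proc/irq/{irq}/smp_affinity"
        commands.append(f"echo {mask} > {path}")
        if not dry_run:
            with open(path, "w") as f:  # needs root
                f.write(mask)
    return commands
```

For example, `apply_masks(assign_round_robin(find_queue_irqs(open("/proc/interrupts").read(), "eth0"), [0, 1, 2, 3]))` returns the command list without touching anything. Choosing which CPUs to list (for example, one thread per physical core) is left to the caller.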
Spreading NIC interrupts across multiple CPUs
I have a question regarding interrupt balancing for a NIC across CPUs. I have a Dell R710 (dual quad core) with an embedded Broadcom 5709 that seems to put everything on CPU0. I even threw an Intel Pro/1000 PT into the Dell, but it shows the same problem. For a test system, I have an HP DL360-G5 (also dual quad core) with an embedded Broadcom 5708 that balances across all cores. I've also put an identical Intel NIC in it, and it seems to balance across the cores properly.

This leads me to believe that there's something wrong with my BIOS setup, or something inherently wrong with the R710. I'm leaning towards the former, as I'm seeing this on two R710s and doubt I'd hit a magic breakage across two chassis. Also, this is with no massaging on my part: both are running up-to-date Debian wheezy 7.4, with the Dell originally installed with 7.1.

My question is this: what option(s) could be present in the R710 BIOS that would cause something like this to happen? If not the BIOS, where/what else should I look at?

Thanks,
-Aaron

-- 
Archive: https://lists.debian.org/53331c52.3060...@eltopia.com
Re: Spreading NIC interrupts across multiple CPUs
On Wed, 26 Mar 2014 11:28:34 -0700
Aaron Seelye aseelye-li...@eltopia.com wrote:
> My question is this: what option(s) could be present in the R710 BIOS that would cause something like this to happen? If not the BIOS, where/what else should I look at?

You don't have irqbalance running by chance, do you? Because this sounds like exactly what it's designed to do.

https://github.com/Irqbalance/irqbalance

Irqbalance is a daemon to help balance the cpu load generated by interrupts across all of a systems cpus. Irqbalance identifies the highest volume interrupt sources, and isolates them to a single unique cpu, so that load is spread as much as possible over an entire processor set, while minimizing cache hit rates for irq handlers.

-- 
It is wrong always, everywhere and for everyone to believe anything upon insufficient evidence.
- W. K. Clifford, British philosopher, circa 1876

-- 
Archive: https://lists.debian.org/20140326140933.175aa...@mrqueue.com
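The quoted description of irqbalance is one sentence long. As a toy illustration only, and not irqbalance's actual algorithm (which also takes CPU topology into account), a greedy "busiest source first, onto the least-loaded CPU" placement looks like the sketch below. The function name and the per-IRQ counts in the demo are invented.

```python
from __future__ import annotations

import heapq


def greedy_isolate(irq_counts: dict[int, int], cpus: list[int]) -> dict[int, int]:
    """Place the busiest IRQs first, each on the CPU with the least load so far.

    Toy illustration only: it ignores caches, NUMA nodes and HT siblings.
    """
    heap = [(0, cpu) for cpu in cpus]  # (interrupts assigned so far, cpu)
    heapq.heapify(heap)
    placement: dict[int, int] = {}
    for irq, count in sorted(irq_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, cpu = heapq.heappop(heap)  # least-loaded CPU, lowest number on ties
        placement[irq] = cpu
        heapq.heappush(heap, (load + count, cpu))
    return placement


if __name__ == "__main__":
    counts = {79: 1000, 80: 600, 81: 500, 82: 100}  # made-up per-IRQ totals
    print(greedy_isolate(counts, [0, 1, 2, 3]))  # {79: 0, 80: 1, 81: 2, 82: 3}
    print(greedy_isolate(counts, [0, 1]))  # {79: 0, 80: 1, 81: 1, 82: 0}
```

With at least as many CPUs as busy sources, each source ends up alone on its own CPU, which is the "single unique cpu" behavior the README describes. With fewer CPUs, the sources share CPUs in a way that evens out the totals.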
Re: Spreading NIC interrupts across multiple CPUs
I don't have it running on either the Dell or the HP. I tried it on the Dells, but it didn't do anything on one, and just moved the interrupts from CPU0 to CPU1 on the other. On the HP, which is balancing perfectly, I don't have the irqbalance package installed; it just worked from the get-go.

-Aaron

On 3/26/2014 12:09 PM, Mr Queue wrote:
> On Wed, 26 Mar 2014 11:28:34 -0700
> Aaron Seelye aseelye-li...@eltopia.com wrote:
>> My question is this: what option(s) could be present in the R710 BIOS that would cause something like this to happen? If not the BIOS, where/what else should I look at?
> You don't have irqbalance running by chance, do you? Because this sounds like exactly what it's designed to do.
> https://github.com/Irqbalance/irqbalance
> Irqbalance is a daemon to help balance the cpu load generated by interrupts across all of a systems cpus. Irqbalance identifies the highest volume interrupt sources, and isolates them to a single unique cpu, so that load is spread as much as possible over an entire processor set, while minimizing cache hit rates for irq handlers.

-- 
Archive: https://lists.debian.org/53334186.2060...@eltopia.com
Re: Spreading NIC interrupts across multiple CPUs
On 3/26/2014 1:28 PM, Aaron Seelye wrote:
> I have a question regarding interrupt balancing for a NIC across CPUs. I have a Dell R710 (dual quad core) with an embedded Broadcom 5709 that seems to put everything on CPU0. I even threw an Intel Pro/1000 PT into the Dell, but it shows the same problem. For a test system, I have an HP DL360-G5 (also dual quad core) with an embedded Broadcom 5708 that balances across all cores. I've also put an identical Intel NIC in it, and it seems to balance across the cores properly.
> This leads me to believe that there's something wrong with my BIOS setup, or something inherently wrong with the R710. I'm leaning towards the former, as I'm seeing this on two R710s and doubt I'd hit a magic breakage across two chassis. Also, this is with no massaging on my part: both are running up-to-date Debian wheezy 7.4, with the Dell originally installed with 7.1.
> My question is this: what option(s) could be present in the R710 BIOS that would cause something like this to happen? If not the BIOS, where/what else should I look at?

Please read this for educational background, especially the Note at the bottom of the page.

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html

Then ask an intelligent question about IRQ balancing and steering, WRT the two specific and different hardware systems, and Debian kernel versions, being used on each.

Cheers,
Stan

-- 
Archive: https://lists.debian.org/53334a40.8020...@hardwarefreak.com
Re: Spreading NIC interrupts across multiple CPUs
On 3/26/2014 2:44 PM, Stan Hoeppner wrote:
> Please read this for educational background, especially the Note at the bottom of the page.
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
> Then ask an intelligent question about IRQ balancing and steering, WRT the two specific and different hardware systems, and Debian kernel versions, being used on each.

I'd seen other things similar to that; however, it doesn't seem to get me any closer to the solution.

The output from one of the Dell (not balanced) systems:

    root@conf-2:~# uname -a
    Linux conf-2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
    root@conf-2:~# grep eth /proc/interrupts
    79: 704642666 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0
    root@conf-2:~# cat /proc/irq/79/smp_affinity
    root@conf-2:~# cat /proc/irq/79/smp_affinity_list
    0-15

The output from the HP (balanced) system:

    root@deb-test:~# grep eth /proc/interrupts
    68: 4251 4190 4212 4264 4226 4257 4251 4214 PCI-MSI-edge eth0
    root@deb-test:~# cat /proc/irq/68/smp_affinity
    ff
    root@deb-test:~# cat /proc/irq/68/smp_affinity_list
    0-7

As you can see, both systems are running identical kernels, and both have affinity set to spread across all CPUs. However, the Dell is using CPU0 exclusively for the ethernet device interrupts, while the HP spreads them pretty evenly.

Thanks,
-Aaron

-- 
Archive: https://lists.debian.org/5333534a.1070...@eltopia.com
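The two readouts above can be compared mechanically. The following is a small sketch with hypothetical helper names. It decodes an smp_affinity hex mask and an smp_affinity_list range string into CPU numbers, and it measures how much of an IRQ's total count landed on its busiest CPU. The mask decoder assumes the kernel's layout of comma-separated 32-bit hex groups. The Dell's smp_affinity output is not shown in the message, so `0000ffff` in the demo is only an example of a 32-bit-wide mask covering CPUs 0-15.

```python
from __future__ import annotations


def parse_mask(mask: str) -> list[int]:
    """Decode an smp_affinity hex mask such as 'ff' or '00000000,0000ffff'.

    Assumes comma-separated 32-bit hex groups, most significant group first.
    """
    value = 0
    for group in mask.strip().split(","):
        value = (value << 32) | int(group, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def parse_cpu_list(cpu_list: str) -> list[int]:
    """Decode smp_affinity_list ranges such as '0-15' or '0,2,4-5'."""
    cpus: list[int] = []
    for part in cpu_list.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def per_cpu_counts(line: str, ncpus: int) -> tuple[int, list[int]]:
    """Split one /proc/interrupts row into (irq, counts for CPU0..CPU[ncpus-1]).

    ncpus is the number of CPUn columns in the file's header row.
    """
    head, _, rest = line.partition(":")
    return int(head), [int(x) for x in rest.split()[:ncpus]]


def busiest_share(counts: list[int]) -> float:
    """Fraction of all interrupts that landed on the busiest CPU."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


if __name__ == "__main__":
    dell = "79: 704642666 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0"
    hp = "68: 4251 4190 4212 4264 4226 4257 4251 4214 PCI-MSI-edge eth0"
    print(busiest_share(per_cpu_counts(dell, 16)[1]))  # 1.0: everything on CPU0
    print(round(busiest_share(per_cpu_counts(hp, 8)[1]), 3))  # 0.126: close to 1/8
    print(parse_mask("ff") == parse_cpu_list("0-7"))  # True
    print(parse_mask("0000ffff") == parse_cpu_list("0-15"))  # True
```

The numbers make the contrast concrete. Both IRQs are allowed on every CPU, yet one lands entirely on CPU0, while the other is spread almost exactly evenly.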
Re: Spreading NIC interrupts across multiple CPUs
On 3/26/2014 5:23 PM, Aaron Seelye wrote:
> On 3/26/2014 2:44 PM, Stan Hoeppner wrote:
>> Please read this for educational background, especially the Note at the bottom of the page.
>> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
>> Then ask an intelligent question about IRQ balancing and steering, WRT the two specific and different hardware systems, and Debian kernel versions, being used on each.
> I'd seen other things similar to that; however, it doesn't seem to get me any closer to the solution.

Please post the full output of cat /proc/interrupts without line wrapping.

> The output from one of the Dell (not balanced) systems:
>     root@conf-2:~# uname -a
>     Linux conf-2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
>     root@conf-2:~# grep eth /proc/interrupts
>     79: 704642666 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0
>     root@conf-2:~# cat /proc/irq/79/smp_affinity
>     root@conf-2:~# cat /proc/irq/79/smp_affinity_list
>     0-15

This is an 8 core machine with HT enabled, 16 logical CPUs, so right off the bat it is dramatically different than the Compaq machine below as far as the kernel is concerned and how scheduling is performed. The current mask may or may not be correct for this configuration. I never use HT and I can't find any docs about HT and /proc/irq/xx/smp_affinity.

If this is a production machine and you can't easily reboot it to disable HT, first try a mask that includes only the physical CPUs and not the logical ones:

    ~# echo ff > /proc/irq/79/smp_affinity

This should schedule IRQs only on the 1st logical processor (physical CPU) of each core. If that doesn't do the trick, reboot the box and disable HT. If that doesn't do it, I'll dig further into the scheduler to figure out what's going on.
> The output from the HP (balanced) system:
>     root@deb-test:~# grep eth /proc/interrupts
>     68: 4251 4190 4212 4264 4226 4257 4251 4214 PCI-MSI-edge eth0
>     root@deb-test:~# cat /proc/irq/68/smp_affinity
>     ff
>     root@deb-test:~# cat /proc/irq/68/smp_affinity_list
>     0-7

This is an 8 core machine without HyperThreading. The mask is correct for 8 physical CPUs. Oddly, though, one box outputs the leading zeros of the mask while the other does not. Or did you mung either output?

> As you can see, both systems are running identical kernels, and both have affinity set to spread across all CPUs.

The latter may not be a correct statement, as HT logical processors are not CPUs. Also, the smp_affinity mask on the Dell implies 32 processors. Many, but not all, of the functional units are duplicated. Just as you do not want to schedule two compute-intensive tasks onto both logical processors of one core while leaving the other cores idle, you also do not want to assign any interrupts to the 2nd logical processor in a given core. All this does is pile up context and state switches on said core. The net effect is a decrease in the overall work that can be performed.

And to this point, it's not usually a good idea to spread interrupts round robin from any device evenly across all cores in a system. This is inefficient as each core must load the ISR for every interrupt. This decreases the effectiveness of L1/L2 caches on all cores, causing additional cache misses for other processes executing on those cores. This is precisely why irqbalance was created.

> However, the Dell is using CPU0 exclusively for the ethernet device interrupts, while the HP spreads them pretty evenly.

This could be as simple as HT being enabled on the Dell. If not, the contents of your /proc/interrupts files should help me narrow this down for you.

For future reference, kernel scheduler problems such as this should be posted on LKML, not a distro list, no matter which distro you use.
There are very few people on debian-user or any of the distro general help lists with significant knowledge of the kernel, let alone the scheduler. You typically get help with this kind of thing much faster, and with more thorough knowledge transfer, on LKML.

Cheers,
Stan

-- 
Archive: https://lists.debian.org/5333b78c.9090...@hardwarefreak.com
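The `ff` mask suggested for the Dell selects logical CPUs 0-7. That is one thread per physical core only if the kernel numbered the first thread of every core 0-7 and the HT siblings 8-15. The sketch below derives the mask from the topology instead, assuming the standard Linux sysfs file `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The helper names are hypothetical, and the two 16-thread layouts in the demo are made up. `format_mask` is intended to mimic the printed layout of smp_affinity (comma-separated 32-bit hex groups, padded to the given width); it is not taken from kernel source.

```python
from __future__ import annotations

import glob


def parse_cpu_list(cpu_list: str) -> list[int]:
    """Decode list ranges such as '0,8' or '0-1'."""
    cpus: list[int] = []
    for part in cpu_list.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def first_thread_per_core(siblings: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered logical CPU of each set of HT siblings."""
    return sorted({min(parse_cpu_list(text)) for text in siblings.values()})


def read_sysfs_siblings() -> dict[int, str]:
    """Read thread_siblings_list for every logical CPU (Linux only)."""
    siblings = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(path.split("/")[5][3:])  # 'cpu12' -> 12
        with open(path) as f:
            siblings[cpu] = f.read().strip()
    return siblings


def format_mask(cpus: list[int], nr_cpus: int) -> str:
    """Render CPUs as comma-separated 32-bit hex groups, most significant first."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    ngroups = (nr_cpus + 31) // 32
    groups = [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(ngroups))]
    lead_bits = nr_cpus - 32 * (ngroups - 1)  # bits held by the leading group
    out = [format(groups[0], f"0{(lead_bits + 3) // 4}x")]
    out += [format(g, "08x") for g in groups[1:]]
    return ",".join(out)


if __name__ == "__main__":
    # Two made-up 16-thread layouts: siblings N and N+8, or siblings 2k and 2k+1.
    split = {cpu: f"{cpu % 8},{cpu % 8 + 8}" for cpu in range(16)}
    adjacent = {cpu: f"{cpu - cpu % 2}-{cpu - cpu % 2 + 1}" for cpu in range(16)}
    print(format_mask(first_thread_per_core(split), 16))  # 00ff
    print(format_mask(first_thread_per_core(adjacent), 16))  # 5555
    # On a real box: format_mask(first_thread_per_core(read_sysfs_siblings()), nr_cpus)
```

Under the first layout, the result matches `ff`. Under the second, the same intent needs `5555`, so it is worth checking the sibling lists before writing a mask.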