Re: Spreading NIC interrupts across multiple CPUs

2014-03-28 Thread Aaron Seelye

On 3/26/2014 10:30 PM, Stan Hoeppner wrote:

> This is an 8 core machine with HT enabled, 16 logical CPUs, so right off
> the bat it is dramatically different than the Compaq machine below as
> far as the kernel is concerned and how scheduling is performed.  The
> current mask may or may not be correct for this configuration.  I never
> use HT and I can't find any docs about HT and /proc/irq/xx/smp_affinity.


Agreed on finding the docs, it was nigh impossible.  I found a way to 
offload the traffic for that server, made a few changes to the BIOS 
(c-states, HT, etc), and booted it back up.  Didn't seem to change much 
on the spreading, but that's fine.



> And to this point, it's not usually a good idea to spread interrupts
> round robin from any device evenly across all cores in a system.  This
> is inefficient as each core must load the ISR for every interrupt.  This
> decreases the effectiveness of L1/L2 caches on all cores, causing
> additional cache misses for other processes executing on those cores.
> This is precisely why irqbalance was created.


A couple things on this, I did see what you're talking about WRT 
spreading the interrupts about the processors.  However, I did notice 
one thing, irqbalance is set to specifically exempt ethernet/network 
interfaces from its balancing.  I'm not sure if it's to make sure what I 
was seeing with the HP system doesn't inadvertently happen, or to make 
sure the queues all stay on the same processor.  This would lead me to 
my next question, in the case of a NIC with multiple queues, should all 
queues for a given interface be on a single CPU (actual cpu, not HT)? 
(answered next paragraph)





>> However, the Dell is using
>> CPU0 exclusively for the ethernet device interrupts, while the HP
>> spreads them pretty evenly.
>
> This could be as simple as HT being enabled on the Dell.  If not, the
> contents of your /proc/interrupts files should help me narrow this down
> for you.


Unfortunately it didn't change anything on the Dell, no idea why.  Could 
be as simple as the driver differences for the 5708 and 5709.


Looking at 
https://we.riseup.net/riseup+tech/balancing-hardware-interrupts and more 
specifically 
http://www.alexonlinux.com/msi-x-the-right-way-to-spread-interrupt-load, 
it looks like the queues enabled on the 5709 (which is on the Dell) 
would enable me to manually balance the queues across multiple cores 
without problems.  I'd been under the impression that MSI-X was what was 
to blame for the HP spreading things about, but I see that's not the case.
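
For anyone wanting to try the same manual balancing, here is a rough sketch. It assumes the driver exposes one MSI-X vector per queue, named like eth0-0, eth0-1 in the last column of /proc/interrupts (that naming is an assumption; check your own output). plan_queue_affinity is a hypothetical helper name. It only prints the echo commands so they can be reviewed before anything is written, rotates the queues over the first N CPUs, and is meant for CPU numbers below 31.

```shell
# Print (do not execute) one smp_affinity assignment per queue IRQ.
#   $1: interface name, e.g. eth0
#   $2: number of CPUs to rotate the queues over
#   stdin: contents of /proc/interrupts
plan_queue_affinity() {
    awk -v dev="$1" -v ncpu="$2" '
        # queue vectors are assumed to be named "<dev>-<n>" in the last column
        index($NF, dev "-") == 1 {
            irq = $1
            sub(/:$/, "", irq)          # "79:" -> "79"
            cpu = n++ % ncpu            # round robin over CPUs 0..ncpu-1
            # the mask has a single bit set: bit number = CPU number
            printf "echo %x > /proc/irq/%s/smp_affinity\n", 2 ^ cpu, irq
        }'
}

# Typical use, as root, after reading the printed plan:
#   plan_queue_affinity eth0 8 < /proc/interrupts
```

Printing the plan rather than writing to /proc directly is a deliberate choice: a wrong mask on a busy production NIC is easier to catch on screen than after the fact.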


So far, under one day including a typical peak load, it looks like this 
was rather successful, as I hit normal traffic patterns without dropping 
any outbound packets.




> For future reference, kernel scheduler problems such as this should be
> posted on LKML, not a distro list, no matter which distro you use.
> There are very few people on debian-user or any of the distro general
> help lists with significant knowledge of the kernel, let alone the
> scheduler.  You typically get help with this kind of thing much faster,
> and with more thorough knowledge transfer on LKML.


Will do.  I'm sorry, but I thought this would have been a pretty 
standard question for anyone operating in a production environment where 
100k pps is typical (at least, that's what set it off for me).  Either 
way, I've definitely learned a lot more about this sort of thing and 
have a solution that seems to be working well without any real hocus 
pocus going on.  Thank you for steering me in the right direction.


-Aaron


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Archive: https://lists.debian.org/5335b242.9060...@eltopia.com



Spreading NIC interrupts across multiple CPUs

2014-03-26 Thread Aaron Seelye
I have a question regarding interrupt balancing for a NIC across CPUs. 
I have a Dell R710 (dual quad core) with an embedded Broadcom 5709 that 
seems to put everything on CPU0.  I even threw an Intel Pro/1000 PT 
in the Dell, but this is showing the same problem.


For a test system, I have an HP DL360-G5 (also dual quad core) with an 
embedded Broadcom 5708 that balances across all cores.  I've also thrown 
in an identical Intel NIC, and it seems to balance across the cores 
properly.  This leads me to believe that there's something wrong with my 
BIOS setup, or there's something inherently wrong with the R710, though 
I'm leaning towards the former, as I'm seeing this on two R710s, and 
doubt I'd hit a magic breakage across two chassis.


Also, this is with no massaging on my part, both running up to date 
Debian wheezy 7.4, with the Dell being installed originally with 7.1.


My question is this, what option(s) could be present with the R710 bios 
that would cause something like this to happen?  If not the bios, 
where/what else should I look at?


Thanks,

-Aaron



Archive: https://lists.debian.org/53331c52.3060...@eltopia.com



Re: Spreading NIC interrupts across multiple CPUs

2014-03-26 Thread Mr Queue
On Wed, 26 Mar 2014 11:28:34 -0700
Aaron Seelye aseelye-li...@eltopia.com wrote:

> My question is this, what option(s) could be present with the R710 bios 
> that would cause something like this to happen?  If not the bios, 
> where/what else should I look at?

You don't have irqbalance running by chance, do you? Because this sounds like 
exactly what it's designed to do.

https://github.com/Irqbalance/irqbalance

Irqbalance is a daemon to help balance the cpu load generated by interrupts
across all of a system's cpus.  Irqbalance identifies the highest volume
interrupt sources, and isolates them to a single unique cpu, so that load is
spread as much as possible over an entire processor set, while minimizing cache
miss rates for irq handlers.

-- 
It is wrong always, everywhere and for everyone to believe anything upon
insufficient evidence.
- W. K. Clifford, British philosopher, circa 1876


Archive: https://lists.debian.org/20140326140933.175aa...@mrqueue.com



Re: Spreading NIC interrupts across multiple CPUs

2014-03-26 Thread Aaron Seelye
I don't on either the Dell or HP.  I tried it on the Dells, but it 
didn't do anything on one, and just moved the interrupts from CPU0 to 
CPU1 on the other.


On the HP that is balancing perfectly, I don't have the irqbalance 
package installed, it just worked from the get-go.


-Aaron






Archive: https://lists.debian.org/53334186.2060...@eltopia.com



Re: Spreading NIC interrupts across multiple CPUs

2014-03-26 Thread Stan Hoeppner
On 3/26/2014 1:28 PM, Aaron Seelye wrote:
> I have a question regarding interrupt balancing for a NIC across CPUs. I
> have a Dell R710 (dual quad core) with an embedded Broadcom 5709 that seems
> to put everything on CPU0.  I even threw an Intel Pro/1000 PT in the
> Dell, but this is showing the same problem.
>
> For a test system, I have an HP DL360-G5 (also dual quad core) with an
> embedded Broadcom 5708 that balances across all cores.  I've also thrown
> in an identical Intel NIC, and it seems to balance across the cores
> properly.  This leads me to believe that there's something wrong with my
> BIOS setup, or there's something inherently wrong with the R710, though
> I'm leaning towards the former, as I'm seeing this on two R710s, and
> doubt I'd hit a magic breakage across two chassis.
>
> Also, this is with no massaging on my part, both running up to date
> Debian wheezy 7.4, with the Dell being installed originally with 7.1.
>
> My question is this, what option(s) could be present with the R710 bios
> that would cause something like this to happen?  If not the bios,
> where/what else should I look at?

Please read this for educational background, especially the Note at the
bottom of the page.

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html

Then ask an intelligent question about IRQ balancing and steering, WRT
the two specific and different hardware systems, and Debian kernel
versions, being used on each.


Cheers,

Stan


Archive: https://lists.debian.org/53334a40.8020...@hardwarefreak.com



Re: Spreading NIC interrupts across multiple CPUs

2014-03-26 Thread Aaron Seelye

On 3/26/2014 2:44 PM, Stan Hoeppner wrote:


> Please read this for educational background, especially the Note at the
> bottom of the page.
>
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
>
> Then ask an intelligent question about IRQ balancing and steering, WRT
> the two specific and different hardware systems, and Debian kernel
> versions, being used on each.


I'd seen other things similar to that, however, it doesn't seem to get 
me any closer to the solution.


The output from one of the Dell (not balanced) systems:

root@conf-2:~# uname -a
Linux conf-2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
root@conf-2:~# grep eth /proc/interrupts
  79:  704642666  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   PCI-MSI-edge  eth0

root@conf-2:~# cat /proc/irq/79/smp_affinity

root@conf-2:~# cat /proc/irq/79/smp_affinity_list
0-15

The output from the HP (balanced) system:

root@deb-test:~# grep eth /proc/interrupts
  68:   4251   4190   4212   4264   4226   4257   4251   4214   PCI-MSI-edge  eth0

root@deb-test:~# cat /proc/irq/68/smp_affinity
ff
root@deb-test:~# cat /proc/irq/68/smp_affinity_list
0-7


As you can see, both systems are running identical kernels, and both 
have affinity set to spread across all CPUs.  However, the Dell is using 
CPU0 exclusively for the ethernet device interrupts, while the HP 
spreads them pretty evenly.
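
To compare the two boxes without fighting wrapped terminal output, a small sketch like the following prints one line per CPU for a given device. irq_counts is a hypothetical helper name, and it assumes the device name appears alone in the last column of /proc/interrupts (shared IRQs that list several devices there would need a looser match).

```shell
# Print "irq <N> cpu<K> <count>" for every IRQ line belonging to a device.
#   $1: device name as shown in the last column, e.g. eth0
#   stdin: contents of /proc/interrupts
irq_counts() {
    awk -v dev="$1" '
        $NF == dev {
            irq = $1
            sub(/:$/, "", irq)          # "68:" -> "68"
            # the purely numeric fields after the IRQ label are per-CPU counters;
            # they are printed as strings so large counts are not reformatted
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                printf "irq %s cpu%d %s\n", irq, i - 2, $i
        }'
}

# Typical use:
#   irq_counts eth0 < /proc/interrupts
```

Run on both machines, this makes the difference obvious at a glance: the Dell would show a single large counter on cpu0 and zeros elsewhere, while the HP would show eight counters of similar size.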


Thanks,

-Aaron



Archive: https://lists.debian.org/5333534a.1070...@eltopia.com



Re: Spreading NIC interrupts across multiple CPUs

2014-03-26 Thread Stan Hoeppner
On 3/26/2014 5:23 PM, Aaron Seelye wrote:
> On 3/26/2014 2:44 PM, Stan Hoeppner wrote:
>
>> Please read this for educational background, especially the Note at the
>> bottom of the page.
>>
>> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
>>
>> Then ask an intelligent question about IRQ balancing and steering, WRT
>> the two specific and different hardware systems, and Debian kernel
>> versions, being used on each.
>
> I'd seen other things similar to that, however, it doesn't seem to get
> me any closer to the solution.

Please post the full output of cat /proc/interrupts without line wrapping.

> The output from one of the Dell (not balanced) systems:
>
> root@conf-2:~# uname -a
> Linux conf-2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
> root@conf-2:~# grep eth /proc/interrupts
>   79:  704642666  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   PCI-MSI-edge  eth0
> root@conf-2:~# cat /proc/irq/79/smp_affinity
>
> root@conf-2:~# cat /proc/irq/79/smp_affinity_list
> 0-15

This is an 8 core machine with HT enabled, 16 logical CPUs, so right off
the bat it is dramatically different than the Compaq machine below as
far as the kernel is concerned and how scheduling is performed.  The
current mask may or may not be correct for this configuration.  I never
use HT and I can't find any docs about HT and /proc/irq/xx/smp_affinity.

If this is a production machine and you can't easily reboot it to
disable HT, first try a mask that includes only the physical CPUs and
not the logical:

~# echo ff > /proc/irq/79/smp_affinity

This should schedule IRQs only on the 1st logical processor (physical
CPU) of each core.  If that doesn't do the trick reboot the box and
disable HT.  If that doesn't do it I'll dig further into the scheduler
to figure out what's going on.
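
For readers who want to build such a mask for a different set of CPUs, a minimal sketch follows. smp_affinity takes a hexadecimal bitmask in which bit n stands for CPU n. cpus_to_mask is a hypothetical helper name, and because it relies on the shell's integer arithmetic it is only meant for CPU numbers 0 to 62.

```shell
# Build the hexadecimal bitmask that /proc/irq/<N>/smp_affinity expects
# from a list of CPU numbers (bit n set = CPU n allowed).
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
)

cpus_to_mask 0 1 2 3 4 5 6 7    # prints ff
cpus_to_mask 0 2 4 6            # prints 55

# Applying a mask needs root and the redirection operator, e.g.:
#   echo "$(cpus_to_mask 0 1 2 3 4 5 6 7)" > /proc/irq/79/smp_affinity
```

Whether CPUs 0-7 really are one logical processor per core depends on how the machine enumerates its hyperthreads, so it is worth checking the topology before trusting any particular mask.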

> The output from the HP (balanced) system:
>
> root@deb-test:~# grep eth /proc/interrupts
>   68:   4251   4190   4212   4264   4226   4257   4251   4214   PCI-MSI-edge  eth0
> root@deb-test:~# cat /proc/irq/68/smp_affinity
> ff
> root@deb-test:~# cat /proc/irq/68/smp_affinity_list
> 0-7

This is an 8 core machine without HyperThreading.  The mask is correct
for 8 physical CPUs.  Oddly though, one box outputs the leading zeros of
the mask while the other does not.  Or did you mung either output?
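
On the leading-zeros point: the mask is a plain hexadecimal number, so leading zeros do not change which CPUs it selects. A small sketch that decodes a mask back into CPU numbers makes this easy to confirm. mask_to_cpus is a hypothetical helper name, and it handles only a single hex word below 2^63 (masks for larger systems are shown as comma-separated 32-bit groups, which this does not parse).

```shell
# Decode a hexadecimal smp_affinity mask into the CPU numbers it selects.
mask_to_cpus() (
    mask=$(( 0x$1 ))
    cpu=0
    out=
    while [ "$mask" -gt 0 ]; do
        # test the lowest bit, then shift the mask right by one CPU
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="$out $cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "${out# }"
)

mask_to_cpus ff          # prints 0 1 2 3 4 5 6 7
mask_to_cpus 000000ff    # prints 0 1 2 3 4 5 6 7
```

The smp_affinity_list file gives the same information in range form (0-7, 0-15), which is usually the quicker thing to read.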

> As you can see, both systems are running identical kernels, and both
> have affinity set to spread across all CPUs.

The latter may not be a correct statement, as HT logical processors are
not CPUs.  Also, the smp_affinity mask on the Dell implies 32
processors.  Many, but not all, of the functional units are duplicated.
 Just as you do not want to schedule two compute intensive tasks to both
logical processors on a core leaving the other cores idle, you also do
not want to assign any interrupts to the 2nd logical processor in
a given core.  All this does is pile up context and state switches on
said core.  The net effect is decreasing the overall work that can be
performed.
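
To find out which logical processors share a core, the kernel exposes each CPU's siblings in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The sketch below picks the first sibling of every core from those values. first_siblings is a hypothetical helper name, and how a given machine numbers its hyperthreads (0,8 pairs versus 0-1 pairs) is an assumption that has to be checked on the box itself.

```shell
# Reduce thread_siblings_list values to one CPU number per physical core.
#   stdin: one thread_siblings_list value per line, e.g. "0,8" or "0-1"
#   stdout: the first sibling of each core, space separated, ascending
first_siblings() {
    # keep only the number before the first "," or "-", then de-duplicate,
    # since every sibling of a core reports the same list
    sed 's/[,-].*//' | sort -n -u | paste -s -d ' ' -
}

# Typical use:
#   cat /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list | first_siblings
```

If the siblings come out as pairs like 0,8 and 1,9, then CPUs 0-7 are one thread per core and a mask of ff matches the advice above. If they come out as 0-1 and 2-3, the first siblings are the even-numbered CPUs and a different mask is needed.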

And to this point, it's not usually a good idea to spread interrupts
round robin from any device evenly across all cores in a system.  This
is inefficient as each core must load the ISR for every interrupt.  This
decreases the effectiveness of L1/L2 caches on all cores, causing
additional cache misses for other processes executing on those cores.
This is precisely why irqbalance was created.

> However, the Dell is using
> CPU0 exclusively for the ethernet device interrupts, while the HP
> spreads them pretty evenly.

This could be as simple as HT being enabled on the Dell.  If not, the
contents of your /proc/interrupts files should help me narrow this down
for you.

For future reference, kernel scheduler problems such as this should be
posted on LKML, not a distro list, no matter which distro you use.
There are very few people on debian-user or any of the distro general
help lists with significant knowledge of the kernel, let alone the
scheduler.  You typically get help with this kind of thing much faster,
and with more thorough knowledge transfer on LKML.

Cheers,

Stan


Archive: https://lists.debian.org/5333b78c.9090...@hardwarefreak.com