Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-06 Thread Christian Borntraeger
On 02/03/2017 07:19 AM, Ben Serebrin wrote:
[...]
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info 
> *vi)
>* queue pairs, we let the queue pairs to be private to one cpu by
>* setting the affinity hint to eliminate the contention.
>*/
> - if (vi->curr_queue_pairs == 1 ||
> - vi->max_queue_pairs != num_online_cpus()) {
> + if (vi->curr_queue_pairs == 1) {
>   virtnet_clean_affinity(vi, -1);
>   return;
>   }
> 
> + /* If there are more cpus than queues, then assign the queues'
> +  * interrupts to the first cpus until we run out.
> +  */
>   i = 0;
>   for_each_online_cpu(cpu) {
> + if (i == vi->max_queue_pairs)
> + break;
>   virtqueue_set_affinity(vi->rq[i].vq, cpu);
>   virtqueue_set_affinity(vi->sq[i].vq, cpu);
> - netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>   i++;
>   }
> 
> + /* Stripe the XPS affinities across the online CPUs.
> +  * Hyperthread pairs are typically assigned such that Linux's
> +  * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
> +  * hyperthread twins to share TX queues, in the case where there are
> +  * more cpus than queues.
> +  */

This is not always true. E.g. on s390 the SMT threads are usually paired 
even/odd.

e.g.

[cborntra@s38lp08 linux]$ lscpu -e
CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
0   0    0    0      0    0:0:0:0         yes    yes        horizontal   0
1   0    0    0      0    1:1:1:1         yes    yes        horizontal   1
2   0    0    0      1    2:2:2:2         yes    yes        horizontal   2
3   0    0    0      1    3:3:3:3         yes    yes        horizontal   3
4   0    0    0      2    4:4:4:4         yes    yes        horizontal   4
5   0    0    0      2    5:5:5:5         yes    yes        horizontal   5
6   0    0    0      3    6:6:6:6         yes    yes        horizontal   6

This does not matter yet for s390 (as virtio is usually done via the ccw bus),
but maybe we should consider a future patch that provides some arch-specific
striping hints. A rough, untested sketch of an arch-neutral approach follows
after the quoted hunk below.

Or would it make sense to change the s390 layout for SMT twins, given that
more code out there expects all thread-0 siblings at the front and all
thread-1 siblings at the end?


> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + struct cpumask mask;
> + int skip = i;
> +
> + cpumask_clear(&mask);
> + for_each_online_cpu(cpu) {
> + while (skip--)
> + cpu = cpumask_next(cpu, cpu_online_mask);
> + if (cpu < num_possible_cpus())
> + cpumask_set_cpu(cpu, &mask);
> + skip = vi->max_queue_pairs - 1;
> + }
> + netif_set_xps_queue(vi->dev, &mask, i);
> + }
> +
>   vi->affinity_hint_set = true;
>  }
> 
> 
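Something along these lines, as a rough and untested sketch only (the helper
name virtnet_set_xps_by_siblings is made up here), would derive the sharing
from topology_sibling_cpumask() and so would be correct for both numbering
schemes:

static void virtnet_set_xps_by_siblings(struct virtnet_info *vi)
{
	struct cpumask mask;
	int cpu, queue = 0;

	for_each_online_cpu(cpu) {
		/* Handle each core once, via its lowest-numbered sibling. */
		if (cpu != cpumask_first(topology_sibling_cpumask(cpu)))
			continue;

		/* Let all online SMT siblings of this core share one TX queue. */
		cpumask_and(&mask, topology_sibling_cpumask(cpu), cpu_online_mask);
		netif_set_xps_queue(vi->dev, &mask, queue % vi->max_queue_pairs);
		queue++;
	}
}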



Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-05 Thread Jason Wang



On 02/06/2017 15:28, Benjamin Serebrin wrote:

On Sun, Feb 5, 2017 at 11:24 PM, Jason Wang  wrote:

On 02/03/2017 14:19, Ben Serebrin wrote:

From: Benjamin Serebrin

If the number of virtio queue pairs is not equal to the
number of VCPUs, the virtio guest driver doesn't assign
any CPU affinity for the queue interrupts or the xps
aggregation interrupt.

So this in fact is not an affinity fix for #cpus > 32 but adding affinity
for #cpus != #queue pairs.

Fair enough.  I'll adjust the title line in the subsequent version.



Google Compute Engine currently provides 1 queue pair for
every VCPU, but limits that at a maximum of 32 queue pairs.

This code assigns interrupt affinity even when there are more than
32 VCPUs.

Tested:

(on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)

Without the fix we see all queues affinitized to all CPUs:

[...]


   + /* If there are more cpus than queues, then assign the queues'
+* interrupts to the first cpus until we run out.
+*/
 i = 0;
 for_each_online_cpu(cpu) {
+   if (i == vi->max_queue_pairs)
+   break;
 virtqueue_set_affinity(vi->rq[i].vq, cpu);
 virtqueue_set_affinity(vi->sq[i].vq, cpu);
-   netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
 i++;
 }
   + /* Stripe the XPS affinities across the online CPUs.
+* Hyperthread pairs are typically assigned such that Linux's
+* CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
+* hyperthread twins to share TX queues, in the case where there are
+* more cpus than queues.

Since we use combined queue pairs, why not use the same policy for RX?

XPS is for transmit only.




Yes, but I mean: e.g. if you let hyperthread twins share TX
queues (XPS), why not also have them share TX and RX queue interrupts (affinity)?


Thanks


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-05 Thread Benjamin Serebrin
On Sun, Feb 5, 2017 at 11:24 PM, Jason Wang  wrote:
>
>
> On 02/03/2017 14:19, Ben Serebrin wrote:
>>
>> From: Benjamin Serebrin 
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>
>
> So this in fact is not an affinity fix for #cpus > 32 but adding affinity
> for #cpus != #queue pairs.

Fair enough.  I'll adjust the title line in the subsequent version.


>
>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>
>
> [...]
>
>>   + /* If there are more cpus than queues, then assign the queues'
>> +* interrupts to the first cpus until we run out.
>> +*/
>> i = 0;
>> for_each_online_cpu(cpu) {
>> +   if (i == vi->max_queue_pairs)
>> +   break;
>> virtqueue_set_affinity(vi->rq[i].vq, cpu);
>> virtqueue_set_affinity(vi->sq[i].vq, cpu);
>> -   netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>> i++;
>> }
>>   + /* Stripe the XPS affinities across the online CPUs.
>> +* Hyperthread pairs are typically assigned such that Linux's
>> +* CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
>> +* hyperthread twins to share TX queues, in the case where there are
>> +* more cpus than queues.
>
>
> Since we use combined queue pairs, why not use the same policy for RX?

XPS is for transmit only.


> Thanks
>
>
>> +*/
>> +   for (i = 0; i < vi->max_queue_pairs; i++) {
>> +   struct cpumask mask;
>> +   int skip = i;
>> +
>> +   cpumask_clear(&mask);
>> +   for_each_online_cpu(cpu) {
>> +   while (skip--)
>> +   cpu = cpumask_next(cpu, cpu_online_mask);
>> +   if (cpu < num_possible_cpus())
>> +   cpumask_set_cpu(cpu, &mask);
>> +   skip = vi->max_queue_pairs - 1;
>> +   }
>> +   netif_set_xps_queue(vi->dev, &mask, i);
>> +   }
>> +
>> vi->affinity_hint_set = true;
>>   }
>>
>
>


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-05 Thread Jason Wang



On 02/03/2017 14:19, Ben Serebrin wrote:

From: Benjamin Serebrin 

If the number of virtio queue pairs is not equal to the
number of VCPUs, the virtio guest driver doesn't assign
any CPU affinity for the queue interrupts or the xps
aggregation interrupt.


So this in fact is not an affinity fix for #cpus > 32 but adding
affinity for #cpus != #queue pairs.



Google Compute Engine currently provides 1 queue pair for
every VCPU, but limits that at a maximum of 32 queue pairs.

This code assigns interrupt affinity even when there are more than
32 VCPUs.

Tested:

(on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)

Without the fix we see all queues affinitized to all CPUs:


[...]

  
+	/* If there are more cpus than queues, then assign the queues'
+* interrupts to the first cpus until we run out.
+*/
i = 0;
for_each_online_cpu(cpu) {
+   if (i == vi->max_queue_pairs)
+   break;
virtqueue_set_affinity(vi->rq[i].vq, cpu);
virtqueue_set_affinity(vi->sq[i].vq, cpu);
-   netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
i++;
}
  
+	/* Stripe the XPS affinities across the online CPUs.
+* Hyperthread pairs are typically assigned such that Linux's
+* CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
+* hyperthread twins to share TX queues, in the case where there are
+* more cpus than queues.


Since we use combined queue pairs, why not use the same policy for RX?

Thanks


+*/
+   for (i = 0; i < vi->max_queue_pairs; i++) {
+   struct cpumask mask;
+   int skip = i;
+
+   cpumask_clear(&mask);
+   for_each_online_cpu(cpu) {
+   while (skip--)
+   cpu = cpumask_next(cpu, cpu_online_mask);
+   if (cpu < num_possible_cpus())
+   cpumask_set_cpu(cpu, &mask);
+   skip = vi->max_queue_pairs - 1;
+   }
+   netif_set_xps_queue(vi->dev, &mask, i);
+   }
+
vi->affinity_hint_set = true;
  }
  





Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Willem de Bruijn
On Fri, Feb 3, 2017 at 1:34 PM, Rick Jones  wrote:
> On 02/03/2017 10:31 AM, Willem de Bruijn wrote:
>>
>> Configuring interrupts and xps from userspace at boot is more robust,
>> as device driver defaults can change. But especially for customers who
>> are unaware of these settings, choosing sane defaults won't hurt.
>
>
> The devil is in finding the sane defaults.

Agreed, of course.

>  For example, the issues we've
> seen with VMs sending traffic getting reordered when the driver took it upon
> itself to enable xps.

That particular reordering issue [1] was with xps on the host, where vm thread
migration causes flows to exit on different tx queues because ooo_okay is
not supported for flows from tap devices.

Within a guest, on the other hand, xps settings should not cause reordering of
packets from the same flow, as ooo_okay is active for tcp flows.

[1] https://patchwork.ozlabs.org/patch/662913/
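
For readers following along: the guarantee inside the guest comes from the
TX queue selection path, where a socket's cached queue is only recomputed
when skb->ooo_okay is set (TCP sets it, roughly, when the flow has no
packets still in flight, so switching queues cannot reorder). Paraphrasing
__netdev_pick_tx() from kernels of that era (details vary by version, so
treat this as illustrative only):

static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
{
	struct sock *sk = skb->sk;
	int queue_index = sk_tx_queue_get(sk);

	/* Reuse the cached queue unless the flow may safely move. */
	if (queue_index < 0 || skb->ooo_okay ||
	    queue_index >= dev->real_num_tx_queues) {
		int new_index = get_xps_queue(dev, skb);

		if (new_index < 0)
			new_index = skb_tx_hash(dev, skb);

		/* Remember the new queue for established, routed sockets. */
		if (queue_index != new_index && sk &&
		    sk_fullsock(sk) &&
		    rcu_access_pointer(sk->sk_dst_cache))
			sk_tx_queue_set(sk, new_index);

		queue_index = new_index;
	}

	return queue_index;
}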


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Rick Jones

On 02/03/2017 10:31 AM, Willem de Bruijn wrote:

Configuring interrupts and xps from userspace at boot is more robust,
as device driver defaults can change. But especially for customers who
are unaware of these settings, choosing sane defaults won't hurt.


The devil is in finding the sane defaults.  For example, the issues 
we've seen with VMs sending traffic getting reordered when the driver 
took it upon itself to enable xps.


rick jones


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Rick Jones

On 02/03/2017 10:22 AM, Benjamin Serebrin wrote:

Thanks, Michael, I'll put this text in the commit log:

XPS settings aren't writable from userspace, so the only way I know
to fix XPS is in the driver.


??

root@np-cp1-c0-m1-mgmt:/home/stack# cat 
/sys/devices/pci:00/:00:02.0/:04:00.0/net/hed0/queues/tx-0/xps_cpus

,0001
root@np-cp1-c0-m1-mgmt:/home/stack# echo 0 > 
/sys/devices/pci:00/:00:02.0/:04:00.0/net/hed0/queues/tx-0/xps_cpus
root@np-cp1-c0-m1-mgmt:/home/stack# cat 
/sys/devices/pci:00/:00:02.0/:04:00.0/net/hed0/queues/tx-0/xps_cpus

,



Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Willem de Bruijn
> XPS settings aren't writable from userspace, so the only way I know
> to fix XPS is in the driver.

It is writable by privileged users.

>> I wonder why not just do it in userspace though.

Configuring interrupts and xps from userspace at boot is more robust,
as device driver defaults can change. But especially for customers who
are unaware of these settings, choosing sane defaults won't hurt.

>> It would be nice to mention this in the commit log.
>> Are we sure this distribution is best for all workloads?

Overall, this minimizes contention between cores, so it should be
broadly preferable.


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Benjamin Serebrin
Thanks, Michael, I'll put this text in the commit log:

XPS settings aren't writable from userspace, so the only way I know
to fix XPS is in the driver.

The interrupt affinity settings could be set by irqbalancer, yes, but
it's not always enabled (by intention) in all guests.  I just confirmed
that on an unpatched 64-VCPU VM with no irqbalancer, all virtio-net
interrupts end up on core 0, and on a patched 64-VCPU VM, interrupts
end up on the core whose number matches the queue number.  This patch
just extends the existing behavior of the interrupt affinity assignment
loop, and I think it stands on its own.

It actually sparked some internal discussion about further intelligence
in XPS and interrupt affinity, and we'll do some experiments and come
back with further patches.
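
To make the resulting layout easy to check against the "Tested:" output in
the patch, here is a small userspace illustration (not kernel code; the
sizes are just examples) of the striping the patch installs, where TX
queue i is given CPUs i, i + nqueues, i + 2*nqueues, and so on:

#include <stdio.h>

int main(void)
{
	const int ncpus = 64, nqueues = 32;	/* illustrative sizes */

	for (int q = 0; q < nqueues; q++) {
		printf("tx-%d <- CPUs", q);
		for (int cpu = q; cpu < ncpus; cpu += nqueues)
			printf(" %d", cpu);
		printf("\n");
	}
	return 0;
}

With 64 CPUs and 32 queues this prints "tx-0 <- CPUs 0 32",
"tx-1 <- CPUs 1 33", and so on, i.e. each queue is shared by a CPU and its
hyperthread twin under the usual x86 enumeration.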

On Fri, Feb 3, 2017 at 7:07 AM, Michael S. Tsirkin  wrote:
> On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
>> From: Benjamin Serebrin 
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>>
>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>> 0-63
>> [...]
>> 0-63
>>
>> and we see all TX queues' xps_cpus affinitized to no cores:
>>
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>> ,
>> [...]
>> ,
>>
>> With the fix, we see each queue assigned to a single core,
>> and xps affinity set to 1 unique cpu per TX queue.
>>
>> 64 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>>
>> 0-63
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29
>> 29
>> 30
>> 30
>> 31
>> 31
>> 0-63
>> 0-63
>> 0-63
>> 0-63
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
>>
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>>
>> 48 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>> 0-47
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29
>> 29
>> 30
>> 30
>> 31
>> 31
>> 0-47
>> 0-47
>> 0-47
>> 0-47
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
>>
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>> ,0001
>> ,0002
>> ,0004
>> ,0008
>> ,0010
>> ,0020
>> ,0040
>> ,0080
>> ,0100
>> ,0200
>> ,0400
>> ,0800
>> ,1000
>> ,2000
>> ,4000
>> ,8000
>>
>> Signed-off-by: Ben Serebrin 
>> Acked-by: Willem de Bruijn 
>> Acked-by: Jim Mattson 
>> Acked-by: Venkatesh Srinivas 
>
> I wonder why not just do it in userspace though.
> It would be nice to mention this in the commit log.
> Are we sure this distribution is best for all workloads?
> While irqbalancer is hardly a perfect oracle it does
> manage to balance the load somewhat, and IIUC kernel
> affinities would break that.
> Thoughts?
>
>
>> Effort: kvm
>> ---
>>  drivers/net/virtio_net.c | 30 

Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Michael S. Tsirkin
On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
> From: Benjamin Serebrin 
> 
> If the number of virtio queue pairs is not equal to the
> number of VCPUs, the virtio guest driver doesn't assign
> any CPU affinity for the queue interrupts or the xps
> aggregation interrupt.
> 
> Google Compute Engine currently provides 1 queue pair for
> every VCPU, but limits that at a maximum of 32 queue pairs.
> 
> This code assigns interrupt affinity even when there are more than
> 32 VCPUs.
> 
> Tested:
> 
> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
> 
> Without the fix we see all queues affinitized to all CPUs:
> 
> cd /proc/irq
> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
> 0-63
> [...]
> 0-63
> 
> and we see all TX queues' xps_cpus affinitized to no cores:
> 
> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
> ,
> [...]
> ,
> 
> With the fix, we see each queue assigned to a single core,
> and xps affinity set to 1 unique cpu per TX queue.
> 
> 64 VCPU:
> 
> cd /proc/irq
> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
> 
> 0-63
> 0
> 0
> 1
> 1
> 2
> 2
> 3
> 3
> 4
> 4
> 5
> 5
> 6
> 6
> 7
> 7
> 8
> 8
> 9
> 9
> 10
> 10
> 11
> 11
> 12
> 12
> 13
> 13
> 14
> 14
> 15
> 15
> 16
> 16
> 17
> 17
> 18
> 18
> 19
> 19
> 20
> 20
> 21
> 21
> 22
> 22
> 23
> 23
> 24
> 24
> 25
> 25
> 26
> 26
> 27
> 27
> 28
> 28
> 29
> 29
> 30
> 30
> 31
> 31
> 0-63
> 0-63
> 0-63
> 0-63
> 
> cd /sys/class/net/eth0/queues
> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
> 
> 0001,0001
> 0002,0002
> 0004,0004
> 0008,0008
> 0010,0010
> 0020,0020
> 0040,0040
> 0080,0080
> 0100,0100
> 0200,0200
> 0400,0400
> 0800,0800
> 1000,1000
> 2000,2000
> 4000,4000
> 8000,8000
> 0001,0001
> 0002,0002
> 0004,0004
> 0008,0008
> 0010,0010
> 0020,0020
> 0040,0040
> 0080,0080
> 0100,0100
> 0200,0200
> 0400,0400
> 0800,0800
> 1000,1000
> 2000,2000
> 4000,4000
> 8000,8000
> 
> 48 VCPU:
> 
> cd /proc/irq
> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
> 0-47
> 0
> 0
> 1
> 1
> 2
> 2
> 3
> 3
> 4
> 4
> 5
> 5
> 6
> 6
> 7
> 7
> 8
> 8
> 9
> 9
> 10
> 10
> 11
> 11
> 12
> 12
> 13
> 13
> 14
> 14
> 15
> 15
> 16
> 16
> 17
> 17
> 18
> 18
> 19
> 19
> 20
> 20
> 21
> 21
> 22
> 22
> 23
> 23
> 24
> 24
> 25
> 25
> 26
> 26
> 27
> 27
> 28
> 28
> 29
> 29
> 30
> 30
> 31
> 31
> 0-47
> 0-47
> 0-47
> 0-47
> 
> cd /sys/class/net/eth0/queues
> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
> 
> 0001,0001
> 0002,0002
> 0004,0004
> 0008,0008
> 0010,0010
> 0020,0020
> 0040,0040
> 0080,0080
> 0100,0100
> 0200,0200
> 0400,0400
> 0800,0800
> 1000,1000
> 2000,2000
> 4000,4000
> 8000,8000
> ,0001
> ,0002
> ,0004
> ,0008
> ,0010
> ,0020
> ,0040
> ,0080
> ,0100
> ,0200
> ,0400
> ,0800
> ,1000
> ,2000
> ,4000
> ,8000
> 
> Signed-off-by: Ben Serebrin 
> Acked-by: Willem de Bruijn 
> Acked-by: Jim Mattson 
> Acked-by: Venkatesh Srinivas 

I wonder why not just do it in userspace though.
It would be nice to mention this in the commit log.
Are we sure this distribution is best for all workloads?
While irqbalancer is hardly a perfect oracle it does
manage to balance the load somewhat, and IIUC kernel
affinities would break that.
Thoughts?


> Effort: kvm
> ---
>  drivers/net/virtio_net.c | 30 +++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 765c2d6358da..0dc3a102bfc4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info 
> *vi)
>* queue pairs, we let the queue pairs to be private to one cpu by
>* setting the affinity hint to eliminate the contention.
>*/
> - if (vi->curr_queue_pairs == 1 ||
> - vi->max_queue_pairs != num_online_cpus()) {
> + if (vi->curr_queue_pairs == 1) {
>   virtnet_clean_affinity(vi, -1);
>   return;
>   }
>  
> + /* If there are more cpus than queues, then assign the queues'
> +  * interrupts to the first cpus until we run out.
> +  */
>   i = 0;
>   for_each_online_cpu(cpu) {
> + if (i == vi->max_queue_pairs)
> + break;
>   virtqueue_set_affinity(vi->rq[i].vq, cpu);
>   virtqueue_set_affinity(vi->sq[i].vq, cpu);
> - 

[PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-02 Thread Ben Serebrin
From: Benjamin Serebrin 

If the number of virtio queue pairs is not equal to the
number of VCPUs, the virtio guest driver doesn't assign
any CPU affinity for the queue interrupts or the xps
aggregation interrupt.

Google Compute Engine currently provides 1 queue pair for
every VCPU, but limits that at a maximum of 32 queue pairs.

This code assigns interrupt affinity even when there are more than
32 VCPUs.

Tested:

(on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)

Without the fix we see all queues affinitized to all CPUs:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
0-63
[...]
0-63

and we see all TX queues' xps_cpus affinitized to no cores:

for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
,
[...]
,

With the fix, we see each queue assigned to a single core,
and xps affinity set to 1 unique cpu per TX queue.

64 VCPU:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done

0-63
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
0-63
0-63
0-63
0-63

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done

0001,0001
0002,0002
0004,0004
0008,0008
0010,0010
0020,0020
0040,0040
0080,0080
0100,0100
0200,0200
0400,0400
0800,0800
1000,1000
2000,2000
4000,4000
8000,8000
0001,0001
0002,0002
0004,0004
0008,0008
0010,0010
0020,0020
0040,0040
0080,0080
0100,0100
0200,0200
0400,0400
0800,0800
1000,1000
2000,2000
4000,4000
8000,8000

48 VCPU:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
0-47
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
0-47
0-47
0-47
0-47

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done

0001,0001
0002,0002
0004,0004
0008,0008
0010,0010
0020,0020
0040,0040
0080,0080
0100,0100
0200,0200
0400,0400
0800,0800
1000,1000
2000,2000
4000,4000
8000,8000
,0001
,0002
,0004
,0008
,0010
,0020
,0040
,0080
,0100
,0200
,0400
,0800
,1000
,2000
,4000
,8000

Signed-off-by: Ben Serebrin 
Acked-by: Willem de Bruijn 
Acked-by: Jim Mattson 
Acked-by: Venkatesh Srinivas 

Effort: kvm
---
 drivers/net/virtio_net.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 765c2d6358da..0dc3a102bfc4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info 
*vi)
 * queue pairs, we let the queue pairs to be private to one cpu by
 * setting the affinity hint to eliminate the contention.
 */
-   if (vi->curr_queue_pairs == 1 ||
-   vi->max_queue_pairs != num_online_cpus()) {
+   if (vi->curr_queue_pairs == 1) {
virtnet_clean_affinity(vi, -1);
return;
}
 
+   /* If there are more cpus than queues, then assign the queues'
+* interrupts to the first cpus until we run out.
+*/
i = 0;
for_each_online_cpu(cpu) {
+   if (i == vi->max_queue_pairs)
+   break;
virtqueue_set_affinity(vi->rq[i].vq, cpu);
virtqueue_set_affinity(vi->sq[i].vq, cpu);
-   netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
i++;
}
 
+   /* Stripe the XPS affinities across the online CPUs.
+* Hyperthread pairs are typically assigned such that Linux's
+* CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
+* hyperthread twins to share TX queues, in the case where there are
+* more cpus than queues.
+*/
+   for (i = 0; i < vi->max_queue_pairs; i++) {
+   struct cpumask mask;
+   int skip = i;
+
+   cpumask_clear(&mask);
+   for_each_online_cpu(cpu) {
+   while (skip--)
+   cpu = cpumask_next(cpu, cpu_online_mask);
+   if (cpu < num_possible_cpus())
+   cpumask_set_cpu(cpu, &mask);
+   skip = vi->max_queue_pairs - 1;
+   }
+