Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-21 Thread Stefan Hajnoczi
On Wed, Mar 20, 2019 at 08:53:33PM +0800, Jason Wang wrote:
> 
> On 2019/3/19 上午10:22, Dongli Zhang wrote:
> > Hi Jason,
> > 
> > On 3/18/19 3:47 PM, Jason Wang wrote:
> > > On 2019/3/15 下午8:41, Cornelia Huck wrote:
> > > > On Fri, 15 Mar 2019 12:50:11 +0800
> > > > Jason Wang  wrote:
> > option 3:
> > We should allow more vectors even the block layer would support at most
> > nr_cpu_ids queues.
> > 
> > 
> > I understand a new policy for queue-vector mapping is very helpful. I am 
> > just
> > asking the question from block layer's point of view.
> > 
> > Thank you very much!
> > 
> > Dongli Zhang
> 
> 
> Don't know much for block, cc Stefan for more idea.

Thanks for CCing me.  I don't have much input at this stage.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-20 Thread Dongli Zhang


On 3/20/19 8:53 PM, Jason Wang wrote:
> 
> On 2019/3/19 上午10:22, Dongli Zhang wrote:
>> Hi Jason,
>>
>> On 3/18/19 3:47 PM, Jason Wang wrote:
>>> On 2019/3/15 下午8:41, Cornelia Huck wrote:
 On Fri, 15 Mar 2019 12:50:11 +0800
 Jason Wang  wrote:

> Or something like I proposed several years ago?
> https://do-db2.lkml.org/lkml/2014/12/25/169
>
> Btw, for virtio-net, I think we actually want to go for having a maximum
> number of supported queues like what hardware did. This would be useful
> for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current
> vector allocation doesn't support this which will results all virtqueues
> to share a single vector. We may indeed need more flexible policy here.
 I think it should be possible for the driver to give the transport
 hints how to set up their queues/interrupt structures. (The driver
 probably knows best about its requirements.) Perhaps whether a queue is
 high or low frequency, or whether it should be low latency, or even
 whether two queues could share a notification mechanism without
 drawbacks. It's up to the transport to make use of that information, if
 possible.
>>>
>>> Exactly and it was what the above series tried to do by providing hints of 
>>> e.g
>>> which queues want to share a notification.
>>>
>> I read about your patch set on providing more flexibility of queue-to-vector
>> mapping.
>>
>> One use case of the patch set is we would be able to enable more queues when
>> there is limited number of vectors.
>>
>> Another use case we may classify queues as hight priority or low priority as
>> mentioned by Cornelia.
>>
>> For virtio-blk, we may extend virtio-blk based on this patch set to enable
>> something similar to write_queues/poll_queues in nvme, when (set->nr_maps != 
>> 1).
>>
>>
>> Yet, the question I am asking in this email thread is for a difference 
>> scenario.
>>
>> The issue is not we are not having enough vectors (although this is why only 
>> 1
>> vector is allocated for all virtio-blk queues). As so far virtio-blk has
>> (set->nr_maps == 1), block layer would limit the number of hw queues by
>> nr_cpu_ids, we indeed do not need more than nr_cpu_ids hw queues in 
>> virtio-blk.
>>
>> That's why I ask why not change the flow as below options when the number of
>> supported hw queues is more than nr_cpu_ids (and set->nr_maps == 1. 
>> virtio-blk
>> does not set nr_maps and block layer would set it to 1 when the driver does 
>> not
>> specify with a value):
>>
>> option 1:
>> As what nvme and xen-netfront do, limit the hw queue number by nr_cpu_ids.
> 
> 
> How do they limit the hw queue number? A command?

The max #queue is also limited by other factors, e.g., kernel param
configuration, xen dom0 configuration or nvme hardware support.

Here we would ignore those factors for simplicity and only talk about the
relation between #queue and #cpu.


About nvme pci:

Regardless about new write_queues and poll_queues, the default queue type number
is limited by num_possible_cpus() as line 2120 and 252.

2113 static int nvme_setup_io_queues(struct nvme_dev *dev)
2114 {
2115 struct nvme_queue *adminq = >queues[0];
2116 struct pci_dev *pdev = to_pci_dev(dev->dev);
2117 int result, nr_io_queues;
2118 unsigned long size;
2119
2120 nr_io_queues = max_io_queues();
2121 result = nvme_set_queue_count(>ctrl, _io_queues);

 250 static unsigned int max_io_queues(void)
 251 {
 252 return num_possible_cpus() + write_queues + poll_queues;
 253 }

The cons of this is there might be many unused hw queues and vectors when
num_possible_cpus() is very very large while only a small number of cpu are
online. I am looking if there is way to improve this.



About xen-blkfront:

Indeed the max #queue is limited by num_online_cpus() when xen-blkfront module
is loaded as line 2733 and 2736.

2707 static int __init xlblk_init(void)
... ...
2710 int nr_cpus = num_online_cpus();
... ...
2733 if (xen_blkif_max_queues > nr_cpus) {
2734 pr_info("Invalid max_queues (%d), will use default max: 
%d.\n",
2735 xen_blkif_max_queues, nr_cpus);
2736 xen_blkif_max_queues = nr_cpus;
2737 }

The cons of this is the number of hw-queue/hctx is limited and cannot increase
after cpu hotplug. I am looking if there is way to improve this.


While both have cons for cpu hotplug, they are trying to make #vector
proportional to the number of cpu.


For xen-blkfront and virtio-blk, as (set=nr_maps == 1), the number of hw queue
is limited by nr_cpu_ids again at block layer.


As virtio-blk is a PCI device, can we use the solution in nvme, that is, to use
num_possible_cpus to limited the max queues in virtio-blk?

Thank you very much!

Dongli Zhang

> 
> 
>>
>> option 2:
>> If the vectors is not enough, use the max number vector (indeed nr_cpu_ids) 
>> as
>> number of hw queues.
> 

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-20 Thread Jason Wang


On 2019/3/19 上午10:22, Dongli Zhang wrote:

Hi Jason,

On 3/18/19 3:47 PM, Jason Wang wrote:

On 2019/3/15 下午8:41, Cornelia Huck wrote:

On Fri, 15 Mar 2019 12:50:11 +0800
Jason Wang  wrote:


Or something like I proposed several years ago?
https://do-db2.lkml.org/lkml/2014/12/25/169

Btw, for virtio-net, I think we actually want to go for having a maximum
number of supported queues like what hardware did. This would be useful
for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current
vector allocation doesn't support this which will results all virtqueues
to share a single vector. We may indeed need more flexible policy here.

I think it should be possible for the driver to give the transport
hints how to set up their queues/interrupt structures. (The driver
probably knows best about its requirements.) Perhaps whether a queue is
high or low frequency, or whether it should be low latency, or even
whether two queues could share a notification mechanism without
drawbacks. It's up to the transport to make use of that information, if
possible.


Exactly and it was what the above series tried to do by providing hints of e.g
which queues want to share a notification.


I read about your patch set on providing more flexibility of queue-to-vector
mapping.

One use case of the patch set is we would be able to enable more queues when
there is limited number of vectors.

Another use case we may classify queues as hight priority or low priority as
mentioned by Cornelia.

For virtio-blk, we may extend virtio-blk based on this patch set to enable
something similar to write_queues/poll_queues in nvme, when (set->nr_maps != 1).


Yet, the question I am asking in this email thread is for a difference scenario.

The issue is not we are not having enough vectors (although this is why only 1
vector is allocated for all virtio-blk queues). As so far virtio-blk has
(set->nr_maps == 1), block layer would limit the number of hw queues by
nr_cpu_ids, we indeed do not need more than nr_cpu_ids hw queues in virtio-blk.

That's why I ask why not change the flow as below options when the number of
supported hw queues is more than nr_cpu_ids (and set->nr_maps == 1. virtio-blk
does not set nr_maps and block layer would set it to 1 when the driver does not
specify with a value):

option 1:
As what nvme and xen-netfront do, limit the hw queue number by nr_cpu_ids.



How do they limit the hw queue number? A command?




option 2:
If the vectors is not enough, use the max number vector (indeed nr_cpu_ids) as
number of hw queues.



We can share vectors in this case.




option 3:
We should allow more vectors even the block layer would support at most
nr_cpu_ids queues.


I understand a new policy for queue-vector mapping is very helpful. I am just
asking the question from block layer's point of view.

Thank you very much!

Dongli Zhang



Don't know much for block, cc Stefan for more idea.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-18 Thread Dongli Zhang
Hi Jason,

On 3/18/19 3:47 PM, Jason Wang wrote:
> 
> On 2019/3/15 下午8:41, Cornelia Huck wrote:
>> On Fri, 15 Mar 2019 12:50:11 +0800
>> Jason Wang  wrote:
>>
>>> Or something like I proposed several years ago?
>>> https://do-db2.lkml.org/lkml/2014/12/25/169
>>>
>>> Btw, for virtio-net, I think we actually want to go for having a maximum
>>> number of supported queues like what hardware did. This would be useful
>>> for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current
>>> vector allocation doesn't support this which will results all virtqueues
>>> to share a single vector. We may indeed need more flexible policy here.
>> I think it should be possible for the driver to give the transport
>> hints how to set up their queues/interrupt structures. (The driver
>> probably knows best about its requirements.) Perhaps whether a queue is
>> high or low frequency, or whether it should be low latency, or even
>> whether two queues could share a notification mechanism without
>> drawbacks. It's up to the transport to make use of that information, if
>> possible.
> 
> 
> Exactly and it was what the above series tried to do by providing hints of e.g
> which queues want to share a notification.
> 

I read about your patch set on providing more flexibility of queue-to-vector
mapping.

One use case of the patch set is we would be able to enable more queues when
there is limited number of vectors.

Another use case we may classify queues as hight priority or low priority as
mentioned by Cornelia.

For virtio-blk, we may extend virtio-blk based on this patch set to enable
something similar to write_queues/poll_queues in nvme, when (set->nr_maps != 1).


Yet, the question I am asking in this email thread is for a difference scenario.

The issue is not we are not having enough vectors (although this is why only 1
vector is allocated for all virtio-blk queues). As so far virtio-blk has
(set->nr_maps == 1), block layer would limit the number of hw queues by
nr_cpu_ids, we indeed do not need more than nr_cpu_ids hw queues in virtio-blk.

That's why I ask why not change the flow as below options when the number of
supported hw queues is more than nr_cpu_ids (and set->nr_maps == 1. virtio-blk
does not set nr_maps and block layer would set it to 1 when the driver does not
specify with a value):

option 1:
As what nvme and xen-netfront do, limit the hw queue number by nr_cpu_ids.

option 2:
If the vectors is not enough, use the max number vector (indeed nr_cpu_ids) as
number of hw queues.

option 3:
We should allow more vectors even the block layer would support at most
nr_cpu_ids queues.


I understand a new policy for queue-vector mapping is very helpful. I am just
asking the question from block layer's point of view.

Thank you very much!

Dongli Zhang
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-18 Thread Jason Wang


On 2019/3/15 下午8:41, Cornelia Huck wrote:

On Fri, 15 Mar 2019 12:50:11 +0800
Jason Wang  wrote:


Or something like I proposed several years ago?
https://do-db2.lkml.org/lkml/2014/12/25/169

Btw, for virtio-net, I think we actually want to go for having a maximum
number of supported queues like what hardware did. This would be useful
for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current
vector allocation doesn't support this which will results all virtqueues
to share a single vector. We may indeed need more flexible policy here.

I think it should be possible for the driver to give the transport
hints how to set up their queues/interrupt structures. (The driver
probably knows best about its requirements.) Perhaps whether a queue is
high or low frequency, or whether it should be low latency, or even
whether two queues could share a notification mechanism without
drawbacks. It's up to the transport to make use of that information, if
possible.



Exactly and it was what the above series tried to do by providing hints 
of e.g which queues want to share a notification.


Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-15 Thread Cornelia Huck
On Fri, 15 Mar 2019 12:50:11 +0800
Jason Wang  wrote:

> Or something like I proposed several years ago? 
> https://do-db2.lkml.org/lkml/2014/12/25/169
> 
> Btw, for virtio-net, I think we actually want to go for having a maximum 
> number of supported queues like what hardware did. This would be useful 
> for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current 
> vector allocation doesn't support this which will results all virtqueues 
> to share a single vector. We may indeed need more flexible policy here.

I think it should be possible for the driver to give the transport
hints how to set up their queues/interrupt structures. (The driver
probably knows best about its requirements.) Perhaps whether a queue is
high or low frequency, or whether it should be low latency, or even
whether two queues could share a notification mechanism without
drawbacks. It's up to the transport to make use of that information, if
possible.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-14 Thread Jason Wang


On 2019/3/14 下午2:12, Dongli Zhang wrote:


On 3/13/19 5:39 PM, Cornelia Huck wrote:

On Wed, 13 Mar 2019 11:26:04 +0800
Dongli Zhang  wrote:


On 3/13/19 1:33 AM, Cornelia Huck wrote:

On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
Dongli Zhang  wrote:
   

I observed that there is one msix vector for config and one shared vector
for all queues in below qemu cmdline, when the num-queues for virtio-blk
is more than the number of possible cpus:

qemu: "-smp 4" while "-device 
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"

# cat /proc/interrupts
CPU0   CPU1   CPU2   CPU3
... ...
  24:  0  0  0  0   PCI-MSI 65536-edge  
virtio0-config
  25:  0  0  0 59   PCI-MSI 65537-edge  
virtio0-virtqueues
... ...


However, when num-queues is the same as number of possible cpus:

qemu: "-smp 4" while "-device 
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"

# cat /proc/interrupts
CPU0   CPU1   CPU2   CPU3
... ...
  24:  0  0  0  0   PCI-MSI 65536-edge  
virtio0-config
  25:  2  0  0  0   PCI-MSI 65537-edge  
virtio0-req.0
  26:  0 35  0  0   PCI-MSI 65538-edge  
virtio0-req.1
  27:  0  0 32  0   PCI-MSI 65539-edge  
virtio0-req.2
  28:  0  0  0  0   PCI-MSI 65540-edge  
virtio0-req.3
... ...

In above case, there is one msix vector per queue.

Please note that this is pci-specific...
   


This is because the max number of queues is not limited by the number of
possible cpus.

By default, nvme (regardless about write_queues and poll_queues) and
xen-blkfront limit the number of queues with num_possible_cpus().

...and these are probably pci-specific as well.

Not pci-specific, but per-cpu as well.

Ah, I meant that those are pci devices.

   


Is this by design on purpose, or can we fix with below?


diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4bc083b..df95ce3 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
if (err)
num_vqs = 1;
  
+	num_vqs = min(num_possible_cpus(), num_vqs);

+
vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
if (!vblk->vqs)
return -ENOMEM;

virtio-blk, however, is not pci-specific.

If we are using the ccw transport on s390, a completely different
interrupt mechanism is in use ('floating' interrupts, which are not
per-cpu). A check like that should therefore not go into the generic
driver.
   

So far there seems two options.

The 1st option is to ask the qemu user to always specify "-num-queues" with the
same number of vcpus when running x86 guest with pci for virtio-blk or
virtio-scsi, in order to assign a vector for each queue.

That does seem like an extra burden for the user: IIUC, things work
even if you have too many queues, it's just not optimal. It sounds like
something that can be done by a management layer (e.g. libvirt), though.


Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops'
so that different platforms (e.g., pci or ccw) would use different ways to limit
the max number of queues in guest, with something like below?

That sounds better, as both transports and drivers can opt-in here.

However, maybe it would be even better to try to come up with a better
strategy of allocating msix vectors in virtio-pci. More vectors in the
num_queues > num_cpus case, even if they still need to be shared?
Individual vectors for n-1 cpus and then a shared one for the remaining
queues?

It might even be device-specific: Have some low-traffic status queues
share a vector, and provide an individual vector for high-traffic
queues. Would need some device<->transport interface, obviously.


This sounds a little bit similar to multiple hctx maps?

So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
nr_cpu_ids hw queues.

2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
... ...
3021 /*
3022  * There is no use for more h/w queues than cpus if we just have
3023  * a single map
3024  */
3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
3026 set->nr_hw_queues = nr_cpu_ids;

Even the block layer would limit the number of hw queues by nr_cpu_ids when
(set->nr_maps == 1).

That's why I think virtio-blk should use the similar solution as nvme
(regardless about write_queues and poll_queues) and xen-blkfront.

Added Jason again. I do not know why the mailing list of
virtualization@lists.linux-foundation.org always filters out Jason's email...


Dongli Zhang



Or something like I proposed several years ago? 

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-14 Thread Dongli Zhang



On 03/14/2019 08:13 PM, Cornelia Huck wrote:
> On Thu, 14 Mar 2019 14:12:32 +0800
> Dongli Zhang  wrote:
> 
>> On 3/13/19 5:39 PM, Cornelia Huck wrote:
>>> On Wed, 13 Mar 2019 11:26:04 +0800
>>> Dongli Zhang  wrote:
>>>   
 On 3/13/19 1:33 AM, Cornelia Huck wrote:  
> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
> Dongli Zhang  wrote:
> 
>> Is this by design on purpose, or can we fix with below?
>>
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 4bc083b..df95ce3 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>  if (err)
>>  num_vqs = 1;
>>  
>> +num_vqs = min(num_possible_cpus(), num_vqs);
>> +
>>  vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), 
>> GFP_KERNEL);
>>  if (!vblk->vqs)
>>  return -ENOMEM;
>
> virtio-blk, however, is not pci-specific.
>
> If we are using the ccw transport on s390, a completely different
> interrupt mechanism is in use ('floating' interrupts, which are not
> per-cpu). A check like that should therefore not go into the generic
> driver.
> 

 So far there seems two options.

 The 1st option is to ask the qemu user to always specify "-num-queues" 
 with the
 same number of vcpus when running x86 guest with pci for virtio-blk or
 virtio-scsi, in order to assign a vector for each queue.  
>>>
>>> That does seem like an extra burden for the user: IIUC, things work
>>> even if you have too many queues, it's just not optimal. It sounds like
>>> something that can be done by a management layer (e.g. libvirt), though.
>>>   
 Or, is it fine for virtio folks to add a new hook to 'struct 
 virtio_config_ops'
 so that different platforms (e.g., pci or ccw) would use different ways to 
 limit
 the max number of queues in guest, with something like below?  
>>>
>>> That sounds better, as both transports and drivers can opt-in here.
>>>
>>> However, maybe it would be even better to try to come up with a better
>>> strategy of allocating msix vectors in virtio-pci. More vectors in the
>>> num_queues > num_cpus case, even if they still need to be shared?
>>> Individual vectors for n-1 cpus and then a shared one for the remaining
>>> queues?
>>>
>>> It might even be device-specific: Have some low-traffic status queues
>>> share a vector, and provide an individual vector for high-traffic
>>> queues. Would need some device<->transport interface, obviously.
>>>   
>>
>> This sounds a little bit similar to multiple hctx maps?
>>
>> So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
>> queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
>> nr_cpu_ids hw queues.
>>
>> 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>> ... ...
>> 3021 /*
>> 3022  * There is no use for more h/w queues than cpus if we just have
>> 3023  * a single map
>> 3024  */
>> 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
>> 3026 set->nr_hw_queues = nr_cpu_ids;
>>
>> Even the block layer would limit the number of hw queues by nr_cpu_ids when
>> (set->nr_maps == 1).
> 
> Correct me if I'm wrong, but there seem to be two kinds of limitations
> involved here:
> - Allocation of msix vectors by the virtio-pci transport. We end up
>   with shared vectors if we have more virtqueues than vcpus. Other
>   transports may or may not have similar issues, but essentially, this
>   is something that applies to all kind of virtio devices attached via
>   the virtio-pci transport.

It depends.

For virtio-net, we need to specify the number of available vectors on qemu side,
e.g.,:

-device virtio-net-pci,netdev=tapnet,mq=true,vectors=16

This parameter is specific for virtio-net.

Suppose if 'queues=8' while 'vectors=16', as 2*8+1 > 16, there will be lack of
vectors and guest would not be able to assign one vector for each queue.

I was tortured by this long time ago and it seems qemu would minimize the memory
allocation and the default 'vectors' is 3.

BTW, why cannot we have a more consistent configuration for most qemu devices,
e.g., so far:

virtio-blk use 'num-queues'
nvme use 'num_queues'
virtio-net use 'queues' for tap :)

> - The block layer limits the number of hw queues to the number of
>   vcpus. This applies only to virtio devices that interact with the
>   block layer, but regardless of the virtio transport.

Yes: virtio-blk and virtio-scsi.

> 
>> That's why I think virtio-blk should use the similar solution as nvme
>> (regardless about write_queues and poll_queues) and xen-blkfront.
> 
> Ok, the hw queues limit from above would be an argument to limit to
> #vcpus in the virtio-blk driver, regardless of the transport used. (No
> idea if 

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-14 Thread Dongli Zhang



On 03/14/2019 08:32 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 12, 2019 at 10:22:46AM -0700, Dongli Zhang wrote:
>> I observed that there is one msix vector for config and one shared vector
>> for all queues in below qemu cmdline, when the num-queues for virtio-blk
>> is more than the number of possible cpus:
>>
>> qemu: "-smp 4" while "-device 
>> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
> 
> So why do this?

I observed this when I was testing virtio-blk and block layer.

I just assign a very large 'num-queues' to virtio-blk and keep changing the
number of vcpu in order to study blk-mq.

The num-queues for nvme (qemu) is by default 64, while it is 1 for virtio-blk.

> 
>> # cat /proc/interrupts 
>>CPU0   CPU1   CPU2   CPU3
>> ... ...
>>  24:  0  0  0  0   PCI-MSI 65536-edge  
>> virtio0-config
>>  25:  0  0  0 59   PCI-MSI 65537-edge  
>> virtio0-virtqueues
>> ... ...
>>
>>
>> However, when num-queues is the same as number of possible cpus:
>>
>> qemu: "-smp 4" while "-device 
>> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
>>
>> # cat /proc/interrupts 
>>CPU0   CPU1   CPU2   CPU3
>> ... ... 
>>  24:  0  0  0  0   PCI-MSI 65536-edge  
>> virtio0-config
>>  25:  2  0  0  0   PCI-MSI 65537-edge  
>> virtio0-req.0
>>  26:  0 35  0  0   PCI-MSI 65538-edge  
>> virtio0-req.1
>>  27:  0  0 32  0   PCI-MSI 65539-edge  
>> virtio0-req.2
>>  28:  0  0  0  0   PCI-MSI 65540-edge  
>> virtio0-req.3
>> ... ...
>>
>> In above case, there is one msix vector per queue.
>>
>>
>> This is because the max number of queues is not limited by the number of
>> possible cpus.
>>
>> By default, nvme (regardless about write_queues and poll_queues) and
>> xen-blkfront limit the number of queues with num_possible_cpus().
>>
>>
>> Is this by design on purpose, or can we fix with below?
>>
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 4bc083b..df95ce3 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>  if (err)
>>  num_vqs = 1;
>>  
>> +num_vqs = min(num_possible_cpus(), num_vqs);
>> +
>>  vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>>  if (!vblk->vqs)
>>  return -ENOMEM;
>> --
>>
>>
>> PS: The same issue is applicable to virtio-scsi as well.
>>
>> Thank you very much!
>>
>> Dongli Zhang
> 
> I don't think this will address the issue if there's vcpu hotplug though.
> Because it's not about num_possible_cpus it's about the # of active VCPUs,
> right? Does block hangle CPU hotplug generally?
> We could maybe address that by switching vq to msi vector mapping in
> a cpu hotplug notifier...
> 

It looks it is about num_possible_cpus/nr_cpu_ids for cpu hotplug.


For instance, below is when only 2 out of 6 cpus are initialized while
virtio-blk has 6 queues.

"-smp 2,maxcpus=6" and "-device
virtio-blk-pci,drive=drive0,id=disk0,num-queues=6,iothread=io1"

# cat /sys/devices/system/cpu/present
0-1
# cat /sys/devices/system/cpu/possible
0-5
# cat /proc/interrupts  | grep virtio
 24:  0  0   PCI-MSI 65536-edge  virtio0-config
 25:   1864  0   PCI-MSI 65537-edge  virtio0-req.0
 26:  0   1069   PCI-MSI 65538-edge  virtio0-req.1
 27:  0  0   PCI-MSI 65539-edge  virtio0-req.2
 28:  0  0   PCI-MSI 65540-edge  virtio0-req.3
 29:  0  0   PCI-MSI 65541-edge  virtio0-req.4
 30:  0  0   PCI-MSI 65542-edge  virtio0-req.5

6 + 1 irqs are assigned even 4 out of 6 cpus are still offline.


Below is about the nvme emulated by qemu. While 2 out of 6 cpus are initial
assigned, nvme has 64 queues by default.

"-smp 2,maxcpus=6" and "-device nvme,drive=drive1,serial=deadbeaf1"

# cat /sys/devices/system/cpu/present
0-1
# cat /sys/devices/system/cpu/possible
0-5
# cat /proc/interrupts | grep nvme
 31:  0 16   PCI-MSI 81920-edge  nvme0q0
 32: 35  0   PCI-MSI 81921-edge  nvme0q1
 33:  0 42   PCI-MSI 81922-edge  nvme0q2
 34:  0  0   PCI-MSI 81923-edge  nvme0q3
 35:  0  0   PCI-MSI 81924-edge  nvme0q4
 36:  0  0   PCI-MSI 81925-edge  nvme0q5
 37:  0  0   PCI-MSI 81926-edge  nvme0q6

6 io queues are assigned with irq, although only 2 cpus are online.


When only 2 out of 48 cpus are online, there are 48 hctx created by block layer.

"-smp 2,maxcpus=48" and "-device
virtio-blk-pci,drive=drive0,id=disk0,num-queues=48,iothread=io1"

# ls /sys/kernel/debug/block/vda/ | grep hctx | wc -l

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-14 Thread Michael S. Tsirkin
On Tue, Mar 12, 2019 at 10:22:46AM -0700, Dongli Zhang wrote:
> I observed that there is one msix vector for config and one shared vector
> for all queues in below qemu cmdline, when the num-queues for virtio-blk
> is more than the number of possible cpus:
> 
> qemu: "-smp 4" while "-device 
> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"

So why do this?

> # cat /proc/interrupts 
>CPU0   CPU1   CPU2   CPU3
> ... ...
>  24:  0  0  0  0   PCI-MSI 65536-edge  
> virtio0-config
>  25:  0  0  0 59   PCI-MSI 65537-edge  
> virtio0-virtqueues
> ... ...
> 
> 
> However, when num-queues is the same as number of possible cpus:
> 
> qemu: "-smp 4" while "-device 
> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
> 
> # cat /proc/interrupts 
>CPU0   CPU1   CPU2   CPU3
> ... ... 
>  24:  0  0  0  0   PCI-MSI 65536-edge  
> virtio0-config
>  25:  2  0  0  0   PCI-MSI 65537-edge  
> virtio0-req.0
>  26:  0 35  0  0   PCI-MSI 65538-edge  
> virtio0-req.1
>  27:  0  0 32  0   PCI-MSI 65539-edge  
> virtio0-req.2
>  28:  0  0  0  0   PCI-MSI 65540-edge  
> virtio0-req.3
> ... ...
> 
> In above case, there is one msix vector per queue.
> 
> 
> This is because the max number of queues is not limited by the number of
> possible cpus.
> 
> By default, nvme (regardless about write_queues and poll_queues) and
> xen-blkfront limit the number of queues with num_possible_cpus().
> 
> 
> Is this by design on purpose, or can we fix with below?
> 
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 4bc083b..df95ce3 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>   if (err)
>   num_vqs = 1;
>  
> + num_vqs = min(num_possible_cpus(), num_vqs);
> +
>   vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>   if (!vblk->vqs)
>   return -ENOMEM;
> --
> 
> 
> PS: The same issue is applicable to virtio-scsi as well.
> 
> Thank you very much!
> 
> Dongli Zhang

I don't think this will address the issue if there's vcpu hotplug though.
Because it's not about num_possible_cpus it's about the # of active VCPUs,
right? Does block hangle CPU hotplug generally?
We could maybe address that by switching vq to msi vector mapping in
a cpu hotplug notifier...

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-14 Thread Cornelia Huck
On Thu, 14 Mar 2019 14:12:32 +0800
Dongli Zhang  wrote:

> On 3/13/19 5:39 PM, Cornelia Huck wrote:
> > On Wed, 13 Mar 2019 11:26:04 +0800
> > Dongli Zhang  wrote:
> >   
> >> On 3/13/19 1:33 AM, Cornelia Huck wrote:  
> >>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
> >>> Dongli Zhang  wrote:

>  Is this by design on purpose, or can we fix with below?
> 
> 
>  diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>  index 4bc083b..df95ce3 100644
>  --- a/drivers/block/virtio_blk.c
>  +++ b/drivers/block/virtio_blk.c
>  @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>   if (err)
>   num_vqs = 1;
>   
>  +num_vqs = min(num_possible_cpus(), num_vqs);
>  +
>   vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), 
>  GFP_KERNEL);
>   if (!vblk->vqs)
>   return -ENOMEM;
> >>>
> >>> virtio-blk, however, is not pci-specific.
> >>>
> >>> If we are using the ccw transport on s390, a completely different
> >>> interrupt mechanism is in use ('floating' interrupts, which are not
> >>> per-cpu). A check like that should therefore not go into the generic
> >>> driver.
> >>> 
> >>
> >> So far there seems two options.
> >>
> >> The 1st option is to ask the qemu user to always specify "-num-queues" 
> >> with the
> >> same number of vcpus when running x86 guest with pci for virtio-blk or
> >> virtio-scsi, in order to assign a vector for each queue.  
> > 
> > That does seem like an extra burden for the user: IIUC, things work
> > even if you have too many queues, it's just not optimal. It sounds like
> > something that can be done by a management layer (e.g. libvirt), though.
> >   
> >> Or, is it fine for virtio folks to add a new hook to 'struct 
> >> virtio_config_ops'
> >> so that different platforms (e.g., pci or ccw) would use different ways to 
> >> limit
> >> the max number of queues in guest, with something like below?  
> > 
> > That sounds better, as both transports and drivers can opt-in here.
> > 
> > However, maybe it would be even better to try to come up with a better
> > strategy of allocating msix vectors in virtio-pci. More vectors in the
> > num_queues > num_cpus case, even if they still need to be shared?
> > Individual vectors for n-1 cpus and then a shared one for the remaining
> > queues?
> > 
> > It might even be device-specific: Have some low-traffic status queues
> > share a vector, and provide an individual vector for high-traffic
> > queues. Would need some device<->transport interface, obviously.
> >   
> 
> This sounds a little bit similar to multiple hctx maps?
> 
> So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
> queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
> nr_cpu_ids hw queues.
> 
> 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
> ... ...
> 3021 /*
> 3022  * There is no use for more h/w queues than cpus if we just have
> 3023  * a single map
> 3024  */
> 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
> 3026 set->nr_hw_queues = nr_cpu_ids;
> 
> Even the block layer would limit the number of hw queues by nr_cpu_ids when
> (set->nr_maps == 1).

Correct me if I'm wrong, but there seem to be two kinds of limitations
involved here:
- Allocation of msix vectors by the virtio-pci transport. We end up
  with shared vectors if we have more virtqueues than vcpus. Other
  transports may or may not have similar issues, but essentially, this
  is something that applies to all kind of virtio devices attached via
  the virtio-pci transport.
- The block layer limits the number of hw queues to the number of
  vcpus. This applies only to virtio devices that interact with the
  block layer, but regardless of the virtio transport.

> That's why I think virtio-blk should use the similar solution as nvme
> (regardless about write_queues and poll_queues) and xen-blkfront.

Ok, the hw queues limit from above would be an argument to limit to
#vcpus in the virtio-blk driver, regardless of the transport used. (No
idea if there are better ways to deal with this, I'm not familiar with
the interface.)

For virtio devices that don't interact with the block layer and are
attached via the virtio-pci transport, it might still make sense to
revisit vector allocation.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-14 Thread Dongli Zhang



On 3/13/19 5:39 PM, Cornelia Huck wrote:
> On Wed, 13 Mar 2019 11:26:04 +0800
> Dongli Zhang  wrote:
> 
>> On 3/13/19 1:33 AM, Cornelia Huck wrote:
>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
>>> Dongli Zhang  wrote:
>>>   
 I observed that there is one msix vector for config and one shared vector
 for all queues in below qemu cmdline, when the num-queues for virtio-blk
 is more than the number of possible cpus:

 qemu: "-smp 4" while "-device 
 virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"

 # cat /proc/interrupts 
CPU0   CPU1   CPU2   CPU3
 ... ...
  24:  0  0  0  0   PCI-MSI 65536-edge  
 virtio0-config
  25:  0  0  0 59   PCI-MSI 65537-edge  
 virtio0-virtqueues
 ... ...


 However, when num-queues is the same as number of possible cpus:

 qemu: "-smp 4" while "-device 
 virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"

 # cat /proc/interrupts 
CPU0   CPU1   CPU2   CPU3
 ... ... 
  24:  0  0  0  0   PCI-MSI 65536-edge  
 virtio0-config
  25:  2  0  0  0   PCI-MSI 65537-edge  
 virtio0-req.0
  26:  0 35  0  0   PCI-MSI 65538-edge  
 virtio0-req.1
  27:  0  0 32  0   PCI-MSI 65539-edge  
 virtio0-req.2
  28:  0  0  0  0   PCI-MSI 65540-edge  
 virtio0-req.3
 ... ...

 In above case, there is one msix vector per queue.  
>>>
>>> Please note that this is pci-specific...
>>>   


 This is because the max number of queues is not limited by the number of
 possible cpus.

 By default, nvme (regardless about write_queues and poll_queues) and
 xen-blkfront limit the number of queues with num_possible_cpus().  
>>>
>>> ...and these are probably pci-specific as well.  
>>
>> Not pci-specific, but per-cpu as well.
> 
> Ah, I meant that those are pci devices.
> 
>>
>>>   


 Is this by design on purpose, or can we fix with below?


 diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
 index 4bc083b..df95ce3 100644
 --- a/drivers/block/virtio_blk.c
 +++ b/drivers/block/virtio_blk.c
 @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
if (err)
num_vqs = 1;
  
 +  num_vqs = min(num_possible_cpus(), num_vqs);
 +
vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
if (!vblk->vqs)
return -ENOMEM;  
>>>
>>> virtio-blk, however, is not pci-specific.
>>>
>>> If we are using the ccw transport on s390, a completely different
>>> interrupt mechanism is in use ('floating' interrupts, which are not
>>> per-cpu). A check like that should therefore not go into the generic
>>> driver.
>>>   
>>
>> So far there seems two options.
>>
>> The 1st option is to ask the qemu user to always specify "-num-queues" with 
>> the
>> same number of vcpus when running x86 guest with pci for virtio-blk or
>> virtio-scsi, in order to assign a vector for each queue.
> 
> That does seem like an extra burden for the user: IIUC, things work
> even if you have too many queues, it's just not optimal. It sounds like
> something that can be done by a management layer (e.g. libvirt), though.
> 
>> Or, is it fine for virtio folks to add a new hook to 'struct 
>> virtio_config_ops'
>> so that different platforms (e.g., pci or ccw) would use different ways to 
>> limit
>> the max number of queues in guest, with something like below?
> 
> That sounds better, as both transports and drivers can opt-in here.
> 
> However, maybe it would be even better to try to come up with a better
> strategy of allocating msix vectors in virtio-pci. More vectors in the
> num_queues > num_cpus case, even if they still need to be shared?
> Individual vectors for n-1 cpus and then a shared one for the remaining
> queues?
> 
> It might even be device-specific: Have some low-traffic status queues
> share a vector, and provide an individual vector for high-traffic
> queues. Would need some device<->transport interface, obviously.
> 

This sounds a little bit similar to multiple hctx maps?

So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
nr_cpu_ids hw queues.

2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
... ...
3021 /*
3022  * There is no use for more h/w queues than cpus if we just have
3023  * a single map
3024  */
3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
3026 set->nr_hw_queues = nr_cpu_ids;

Even the block layer would limit the number of hw queues by 

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-13 Thread Cornelia Huck
On Wed, 13 Mar 2019 11:26:04 +0800
Dongli Zhang  wrote:

> On 3/13/19 1:33 AM, Cornelia Huck wrote:
> > On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
> > Dongli Zhang  wrote:
> >   
> >> I observed that there is one msix vector for config and one shared vector
> >> for all queues in below qemu cmdline, when the num-queues for virtio-blk
> >> is more than the number of possible cpus:
> >>
> >> qemu: "-smp 4" while "-device 
> >> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
> >>
> >> # cat /proc/interrupts 
> >>CPU0   CPU1   CPU2   CPU3
> >> ... ...
> >>  24:  0  0  0  0   PCI-MSI 65536-edge  
> >> virtio0-config
> >>  25:  0  0  0 59   PCI-MSI 65537-edge  
> >> virtio0-virtqueues
> >> ... ...
> >>
> >>
> >> However, when num-queues is the same as number of possible cpus:
> >>
> >> qemu: "-smp 4" while "-device 
> >> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
> >>
> >> # cat /proc/interrupts 
> >>CPU0   CPU1   CPU2   CPU3
> >> ... ... 
> >>  24:  0  0  0  0   PCI-MSI 65536-edge  
> >> virtio0-config
> >>  25:  2  0  0  0   PCI-MSI 65537-edge  
> >> virtio0-req.0
> >>  26:  0 35  0  0   PCI-MSI 65538-edge  
> >> virtio0-req.1
> >>  27:  0  0 32  0   PCI-MSI 65539-edge  
> >> virtio0-req.2
> >>  28:  0  0  0  0   PCI-MSI 65540-edge  
> >> virtio0-req.3
> >> ... ...
> >>
> >> In above case, there is one msix vector per queue.  
> > 
> > Please note that this is pci-specific...
> >   
> >>
> >>
> >> This is because the max number of queues is not limited by the number of
> >> possible cpus.
> >>
> >> By default, nvme (regardless about write_queues and poll_queues) and
> >> xen-blkfront limit the number of queues with num_possible_cpus().  
> > 
> > ...and these are probably pci-specific as well.  
> 
> Not pci-specific, but per-cpu as well.

Ah, I meant that those are pci devices.

> 
> >   
> >>
> >>
> >> Is this by design on purpose, or can we fix with below?
> >>
> >>
> >> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >> index 4bc083b..df95ce3 100644
> >> --- a/drivers/block/virtio_blk.c
> >> +++ b/drivers/block/virtio_blk.c
> >> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
> >>if (err)
> >>num_vqs = 1;
> >>  
> >> +  num_vqs = min(num_possible_cpus(), num_vqs);
> >> +
> >>vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
> >>if (!vblk->vqs)
> >>return -ENOMEM;  
> > 
> > virtio-blk, however, is not pci-specific.
> > 
> > If we are using the ccw transport on s390, a completely different
> > interrupt mechanism is in use ('floating' interrupts, which are not
> > per-cpu). A check like that should therefore not go into the generic
> > driver.
> >   
> 
> So far there seems two options.
> 
> The 1st option is to ask the qemu user to always specify "-num-queues" with 
> the
> same number of vcpus when running x86 guest with pci for virtio-blk or
> virtio-scsi, in order to assign a vector for each queue.

That does seem like an extra burden for the user: IIUC, things work
even if you have too many queues, it's just not optimal. It sounds like
something that can be done by a management layer (e.g. libvirt), though.

> Or, is it fine for virtio folks to add a new hook to 'struct 
> virtio_config_ops'
> so that different platforms (e.g., pci or ccw) would use different ways to 
> limit
> the max number of queues in guest, with something like below?

That sounds better, as both transports and drivers can opt-in here.

However, maybe it would be even better to try to come up with a better
strategy of allocating msix vectors in virtio-pci. More vectors in the
num_queues > num_cpus case, even if they still need to be shared?
Individual vectors for n-1 cpus and then a shared one for the remaining
queues?

It might even be device-specific: Have some low-traffic status queues
share a vector, and provide an individual vector for high-traffic
queues. Would need some device<->transport interface, obviously.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-12 Thread Dongli Zhang



On 3/13/19 1:33 AM, Cornelia Huck wrote:
> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
> Dongli Zhang  wrote:
> 
>> I observed that there is one msix vector for config and one shared vector
>> for all queues in below qemu cmdline, when the num-queues for virtio-blk
>> is more than the number of possible cpus:
>>
>> qemu: "-smp 4" while "-device 
>> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
>>
>> # cat /proc/interrupts 
>>CPU0   CPU1   CPU2   CPU3
>> ... ...
>>  24:  0  0  0  0   PCI-MSI 65536-edge  
>> virtio0-config
>>  25:  0  0  0 59   PCI-MSI 65537-edge  
>> virtio0-virtqueues
>> ... ...
>>
>>
>> However, when num-queues is the same as number of possible cpus:
>>
>> qemu: "-smp 4" while "-device 
>> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
>>
>> # cat /proc/interrupts 
>>CPU0   CPU1   CPU2   CPU3
>> ... ... 
>>  24:  0  0  0  0   PCI-MSI 65536-edge  
>> virtio0-config
>>  25:  2  0  0  0   PCI-MSI 65537-edge  
>> virtio0-req.0
>>  26:  0 35  0  0   PCI-MSI 65538-edge  
>> virtio0-req.1
>>  27:  0  0 32  0   PCI-MSI 65539-edge  
>> virtio0-req.2
>>  28:  0  0  0  0   PCI-MSI 65540-edge  
>> virtio0-req.3
>> ... ...
>>
>> In above case, there is one msix vector per queue.
> 
> Please note that this is pci-specific...
> 
>>
>>
>> This is because the max number of queues is not limited by the number of
>> possible cpus.
>>
>> By default, nvme (regardless about write_queues and poll_queues) and
>> xen-blkfront limit the number of queues with num_possible_cpus().
> 
> ...and these are probably pci-specific as well.

Not pci-specific, but per-cpu as well.

> 
>>
>>
>> Is this by design on purpose, or can we fix with below?
>>
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 4bc083b..df95ce3 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>  if (err)
>>  num_vqs = 1;
>>  
>> +num_vqs = min(num_possible_cpus(), num_vqs);
>> +
>>  vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>>  if (!vblk->vqs)
>>  return -ENOMEM;
> 
> virtio-blk, however, is not pci-specific.
> 
> If we are using the ccw transport on s390, a completely different
> interrupt mechanism is in use ('floating' interrupts, which are not
> per-cpu). A check like that should therefore not go into the generic
> driver.
> 

So far there seems two options.

The 1st option is to ask the qemu user to always specify "-num-queues" with the
same number of vcpus when running x86 guest with pci for virtio-blk or
virtio-scsi, in order to assign a vector for each queue.

Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops'
so that different platforms (e.g., pci or ccw) would use different ways to limit
the max number of queues in guest, with something like below?

So far both virtio-scsi and virtio-blk would benefit from the new hook.

---
 drivers/block/virtio_blk.c |  2 ++
 drivers/virtio/virtio_pci_common.c |  6 ++
 drivers/virtio/virtio_pci_common.h |  2 ++
 drivers/virtio/virtio_pci_legacy.c |  1 +
 drivers/virtio/virtio_pci_modern.c |  2 ++
 include/linux/virtio_config.h  | 11 +++
 6 files changed, 24 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4bc083b..93cfeda 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
if (err)
num_vqs = 1;

+   num_vqs = virtio_calc_num_vqs(vdev, num_vqs);
+
vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
if (!vblk->vqs)
return -ENOMEM;
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index d0584c0..ce021d1 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -409,6 +409,12 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
 }

+/* the config->calc_num_vqs() implementation */
+unsigned short vp_calc_num_vqs(unsigned short num_vqs)
+{
+   return min_t(unsigned short, num_possible_cpus(), num_vqs);
+}
+
 const char *vp_bus_name(struct virtio_device *vdev)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index 0227100..cc5ac80 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -134,6 +134,8 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-12 Thread Cornelia Huck
On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
Dongli Zhang  wrote:

> I observed that there is one msix vector for config and one shared vector
> for all queues in below qemu cmdline, when the num-queues for virtio-blk
> is more than the number of possible cpus:
> 
> qemu: "-smp 4" while "-device 
> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
> 
> # cat /proc/interrupts 
>CPU0   CPU1   CPU2   CPU3
> ... ...
>  24:  0  0  0  0   PCI-MSI 65536-edge  
> virtio0-config
>  25:  0  0  0 59   PCI-MSI 65537-edge  
> virtio0-virtqueues
> ... ...
> 
> 
> However, when num-queues is the same as number of possible cpus:
> 
> qemu: "-smp 4" while "-device 
> virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
> 
> # cat /proc/interrupts 
>CPU0   CPU1   CPU2   CPU3
> ... ... 
>  24:  0  0  0  0   PCI-MSI 65536-edge  
> virtio0-config
>  25:  2  0  0  0   PCI-MSI 65537-edge  
> virtio0-req.0
>  26:  0 35  0  0   PCI-MSI 65538-edge  
> virtio0-req.1
>  27:  0  0 32  0   PCI-MSI 65539-edge  
> virtio0-req.2
>  28:  0  0  0  0   PCI-MSI 65540-edge  
> virtio0-req.3
> ... ...
> 
> In above case, there is one msix vector per queue.

Please note that this is pci-specific...

> 
> 
> This is because the max number of queues is not limited by the number of
> possible cpus.
> 
> By default, nvme (regardless about write_queues and poll_queues) and
> xen-blkfront limit the number of queues with num_possible_cpus().

...and these are probably pci-specific as well.

> 
> 
> Is this by design on purpose, or can we fix with below?
> 
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 4bc083b..df95ce3 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>   if (err)
>   num_vqs = 1;
>  
> + num_vqs = min(num_possible_cpus(), num_vqs);
> +
>   vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>   if (!vblk->vqs)
>   return -ENOMEM;

virtio-blk, however, is not pci-specific.

If we are using the ccw transport on s390, a completely different
interrupt mechanism is in use ('floating' interrupts, which are not
per-cpu). A check like that should therefore not go into the generic
driver.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization