Re: [PATCH v2 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2021-11-25 Thread Łukasz Gieryk
On Wed, Nov 24, 2021 at 03:26:30PM +0100, Łukasz Gieryk wrote:
> On Wed, Nov 24, 2021 at 09:04:31AM +0100, Klaus Jensen wrote:
> > On Nov 16 16:34, Łukasz Gieryk wrote:
> > > With four new properties:
> > >  - sriov_v{i,q}_flexible,
> > >  - sriov_max_v{i,q}_per_vf,
> > > one can configure the number of available flexible resources, as well as
> > > the limits. The primary and secondary controller capability structures
> > > are initialized accordingly.
> > > 
> > > Since the number of available queues (interrupts) now varies between
> > > VF/PF, BAR size calculation is also adjusted.
> > > 
> > > Signed-off-by: Łukasz Gieryk 
> > > ---
> > >  hw/nvme/ctrl.c   | 138 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  hw/nvme/nvme.h   |   4 ++
> > >  include/block/nvme.h |   5 ++
> > >  3 files changed, 140 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > index f8f5dfe204..f589ffde59 100644
> > > --- a/hw/nvme/ctrl.c
> > > +++ b/hw/nvme/ctrl.c
> > > @@ -6358,13 +6444,40 @@ static void nvme_init_state(NvmeCtrl *n)
> > >      n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> > >      n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
> > >  
> > > -    list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
> > > -    for (i = 0; i < n->params.sriov_max_vfs; i++) {
> > > +    list->numcntl = cpu_to_le16(max_vfs);
> > > +    for (i = 0; i < max_vfs; i++) {
> > >          sctrl = &n->sec[i];
> > >          sctrl->pcid = cpu_to_le16(n->cntlid);
> > >      }
> > >  
> > >      cap->cntlid = cpu_to_le16(n->cntlid);
> > > +    cap->crt = NVME_CRT_VQ | NVME_CRT_VI;
> > > +
> > > +    if (pci_is_vf(&n->parent_obj)) {
> > > +        cap->vqprt = cpu_to_le16(1 + n->conf_ioqpairs);
> > > +    } else {
> > > +        cap->vqprt = cpu_to_le16(1 + n->params.max_ioqpairs -
> > > +                                 n->params.sriov_vq_flexible);
> > > +        cap->vqfrt = cpu_to_le32(n->params.sriov_vq_flexible);
> > > +        cap->vqrfap = cap->vqfrt;
> > > +        cap->vqgran = cpu_to_le16(NVME_VF_RES_GRANULARITY);
> > > +        cap->vqfrsm = n->params.sriov_max_vq_per_vf ?
> > > +                      cpu_to_le16(n->params.sriov_max_vq_per_vf) :
> > > +                      cap->vqprt;
> > 
> > That this defaults to VQPRT doesn't seem right. It should default to
> > VQFRT. It does not make sense to report a maximum number of assignable
> > flexible resources that is bigger than the number of flexible resources
> > available.
> 
> I’ve explained in one of the v1 threads why I think the current default
> is better than the suggested VQFRT.
> 
> What you’ve noticed is indeed an inconvenience, but it’s, at least in
> my opinion, part of the design. What matters is the current number of
> unassigned flexible resources. It may be lower than VQFRSM for multiple
> reasons:
>  1) resources are bound to the PF,
>  2) resources are bound to other VFs,
>  3) resources simply don’t exist (not baked into silicon: VQFRT < VQFRSM).
> 
> If 1) and 2) are allowed to happen, and the user must be aware of that,
> then why shouldn’t 3) be allowed as well?
> 

I’ve done some more thinking, and now I’m happy with neither my version
nor the suggested VQFRT.

How about using this formula instead:

v{q,i}frsm = sriov_max_v{q,i}_per_vf ? sriov_max_v{q,i}_per_vf :
             floor(sriov_v{q,i}_flexible / sriov_max_vfs)

v{q,i}frsm would then end up with values similar or proportional to those
reported by an actual SR-IOV-capable device available on the market.
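
In nvme_init_state() that would amount to something like the following
(only a sketch, not tested; C integer division already provides the
floor, and the vifrsm side would change analogously):

    cap->vqfrsm = n->params.sriov_max_vq_per_vf ?
        cpu_to_le16(n->params.sriov_max_vq_per_vf) :
        cpu_to_le16(n->params.sriov_vq_flexible / n->params.sriov_max_vfs);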




Re: [PATCH v2 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2021-11-24 Thread Łukasz Gieryk
On Wed, Nov 24, 2021 at 09:04:31AM +0100, Klaus Jensen wrote:
> On Nov 16 16:34, Łukasz Gieryk wrote:
> > With four new properties:
> >  - sriov_v{i,q}_flexible,
> >  - sriov_max_v{i,q}_per_vf,
> > one can configure the number of available flexible resources, as well as
> > the limits. The primary and secondary controller capability structures
> > are initialized accordingly.
> > 
> > Since the number of available queues (interrupts) now varies between
> > VF/PF, BAR size calculation is also adjusted.
> > 
> > Signed-off-by: Łukasz Gieryk 
> > ---
> >  hw/nvme/ctrl.c   | 138 ++++++++++++++++++++++++++++++++++++++++++++---
> >  hw/nvme/nvme.h   |   4 ++
> >  include/block/nvme.h |   5 ++
> >  3 files changed, 140 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index f8f5dfe204..f589ffde59 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -6358,13 +6444,40 @@ static void nvme_init_state(NvmeCtrl *n)
> >      n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> >      n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
> >  
> > -    list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
> > -    for (i = 0; i < n->params.sriov_max_vfs; i++) {
> > +    list->numcntl = cpu_to_le16(max_vfs);
> > +    for (i = 0; i < max_vfs; i++) {
> >          sctrl = &n->sec[i];
> >          sctrl->pcid = cpu_to_le16(n->cntlid);
> >      }
> >  
> >      cap->cntlid = cpu_to_le16(n->cntlid);
> > +    cap->crt = NVME_CRT_VQ | NVME_CRT_VI;
> > +
> > +    if (pci_is_vf(&n->parent_obj)) {
> > +        cap->vqprt = cpu_to_le16(1 + n->conf_ioqpairs);
> > +    } else {
> > +        cap->vqprt = cpu_to_le16(1 + n->params.max_ioqpairs -
> > +                                 n->params.sriov_vq_flexible);
> > +        cap->vqfrt = cpu_to_le32(n->params.sriov_vq_flexible);
> > +        cap->vqrfap = cap->vqfrt;
> > +        cap->vqgran = cpu_to_le16(NVME_VF_RES_GRANULARITY);
> > +        cap->vqfrsm = n->params.sriov_max_vq_per_vf ?
> > +                      cpu_to_le16(n->params.sriov_max_vq_per_vf) :
> > +                      cap->vqprt;
> 
> That this defaults to VQPRT doesn't seem right. It should default to
> VQFRT. It does not make sense to report a maximum number of assignable
> flexible resources that is bigger than the number of flexible resources
> available.

I’ve explained in one of the v1 threads why I think the current default
is better than the suggested VQFRT.

What you’ve noticed is indeed an inconvenience, but it’s, at least in
my opinion, part of the design. What matters is the current number of
unassigned flexible resources. It may be lower than VQFRSM for multiple
reasons:
 1) resources are bound to the PF,
 2) resources are bound to other VFs,
 3) resources simply don’t exist (not baked into silicon: VQFRT < VQFRSM).

If 1) and 2) are allowed to happen, and the user must be aware of that,
then why shouldn’t 3) be allowed as well?
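
To put numbers on it (values made up for illustration): with VQFRT = 8
and VQFRSM = 10, if 2 flexible resources are currently bound to the PF
and 4 to other VFs, the next VF can be assigned at most 8 - 2 - 4 = 2
resources, regardless of the VQFRSM = 10 limit; the host has to consult
the number of unassigned resources either way.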




Re: [PATCH v2 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2021-11-24 Thread Klaus Jensen
On Nov 16 16:34, Łukasz Gieryk wrote:
> With four new properties:
>  - sriov_v{i,q}_flexible,
>  - sriov_max_v{i,q}_per_vf,
> one can configure the number of available flexible resources, as well as
> the limits. The primary and secondary controller capability structures
> are initialized accordingly.
> 
> Since the number of available queues (interrupts) now varies between
> VF/PF, BAR size calculation is also adjusted.
> 
> Signed-off-by: Łukasz Gieryk 
> ---
>  hw/nvme/ctrl.c   | 138 ++++++++++++++++++++++++++++++++++++++++++++---
>  hw/nvme/nvme.h   |   4 ++
>  include/block/nvme.h |   5 ++
>  3 files changed, 140 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f8f5dfe204..f589ffde59 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -6358,13 +6444,40 @@ static void nvme_init_state(NvmeCtrl *n)
>      n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
>      n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
>  
> -    list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
> -    for (i = 0; i < n->params.sriov_max_vfs; i++) {
> +    list->numcntl = cpu_to_le16(max_vfs);
> +    for (i = 0; i < max_vfs; i++) {
>          sctrl = &n->sec[i];
>          sctrl->pcid = cpu_to_le16(n->cntlid);
>      }
>  
>      cap->cntlid = cpu_to_le16(n->cntlid);
> +    cap->crt = NVME_CRT_VQ | NVME_CRT_VI;
> +
> +    if (pci_is_vf(&n->parent_obj)) {
> +        cap->vqprt = cpu_to_le16(1 + n->conf_ioqpairs);
> +    } else {
> +        cap->vqprt = cpu_to_le16(1 + n->params.max_ioqpairs -
> +                                 n->params.sriov_vq_flexible);
> +        cap->vqfrt = cpu_to_le32(n->params.sriov_vq_flexible);
> +        cap->vqrfap = cap->vqfrt;
> +        cap->vqgran = cpu_to_le16(NVME_VF_RES_GRANULARITY);
> +        cap->vqfrsm = n->params.sriov_max_vq_per_vf ?
> +                      cpu_to_le16(n->params.sriov_max_vq_per_vf) :
> +                      cap->vqprt;

That this defaults to VQPRT doesn't seem right. It should default to
VQFRT. It does not make sense to report a maximum number of assignable
flexible resources that is bigger than the number of flexible resources
available.
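
For a made-up example configuration: max_ioqpairs=10 and
sriov_vq_flexible=4 yield VQPRT = 1 + 10 - 4 = 7 but VQFRT = 4, so the
current default would advertise up to 7 assignable flexible resources
per VF while only 4 exist in total.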




[PATCH v2 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2021-11-16 Thread Łukasz Gieryk
With four new properties:
 - sriov_v{i,q}_flexible,
 - sriov_max_v{i,q}_per_vf,
one can configure the number of available flexible resources, as well as
the limits. The primary and secondary controller capability structures
are initialized accordingly.

Since the number of available queues (interrupts) now varies between
VF/PF, BAR size calculation is also adjusted.

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 138 ++++++++++++++++++++++++++++++++++++++++++++---
 hw/nvme/nvme.h   |   4 ++
 include/block/nvme.h |   5 ++
 3 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f8f5dfe204..f589ffde59 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -36,6 +36,10 @@
 *  zoned.zasl=<N[optional]>, \
 *  zoned.auto_transition=<on|off[optional]>, \
 *  sriov_max_vfs=<N[optional]> \
+ *  sriov_vq_flexible=<N[optional]> \
+ *  sriov_vi_flexible=<N[optional]> \
+ *  sriov_max_vi_per_vf=<N[optional]> \
+ *  sriov_max_vq_per_vf=<N[optional]> \
 *  subsys=<subsys_id>
 *  -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
 *  zoned=<true|false[optional]>, \
@@ -113,6 +117,26 @@
  *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
  *   Virtual function controllers will not report SR-IOV capability.
  *
+ * - `sriov_vq_flexible`
+ *   Indicates the total number of flexible queue resources assignable to all
+ *   the secondary controllers. Implicitly sets the number of PF-private
+ *   resources to (max_ioqpairs - sriov_vq_flexible).
+ *
+ * - `sriov_vi_flexible`
+ *   Indicates the total number of flexible interrupt resources assignable to
+ *   all the secondary controllers. Implicitly sets the number of PF-private
+ *   resources to (msix_qsize - sriov_vi_flexible).
+ *
+ * - `sriov_max_vi_per_vf`
+ *   Indicates the maximum number of virtual interrupt resources assignable
+ *   to a secondary controller. The default 0 resolves to the number of
+ *   private interrupt resources configured for the PF.
+ *
+ * - `sriov_max_vq_per_vf`
+ *   Indicates the maximum number of virtual queue resources assignable to
+ *   a secondary controller. The default 0 resolves to the number of
+ *   private queue resources configured for the PF.
+ *
  * nvme namespace device parameters
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * - `shared`
@@ -185,6 +209,7 @@
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 #define NVME_MAX_VFS 127
+#define NVME_VF_RES_GRANULARITY 1
 #define NVME_VF_OFFSET 0x1
 #define NVME_VF_STRIDE 1
 
@@ -6338,6 +6363,58 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 error_setg(errp, "PMR is not supported with SR-IOV");
 return;
 }
+
+if (!params->sriov_vq_flexible || !params->sriov_vi_flexible) {
+error_setg(errp, "both sriov_vq_flexible and sriov_vi_flexible"
+   " must be set for the use of SR-IOV");
+return;
+}
+
+        if (params->sriov_vq_flexible < params->sriov_max_vfs * 2) {
+            error_setg(errp, "sriov_vq_flexible must be greater than or equal"
+                       " to %d (sriov_max_vfs * 2)",
+                       params->sriov_max_vfs * 2);
+            return;
+        }
+
+        if (params->max_ioqpairs < params->sriov_vq_flexible + 2) {
+            error_setg(errp, "max_ioqpairs - sriov_vq_flexible (PF-private"
+                       " queue resources) must be greater than or equal to 2");
+            return;
+        }
+
+        if (params->sriov_vi_flexible < params->sriov_max_vfs) {
+            error_setg(errp, "sriov_vi_flexible must be greater than or equal"
+                       " to %d (sriov_max_vfs)",
+                       params->sriov_max_vfs);
+            return;
+        }
+
+        if (params->msix_qsize < params->sriov_vi_flexible + 1) {
+            error_setg(errp, "msix_qsize - sriov_vi_flexible (PF-private"
+                       " interrupt resources) must be greater than or equal"
+                       " to 1");
+            return;
+        }
+
+        if (params->sriov_max_vi_per_vf &&
+            (params->sriov_max_vi_per_vf - 1) % NVME_VF_RES_GRANULARITY) {
+            error_setg(errp, "sriov_max_vi_per_vf must meet:"
+                       " (X - 1) %% %d == 0 and X >= 1",
+                       NVME_VF_RES_GRANULARITY);
+            return;
+        }
+
+        if (params->sriov_max_vq_per_vf &&
+            (params->sriov_max_vq_per_vf < 2 ||
+             (params->sriov_max_vq_per_vf - 1) % NVME_VF_RES_GRANULARITY)) {
+            error_setg(errp, "sriov_max_vq_per_vf must meet:"
+                       " (X - 1) %% %d == 0 and X >= 2",
+                       NVME_VF_RES_GRANULARITY);
+            return;
+        }
     }
 }
 
@@ -6346,10 +6423,19 @@ static void